A New Function to Make Scalar Functions More Powerful in an Array World
When Dynamic Arrays (DA) were introduced, it opened a plethora of possibilities and a lot of formula nerds and automation ninjas were empowered to begin a new wave of creativity while building more powerful worksheets with fewer formulas. As we explore the potential, we are discovering that there is a new limitation that begs for a solution. This is a proposal for a new function, SEGMENT, that can expand the flexibility of scalar functions that accept arrays as parameters so they can operate more robustly in an array world.
Consider the following problem as an example.
We have a twodimension table of numeric values – we'll call each number a score. The values are organized by row – one row for each student. Each of 20 students has 10 scores, and we want to find the maximum of the averages of the top five scores for each student. We're going to need to use the LARGE, AVERAGE, and MAX functions.
Without getting into arrays, we create five "helper" columns and, in each, put our LARGE function. For the first student, the formula in the first helper column is:
=LARGE(A1:J1,1)
The second helper column:
=LARGE(A1:J1,2)
And so on. We average these in a sixth column and then get the maximum of the averages for all 20 students. Six additional columns and a single cell dedicated to the maximum of the averages across all students.
We can introduce arrays and cut our six helper columns down to one by coercing the LARGE function to produce an array and passing that array to the AVERAGE function, one formula per student (row):
=AVERAGE(LARGE(A1:J1,{1,2,3,4,5}))
Now we have one helper column and a cell dedicated to the maximum of the averages.
Can we go further in our effort to produce a solution with a minimal number of formulas? Given the current state of things, that is as simple as I can go. There may be some magic out there that can take this further but I'm at a loss at this point.
The problem with going further with arrays is we are bound by the inability of Excel to know that, if we pass our entire array containing all students and their scores to LARGE, we want it to process each student separately and generate one result per student. Or, in this case, one result per student and for each of the five ranks we passed to it. This is because the LARGE function along with many other functions that accept arrays as parameters handle the arrays internally. Whereas any function that accepts a scalar value can be coerced to produce an array – not by itself, but by Excel creating multiple invocations of that function, one for each element in the array, and then assembling the results in a result array.
But what if we had an explicit way to tell Excel to process each row or column in an array as a separate invocation of that scalar function just as it already knows to do when we pass an array where a scalar parameter is expected?
Consider a new function: SEGMENT. The purpose of the function is to instruct Excel to take a twodimension array, segment it by row or by column, and pass each of those segments to the scalar function just as it knows how to do with our second array we are passing to the LARGE function for the "k" parameter to generate the five largest values as an array. The result would be a single dimension array of the scalar values produced for each of the segments generated by the SEGMENT function. Whether it is a row or column depends on how we instruct SEGMENT to operate: if by row than a column is returned; if by column than a row is returned.
The SEGMENT function syntax:
SEGMENT(array, [by_column], [indices], [size])
array – any one or twodimension array.
by_column – Optional. Pass TRUE to segment the array by columns. Default is FALSE which results in the array segmented by rows.
Indices – Optional. Pass an array of row or column indices to return. If omitted, then all rows or columns are returned. For example, if array is {1,2,3,4;2,3,4,5;3,4,5,6} and indices is {2,4} then two arrays are returned: {2,3,4} and {4,5,6}.
Size – Optional. Pass a single integer representing the size of the returned arrays. If array is only one row or column wide, then passing this value tells the function to pass multiple overlapping arrays. For example, if array is {1,2,3,4,5} and size is 3 then three arrays are returned: {1,2,3}, {2,3,4}, {3,4,5}.
In our LARGE example, we can generate one array of all top five scores for all 20 students:
=LARGE(SEGMENT(A1:J20),{1,2,3,4,5})
The result would be 20 rows by 5 columns where each row is a single student, and each column is one of the five ranked values. The next step is to average each student's top five scores. We apply the SEGMENT function again and pass that result to the AVERAGE function:
=AVERAGE(SEGMENT(LARGE(SEGMENT(A1:J20),{1,2,3,4,5})))
The result is a single dimension array (a column) of 20 averages, one element per student. Ultimately, we want the maximum average across all students, so we pass this array to MAX asis:
=MAX(AVERAGE(SEGMENT(LARGE(SEGMENT(A1:J10),{1,2,3,4,5}))))
And we have our scalar answer with one formula.
I suppose we could add an additional parameter to SEGMENT to tell it how many rows or columns constitute a single element in its result but that may be overkill.
As a final note, there are several posts in this forum for more array aware functions such as here:
I am pretty sure the SEGMENT function would be a solution for the many new functions that are being proposed without adding any new functions beyond the one proposed here.
Some more examples
Sum all columns in a table, one result per column:
=SUM(SEGMENT(Ref#,TRUE))
Average all rows in a table, one result per row:
=AVERAGE(SEGMENT(Ref#))
Add a running average of the last five values in a column, one result per row except for the first four rows:
=AVERAGE(SEGMENT(A1:A100,TRUE,,5))
Join the text values in rows, one result per row:
=TEXTJOIN(", ",TRUE,SEGMENT(A1:D20))
4 comments

First260 commented
Although nested arrays are not supported, this sort of thing may possible soon using a proposed higher order MAP function which has been stated as being in the pipeline.
ROWMAP / COLMAP functions for evaluating functions over arrays in rows or columns appeared in a prototype used by the MS research team (http://www.itu.dk/people/sestoft/funcalc/ITUTR20111422up.pdf). To sum over columns one could then use a formula of the form:
=COLMAP( SUM, Table)
Not sure if builtin functions can be passed this way, if not one might need to replace SUM with LAMBDA(x,SUM(x)).

Charles Williams commented
Great idea Kevin, but I think it needs the calc engine to support arrays of arrays (which is something I would very much like to see).

Kevin Jones commented
There might be some confusion as to how the result of a SEGMENT function interacts with other arrays in a formula. In the example I gave with SEGMENT, we have two arrays – the SEGMENT generated array which is a “column” of “row” arrays and the k parameter which is a “row”.
=LARGE(SEGMENT(Array),{1,2,3,4,5})
The result of a column of values combined with a row of values is a twodimension grid with the same number of rows as in the “column” array and the same number of columns as are in the “row” array. If Array is 10 rows, the result of the LARGE function call is a twodimension array of 10 rows and five columns.
This follows established Excel array processing protocols. If SEGMENT produces a “Row” of arrays, then Excel produces a singledimension row array with a number of elements equal to the larger of the two arrays. If one array was larger than the other, then the extra elements would contain “#N/A”. With SEGMENT generating either a “column” of “rows” or a “row” of “columns”, all array processing rules apply.

Anthony Mellor commented
That is sheer genius and will empower the entire world of accountants and other financial people to use dynamic arrays in practice: we need rows and columns to be dynamically accessible without the traditional "copy down" or across. At present the inability to address dynamic arrays such that the results spill down really rather frustratingly cripples our attempts to use this new dynamic Excel world, because the mix of dynamic and non dynamic defeats the objectives. Pivot tables work faster and far more easily. I can do with a Pivot table in several seconds what takes me ages using dynamic formulae, not to mention the complexity and error prone nature that is conjured by copying down thousands of formulae, when one spill should do the job..
Anthony  FCA/CPA