Following a reader’s request, today we will unleash the power of List.Accumulate.
The official documentation on List.Accumulate here was very confusing for me:
List.Accumulate(list as list, seed as any, accumulator as function) as any
Argument | Description |
---|---|
list | The List to check. |
seed | The initial value seed. |
accumulator | The value accumulator function. |
OK. I understood the list argument quite right, but the other two arguments were post-nuclear-bomb science fiction.
So let’s try to understand the example that was used in the official page:
// This accumulates the sum of the numbers in the list provided. List.Accumulate({1, 2, 3, 4, 5}, 0, (state, current) => state + current) equals 15
Oh, so the code above sums up all the elements in the list. That is nice. Let’s make sure the calculation was done right. 1+2+3+4+5 = 15. Yes, this is right 🙂
To prove that the code works, we can paste it to the Query Editor after creating a blank query:
But wait a minute, why should we bother using List.Accumulate when we can already sum up the numbers in a list using the function List.Sum?
So there must be something more powerful here… We will try to reveal it today:
List.Accumulate: More than meets the eye…
Before we start, let’s try to understand the seed and accumulator. In the example above, zero is used as the value for the seed argument, and a function that receives state and current is used as the accumulator.
The terms state and current can better help us to understand List.Accumulate.
The state is an object in Power Query which can be of any type. In our example it is a number. Defined as zero in its initial state (This is the seed, the initial state), and transformed on each iteration of the list. The value in state will increase by the number that is represented by current at each step (current represents the current element in the list). To make sure we understand this example, let’s review each step.
In the first step state is zero (as defined by the seed argument) and current is 1 (the first element in the list). During this step Power Query implements the function, and performs the sum operation state + current.
Iteration 1: state = 0, current = 1, Assign new value to state = state + current = 1
Iteration 1: state = 1, current = 2, Assign new value to state = state + current = 3
Iteration 1:state = 3, current = 3, …
state = 6, current = 4
state = 10, current = 5, Return state+current as the result = 15
Now that we better understand the flow, here are few important insights about List.Accumulate.
List.Accumulate(list as list, seed as any, accumulator as function)as any
You must use two-argument function as the accumulator.
The seed argument (or as we call it the initial state), should be of the same type as the first argument of your accumulator function and of the same type as of the function’s output.
Let’s try to demonstrate it with pseudo code:
myList = {... items of TYPE_X ...), fnMyAccumulator = (state as TYPE_Y, current as TYPE_X) as TYPE_Y => ... , myAccumulatedResult = List.Accumulate( myList, some_initial_state_whose_type_is_Y, fnMyAccumulator)
So in the pseudo code above, myList is a list whose elements are of type TYPE_X (e.g. a record, list, number or text), and our initial state is of type TYPE_Y, we must define our accumulator function as a function that receives a state argument whose type is TYPE_Y and a current argument whose type is TYPE_X, and returns a value whose type is TYPE_Y.
If you are completely lost at this stage, that’s fine. Perhaps few examples will List.Accunmulate to sink in.
First exmaple – implementing List.Max
= List.Accumulate( {1, 2, 100, -40, 5}, 0, (max, current) => if max < current then current else max)
This is still a basic example. We are warming up. This function will return the number 100 which is the maximal number in the list {1, 2, 100, 4, 5}.
But again, it is not so useful as we already have List.Max.
Let’s try to use List.Accumulate to calculate both the max and the min values on the same “pipeline”.
Here is the code:
= List.Accumulate( {1, 2, 100, -40, 4}, [min=#infinity,max=-#infinity], (state, current) => [min=if state[min] > current then current else state[min], max=if state[max] < current then current else state[max]] ),
The result is a record: [min=-40, max=100]
Let’s elaborate what we did here. Unlike the first example where we used a number as a state that will “remember” the maximal number in the list at each iteration, in this example we needed to “remember” two numbers as we go over the list. To pass two numbers as the state for List.Accumulate we defined a record whose keys are min and max. The accumulator function also calculates the new record of min and max according to the previous min and max and the current number in the list.
You can scale up and reuse the example above to perform multiple “online” calculations on your data with a single pass. Online means that you don’t require to store all the data to perform the calculation, and that you can gradually get your result by calculating an interim result at each step based on the previous results and the “online” data of the current step.
For example, in the code below we perform multiple calculations on the data which is serialized in myList. The calculations are performed at each step of the list iteration by the functions fnForResult1, fnForResult2, etc.
= List.Accumulate( myList, [min=#infinity,max=-#infinity], (state, current) => [result1 = fnForResult1(current, state), result2 = fnForResult2(current, state), ... resultN = fnForResultN(current, state)] }
And here is a useful function that uses List.Accumulate to create a list of column names from a list of their column indices.
// If table tbl has the columns names {"Column1", "Column2", "Column3"} then // using this function with columnIndices = {0,2} // returns: {"Column1", "Column3"} (tbl as table, columnIndices as list) as list => let res = List.Accumulate( columnIndices, {}, (state, current)=> List.Combine({state, {Table.ColumnNames(tbl){current}}}) ) in res
An additional way to think about List.Accumulate is as a pipeline of machines that transform your data in a sequence of operations. Each machine in its turn, gets the data from its previous machine, transforms it, and passes it to the next machine in the pipeline. The definitions for each machine are serialized in the list, so each machine can be customized to perform a unique flavor of the same transformation. The processed data is initially defined as the seed argument, and it will “evolve” into the final product when the last machine will apply its transformation.
In conclusion, List.Accumulate is quite a powerful function in Power Query, and I hope that be now I motivated you to seek for interesting new scenarios to explore it.
Looking forward to hearing from you how you use List.Accumulate.
Reblogged this on George's BI Blog.
I think that real power of this function could be revealed when using it with not-so-usual tasks, for example, when result needed is record or table. Or, when constructing some kind of progressions. For example, to calculate a progression of Sum(X/n) where n = 1 to i,
= List.Accumulate({1,1,1,1,1}, [X=0,n=0],(state, current) => [X= state[X]+current/(state[n]+1), b=state[n]+1])[X]
gives us 2,2833333333333332
When it comes to simple operations, it is easy to replace List.Accumulate with one-two simple functions, for example, [min=List.Min(myList),max=List.Max(myList)]
It is really powerful when it comes to ITERATIONS
Thanks, Gil, it is nice inspiration
Thank you Maxim for sharing and for the idea to cover this topic.
Very nice explanation – thanks a lot Gil!
I was always thinking of List.Accumulate and List.Generate as recursive functions, because this is actually the only reason I would use them for: As they are costly in terms of performance (due to the iterative nature) and (at least for me) a bit tricky to edit/write, as they don’t reveal their results stepwise like M normally does.
What I didn’t realize before is that List.Accumulate can actually return more than one value. Could it also return lists or just one record only? Wondering then when to use List.Generate and when List.Accumulate…
Thank you Imke. List.Accumulate can return a lists if it is invoked with a list as its initial state. The use of a record is not mandatory. You can use any data type.
The main differences between List.Accumulate and List.Generate is that the former gets a list and returns an aggregated product of that list, while the latter gets instructions to generate a list from scratch.
BTW, I don’t think that these functions are implemented by Microsoft as recursions (too costly), but as iterations. That is why it is recommended to use these functions instead of recursions whenever it is possible.
Pingback: BIN2DEC in Power Query using List.Accumulate | Excel Inside
oops, there an error in my code
should be
= List.Accumulate({1,1,1,1,1}, [X=0,n=0],(state, current) => [X=state[X]+current/(state[n]+1), n=state[n]+1])[X]
Hey Gil… this is one of my favorite articles on List.Accumulate. Since I don’t use it often, I tend to forget the specifics, so I keep coming back to this page. I just posted an article showing how to do multiple find / replacements with List.Accumulate. Here is the link if anyone needs it – https://chandoo.org/wp/multiple-find-replace-list-accumulate/
Hi, Gil! Thanks for a great book on Power Query! Read it last year and got a lot of insights. About the List.Accumulate function: today I got an idea if there is an Excel/PowerPivot SWITCH (or CHOOSE) function analog in PQ. Couldn’t find it and created my own with List.Accumultate:
(value as any,values as list,results as list, optional other as any) =>
let
OtherText = if other = null then null else other,
Source = List.Accumulate({0..(List.Count(values)-1)}, OtherText, (state, current) => if value = values{current} then results{current} else state)
in
Source
and an example of usage
fnSwitch([Text],{3,6,12},{“March”,”June”,”December”},”no data”)
Applied in a Custom Column, this function get’s a value from [Text] column and converts values of the first list {3,6,12} into the values of the second list {“March”,”June”,”December”}
I think it’s a very good substitution for nested if’s. It even can be used with Transform>Replace Value tool as non-existing ReplaceAnyValue function that you were talking about in your book.
Nice!!!