By now, if you read my previous posts in the series The Definitive Guide to Unpivot in Excel (all posts here), you will feel confident with the following challenge: How to split comma-separated values in Power Query.
We have a table of event names, dates and comma-separated participant names, and we want to build a PivotTable that will show us the participants and their count of events (as the following screenshot shows).
In this post we will examine the intuitive solution that includes a split and unpivot transformations, but then we will see a much better and scalable way to achieve our goal.
Let’s start with a walkthrough of the intuitive solution.
Open this workbook, select any cell in Table1, and click From Table in the Get & Transform section of the Data tab in Excel 2016 (Or if you use Excel 2010, or 2013 click From Table in Power Query tab, after you installed the Add-In).
The Query Editor will open. Right click the Participants column, select Split Column, and then click By Delimiter…
In the Split Column by Delimiter dialog, select Comma as the delimiter.
Now you can click OK, but I encourage you to expand the Advanced options section before you leave this dialog.
Update: This article was written before MIcrosoft introduced the Split Into Rows option. It is recommended to use it. Once you do it, you will no longer need to perform unpivot, or take care of missing values. You can keep reading, if the Split Into Rows option is not relevant for your data challenge, and you still want to find a way to split the column into columns in a robust way.
You can see that Power Query proposes to split the Participants column into 7 columns. Why only 7? Because the Query Editor identified a maximum of 7 comma-separated values in the preview. By now you can guess that the entire dataset is usually larger than the preview, so we may encounter cases where we have more than 7 comma-separated values. OK, let’s click OK.
Now we have 7 columns: Participants.1 to Participants.7, and we can perform the unpivot transformation.
Select columns Event and Date, right click on one of their headers and select Unpivot Other Columns.
Now we can remove the column Attribute and change the name of column Value to Participant. We can also change the type of Date to Date.
That’s it. We have reached our desired format of Event, Date and Participant. But will it work well?
Remember we saw the number 7 which was used to determine how many columns to split to?
Well, if this number is hard-coded in the transformation steps, we will surely miss participants in cells that contain more than 7 comma-separated values.
Let’s check it out.
In Home tab, click Advanced Editor.
We can see in the M expression that the split was made to 7 hard-coded columns.
Would it be cool, if you could just change the number from 7 to, let’s say 10, to split cells with more than 7 comma-separated values?
The answer is that you can. Just replace the third parameter of the function Table.SplitColumn in the line that starts with #”Split Column by Delimiter” and use 10, instead of the list of “Participants.1″,…,”Participants.7”.
You should also remove the following line that performs Table.TransformColumnTypes this function also uses the 7 hard-coded column names.
OK, we have a better solution now that assumes that no more than 10 comma-separated values can be used in column Participants.
But as you can check out, the workbook we use in this walkthrough has more than 10 comma-separated participants. The next thing you would think is to use a high enough number. Perhaps 100?
Well, this approach will not lead us to an efficient transformation, as we will waste too much memory and CPU in transforming many null value columns. In addition, what if our assumption to use the number 100 will be wrong?
Split & Expand – The best approach
It’s time to see the best approach to resolve our problem.
Let’s start again and import Table1 to the Query Editor.
We will now right click on the column Participants, select Transform and click lowercase (We are not going to apply a lowercase transformation here, it is just a shortcut to generate part of the formula we will need).
At this stage, if you didn’t turn on the formula bar, now is the time to do so. Just check the Formula Bar box in View tab.
Edit the formula and replace Text.Lower with Splitter.SplitTextByDelimiter(“,”)
Press Enter after you apply the change, and you will see that the column Participants has turned into a list.
Now take a deep breath and get ready to see the magic in action – Ready? Click the expand button (The small button in the header of column Participants).
That was fun. We managed to get to our desired table without the use of columns for the split values, and without the unpivot step.
We can even cleanup the formula (but that is no necessary to make it work), and remove all mentions of#”Lowercased Text” and use SplitParticipants instead.
Here is the resulted M expression:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], SplitParticipants = Table.TransformColumns(Source,{{"Participants", Splitter.SplitTextByDelimiter(",")}}), #"Expanded Participants" = Table.ExpandListColumn(SplitParticipants, "Participants"), #"Changed Type" = Table.TransformColumnTypes(#"Expanded Participants",{{"Date", type date}}) in #"Changed Type"
That’s it. You can now click Close & Load in Home tab, and create a PivotTable on each of the three queries we have mentioned.
You can select the second worksheet on our workbook, and see how only the third table is correct in getting all participants.
Hope you enjoyed this post.
@Gil – Nesting the Splitter.SplitTextByDelimiter inside Table.TransformColumns is Brilliant !!!
Pingback: Presidential Word Cloud in Power BI – Part 2 – Data Chant
Pingback: Split & Merge Columns - Pitfalls #7 and #8 - DataChant
Is there a reverse of this?
Can I get BACK to a comma separated cell?
There is a relatively new feature: Extract values from nested list: https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-january-feature-summary/#extractValues
Hi Gil! First of all, thank you for this pitfalls series. It is very useful.
I see there is at least one more pitfall in column splitting via Table.SplitColumn: a catch for the advanced PQ users.
As you know, the 4th argument of this function is optional, and if it is omitted/null, then PQ will make as much columns as it get after splitting the _first row_ of the source table. It looks more like a kludge to me.
Thank you Maxim for sharing.