How many times did you have to work with tables that had headers like these?
Client_ID, Transaction_ID, Product_Name, My_DBA_is_in_Love_with_Underscored_Headers
No need to manually rename the column names any more, if you work with Excel Get Data / Power Query, or Power BI. You can just apply the following formula in Query Editor:
= Table.TransformColumnNames(#"Previous Step", each Replacer.ReplaceText(_,"_", " "))
Let’s explain. Imagine that you already loaded your source table into Excel Get Data or Power BI. Now when Query Editor is open, let’s say your last step in Applied Steps is Previous Step. To remove all the underscore characters from your headers you can follow these steps:
- Click the fx control near the formula bar.
- The Custom1 step will be added in Applied Steps, with the formula: = #”Previous Step” in formula bar (If you don’t see the formula bar, please turn it on).
- Replace the formula with the formula with the preceding Table.TransformColumnNames formula, and press Enter.
You will see that the underscores in your headers are magically gone:
Client ID, Transaction ID, Product Name, My DBA is in Love with Underscored Headers – but I don’t care
The formula above uses the function Table.TransformColumnNames, which receives a table and a name generator function as its main arguments. The name generator function, should receive a single argument of text (the column name), and returns text (the transformed column name). It can be a built-in function like Replacer.ReplaceText and Text.Proper, or can be any custom function you create (And you will see a great example in a minute).
In our case above, the name generator returns a text with spaces instead of underscores.
You can use similar techniques to capitalize each word of your headers:
= Table.TransformColumnNames(#"Previous Step", Text.Proper)
Or use this formula to lowercase your header (I don’t like lowercased headers, but I won’t be judgemental):
= Table.TransformColumnNames(#"Previous Step", Text.Lower)
Or you can use this formula to change the headers to uppercase (I may start being judgemental if you do it).
= Table.TransformColumnNames(#"Previous Step", Text.Upper)
The CamelCase Challenge
But what would you do if your source tables have the following headers:
ClientID, TransactionID, ProductName, MyDBAIsInLoveWithCamelCaseHeaders
I cannot even try to count how many times I have manually renamed such column names in the Query Editor. Can you?
Luckily there is a solution. You can split CamelCase headers using the Table.TransformColumnName and the following list generator custom function:
Important Note: This function should not be used on very large text input. On very large text, you will get a Stack Overflow error. Read the Important Updates section below for better custom function, that will not lead to errors.
(Source) => let res = List.Accumulate( Text.ToList(Source), [result="", index=0, source=Source], fnAccumulator ), IsUpper = (txt) => txt <> "" and txt = Text.Upper(txt) and txt <> Text.Lower(txt), fnAccumulator = (state as record, current as text) as record => let prevCharacter = if state[index]=0 then "" else Text.At(state[source], state[index] - 1), prevCharacter2 = if state[index]<=1 then "" else Text.At(state[source], state[index] - 2), nextCharacter = if state[index] = Text.Length(state[source]) - 1 then "" else Text.At(state[source], state[index] + 1), aggregatedResult = if state[index]=0 or current = "" then current else if IsUpper(current) and (not IsUpper(prevCharacter) or (IsUpper(prevCharacter2) and not IsUpper(nextCharacter))) then state[result] & " " & current else state[result] & current, resultRecord = [result = aggregatedResult, index = state[index]+1, source = state[source]] in resultRecord in res[result]
- In Query Editor, create a blank query. In Home tab, click Advanced Editor and paste the expression above. Close the Advanced Editor, and rename the blank query as FnSplitCamelCase.
- Select your CamelCased table in Queries pane. Click the fx control near the formula bar.
- Paste the following line in the formula bar:
= Table.TransformColumnNames(#"Previous Step", FnSplitCamelCase)
Once you press Enter you will see that your column names are changed. Here are few examples:
Before:
ClientID, TransactionID, ProductName, MyDBAIsInLoveWithCamelCaseHeaders
After:
Client ID, Transaction ID, Product Name, My DBA Is In Love With Camel Case Headers
Note that FnSplitCamelCase keeps the “DBA” part together in the last column name above.
To understand the code of the custom function, you can read my article on List.Accumulate here. List.Accumulate enables us to pass the column name as a list of single characters, and accumulate the transformed column name as the result. The decision to add a space between two characters is relies on several factors:
- The index of the current character – which can be accessed in the function via state[index]
- The case of the current character – a character is uppercased if its uppercased version is equal to the character itself. To test it we used: Text.Upper(character) = character
Important Updates – A Record Of Promises
Sergey Lossev shared in the comments below a brilliant alternative custom function that is shorter in code. You can use it instead of the original one as your custom function: FnSplitCamelCase. Thank you Sergey.
// Shared by Sergey Lossev (Source) => let to_list = Text.ToList(Source), accum_chars = List.Accumulate(to_list, "", (sum, x) => sum & (if x = Text.Upper(x) then " " & x else x)), split = Text.Split(Text.Trim(accum_chars), " "), accum_words = List.Accumulate(split, "", (sum, x) => sum & (if Text.Length(x) = 1 then x else " " & x & " ")), replace = Text.Trim(Text.Replace(accum_words, " ", " ")) in replace
As I was learning Sergey’s solution, I thought it should perform slower, as the function runs four times over the text input, while my function runs once. So, I compared the performance of the two functions and realized that I was wrong. The refresh time was similar as my function was accessing four characters in each iteration. So, the two functions run over the characters four times.
When I tested the functions on large text messages (e.g. 10000 characters or more), I found out that my original function led to stack overflow. Download the workbook here to find out for yourself.
I suspected that the overflow is related to the long text that I had stored in the state variable of the accumulator function. So, I contacted Microsoft, and got a great (and even poetic) answer from Curt Hagenlocher:
There’s a conspiracy between the List.Accumulate and the “let” in your function. Let members are calculated lazily, so what’s happening is that each step – instead of returning a record of values – returns a record of promises. When it comes time to actually produce the final value, we have to traverse the linked list of promises in order to perform the calculation and this happens using the stack. When the list is long enough *boom* stack overflow.
The way to work around this behavior is to force the let member to be evaluated inside the function body.
Here is the fixed version of my custom function, following Curt’s recommendation:
(Source) => let res = List.Accumulate( Text.ToList(Source), [result="", index=0, source=Source], fnAccumulator ), IsUpper = (txt) => txt <> "" and txt = Text.Upper(txt) and txt <> Text.Lower(txt), fnAccumulator = (state as record, current as text) as record => let prevCharacter = if state[index]=0 then "" else Text.At(state[source], state[index] - 1), prevCharacter2 = if state[index]<=1 then "" else Text.At(state[source], state[index] - 2), nextCharacter = if state[index] = Text.Length(state[source]) - 1 then "" else Text.At(state[source], state[index] + 1), aggregatedResult = if state[index]=0 or current = "" then current else if IsUpper(current) and (not IsUpper(prevCharacter) or (IsUpper(prevCharacter2) and not IsUpper(nextCharacter))) then state[result] & " " & current else state[result] & current, resultRecord = if aggregatedResult = null then null else [result = aggregatedResult, index = state[index]+1, source = state[source]] in resultRecord in res[result]
This function doesn’t reach the stack overflow. The if statement that is used inside resultRecord enforces the M engine to compute the record in each iteration, and prevents the stack overflow.
Finally
FnSplitCamelCase is a very handly custom function. It can also help you to handle CamelCase values in columns – not just headers. For example, if you have CamelCase values in Column1, you can click fx after Previous Step in Applied Steps, and paste the following formula:
= Table.TransformColumns(#"Previous Step",{{"Column1", FnSplitCamelCase, type text}})
Do you have a better function to split CamelCase? Please share in the comments below.
Subscribe to DataChant to download the solution file CamelCase.pbix from the subscriber’s folder.
[rad_rapidology_inline optin_id=optin_5]
Great function and thanks for sharing! Here’s a case (pun intended) FOR CamelCase… I often PURPOSELY remove spaces from headers in order to allow a “proper” title for DAX measures. For example, Sales Amount = SUM(SalesAmount).
Tooooo complicated… ifs, nexts, prevs, offsets… Here is an easier way
====
let
Source = “MyDBAIsInLoveWithCamelCaseHeaders”,
to_list = Text.ToList(Source),
accum_chars = List.Accumulate(to_list, “”, (sum, x)=> sum & (if x=Text.Upper(x) then ” “&x else x)),
split = Text.Split(Text.Trim(accum_chars), ” “),
accum_words = List.Accumulate(split, “”, (sum,x)=>sum & (if Text.Length(x)=1 then x else ” “&x&” “)),
replace = Text.Trim(Text.Replace(accum_words, ” “, ” “))
in
replace
====
My DBA Is In Love With Camel Case Headers
Thank you Sergey for sharing. Very creative approach. I am pasting it here again, because your code was modified when it was published as a comment.
let
Source = "MyDBAIsInLoveWithCamelCaseHeaders",
to_list = Text.ToList(Source),
accum_chars = List.Accumulate(to_list, "", (sum, x)=> sum & (if x=Text.Upper(x) then " " & x else x)),
split = Text.Split(Text.Trim(accum_chars), " "),
accum_words = List.Accumulate(split, "", (sum,x)=>sum & (if Text.Length(x)=1 then x else " "&x&" ")),
replace = Text.Trim(Text.Replace(accum_words, " ", " ")) // Second argument is doubled space. Not single space.
in
replace
Your approach is smarter and shorter in code. But performance wise – Tooooo slow! your code performs two list accumulates, 1 split and 1 replace. So, my code will run 4 times faster 🙂
Just teasing… I would never come up with the idea you shared. Very cool.
How do you compare perfomance? What is your method?
I use Text.Repeat to create a very long text message, than I load the two queries in Excel and see who complete loading the first. Nothing scientific. By the way, my code is not faster, and even worse, when I run it on very long text (e.g. 100K characters, the M engine I receive this error: Expression.Error: Evaluation resulted in a stack overflow and cannot continue. I will update this blog post with your solution, and any update I get from the M team.
Hello again, Gil!
You will be surprised, but here is one more solution, that uses no text functions except Text.Lower, Text.Combine and Text.Length 😉
Here it is
====
let
Source = “myDBAIsInLoveWithCamelCaseHeaders”,
to_list = Text.ToList(Source),
to_table = Table.FromList(to_list, (x)=>{x, Text.Lower(x)=x}, {“word”, “isLower”}),
group_chars = Table.Group(to_table, {“isLower”}, {{“word”, each Text.Combine([word])}}, 0, (a,b)=>if b[isLower] then 0 else 1),
len = Table.AddColumn(group_chars, “len”, each Text.Length([word])),
group_words = Table.Group(len, {“len”}, {{“word”, each Text.Combine([word])}}, 0, (a,b)=>if (a[len]=1 and b[len]=1) then 0 else 1),
combine = Text.Combine(group_words[word], ” “)
in
combine
====
Would you mind to compare perfomance?
I wonder what is faster this time )
Thank you for sharing. You can download the workbook from the link I added in the post, and include the new function there to test which one ends first and loads the text to the table. You can change the Table.Repeat parameter to test longer or shorter text.
My benchmarks from your CameCase.xlsx
I’ve tested thousands repetitions of start-string and here are the results (in seconds)
10k repetitions
====
Fixed – 27
Sergey – 28
Sergey group local – 2 (approximately)
20k repetitions
====
Fixed – 113
Sergey – 120
Sergey group local – 3 (It’s also quite difficult to catch the exact time)
50k repetitions
====
Fixed – XXX
Sergey – XXX
Sergey group local – 8
50k test took so much time that I could not wait for the tests to be completed. It took more then several minutes (maybe 5-7 or so)
I’ve also tested 100k repetitions. So “Sergey group local” took 23 seconds
And finally 200k took 50 seconds. It’s amazing =)
As you can see, in our first two functions time increases quadratically, while group local – linearly
We have a winner 🙂
In the end, I managed to achieve amazing results – processing a million repetitions takes 50 seconds!
Incredible! Impossible! Unbelievable! 😎
====
(Source) =>
let
// Source = “myDBAIsInLoveWithCamelCaseHeaders”,
to_list = List.Buffer(Text.ToList(Source)),
to_table = Table.FromList(to_list, (x)=>{x}, {“word”}),
add = Table.AddColumn(to_table, “isLower”, each Text.Lower([word])=[word]),
group_chars = Table.Group(add, {“isLower”}, {{“word”, each Text.Combine([word])}}, 0, (a,b)=>if b[isLower] then 0 else 1),
len = Table.AddColumn(group_chars, “len”, each Text.Length([word])),
group_words = Table.Group(len, {“len”}, {{“word”, each Text.Combine([word])}}, 0, (a,b)=>if (a[len]=1 and b[len]=1) then 0 else 1),
combine = Text.Combine(group_words[word], ” “)
in
combine
Sergey, you promised to start blogging. It’s a very good reason (one of them 😀 )
Hey friends,
50 seconds is a long time – how about halving it ? 😉
(Source) =>
let
to_list = List.Buffer(Text.ToList(Source)),
Transform = List.Transform(to_list, each if Text.Lower(_) = _ then _ else ” “&_),
Reassemble = Text.Combine(Transform)
in
Reassemble
Cheers, Imke
Thank you for sharing. The challenge is that we want to keep acronyms together. MyDBALovesCamelCase -> My DBA Loves Camel Case. MyDBIsExcel -> My DB Is Excel
Oh, I see – sorry – didn’t get that.
BTW: Sergey’s solution is a very nice utilization of the 5th parameter(http://www.thebiccountant.com/2018/01/21/table-group-exploring-the-5th-element-in-power-bi-and-power-query/) – don’t think that it has any room for improvement 🙂
Gil, could you please describe in more details – what happened when you changed your code to this:
resultRecord =
if aggregatedResult = null then
null
else
[result = aggregatedResult, index = state[index]+1, source = state[source]]
Do I understand it correctly that comparison with ‘null’ forces evaluation of aggregatedResult, so there is no more “record of promises”?
I do not understand this collision clearly…
In fact, if the third argument of List.Accumulate is the short function (no ‘let’ expression), there will be no “record of promises”?
What is the difference (or, is there is a difference) if we write something like:
resultRecord = [result = if aggregatedResult = null then null else aggregatedResult, index = state[index]+1, source = state[source]]
will this record be evaluated without promises?
For CamelCase splitting, there is an easier solution (from https://community.powerbi.com/t5/Desktop/Split-CamelCase-text-to-new-column/m-p/324578/highlight/true#M144697)
So one could write
Table.TransformColumnNames( PreviousStep, each Text.Insert( _, Text.PositionOfAny( _, {“A”..”Z”}, Occurrence.Last), ” “))
Too late?
(Source as text) as text =>
let
str2= Source & “A”,
TxtList=List.Buffer(Text.ToList(str2)),
NumRng={1..List.Count(TxtList)-2},
Uppers={“A”..”Z”},
logic=(x)=>List.PositionOf(Uppers,TxtList{x})-1 and
(List.PositionOf(Uppers,TxtList{x-1})=-1 or List.PositionOf(Uppers,TxtList{x+1})=-1),
ret=List.Transform(NumRng,(_)=>if logic(_) then ” ” & TxtList{_} else TxtList{_}),
ret2=Text.Combine({TxtList{0}} & ret)
in
ret2
Thank you Kaz. It’s never too late 🙂