Split CamelCase headers in M (And a fix for the Record of Promises)

How many times did you have to work with tables that had headers like these?

Client_ID, Transaction_ID, Product_Name, My_DBA_is_in_Love_with_Underscored_Headers

No need to manually rename the column names any more, if you work with Excel Get Data / Power Query, or Power BI. You can just apply the following formula in Query Editor:

= Table.TransformColumnNames(#"Previous Step", each Replacer.ReplaceText(_,"_", " "))

Let’s explain. Imagine that you already loaded your source table into Excel Get Data or Power BI. Now when Query Editor is open, let’s say your last step in Applied Steps is Previous Step. To remove all the underscore characters from your headers you can follow these steps:

  1. Click the fx control near the formula bar.
  2. The Custom1 step will be added in Applied Steps, with the formula: = #”Previous Step” in formula bar (If you don’t see the formula bar, please turn it on).
  3. Replace the formula with the formula with the preceding Table.TransformColumnNames formula, and press Enter.

You will see that the underscores in your headers are magically gone:

Client ID, Transaction ID, Product Name, My DBA is in Love with Underscored Headers – but I don’t care

The formula above uses the function Table.TransformColumnNames, which receives a table and a name generator function as its main arguments. The name generator function, should receive a single argument of text (the column name), and returns text (the transformed column name). It can be a built-in function like Replacer.ReplaceText and Text.Proper, or can be any custom function you create (And you will see a great example in a minute).

In our case above, the name generator returns a text with spaces instead of underscores.

You can use similar techniques to capitalize each word of your headers:

= Table.TransformColumnNames(#"Previous Step", Text.Proper)

Or use this formula to lowercase your header (I don’t like lowercased headers, but I won’t be judgemental):

= Table.TransformColumnNames(#"Previous Step", Text.Lower)

Or you can use this formula to change the headers to uppercase (I may start being judgemental if you do it).

= Table.TransformColumnNames(#"Previous Step", Text.Upper)

The CamelCase Challenge

But what would you do if your source tables have the following headers:

ClientID, TransactionID, ProductName, MyDBAIsInLoveWithCamelCaseHeaders

I cannot even try to count how many times I have manually renamed such column names in the Query Editor. Can you?

Luckily there is a solution. You can split CamelCase headers using the Table.TransformColumnName and the following list generator custom function:

Important Note: This function should not be used on very large text input. On very large text, you will get a Stack Overflow error. Read the Important Updates section below for better custom function, that will not lead to errors.

(Source) =>
let    
    res = List.Accumulate(
        Text.ToList(Source),
        [result="", index=0, source=Source],
        fnAccumulator
    ),

    IsUpper = (txt) => txt <> "" and txt = Text.Upper(txt) and txt <> Text.Lower(txt),

    fnAccumulator =
        (state as record, current as text) as record =>
        let
            prevCharacter =
                if state[index]=0 then
                    ""
                else
                    Text.At(state[source], state[index] - 1),

            prevCharacter2 =
                if state[index]<=1 then
                    ""
                else
                    Text.At(state[source], state[index] - 2),
            
            nextCharacter =
                if state[index] = Text.Length(state[source]) - 1 then
                    ""
                else
                    Text.At(state[source], state[index] + 1),

            aggregatedResult = 
                if state[index]=0 or current = "" then
                    current
                else
                    if  IsUpper(current) and
                        (not IsUpper(prevCharacter) or 
                         (IsUpper(prevCharacter2) and not IsUpper(nextCharacter))) then
                    state[result] & " " & current
                else
                    state[result] & current,
            
            resultRecord =
                [result = aggregatedResult, index = state[index]+1, source = state[source]]

        in
            resultRecord
in
    res[result]
  1. In Query Editor, create a blank query. In Home tab, click Advanced Editor and paste the expression above. Close the Advanced Editor, and rename the blank query as FnSplitCamelCase.
  2. Select your CamelCased table in Queries pane. Click the fx control near the formula bar.
  3. Paste the following line in the formula bar:
= Table.TransformColumnNames(#"Previous Step", FnSplitCamelCase)

Once you press Enter you will see that your column names are changed. Here are few examples:

Before:

ClientID, TransactionID, ProductName, MyDBAIsInLoveWithCamelCaseHeaders

After:

Client ID, Transaction ID, Product Name, My DBA Is In Love With Camel Case Headers

Note that FnSplitCamelCase keeps the “DBA” part together in the last column name above.

To understand the code of the custom function, you can read my article on List.Accumulate here. List.Accumulate enables us to pass the column name as a list of single characters, and accumulate the transformed column name as the result. The decision to add a space between two characters is relies on several factors:

  • The index of the current character – which can be accessed in the function via state[index]
  • The case of the current character – a character is uppercased if its uppercased version is equal to the character itself. To test it we used: Text.Upper(character) = character

Important Updates – A Record Of Promises

Sergey Lossev shared in the comments below a brilliant alternative custom function that is shorter in code. You can use it instead of the original one as your custom function: FnSplitCamelCase. Thank you Sergey.

// Shared by Sergey Lossev
(Source) =>
let
    to_list = Text.ToList(Source),
    accum_chars = List.Accumulate(to_list, "", (sum, x) => sum & (if x = Text.Upper(x) then " " & x else x)),
    split = Text.Split(Text.Trim(accum_chars), " "),
    accum_words = List.Accumulate(split, "", (sum, x) => sum & (if Text.Length(x) = 1 then x else " " & x & " ")),
    replace = Text.Trim(Text.Replace(accum_words, "  ", " "))
in
    replace

As I was learning Sergey’s solution, I thought it should perform slower, as the function runs four times over the text input, while my function runs once. So, I compared the performance of the two functions and realized that I was wrong. The refresh time was similar as my function was accessing four characters in each iteration. So, the two functions run over the characters four times.

When I tested the functions on large text messages (e.g. 10000 characters or more), I found out that my original function led to stack overflow. Download the workbook here to find out for yourself.

I suspected that the overflow is related to the long text that I had stored in the state variable of the accumulator function. So, I contacted Microsoft, and got a great (and even poetic) answer from Curt Hagenlocher:

There’s a conspiracy between the List.Accumulate and the “let” in your function. Let members are calculated lazily, so what’s happening is that each step – instead of returning a record of values – returns a record of promises. When it comes time to actually produce the final value, we have to traverse the linked list of promises in order to perform the calculation and this happens using the stack. When the list is long enough *boom* stack overflow.

The way to work around this behavior is to force the let member to be evaluated inside the function body.

Here is the fixed version of my custom function, following Curt’s recommendation:

(Source) =>
let    
    res = List.Accumulate(
        Text.ToList(Source),
        [result="", index=0, source=Source],
        fnAccumulator
    ),

    IsUpper = (txt) => txt <> "" and txt = Text.Upper(txt) and txt <> Text.Lower(txt),

    fnAccumulator =
        (state as record, current as text) as record =>
        let
            prevCharacter =
                if state[index]=0 then
                    ""
                else
                    Text.At(state[source], state[index] - 1),

            prevCharacter2 =
                if state[index]<=1 then
                    ""
                else
                    Text.At(state[source], state[index] - 2),
            
            nextCharacter =
                if state[index] = Text.Length(state[source]) - 1 then
                    ""
                else
                    Text.At(state[source], state[index] + 1),

            aggregatedResult = 
                if state[index]=0 or current = "" then
                    current
                else
                    if  IsUpper(current) and
                        (not IsUpper(prevCharacter) or 
                         (IsUpper(prevCharacter2) and not IsUpper(nextCharacter))) then
                    state[result] & " " & current
                else
                    state[result] & current,
            
            resultRecord =
            if aggregatedResult = null then
                null
            else
                [result = aggregatedResult, index = state[index]+1, source = state[source]]

        in
            resultRecord
in
    res[result]

This function doesn’t reach the stack overflow. The if statement that is used inside resultRecord enforces the M engine to compute the record in each iteration, and prevents the stack overflow.

Finally

FnSplitCamelCase is a very handly custom function. It can also help you to handle CamelCase values in columns – not just headers. For example, if you have CamelCase values in Column1, you can click fx after Previous Step in Applied Steps, and paste the following formula:

= Table.TransformColumns(#"Previous Step",{{"Column1", FnSplitCamelCase, type text}})

Do you have a better function to split CamelCase? Please share in the comments below.

Subscribe to DataChant to download the solution file CamelCase.pbix from the subscriber’s folder.

[rad_rapidology_inline optin_id=optin_5]

18 comments

  1. Danny Dennison Reply

    Great function and thanks for sharing! Here’s a case (pun intended) FOR CamelCase… I often PURPOSELY remove spaces from headers in order to allow a “proper” title for DAX measures. For example, Sales Amount = SUM(SalesAmount).

  2. Sergey Lossev Reply

    Tooooo complicated… ifs, nexts, prevs, offsets… Here is an easier way
    ====
    let
    Source = “MyDBAIsInLoveWithCamelCaseHeaders”,
    to_list = Text.ToList(Source),
    accum_chars = List.Accumulate(to_list, “”, (sum, x)=> sum & (if x=Text.Upper(x) then ” “&x else x)),
    split = Text.Split(Text.Trim(accum_chars), ” “),
    accum_words = List.Accumulate(split, “”, (sum,x)=>sum & (if Text.Length(x)=1 then x else ” “&x&” “)),
    replace = Text.Trim(Text.Replace(accum_words, ” “, ” “))
    in
    replace
    ====
    My DBA Is In Love With Camel Case Headers

    • Gil Raviv Post authorReply

      Thank you Sergey for sharing. Very creative approach. I am pasting it here again, because your code was modified when it was published as a comment.

      let
      Source = "MyDBAIsInLoveWithCamelCaseHeaders",
      to_list = Text.ToList(Source),
      accum_chars = List.Accumulate(to_list, "", (sum, x)=> sum & (if x=Text.Upper(x) then " " & x else x)),
      split = Text.Split(Text.Trim(accum_chars), " "),
      accum_words = List.Accumulate(split, "", (sum,x)=>sum & (if Text.Length(x)=1 then x else " "&x&" ")),
      replace = Text.Trim(Text.Replace(accum_words, " ", " ")) // Second argument is doubled space. Not single space.
      in
      replace

      Your approach is smarter and shorter in code. But performance wise – Tooooo slow! your code performs two list accumulates, 1 split and 1 replace. So, my code will run 4 times faster 🙂
      Just teasing… I would never come up with the idea you shared. Very cool.

    • Gil Raviv Post authorReply

      I use Text.Repeat to create a very long text message, than I load the two queries in Excel and see who complete loading the first. Nothing scientific. By the way, my code is not faster, and even worse, when I run it on very long text (e.g. 100K characters, the M engine I receive this error: Expression.Error: Evaluation resulted in a stack overflow and cannot continue. I will update this blog post with your solution, and any update I get from the M team.

  3. Sergey Lossev Reply

    Hello again, Gil!

    You will be surprised, but here is one more solution, that uses no text functions except Text.Lower, Text.Combine and Text.Length 😉

    Here it is
    ====
    let
    Source = “myDBAIsInLoveWithCamelCaseHeaders”,
    to_list = Text.ToList(Source),
    to_table = Table.FromList(to_list, (x)=>{x, Text.Lower(x)=x}, {“word”, “isLower”}),
    group_chars = Table.Group(to_table, {“isLower”}, {{“word”, each Text.Combine([word])}}, 0, (a,b)=>if b[isLower] then 0 else 1),
    len = Table.AddColumn(group_chars, “len”, each Text.Length([word])),
    group_words = Table.Group(len, {“len”}, {{“word”, each Text.Combine([word])}}, 0, (a,b)=>if (a[len]=1 and b[len]=1) then 0 else 1),
    combine = Text.Combine(group_words[word], ” “)
    in
    combine
    ====

    Would you mind to compare perfomance?
    I wonder what is faster this time )

    • Gil Raviv Post authorReply

      Thank you for sharing. You can download the workbook from the link I added in the post, and include the new function there to test which one ends first and loads the text to the table. You can change the Table.Repeat parameter to test longer or shorter text.

  4. Sergey Lossev Reply

    My benchmarks from your CameCase.xlsx

    I’ve tested thousands repetitions of start-string and here are the results (in seconds)

    10k repetitions
    ====
    Fixed – 27
    Sergey – 28
    Sergey group local – 2 (approximately)

    20k repetitions
    ====
    Fixed – 113
    Sergey – 120
    Sergey group local – 3 (It’s also quite difficult to catch the exact time)

    50k repetitions
    ====
    Fixed – XXX
    Sergey – XXX
    Sergey group local – 8

    50k test took so much time that I could not wait for the tests to be completed. It took more then several minutes (maybe 5-7 or so)

    I’ve also tested 100k repetitions. So “Sergey group local” took 23 seconds
    And finally 200k took 50 seconds. It’s amazing =)

    As you can see, in our first two functions time increases quadratically, while group local – linearly

  5. Sergey Lossev Reply

    In the end, I managed to achieve amazing results – processing a million repetitions takes 50 seconds!
    Incredible! Impossible! Unbelievable! 😎
    ====
    (Source) =>
    let
    // Source = “myDBAIsInLoveWithCamelCaseHeaders”,
    to_list = List.Buffer(Text.ToList(Source)),
    to_table = Table.FromList(to_list, (x)=>{x}, {“word”}),
    add = Table.AddColumn(to_table, “isLower”, each Text.Lower([word])=[word]),
    group_chars = Table.Group(add, {“isLower”}, {{“word”, each Text.Combine([word])}}, 0, (a,b)=>if b[isLower] then 0 else 1),
    len = Table.AddColumn(group_chars, “len”, each Text.Length([word])),
    group_words = Table.Group(len, {“len”}, {{“word”, each Text.Combine([word])}}, 0, (a,b)=>if (a[len]=1 and b[len]=1) then 0 else 1),
    combine = Text.Combine(group_words[word], ” “)
    in
    combine

  6. Imke Feldmann Reply

    Hey friends,
    50 seconds is a long time – how about halving it ? 😉

    (Source) =>
    let
    to_list = List.Buffer(Text.ToList(Source)),
    Transform = List.Transform(to_list, each if Text.Lower(_) = _ then _ else ” “&_),
    Reassemble = Text.Combine(Transform)
    in
    Reassemble

    Cheers, Imke

    • Gil Raviv Post authorReply

      Thank you for sharing. The challenge is that we want to keep acronyms together. MyDBALovesCamelCase -> My DBA Loves Camel Case. MyDBIsExcel -> My DB Is Excel

  7. Maxim Zelensky Reply

    Gil, could you please describe in more details – what happened when you changed your code to this:

    resultRecord =
    if aggregatedResult = null then
    null
    else
    [result = aggregatedResult, index = state[index]+1, source = state[source]]

    Do I understand it correctly that comparison with ‘null’ forces evaluation of aggregatedResult, so there is no more “record of promises”?

    I do not understand this collision clearly…
    In fact, if the third argument of List.Accumulate is the short function (no ‘let’ expression), there will be no “record of promises”?
    What is the difference (or, is there is a difference) if we write something like:

    resultRecord = [result = if aggregatedResult = null then null else aggregatedResult, index = state[index]+1, source = state[source]]

    will this record be evaluated without promises?

  8. Kaz Reply

    Too late?

    (Source as text) as text =>
    let
    str2= Source & “A”,
    TxtList=List.Buffer(Text.ToList(str2)),
    NumRng={1..List.Count(TxtList)-2},
    Uppers={“A”..”Z”},
    logic=(x)=>List.PositionOf(Uppers,TxtList{x})-1 and
    (List.PositionOf(Uppers,TxtList{x-1})=-1 or List.PositionOf(Uppers,TxtList{x+1})=-1),
    ret=List.Transform(NumRng,(_)=>if logic(_) then ” ” & TxtList{_} else TxtList{_}),
    ret2=Text.Combine({TxtList{0}} & ret)
    in
    ret2

Leave a Reply