Power BI and Excel offer a wide variety of collaboration and sharing experiences. In today’s blog post, we will share a useful method to easily obfuscate / mask confidential text (e.g. client names) in Power BI, and share the reports and dashboards without exposing the original text to others. With the method below, you can streamline the obfuscation phase into your queries in Power BI and Excel, so the actual data that is stored in the report is already obfuscated when the data is refreshed. And everything is automatic!!!
To download the entire Power BI solution, subscribe to DataChant, get the link to the subscribers’ folder by email, and download the file: Text Obfuscation.pbix.
How can we obfuscate text in our Power BI reports?
We will build queries that will randomize our text messages, by converting each English letter to a different one.
Note: In this blog post we will share a method that convert English letters, or any language that shares the English characters. The solution that is shared with DataChant subscribers has another method to obfuscate any language and any text.
So, the text message: “This is a top secret text”, can randomly be converted to “Nlri ri s vqt imzhmv vmwv”, or any other combination of letters. Each refresh, different conversion rules are generated, so the likelihood to decode the obfuscated text is low (Especially if you frequently refresh the report).
Ready to start?
Let’s assume you have a Power BI report with table that contains confidential textual column.
Create a blank query, and paste the following M expression in the Advanced Editor:
let FnConvertRange = (StartingChararcterNum, Count) => let Source = List.Random(Count), #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error), #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "To"}}), #"Added Index" = Table.AddIndexColumn(#"Renamed Columns", "Index", StartingChararcterNum, 1), #"Sorted Rows" = Table.Sort(#"Added Index",{{"To", Order.Ascending}}), #"Added Index1" = Table.AddIndexColumn(#"Sorted Rows", "Index.1", StartingChararcterNum, 1), #"Removed Columns" = Table.RemoveColumns(#"Added Index1",{"To"}), #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Index.1", "Index"}), #"Renamed Columns1" = Table.RenameColumns(#"Reordered Columns",{{"Index.1", "From"}, {"Index", "To"}}), #"Transposed Table" = Table.Transpose(#"Renamed Columns1"), ConvertedResult = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true]) in ConvertedResult, Uppercase = FnConvertRange(65, 26), Lowercase = FnConvertRange(97, 26), Source = Record.Combine(List.Combine({Table.ToRecords(Uppercase),Table.ToRecords(Lowercase)})), #"Converted to Table" = Record.ToTable(Source), #"Transposed Table" = Table.Transpose(#"Converted to Table"), #"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true]) in #"Promoted Headers"
Rename the new query to ConversionTableEnglish.
Explaining the code:
The code above, receives two ranges of numbers, that represent the ASCII codes for the uppercase letters ABC..Z and the lowercase ones: abc…z. Next, we create two tables that translate each ASCII number in the uppercase and lowercase subsets into a different random number in the subset. Next, we append the two tables into a single record of conversions, as shown in the screenshot below. Each key-value pair is a conversion rule between a letter (as a key), and the converted letter (as a value).
Next, in the code above, we convert the record to a table:
Next, we transpose the table and use the first row as headers, to allow us to easily access each letter by its column name and find the new letter we should use.
And here is the final preview of the query ConversionTableEnglish:
Now, let’s create a new blank query, and paste the following expression in the Advanced Editor:
(txt, convTable) => let FnConvertCharacter = (c, convTable) => let num = Character.ToNumber(c), converted_num = try Table.Column(convTable, Text.From(num)){0} otherwise num, converted_char = Character.FromNumber(converted_num) in converted_char, ObfuscatedList = List.Transform(Text.ToList(txt), each FnConvertCharacter(_, convTable)), res = Text.Combine(List.Transform(ObfuscatedList, Text.From)) in res
Rename the new query to FnObfuscateText.
Select the query with the textual column you want to obfuscate, and click on the fx icon to add a new step.
Next, copy and paste the formula below into the formula bar
= Table.TransformColumns(Source,{"Confidential Column", each FnObfuscateText(_, ConversionTableEnglish)})
Inside the formula, you should modify Source to reflect the actual name you have in previous step of Applied Steps, Then, modify Confidential Column to reflect the actual column name that needs to be obfuscated.
Now you can close the Query Editor by clicking Close & Apply, and all your visuals who used the textual column will now show obfuscated text.
Now you can publish the report, create a dashboard and share it with colleagues and clients with better means to hide confidential data. If you are using the Publish to Web functionality to share Power BI report on your website, this is a great method to automate and streamline an obfuscation step, which will allow you to share reports without the manual masking of your confidential data.
Hope you’ll find this blog post useful. Don’t forget to subsribe to gain access to the Power BI report. You will be able to reuse it from now one, when you need to mask data before you share your reports.
I think I would be inclined to leave the field off my report or simply replace it with some standard text like “[customer name removed]” than to scramble it with random letters. While this is a creative use of M, I’m not sure that it is something that would likely be used at my company. Am I missing a reason why this would be the preferred approach?
Thanks for your great insight into PowerQuery/M!
Hi Cj,
I don’t think that the obfuscation approach is better than your approach. It is a matter of design. In your approach, you may still need to assign unique IDs as a suffix for “[customer name removed]” otherwise some of your visuals and slicers will show a single customer.
Nice! Hadn’t thought through how the dynamic aspects of this might work. I was simply seeing rows on a report. Very good!
I tried to refresh many times but the text in Japanese just remained unchaged without obfuscating, why?
For non-English languages you can find my solution in DataChant subscription folder that I send by email.
I’ve not received any email from you and I checked your folder the “Text Obfuscation.pbix” remained unchanged – last modified on 03/01?
Hi Julian,
The “Text Obfuscation.pbix” file contains a query that obfuscate all languages. If you cannot find it, please let me know. I will reply with detailed example soon.
Hi Julian.
I have fixed the report. Japanese characters are now obfuscated. My original code didn’t handle Japanese.
Hi, could you also let us know how to do a random replacement for numbers in a column ?
I tried using Number.RandomBeteen but I am getting the following error:
Expression.Error: We cannot convert the value xxxxxxxxxxxxx to type Function.
Sure. I will write a follow up post soon.
That would be great, thank you.
Hi, thanks for this – this is exactly what I’m after. However if I try to obfuscate multiple columns in the same table, it only retains the last step. The previous columns go back to their original state. Is there a way to obfuscate multiple columns in the same table?
Thank you Kyle.
You can obfuscate column by column in two separate steps. Not sure I understand why it didn’t work for you. Can you share a sample report?
Hi Gil, I figured out my mistake. I wasn’t modifying the “Source” with the previous step, therefore it wasn’t keeping that step. It’s working now! Thanks
Hello Gil
Can this be applied to numbers?
Also I subscribed but didnt receive the Text Obfuscation.pbix file. What could be the issue. I am new to power BI and I will be working on a Telecom client. I have 2 tables which is linked on the confidencial field (Customer Telephone no.). I want to mask that field. since that column is also used for linking two table so i need to just scarmble the number in the dashboard.
Say I have a list of clients names as aaa, bbb, ccc, and ddd. I want to apply the obfuscation to all except client bbb. How would I go about doing that?
say I want to implement role based masking. Manager of team A should see the employee names of team A, and masked employee names of team B. Similarly for manager of team B. Additionally employee of team A should see other members of team A as masked (own name not masked).
I would have two disconnected tables. One with all the employees with unique ID and masked names. The other table with Employee unique ID, Email and unmasked name. On the second table you would apply RLS (dynamic). From here you can start building measures and use LOOKUPVALUE whenever you need to show the employee names for the relevant manager’s team. As an alternative to LOOKUPVALUE you may also use inactive relationship between the two tables (to avoid having the unmasked second table filtering the first table). And use USERELATIONSHIP to activate the filter when you need to show the names of your team in a measure.