In the last blog post, we outlined the challenge of creating robust queries that are resilient to changes in data structure and will not fail on refresh, or worse, show wrong results. In today’s post, we will focus on the second pitfall, which is probably the most common catalyst for refresh errors.
Pitfall #1 was the hidden Formula Bar
After activating the Formula Bar, we have avoided the first pitfall, and have taken away our blindfolds. From now on, as you click through the commands in the Query Editor, you can occasionally inspect the auto-generated formulas.
The first thing you should look for in the Formula Bar is column names. Watch out for unnecessary references to column names.
In the last post, we briefly walked through the most common weak spot created by the Query Editor: the Changed Type step.
Meet Pitfall #2 – The Automatic Changed Type Step
Although you have already met it when we illustrated pitfall #1, the second pitfall wasn’t “officially” introduced. And this nasty fellow definitely deserves a dedicated spot in the pitfalls’ hall of fame (or shame). The automatic detection of column types is probably the number one source of refresh failures.
To find pitfall #2, check whether you have a Changed Type step in APPLIED STEPS after you import a table. Fortunately, now that your Formula Bar is active, you know why the query can fail to refresh in the future. Any minor change in the column names, or a column removal, will yield an error. Are all the columns crucial? Did you establish a “data contract” with the owner of the external data source to keep all the columns and never rename them?
The simplest and most common solution to avoid Pitfall #2 is to delete the step (as shown in the next screenshot). You don’t need this step. Really!!!
Let me clarify – You do need correct column types for some columns in your tables. Without explicit column types, you will not be able to perform numeric calculations, or apply time intelligence, or use the column as Values in PivotTables in Excel and in Power BI visualizations.
But by all means, don’t rely on the automatic Changed Type. Instead, change the types by yourself, and consider doing it in the latest possible step.
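For example, here is a minimal sketch of an explicit, late type change. The table name (“Table1”) and column name (“Amount”) are hypothetical placeholders; the point is that only the columns you actually need get an explicit type, in the last step of the query:

```
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // ...your cleaning and filtering steps go here...

    // Explicit types, set by yourself as the last step,
    // and only for the columns you really need
    ChangedTypes = Table.TransformColumnTypes(Source, {{"Amount", type number}})
in
    ChangedTypes
```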
A Good Practice: The Later, the Better
As a good practice, if you perform type changes as the latest necessary step, you gain the following advantages:
Performance / Refresh Time: It’s better to first apply a filter that drastically reduces the number of rows and then change the type, rather than changing the type and then applying the filters. It makes sense, right?
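As a quick sketch of that ordering (the column names “Year” and “Amount” and the step name “Source” are hypothetical):

```
// Filter first: drastically fewer rows left to convert
FilteredRows = Table.SelectRows(Source, each [Year] = 2016),
// Then change types on the much smaller table
ChangedTypes = Table.TransformColumnTypes(FilteredRows, {{"Amount", type number}})
```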
Error Handling: Changing column types leads to cell-level errors for values that cannot be converted to the new type. When these errors appear earlier in the chain of transformation steps, they are sometimes more difficult to troubleshoot. Here, for example, is a table with the textual value “Two” in the column Age.
Obviously, after you change the column type to Whole Number, you’ll get an error in the relevant cell. It’s easy to detect the error (you can also use the Home –> Keep Rows –> Keep Errors command to isolate the relevant rows).
But if, as a next step, you perform a Grouping on column Age, you’ll get a query-wide refresh error, which is more difficult to troubleshoot. Grouping before the type change would be easier to troubleshoot.
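For reference, the Keep Errors command mentioned above generates a formula along these lines (the step name #"Changed Type" is an assumption based on the usual auto-generated step names; "Age" is the column from the example):

```
= Table.SelectRowsWithErrors(#"Changed Type", {"Age"})
```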
Types are not persistent in advanced scenarios: When you apply type changes and then perform additional transformations, the types may not be preserved. For example, when you build a query that cleans an Excel workbook, convert the query into a function, and apply it on a folder of Excel workbooks, the types you changed inside the function are lost when you expand the custom column that contains all the tables.
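A sketch of the fix for this scenario (the step and column names are hypothetical): after expanding the custom column that holds the tables, re-apply the types explicitly as a final step.

```
// Expanding the custom column of tables loses the types
// that were set inside the function
Expanded = Table.ExpandTableColumn(Source, "Tables", {"Region", "Amount"}),
// So re-apply them explicitly, once, at the end
ChangedTypes = Table.TransformColumnTypes(Expanded,
    {{"Region", type text}, {"Amount", type number}})
```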
Note: To read more about column types in Power BI go here.
So why did Microsoft introduce this auto-generated step?
For simplicity reasons, of course. The need to make an explicit type change makes the solution more complex. For example: how many times have you seen the error below? Without the correct column types, you cannot apply a Sum on a textual column.
But failures to refresh the data are much more serious than an initial failure to apply a Sum on a column. So, if you agree, by now you’re probably asking yourself the following question:
Can we disable the default auto-generated Changed Type?
Yes, we can.
You can open the Options dialog (Query Editor –> File –> Options and Settings –>Options) and uncheck the first checkbox under CURRENT FILE –> Data Load.
Unfortunately, we don’t have this checkbox under Global –> Data Load, so to apply the override for Pitfall #2, we will need to repeat unchecking this box in each new file we create (PBIX or Excel workbook). I hope the Power BI team will implement this checkbox under Global. That way, we will have a perfect solution for Pitfall #2.
Can we manipulate the auto-generated formula, instead of deleting the entire step?
Sure, let’s demonstrate it. In the following screenshot, you can find the auto-generated formula.
= Table.TransformColumnTypes(Table2_Table,{{"Region", type text}, {"Jan-16", Int64.Type}, {"Feb-16", Int64.Type}, {"Mar-16", Int64.Type}, {"Apr-16", Int64.Type}, {"May-16", Int64.Type}, {"Jun-16", Int64.Type}, {"Jul-16", Int64.Type}, {"Aug-16", Int64.Type}, {"Sep-16", Int64.Type}, {"Oct-16", Int64.Type}, {"Nov-16", Int64.Type}, {"Dec-16", Int64.Type}})
Let’s assume that in the example above, we would like to only keep the column Region. Here is the modified formula:
= Table.TransformColumnTypes(Table2_Table,{{"Region", type text}})
The initial code includes a list of nested lists. Each nested list is a pair of:
{"[column name]", type [some type]}
If you don’t have a “data contract” for specific columns, and you expect that some column names will change or that some columns will no longer exist in the future, you can modify the Changed Type step: identify the columns you want to remove, and for each column, delete its pair from the list above.
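If you prefer not to trim the list by hand, here is a more defensive sketch (not auto-generated; treat it as an assumption-laden example built on the Table2_Table source above): it applies types only to columns that still exist in the table, so a removed or renamed column will not break the refresh.

```
// Pairs of {column name, type} you would like to apply
TypePairs = {{"Region", type text}, {"Jan-16", Int64.Type}},
// Keep only the pairs whose column still exists in the table
ExistingPairs = List.Select(TypePairs,
    each List.Contains(Table.ColumnNames(Table2_Table), _{0})),
ChangedTypes = Table.TransformColumnTypes(Table2_Table, ExistingPairs)
```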
Conclusions
Let’s hope that one day the checkbox Automatically detect column types will be available in Options –> Global, so we’ll be able to change the default behavior and, with a single click, get rid of the second pitfall for good (instead of deleting the step every time we encounter it, or unchecking the box for each file). To make this change happen, please vote for it on Power BI Ideas. Here is the link.
What’s Next? Pitfall #3
(Spoiler Alert) Filtering can be a dangerous business. Continue reading here.
Hope you find this series useful.