10 common mistakes you do in #PowerBI #PowerQuery – And how to avoid pitfalls

This is the first article in a series I would like to call Robust Queries in Power BI and Power Query. To follow this series click here. I’ve already used the term Robust Queries here, but I thought it would be important to provide a better coverage of this topic, explain the challenge at hand, and raise awareness to the common mistakes we make in Power BI and Power Query in Excel, and how to avoid them.

In today’s post, we will define the challenge, and will jump right to the first mistake you might do!

The Challenge

Data wrangling and cleansing is so easy with the Query Editor of Power BI and Excel (Power Query Add-In, or Get & Transform). The user interface is easy and rewarding, and it is even fun to use it. As you build your query using the UI, the Query Editor builds a series of formulas (AKA “M”, or Power Query Formula Language) which is based on the transformation steps you performed on a preview of the data. And this is an important thing to remember – The transformation is built on a preview of the data, and is heavily dependent on its format. When the real data starts deviating from the preview data, your queries may fail to refresh, or even worse – Incorrect transformation can lead to invalid data in your reports, which can eventually lead to wrong and dangerous business decisions.

Robust Queries in Power BI & Power Query in Excel
The challenge at hand: Will your query succeed to transform and refresh the data when it gets changed?

What kind of changes in data will we address?

We will focus on four common changes in the data:

  1. Changes in column names
  2. Changes in column types
  3. New columns
  4. New values in columns
  5. Changes in nested field names in JSON/XML

Which Data Sources will we address?

The challenge at hand is not Data Source specific. Any imported table, which is handled by the Query Editor, is prone to the common mistakes we usually do when we create the queries. But the most common pitfalls occur in Self-Service BI scenarios, when there is no “data contract” between the analyst and the owner of the external data source.

For example, one of the most common loose “data contracts” is when you need to combine multiple Excel or CSV files from a folder. While all files are expected to be in the same format, they are gradually modified by different owners, and will eventually break your queries.

Ready to Start? Let’s start with the first mistake:

Mistake #1 – Not Showing the Formula Bar

Mistake #1 - Not showing the Formula Bar
Mistake #1 – Not showing the Formula Bar

What is the Formula Bar?

If you don’t know what is the formula bar, that is alright. Just open the Query Editor and check the Formula Bar checkbox in View tab. From now on, you will always have this bar visible on any query you are working on. You can also enable the Formula Bar in the File tab –> Options & Settings –> Options –> Global –> Query Editor as shown in this screenshot:

The Formula Bar in the Query Editor of Power BI and Power Query in Excel

Why is the Formula Bar so important? Know Your Enemy

If this is the first time you see the formula bar, you may feel a warm surprise, or a shriek of fear. If you have the latter symptom, don’t worry.

The Formula Bar in Query Editor of Power BI and Power Query in Excel

I know, some of you don’t really want to learn a new functional language like M. Excel functions, DAX, or whatever language you are mastering is enough. I agree with you. M is not mandatory for you to master. BUT, if you want to avoid having breaking changes in your reports, or wrong business decisions, you should at least understand where your queries are weak.

By getting to know what to look for in the formula bar, even without knowing the language itself, you will be able to understand the weakness of your queries, and to be better equipped to fix them when they fail. In addition, you will be able to make better assumptions on the data you expect to get, and even to establish “data contracts” with your colleagues who own the extrenal data sources.

Like in Sun Tzu’s The Art of War famous saying “Know Your Enemy”, with the formula bar active, you will better detect the potential failures of your query.

An example of a weak query that the formula bar can detect

Here is an example for a weak query that the Formula Bar can detect.

Note: There are many different weaknesses that the Formula Bar can detect. And in the next posts in this series, you will learn what to look for in the formula bar to detect weaknesses in your queries. In some cases you will also learn how to modify the formula to turn your weak query into a robust one.

Let’s import a table from an external Excel file. In my example, the table contains the months of the year as column names: Jan-16, Feb-16, etc.

Robust Queries example in Power BI and Power Query in Excel

To start importing the file, click Get Data –> Excel in Power BI Desktop (or Data –> New Query –> From File –> From Excel in Excel 2016).

Robust Queries in Power BI and Power Query in Excel

In the Navigator window, select a table, and click Edit.

Robust Queries in Power BI and Power Query in Excel

Now, in the Query Editor, when looking in the last step of APPLIED STEPS, you can see that the Query Editor converted the column types.

Looking at the formula bar reveals a function Table.TransformColumnTypes. Inside the function, you can see a code that references the actual column names. For example: {“Jan-16“, Int64.Type}, {“Feb-16“, Int64.Type}.

Robust Queries in Power BI and Power Query in Excel

Realizing that actual column names were used in the code is important. You don’t need to understand the code! Just seek for column names. Once you find the columns names in the code, you know that the query will not work properly if the column names will change.

In our case, you will get this error if in the future Jan-16 will be replaced with Jan-17.

Robust Queries in Power BI

The solution:

While you can always contact your collegues to define a “data contract” that will ensure that the column names will not changes, in many scenarios the default Changed Types step is unnecessary. You can delete Changed Type in APPLIED STEPS by clicking the X icon.

On some cases, if you feel confident, and some of the column names are crucial, you can modify the formula to include only the column names which are crucial to you.

Note: If you are an advanced user, you may prefer using the Advanced Editor to review the entire expression of the query. It’s a good practice to review the entire code and look for weaknesses like unnecessary use of column names.

Conculsions for Pitfall #1

By all means, make sure you activate the formula bar in the Query Editor. It will help you detecting weaknesses in your queries. In today’s example we learned why it is important to seek for column names in the formula bar. This is the first step in creating robust queries and avoiding common mistakes that may lead to breaking changes in your reports or, even worse, to wrong reporting that will lead to dangerous business decisions.

In the next post of Robust Queries in Power BI and Power Query (AKA “The 10 Pitfalls”), we will continue our journey and learn the next mistake we always do when we create queries in Power BI or Excel.

Click here for the Pitfall #2.

 

10 comments

  1. Anthony NewellReply

    From my point of view, I’ve always had a small sense of fear in how power queries can be tightly bound to the data source once you’ve set them up. If you do have a ‘data contract’ in place and can have certainty the source won’t be screwed up that is excellent. However, and despite this I’ve suffered instances where characteristics of the data source have changed. For example if you are pulling data from an excel file someone may change the worksheet name you are pulling from. The best I could come up with was the following:
    Drive your queries from a set of parameters you maintain for example, filename, folder name. Should they change you can easily update the parameter without reverting to changing the M code – so ‘parameterise’ your queries. Obviously, there is a sensible limit to how far you would want to take this and you’d need assumptions that column names etc would remain consistent
    If for any reason the query still choked, then an M error handler should be used to provide an obviously recognisable alternate output, alerting you to the fact that something in the data source has changed, exiting gracefully rather than leaving an ugly error
    The analogy is similar to recorded macros vs structured VBA code. If you record a macro you establish reliance on hardcoded, tightly bound references. So ‘recorded’ power queries should be evolved to be more robust via parameterising and error trapping
    I welcome any further evolvement in M to help facilitate this further, for example the ability to trigger a warning message like you can do in VBA: “Warning, data source has changed, cannot execute query” or such like

    • Gil Raviv Post authorReply

      Thank you Anthony for sharing. These are really great points, especially for those who feels safe with M. This series is more about the prevention of errors. Error handling will not be covered, but is worth a dedicated coverage.

  2. Pingback: 10 Common Mistakes You Do in #PowerBI #PowerQuery – Pitfall #2 - DataChant

  3. Pingback: 10 common mistakes you do in #PowerBI #PowerQuery – Pitfall #3 - DataChant

  4. Pingback: Import all CSV files from a folder with their filenames in Excel - PowerPivotPro

  5. Pingback: Column Reordering Disorder - Pitfall #4 in #PowerBI #PowerQuery - DataChant

  6. Pingback: Split & Merge Columns - Pitfalls #7 and #8 - DataChant

  7. Pingback: Expand Table Column - Pitfall #9 - DataChant

  8. Pingback: The Ten Pitfalls of the Data Wrangler (aka the Wielder of M/Power Query) - PowerPivotPro

  9. Pingback: #Excel Super Links #3 – shared by David Hager | Excel For You

Leave a Reply