10 common mistakes you do in #PowerBI #PowerQuery – Pitfall #3

Prologue

In our last episode of the Pitfalls series here, we were introduced to Pitfall #2 – The automatic detection of column types, which can trigger common refresh errors. In today’s post, we will learn about the third pitfall, which is probably more dangerous than all the pitfalls combined!

Pitfall #3 is so dangerous. It is almost invisible. You will easily miss it when you create the report, or when you refresh it. Pitfall #3 will create a bias in your reports. But when you finally detect the wrong results, it may be too late. Pitfall #3 is fatal. It exposes you to biased reports, wrong insights, and potentially disastrous business decisions.

And it all starts with a filter in Power BI Query Editor. Such a trivial and common UI control, right? Almost all the queries that are out there include filter steps.

Let’s build the suspense, and reveal the third pitfall in a memorable way – through a short story. A fiction story about a Power BI / Excel analyst, called Bob.

Chapter 1

It was a stormy morning at FakeCompany Inc. Bob, was working on his cool Power BI report, when his boss stormed into his cubicle.

“Look at this report”, said the boss, and handed a sheet of paper to Bob. With trembling hands, Bob unfolded the paper, and read this spreadsheet:

Pitfall 3 in Power BI and Power Query

“Oh, I know this report, from last year! Every year all the employees receive a rank from 1 to 5”, said Bob, “and those who gets a rank of 1 are being laid off”.

“Yes, but this time, it’s different”, said the boss, with a sinister look. “There is a bigger layoff this time. The layoff will also include those with rank 2. Please get the list from HR, and send me a Power BI report of all the employees that stays with us”, demanded the boss. “Oh, and don’t worry, Bob. With your outstanding reporting talent, you got a rank of 5 this year, so get ready for your promotion as our new VP of Business Intelligence!”.

An adrenaline rush swept down on Bob. He opened Power BI Desktop and started to work.

Chapter 2

Bob got the following Excel file from HR (Hey, this is an interactive story. You can download it from here as well).

Pitfall 3 - 10 mistakes you always do in Power BI and Power Query

He opened his latest version of Power BI Desktop, clicked the Get Data button and selected Excel.

Pitfall 3 - 10 mistakes you always do in Power BI and Power Query in Excel

After selecting the HR file (Hey, you can follow Bob’s steps. Haven’t you already download the HR file from here?), Bob selected the EmployeePerformance table in the Navigator window and clicked Edit.

Pitfall 3 - 10 mistakes you always do in Power BI and Power Query in Excel

Next, Bob selected the column Performance Rank, clicked on the filter control, and unchecked the ranks 1 and 2.

Finally, with a triumphant click on OK, he created a list of all the employees that will survive the layoffs, and emailed the Power BI Report to his boss.

Pitfall 3 - Power BI
Hint: This was a crucial moment in Bob’s career – The filtering he will never forget

“Dear boss, Attached is the Power BI report with the employees that will remain with the company this year. Looking forward to my VP promotion. Bob”.

Chapter 3

Few hours later, Bob’s boss sent the file to Legal. Patrick, the Power BI specialist on the Legal team, applied Merge Queries, to combine Bob’s list, with the list of all the employees’ email addresses:

Pitfall 3 - Power BI

Using a Left Anti merge, Patrick created a list of all the employees that should receive a layoff letter.

Chapter 4

A thunder stroke outside the narrow rusty windows in the open space of the BI department. But Bob didn’t flinch. He has been staring with horror at his Power BI report. On his second 30″ flickering monitor, a dramatic email has been staring back at Bob. A layoff letter.

Before Bob left the building, he sent his boss one last Power BI report:

Do you know what was done wrong in Bob’s report?

Power Query Editor allows you to filter the data by clicking on the highlighted control.

When you check or uncheck specific values in the filter pane, you should know that in different circumstances, your decision will be interpreted differently by the Query Editor.

Let’s see what happens when we uncheck the values 1, 2 and 3 in the filter pane of the table above (with the values 1, 2, 3, 4, 5, 6, 7 in Column1).

Checking the formula bar, you’ll see that the following condition was auto-generated:

[Column1] <> 1 and [Column1] <> 2 and [Column1] <> 3

(Let’s call this condition type “the negative condition”, because it uses the not-equal <> sign).

But, an equivalent condition could be written as follows:

[Column1] = 4 or [Column1] = 5 or [Column1] = 6 or [Column1] = 7

(Let’s call this condition type “the positive condition”, because it uses the equal = sign).

Which formula is better? If your dataset can only have the values from 1 to 7 in Column1, both conditions are equal. But in many scenarios, you only see a preview of the data, so if you expect to find other values in the external data source, which are not seen in the preview, the two conditions will yield two different filtering results.

For example: If we have a new row with value 8 in Column1, the negative condition above will include that row, and the positive condition will filter it out.

Let’s see another example. This time Column1 contains only the values 1-6. On purpose I use a pair number of values.

Let’s filter Column1, and uncheck value 1.

The results make sense. We filtered out the rows with value 1, and the formula contains the negative condition:

[Column1] <> 1

Let’s repeat the test. This time, we will uncheck values 1 and 2.

Again, the results make sense. We have a negative condition that excludes values 1 and 2.

And now comes the surprise:

Let’s uncheck values 1, 2 and 3.

We expected to find a negative condition, right?

[Column1] <> 1 and [Column1] <> 2 and [Column1] <> 3

Instead, we got the positive condition:

[Column1] = 4 or [Column1] = 5 or [Column1] = 6

And this is where we can fall down the third pitfall. When we try to filter values in filter pane, and the number of unchecked values is equal to the number of checked values, Power Query will generate a positive condition. Even when we may need a negative condition.

Further tests that you can make with the filter pane, will reveal the following logic:

  • When the unchecked and checked value are equal in number, Power Query will generate a positive condition.
  • When you used the search box in the filter pane, and then checked or unchecked certain values, Power Query will always generate a positive condition.
  • When the number of checked values is smaller than the number of unchecked values, Power Query will generate a positive condition.
  • When the number of checked values is greater than the number of unchecked values, Power Query will generate a negative condition.

So, Bob unchecked the ranks 1 and 2, to exclude these ranks from his report. Unfortunately, the Query Editor loaded the first 1000 rows, excluding rank 5 from the preview. As a result, Power Query generated a positive condition, which filtered out all the employees with the rank of 5.

How to avoid the third pitfall

  • Ensure that the Formula Bar is visible (Pitfall #1 here), and check the condition on the formula, after each filtering step.
  • Make sure the auto-generated formula is correct.
    1. Use negative conditions, if you want to exclude specific values (For example: to exclude the years 2010 and 2011 in column Date use the negative condition: [Date] <> 2010 and [Date] <> 2011.
    2. Make sure you use positive conditions, if you want to include specific values (For example: to include the years 2012 and 2013 in column Date use the positive condition: [Date] = 2012 or [Date] = 2013.
  • If the formula is incorrect, convert the formula from a positive condition to a negative one or vice versa.
  • Pay attention to the warning “List may be incomplete”. If your data is relatively small, click Load more before you select the values for the filter.

Pitfall 3

  •  Consider using the Filters drop down menu (e.g. Number Filters), to gain more control on the filter condition, or when the values that you need in the filter, are not available in the preview.

Pitfall 3 - Use the filter menu

Epilogue

Bob stepped into the big conference room in FakeCompany Inc. His former boss waved him to sit, as his CEO swooped in from a second door. “Bob, We are sorry for last year’s layoffs” said the CEO. “I am excited to tell you, that we will hire back all of our former employees.”, continued the CEO, “We applied your filtering fixes on all of our reporting solutions, and the results are staggering!!! Welcome back, Bob!”.

THE END

The story, all names, characters, and incidents portrayed in this article are fictitious. No identification with actual persons (living or deceased), places, buildings, and products is intended or should be inferred.

Hope you found this blog post useful, and beware of the third pitfall.

Keep reading Pitfall #4 – Hint: Column ordering can be a real chaos.

22 comments

    • Gil Raviv Post authorReply

      Thank you Tristan. Get ready to learn how to rename columns in a robust way. That is my next part.

    • Gil Raviv Post authorReply

      My first attempt to combine fiction with tech writing. Thank you for the feedback.

  1. Oxenskiold Reply

    To give your thrilling story an extra dimension you should have named your main character Bob De Morgan. Bob for short. Thank you for the insight and for sharing it with the community.

    • Gil Raviv Post authorReply

      Thank you Oxenskiold,
      I thought of Bob Sponge. Who is Bob De Morgan? Sorry if it’s a dummy question.

  2. Oxenskiold Reply

    Hi Gil,
    Augustus De Morgan came up with this wonderful law or theorem in propositional logic: NOT(A and B) NOT(A) or NOT(B)

    This is the very law that the Power Query team is using in arriving at what you’re are calling the ‘the positive condition’. The team wants to apply all those ranks that are NOT different from the ranks 3,4 and 5.

    That wish in propositional logic would be:
    NOT(rank 3 and rank 4 and rank 5)

    By applying De Morgan’s law this would be equal to:
    NOT(rank3) or NOT(rank4) or NOT(rank5)

    … and taking it one step further:
    (rank=3 or rank=4 or rank=5)

    Thus arriving at ‘the positive condition’.

    Now imagine if Bob really was the grand-grand-son of Augustus De Morgan!
    😉

    • Oxenskiold Reply

      Sorry it seems that all inequality and implicate operators are stripped when I post the reply. That makes it a little difficult to see what I explain. Well anyway it was a good story 🙂

  3. Pingback: Split & Merge Columns - Pitfalls #7 and #8 - DataChant

  4. Pingback: The Ten Pitfalls of the Data Wrangler (aka the Wielder of M/Power Query) - PowerPivotPro

  5. Anonymous Reply

    Very interestng. May be, a good advice would be “never use filters in Power query?”

  6. Victor Maae Jensen Reply

    Great post, highly appreciated. Can you please elaborate on what to do if all collumns display “List may be incomplete”? Should you really run through all columnms and display all data before filtering? Thank you in advance.

    • Gil Raviv Post authorReply

      If you know the data, I recommend using the Text Filters instead of manually checking values in the list. Displaying all data may help, but will not solve future potential errors when new values are added to your data and the M logic was not generated correctly.

Leave a Reply