Finding mismatched headers and creating a conversion table – Bonus Exercise – Chapter 4 – #PowerQuery Book

This is the first post of a series of bonus exercises that will accompany my book, Collect, Combine, and Transform Data using Power Query in Excel and Power BIThe topics of these bonus exercises will usually cover areas that are already discussed in the book. They are not crucial for your learning journey if you read the book, but you can consider them as an optional enriching experiences, and extra opportunities to practice the techniques in chapters that you liked.

Bonus Exercise – Chapter 4 – Finding mismatched headers and creating a conversion table

We will begin with a bonus exercise for Chapter 4. It is recommended that you complete Chapter 4 before you follow the steps (but you can also follow this bonus exercise without reading the book).

In Chapter 4 of my book you learn how to combine dispersed silos of information and resolve mismatched tables. To avoid the risks of appending mismatched tables we introduced the conversion table.

A figure from Collect, Combine, and Transform Data using Power Query in Excel and Power BI, Chapter 4
A figure from Collect, Combine, and Transform Data using Power Query in Excel and Power BI, Chapter 4

The conversion table helps you to combine mismatching tables by setting the rules to normalize the column names before the tables are appended.

The conversion table C04E04 – Conversion Table.xlsx, which is shown on the right table below, was used in four different techniques (Exercises 4-8) to help you convert the column names in the mismatching tables of Products folder. This conversion table includes the Source column with all the possible “bad” column names that need to be normalized, and their corresponding “good” column names in the Target table.

A figure from Collect, Combine, and Transform Data using Power Query in Excel and Power BI, Chapter 4
A figure from Collect, Combine, and Transform Data using Power Query in Excel and Power BI, Chapter 4

While Chapter 4 helped you to combine mismatched tables using the conversion table, we assumed in that chapter that you already have the conversion table at your disposal. In today’s bonus exercise, you will learn how to create such a table in Power Query without the need to manually find the mismatching headers and build the table. This technique will help you to automate the entire process, and avoid the need to manually update the conversion table whenever column names are renamed.

Ready to begin?

Our exercise starts with the Zip file which is available here. Download it and extract its content into the folder C:\Data\C04\C04E02 – Products

As shown in the diagram above, the folder contains four workbooks, each with a different category of AdventureWorks products:

  • Accessories
  • Bikes
  • Clothing
  • Components

The four tables are of the same formats, but the column names are not matching across the tables. To make things easier, let’s assume that the headers of Components table contain the preferred names you would like to use.

Your goal – To create a conversion table using Power Query which will include all the mismatching headers in the Source column, and all the desired headers of Components in the Target column.

Your secondary goal, which is very useful, is to find all the mismatching headers (i.e. headers which are not the same of Components) by their file name (By the way, as you can see in the following table, you can achieve your first goal by removing the first column of the table required by your secondary goal. The name of the first column can  provide you a clue for one of the key transformations we will use).

The Solution

To import the tables from the Products folder, open a blank Excel workbook, and select Get Data, From Folder.

In the Folder dialog box, enter C:\Data\C04\C04E02 – Products in Folder Path and click OK.

In the next dialog box, select Combine, Combine & Edit.

In the Combine Files dialog box, select Sheet1, and click OK.

The Power Query Editor will open. In Queries pane, select Transform Sample File from C04E02 – Products, and in Applied Steps delete the last step.

We will now keep the first row, which contains the headers. In Home tab, select Keep Rows, Keep Top Rows.

The Keep Top Rows dialog box will appear. Set 1 as Number of rows, and click OK. Now we can see how all the headers are appended together. In the Queries pane select C04E02 – Products query.

But let’s fix the error first. Delete Changed Type step in Applied Steps.

You can now see that all the product headers are appended in C04E02 – Products query. This is a good time to stop and think, how you would proceed from here.

Transpose and Unpivot

To transform our table into a conversion table, we would next need to transpose the table and see all the mismatching column names in each row. Each column will contain all the headers of the same original table. In Transform tab, select Transpose.

Let’s use the first row which contains the file names as our context, and promote it as headers. In Transform tab, select Use First Row as Headers.

Since our desired headers already exist in the Components table, we should now create a new table with all the Source/Target pairs. The Target values will be taken from C04E02 – Components.xlsx column, and the Source values will be taken from all the other columns. The trick here is to use Unpivot Other Columns. Right click on C04E02 – Components.xlsx column header, and select Unpivot Other Columns in the shortcut menu.

Rename the first column Target, and Value column Source.

Reorder the columns by dragging and dropping Target to the right.

You can now see that some rows contain matching headers. For example, the second row shows that Name is the value of both Source and Target – Meaning that Name is correctly used as the column name in the Bikes table. There is no point in keeping rows with matching Source/Target pairs. Let’s filter them out. In Add Column tab. select Conditional Column.

The Add Conditional Column will open. Follow these steps, as shown in the screenshot below:

  1. Select Source as Column Name.
  2. Select equals as Operator.
  3. Select Select a Column in the drop-down menu below Value.
  4. Select Target in the next drop-down menu.
  5. Enter 0 in the Output text box.
  6. Enter 1 in the Otherwise text box.
  7. Click OK.

Now, a new Custom column will appear with 1’s when Source and Target are not matched. Let’s keep only the 1’s. Select the filter control of Custom, and uncheck 0, then click OK.

You can now remove the Custom column.

Mission accomplished. This table contains the Source and Target pairs with the associated file name of the mismatching column name. To create the conversion table, you can remove the first column. In my solution file, I kept both versions.

Creating a Robust Query

To improve your solution, imagine the following scenario: What if the different product workbooks will be different? Will the query work? Let’s say you may want to create a generic solution from the queries you have created in this exercise, and allow you to select different folders, with different mismatching tables. In this part you will see how you can improve the queries to create a robust version. Chapter 10 covers these techniques in more details, and you can also explore these techniques in my Pitfalls series.

First, let’s delete the Changed Type step that is highlighted below. You can see that it refers to the actual file names. Since all the columns here are text, there is no value in keeping this step anyway.

Next, we will create a parameter for the index of the file with the preferred headers we want to use as our Target headers in the conversion table. Earlier we picked the Component table which was the 4th file in the Products folder. In Power Query and M, indices are always considered to be zero-based integers, so our new parameter will have the value of 3 ( = 4 – 1).

In Home tab, select Manage Parameters, New Parameter.

Now, in the Parameters dialog box, enter TargetIndex in the Name text box. Then, enter 3 in the Current Value and click OK.

We can now find all the references in the code to the component file, C04E02 – Components.xlsx and change it to dynamically return the same value using Table.ColumnNames and the index.

Select the Unpivoted Other Columns step in Applied Steps.

Change the highlighted section in the formula bar from:

“C04E02 – Components.xlsx”

to

Table.ColumnNames(#”Promoted Headers”){TargetIndex}

Select the Renamed Columns step in Applied Steps. Notice that the first column is renamed from C04E02 – Components.xlsx to Target.

Change the highlighted section in the formula bar from:

“C04E02 – Components.xlsx”

to

Table.ColumnNames(#”Unpivoted Other Columns”){0}

Now you refer to the first column name in the previous step. The zero index is used to refer to the first column.

You can also follow Chapter 8 to create a path parameter for the folder. You now have a robust solution that receives a folder of mismatched tables and the index of the file with the preferred headers as the input, and detect the mismatch headers in a folder, and create the conversion table which is necessary to combine all the mismatching tables.

The solution file, C04 Conversion Table – Solution.xlsx, is available in the subscriber folder (Link is included in DataChant subscription emails. To subscribe click here).

Hope you enjoyed this bonus exercise.

 

2 comments

  1. Roger Spire Reply

    wow

    Why so complicated? All this can be done in 1 step.

    Script CODE M

    let
        Source = Folder.Files (“C: \ Users \ AMPLIX \ Desktop \ C04E02 – Products”),
        # “Customization added” = Table.AddColumn (Source, “Customize”, each Excel.Workbook ([Content]))
        # “Customizing Expanded” = Table.ExpandTableColumn (# “Customization Added”, “Custom”, {“Name”, “Data”, “Item”, “Kind”, “Hidden”}, {“Name.1” Data “,” Item “,” Kind “,” Hidden “),
        # “Other Columns Removed” = Table.SelectColumns (# “Custom Expanded”, {“Data”, “Name”})
        # “Expanded Date” = Table.ExpandTableColumn (# “Other Columns Removed”, “Data”, “Column1”, “Column2”, “Column3”, “Column4”, “Column5”, “Column6”, “Column7” “Column8”, “Column9”, “Column1”, “Column2”, “Column3”, “Column4”, “Column5”, “Column6”, “Column7”, “Column8”, “Column9”
        # “Columns Renamed” = Table.RenameColumns (“Expanded Data”, {{“Column1”, “PRODUCT_NAME”}, {“Column2”, “PRODUCT_ID”}, {“Column3”, “PRODUCT_COLOR” “” PRODUCT_COST “}, {” Column5 “,” PRODUCT_PRICE “}, {” Column6 “,” PRODUCT_SIZE “}, {” Column7 “,” PRODUCT_WEIGHT “}, {” Column8 “,” PRODUCT_PARENTCATEGORY “}, {” Column9 “,” PRODUCT_CATEGORY “}, {” Name “,” PRODUCT_GROUP “}}),
        # “Columns Removed1” = Table.RemoveColumns (# “Renamed Columns”, {“PRODUCT_GROUP”})
        “StandardCost” and “PRODUCT_COST”, and “PRODUCT_COST”, and “PRODUCT_COST”. The following table lists the “StandardCost” and “PRODUCT_COST” > “Standard_Cost”)),
        # “PRODUCT_COLOR”, type “text”, {“PRODUCT_SIZE”, type “PRODUCT_COLOR”, {“PRODUCT_SIZE”, type ” text “, {” PRODUCT_WEIGHT “, type text}, {” PRODUCT_PARENTCATEGORY “, type text}, {” PRODUCT_CATEGORY “, type text}, {” PRODUCT_COST “, type number}, {” PRODUCT_PRICE “, type number}})
    in
        # “Type Changed”

    • Gil Raviv Post authorReply

      Hi Roger,
      You will see that the combined results are not showing all the values correctly (scroll down a bit to see null values). This is covered in detail in Chapter 4. To get all the appended results correctly, you need to normalize the headers before you append the tables.

      In Chapter 4 we cover several techniques to normalize the headers by using a conversion table. This bonus exercise shows you how you can build it. To fully understand why you need it, you can follow the chapter.

Leave a Reply