This is the third bonus exercise for my book, Collect, Combine, and Transform Data using Power Query in Excel and Power BI.
Today’s bonus exercise takes you to an advanced challenge after you complete Chapter 5. You are not required to go through Chapter 5 to follow the steps of this bonus exercise, but I cannot promise that my explanations here will be as detailed as you would need. To understand all the steps today you may want to complete Chapter 5 and Chapter 9 of my book, as we will write some M formulas.
In Exercise 5-4 you learn how to preserve context from titles in the worksheet grid after you combine multiple tables, as shown in this figure.
Your challenge in that exercise is to append AdventureWorks products from multiple worksheets while preserving the parent categories. Unfortunately, parent categories are not available as a worksheet names (Sheet1, Sheet2, Sheet3 and Sheet4 are meaningless).
To handle this challenge, you can take a post-append approach which is explained in detail in Chapter 5. Unlike the pre-append approach that can be used to preserve context before you combine the multiple tables (e.g. when you combine multiple tables from a folder or a CSV, as shown in the previous bonus exercise here) the post-append approach can be used on multiple worksheets – But it can be tricky, because some crucial information may be better handled before you append the tables.
In Exercise 5-5 you learn how to get context cues before you append the tables. As shown in the figure below, to preserve the parent categories from AdventureWorks product catalog, you need to extract the data from the small tables which are located above the product tables of each worksheet.
The challenge becomes interesting in Exercise 5-6, as the parent category header can be in any row above the product table. Irrelevant data may be added in rows before the parent category table, as shown here:
The Challenge
In this bonus exercise you will learn how to address a bigger challenge, and preserve the context from multiple worksheets. The parent category of AdventureWorks products is no longer located as the first column, as you have in Exercise 5-6 of the book. Now, you will need to locate both the column and the row of the “Parent Category” header, to extract the parent category values in each sheet. The row and column may be different in each worksheet, and will require to write M code to succeed.
Ready to begin?
You can download the source workbook C05E07 – Bonus.xlsx here. Save it in the folder C:\Data\C05. Don’t proceed to the next section yet. Try to resolve the challenge by yourself. Not sure how to proceed? Chapter 5 Exercise 5-6 provides you the necessary techniques to resolve it, and Chapter 9 will take you through the basics of the M syntax you may need.
The Solution
To import the product data, open a blank Excel workbook or a Power BI report.
In Excel: Select From Workbook in the Data tab.
In Power BI: Select Get Data, and in the Get Data dialog box, select Excel and select Connect.
In the Import Data dialog box, select the workbook C:\Data\C05\C05E07 – Bonus.xlsx and click Open. Then, in Navigator dialog box, right click on C05E07 – Bonus.xlsx, and select Edit.
The Power Query Editor will open. You can now keep only the Data column. The other columns are meaningless. Right click on the header of the Data column, and select Remove Other Columns in the shortcut menu.
At this stage, if you expand the Data column, you will find that the parent category values are in different columns, and it will be difficult to apply a post-append context preservation. For that reason, we will create cues that will help us to locate the row and column indices and correctly extract the parent categories.
The first cue we will need for each product table in the workbook is the row index. Creating the index for each table before we append all tables will help us to identify when the next table starts when we later append them. To create an index for each table, you cannot use the Index Column in the UI. But you can do it as a temporary step to learn how to write it in M. Make sure you delete the Index Column step.
We will use a custom column that will run at each table object in the Data column. In Add Column tab, select Custom Column. In the Custom Column dialog box, enter Table with Index in New column name. Then enter the formula below in Custom column formula, and click OK.
= Table.AddIndexColumn([Data], "Index", 0, 1)
We will use the Index column only later. It will help us to identify and filter out all the irrelevant rows. Our next task is to identify the column index of Parent Category. We will do it through a series of steps, starting with a function in M, Table.FindText which returns the first row that contains a specific text.
In Add Column tab, select Custom Column. In the Custom Column dialog box, enter Parent Category Headers in New column name. Then enter the formula below in Custom column formula, and click OK.
= Table.FindText([Table with Index], "Parent Category")
If you click on the blank space of the first Table object in Parent Category Headers column, as shown in the next screenshot, you can see the single row table with the value Parent Category in the third column.
Our next challenge to extract the original column index of Parent Category is to transform it into a list. There are several ways how to do it. We will focus on the one which I found easier to understand. We will transpose our one row table, and then convert the column into a list in the next two steps.
In Add Column tab, select Custom Column. In the Custom Column dialog box, enter Transposed. Then enter the formula below in Custom column formula, and click OK.
= Table.Transpose([Parent Category Headers])
If you click on the blank space of the first Table object in Transposed column, as shown in the next screenshot, you can see the single column table with the value Parent Category in the second row.
To convert the column into a list, we can create a new custom column as follows:
In Add Column tab, select Custom Column. In the Custom Column dialog box, enter As List in New column name. Then enter the formula below in Custom column formula, and click OK.
= [Transposed][Column1]
If you click on the blank space of the first Table object in As List column, as shown in the next screenshot, you can see the List with Parent Category as the second element.
To find the position of Parent Category in the list, we will use List.PositionOf.
In Add Column tab, select Custom Column. In the Custom Column dialog box, enter Column Index in New column name. Then enter the formula below in Custom column formula, and click OK.
= List.PositionOf([As List], "Parent Category")
We now have the column index (a zero-based index) in the last column. Our next challenge is to locate the row index, which will enable us later to extract the parent category value. In the next step, we will use the Column Index to dynamically retrieve the column. One approach to refer to the column by its location, is to take advantage of the default Column1, Column2, default naming convention. But I prefer to show you a more generic way.
In Add Column tab, select Custom Column. In the Custom Column dialog box, enter Column with Parent Category in New column name. Then enter the formula below in Custom column formula, and click OK.
= Table.Column( [Table with Index], Table.ColumnNames([Table with Index]){[Column Index]} )
Now we can find the row index of Parent Category, using
In Add Column tab, select Custom Column. In the Custom Column dialog box, enter Row Index in New column name. Then enter the formula below in Custom column formula, and click OK.
= List.PositionOf([Column with Parent Category], "Parent Category")
Now when we have the Row Index column we can easily locate the actual value of the parent category by incrementing Row Index by 1, as demonstrated in this screenshot. Click on the blank space of the first List object in Column with Parent Category column. You can see the List with Parent Category as the fifth item (index is 4), and Bikes as the next item.
Finally, we can extract the parent category value in each AdventureWork product table.
In Add Column tab, select Custom Column. In the Custom Column dialog box, enter Parent Category in New column name. Then enter the formula below in Custom column formula, and click OK.
= [Column with Parent Category]{[Row Index] + 1}
Our mission is almost complete. We were able to extract the parent categories. It’s time to identify the actual product tables and filter out all irrelevant rows. To identify the product rows, we can track the row index of the product headers. The first header is Name. Let’s find it.
In Add Column tab, select Custom Column. In the Custom Column dialog box, enter Row Index of Name in New column name. Then enter the formula below in Custom column formula, and click OK.
= List.PositionOf([Data][Column1], "Name")
We can now remove all columns except of Table with Index, Parent Category and Row Index of Name, and extract the tables in Table with Index column by clicking on the highlighted control on its header.
In the expand pane, keep all the columns selected, uncheck Use original column name as prefix and click OK.
To filter out the irrelevant rows we will add a conditional column that will compare between Index and Row Index with Name columns. We should only keep rows in which Index is greater than or equal to Row Index with Name.
The Add Conditional Column dialog box will open. Follow these steps, as shown in the screenshot below:
- Keep Custom in the New column name box.
- Select Index as Column Name.
- Select is greater than or… as Operator.
- Select Select a Column in the drop-down menu below Value.
- Select Row Index as Name in the second drop-down menu below Value.
- Enter 1 in Output text box.
- Enter 0 in the Otherwise text box.
- Click OK.
You can now apply a filter on Custom column, and keep only rows with the value of 1.
It’s time to remove the helper columns: Index, Row Index of Name and Custom. From here the steps should be familiar.
In Transform tab, select Use First Row as Headers.
Notice that the last column header is now Bikes. You can also see in the Applied Steps pane that a new Changed Types step was added.
You can now rename Bikes column to Parent Category. But watch out of the pitfalls! The last two steps are dangerous because they refer to Bikes as a column header, and Bikes will no longer be a header if the data changes (e.g. when your first worksheet will no longer contains bikes, but other parent categories).
To fix it, delete the last Changed Type step. Next, you need to remove the hardcoded “Bikes” from the M expression in the formula bar of Renamed Columns step, and replace it with a dynamic reference to the last column. Chapter 10 covers this technique in more details (you can also read about it in my Pitfalls series). To keep us focused on the challenge at hand, let’s just show you how to do it.
In the Formula bar, replace “Bikes” with the following code, as highlighted below.
List.Last(Table.ColumnNames(#"Promoted Headers"))
Scroll down in the preview pane, you will notice that we still have the inline headers.
To filter out the inline headers, click on the filter control of Name column, and select Text Filters, Does Not Equal. The Filter Rows dialog box will open. Enter “Name” (without double quotes) in the highlighted box in the first row, and click OK.
You can now change the type of StandardCost and ListPrice to Decimal Number, and load the query to your Excel or Power BI report.
The solution file, C05E07 – Bonus – Solution.xlsx, is available in the subscriber folder (The link is included in your DataChant subscription emails. To subscribe click here).
Hope you enjoyed this bonus exercise. Did you read my book? I hope you liked it. Please consider writing a review for my book on Amazon.
This made me think outside the box. When I read “The Ultimate Guide to DAX” I learned that creativity is also an important skill solving scenarios like this. Thanks for sharing. I’m looking forward buying the book.
Love these bonuses. Got the final result (which is good) but a few more steps than what you had. But a lot of this (at least for me) is still learning all the functions and the only way I can really do that is to “see what sticks”. But i can see how things build on each other. Much like excel and DAX need the theory down (or at least some what close) and then time “in the trenches”. Great stuff!
Other way for get Row Index: expand “index” from “Table with Index”