This is the second bonus exercise for my book, Collect, Combine, and Transform Data using Power Query in Excel and Power BI.
In Chapter 5 you learn how to preserve context from titles in the worksheet grid after you combine multiple tables, as illustrated here:
In this bonus exercise you will learn how to address a similar challenge, and preserve the context from a CSV file.
In this text file, all of AdventureWorks products are stored as comma-separated values. The products are grouped by parent product category, and each group is divided by the line Title: X followed by the headers. For example: The first group of products who are associated to Bikes is represented by the leading line Title: Bikes. Your challenge: Import all the products from the CSV file, and preserve their parent category, as illustrated in the figure above.
Ready to begin?
The exercise starts with the CSV file which is available here. Download it and save it in the folder C:\Data\C05. Don’t proceed to the next section yet. Try to resolve it yourself. Chapter 5 provides you the necessary techniques to easily resolve it.
To import the product data, open a blank Excel workbook or Power BI report.
In Excel: Select From Text/CSV in the Data tab.
In Power BI: Select Get Data, and in the Get Data dialog box, select Text/CSV and select Connect.
In the Import Data dialog box, select the CSV file C:\Data\C05\C05E04 – Bonus.csv and click Open. Then, in the next dialog box, select Transform Data or Edit. (Newer versions of Excel contain the term Transform Data, instead of Edit).
The Power Query Editor will open. You can see that the first cell contains Title: Bikes. Can you find a way to copy it into a new column, and fill all the cells for the products which are bikes? Can you also endure that other parent categories will be handled in the same manner?
In Add Column tab, select Conditional Column.
The Add Conditional Column dialog box will open. Follow these steps, as shown in the screenshot below:
- Enter Parent Category in the New column name box.
- Select Column1 as Column Name.
- Select begins with as Operator.
- Enter Title: in the text box below Value.
- Select Select a Column in the drop-down menu below Output.
- Select Column1 in the next drop-down menu.
- Enter null in the Otherwise text box.
- Click OK.
Now, it’s time to fill down the values. Right click on the Parent Category column, and select Fill, Down in the shortcut menu.
You can now scroll down in the preview pane, and see how the context is beautifully preserved thanks to the Fill Down transformation.
It’s time to get rid of the “Title: ” prefix. Right click on Parent Category header, and select Replace Values in the shortcut menu.
In the Replace Values dialog box, enter “Title: ” (without the double quotes, and with a trailing space) in the Value To Find box. Keep the Replace With box empty and click OK.
To remove the title lines, select the filter control of Column1 and select Text Filters, Does Not Begin With..
The Filter Rows dialog box will open. Enter “Title:” (without double quotes) in the highlighted box below, and click OK.
In Transform tab, select Use First Row as Headers to promote the headers from the first row.
Notice that the last column header is now Bikes. You can also see in the Applied Steps pane that a new Changed Types step was added.
You can now rename Bikes column to Parent Category. But watch out of the pitfalls! The last two steps are dangerous. In the future, if the CSV file includes a new parent category before the bikes products, the last two steps will fail to refresh, because they refer to Bikes as a column header, and Bikes will no longer be a header when the data changes. You will have a different parent category value.
To fix it, delete the last Changed Type step. Next, you need to remove the hardcoded “Bikes” from the M expression in the formula bar of Renamed Columns step, and replace it with a dynamic reference to the last column. Chapter 10 covers this technique in more details (you can also read about it in my Pitfalls series). To keep us focused on the challenge at hand, let’s just show you how to do it.
In the Formula bar, replace “Bikes” with the following code, as highlighted below.
Our job is not complete. If you scroll down in the preview pane, you will notice that we still have the inline headers.
To filter out the inline headers, click on the filter control of Name column, and select Text Filters, Does Not Equal.
The Filter Rows dialog box will open. Enter “Name” (without double quotes) in the highlighted box in the first row, and click OK.
You can now change the type of StandardCost to Decimal Number, and load the query to your Excel or Power BI report.
This technique is very useful in handling text files with complex data structures. You were able to import all of AdventureWorks products while preserving their parent categories which were embedded as separate lines in the text file.
The solution file, C05E04 – Bonus – Solution.xlsx, is available in the subscriber folder (The link is included in your DataChant subscription emails. To subscribe click here).
Hope you enjoyed this bonus exercise. Did you read my book? I hope you liked it. Please consider writing a review for my book on Amazon.