I am excited to share with you an excerpt from the draft manuscript of my book. You can pre-order the book in Amazon here. I expect to publish more excerpts in the coming months, so please stay tuned.
Appending Worksheets from a Workbook
Excerpted from the draft manuscript of Collect, Transform and Combine Data using Power BI and Power Query in Excel (Pearson Education, 2018).
One of the most common scenarios of working with Excel files, is the existence of workbooks that have multiple worksheets of the same format. Each worksheet represents an individual reporting period (e.g. month, quarter or year), or a specific division, or a product category (as you will see in our example), but the data across the different worksheets is the same.
Can we append all the worksheets from a workbook into a single table? Can we ensure that newly added worksheets will automatically be appended in the next refresh? Let’s learn how to do it on an AdventureWorks sample workbook (You can download it here).
In the sample workbook, we have AdventureWorks products divided into separate worksheets by their release year. Our goal is to append them, while keeping their release year context, and avoid further modification of our query, when new worksheets will be added in future years.
- Download and save the file C03E03 – Year per Worksheet.xlsx.
- Open a blank workbook in Excel, or Power BI Desktop report.
- In Excel Data tab. In Power BI Desktop in Home tab, click Get Data.
- In Excel, select From File, and then click From Workbook. In Power BI Desktop, select File, then click Excel, and click Connect.
- Select the file C03E03 – Year per Worksheet.xlsx, and click Import.
- In Navigator dialog box select C03E03 – Year per Worksheet.xlsx.
- Right click on C03E03 – Year per Worksheet.xlsx and select Edit in the context menu.
- The Query Editor will open. Rename the query as Products. In the main preview pane, you will see a table with each worksheet in a separate row. The actual content from each worksheet is encapsulated in Data column. Before you combine the worksheets, you can remove the columns Item, Kind and Hidden.
Tip If you have hidden worksheets, or specific worksheets with unrelated data, you can exclude them from the append step that will follow, this is a good time to apply the necessary filters. For example, you can apply a filter on Hidden column to keep only the rows with the value of FALSE, to exclude hidden worksheets.
- On the header of Data column, click on the expand button.
- The expand pane will open, and will show the columns you can expand. Click OK to expand all columns.
Once you expand Data column, the combined table will have meaningless header names (Column1, Column2, and so forth), and will contain both headers and rows from each worksheet.
Note If your product tables in each worksheet could consist of Table objects, in the preceding step you would be able to expand the actual product column names, instead of Column1, Column2, etc. (To convert ranges in Excel to Tables, select the range, or a single cell in the range, can click Table in Insert tab, or press Ctrl+T).
The following transformation flow diagram, outlines the three transformation steps that are required to clean the combined table. Throughout the book, we will use this type of diagram to depict the sequence of main transformations steps, and capture the rationale to solve advanced data preparation problems. Now, let’s describe the steps in more details.
- In Transform tab, click Use First Row as Headers.
- Click on the filter control in the left side of Name column. The filter pane will open. Type Name in the search box, and uncheck the value Name. Then click OK.
- Double click on the first column header of 2015, and rename it to Release Year.
That’s it. You can now load the combined table to your Excel or Power BI report. We have successfully combined the three worksheets by their release year, and managed to extract the release year of each product from the worksheet names. It’s time to test the scalability of our solution, and see if it will refresh correctly when a fourth worksheet is added with the products released in 2018:
- While the Excel workbook or Power BI report with the combined products is still open, open the source Excel file C03E03 – Year per Worksheet.xlsx, that was used in this exercise. Duplicate 2017 worksheet as a fourth worksheet, and rename it to 2018.
- Refresh Products query in your Excel or Power BI report, and ensure that the duplicated products are now appended with the values 2018 as their release date. It worked!
But what would happen if you would add a 2014 worksheet with historical data for products released in 2014, as the first worksheet? Try it.
Now, when the first worksheet is named 2014 instead of 2015, the refresh will fail, with this error: Expression.Error: The column ‘2015’ of the table wasn’t found.
- To fix this error, open Products query in Query Editor, and click on the step Changed Types in Applied Steps. In the formula bar, you will see the following formula:
= Table.TransformColumnTypes(#"Promoted Headers",{{"2015", Int64.Type}, {"Name", type text}, {"ProductNumber", type text}, {"Color", type text}, {"StandardCost", type any}, {"ListPrice", type any}, {"Size", type text}, {"Weight", type any}, {"ParentProductCategoryName", type text}, {"ProductCategoryName", type text}})
- Find the text “2015” in the preceding formula, and replace it with “2014”.
- Click on the step Renamed Columns in Applied Steps. In the formula bar, you will see the following formula:
= Table.RenameColumns(#"Filtered Rows",{{"2015", "Release Year"}})
- Find the text “2015” in the preceding formula, and replace it with “2014”.
- Close the Query Editor and refresh the report. The products released in 2014 will be combined correctly to your report.
A Robust Approach to Combine Multiple Worksheets
In the preceding example, we have combined three worksheets of AdventureWorks products by their release year. While we could append the three worksheets in a scalable way, and incorporate a combined table from the new worksheets, as they are added to the data source, we found out that the query will fail, when an unknown worksheet is added as the first worksheet.
In the preceding walkthrough you followed three steps (9-11) to combine our worksheets into a single table. In chapter 9, you will learn in more details how each one of these steps, could have been done differently to avoid potential errors. For now, let’s just briefly introduce the mistakes, and how to resolve them in our scenario.
- In Step 9, we promoted to first row as headers, and allowed Power Query to automatically change the column types for us. As a result, the year 2015, was explicitly referenced in the formula. We kept the formula as is, and as a result, exposed our query to future refresh failures. This mistake, will be referred in chapter 9 as the Changed Types Pitfall.
- In Step 11, we renamed the header 2015 to Release Year. This action triggered an explicit reference to column 2015 in the formula bar, and exposed our query to refresh failures, as observed when we added 2014 as the first worksheet. This mistake, will be referred in chapter 9 as the Rename Columns Pitfall.
Note In step 10, we could almost fell for another pitfall, the Search in Filters Pitfall, when we applied the filter through the search pane, instead of clicking Text Filters and then defining the filter condition explicitly. You will learn about this pitfall and the others in chapter 9.
To correct the two pitfalls above, let’s follow these steps:
- Open Products query in Query Editor, and delete the step Changed Types in Applied Steps. We will explicitly apply the correct types later.
- Click on the step Renamed Columns in Applied Steps. In the formula bar, you will see the following formula:
= Table.RenameColumns(#"Filtered Rows",{{"2015", "Release Year"}})
- Replace “2015” in the preceding formula, with the code Table.ColumnNames(#”Filtered Rows”){0}. The final formula line is as follows:
= Table.RenameColumns(#"Filtered Rows",{{Table.ColumnNames(#"Filtered Rows"){0}, "Release Year"}})
In chapter 8 will cover the M language in depth. You should feel completely comfortable at this stage if you don’t understand its syntax. The need to fully master M, is not required to resolve 99% of the data challenges you will ever face. However, since we modified the M formula here, let’s just briefly explain what we did:
In a nutshell, instead of telling Power Query “Please rename column 2015 to Release Year“, we changed the instructions to “Please rename the first column to Release Year“.
How did we refer to the first column? By using the function Table.ColumnNames. The latter function returns the list of column names of the table it receives as an argument. In our case, the table is #”Filtered Rows”, which is named by the preceding step, in which we had filtered our table.
Finally, to instruct Power Query to refer to the first element in the column names, we can call the first item in the list by mentioning the zero index inside the curly brackets. This method in M allows you to access specific elements in a list, or a specific row in a table. The index in M starts at zero, instead of one. So, the index of the first element is 0, and the n’th element is n-1.
In Chapter 9, we will use the function Table.ColumnNames to avoid referring to columns by their hardcoded values, and avoid refresh errors, when column names change.
Let’s conclude the creation of our robust query by changing the column types:
Change the types of the following columns as follows:
- Release Year column to Whole Number.
- StandardCost column to Decimal Number.
- ListPrice column to Decimal Number.
You can now close the Query Editor and refresh the report. The products will be combined correctly, no matter which year is placed as the first worksheet. Mission accomplished.
Download the solution here.
Stay tuned for the next book excerpt – Combining mismatched tables.
Looking forward to your book Gil, like the format you have demonstrated thus far.
Good article! Thanks!
At the end of step 12. you forgot to mention that we have to save the file save the file, didn’t you?
Great catch Piotr. That is correct.
Hi Gil… I have pre-ordered your book! This has been an issue for several of my PQ tables. (I solved by NOT promoting headers and just rename “Column1”, “Column2”, etc. to the names in Row 1. This work, but if new columns are added, or the order of the columns change, things tend to fall apart. Looking forward to more insightful tips and tricks in your book.
Thank you John for pre-ordering. Hope you find the next book excerpt useful as well.