In today’s post, there is a nice Power Query trick to transform multiline record data into a table in Excel. I intend to use this trick quite soon on my next two posts as part of the Excel For Security Analysts series (So stay tuned for some magic on SSH and IPTables firewall logs).
We have many cases where our raw data is formatted in multiline attribute-value pairs. The first column contains the attributes, and the second column contains the values. Analyzing the data with such format is quite challenging.
In this tutorial you will learn step-by-step how to transform multiline records into a table. I will use Excel 2016, that incorporates Power Query technology, and exposes its functionality through the Get & Transform group in the Data ribbon. If you don’t have Excel 2016, you can use Excel 2013 or 2010, with the Power Query Add-in. You can also use Power BI Desktop to achieve the same transformations.
In our example, we have a dataset of employees with attributes of Employee ID, Name, Gender, Birthday and Salary. Each employee is represented in a range of 2 columns on 1-5 rows, as shown in the screenshot above.
We assume that the attribute Employee ID is a mandatory attribute that will always appear as the first attribute on each subset of rows per employee. All the other attributes are optional.
Obviously, we will better work with this data if we transform it into a table in which Employee ID, Name, and any other employee’s attributes are represented on separated columns.
As you can guess by now, with Power Query you can transform this data into the desired table. To skip this tutorial and see the results, download the workbook here.
Shall we begin?
Assuming you have a range in your workbook with the following data
Employee ID | 1010111 |
Employee Name | John |
Gender | Male |
Birthday | 32909 |
Employee ID | 2321321 |
Employee Name | Alice |
Gender | Female |
Birthday | 23778 |
Salary | 1100 |
Employee ID | 324324 |
Employee Name | Bob |
Gender | Male |
Birthday | 27430 |
Employee ID | 654322 |
Employee Name | Samantha |
Birthday | 33274 |
Salary | 1200 |
Employee ID | 543522 |
Employee Name | Patrick |
Gender | Male |
Birthday | 32544 |
Salary | 1000 |
Select a cell in the range.
In the Data ribbon, click From Table which is located in the Get & Transform group.
The Create Table dialog will appear. Uncheck My table has headers, and click OK.
In the Query Editor dialog will appear as shown in the following screenshot:
Click Add Column tab, and click Add Index Column.
A new column will be created with the name Index (If you use non EN-US locale, the name will be different). This column will contain a range of numbers from 0 to N-1 (where N is the last row in your raw data). We will use this column to have a unique number for each Employee.
Staying in the Add Column tab, click Add Custom Column.
In the Add Custom Column dialog, enter the following formula, and then click OK:
= if [Column1] = “Employee ID” then [Index] else null
Now we have a unique index number on every row that has Employee ID.
Right click on the column header Custom, click Fill, and then click Down.
Let’s delete column Index (By selecting the column and enter the Delete key).
We are now ready to stage our next dramatic transformation:
In Transform tab, click Pivot Column.
The Pivot Column dialog will appear. In Values Column select Column2, then expand Advanced options and select Don’t Aggregate.
Say “Abracadabra” and click OK.
In the Query Editor you will see that we finally have the data in the desired tabular format. Few more minor adjustments and we are ready to go.
Delete the column Custom (Dear column, you helped us a lot, but we don’t need you anymore. Goodbye).
Change the type of Birthday to Date/Time, and change the type of Salary to Decimal Number.
In Home tab, click Close & Load, select Table and your preferred worksheet destination, and then click Load.
That’s it. You can download this workbook to see the steps we have made together.
Stay tuned for the next posts. I promise we will use this technique quite soon.