Back to school with Excel & Power Query
This is the second post in the series The Definitive Guide to Unpivot with Power Query in Excel. For today’s post I picked a topic which is quite actual for many of us. Our kids were recently back to school. By now their weekly schedule is probably stable, and we can use Excel to perform some data analytics on the schedule. Which teacher is the busiest? Does our kid learn enough Math vs. English? (If this topic sounds too simplistic for data analytics, don’t worry, you will see some non-trivial and very useful transformations in this post).
My last post (here) was the first in a series of blog posts that walk you through one of the coolest data transformation features in Excel – The Unpivot transformation. Powered by Power Query technology, the Unpivot transformation is available for everyone using the new Get & Transform section of the Data tab in Excel 2016, or as an Add-in in previous versions of Excel.
Let’s reiterate the challenge at hand – As an advanced Excel user, you prefer to analyze or create your reports with PivotTables and PivotCharts. Unfortunately, many of your colleagues are not so familiar with it, and prefer to share their static already-summarized data. You get their static tables, but cannot convert it into PivotTable or shape it to meet your analysis needs. The previous post, showed you a basic scenario to transform a simple table. Today we move one step further – and we will show you how to address three requirements:
- Your source data should stay as a range, and should not be converted into a table before you import it.
- Your source data includes merged cells, you should not lose it, or the information they present.
- Your source data includes multiline data on each cell. Splitting multiline cells is not trivial.
We will use a class schedule data which was originally taken from the public Office Templates here. The data shows a weekly class schedule with class and teacher names that are assigned to different hours of day (rows) and days of week (columns).
Our goal is to create PivotChart that will show the time allocated for each class and teacher.
If you are a moderate Power Query user, you may want to try resolving this challenge by yourself, before you proceed to the walkthrough. Download this workbook to try it yourself, or share it as a riddle to your colleagues.
Ready to start?
Download this workbook, and select the Class Schedule range (B4:G15). In the Name Box type the name ClassSchedule, and press Enter.
Note that on many scenarios, you could skip this step by selecting any cell and clicking From Table in the Get & Transform section. However since our data includes merged cells, we cannot convert the range into a table as we import the data into the Query Editor. The reason is that we must keep the range and its merged cells.
Keep the range selected, and in the Data tab, Get & Transform section, click From Table
Following the last step, you will see that the Query Editor automatically promoted the first row as headers, and changed all column types to text (excluding the first column which is converted to Decimal Number). Let’s change the first column’s type to Time.
Right click the column header Time, click Change Type, then click Time.
Now we will handle all the blank cells (which are blank because of the merged cells). Select the columns Mon, Tue, Wed, Thu, and Fri, then right click one of the headers, and in the context menu click Fill, and then click Down.
Now comes the un-pivoting part. Right click on the header of the first column, and click Unpivot Other Columns.
Let’s click on the filter icon of the column Value, and filter out the values that are equal to “none“. In the filter pane, uncheck “none”, and click OK.
Note: We assume that empty time slots in the schedule should contain the text “none“. Otherwise, all blank cells are merged cells and were treated as part of the Fill Down transformation above.
Next step is to rename the column Attribute to Hour of Day.
And now comes the trick that handles the multiline data of Classes and Teachers.
In Add Column tab, click Add Custom Column, and copy the following formula (without the equation mark) in the Custom column formula box:
= Record.FromList( Lines.FromText([Value]), {"Class", "Teacher"})
Click OK to close the Add Custom Column dialog.
Note: This is probably the most advanced step in this post. We used a formula that reads the multiline text, splits it into a list, and then transforms the list into a record of Class and a Teacher attributes.
Following the previous step, the column Custom is created. Click on the expand icon (which is highlighted in the screenshot below). Uncheck the box Use original column name as prefix, and click OK.
We now have two new columns: Class and Teacher. Things are getting closer to our desired format.
Let’s add another column now with the value 30 in each cell. This will help us to assign 30 minute numeric value for each row in our data, as each row reflects the 30 minute portion of a lesson in the original dataset.
In Add Custom tab, click Add Custom Column, then type 30 in the Custom column formula box, and click OK.
To ensure this data will be handled as a numeric column in Excel, we will change its type to Whole Number. Right click on the header of the column Custom, and click Change type, then click Whole Number.
Last two final touches:
Rename the column Custom to Duration (Minutes).
Delete the column Value, and in the Home tab click Close & Load.
That’s it. The transformed data is loaded into a new worksheet with the right format which will work with a PivotTable. The Duration (Minutes) column can be used to calculate the Sum of classes. You can find out which teacher is the busiest, or why so much time is invested on Excel classes 🙂
Download here the entire workbook with the solution, including the original data, the query we have created together, and the resulting PivotChart.
Wish you and your children a happy school year!
Bonus for advanced Power Query users
I added an advanced query to the workbook here. The advanced query groups the 30 minute rows for each class, so if a given class is 90 minutes long, you will get a single entry in the resulting table with the duration of 90 minutes, instead of 3 entries with 30 minutes each.
@Gil – The number of minutes calculated is incorrect – Look at Math – we have 5 periods of 30 min duration in a week so the total minutes should be 150 not 300
Also we can simply the transformation as below
Select cell A4:G15 and define it as a Table (with headers)- name it as “D”
Use the below M code to generate the 2 Column Data for your chart (Class and Minutes)
let
Source = Excel.CurrentWorkbook(){[Name=”D”]}[Content],
mFillDown = Table.FillDown(Source,{“Mon”, “Tue”, “Wed”, “Thu”, “Fri”}),
mUnPivotOth = Table.UnpivotOtherColumns(mFillDown, {“Time”}, “Day”, “Class”),
mSplitCol = Table.SplitColumn(mUnPivotOth,”Class”,Splitter.SplitTextByDelimiter(“Class”, QuoteStyle.Csv),{“Class”}),
mGrpRows = Table.Group(mSplitCol, {“Class”}, {{“Min”, each Table.RowCount(_)*15, type number}}),
mSort = Table.Sort(mGrpRows,{{“Min”, Order.Descending}})
in
mSort
How about going the other way. Starting with a student or teacher daily schedule like the last table you generated, and reformatting it to generate the “ClassShedule” table you started with. I know it will require a date/time table to create the empty time slots. But beyond that, I’m lost.