Find overlapping time periods using #PowerQuery – Part 2

In my last blog post here, I introduced the overlapping time periods challenge and shared the M queries for the solution. In today’s post, we will go through the solution in details.

The Challenge

Your input data is a table with start and end dates (including time). Can you find all the overlapping time periods in the table? To make this challenge more specific, imagine you are a project manager responsible for tracking the progress of a mission-critical project in your organization. You have a list of resources that are assigned to tasks. Your job is to find the over-allocated resources that are assigned to multiple tasks with overlapping time periods (as demonstrated in the figure below).

Find overlapping time periods using Power Query
Can you find the overlapping time periods and the over-allocated resources?

In the sample data below, you can find 5 tasks allocated to 3 resources. The figure above highlights the overlapping time periods. While it is easy to visually detect the overlaps in a Gantt chart on a small dataset, it is much more interesting to assume that the number of tasks and resources can be significant high. Can you automate the detection of overlaps in Power BI or Excel? In this blog post we’ll describe how to do it.

TaskResourceStartEnd
Task 1Resource 17/2/2019 9:00:00 AM7/2/2019 5:00:00 PM
Task 2Resource 27/2/2019 11:00:00 AM7/2/2019 9:00:00 PM
Task 3Resource 27/2/2019 2:00:00 PM7/2/2019 8:00:00 PM
Task 4Resource 37/2/2019 4:00:00 PM7/2/2019 11:00:00 PM
Task 5Resource 37/2/2019 1:00:00 PM7/2/2019 7:00:00 PM

The Solution – Step by Step

Import the table above and name the query Tasks. To do it you can copy and paste the code below into a blank query in Power Query:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkkszlYwVNJRCkotzi8tSk4Fc8z1jfSNDAwtFSytDAyASMHRF1nUFCoa4KsUqwM1wwjZDCNk1YaGWA2xxGKIMU5DjBCqkUQtsJhhgmyGMbJqE6xmwJ2HbIgpTkMMsRpijmRGLAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Task = _t, Resource = _t, Start = _t, End = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Task", type text}, {"Resource", type text}, {"Start", type datetime}, {"End", type datetime}})
in
    #"Changed Type"

While Power Query Editor is open, right click on the Tasks query in Queries pane and select Reference in the shortcut menu.

Rename the new query Overlapping Tasks.

The Cartesian Product

Now we are going to use a Cartesian Product technique, which I describe in details in Chapter 11 of my book. The Cartesian Product enables you to combine two datasets in order to apply a calculation or filtering on each combination of the paired records. By matching each task against all the other tasks in the table, we will be able to find the overlapping time periods.

To compare between each row in the Overlapping Tasks table, we will create a new column that will include the table itself as an object in each row of the original table. Then, we’ll expand the table objects in the new column and generate a table with all the permutations of pairs.

In Add Column tab, select Custom Column. In the Custom Column dialog box, enter Source2 as New column name, enter the formula:

= Source

Then click OK.

Note: This technique may have slow performance if you load the original source table form an external data source. To solve this issue, we will use Table.Buffer. But let’s wait with it for now to keep the solution simple and clear.

Now, when we have the table objects in Source2, it is time to expand these tables. Click the expand column control on the right side of Source2 column header. In the expand pane, make sure that all the columns are selected. Check Use original column name as prefix and click OK.

You can see in the preview pane that Task 1 was duplicated five times, and is now paired against Task 1, Task 2, Task 3, Task 4 and Task 5 which are shown in the Source2.Task column. So, in total we have 25 rows instead of the original 5 as we apply the cartesian product.

Before we keep only the pairs that we would like to match, let’s make sure that we have the correct column types. Scrolling right in the table, you will notice that the types of columns Source2.Start and Source2.End are not set (You can see that the column headers have the ABC123 icon). Change the column types of these columns to Date/Time.

Filtering Different Tasks of the Same Resource

Since our task is to compare overlapping periods for the same resource, we would need to keep in the table only rows in which the task in Task column is not equal to the task in Source2.Task and the resource in Resource column is equal to the resource in Source2.Resource. To do it, we will start with a dummy filtering step.

Click the filter control on the column header of Task. Select Task1 in the filter pane and click OK. We will ignore the logic. We just need to generate a filtering step.

Make sure your Formula Bar is shown (You can enable it in the View tab). You will see the following formula:

= Table.SelectRows(#"Changed Type", each ([Task] = "Task 1"))

Copy and paste the formula below to keep only different tasks that are assigned to the same resource:

= Table.SelectRows(#"Changed Type", each ([Task] <> [Source2.Task] and [Resource] = [Source2.Resource]))

Bu replacing the formula, you can see that we changed the condition of the filter from the dummy logic:

[Task] = “Task 1”

to the correct logic:

[Task] <> [Source2.Task] and [Resource] = [Source2.Resource]

We used the not equal operator “<>” and “and” to get filter our cartesian product output into tasks that are assigned to same resource. You can see that we are now left with 4 rows. Task 2 matched with Task 3 as the first row. Next, Task 3 matched with Task 2 (the reversed instance of the previous row). Then, Task 4 matched with Task 5 and finally the reversed instance as the last row.

Calculating Overlapping Duration

Now it is time to calculate the overlapping time period between each of the tasks and to filter out rows without overlap.

To set the stage for the overlap calculation we will create two custom functions MinDate and MaxDate that find the minimal and maximal dates of two given Date/Time values.

In Home tab, select New Source drop-down menu, and select Blank Query.

Name the new query as MinDate. and copy and paste this formula into the Formula Bar:

= (date1, date2)=>
    if date1 < date2 then date1 else date2

Now go back to Home tab, New Source and select Blank Query. Name the new query MaxDate. Copy and paste this formula into the Formula Bar:

(date1, date2)=>
    if date1 >= date2 then date1 else date2

In Queries pane, select the Overlapping Tasks query and in Add Column tab select Custom Column. In the Custom Column dialog box follow these steps:

  1. Enter Overlap Duration in the New column name box.
  2. Enter the following formula below and click OK.

Here is code you can copy and paste into the Custom column formula box:

Duration.TotalHours(
        MinDate([End], [Source2.End]) -
        MaxDate([Start], [Source2.Start])
)

This code uses Duration.TotalHours to return the total hours as a decimal number of the overlapping duration. The main calculation in the formula is done by subtracting the minimal end date with the maximal start date of the two tasks, as shown here:

MinDate([End], [Source2.End]) -
MaxDate([Start], [Source2.Start])

Subtracting two Date/Time values in M returns a value whose type is duration. In our case, if the duration is positive, we have an overlap, but if the duration is negative, we don’t. You can now see that all the rows have a positive duration value because in our sample data, all our tasks that has the same resource are overlapping.

To test our Overlapping Duration custom column with tasks that are not overlapping let’s add a 6th task for Resource 3 that has no overlapping period. To do it, go to the Tasks query in Queries pane and select the settings cog wheel icon in the Source step of Applied Steps pane. Then enter the following row in Create Table dialog box.

Move back to Overlapping Tasks query and see that four rows with negative Overlapping Durations values were added. This can prove you that our formula worked. We just need to keep rows whose duration is positive.

Select the filter control of Overlapping Duration column and in the filter pane select Number Filters, then select Great Than…

In the Filter Rows dialog box, enter 0 as shown in the screenshot below and click OK.

You can now create two new columns for the overlapping period start and end dates. This can be done in multiple ways. Since we already have the MinDate and MaxDate functions, let’s do it using Invoke Custom Function.

In Add Column tab, select Invoke Custom Function. In the Invoke Custom Function dialog box, enter Overlap Start as New column name. Next, in the Function query drop-down menu select MaxDate.

Make sure that under date1 drop-down menu you choose Column Name. Select Start in the drop-down menu which is next to date1.

Under date2 drop-down menu select Column Name. Select Source2.Start in the drop-down menu which is next to date2 and click OK.

In Add Column tab, select Invoke Custom Function. In the Invoke Custom Function dialog box, enter Overlap End as New column name. Next, in the Function query drop-down menu select MinDate.

Make sure that under date1 drop-down menu you choose Column Name. Select End in the drop-down menu which is next to date1.

Under date2 drop-down menu select Column Name. Select Source2.End in the drop-down menu which is next to date2 and click OK.

We are almost done. We can now remove the unnecessary columns and keep only Task, Overlapping Duration, Overlap Start and Overlap End. and changed the types of Overlapping Duration, Overlap Start and Overlap End to the corresponding types: Decimal Number, Date/Time and Date/Time.

Performance Improvement

The Cartesian Product that we applied earlier may be tricky when you load the tasks table from an external data source – especially if you have many tasks. You can find the explanation in Chapter 11 of my book. To fix this issue you can use the Table.Buffer function (If you have enough memory to store the entire table, otherwise don’t follow this improvement).

Open the Advanced Editor in Overlapping Tasks and locate the following lines:

Source = Tasks,
#"Added Custom" = Table.AddColumn(Source, "Source2", each Source),

Modify the two lines into three as follows:

Source = Tasks,
BufferedSource = Table.Buffer(Source),
#"Added Custom" = Table.AddColumn(Source, "Source2", each BufferedSource),

Visualizing overlapping tasks

In this PBIX file, you can find the Power Query solution and a simple visualization of the overlapping tasks. The report loads the tasks into an as-Timeline custom visual and shows the over-allocated resource on a separate table with their corresponding tasks and the duration of the time overlaps.

When you click on one of the tasks in the as Timeline visual, you can see which tasks are overlapping. For example, in this screenshot, I clicked on Task 4 which is assigned to Resource 3 and found out that this resource is over allocated as it is also assigned to Task 5 with an overlap of 3 hours, starting from 7/2/2019 7PM.

Do you have similar challenges that are not covered by the solution above? please share in the comments below.

18 comments

  1. Oyekunle SOPEJU Reply

    Hi Gil,

    Thanks for the post.
    May i suggest below solution using mainly PQ User Interface(UI). No functions, No Cartesian, No Buffering, etc.

    Simply group the Source table by Resource column, filter for multiple task for a Resource , obtain “OverlappingHours” by subtraction then remove column and expand table.

    The Code

    let
    Source = Excel.CurrentWorkbook(){[Name=”Table1″]}[Content],

    GroupRows = Table.Group(Source, {“Resource”},
    {
    {“Tbl”, each _, type table},
    {“MaxStartDate”, each List.Max([Start]), type datetime},
    {“MinEndDate”, each List.Min([End]), type datetime},
    {“RowCount”, each Table.RowCount(_), type number}
    } ),

    FilterRows = Table.SelectRows(GroupRows, each ([RowCount] > 1 )),

    AddCol = Table.AddColumn(FilterRows, “OverlappingHours”, each [MinEndDate] – [MaxStartDate] ),

    RemoveCols1 = Table.RemoveColumns(AddCol,{“MaxStartDate”, “MinEndDate”, “RowCount”}),

    XpandTbl = Table.ExpandTableColumn(RemoveCols1, “Tbl”, {“End”, “Start”, “Task”}, {“End”, “Start”, “Task”})

    in
    XpandTbl

    • Gil Raviv Post authorReply

      Thank you Oyekunle for sharing your code. Could you please check how your query handles edge cases? For example: When I add a 6th Task for Resource 3 that starts and ends before all the other tasks the results are wrong.
      Try this Tasks query as a source:

      let
      Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText(“i45WCkkszlYwVNJRCkotzi8tSk4Fc8z1jfSNDAwtFSytDAyASMHRF1nUFCoa4KsUqwM1wwjZDCNk1YaGWA2xxGKIMU5DjBCqkUQtsJhhgmyGMbJqE6xmwJ2HbIgpTkMMsRpijsUMM3QzDPUNwarBTIhGpdhYAA==”, BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Task = _t, Resource = _t, Start = _t, End = _t]),
      #”Changed Type” = Table.TransformColumnTypes(Source,{{“Task”, type text}, {“Resource”, type text}, {“Start”, type datetime}, {“End”, type datetime}})
      in
      #”Changed Type”

  2. Diego Reply

    Thank you very much Gil. Very well explained!
    You don’t know how much this solution helped me.

  3. Evan Bush Reply

    Thank you Gil. This was very helpful. I particularly liked the Cartesian solution. With respect to M, I am like the 2nd grader when asked how much 4 2’s are adds 4 2′ rather than multiplying 4 by 2. In my case I am not working with dates or duration but overlapping segments and knowing the start and end points this will give me exactly what I need. When I followed your steps, I received a data type error with respect to date1 and date2. I had to use two conditional columns to get the correct answer. But those are two columns are the answers that I want for my situation.

  4. Brandon Parra Reply

    Hi Gil,

    How can I get the total time that a resource have been working? I mean, I need to calculate all time worked considering one time the overlaped time ranges without duplicating time?

    I relly need that.

  5. Lindsey Tran (Belliveau) Reply

    I keep getting the error:
    “OLE DB or ODBC error: [Expression.Error] We cannot convert the value null to type Logical..”

    Any ideas?
    Thank you and muh appreciated!
    Lindsey

    • Gil Raviv Post authorReply

      Do you apply a filter or add a conditional column? The logical expression may have a null value that converts the expression itself to null. Can you send me more details by email? My address is gilra@datachant.com

  6. Lindsey Tran (Belliveau) Reply

    Hi Gil,
    Thank you for the quick reply! Turns out it was an issue with the input Sharepoint – someone forgot to enter a finish date for one entry! Once I got to that it was smooth sailing 🙂
    Thanks again Gil!
    Lindsey

  7. Ryan Stewart Reply

    It looks to me like your solution should work for me, however I have hit a wall with the time it takes to load the data.
    I’ve trimmed up my data as much as possible, and when I close and apply my query, it takes a long time to load and apply the changes. (It’s still working on it, and this is my 4th try after trying to speed things up.)

    I’m only loading 2 rows a second, and my data is 100,00s of rows long. Any advice on reducing the processing time of this method?

  8. PowerQueryR Reply

    Hi Gil Raviv,

    Currently I am trying your solution, but I keep getting this error:

    Expression.Error: The field ‘Source2.Task’ of the record wasn’t found.
    Details:
    Task=Task 1
    Resource=Resource 1
    Start=7-2-2019 09:00:00
    End=7-2-2019 17:00:00

    Do you know how to solve this? My knowledge regarding m is very little compared to DAX.

    Thanks in advance

  9. Anonymous Reply

    I just ran across your post. It gave me the clues I needed. I created a new query based on a table and added a cartesian query with a custom column; expanded to all columns.

    it is much easier to just compare the two sets of dates in a custom column like this:

    [start data] [custom.start date]

    No need for custom functions.

  10. Anony Reply

    This works unless you need ANYTHING more from the solution. additional tasks or any other expension will be bad. also fixing ur problems in powerquery is just bad practice. fun until the dataset gets close to a few million per table.

  11. Edyta Reply

    Is visualization ‘as Timeline’ still available somewhere under a different name?

Leave a Reply