COVID-19 & Time Intelligence in Power BI

In my last tutorial here, you learned how to import COVID-19 confirmed and death cases from Johns Hopkins dataset to build your own Power BI dashboard using Power Query. In this tutorial, we will go quickly through the DAX measures that will help you gain better analytics capabilities using Power BI time intelligence.

COVID-19 Power BI Dashboard

To follow this tutorial, I recommend that first complete Part 1 here. You can skip it and start with this report as the baseline.

Challenge #1 – Cumulative Amounts

Johns Hopkins dataset is provided as cumulative numbers. While it is easy to create line charts that will display the cumulative number of confirmed cases and deaths, it is challenging to apply simple visuals like A Card that will show the latest total confirmed cases or a line chart with the daily number of confirmed cases by date.

Let’s demonstrate the challenge. Open your Power BI report and drag and drop the Confirmed column into the canvas. Then change the visual into a Card visual.

You will find that the number of confirmed cases is too high.

Why is the number incorrect? Because when you drag and drop the Confirmed column, Power BI applies a simple SUM operation on all the values in the Confirmed column. Since these amounts are cumulative, you end up with a much higher amount. To be more clear, let’s say that the cumulative confirmed cases are represented in this sequence 1,2,4,8,16 (e.g. on day 1 we have one case. On day 5 we have 16 cases). By dragging and dropping the Confirmed column into the Card you would have 31 cases (1+2+4+8+16=31) instead of 16.

For a quick & dirty fix, you can follow these steps:

Drag and drop the Date column into the Filters on this visual. Select Top N as Filter type. Enter 1 as Show items next to the Top drop-down menu. Drag and drop Date column again. This time into By value and set it as Latest Date.

Dragging and dropping amounts to visuals is a bad practice in Power BI. In our case, we would need a smarter way to represent the actual confirmed cases by date. Ideally, if Johns Hopkins would report the data by the actual daily cases, instead of cumulative numbers, your report would be simpler.

So, how would you address this challenge in a better and more scalable way without changing the data source? The answer is DAX.

Total Confirmed = 
VAR sum_confirmed =
SUMX (
    VALUES( 'Covid-19 Cases'[Country/Region] ),
    SUMX (
        VALUES ('Covid-19 Cases'[Province/State] ),
        CALCULATE ( MAX ( 'Covid-19 Cases'[Confirmed] ) )
    )
)
RETURN
IF (sum_confirmed = 0, BLANK(), sum_confirmed )

In the Modeling tab, click New measure. Make sure Home table is set to Covid-19 Cases and enter the DAX expression above.

Now you can create a new Card visual and drag and drop the Total Confirmed measure. The latest confirmed cases were calculated correctly without the need to set the filter in the visual.

The measure above iterates over all the countries and for each country it iterates over all provinces/states (Note: US states were excluded from this dataset, so for United States, you would only have a single blank value as a state). For each province/state we find the maximal value of confirmed cases and sum up all the maximal values by country and province/state.

Here is an alternative way you can implement Total Confirmed. This time by applying an explicit filter on the latest date and summing up all the amounts for the last date in Covid-19 Cases table:

Total Confirmed = 
VAR latest_date = MAX ( 'Covid-19 Cases'[Date])
RETURN
CALCULATE (
    SUM ( 'Covid-19 Cases'[Confirmed] ),
    FILTER (
        'Covid-19 Cases',
        'Covid-19 Cases'[Date] = latest_date
    )
)

But, I find the second option risky as it assumes that all countries and provinces/states will be reported on the same date. If for some reason, the cases of one country are missing on the latest date, the confirmed cases of the previous day will not be included in the calculations.

Note: The first measure can be improved by moving the countries and provinces/states into a separate dimension table.

Challenge 2: Daily Confirmed Cases

Now let’s move to the next challenge. How can we show the daily confirmed cases over time? When our data is cumulative we would need to build a measure that subtracts the confirmed cases amount on each day by the amount on the previous day. To do it right, it’s time to include a calendar table in our model.

In Power BI Desktop, Modeling tab, select New table and enter the following expression in the formula bar.

Calendar = CALENDAR (
    MIN ( 'Covid-19 Cases'[Date] ),
    MAX ( 'Covid-19 Cases'[Date] )
)

The new table includes a unique date in each row, starting from the earliest date in Covid-19 Cases table and ending with the latest. Having a calendar table is the first step for efficient time intelligence in Power BI. Once you create this table, you would need to create a relationship between its Date column and the date column in your other fact tables (in our case – our fact table is the Covid-19 Cases).

In the Model view (highlighted in the left pane below), click Date in Calendar and drag and drop it to Date in Covid-19 Cases.

You can now create a line chart. Select the Line chart visual. Drag and drop Date column to Axis in the Visualizations pane. Drag and drop Total Confirmed to Values.

You can see in the Axis section that the Date column turned into a hierarchy of Year, Quarter, Month and Day. To drill down in all levels to see the confirmed cases by date click on the two-pronged devil’s fork.

Another option to see the line chart only by dates is to switch the Date from Date Hierarchy to Date as shown in this screenshot.

Now that we have a trend line, we would like to create a new measure for the daily confirmed cases by date as illustrated here:

In the Modeling tab, click New measure. Make sure Home table is set to Covid-19 Cases and enter the following DAX expression:

Daily New Cases = 
VAR current_day = [Total Confirmed]
VAR prev_day = 
CALCULATE (
    [Total Confirmed],
    DATEADD ( 'Calendar'[Date], -1, DAY )
)
RETURN
IF ( ISBLANK ( prev_day ), BLANK(), current_day - prev_day )

The DAX expression above calculates the daily new cases by summing up the total cumulative confirmed cases subtracted by the total cumulative cases from the previous day. The previous day calculation is done using DATEADD that offsets our calendar table by one day to filter our cases to show only the previous date.

The ISBLANK function above is used to skip the edge case of countries that start reporting the cumulative amount with a number that is higher than their daily new confirmed cases. In DAX, subtracting a number by BLANK() will result in the original number. Since we want to ignore the first confirmed cases that can be too high, we apply that logic.

Challenge #3: Estimated Active Cases

Monitoring the spread of the Coronavirus by measuring only the confirmed and death cases can be depressing. Let’s think positively and take a look at the recovered cases. In my original tutorial, we imported the recovered cases CSV file. Since then, Johns Hopkins removed the file and later on put it back, stating that the recovery numbers are not consistently reliable for all countries. I thought it would be nice to skip this dataset altogether and create our own measure for estimated recovered cases.

Let’s make an assumption that it takes an average of X days to recover from the virus. So, on any given day, the number of recovered cases equal to the number of confirmed cases we had X days ago minus the number of new deaths in the passing period.

To calculate the estimated recovered cases, we first need to calculate the total deaths. As you recall from the challenge above, a simple SUM will not work on cumulative amounts. Here is the measure you would need (similarly to the Total Confirmed measure).

Total Deaths = 
VAR sum_deaths =
SUMX (
    VALUES( 'Covid-19 Cases'[Country/Region] ),
    SUMX (
        VALUES ('Covid-19 Cases'[Province/State] ),
        CALCULATE ( MAX ( 'Covid-19 Cases'[Deaths] ) )
    )
)
RETURN
IF (sum_deaths = 0, BLANK(), sum_deaths )

Now, let’s create a measure that will store the X days to recover. For now, let’s assume it takes 17 days to recover. We can change this number later.

Create a new measure with the following expression:

Estimated Days to Recover = 17

Now create the following measure for the estimated recovered cases:

Estimated Recovered = 
VAR prev_period =
DATEADD (
    'Calendar'[Date],
    - [Estimated Days to Recover],
    DAY
)
RETURN
CALCULATE (
    [Total Confirmed] + [Total Deaths],
    prev_period
) - [Total Deaths]

Now, you can create a new measure for the estimated active cases by subtracting the total confirmed cases by the estimated reovered cases.

Estimated Active Cases = [Total Confirmed] - [Estimated Recovered]

From here, you can create line charts that will show you the estimated active cases over time. See how China shows the bell-shaped curve, while the other countries on April 3rd are still in exponential growth for their active cases.

To turn the static estimated 17 days recovery time into a dynamic parameter that you can change in a slicer, you can create a What-If Parameter that will be described in my next tutorial, including a measure that can align all countries to the same timeline of days since the virus started as shown here:

Hope you find this tutorial useful. Stay tuned to learn how I built this report. Let’s flatten the curve!

29 comments

  1. Tiago Marciano Reply

    Hi, I’m Tiago and I live in Brazil, thanks for sharing this dashboard about the covid-19 pandemic.
    How to calculate metrics ‘Confirmed per km ^ 2’ and ‘Confirmed per 1M people’ ?
    Tks!

  2. Rahul Reply

    Thanks For Sharing, I Need Excel Data, Could you Pls Help
    Thank
    Rahul From India

  3. Nate Reply

    This is great…what visual are you using for the Bar Graph with the play button?

  4. bryan collins Reply

    Great, thanks again Gil. I’m struggling with the projection.
    First, I gather the measure ‘Estimated Recovered = 17’ should be ‘Recovery in Days = 17’.
    Estimated recovered works fine, but since the ‘total confirmed’ measure returns blanks for future dates, the estimated active cases turns negative for future periods.

  5. Gil Raviv Post authorReply

    Hi Bryan,
    You are correct. I fixed the measure Recovery in Days = 17. Did you apply the CALENDAR function using the MIN and MAX? Did you include future days in your model? If so, you can fixed the Total Confirmed and Total Deaths following this measure:

    Total Deaths =
    VAR current_Date = MAX ( ‘Calendar'[Date] )
    VAR sum_deaths =
    CALCULATE (
    SUMX (
    VALUES( ‘Covid-19 Cases'[Country/Region] ),
    SUMX (
    VALUES (‘Covid-19 Cases'[Province/State] ),
    CALCULATE ( MAX ( ‘Covid-19 Cases'[Deaths] ) )
    )
    ),
    FILTER (
    ALL ( ‘Calendar'[Date] ),
    ‘Calendar'[Date] <= current_Date ) ) RETURN IF (sum_deaths = 0, BLANK(), sum_deaths )

  6. Scott Reply

    Is this report published anywhere? The earlier link doesn’t work anymore.

  7. Blandine Marcelin Reply

    Hello Gil,

    Thank you so much for spending the time explaining us how to analyse this data! I came across an annoying problem that my rather limited knowledge and extended searches can’t work out.
    My locale is set in the UK- the date formats from the CSV are US format. When formatting the date columns as date, all is converted in the UK date format.. which of course completely skews the timelines as we have cases in the future now :-(. The generated calendar table is of course UK. I tried to change all locals within PowerBI. I tried to do some manipulations on dates but to no avail.. Do i need to split the imported dates column first in 3 columns and re-concatenate everything into a new columns or can you think for something a bit smarter for me to do?

    thank you.

  8. Róbert Ďurec Reply

    Hi.
    I have created COVID-19 Power BI template as a gift to Power BI community. I put to the report a lot of my Power BI know-how and created working COVID-19 API.

    I have created own API for COVID-19 updated every our (Better than standalone JHU CSSE Github API). Also I improve the Github JHU CSSE mess about country names and missing country codes. The countries, regions, continents and population data are also added.

    Here is the API: https://coronadatasource.org/open-data/corona-data/corona-time-series-dataset/

    And source code with working API
    https://coronadatasource.org/open-reports/general/cds-covid-19-sandbox-open-report-for-download/

    Would you like to give me a feedback for report or create small article about it?

    Thank you,
    Róbert Ďurec

  9. NKS Reply

    Hi,
    Sorry to bother you.
    I’m trying to improve my DAX and get my head around filter contexts and had a question about one of your formulas and I was hoping you could set me straight.

    Total Confirmed =
    VAR latest_date = MAX ( ‘Covid-19 Cases'[Date])
    RETURN
    CALCULATE (
    SUM ( ‘Covid-19 Cases'[Confirmed] ),
    FILTER (
    ‘Covid-19 Cases’,
    ‘Covid-19 Cases'[Date] = latest_date
    )
    )

    Why did you use a variable here instead of just declaring MAX as the boolean on the last line?
    i.e. ‘Covid-19 Cases'[Date] = MAX ( ‘Covid-19 Cases'[Date])

    ‘Covid-19 Cases'[Date] would be in the row context of filter as an iterator, where as MAX ( ‘Covid-19 Cases'[Date]) would be in the original filter context because it would be part of the filter context passed by the calculate. function.

    • Gil Raviv Post authorReply

      You are right. I could do it. But VAR simplifies so many scenarios that are more complex, so I got used to applying VARs for intermediate calculations.

  10. hisashikarazu Reply

    Thanks for the reply and the blog. It is very useful to think through these things and very useful for me to see how an expert thinks about DAX.
    One more if I may…
    “Daily New Cases =
    VAR current_day = [Total Confirmed]
    VAR prev_day =
    CALCULATE (
    [Total Confirmed],
    DATEADD ( ‘Calendar'[Date], -1, DAY )
    )
    RETURN
    IF ( ISBLANK ( prev_day ), BLANK(), current_day – prev_day )”

    Doesn’t this mean that the first date a country has a case, the formula will fail to report it? I mean, shouldn’t it say ‘current_day’ where the BLANK() is in the return statement.

    Thanks for all your help.

    • Gil Raviv Post authorReply

      You are absolutely right. I did it on purpose. Assuming that the first day a country report is not really the first day they got the virus. By ignoring the first reported day, I avoid having the initial high number of cases that will skew the results.

  11. hisashikarazu Reply

    Hi,
    I’m from the UK and I ran into this issue. You don’t actually have to change your global settings.
    At the top of the query editor where it has the little ‘ABC’ icon you can click and select ‘using locale’ to transform that column only without changing the rest of the file.
    It produces some interesting M code that makes it clear how to change the locale flexibly:
    = Table.TransformColumnTypes(Source, {{“Date”, type datetime}}, “en-US”)
    That language code at the end if Microsoft standard if you’re used to T-SQL or similar.
    Hope that helps.

  12. Ireti Reply

    Hi Gil,

    Thanks a lot for the step by step walk through, really helpful.
    Quick one, I noticed the two formulas for Total Confirmed give separate results
    E.g As at 15th of April (max date in dataset 14th of April)
    France = 137887 (Sum formula using SUMX and VALUES
    France = 131361 (Sum formula using FILTER and MAX DATE)

    I think this is because of the Province/State values, can you confirm the correct value?

    Also, what is the best non-manual way to assign the Countries to Continents ?

  13. Vergilio Sogabe Reply

    Thanks you for sharing!
    Waiting to learn about how to unify the time line.

  14. Katharina Reply

    Thanks for sharing, this is super helpful!
    I was wondering, how would you calculate the maximum of active cases to display in a tooltip? I’ve been struggling with this, since Max can’t be used on a measure and everything I’ve come with so far has been completely off or would be displaying the right figure in the tooltip, but changing with the date (i.e. the maximum by date rather than over the entire period).

  15. Tony Maclaren Reply

    Katharina This should do the trick:

    Max Daily New Cases :=
    CALCULATE (
    [Daily New Cases],
    FILTER ( ‘Covid-19 Cases’,
    MAX ( ‘Covid-19 Cases'[Date] )
    )
    )

  16. Tony Maclaren Reply

    Max Daily New Cases :=
    CALCULATE (
    [Daily New Cases],
    FILTER ( ‘Covid-19 Cases’,
    MAX ( ‘Covid-19 Cases'[Date] )
    )
    )

  17. Landon Reply

    Gil, this is great. When you say that the ‘first measure’ can be improved if using dimensions, how?

Leave a Reply