COVID-19: Build your own Power BI dashboard

Update, March 26, 2020: Due to data changes in the datasets, I published a revised tutorial here.

With all the craziness in our lives due to coronavirus, the reliance today on facts is more important than ever. To help you stay safe I thought you would appreciate this tutorial to learn how you can build your own reliable Power BI report that can analyze the confirmed infections, recovery and death rates of COVID-19.

This tutorial will focus on the data prep steps using Power Query. You can download the original report file here.

Ready to start?

The Data Source

Johns Hopkins University published here a great interactive dashboard that analyzes the spread of COVID-19 by country.

If you would like to see a similar dashboard on Power BI, you can check out Paul Turley’s blog here. The source data is based on the most reliable resources (e.g. CDC, Who) and is stored in a GitHub repository here as three separate CSV files. In this tutorial, you will learn how to combine and transform this data to build your own refreshable report on Power BI.

Go to https://github.com/CSSEGISandData/COVID-19 and select the folder case_covid_19_data (or click here).

Select the folder case_covid_19_time_series (or click here)

Select time_series_19-covid-Confirmed.csv (or click here).

Select the Raw button as shown in the screenshot below.

You will see comma-separated values displayed in your browser. Copy the URL address: https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv

Confirmed Cases

Open Power BI Desktop and in the Get Data drop-down select Web.

In the From Web dialog box, paste the URL into the URL text box, and click OK.

Click Transform Data on the next screen.

Rename the query Confirmed.

In Home tab, select Use First Row as Headers. This step will promote the first row. You should note that after this step you would need to harden the queries to be able to recognize new columns in the future (e.g. tomorrow when John Hopkins updates the CSV files). In the last section of this tutorial, you will learn how to do it.

You should note at this stage, that the 4 first columns identify the region/country, province/state and longitude and latitude, but the rest of the columns are all confirmed cases by date. This is not a good format for analysis in Power BI or Excel. We are going to change it.

Click and select the first 4 columns (using Ctrl key or Shift + Right cursor). Now Right-click on the header of one of the 4 columns and select Unpivot Other Columns in the shortcut menu.

Rename the last two columns Date and Confirmed.

Importing the Other Files

To bring the two other CSV files (Deaths and Recovered), follow the steps in this section.

Go again to the time series folder on Github here and click on the highlighted CSV file (time_series_19-covid-Deaths.csv)

Select the Raw button.

After you clicked on Raw, copy the URL address from the browser (in the same way you did it for the previous CSV file).

Back in Power Query Editor, right-click on the Confirmed query and select Duplicate in the shortcut menu.

Rename the new query Deaths. In Applied Steps, select the Source step and click the settings icon (the cogwheel).

In the Comma-Separated Values dialog box, paste the new URL that you previously pasted into the URL box, and click OK.

Go again to the time series folder on Github here and click on the last CSV of the recovered cases as shown below (time_series_19-covid-Recovered.csv). Select Raw, and copy the URL.

Back in Power Query Editor, right-click on the Confirmed query and select Duplicate in the shortcut menu.

Rename the new query Recovered. In Applied Steps, select the Source step and click the settings icon (the cogwheel).

In the Comma-Separated Values dialog box, paste the new URL that you previously pasted into the URL box, and click OK.

Combining the 3 Files

Now, that you have 3 queries Confirmed, Deaths and Recovered. It’s time to bring them together. First, let’s ensure we will not load the separate tables into the report. Right-click on each query and uncheck Enable Load.

Next, select Deaths query and rename the last column Deaths. Select Recovered query and rename the last column Recovered.

Go back to Confirmed query and in Home tab, select Merge Queries and then Merge Queries as New.

In the Merge dialog box, select Deaths in the second drop-down menu. Using the CTRL key, select the three columns Province/State, Country/Region, and Date. Make sure you select them in the right order. You will notice the little numbers 1,2,3 appearing in their headers.

Now, in the Deaths table of the Merge dialog box, select the same columns on the same order. Make sure you see the 1,2,3 numbers. Before you click OK, you may see the Privacy Levels dialog box.

If Privacy levels dialog box appear, select Public in the drop-down box as shown below and click Save. Then click OK on the Merge dialog box.

Expand the Deaths column by clicking on the expand button on the right side of the column header. In the pane that opens, unselect all fields and keep only Deaths selected. Click OK. This step will expand the number of death cases by the corresponding location and date of the confirmed cases.

Rename the column Deaths.1 as Deaths.

Now, let’s merge the recovered cases. While you are still on the Merge1 query, click Merge Queries. Note: this time don’t select Merge Queries as New.

In the Merge dialog box, select Recovered in the second drop-down menu. Using the CTRL key, select the three columns Province/State, Country/Region, and Date. Make sure you select them in the right order. You will notice the little numbers 1,2,3 appearing in their headers.

Now, in the Recovered table of the Merge dialog box, select the same columns on the same order. Make sure you see the 1,2,3 numbers and click OK.

Expand the Recovered column by clicking on the expand button on the right side of the column header. In the pane that opens, unselect all fields and keep only Recovered selected. Click OK. This step will expand the number of recovered cases by the corresponding country, province/state, and date of the confirmed cases.

Rename the column Recovered.1 as Recovered.

Rename the Merge1 query as COVID-19 Cases and click Close & Apply to load the data. You can now start building your visuals (which will not be covered in this tutorial).

Fixing the Queries to be Refreshable

If you will go back to your report tomorrow, you will notice that the refresh will not work. The queries will not load the daily new cases as a new column on the right-end.

But this issue can be easily fixed. In each one of the queries Confirmed, Deaths and Recovered, go to Applied Steps and select the Source step. Then, in the Formula Bar you can see the Columns section with an explicit static number of columns. (Note: If you don’t have the Formula Bar visible, you can activate it in the View tab).

Luckily, if we remove Columns=xx from the formula, we will resolve the issue. For example, here if the formula for the Confirmed query before and after the change.

Before:

= Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"),[Delimiter=",", Columns=58, Encoding=65001, QuoteStyle=QuoteStyle.None])

After:

= Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"),[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.None])

After you remove the Columns element from the formula on Confirmed, Deaths and Recovered you can also delete the two Change Type steps in Applied Steps, as shown below (This is recommended as a best-practice. Read more about it here, or in Chapter 10 of my book).

Make sure you repeat these steps for Confirmed, Deaths and Recovered.

Finally, go to Covid-19 Cases query and change the types of Confirmed, Deaths, and Recovered columns to Whole Number as shown in the screenshot.

That’s it. You are ready to go. Check out the quick report I have built below from this data that also include population data (e.g. how many confirmed cases we have per 1M people in each country). I can now set up this report on Power BI service and enable the schedule refresh to follow the situation globally. You can download the pbix here and see the web version here.

March 19, 2020 Update: The report file here includes the following additions:

  1. Fixed a bug in the Daily Number of Cases.
  2. Included Population density information by country. You can now sort the countries by Confirmed Density, which is calculated by the number of cases divided by the area of the country (KM^2).
  3. Included a new page for animated visuals with a Play Axis to run the visuals by date.

Stay safe!!!

41 comments

    • Gil Raviv Post authorReply

      Yes. I included a table of population by country. So, you see how many cases per 1 million people in each country. This helps to sort the countries by the proportion of the infection by population.

  1. PAUL WINER Reply

    This is wonderful. Thanks!
    Shouldn’t we have re-enabled loads for the three tables in order to get the data?

    • Gil Raviv Post authorReply

      Thank you Paul. No need to re-enable the three tables. The parent table merges all datasets during the refresh. By disabling the load of these tables we ensure they will not be loaded as 3 tables in memory as redundant data and clutter the user experience.

  2. STEPHEN FARRELL Reply

    Thanks for this, I am retired now but like to try and stay current with Power BI. This is a great exercise for me to try and duplicate.

  3. excel2007master Reply

    Using Power Query in Office 365 E3, the Enable Load option does not appear. My workaround was to Close&Load at that point and right click each Query in turn and select Load To… I chose Only Create Connection from there. Everything after that worked as planned, thank you.

    • Gil Raviv Post authorReply

      You are right. Thank you for elaborating it for our Excel fans

  4. Garrett Reply

    Hi Gil, this is an incredible dashboard! Thank you for sharing.

    One issue I’m seeing is when viewing the table data, I’m seeing that the US has recovery cases, however; when I look at the table on the “Global” page, I’m seeing that the “Total Confirmed” equals “Total unrecovered” which means that none of the recovery cases are being picked up for the US on this particular table. I’m seeing the same thing on the table located on the “United States” page. Any idea why this is calculating like that?

    Thanks for any insight!
    -Garrett

    • Gil Raviv Post authorReply

      Thank you Garrett. Let me check…

      Quick update. The recovered cases have many inconsistencies. For some countries the data is aggregated for other countries like US the data is daily. In addition, for US there is a county-level data that made things buggy. I will update the pbix file to minimize those issues, but moving forward recovered cases will not be reported any more, according to Johns Hopkins https://github.com/CSSEGISandData/COVID-19/issues/1250

  5. suebayes Reply

    If using outside of the US, remember to change date type for date using locale. Brilliant thank you Gill.

    • Anonymous Reply

      Hi Suebayes, I will let our US friends to adapt the date format. The assumption is that other readers will want to view the situation in US.

  6. jwo1938 Reply

    Thank you so much for the great explanation and sharing of the code! As a Power Bi beginner I can certainly learn a lot from this. Would it be possible to add a column on the state display to show the rank of the state based on the column which has been selected as the sort field? I made an attempt to do this, but is currently beyond my abilities.

  7. Daniel Reply

    Hi Gil!

    Thanks for sharing this resource. I published my a report on the web following your steps and fix for queries, using a pro account.

    I’m not sure why the report is not refreshing daily on powerbi.com, after I setup the refresh. It can be refreshed manually on Power BI desktop.

    Any ideas?

    Thanks

  8. Ray Reply

    Thank you so much for this superb tutorial – I am absolutely blown away by the detail and clarity of the presentation you have provided.

    Even though I am retired I still try to improve upon my basic Excel related skills and this is a such a wonderful exercise for doing just that !

  9. Anonymous Reply

    Thanks Gil , great post .
    I notes today there is issue in the excel file update number. for example the number for italy yesterday is the same today .

    Thanks .

  10. Brendan Godden Reply

    hi, great tutorial… How did you get to report the number of new cases?

    • Gil Raviv Post authorReply

      Thank you Brendan. I did it with a little help of DAX. You can download the pbix file and review.

  11. STEPHEN FARRELL Reply

    It appears the data is not being updated daily, when I refresh the dashboard it has been the same for two days.

    • Gil Raviv Post authorReply

      The problem is that they removed US states… I will publish an updated version by end of the week – hopefully with states if Johns Hopkins update it.

  12. Ola.S Reply

    These sites are also good data-providers:
    Live info: https://www.worldometers.info/coronavirus/ — by far the best.
    Daily info: https://ourworldindata.org/coronavirus#healthcare-capacity — excellent charts/analysis

    +
    I’ve set a Constant line for 1.000/million (or 1 per 1.000) line.
    Since the Health care system is dimensioned for a few cases per 1.000.
    Though only 10-20% need intensive care (critical), so that’s the next x10 level (1 per 100).
    The saturation level is predicted be around 60% … that is 60 per 100 — dep. on the growthrate
    ++ the 4% mortality is based on having access to healthcare.
    ……

  13. Christopher Reply

    Excellent analysis, Gil. I’ve forwarded your report to others as an example of why data science is relevant.

    One topic in the news is the rate at which the number of cases in a country doubles. Is there an elegant way of showing how that number changes over time? I appreciate it is the slope of the number of cases over time. I would figure the solution is to find a date when the current quantity of cases was half its current number and then subtract that date from the current date. How could one show that in combination with other values?

  14. Bryan Collins Reply

    Thanks Gil, this was great and allowed me to learn a few new things. It was pretty easy to update for their new files. Once question regarding the data: I’m getting same as you so nothing wrong with the queries (i.e. 467k confirmed as at now). At first the numbers were inline with the John Hopkins dashboard but now after each update they are quite different (and more inline with numbers I am seeing elsewhere) – so 523k on the same date). Any idea why this is? For this reason I am apprehensive to share my reports any further. Appreciate your thoughts, many thanks

  15. Anonymous Reply

    Great…I am calculating the most Immunity strong country, where recovered is most in % over total infected.

  16. Michael Reply

    This is really great information! Thank you!

    How exactly do you configure the scheduled refresh for this data in the service online? Are you using a personal gateway to set it up? The scheduled refresh seems to be where I am having the most trouble.

    • Gil Raviv Post authorReply

      I am not using a gateway. The scheduled refresh works for me. What is the error you have?

  17. Zack Reply

    Hi Mr Gil,

    I’d love to thank you for sharing this great tutorial!
    I am a beginner in Power BI, but by following your instructions I’ve successfully built an awesome report and published it in my Workspace.
    My question is How can I add my country Covid-19 Table instead of US covid-19 table?

    • Gil Raviv Post authorReply

      What is your country? Can you share the public data source? Would love to share a quick article about it.

Leave a Reply