Update, March 26, 2020: Due to data changes in the datasets, I published a revised tutorial here.

With all the craziness in our lives due to coronavirus, the reliance today on facts is more important than ever. To help you stay safe I thought you would appreciate this tutorial to learn how you can build your own reliable Power BI report that can analyze the confirmed infections, recovery and death rates of COVID-19.

This tutorial will focus on the data prep steps using Power Query. You can download the original report file here.

Ready to start?

The Data Source

Johns Hopkins University published here a great interactive dashboard that analyzes the spread of COVID-19 by country.

If you would like to see a similar dashboard on Power BI, you can check out Paul Turley's blog here. The source data is based on the most reliable resources (e.g. CDC, Who) and is stored in a GitHub repository here as three separate CSV files. In this tutorial, you will learn how to combine and transform this data to build your own refreshable report on Power BI.

Go to https://github.com/CSSEGISandData/COVID-19 and select the folder case_covid_19_data (or click here).

Select the folder case_covid_19_time_series (or click here)

Select time_series_19-covid-Confirmed.csv (or click here).

Select the Raw button as shown in the screenshot below.

You will see comma-separated values displayed in your browser. Copy the URL address: https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv

Confirmed Cases

Open Power BI Desktop and in the Get Data drop-down select Web.

In the From Web dialog box, paste the URL into the URL text box, and click OK.

Click Transform Data on the next screen.

Rename the query Confirmed.

In Home tab, select Use First Row as Headers. This step will promote the first row. You should note that after this step you would need to harden the queries to be able to recognize new columns in the future (e.g. tomorrow when John Hopkins updates the CSV files). In the last section of this tutorial, you will learn how to do it.

You should note at this stage, that the 4 first columns identify the region/country, province/state and longitude and latitude, but the rest of the columns are all confirmed cases by date. This is not a good format for analysis in Power BI or Excel. We are going to change it.

Click and select the first 4 columns (using Ctrl key or Shift + Right cursor). Now Right-click on the header of one of the 4 columns and select Unpivot Other Columns in the shortcut menu.

Rename the last two columns Date and Confirmed.

Importing the Other Files

To bring the two other CSV files (Deaths and Recovered), follow the steps in this section.

Go again to the time series folder on Github here and click on the highlighted CSV file (time_series_19-covid-Deaths.csv)

Select the Raw button.

After you clicked on Raw, copy the URL address from the browser (in the same way you did it for the previous CSV file).

Back in Power Query Editor, right-click on the Confirmed query and select Duplicate in the shortcut menu.

Rename the new query Deaths. In Applied Steps, select the Source step and click the settings icon (the cogwheel).

In the Comma-Separated Values dialog box, paste the new URL that you previously pasted into the URL box, and click OK.

Go again to the time series folder on Github here and click on the last CSV of the recovered cases as shown below (time_series_19-covid-Recovered.csv). Select Raw, and copy the URL.

Back in Power Query Editor, right-click on the Confirmed query and select Duplicate in the shortcut menu.

Rename the new query Recovered. In Applied Steps, select the Source step and click the settings icon (the cogwheel).

In the Comma-Separated Values dialog box, paste the new URL that you previously pasted into the URL box, and click OK.

Combining the 3 Files

Now, that you have 3 queries Confirmed, Deaths and Recovered. It's time to bring them together. First, let's ensure we will not load the separate tables into the report. Right-click on each query and uncheck Enable Load.

Next, select Deaths query and rename the last column Deaths. Select Recovered query and rename the last column Recovered.

Go back to Confirmed query and in Home tab, select Merge Queries and then Merge Queries as New.

In the Merge dialog box, select Deaths in the second drop-down menu. Using the CTRL key, select the three columns Province/State, Country/Region, and Date. Make sure you select them in the right order. You will notice the little numbers 1,2,3 appearing in their headers.

Now, in the Deaths table of the Merge dialog box, select the same columns on the same order. Make sure you see the 1,2,3 numbers. Before you click OK, you may see the Privacy Levels dialog box.

If Privacy levels dialog box appear, select Public in the drop-down box as shown below and click Save. Then click OK on the Merge dialog box.

Expand the Deaths column by clicking on the expand button on the right side of the column header. In the pane that opens, unselect all fields and keep only Deaths selected. Click OK. This step will expand the number of death cases by the corresponding location and date of the confirmed cases.

Rename the column Deaths.1 as Deaths.

Now, let's merge the recovered cases. While you are still on the Merge1 query, click Merge Queries. Note: this time don't select Merge Queries as New.

In the Merge dialog box, select Recovered in the second drop-down menu. Using the CTRL key, select the three columns Province/State, Country/Region, and Date. Make sure you select them in the right order. You will notice the little numbers 1,2,3 appearing in their headers.

Now, in the Recovered table of the Merge dialog box, select the same columns on the same order. Make sure you see the 1,2,3 numbers and click OK.

Expand the Recovered column by clicking on the expand button on the right side of the column header. In the pane that opens, unselect all fields and keep only Recovered selected. Click OK. This step will expand the number of recovered cases by the corresponding country, province/state, and date of the confirmed cases.

Rename the column Recovered.1 as Recovered.

Rename the Merge1 query as COVID-19 Cases and click Close & Apply to load the data. You can now start building your visuals (which will not be covered in this tutorial).

Fixing the Queries to be Refreshable

If you will go back to your report tomorrow, you will notice that the refresh will not work. The queries will not load the daily new cases as a new column on the right-end.

But this issue can be easily fixed. In each one of the queries Confirmed, Deaths and Recovered, go to Applied Steps and select the Source step. Then, in the Formula Bar you can see the Columns section with an explicit static number of columns. (Note: If you don't have the Formula Bar visible, you can activate it in the View tab).

Luckily, if we remove Columns=xx from the formula, we will resolve the issue. For example, here if the formula for the Confirmed query before and after the change.

Before:

= Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"),[Delimiter=",", Columns=58, Encoding=65001, QuoteStyle=QuoteStyle.None])

After:

= Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"),[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.None])

After you remove the Columns element from the formula on Confirmed, Deaths and Recovered you can also delete the two Change Type steps in Applied Steps, as shown below (This is recommended as a best-practice. Read more about it here, or in Chapter 10 of my book).

Make sure you repeat these steps for Confirmed, Deaths and Recovered.

Finally, go to Covid-19 Cases query and change the types of Confirmed, Deaths, and Recovered columns to Whole Number as shown in the screenshot.

That's it. You are ready to go. Check out the quick report I have built below from this data that also include population data (e.g. how many confirmed cases we have per 1M people in each country). I can now set up this report on Power BI service and enable the schedule refresh to follow the situation globally. You can download the pbix here and see the web version here.

March 19, 2020 Update: The report file here includes the following additions:

  1. Fixed a bug in the Daily Number of Cases.
  2. Included Population density information by country. You can now sort the countries by Confirmed Density, which is calculated by the number of cases divided by the area of the country (KM^2).
  3. Included a new page for animated visuals with a Play Axis to run the visuals by date.

Stay safe!!!