COVID-19: Build your own Power BI dashboard – Revised

Note: Due to changes in the COVID-19 dataset that was published by Johns Hopkins (described here), I revised the original tutorial. Keep reading if you want to learn how to build your own Power BI Dashboard to analyze the spread of COVID-19 with the revised dataset.

Here are the main changes:

  • Recovered cases are no longer provided.
  • There are two new CSV files for confirmed and death cases.
  • US States were removed from all data sources.
  • If you follow my original tutorial, you will not find the deprecated files.

Update – March 29, 2020: Due to wrong confirmed cases for Guyana between 17-23 March, I added a correction step in the report. You can download the latest version here (Thank you, Mike Carlo, for finding this issue).

The Revised Tutorial

This tutorial will focus on the data prep steps using Power Query. You can download the report file here. The online report is available here.

The Data Source

Johns Hopkins University published here a great interactive dashboard that analyzes the spread of COVID-19 by country. The source data is based on the most reliable resources (e.g. CDC, Who) and is stored in a GitHub repository here as two separate CSV files (Confirmed cases and Deaths). In this tutorial, you will learn how to combine and transform this data to build your own refreshable report on Power BI.

Go to https://github.com/CSSEGISandData/COVID-19 and select the folder case_covid_19_data (or click here).

Select the folder case_covid_19_time_series (or click here)

Select time_series_covid19_confirmed_global.csv (or click here). Select the Raw button as shown in the screenshot below (Note: This screenshot is outdated, but highlights the location of the Raw button).

You will see comma-separated values displayed in your browser. Copy the URL address: https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

Confirmed Cases

Open Power BI Desktop and in the Get Data drop-down select Web.

In the From Web dialog box, paste the URL https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv into the URL text box, and click OK.

Click Transform Data on the next screen.

Rename the query Confirmed.

In Home tab, select Use First Row as Headers. This step will promote the first row. You should note that after this step you would need to harden the queries to be able to recognize new columns in the future (e.g. tomorrow when John Hopkins updates the CSV files). In the last section of this tutorial, you will learn how to do it.

You should note at this stage, that the 4 first columns identify the region/country, province/state and longitude and latitude, but the rest of the columns are all confirmed cases by date. This is not a good format for analysis in Power BI or Excel. We are going to change it.

Click and select the first 4 columns (using Ctrl key or Shift + Right cursor). Now Right-click on the header of one of the 4 columns and select Unpivot Other Columns in the shortcut menu.

Rename the last two columns Date and Confirmed.

Importing the second CSV

To bring the second CSV files (Death cases), go back to the time series folder on Github here and click on the CSV file time_series_covid19_deaths_global.csv

Select the Raw button.

After you clicked on Raw, copy the URL address from the browser (in the same way you did it for the previous CSV file).

Back in Power Query Editor, right-click on the Confirmed query and select Duplicate in the shortcut menu.

Rename the new query Deaths. In Applied Steps, select the Source step and click the settings icon (the cogwheel).

In the Comma-Separated Values dialog box, paste the new URL that you previously pasted into the URL box, and click OK.

Combining the Files

Now, that you have 2 queries Confirmed and Deaths. It’s time to bring them together. First, let’s ensure we will not load the separate tables into the report. Right-click on each query and uncheck Enable Load.

Next, select Deaths query and rename the last column Deaths.

Go back to Confirmed query and in Home tab, select Merge Queries and then Merge Queries as New.

In the Merge dialog box, select Deaths in the second drop-down menu. Using the CTRL key, select the three columns Province/State, Country/Region, and Date. Make sure you select them in the right order. You will notice the little numbers 1,2,3 appearing in their headers.

Now, in the Deaths table of the Merge dialog box, select the same columns on the same order. Make sure you see the 1,2,3 numbers. Before you click OK, you may see the Privacy Levels dialog box.

If Privacy levels dialog box appear, select Public in the drop-down box as shown below and click Save. Then click OK on the Merge dialog box.

Expand the Deaths column by clicking on the expand button on the right side of the column header. In the pane that opens, unselect all fields and keep only Deaths selected. Click OK. This step will expand the number of death cases by the corresponding location and date of the confirmed cases.

Rename the column Deaths.1 as Deaths.

Rename the Merge1 query as COVID-19 Cases and click Close & Apply to load the data. You can now start building your visuals (which will not be covered in this tutorial).

Fixing the Queries – Make It Refreshable

If you will go back to your report tomorrow, you will notice that the refresh will not work. The queries will not load the daily new cases as a new column on the right-end.

But this issue can be easily fixed. In each one of the queries Confirmed and Deaths, go to Applied Steps and select the Source step. Then, in the Formula Bar you can see the Columns section with an explicit static number of columns. (Note: If you don’t have the Formula Bar visible, you can activate it in the View tab).

Luckily, if we remove Columns=xx from the formula, we will resolve the issue. For example, here if the formula for the Confirmed query before and after the change.

Before:

= Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"),[Delimiter=",", Columns=58, Encoding=65001, QuoteStyle=QuoteStyle.None])

After:

= Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"),[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.None])

After you remove the Columns element from the formula on Confirmed and Deaths you can also delete the two Change Type steps in Applied Steps, as shown below (This is recommended as a best-practice. Read more about it here, or in Chapter 10 of my book).

Make sure you repeat these steps for Confirmed and Deaths.

Finally, go to Covid-19 Cases query and change the types of Confirmed and Deaths columns to Whole Number as shown in the screenshot.

That’s it. You are ready to go. Click Close & Apply and start building Check out the quick report I have built below.

This tutorial will focus on the data prep steps using Power Query. You can download the report file here. The online report is available here.

Stay safe!!!

26 comments

  1. Steven Doherty Reply

    Hi Gil, a Global Recovered file was released yesterday. So, all three are now available again.

    • Gil Raviv Post authorReply

      Hi Steven,
      I will wait a bit to see if the file is maintained before I change this tutorial.

  2. Peta Reply

    Hi Gil

    Thanks so much for the post.
    I have a newbie problem – changing Date data type & format to dd/mm/yy:
    When I changed to date it results in “error” populating column.
    So I tried a new column from example, but that didn’t work either.
    Do you have any tips or links on how to transform date?
    Stay well.

    • Gil Raviv Post authorReply

      You don’t need to change the type. Keep it as date. The format can be defined by your locale. You can configure it in File -> Options. You can also change the format of the date columns in the ribbon (Not in Power Query).

  3. Anonymous Reply

    What is the Visual Name to get country details in separate table format when we move cursor on country.

  4. Freddy Ramirez Reply

    Hi,
    My date format is dd/mm/yy, cause of regional settings, so Power bi do not recognize as Date. how can i reorder this so Power BI identify it as Date.
    Thanks

  5. Anonymous Reply

    HI,
    Can I get a video/ tutorial on how to develop this same covid 19 dashboard

    • Gil Raviv Post authorReply

      I didn’t prepare one. But I will present it in Power BI User Groups later this month (Chicago and Minneapolis).

  6. john Reply

    Impressive stuff Gil

    The online report i is not available at the moment?!

    John

  7. Ryan Reply

    Hello Gil,
    Would you be making a tutorial using the US data to build a dashboard?

    • Gil Raviv Post authorReply

      Did you try and find obstacles? There are many similarities between the datasets. I wasn’t sure I will write about it.

  8. Ryan Reply

    I used the time_series_covid_19_confirmed_US.csv & time_series_covid_19_deaths_US.csv. I followed the steps, the obstacle I run into is after I duplicate the “Confirmed” query, I replace the source with the deaths_US. csv file and I get an issue with 8/24/20 column missing. I past that by deleting the Changed Type step. Once I unpivot the other columns, I noticed another column is present in the deaths_US source labeled as “Population”. I’m confused as to what I should do with the population column. Thanks in advance.

  9. Anonymous Reply

    I left a review on your AppSource, how would I go about getting the pbix file for the app? Thanks

Leave a Reply