Build COVID-19 CDS entity in Power Platform dataflows

In today’s tutorial, we will build an organizational dataflow that will automatically import COVID-19 stats like confirmed cases, tests, and hospitalization by US state into a CDS entity that can serve as a starting point for building a return to work app on Power Platform.

This time we are leaving the boundaries of our known Power BI planet (that was covered in the series) and enter into the solar system of Power Platform. As you will find out, space travel in Power Platform is easier when you have the Power Query knowledge at your disposal.

This article can be relevant to you, if you are –

  1. A Power Query practitioner who typically works with Power BI and Excel to connect to data sources, combine them together, and reshape the data to meet your analytics needs in Power BI or Excel. If you meet this condition, you will learn how your bet on Power Query can pay off by taking your data-wrangling skills into a new domain – Power Platform dataflows.
  2. A Power Apps developer who looks for ways to boost your impact and tackle challenging datasets in your app. This tutorial can motivate you to learn how to harness Power Query to tap into enterprise data sources and reshape data to meet your business needs in Power Apps or Power Automate.

Low-Code ETL

Power BI developers already use Power Query and its game-changing low-code/no-code ETL to load data into their models in Power BI. But with Power Query inside Power Platform dataflows, you can now use the same technology to collect, combine and transform data and load it into CDS (Common Data Service) and serve it in your low-code application.

In this tutorial, you will get a glimpse into the capabilities of Power Platform dataflow. Read more about it here.

The Use Case

Imagine you work in a company that is affected by COVID-19 and have offices in US. Your boss asked you to create an app that will apply your company return-to-work policy according to real COVID-19 test and hospitalization data. You decided to use the low-code Power Apps to build the solution, but you are not sure how to incorporate and audomate the COVID-19 data into this solution.

The Data

In this tutorial, we will tap into the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The repository includes a folder of daily CSV files that aggregates US states by crucial stats like confirmed cases, deaths, tests, and hospitalizations. Read about the available fields here.

Click here to view the files.

In the next part of this tutprial, you will learn how to import a single CSV file into Power Platform dataflow. Then, in the following part, you will learn how to reuse the logic to bring all the files. Once we complete this tutorial we will have an automated pipeline to combine all the data together on a daily schedule with (almost) no line of code.

Select one of the csv file and click on it (for example here) and then click Raw.

Copy the URL and save it. We will use it shortly.

Import a single CSV

Our starting point in building the automated pipeline for COVID-19 US data will be Power Apps. Click https://make.powerapps.com/.

Expand the Data section on the left sidebar and click Dataflows (Don’t be confused with Flows).

In the Dataflows area, click New dataflow and then enter COVID-19 Data in the Name box and click Create.

In Power Query – Choose data source, go through all the different options you have. Cherish this moment. You will be back here soon after this tutorial to try it with your data sources, so it is important you will see all the different options you have to connect to your business data.

Select Text/CSV.

Copy the URL you got in the previous section (e.g. https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/10-12-2020.csv), and paste it into File path or URL. Then click Next.

In the next screen you will see a preview if the data. Here you can define the delimiter for the CSV file. By default, Power Query will select Comma, so, you are good to go. Click Transform Data.

In Power Query – Edit queries dialog box, rename the query in the Query settings, Name box. Enter the new name: US COVID-19 by Date.

Before we can bring the data in, we would like to turn this query into a function that will be able to load the data by the different file names. Our first step in this direction will be to create a new parameter that represents the the filename and its undelying date.

Click Manage parameters and in the drop-down menu select New parameter.

In Manage parameters dialog box, enter PageDate in Name. Select Text in Type and enter the same date as you have in the selected CSV in the format MM-DD-YYYY. For example: 10-11-2020. Then, click OK.

Now you have the parameter PageDate. Let’s hook it into the US COVID-19 by Date query. In Queries pane, select US COVID-19 by Date, and click Advanced editor in Home ribbon.

You will notice the highlighted date in the code. Change it to ” & PageDate & “.

Here is the section of the code before the change:

"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/10-12-2020.csv"

And here is the section of the code after the change:

"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/" & PageDate & ".csv"

As you can see, we simply concatenated the PageDate parameter to replace the hard-coded filename. Now you can click OK to close the Advanced editor. Make sure you still see the data in the main preview pane. If you were wrong in the previous step, it will show an error or no data.

Now, when the parameter was inserted in the code, you can convert the main query into a function, so we will be able to invoke it on many files. In Queries pane, right click US COVID-19 by Date and select Create function in the shortcut menu.

In Create function dialog box, enter FnLoadDataByDate in Function name and click OK.

You can now see that we have a new items in Queries pane with a function icon: FnLoadDataByDate. The original US COVID-19 by Date query is now attached to this function. Whenever we want to change the function, we can edit the query US COVID-19 by Date and the changes will be propgated into the function. This is the beauty of low-code.

Right click on US COVID-19 by Date and uncheck Enable load in the shortcut menu. We disable the load of this query as we are looking for a way to bring all files, and not only a single file.

Bring ’em all

In this section, we will create a dynamic list of all the CSV files we shoul.d import, and invoke the function to load the CSV files and combine them together into a single entity in CDS.

Right click on the blank space of Queries and select New query. Then select Blank query.

Copy the code below into the main pane:

let
    Source =
        List.Dates(
            #date(2020,4,12),
            Duration.TotalDays(
              Date.From(DateTime.LocalNow()) - #date(2020,4,12)
            ),
            #duration(1, 0, 0, 0)
      )
in
  Source

The M code above creates a list of consecutive dates. We used a similar code here. Let’s explain the code.

List.Dates function in M accepts three arguments. The starting date, the total number of items in the list, and the increment. In our scenario, 12 April 2020 is the date of the first file we have in the data repository. The second argument of List.Dates function is the total number of items we want to generate in the list. To get the number of days from 12 April to today. First, we get the current time using DateTime.LocalNow(). Next, we subtract the first date from the today using the minus operator. In M, the result of subtracting two dates is a duration object. To convert the duration into a number of days that will represent the number of iterations, we apply Duration.TotalDays.

The last argument is the increment (a day) which should be a duration object. The expression #duration(1,0,0,0) returns a duration object of a single day (the first argument of #duration is days, followed by hours, then minutes, then seconds).

Let’s move on. Click Next.

To convert the list to a table click To table in List tools ribbon.

Rename the column Date by double-clicking on the column name.

Click on the ABC-123 icon and select Date in the drop-down menu to convert to type of the column to date.

We will now create a new column with the date as a formatted text MM-DD-YYYY. This is the format of the date in the CSV filename. In the Add column tab, select Custom column.

Enter PageDate as New column name and copy the following code in Custom column formula:

Date.ToText([Date], "MM-dd-yyyy")

Note that the format “MM-dd-yyyy” is case sensitive. If you type “mm-dd-yyyy”, the month element will be ignored and you will only see zeros because “mm” represents minutes. If you use “DD” instead of “dd” you will see the string “DD” instead of the actual day – so, please apply the exact string. When you are done click OK.

Now, when we have the format we need, we can invoke the function we created earlier – FnLoadDataByDate.

In the Add column tab, select Custom column. Then, enter the following function in the Custom column formula:

FnLoadDataByDate([PageDate])

This code will invoke the function with the formatted date in PageDate column. Click OK in the dialog box Custom column.

The next experience can be confusing. Power Query will detect that you try to bring multiple files and will ask you to define data privacy. Click Continue if you see the message below.

Instead of defining each one of the CSV files as a public data source as shown in the dialog box below, there is a way to refine the code to allow you to treat all the CSV files as a single data source. After all, all these files come from the same hostname (GitHub).

Click Cancel, and in the Queries pane select US COVID-19 by Date. In Home tab, select Advanced editor take a look of the following section in the code:

Source = Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/" & PageDate & ".csv"), [Delimiter = ",", Columns = 18, QuoteStyle = QuoteStyle.None]),

We can modify this code to specify to Power Query that we actually have only a single source here. By moving the URL path into the second parameter, we can consolidate the number of data sources.

Source = Csv.Document(
    Web.Contents(
      "https://raw.githubusercontent.com",
      [
        RelativePath = "CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/" & PageDate & ".csv"
      ]
    ), [Delimiter = ",", Columns = 18, QuoteStyle = QuoteStyle.None]
  ),

Here is the full code after the modification:

let
  Source = Csv.Document(
    Web.Contents(
      "https://raw.githubusercontent.com",
      [
        RelativePath = "CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/" & PageDate & ".csv"
      ]
    ), [Delimiter = ",", Columns = 18, QuoteStyle = QuoteStyle.None]
  ),
  #"Promoted headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
  #"Changed column type" = Table.TransformColumnTypes(#"Promoted headers", {{"Province_State", type text}, {"Country_Region", type text}, {"Last_Update", type datetime}, {"Lat", type number}, {"Long_", type number}, {"Confirmed", Int64.Type}, {"Deaths", Int64.Type}, {"Recovered", Int64.Type}, {"Active", Int64.Type}, {"FIPS", Int64.Type}, {"Incident_Rate", type number}, {"People_Tested", Int64.Type}, {"People_Hospitalized", type text}, {"Mortality_Rate", type number}, {"UID", Int64.Type}, {"ISO3", type text}, {"Testing_Rate", type number}, {"Hospitalization_Rate", type text}})
in
  #"Changed column type"

After this change, the new Custom column will include the Table objects. Each from a different CSV for each date. If you click on one of the cells (not the Table hyperlink), you can see the table in the preview section at the bottom.

Click the expand control on the right side of Custom column header. Uncheck Use original column name as prefix and click OK. This action will expand all the Table objects.

You can remove the PageDate column. We don’t need it any longer. Remember, it was only needed to invoke the function to import the corresponding CSV files. Right-click on the header of PageDate and select Remove columns in the shortcut menu.

Rename the query US COVID-19.

Go over each of the column headers and click on the ABC-123 icon to change the columns to the right type. For example, you can change the Confirmed column to Whole number. Defining the right type of columns now will help you to ensure that the CDS entity can be mapped correctly.

You can now click the Next button.

Make sure that the first three queries are set under Load settings to Do not load.

Select US COVID-19 and select Load to new table in Load settings. Enter US COVID-19 in Table name and Table display name. You can also provide a description for this new entity in the Table description, so other Power Platform developers can find it. You can then make sure that the types in Field Mapping are mapped correctly. For example, I changed Multiline text to Text in some of the fields that were identified as Multline text. When you are ready click Next.

In the Refresh settings section you can select Refresh automatically and configure the refresh schedule. Then, click Create.

You are ready to go. You can now use the COVID-19 entity to build Power Apps, Power BI reports or even integrate this data into custom workflows in Dynamics 365.

To create a new Canvas App, click Create.

Select the US COVID-19S entity and build your app.

Hope you enjoyed this tutorial. If you have challenging datasets that need to be imported into CDS, Power Platform dataflows with its Power Query Online experience and underlying M code can be your best friends. You can learn more about Power Query in my book.