Do you use Power Apps Analytical Dataflows and would like to track data changes and monitor profile anomalies to improve the data quality of your business applications on Microsoft Data Platform? This app can help you monitor any dataflows in a given Power Platform Environment. The app is available to download on Microsoft AppSource here. The app allows you to use a connected ADLS Gen2 Power Apps Analytical dataflow as the source for auditing and data quality automation use cases. Every time you refresh the dataflow, a snapshot is created on ADLS Gen2 and the app will profile it and detect changes in records from the previous version.
This page will guide you through the installation steps and describe how you can connect the app to your dataflow’s snapshots to monitor data changes over time and find trends on key data profiling metrics such as Min, Average, Max, and Standard Deviations.
If you have any technical issues connecting to the app, please read the known limitations below and contact DataChant with more details here.
To install the app you need to have a Power BI Pro account and the permissions from your Power BI administrator to install Power BI template apps from Microsoft AppSource. This app is a Power BI Template App. You can learn more about Power BI Template Apps here.
After you install the app, you can review it with the built-in demo data before you connect to your data. In the Summary page you can view the main data changes, missing or new records, and find trends of key profiling metrics.
In the Differences view, you can review all data changes in more detail in a tabular format and slice & dice the changes by tables, change types, or time periods. For example, You can review only missing records or changes to a specific table in a given period. You can expand the Filter pane on the right to filter the report by specific dates and tables.
The Profiles view highlights selected profiling metrics by tables, columns, and snapshot dates, and allows you to find anomalies in any column in your tables based on min, average, max, standard deviation, null count, distinct count, and row count.
The app was designed to track changes in the snapshots of all specified Power Apps Analytical Dataflows in a given Power Platform Environment. When you connect your dataflows to ADLS Gen2, every time you refresh the dataflows it creates a snapshot on ADLS Gen2. This app takes leverage of the snapshots to track changes and profiles your data. To connect your workspace to ADLS Gen2 go here.
Now that you have your Dataflows connected to ADLS Gen2, you can follow these steps to monitor the selected dataflows and their snapshots.
Download the Excel file from here.
This workbook is important to define the dataflows, tables, and column keys to track the changes in the snapshots. For example, in the demo data of the app, we have 2 Dataflows: Products and Table. The Products dataflow imports data from 2 tables: Revenues, and Categories. The Revenues table has the Index column that can be used as a unique key (Meaning that every record in Revenues has a unique Index value). The Categories table doesn’t have a unique key, but each row has a unique combination of the columns: Parent Category, Category. So, the Excel table includes the rows below to define a composite key that will help the app to compare records with the matching values of these columns.
Now, you can review the tables of your dataflow, and fill in the Excel table with the relevant table names and their associated single or combined columns that can be associated as unique keys. Once you complete the Excel table, please save it on your OneDrive for Business or a SharePoint site and follow the instructions here to get the file path for the Excel file. You will need to provide it as a value to the parameter Table Keys Path.
Note: If you want to ignore a dataflow or tables in your analysis, simple don’t mention them in the table above.
Connecting the App
In Power BI service, go to the Apps section and select Dataflow Snapshots – Power Apps app.
Click Connect your data.
Enter the paramaters in the Connect to Dataflow Snapshots – Power Apps dialog box, following the instructions below.
The first parameter is ADLS Gen2 Account Name. Enter the account name of your ADLS Gen2 storage account. You can find it on the Azure Portal when you select the ADLS Gen2 resource under Storage accounts as highlighted below.
The account name is also available on the Power Apps Dataflows view. Select any of the Analytical Dataflows. Click the ellipsis and then select Copy path in lake.
In the Copy Products CDM Folder Path, copy the area highlighted as #1 in the screenshot below and use it as ADLS Gen2 Account Name. Now copy the section that is highlighted as #2 in the screenshot below. You can find it between “environments/” and the last “/” character as illustrated in the screenshot. This section will be used as your Environment Name parameter. Paste this value into Enviornment Name parameter (You will need to scroll down to find it in the parameters dialog box of the app).
The next parameter is Table Keys Path. Enter the file path (without “?web=1 suffix”), following the Initial Preparations sections above.
Scroll down to find the next parameter.
The next parameter is Top Rows. Select the maximal number of rows to analyze in each snapshot. For performance reasons, it is advised to monitor small tables. Very large snapshots may time out.
Scroll down to the next 3 parameters.
Max Recent Snapshots: Select the number of latest snapshots to monitor. For performance reasons, it is advised to monitor a small number of snapshots.
Show Only Changes: You can set this parameter to Yes if you want to look only for changes and ignore new and missing records. Set the parameter to No, if you wish to monitor both changes, missing records, and new records.
Envirnment Name: Your Power Platform enviornment you want to monitor. You can find it as explained above (2 screenshots above).
After you provide all parameters click Next. You will be asked to provide the Account Key for your Azure Data Lake Storage used by the Analytical Dataflows.
You can find the Access Key on Azure Portal, or ask the owner of the storage to provide it. Select Organizational in the Privacy Level and then click Sign in and continue.
Next, you would need to connect to the Excel file you stored on OneDrive for Business on SharePoint. Keep OAuth2 as Authentication method and select Organizational in the Privacy Level and then click Sign in and continue.
As the owner of the app, you can edit the parameters at any later time from the Workspace settings. Learn more here.
The Parameters section will be available in the Settings of the dataset. From there you can also set a scheduled refresh to have the app constantly refreshed.
Known Limitations & Best Practices
The app will not work well on very large data sets. It will most likely timeout due to the refresh limitations of the Power BI Service. The app is looking for differences by matching the keys in records in each pair of consecutive snapshots. This matching can be significantly time-consuming on large datasets or a high number of snapshots. Follow these recommendations to avoid timeouts:
- It is recommended to connect this app to small datasets (less than 1 million rows). If you have large data sets, you can use Power Query online to sample the data in another dataflow and connect the new dataflow to the app. It is better to audit a sampled data than avoid automated auditing altogether.
- For refresh performance improvements it is recommended to use tables with a single unique key, and update it in the Excel file that was described above.
- If you set the app’s dataset for a scheduled refresh, it is advised to set a limit to the parameter Max Recent Snapshots.
If you receive errors during the refresh of the app, please confirm that the keys are set correctly and that Power BI can connect to the Excel file and Azure Data Lake. It is also recommended to confirm using Azure Storage Explorer that your dataflow has snapshots stored on the associated ADLS Gen2. Note: The current solution uses a Pipe character (“|”) to concatenate multiple columns that are defined as keys.
If your keys contain the pipe character (“|”) in one of their columns, it is recommended that you create a new single-column key for your data. Otherwise, the app will fail to refresh or get wrong results. In future versions, we will remove this limitation.
If you still have technical issues connecting to the app or have specific requirements that the app doesn’t meet, please contact DataChant via this form and share feedback. Make sure you check out our other apps here.