Data Quality Automation with Power Query

The low-code capabilities of Power Query can be leveraged to solve highly complex data challenges. I recently used them to build a data quality automation solution that takes data quality to the next level. Why purchase an expensive off-the-shelf enterprise data quality suite when you can quickly build one yourself with Power Query and dataflows? I am considering writing a new series explaining the elements of the automation I have built. If you find it relevant, please comment below or contact me via this form.

The serverless variation of the automation consists of a report that connects to all the snapshots of a Power BI dataflow on Azure Data Lake Gen2, generates data profiles, and looks for differences between each pair of consecutive snapshots. Here is a screenshot from my report:

A data quality automation with Power Query consists of the following main building blocks:

  • Generation of table snapshots using dataflows or Excel files.
  • Power Query's Table.Profile function for ongoing data profiling (see the first sketch below).
  • Power Query advanced table comparison functions, available here. Comparing snapshots is the key to automating regression testing (see the second sketch below). To learn the main concepts behind these functions, you can read my articles here and here.
  • A PowerShell script for the on-prem, Excel-based version.
  • An Azure Data Lake Gen2 account connected to your Power BI workspace, so that a snapshot is created automatically on every dataflow refresh. Learn more here.
Every dataflow refresh is automatically stored as a snapshot CSV file on your Azure Data Lake Gen2.
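To make the building blocks more concrete, here is a minimal sketch of the serverless flavor: it connects to the lake, picks the latest snapshot CSV of one entity, and profiles it with Table.Profile. The storage URL and the entity name "Customers" are placeholders for illustration only, not values from my solution; replace them with your own.

    let
        // Hypothetical ADLS Gen2 endpoint and entity name - replace with your own values
        StorageUrl   = "https://yourstorageaccount.dfs.core.windows.net/powerbi",
        EntityName   = "Customers",

        // List every file the connected workspace has written to the lake
        AllFiles     = AzureStorage.DataLake(StorageUrl),

        // Keep only the CSV snapshots that belong to the entity we monitor
        Snapshots    = Table.SelectRows(
                           AllFiles,
                           each [Extension] = ".csv"
                               and Text.Contains([Folder Path], EntityName)),

        // Newest snapshot first, based on the file modification time
        Sorted       = Table.Sort(Snapshots, {{"Date modified", Order.Descending}}),

        // Parse the latest snapshot and promote its header row
        LatestTable  = Table.PromoteHeaders(Csv.Document(Sorted{0}[Content])),

        // One profile row per column: Min, Max, Count, NullCount, DistinctCount, ...
        Profile      = Table.Profile(LatestTable)
    in
        Profile

In the full solution every snapshot gets its own profile so that profiles can be compared over time; the sketch stops at a single snapshot to keep it short.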
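The comparison step itself relies on the advanced table comparison functions linked above. As a simplified, hypothetical illustration of the idea only, a plain anti-join between two consecutive snapshots already surfaces added and removed rows. The query names PreviousSnapshot and CurrentSnapshot and the key column "CustomerID" are assumptions for the sketch.

    let
        // PreviousSnapshot and CurrentSnapshot are assumed to be two queries that load
        // consecutive snapshot files of the same entity (e.g. as in the profiling sketch)
        Keys        = {"CustomerID"},   // key column(s) identifying a row - adjust to your data

        // Rows that exist in the current snapshot but not in the previous one
        AddedRows   = Table.RemoveColumns(
                          Table.NestedJoin(CurrentSnapshot, Keys, PreviousSnapshot, Keys,
                                           "Match", JoinKind.LeftAnti),
                          {"Match"}),

        // Rows that existed in the previous snapshot but are gone from the current one
        RemovedRows = Table.RemoveColumns(
                          Table.NestedJoin(PreviousSnapshot, Keys, CurrentSnapshot, Keys,
                                           "Match", JoinKind.LeftAnti),
                          {"Match"}),

        // A compact summary that can be surfaced in the data quality report
        Summary     = #table(
                          {"Added Rows", "Removed Rows"},
                          {{Table.RowCount(AddedRows), Table.RowCount(RemovedRows)}})
    in
        Summary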

At this stage, you may not see how the elements above combine into something that can compete with an enterprise data quality suite, but if this area is relevant to your business objectives, I would be curious to learn from you. Try to answer the following questions:

  1. Is your data quality automation for BI projects nonexistent or ineffective? Do you need it at all?
  2. Do you spend a lot of time on data quality issues, including troubleshooting and fixes of regressions?
  3. If you do have data quality automation in place, is it difficult to add a new report or dataset to it? Do you need to hire a dedicated developer or consultant to cover test automation for your reports?
  4. Do you use Power BI for visualizations only, and never leverage its data shaping capabilities when you work with enterprise data?

If you answered Yes to the questions above, then I have an audience for this series 🙂 You are welcome to connect here.

2 comments

  1. milhouse77bs

    This is a good idea. Testing data and reports is often left until customers or analysts notice a problem.
