In today’s post, I am going to be associative a bit, and explore the latest Power BI Desktop along several topics that I’ve wanted to share. We will cover the best features of Power BI Desktop April update: Power BI Service Live Connection and Add Column by Example (also known as Column From Examples in the UI). Finally, we will use the Add Column by Example feature to extract links from web pages, so you can consider this blog post as a follow up of my Web Scraping post here.
Microsoft Data Insights Summit
Power BI Desktop April Update is out (learn more here). So, I had to try it. After the installation, you can find a link in the Welcome screen to register to Microsoft Data Insights Summit. The link is not new, but the News is that I am going to speak there 🙂
You can go to Microsoft Data Insights Summit – Sessions page here, and type “Gil Raviv” in the search box to find my two sessions, or find the sessions here and here if you have registered to the summit.
Session 1: The Ten Pitfalls of the Data Wrangler
Monday, June 12 2017, 2:10 pm – 3:00 pm
BI Pro / 300 LEVEL / Excel, Power BI / Data Prep and Modeling / Breakout
Attending the summit? Add to your Calendar here.
Data preparation has become so easy and potent in Excel and Power BI. The user experience of the Query Editor of Power BI and Excel (Power Query Add-In, or Get & Transform in Excel 2016) is extremely rewarding. It transforms crucial yet repetitive tasks into an exciting and even fun activity. Just as important, it’s a process which produces an automated pipeline for your ever-changing reporting requirements. Thanks to Power Query, the modern data wrangler can become a self-service ETL power house. As you become a super data wrangler, it’s time to be aware of the most common mistakes you do in Power Query, and how to avoid them.
(The session is based on the Pitfalls series here).
Tuesday, June 13, 10:30 pm – 11:20 pm
Business User / 200 LEVEL / Power BI / General Overview/Intro / Breakout
Attending the summit? Add to your Calendar here.
In this session, we will learn how to utilize Power BI to speed up a data innovation culture in your organization. We will adopt agile & innovation approaches to push self-service BI to new frontiers. With real life examples, we will demonstrate how to use Power BI for rapid prototyping to gain higher engagement from your stakeholders and shorter time to insights. This session combines high-level concepts with technical walkthroughs that help you to become the catalyst of data innovation in your organization.
Do you have a great story to share, how Power BI helped you to become data innovation catalyst? Contact me at [email protected], and I may share your story during the session.
Power BI Desktop April Update
Back to the Power BI Desktop update –
There are three brilliant features you should try: Power BI Service Live Connection, Quick Measures and Add Column by Example. For a complete feature summary go here.
To start using Power BI Service Live Connection and Quick Measures (these features are still in preview), open the latest Power BI Desktop (April update) and select Options and settings in File tab. Then select Options.
Click Preview features and check the boxes of Power BI Service Live Connection and Quick measures. When you are done click OK.
Quick measures preview
The Quick measures preview is a great tool to build basic measures in Power BI without knowing DAX. You can apply a rich variety of calculations like aggregation within category, filters and baselines, time intelligence and much more. The Quick measures preview is also a great starting point to learn DAX. The fact that you can now get the actual DAX formula, using the Quick measures UI, allows you to learn the formula, edit it, and reuse it by other measures. To learn about Quick measures, go here.
Power BI Service Live Connection
The next feature that I wanted to try is the Live Connection to Microsoft Power BI Service (learn more here).
The fact that we can now create a single dataset in Power BI and reuse it by many Power BI reports is simply mind blowing!!! There are very interesting collaboration scenarios which are now possible. Imagine you work in a team of specialists. The Power Query and Modeling experts can work on the Power BI files (pbix) with blank reports. They can now write their advanced queries and measures in these files, publish them to the relevant Power BI workspace, and allow the visualization experts to work on multiple Power BI report files that consume the live data directly from the published dataset. Isn’t that awesome?
You can now create multiple reports for the same dataset. Why do we need multiple reports for the same dataset? Here are several ideas:
- Per report, we can have different selection of filters by topic, site or division (when row level security is not applicable).
- We can create different levels of granularity in the different reports. Instead of using pages, we can dedicate individual report for Executive Summary, a second report for business stakeholders, and a deep-dive report for analysts or data troubleshooting.
- Version control: Use a new report to the same dataset whenever you release a major change in the report. Keep the old report to manage the transition of your users to the new report.
- Maturity levels: You can use multiple reports to reflect their maturity and quality. Do you want to get frequent and early feedback from your users? Start with a single dataset, and release a low-fidelity report that connects to that dataset. You can now periodically release new reports that connect to the same dataset – Each version has new visuals and better quality to fit your users’ needs.
- Development / Staging / Production – You can now manage the visualization layer in three development environments. The Development report is where you make the daily changes for the visuals. Fix bugs, improve visuals, or add new ones. When you are ready, you can create a new report from the Development report. Let’s call it a Staging report. Share this report with your key stakeholders and quality assurance testers. When you are ready, you can duplicate the Staging report and share it in your production environment.
Let’s check out the new feature:
Open the latest Power BI Desktop. Make sure that you activated the feature in Options –> Preview features. Click Get Data, select Online Services, click Power BI service and click Connect.
In the Power BI service dialog, search for the dataset you would like to connect to, select it, and click Load. Note: Since this is a live connection to the dataset, you cannot edit the query.
After you click the Load button, you will find your tables, fields and measures in the Fields pane.
As you have noticed, in the dialog above, we didn’t have an Edit button. Only Load. For the same reason, the Edit Queries option is disabled. If you wish to perform changes, you must change the queries in the original Power BI report, and re-publish it.
Like other live connections, you will notice that there is only the Report view in the left pane. There are no Data or Relationships views.
In the Modeling tab, all the commands are disabled, so don’t try adding new measures or columns. You cannot manage the roles of this report to apply row level security. These changes should be done in the original report that was used to publish the dataset.
In the next screen, I have created a sample report. All the data is fetched on demand through the live connection.
I can now publish the report to Power BI service.
Add Column by Example
As a data wrangler, the Add Column by Example feature is going to be your best friend! It is one of those rare features which is amazingly useful for both basic and expert users.
To learn about the new feature go here.
Let’s demonstrate the new feature in the context of web scraping, and learn how to extract links from tables on the Web. In the last blog post here, we learned how to scrap tables from the Web, by navigating through the HTML Document Children tags. But sometimes, you also want to extract the links for each element in the table. Unfortunately, you cannot extract the links by using the method I’ve shared. But let’s find out how we can extract links using the new Add Column by Example.
We will demonstrate the scraping of links from Microsoft MVP official website. Our goal is to extract the links of the MVP images and profile pages, and combine them with the names, country and award category which were extracted in the last post.
Start with the Power BI Report that you created in the last post. Click Get Data and select Other and Web. Then click Connect.
In From Web dialog paste the URL below and click OK.
In the Navigator dialog select Document, and click Edit.
Now is a tricky part… We would like to remove the Web.Page( … ) formula and keep the Web.Contents inner part. If you try to do it directly from the formula bar, the Query Editor will undo the change.
Click Advanced Editor, and remove “Web.Page(“ and its closing parenthesis “)” from the following formula:
Here is the modified formula:
Now, in the Query Editor preview, you will see the icon for mvp.microsoft.com. Don’t double click on it.
Right click on the icon, and select Text.
The Query Editor will handle the HTML file as a single column table, each line in the HTML file will be placed in a cell in the new table:
Now, let’s filter the lines we need. By viewing the HTML source in the browser, we can see that each MVP contains a link to his/her image. The links contain the string “/PublicProfile/”.
Click the filter control, select Text Filters and click Contains…
In Filter Rows, paste the text /PublicProfile/ and click OK.
Now, the Query Editor shows the img tags with our MVP photos.
Finally, we are going to start using the new feature.
In Add Column tab, click Column From Examples.
Enter the number that is highlighted below (7914) in the first cell of the right column, and press Enter. The Query Editor will magically extract the correct MVP ID from each row (I must say – This is a real fun).
In the black notification bar above, you can review the M expression, that was automatically generated for us. If you know M, this is where you can validate that a correct formula was generated, or learn new stuff. In our example: you can see that a new function Text.BetweenDelimiters was used (learn more about the new function here). As a new function in Power BI Desktop, if you try the same query in Excel you will get the following error:
We will need to wait for one of the next Office updates to include the new functionality including the Text.BetweenDelimiters function (Note: I tested the compatibility with Excel on Power Query version is 2.42.4611.241. The Power BI Desktop version that supports the new functionality is 2.45.4704.442).
So, make sure your queries are not required in Excel. They will not be refreshed.
Let’s continue the scraping. It’s fun. We can rename the new column as MVP ID, and create a new Column From Examples. This time, we will enter en-us in the first cell of Column2.
You can now see that Text.BetweenDelimiters is used again, to extract the locale from the URL. We can now click OK in the bar, and rename the new column as Locale.
Let’s create a third column from Examples. In the first cell we will type:
As a result, the Query Editor detects that we would like to concatenate the string “https://mvp.microsoft.com/” with Locale, the fixed string “/PublicProfile/Photo/” and MVP ID. We can click OK in the bar, to get the links for the MVPs’ photos.
What’s next? A quick reverse engineering on the MVP website, reveals that the links to the MVPs’ profile is “https://mvp.microsoft.com/en-us/PublicProfile/” followed by the MVP ID. It’s easy to create a new column with the relevant example.
We can delete the columns MVP ID and Locale, and rename the last two new columns as Image URL and Profile URL. Let’s also rename the new query as URLs.
Finally, to combine between the table of URLs and the MVPs query we have created in the last blog post, we can add Index column to the two queries.
Note: The MVPs query was created in the last blog post. We used the web connector to navigate the HTML tags and extract the MVPs‘ name, award category and country.
Don’t forget to add an Index to the MVPs query.
Back to the MVPs query, we can click Merge Queries in Home tab.
In Merge dialog, select the Index column of MVPs, select URLs as the second query to merge, and select the Index column of URLs. Then, click OK.
Finally, expand NewColumn, select Image URL and Profile URL, and click OK.
That’s it. We can now load the combined table of MVPs with their name, award category, country, profile link and photo link to Power BI.
Power BI April update is packed with pure goodness. The live connection to a dataset in Power BI service, opens many new scenarios for team collaboration and release methods. The Columns from Examples feature (or Add Column by Example) is simply amazing. We used it to scrap URLs from web pages, and found out that it may not migrate yet to your Excel Power Query.
Finally, if you are attending Microsoft Data Insights Summit, I am looking forward to meeting you, and hope you will attend my sessions. If you want to share your story, how you have become a catalyst for data innovation in your organization, please contact me.