Web Scraping Power BI Custom Visuals from AppSource using Power Query

Last week, Microsoft moved Office Store to App Source. As a result, all of the custom visuals of Power BI are now available in the new site. As promised here, today I will share the main query (M code) that I used to web-scrap AppSource website. You can use the query below to build your own Power BI Custom Visuals catalog, or create a lovely alternative exploration of the custom visuals in a Power BI report like this one:

If you are a subscriber of DataChant, you can get the report above for free. You will find it under the name Custom Visuals AppSource Free.pbix in the subscribers folder (Folder link will be sent to you by email after you subscribe, and in the periodical email updates).

Subscribe ToDataChant

Join our mailing list to receive exclusive content that was used in the preparation of this article.

Thank you for subscribing to DataChant. An email will be sent to you shortly with more details.

The free report is missing the highlighted elements below (which can be obtained in the paid version – Learn more here).

Custom Visuals in Power BI

Both versions can be published to the Power BI service and support scheduled refresh. Here is the embedded version of the paid report that is auto-refreshed:

 

The Query

Here is the Power Query M expression you’ll need to web-scrap the Power BI Custom Visuals from AppSource. The code include some hidden gems that deserve a separate blog post (will be published soon). For example: You can use the List.Generate below to create a while (true) loop with a condition.

let
    FnLoadPageRecords = (pageNum)=>
    let
        webContent = Web.Contents(
            "https://appsource.microsoft.com/en-us/marketplace/apps?",
            [
                Query= [
                    product="power-bi-visuals",
                    page=Text.From(pageNum),
                    productgroup="PowerBI"
                ]
            ]
        ),

        Source = Table.FromColumns({Lines.FromBinary(webContent, null, null, 65001)}),
        #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "window.__INITIAL_STATE__ =")),
        #"Replaced Value" = Table.ReplaceValue(#"Filtered Rows"," <script type=""application/javascript"">window.__INITIAL_STATE__ = ","",Replacer.ReplaceText,{"Column1"}),
        Column1 = #"Replaced Value"{0}[Column1],
        #"Trimmed Text" = Text.Trim(Column1, ";"),
        #"Parsed JSON" = Json.Document(#"Trimmed Text"),
        services = #"Parsed JSON"[services],
        temporaryInitialRenderedTileExtraData = services[temporaryInitialRenderedTileExtraData],
        #"Converted to Table" = Record.ToTable(temporaryInitialRenderedTileExtraData),
        #"Expanded Value" = Table.ExpandRecordColumn(#"Converted to Table", "Value", {"entityId", "title", "publisher", "iconURL", "shortDescription"}, {"ID", "Title", "Publisher", "Image", "Description"}),
        #"Removed Columns" = Table.RemoveColumns(#"Expanded Value",{"Name"})
    in
        #"Removed Columns",

    Source = List.Generate(()=>[Index=1, Page=FnLoadPageRecords(Index)], each Table.RowCount(_[Page])>0, each [Index=_[Index]+1, Page=FnLoadPageRecords(_[Index]+1)]),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"Page"}, {"Page"}),
    #"Expanded Page" = Table.ExpandTableColumn(#"Expanded Column1", "Page", {"ID", "Title", "Publisher", "Image", "Description"}, {"ID", "Title", "Publisher", "Image", "Description"}),
    #"Added Custom" = Table.AddColumn(#"Expanded Page", "Link", each "https://appsource.microsoft.com/en-us/product/power-bi-visuals/" & [ID]),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "PBIVIZ", each "https://store.office.com/powerbiaddin?assetId=" & [ID])
in
    #"Added Custom1"

Coming next…

1- While (true) loops in Power Query.

2- Step by step walkthrough  – How I scraped AppSource.

3- Bulk download of the Custom Visuals pbiviz files from AppSource (Hint: Use the query above with the  VBA code I shared here).

What would you like to learn next? You can share in the comments below.

 

5 comments

  1. Frank Tonsen Reply

    Nice code, but some IDs are definitively duplicated. It seems that MS is permanently changing the order. Thus, some visuals appear on both pages. 🙁

    • Gil Raviv Post authorReply

      Hope this issue was resolved by Microsoft by now. Some Visuals appears randomly on both pages, instead of other Visuals. As a workaround, you can duplicate the queries and append the results. Then, remove the duplicates.

  2. Gowrishankar Reply

    Thanks for the article.I got an idea to web scrap google play store android apps. Could you tell me what are the things needs to be covered ?

    • Gil Raviv Post authorReply

      I would first check if you are not violating any user license agreement by scraping their site. Then I would evaluate how often they may change their website. If there are constant changes in format and UI, I would probably not even try it.

Leave a Reply