How to Enforce Power Query/Power BI to Import Uncached Data from the Web

A Power Query Mystery

I recently stumbled upon a mysterious Power Query behavior. A Power Query query was pulling data from an external Excel workbook that was hosted on a corporate Intranet. The data included exchange rate numbers which were updated on a daily basis. After the refresh, for some strange reason, the numbers in the local Excel file weren’t equal to the numbers in the external data source on the Intranet.

We started troubleshooting the issue by ensuring that Power Query connects to the same file on the Intranet, and that the correct table is requested.  It was really strange. There were no refresh errors, and the imported data was in the right format. Just different numbers.

Keep reading to find the answer, or subscribe to DataChant to access the solution.

The data just seemed to be outdated

[rad_rapidology_inline optin_id=optin_5]
I suspected that somewhere in the corporate network, there is a cached and outdated version of the Excel file. Could it be that when Power Query connects to the Web, it gets an outdated cached response from web proxies? The following diagram highlights my suspicion.

Power Query/Power BI imports cached data from the Web, the results may be outdated

Fortunately, in my earlier days, I was a developer of web proxies, and remebered few things that were related to HTTP, so I installed Fiddler in my computer and checked how Power Query sends its HTTP requests. To test it yourself install Fiddler and import any URL using the Web connector (e.g. in Excel 2016: Data –> New Query –> From Other Sources –> From Web).

In the following Fiddler screenshot, you can see the HTTP headers that are sent by Power Query (For the test, I used http://swapi.co/api/people, but any URL would yield the same Request Headers).

Power Query doesn't use HTTP header Cache-Control and as a result outdated data from web proxies can be imported.
Power Query doesn’t use the HTTP header Cache-Control and as a result outdated data from web proxies can be imported.

There is an important HTTP header that Power Query doesn’t use – Cache-Control. This header can be used by web clients to ensure that the response is not outdated. Read more about Cache-Control and caching in HTTP/1.1 here.

By not using Cache-Control with some specific cache-directives (that will be shown in a minute), Power Query/Power BI relies on the web server and any intermediate web proxies to do their job and provide the requested content. So if you use Power Query/Power BI in a corporate enviornment, which deploys web proxies that suffer from over-caching disorder, be prepared to get outdated data.

The solution is to include the following HTTP header:

Cache-Control: no-cache, no-store, must-revalidate

Note: If you have an old web proxy in your organization that doesn’t support HTTP/1.1, you can try using the header Pragma: no-cache instead of Cache-Control (read more here), but only if you are connected anonymously, or you may get the following error:

Use Pragma: no-cache to bypass HTTP/1.0 web proxies and get the latest data by Power Query/Power BI. Pragma will only work on anonymous requests.

To fix existing queries that you have, open Query Editor and click Advanced Editor. Then locate in the query expression (M) any references to Web.Contents.

Source =
    Web.Contents(
        URL
    )

You can now add a second argument to Web.Contents. A record with the field Headers, whose value is a nested record with the field Cache-Control and the value no-cache, no-store, must-revalidate. Here is the modified code:

Source =
    Web.Contents(
        URL,
        [
            Headers = [#"Cache-Control" = "no-cache, no-store, must-revalidate"]
        ]
    )

Adding headers from the UI is not always reliable

You can also try adding the Cache-Control header through the advanced option of the dialog From Web, as seen in the following screenshot. But after you use this user interface to add the header, make sure that the header is indeed inside the M expression. There is a bug (at least in my version of Excel) that ignore the headers you insert in the UI if the response is JSON.

You can use the advanced From Web dialog in Power Query to declare Cache-Control header and bypass web proxies outdated content. But check the advanced editor. There are some bugs.
You can use the advanced From Web dialog in Power Query to declare Cache-Control header and bypass web proxies outdated content. But check the advanced editor. There are some bugs.

When I used the Cache-Control header on a URL from StarWars API, Power Query detected the JSON response and created the following M expression, ignoring the headers I added in the UI.

let
    Source = Json.Document(Web.Contents("http://swapi.co/api/people"))
in
    Source

I suspect that you will encounter the same bug, when you import files TXT, CSV, XML files from Web. Power Query will  automatically convert the file by wrapping the Web.Contents function with another, and lose the HTTP header declarations.

So if your response is in JSON, and you are concerned that the response is outdated, fix the M expression according to the following example:

Before:

Source =
    Json.Document(
        Web.Contents(URL
    )
)

After:

Source =
    Json.Document(
        Web.Contents(URL,
            [
                Headers = [#"Cache-Control" = "no-cache, no-store, must-revalidate"]
            ]
        )
    )

And here is the screenshot from my Fiddler, after applying the code change above.

Adding Cache-Control header in M to ensure that the web response is not outdated and bypass web proxies by Power Query/Power BI
Adding Cache-Control header in M to ensure that the web response is not outdated and bypass web proxies by Power Query/Power BI

If you are a DataChant subscriber, you will find the solution in the subscribers’ folder as “Bypass Web Proxies with Cache-Control Header.xlsx

Subscribe to DataChant to get the solution that bypasses web proxies and prevent having outdated data from the web using Power Query

I hope you find this post useful. Share your thoughts in the comments below.

 

 

 

8 comments

  1. Adam Reply

    Thanks for this. I have been suffering too long with cached JSON files. Interestingly though I was unable to use Fiddler as Excel 2019 was detecting it and flagging as insecure so it wouldn’t proceed. The M expression worked a treat though.

  2. Syed Ahmad Reply

    Nice detailed post. I found it because I ran into this issue. My Excel power query stopped showing updated data even though there’s no error message. When I look at the website, it’s been updated but query doesn’t update. I cleared my browser cache, excel document cache, rebooted, etc. Deleted the connection and table and tried to import fresh. When I start setting up the new connection and look at Web View, even that shows two day old data! I followed your steps and I see the headers correctly in the advanced editor but I still only receive the same two day old data. Don’t know how to fix this. Any other ideas?

  3. Syed Ahmad Reply

    here’s my query in the advanced editor after making the changes you suggested:

    let
    Source = Web.Page(Web.Contents(“https://coronavirus.health.ny.gov/county-county-breakdown-positive-cases”, [Headers=[CacheControl=”no-cache, no-store, must-revalidate”]])),
    Data0 = Source{0}[Data],
    #”Changed Type” = Table.TransformColumnTypes(Data0,{{“County”, type text}, {“Positive Cases”, Int64.Type}})
    in
    #”Changed Type”

    • Gil Raviv Post authorReply

      Please fix the CacheControl element in your code to:
      #”Cache-Control”=”no-cache, no-store, must-revalidate”

      Here is the entire expression:
      let
      Source = Web.Page(Web.Contents(“https://coronavirus.health.ny.gov/county-county-breakdown-positive-cases”, [Headers=[#”Cache-Control”=”no-cache, no-store, must-revalidate”]])),
      Data0 = Source{0}[Data],
      #”Changed Type” = Table.TransformColumnTypes(Data0,{{“County”, type text}, {“Positive Cases”, Int64.Type}})
      in
      #”Changed Type”

  4. Antonio Reply

    Hi,

    Using web connector to get a CSV file is caching the content. Data is not refreshed. I have ensured cache headers are present into my query and still not refreshing. I tried to clear PoweirBI cache through Advanced options also and still not getting refreshed data…

    Any idea?

    Regards

Leave a Reply