A Power Query Mystery
I recently stumbled upon a mysterious Power Query behavior. A Power Query query was pulling data from an external Excel workbook that was hosted on a corporate Intranet. The data included exchange rate numbers which were updated on a daily basis. After the refresh, for some strange reason, the numbers in the local Excel file weren’t equal to the numbers in the external data source on the Intranet.
We started troubleshooting the issue by ensuring that Power Query connects to the same file on the Intranet, and that the correct table is requested. It was really strange. There were no refresh errors, and the imported data was in the right format. Just different numbers.
Keep reading to find the answer, or subscribe to DataChant to access the solution.
The data just seemed to be outdated
I suspected that somewhere in the corporate network, there is a cached and outdated version of the Excel file. Could it be that when Power Query connects to the Web, it gets an outdated cached response from web proxies? The following diagram highlights my suspicion.
Fortunately, in my earlier days, I was a developer of web proxies, and remebered few things that were related to HTTP, so I installed Fiddler in my computer and checked how Power Query sends its HTTP requests. To test it yourself install Fiddler and import any URL using the Web connector (e.g. in Excel 2016: Data –> New Query –> From Other Sources –> From Web).
In the following Fiddler screenshot, you can see the HTTP headers that are sent by Power Query (For the test, I used http://swapi.co/api/people, but any URL would yield the same Request Headers).
There is an important HTTP header that Power Query doesn’t use – Cache-Control. This header can be used by web clients to ensure that the response is not outdated. Read more about Cache-Control and caching in HTTP/1.1 here.
By not using Cache-Control with some specific cache-directives (that will be shown in a minute), Power Query/Power BI relies on the web server and any intermediate web proxies to do their job and provide the requested content. So if you use Power Query/Power BI in a corporate enviornment, which deploys web proxies that suffer from over-caching disorder, be prepared to get outdated data.
The solution is to include the following HTTP header:
Cache-Control: no-cache, no-store, must-revalidate
Note: If you have an old web proxy in your organization that doesn’t support HTTP/1.1, you can try using the header Pragma: no-cache instead of Cache-Control (read more here), but only if you are connected anonymously, or you may get the following error:
To fix existing queries that you have, open Query Editor and click Advanced Editor. Then locate in the query expression (M) any references to Web.Contents.
Source = Web.Contents( URL )
You can now add a second argument to Web.Contents. A record with the field Headers, whose value is a nested record with the field Cache-Control and the value no-cache, no-store, must-revalidate. Here is the modified code:
Source = Web.Contents( URL, [ Headers = [#"Cache-Control" = "no-cache, no-store, must-revalidate"] ] )
Adding headers from the UI is not always reliable
You can also try adding the Cache-Control header through the advanced option of the dialog From Web, as seen in the following screenshot. But after you use this user interface to add the header, make sure that the header is indeed inside the M expression. There is a bug (at least in my version of Excel) that ignore the headers you insert in the UI if the response is JSON.
When I used the Cache-Control header on a URL from StarWars API, Power Query detected the JSON response and created the following M expression, ignoring the headers I added in the UI.
let Source = Json.Document(Web.Contents("http://swapi.co/api/people")) in Source
I suspect that you will encounter the same bug, when you import files TXT, CSV, XML files from Web. Power Query will automatically convert the file by wrapping the Web.Contents function with another, and lose the HTTP header declarations.
So if your response is in JSON, and you are concerned that the response is outdated, fix the M expression according to the following example:
Source = Json.Document( Web.Contents(URL ) )
Source = Json.Document( Web.Contents(URL, [ Headers = [#"Cache-Control" = "no-cache, no-store, must-revalidate"] ] ) )
And here is the screenshot from my Fiddler, after applying the code change above.
If you are a DataChant subscriber, you will find the solution in the subscribers’ folder as “Bypass Web Proxies with Cache-Control Header.xlsx”
I hope you find this post useful. Share your thoughts in the comments below.