Iterate over dynamic Web API Pages with Power Query – How to resolve Cursor-Based Pagination

In today’s post, we will go over an advanced Power Query technique that will help you to import data from Web REST APIs that implement Cursor-Based Pagination. We will demonstrate this technique on Facebook Graph API v2.6, but you can apply it on any APIs with Cursor-Based Pagination.

If you analyze Facebook data, this is a Must Read blog post!
The technique that you are about to learn, helped me to create the Facebook Post Reactions to Trump and Clinton dashboard, and import 1000 posts for each Presidential candidate (and I could read more posts, of course). If you need to create an Excel or Power BI report which is based on Facebook data, and the out-of-the-shelf Facebook connector is not working properly due to bugs (e.g: here), this blog post is a must read for you. Without it, you will only be able to load 100 records from Facebook Graph API latest versions.

Before we start, here is a short intro on Web Pagination and a Cursor-Based Pagination.

Introduction to Web Pagination

Whenever you use a Web API to obtain large data set from a certain web service, you will usually get the responses in small pieces through paginated response. Each response will include a single page with limit number of records. To obtain the next page, you will usually need to provide the next page number or the offset as a parameter.

A nice and simple example for a REST API that uses static pagination, can be found here. The SWAPI API allows you to extract Star Wars fan data and specify a page number to get any of the paginated 10-records size responses.  If you are a Star Wars fan, and missed my earlier blogs here and here, you will find there a .pbix implementation of the static pagination. You can also read Matt Masson’s blog post here for a step by step tutorial of static pagination.

swapi.co is a Rest API for Star Wars fans. It implements a simple pagination.
swapi.co is a Rest API for Star Wars fans. It implements a simple pagination.
So what is a Cursor-Based Pagination, and why is it different than the static pagination?

According to Facebook (here): “Cursor-based pagination is the most efficient method of paging and should always be used where possible. A cursor refers to a random string of characters which marks a specific item in a list of data. … your app shouldn’t store any older cursors or assume that they will still be valid.”

Facebook's Cursor Based Pagination
Facebook’s Cursor Based Pagination
The tricky part of Cursor-Based Pagination is that you cannot know in advance which strings will be used to point you to the next pages. Each cursor is dynamically created when you get the specific page. As a result, you cannot create in advance a list of pages 1,2,3,4,..100 or a list of offsets, 0,10,20,30,40,…,100 to get all your data. Each page will return a randomly-generated cursor for the next page.

So how can we resolve Cursor-Based Pagination in Power Query?

To resolve this challenge in Power Query, you could consider recursions, to go over each page, then find its cursor and recursively drill down to the next page. A better solution for our challenge is to implement a for-loop using List.Generate. If this is the first time you stumble upon this function, you can read more about List.Generate in my previous blog posts here.

With List.Generate we can iterate over the pages. At each step of the iteration we will extract a page of data, keep the URL that includes the cursor for the next page, and store the results in the next item of the generated list. The following diagram illustrates the flow that is implemented with List.Generate.

Using List.Generate for Cursor-Based Pagination on Facebook
Using List.Generate for Cursor-Based Pagination on Facebook

The code

The following Power Query (M) expression demonstrates the extraction of 10 pages from Donald Trump’s facebook page (You will need to retrieve an access token from Facebook Graph API Explorer, and paste it below).

 

Wait, the code above was given as an example. There is an easier way to get Facebook posts with Power Query in-the-box Facebook connector. The example above is shared for the sole purpose of demonstration.

The compact version of the code

To reuse the M expression above on a different Facebook related API calls, here is a compact version of the code:

When you successfully apply this expression, you will get a list of lists, as shown in the screenshot below. In List Tools, Transform tab, click ToTable.

List.Generate returns a List of lists
List.Generate returns a List of lists

Expand Column1 twice, and select the relevant fields you wish to extract.
Screenshot_18

Finally, you can now see that you have multiple pages from Facebook (In the screenshot below you can see the 1000th row from Trump’s Facebook pages):
Screenshot_20

Let’s review the main function that we use inside List.Generate

The line that starts with Json.Document extracts the specific page with the input URL. The next line tries to access the field Source[data] which should contain the actual paginated data. If the response doesn’t contains the page due to an error (usually when there is no more data), we assign null to the key Data. The next line gets the URL for the next page including its cursor string. Again, if there is no further page, we will get an error, and assign a null value to the key Next. Finally, we return a record of [Data=…, Next=…]

Let’s review the List.Generate.

We initialize the List.Generate state with a record that contains a running index i=0 and a nested record res that holds the first record of FnGetOnePage. The next line defines the condition of the loop. We will continue generating the list as long as the index i is smaller than the value in iterations, and as long as the field Data of record res is not null (When it’s null, there are no further pages to retrieve). The next line defines the incremental step on the state of List.Generate. We increment the index by 1, and fetch the next record res that holds the new page in Data and the next URL with its cursor in Next. Finally, we define the next item in the list, which is embodied in the field Data of record res.

Conclusions

In conclusion, we have learnt how to implement Cursor-Based Pagination using List.Generate to extract large data sets from Web APIs. You can use this technique for a variety of Web APIs that don’t support the simple static approach of specifying page numbers or offsets in the API calls.

The most practical scenario for this technique that I can think of is reading Facebook page insights. In the current state of affairs, Facebook has a bug that prevents you from reading page insights with Power Query and its in-the-box Facebook connector.

To access a .pbix file with this tutorial, please subscribe to DataChant in the form below.

Do you have other scenarios where this technique can help you? Please share with us in the comments below.

4 comments

  1. Pingback: Iterate through API and extract fields in Power Query in Power BI

  2. Pingback: Excel Integration Part 1: Extract & Shape Marketo Data Using Power Query – Philippe Delle Case 's Blog

  3. Sandeep BhadauriyaReply

    Nice Article, It’s working for me.

    There is one issue in power bi Schedule Refresh not working for it. it’s give me following error.
    “You can’t schedule refresh for this dataset because one or more sources currently don’t support refresh.”

    Can you please suggest me where I am wrong..

    Regards,
    Sandeep

Leave a Reply