Iterate over dynamic Web API Pages with Power Query – How to resolve Cursor-Based Pagination

In today’s post, we will go over an advanced Power Query technique that helps you import data from Web REST APIs that implement Cursor-Based Pagination. We will demonstrate the technique on Facebook Graph API v2.6, but you can apply it to any API with Cursor-Based Pagination.

If you analyze Facebook data, this is a must-read blog post!
The technique that you are about to learn helped me to create the Facebook Post Reactions to Trump and Clinton dashboard and import 1000 posts for each Presidential candidate (and I could read more posts, of course). If you need to create an Excel or Power BI report that is based on Facebook data, and the out-of-the-box Facebook connector is not working properly due to bugs (e.g. here), this blog post is a must-read for you. Without it, you will only be able to load 100 records from the latest versions of the Facebook Graph API.

Before we start, here is a short intro to Web Pagination and Cursor-Based Pagination.

Introduction to Web Pagination

Whenever you use a Web API to obtain a large data set from a certain web service, you will usually get the results in small pieces through paginated responses. Each response includes a single page with a limited number of records. To obtain the next page, you usually need to provide the next page number or the offset as a parameter.

A nice and simple example of a REST API that uses static pagination can be found here. The SWAPI API allows you to extract Star Wars fan data and specify a page number to get any of the paginated 10-record responses. If you are a Star Wars fan and missed my earlier blogs here and here, you will find there a .pbix implementation of the static pagination. You can also read Matt Masson’s blog post here for a step-by-step tutorial on static pagination.
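To illustrate how simple the static case is, here is a minimal sketch (the SWAPI URL pattern and the page range are assumptions for illustration): since the page numbers are known in advance, a plain List.Transform over them is enough.

```
// A sketch of static pagination: page numbers are known up front,
// so we can simply map a fetch function over them.
let
    GetPage = (page as number) =>
        Json.Document(
            Web.Contents("https://swapi.co/api/people/?page=" & Number.ToText(page))),
    // Fetch pages 1..9 (the page range is an assumption for this example)
    Pages = List.Transform({1..9}, GetPage)
in
    Pages
```

Cursor-based APIs break exactly this approach, because the list {1..9} cannot be known before the calls are made.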

swapi.co is a Rest API for Star Wars fans. It implements a simple pagination.

So what is Cursor-Based Pagination, and how is it different from static pagination?

According to Facebook (here): “Cursor-based pagination is the most efficient method of paging and should always be used where possible. A cursor refers to a random string of characters which marks a specific item in a list of data. … your app shouldn’t store any older cursors or assume that they will still be valid.”

Facebook's Cursor Based Pagination

The tricky part of Cursor-Based Pagination is that you cannot know in advance which strings will be used to point you to the next pages. Each cursor is dynamically created when you get the specific page. As a result, you cannot create in advance a list of pages 1,2,3,…,100 or a list of offsets 0,10,20,30,…,100 to get all your data. Each page returns a randomly-generated cursor for the next page.
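For reference, a Graph API page response has roughly the following shape (the field values here are made up; note how the next URL embeds the cursor for the following page):

```
{
  "data": [ { "message": "...", "created_time": "...", "id": "..." } ],
  "paging": {
    "cursors": { "before": "MTAxNT...", "after": "NDMyNzQ..." },
    "next": "https://graph.facebook.com/v2.6/.../posts?limit=100&after=NDMyNzQ..."
  }
}
```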

So how can we resolve Cursor-Based Pagination in Power Query?

To resolve this challenge in Power Query, you could consider recursion: go over each page, find its cursor, and recursively drill down to the next page. A better solution for our challenge is to implement a for-loop using List.Generate. If this is the first time you have stumbled upon this function, you can read more about List.Generate in my previous blog posts here.
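For contrast, the recursive approach would look roughly like this sketch (it assumes the FnGetOnePage function defined later in this post; @ is M’s way of referring to the function being defined). It works, but List.Generate keeps the iteration bounded and easier to reason about.

```
// A recursive sketch (assumes FnGetOnePage from the code below):
// drill down from each page to the next until there is no Next URL.
FnGetAllPages = (url) as list =>
    let
        page = FnGetOnePage(url),
        rest = if page[Next] <> null then @FnGetAllPages(page[Next]) else {}
    in
        if page[Data] = null then {} else {page[Data]} & rest
```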

With List.Generate we can iterate over the pages. At each step of the iteration we will extract a page of data, keep the URL that includes the cursor for the next page, and store the results in the next item of the generated list. The following diagram illustrates the flow that is implemented with List.Generate.

Using List.Generate for Cursor-Based Pagination on Facebook

The code

The following Power Query (M) expression demonstrates the extraction of 10 pages of posts from a Facebook page (set the fbPage variable in the code; you will also need to retrieve an access token from the Facebook Graph API Explorer and paste it below).

let
 fbPage = "MicrosoftBI",   // You can type your facebook page here
 iterations = 10,          // Number of iterations
 url =
  "https://graph.facebook.com/v2.8/" &
  fbPage &
  "/posts?limit=100&fields=message,created_time" &
  "&access_token=[Your access token here]",
 
// FnGetOnePage is the function that imports a single page.
// The page is returned as a record with the data and the next-page URL
// in the fields Data and Next. Other Web APIs hold the data and cursor
// in different formats, but the principle is the same.
 FnGetOnePage =
  (url) as record =>
   let
    Source = Json.Document(Web.Contents(url)),
    data = try Source[data] otherwise null,
    next = try Source[paging][next] otherwise null,
    res = [Data=data, Next=next]
   in
    res,
 
 GeneratedList =
  List.Generate(
   ()=>[i=0, res = FnGetOnePage(url)],
   each [i]<iterations and [res][Data]<>null,
   each [i=[i]+1, res = FnGetOnePage([res][Next])],
   each [res][Data]),
 
 #"Converted to Table" = Table.FromList(GeneratedList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
 #"Expanded Column2" = Table.ExpandListColumn(#"Converted to Table", "Column1"),
 #"Expanded Column3" = Table.ExpandRecordColumn(#"Expanded Column2", "Column1", {"message", "created_time", "id"}, {"message", "created_time", "id"})
in
 #"Expanded Column3"

 

Wait – the code above was given only as an example. There is an easier way to get Facebook posts: Power Query’s built-in Facebook connector. The example above is shared for the sole purpose of demonstrating the technique.

The compact version of the code

To reuse the M expression above in different Facebook-related API calls, here is a compact version of the code:

let
 iterations = 10,          // Number of iterations
 url = 
  "https://graph.facebook.com/v2.6/...", // your Facebook URL goes here; don't forget the access token
 
 FnGetOnePage =
  (url) as record =>
   let
    Source = Json.Document(Web.Contents(url)),
    data = try Source[data] otherwise null,
    next = try Source[paging][next] otherwise null,
    res = [Data=data, Next=next]
   in
    res,
 
 GeneratedList =
  List.Generate(
   ()=>[i=0, res = FnGetOnePage(url)],
   each [i]<iterations and [res][Data]<>null,
   each [i=[i]+1, res = FnGetOnePage([res][Next])],
   each [res][Data])
 in
  GeneratedList

When you successfully apply this expression, you will get a list of lists, as shown in the screenshot below. In List Tools, on the Transform tab, click To Table.

List.Generate returns a List of lists

Click OK in the following dialog.

Expand Column1 twice, and select the relevant fields you wish to extract.

Finally, you can now see that you have multiple pages from Facebook (in the screenshot below you can see the 1000th row from Trump’s Facebook page):

Let’s review the main function that we use inside List.Generate

 FnGetOnePage =
  (url) as record =>
   let
    Source = Json.Document(Web.Contents(url)),
    data = try Source[data] otherwise null,
    next = try Source[paging][next] otherwise null,
    res = [Data=data, Next=next]
   in
    res

The line that starts with Json.Document extracts the specific page from the input URL. The next line tries to access the field Source[data], which should contain the actual paginated data. If the response doesn’t contain the page due to an error (usually when there is no more data), we assign null to the field Data. The next line gets the URL for the next page, including its cursor string. Again, if there is no further page, we will get an error and assign a null value to the field Next. Finally, we return a record of the form [Data=…, Next=…].
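Some APIs return a bare cursor token instead of a ready-to-use next URL (the Airtable discussion in the comments below is one example). The same record shape still works; you just rebuild the URL yourself. Here is a sketch, where the field names items and next_cursor are hypothetical:

```
// Sketch for an API that returns a cursor token rather than a full URL.
// The field names "items" and "next_cursor" are hypothetical.
FnGetOnePage = (url) as record =>
    let
        Source = Json.Document(Web.Contents(url)),
        data = try Source[items] otherwise null,
        cursor = try Source[next_cursor] otherwise null,
        // Only build a Next URL when a cursor was actually returned.
        // In practice, append the cursor to the base URL (not to the
        // previous page's URL) to avoid stacking cursor parameters.
        next = if cursor = null then null else url & "&cursor=" & cursor,
        res = [Data = data, Next = next]
    in
        res
```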

Let’s review the List.Generate call.

GeneratedList =
 List.Generate(
  ()=>[i=0, res = FnGetOnePage(url)],
  each [i]<iterations and [res][Data]<>null,
  each [i=[i]+1, res = FnGetOnePage([res][Next])],
  each [res][Data]
 )

We initialize the List.Generate state with a record that contains a running index i=0 and a nested record res that holds the first result of FnGetOnePage. The next line defines the condition of the loop: we continue generating the list as long as the index i is smaller than the value in iterations, and as long as the field Data of record res is not null (when it’s null, there are no further pages to retrieve). The next line defines the incremental step on the state of List.Generate: we increment the index by 1 and fetch the next record res, which holds the new page in Data and the next URL with its cursor in Next. Finally, we define the next item in the generated list, which is the field Data of record res.
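The same four-argument pattern, stripped of the web calls, can be seen in this toy example: initial state, loop condition, state transition, and selector.

```
// Toy List.Generate: doubles x five times.
// Arguments: initial state, condition, next state, selector.
List.Generate(
    () => [i = 0, x = 1],             // initial state
    each [i] < 5,                     // keep going while i < 5
    each [i = [i] + 1, x = [x] * 2],  // next state (bracket lookups read the old state)
    each [x]                          // item emitted into the list
)
// returns {1, 2, 4, 8, 16}
```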

Conclusions

In conclusion, we have learnt how to implement Cursor-Based Pagination using List.Generate to extract large data sets from Web APIs. You can use this technique for a variety of Web APIs that don’t support the simple static approach of specifying page numbers or offsets in the API calls.

The most practical scenario for this technique that I can think of is reading Facebook page insights. In the current state of affairs, Facebook has a bug that prevents you from reading page insights with Power Query and its in-the-box Facebook connector.

To access a .pbix file with this tutorial, please subscribe to DataChant in the form below.

Do you have other scenarios where this technique can help you? Please share with us in the comments below.

25 comments

  1. Pingback: Iterate through API and extract fields in Power Query in Power BI

  2. Pingback: Excel Integration Part 1: Extract & Shape Marketo Data Using Power Query – Philippe Delle Case 's Blog

  3. Sandeep Bhadauriya

    Nice Article, It’s working for me.

    There is one issue in power bi Schedule Refresh not working for it. it’s give me following error.
    “You can’t schedule refresh for this dataset because one or more sources currently don’t support refresh.”

    Can you please suggest me where I am wrong..

    Regards,
    Sandeep

  4. Anonymous

    There is one improvement that should be made to your List.Generate(…) setup. Once you have reached your iteration limit, you are still fetching one more page from the server prior to deciding not to keep it. This is because the logic flow is generate entry -> test condition -> select entry. A simple solution is to test your iteration requirement during the generate entry step, then return null instead of calling the server. You can then add a requirement to your condition to only keep non-null entries.
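    A sketch of this suggestion, reusing the article’s FnGetOnePage and iterations (the iteration check moves into the next-state step, so no page is fetched past the limit):

```
// Move the iteration check into the next-state step: once the limit is
// reached, res becomes null instead of fetching another page.
GeneratedList =
    List.Generate(
        () => [i = 0, res = FnGetOnePage(url)],
        each [res] <> null and [res][Data] <> null,
        each [
            i = [i] + 1,
            res = if [i] + 1 < iterations and [res][Next] <> null
                  then FnGetOnePage([res][Next])
                  else null
        ],
        each [res][Data])
```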

  5. Satish Rajput

    HI

    I am trying to do the same in Microsoft Graph API but using you query i am getting 1 page duplicated by 10 time. Ita not getting next page data. How i can fix that for graph API?

    • Gil Raviv (Post author)

      I have some issues with the syntax highlighter WordPress plugin. Turned it off for now. You can now see the code.

  6. Al

    Just found this. Huge for me. I’m trying to get data from a source that passes just the “offset” value back. Meaning, the next URL would have to append “&offset=” & offset.

    I’m new to Power Query, so I made the assuming that changing this code might work:
    res = [Data=data, Next=url & "&offset=" & next]

    Ultimately, I’m getting a blank list for queries and am curious what I might need to tweak to make this work? Appreciate any thoughts…

      • Al

        Product is called Airtable. It’s a private key, so I can’t share access to the API. The code is pretty much just like yours (minus the one line I changed). See it below.

        The end of the JSON contains an offset key that I’m supposed to pass into the next query as “&offset=”. See end of JSON here:

        ],
        offset: "itrmbC9o5iuJU8A7K/reclFVnXSgYsbPzda"
        }

        let
        baseUrl = "https://api.airtable.com/v0",
        baseId = "",
        tableName = "Initiatives",
        apiKey = "",
        iterations = 10,
        url = baseUrl & "/" & baseId & "/" & tableName & "/?api_key=" & apiKey,

        FnGetOnePage =
        (url) as record =>
        let
        Source = Json.Document(Web.Contents(url)),
        data = try Source[data] otherwise null,
        next = try Source[paging][next] otherwise null,
        res = [Data=data, Next=url & "&offset=" & next]
        in
        res,

        GeneratedList =
        List.Generate(
        ()=>[i=0, res = FnGetOnePage(url)],
        each [i]<iterations and [res][Data]<>null,
        each [i=[i]+1, res = FnGetOnePage([res][Next])],
        each [res][Data])
        in
        GeneratedList

        Anything else that would be helpful? Really appreciate this…

      • Gil Raviv (Post author)

        Thank you for sharing. Can you show the full Json response? Does it have a "data", "paging" and an underlying "next" inside paging?

      • Gil Raviv (Post author)

        In your let statement, I think you should make these changes:
        data = try Source[records] otherwise null,
        next = try Source[offset] otherwise null,

        instead of:
        data = try Source[data] otherwise null,
        next = try Source[paging][next] otherwise null,

        The reason – You don’t have a data and paging->next attributes in your JSON. This is why you get nulls.

      • Al

        Here’s a dump of some test data.
        ====

        {
        records: [
        {
        id: "rec2bfDUkglNP4A9B",
        fields: {
        Autonumber: 74,
        Notes: "test"
        },
        createdTime: "2019-03-14T16:09:03.000Z"
        },
        {
        id: "recjGNHQ6X74X1lqP",
        fields: {
        Autonumber: 125,
        Notes: "100"
        },
        createdTime: "2019-03-14T16:09:15.000Z"
        },
        {
        id: "recjOBnTWw7a2GAB3",
        fields: {
        Autonumber: 11,
        Notes: "paging"
        },
        createdTime: "2019-03-14T16:08:34.000Z"
        }
        ],
        offset: "itrkzHHPvpOPBVjrA/recjOBnTWw7a2GAB3"
        }

  7. Al

    Hey there, having a hard time replying to our thread above for some reason…so had to start this one.

    We’re closer! I’m getting the first 100 rows in a list, but not the rest. Here’s the code section with our changes:

    FnGetOnePage =
    (url) as record =>
    let
    Source = Json.Document(Web.Contents(url)),
    data = try Source[records] otherwise null,
    next = try Source[offset] otherwise null,
     res = [Data=data, Next=url & "&offset=" & next]
    in
    res,

    Really, really, really appreciate all the help here.

    • Gil Raviv (Post author)

      Hi Al
      Please send me more details by email. The comments are a challenging media at this stage.

  8. Al

    Ooo…take that back. It’s working now. Not sure what was wrong the first time. THANK-YOU! Anything I can do in particular to “pay you back” for your time?

  9. Rob van Zutphen

    Hi Gil,

    Thank you so much for this article. It helped me out big time!
    After using your code to obtain all the paginated results from a webapi I have pretty much finished my report. And so I ran into the next issue: the data refresh in the Power BI Service. I have read and tried Chris Webb’s advice (https://blog.crossjoin.co.uk/2016/08/23/web-contents-m-functions-and-dataset-refresh-errors-in-power-bi/), but when I changed the code to meet his workaround, the parameter is no longer dynamic, and therefore I get x times the same page……

    Here is the code I’m using:

    let
    url = "https://auvikapi.eu2.my.auvik.com/v1/inventory/device/detail?tenants=xxxxxxxxx",
    Source = Json.Document(Web.Contents(url)),
    iterations = Source[meta][totalPages],

    FnGetOnePage = (url) as record =>
    let
    Source = Json.Document(Web.Contents(url)),
    data = try Source[data] otherwise null,
    next = try Source[links][next] otherwise null,
    res = [Data=data, Next=next]
    in
    res,

    GeneratedList =
    List.Generate(
    ()=>[i=0, res = FnGetOnePage(url)],
    each [i]<iterations and [res][Data]<>null,
    each [i=[i]+1, res = FnGetOnePage([res][Next])],
    each [res][Data]),
    #"Converted to Table" = Table.FromList(GeneratedList, Splitter.SplitByNothing(), null, null, ExtraValues.Error)
    in
    #"Converted to Table"

    This is pretty much a 100% copy of your code, with one exception and that is that I get the number of pages, so that I can limit the number of iterations to this number. Other than that, it’s all your code.
    Now my altered code is as follows (and I actually only changed the Source under FnGetOnePage):

    let
    url = "https://auvikapi.eu2.my.auvik.com/v1/inventory/device/detail?tenants=xxxxxxxxx",
    Source = Json.Document(Web.Contents(url)),
    iterations = Source[meta][totalPages],

    FnGetOnePage = (url) as record =>
    let
    Source = Json.Document(Web.Contents(
    "https://auvikapi.eu2.my.auvik.com",
    [
    RelativePath="v1/inventory/device/info",
    Query=
    [
    tenants="391134359548238589"
    ]
    ]
    )),
    data = try Source[data] otherwise null,
    next = try Source[links][next] otherwise null,
    res = [Data=data, Next=next]
    in
    res,

    GeneratedList =
    List.Generate(
    ()=>[i=0, res = FnGetOnePage(url)],
    each [i]<iterations and [res][Data]<>null,
    each [i=[i]+1, res = FnGetOnePage([res][Next])],
    each [res][Data]),
    #"Converted to Table" = Table.FromList(GeneratedList, Splitter.SplitByNothing(), null, null, ExtraValues.Error)
    in
    #"Converted to Table"

    By changing the code this way I can now schedule a refresh in the Power BI Service. However, it gives me the same results for each page as the parameter isn’t used any more….
    I have been sitting on this issue since this morning and it’s now the middle of the night.. so I’m desperate for your help. How can I keep the parameter for the function while keeping the first part of the url static for the scheduled refresh?

    Really hope you can find the time to help me out

    Many thanks in advance,

    Regards,

    Rob van Zutphen

  10. Chip Umsted

    Hi – I love this. I have been using on the desktop version of Powerbi to pull data from Workplace by Facebook and decided to move to a Power BI Dataflow but am getting an error – “Can’t Save dataflow. One or more entities references as dynamic data source.” when I try to save the dataflow entity. Has anyone gotten this to work on a dataflow? Here is my code.

    let
    iterations = 3, // Number of iterations
    url =
    "https://graph.facebook.com/v3.3/XXXXX/members/?fields=skills{id,name},email&limit=50&access_token=[accesstoken]",
    FnGetOnePage =
    (url) as record =>
    let
    Source = Json.Document(Web.Contents(url)),
    data = try Source[data] otherwise null,
    next = try Source[paging][next] otherwise null,
    res = [Data=data, Next=next]
    in
    res,

    GeneratedList =
    List.Generate(
    ()=>[i=0, res = FnGetOnePage(url)],
    each [i]<iterations and [res][Data]<>null,
    each [i=[i]+1, res = FnGetOnePage([res][Next])],
    each [res][Data]),

    #"Converted to Table" = Table.FromList(GeneratedList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column2" = Table.ExpandListColumn(#"Converted to Table", "Column1"),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Expanded Column2", "Column1", {"email", "id", "skills"}, {"Column1.email", "Column1.id", "Column1.skills"}),
    #"Expanded Column1.posts" = Table.ExpandRecordColumn(#"Expanded Column1", "Column1.skills", {"data"}, {"Column1.skills.data"}),
    #"Expanded Column1.posts.data" = Table.ExpandListColumn(#"Expanded Column1.posts", "Column1.skills.data"),
    #"Expanded Column1.posts.data1" = Table.ExpandRecordColumn(#"Expanded Column1.posts.data", "Column1.skills.data", {"id", "name"}, {"Column1.skills.data.id", "Column1.skills.data.name"}),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Column1.posts.data1",{{"Column1.email", "Email"}, {"Column1.id", "WPID"}, {"Column1.skills.data.id", "SkillID"}, {"Column1.skills.data.name", "SkillName"}})
    in
    #"Renamed Columns"
