Beyond 1000 Messages

Today’s blog post will take you one step further with Sentiment Analysis in Power BI and will show you how easy it is to extend the solution beyond 1000 messages. Read and follow the steps of Part 1 before we proceed, or subscribe below to DataChant to receive the Power BI Template file that can be used as a starting point.

My recent article on Power BI Community blog (referred here as Part 1) demonstrated how easy it is to apply Microsoft Cognitive Services on messages, and perform Sentiment Analysis inside Power BI Desktop, and without any help from Data Scientists, or big data developers.

As Microsoft Cognitive service allows 1000 messages per API call, I decided to keep the last article short and simple to follow, and narrowed down our analysis to a single API call with the first 1000 messages.

So today we will learn how to handle thousands of messages and feed them to the Sentiment Analysis API in groups of 1000 messages. The great thing here is that we still do everything for free, as we are allowed to make 5000 API calls (5 million messages) per month.

Subscribe to DataChant and take the shortcut

If you didn’t follow all the steps in Part 1, don’t worry, you can take a shortcut and subscribe to my blog in the form above, and you’ll get the Power BI Template file of Part 1. When you get the template and open it, you will be asked to provide the Sentiment Analysis API key (You can learn in Part 1 how to get the API Key).

If you already followed Part 1, you should consider changing the sample data. Last time, we used an Excel workbook that had 1000 messages. This time you can use a larger data set. You can download it here, and change your query sources to this file.

Ready to Start?

Click Edit Queries in Power Bi Desktop (after you followed the steps in Part 1, or received the Power Bi Template, and entered the API key).

Click on the SentimentsResults query.

Screenshot_26

Delete all the steps starting from Kept First Rows. You can start by deleting the last step and proceed backwards till you delete Kept First Rows.

Screenshot_27

Click Add Index Column, in Add Column tab, and select From 0.

Screenshot_30

Divide & Conquer

Following the last step, we now have a running index. If we divide it by 1000 and get the integer part of the result, we will be able to get a running index of bulks of 1000 rows. First 1000 rows will receive a zero, second 1000 rows will receive a one, etc.

So let’s perform Integer-Divide on the index.

Select column Index and then in Transform tab, click Standard then select Integer-Divide.Screenshot_32

In the Integer-Divide dialog, set Value to 1000 and click OK.

Screenshot_33

You will now notice that the running index is transformed as expected. We have a zero index in the first 1000 rows,  a one in the second 1000 rows, a two in the third 1000 rows.Screenshot_34

So we divided. Now let’s move to the conquering part …

Select the column Index, and click Group By in Transform tab.

Screenshot_35

In the Group By dialog, make sure Index is selected in the first drop down menu. Type Data in New column name, select All Rows in Operation, and click OK.Screenshot_36

The last step, groups the original rows into smaller tables according to Index. You will now see in the preview pane three rows with the original Tables as cells of the column Data.

Make sure that the formula bar is visible, and notice that Power BI Desktop refers addresses each grouped table by using the function each _, which returns the result in the type of a table.

Screenshot_37

As you recall, in Part 1, we created the query function GetSentimentResults. As a reminder, here is the code that we used.

(Source as table) as any =>
let
    JsonRecords = Text.FromBinary(Json.FromValue(Source)),
    JsonRequest = "{""documents"": " & JsonRecords & "}",

    JsonContent = Text.ToBinary(JsonRequest, TextEncoding.Ascii),
    Response =
        Web.Contents("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment?",
            [
                Headers = [#"Ocp-Apim-Subscription-Key"= "[Paste your API key here]",
                           #"Content-Type"="application/json", Accept="application/json"],
                Content=JsonContent 
            ]),
    JsonResponse = Json.Document(Response,1252)
in
    JsonResponse

As you may notice, the function returns a record object, and not a table. Let’s use the function on each table that was grouped earlier.

We will change the formula:

= Table.Group(#"Integer-Divided Column", {"Index"}, {{"Data", each _, type table}})

To this new formula:

= Table.Group(#"Integer-Divided Column", {"Index"}, {{"Data", each GetSentimentResults(_), type record}})

And the changes are highlighted in this screenshot:

Screenshot_41

Now, lets expand the column Data by clicking on the highlighted expand icon. Ensure all columns are selected, mark off Use original column name as prefix, and click OK.

Screenshot_42

The preview pane will show you two new columns. Assuming we don’t get errors from the Sentiment Analysis service, you can ignore the next two steps. I am showing these steps to stress out how useful Power Query is to work and troubleshoot your data, even when the steps are only used temporarily.

Expand the column errors.

Screenshot_43

Make sure that all the errors are null. Here you may find error messages that will help you to refine your query.

Screenshot_44

After we confirmed that there are no errors in column errors, remove that column.

Note: In the future you can go again to this step and use the preview pane to find new errors. We could also duplicate the query at this stage to load errors into a separate table (But that will double the API calls, and you may not want to do it). There are also more advanced tricks to append the errors to the table, and avoid the multiple calls, but this is beyond our scope today.

Second Note: The Query Editor is quite “greedy” in the way it works to cache a preview for each transformation step. Each time you select a step or create a new one, Power Query may trigger new API calls to the service. This is true to all types of external data sources you use. But in our case, you may reach your free 5000 messages/month API calls sooner than expected.

Let’s continue. Few more steps and we are done.

After removing the column errors, and column Index, we can expand column documents.

Screenshot_45

Expand it again.

Screenshot_46

One last expand…

In the expand pane, make sure that all columns are selected and mark off Use original column name as prefix. Then click OK.

Screenshot_47

Now let’s change the type of column score to Decimal Number.

Screenshot_48

Finally, rename the columns to Sentiment Score and Post ID for better context and we can now click Close & Apply in Query Editor Home.

Screenshot_2

That’s it for today. Stay tuned for our next post we will drill down on the main visualizations we can create for the Sentiment Scores; We will create a cool Chiclet Slicer and measures for Positive, Neutral and Negative Posts, and will also apply a disconnected table to help us reaching our goal.

If you do choose to be a DataChant subscriber, you will get access to two Power BI Templates: The first template will be used as the starting point for today’s post. The second template will include today’s steps. You will also get all the cool visuals and DAX tricks that I will share next time, included in these templates.

Screenshot_3

Screenshot of the template you will receive as a subscriber