Beyond 1000 Messages
Today’s blog post will take you one step further with Sentiment Analysis in Power BI and will show you how easy it is to extend the solution beyond 1000 messages. Read and follow the steps of Part 1 before we proceed, or subscribe below to DataChant to receive the Power BI Template file that can be used as a starting point.
My recent article on Power BI Community blog (referred here as Part 1) demonstrated how easy it is to apply Microsoft Cognitive Services on messages, and perform Sentiment Analysis inside Power BI Desktop, and without any help from Data Scientists, or big data developers.
As Microsoft Cognitive service allows 1000 messages per API call, I decided to keep the last article short and simple to follow, and narrowed down our analysis to a single API call with the first 1000 messages.
So today we will learn how to handle thousands of messages and feed them to the Sentiment Analysis API in groups of 1000 messages. The great thing here is that we still do everything for free, as we are allowed to make 5000 API calls (5 million messages) per month.
Subscribe to DataChant and take the shortcut
[rad_rapidology_inline optin_id=”optin_5″]
If you didn’t follow all the steps in Part 1, don’t worry, you can take a shortcut and subscribe to my blog in the form above, and you’ll get the Power BI Template file of Part 1. When you get the template and open it, you will be asked to provide the Sentiment Analysis API key (You can learn in Part 1 how to get the API Key).
If you already followed Part 1, you should consider changing the sample data. Last time, we used an Excel workbook that had 1000 messages. This time you can use a larger data set. You can download it here, and change your query sources to this file.
Ready to Start?
Click Edit Queries in Power Bi Desktop (after you followed the steps in Part 1, or received the Power Bi Template, and entered the API key).
Click on the SentimentsResults query.
Delete all the steps starting from Kept First Rows. You can start by deleting the last step and proceed backwards till you delete Kept First Rows.
Click Add Index Column, in Add Column tab, and select From 0.
Divide & Conquer
Following the last step, we now have a running index. If we divide it by 1000 and get the integer part of the result, we will be able to get a running index of bulks of 1000 rows. First 1000 rows will receive a zero, second 1000 rows will receive a one, etc.
So let’s perform Integer-Divide on the index.
Select column Index and then in Transform tab, click Standard then select Integer-Divide.
In the Integer-Divide dialog, set Value to 1000 and click OK.
You will now notice that the running index is transformed as expected. We have a zero index in the first 1000 rows, a one in the second 1000 rows, a two in the third 1000 rows.
So we divided. Now let’s move to the conquering part …
Select the column Index, and click Group By in Transform tab.
In the Group By dialog, make sure Index is selected in the first drop down menu. Type Data in New column name, select All Rows in Operation, and click OK.
The last step, groups the original rows into smaller tables according to Index. You will now see in the preview pane three rows with the original Tables as cells of the column Data.
Make sure that the formula bar is visible, and notice that Power BI Desktop refers addresses each grouped table by using the function each _, which returns the result in the type of a table.
As you recall, in Part 1, we created the query function GetSentimentResults. As a reminder, here is the code that we used.
(Source as table) as any => let JsonRecords = Text.FromBinary(Json.FromValue(Source)), JsonRequest = "{""documents"": " & JsonRecords & "}", JsonContent = Text.ToBinary(JsonRequest, TextEncoding.Ascii), Response = Web.Contents("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment?", [ Headers = [#"Ocp-Apim-Subscription-Key"= "[Paste your API key here]", #"Content-Type"="application/json", Accept="application/json"], Content=JsonContent ]), JsonResponse = Json.Document(Response,1252) in JsonResponse
As you may notice, the function returns a record object, and not a table. Let’s use the function on each table that was grouped earlier.
We will change the formula:
= Table.Group(#"Integer-Divided Column", {"Index"}, {{"Data", each _, type table}})
To this new formula:
= Table.Group(#"Integer-Divided Column", {"Index"}, {{"Data", each GetSentimentResults(_), type record}})
And the changes are highlighted in this screenshot:
Now, lets expand the column Data by clicking on the highlighted expand icon. Ensure all columns are selected, mark off Use original column name as prefix, and click OK.
The preview pane will show you two new columns. Assuming we don’t get errors from the Sentiment Analysis service, you can ignore the next two steps. I am showing these steps to stress out how useful Power Query is to work and troubleshoot your data, even when the steps are only used temporarily.
Expand the column errors.
Make sure that all the errors are null. Here you may find error messages that will help you to refine your query.
After we confirmed that there are no errors in column errors, remove that column.
Note: In the future you can go again to this step and use the preview pane to find new errors. We could also duplicate the query at this stage to load errors into a separate table (But that will double the API calls, and you may not want to do it). There are also more advanced tricks to append the errors to the table, and avoid the multiple calls, but this is beyond our scope today.
Second Note: The Query Editor is quite “greedy” in the way it works to cache a preview for each transformation step. Each time you select a step or create a new one, Power Query may trigger new API calls to the service. This is true to all types of external data sources you use. But in our case, you may reach your free 5000 messages/month API calls sooner than expected.
Let’s continue. Few more steps and we are done.
After removing the column errors, and column Index, we can expand column documents.
Expand it again.
One last expand…
In the expand pane, make sure that all columns are selected and mark off Use original column name as prefix. Then click OK.
Now let’s change the type of column score to Decimal Number.
Finally, rename the columns to Sentiment Score and Post ID for better context and we can now click Close & Apply in Query Editor Home.
That’s it for today. Stay tuned for our next post we will drill down on the main visualizations we can create for the Sentiment Scores; We will create a cool Chiclet Slicer and measures for Positive, Neutral and Negative Posts, and will also apply a disconnected table to help us reaching our goal.
If you do choose to be a DataChant subscriber, you will get access to two Power BI Templates: The first template will be used as the starting point for today’s post. The second template will include today’s steps. You will also get all the cool visuals and DAX tricks that I will share next time, included in these templates.
Hey Gil,
After I group the table after I change the formula some of my records in the data column have errors. When I expand the data column to get the error column I get an error that says
“An error occurred in the ‘’ query. DataSource.Error: Web.Contents failed to get contents from ‘https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment’ (400): Bad Request
Details:
DataSourceKind=Web
DataSourcePath=https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment
Url=https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment”.
My data set isn’t huge it is just under 5000 rows. Is this a problem on my end? How do I go about fixing this?
Thank you,
Jordan
Also I tried doing the sentiment analysis with a parameter and a blank query like you did for the keyword extraction and kept bombing out when I would hit close and apply. Is it possible to do it for the sentiment?
Hi Jordan,
To prevent the first error, please ensure that column “id” has unique values (trim, lowercase and remove duplicates to ensure it). You should also remove empty values in column “text”. Hope it will help.
Regarding your second question, please elaborate. The keyword extraction and Sentiment Analysis are quite similar.
Pingback: Updates for Power BI Desktop and service and more... - Roundup #56 | Guy in a Cube
What’s wrong with the following message when I open the pbix file? How can I fix it?
CALENDARAUTO function can not find a valid date in base columns of DateTime type in the model.
Make sure that you have a date column with a date type in your main fact table.
1. I opened the file “Sentiment Analysis Sample (Handling thousands of messages).pbix”, all the charts in the report view were pending fix due to CalendarTable malfunction – I guessed.
2. I went back to data view then found except the column heads none of the contents of all tables were visible.
3. I deleted all the tables then run Get Data from “Facebook Reactions Data Sample.xlsx” but that excel file only have 998 posts – unable to do Integer-Divide by 1000.
4. I tried to use a larger data set as you guided but the file was not downloadable even I clicked the link several times.
Were there something I did wrong? Please advise. Thanks.
Hi Julian,
Could you download the file again? You can find the link (second resource) in this page: https://datachant.com/subscriber-corner/subscribers-resources-facebook-sentiment-analysis-samples/
After you paste the API key and click Load. Please ignore the errors, and click Refresh again. This should resolve the issue.
When clicking Refresh I was asked to Access web content with credentials check.
Please select Anonymous.
Yes, I did – it’s the first option but I got the error message saying “Access to the resource is forbidden”.
I am sorry for your experience. I have tried to reproduce this error, but couldn’t. Can you clear the permissions of the relevant data sources, and try to refresh again?
Clearing the permissions didn’t work either. However, in order not to bother you too much, I think I’d better let it go for now. Anyway, thank you very much for the great support.
I am sorry it doesn’t work for you. The last recommendation I can think of is to retry again from scratch, and try refreshing from the Query Editor instead of the main dialog. I had similar experiences in the past when working with templates. Usually after few attempts it was resolved.
I get this error when I Generate the sentiment analysis in the moment to invoke the function. “Formula. Firewall: Query “All Candidates(2) (Step ‘Invoke Custom Function’) references other queries or steps, so it may not directly access a data sorce. Please rebuild this data combination”.
Hi Marco. Please set Power BI to ignore privacy levels. Learn how here: https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-privacy-levels/
HI Gil
I have been through the instructions and even watching your YouTube video on “Facebook & Sentiment Analysis By Gil Raviv,” I can’t get this to work past the step after grouping into 1000 and then applying the invoke custom function I get errors:
An error occurred in the ‘GetSentimentResults’ query. DataSource.Error: Web.Contents failed to get contents from ‘https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment’ (400): Bad Request
Details:
DataSourceKind=Web
DataSourcePath=https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment
Url=https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment
I have cleaned, trimmed lower case every possible option and it still occurs!
It seems it is not only me having an issue with this. Its there a problem with Power BI? If not how is this fixed?
Cheer
Steven
Did you remove empty values and remove duplicate IDs?
WOW, I must have done something because it worked! It must have been empty values, as I removed duplicate. Thanks for the tip !!
Thanks very much for this useful post.
When I change the formula after grouping rows to add in GetSentimentResults I get the following error message:
“Formula.Firewall: Query ‘Sentiment results’ (step ‘Grouped Rows’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.”
Any ideas what went wrong?
Thank you Letitia,
The solution is to set Privacy in File -> Options to ignore privacy levels. Then click Refresh.
Hi Gil, thanks for this post.
When I try to change the formula after grouping rows into a table based on the index I get the following error:
“Formula.Firewall: Query ‘Sentiment results’ (step ‘Grouped Rows’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.”
Do you have any advice?
Thanks,
Hi again, I removed a set of brackets { } which solved the formula firewall error (none of the privacy settings worked for me), but now I’m unable to authenticate credentials (set to anonymous)
Thank you Letitia, I suggest you return the brackets. Not sure which brackets you refer to, but these are probably needed.
The solution to your previous issue is to set Privacy in File -> Options to ignore privacy levels. Then click Refresh.
Hi I get the following error after entering the formula:
An error occurred in the ‘’ query. DataSource.Error: Web.Contents failed to get contents from ‘https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment’ (413): Request Entity Too Large
Details:
DataSourceKind=Web
DataSourcePath=https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment
Url=https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment
Any tips for how to fix? Thanks
This error may occur when the dataset is beyond 1000 text messages. If you follow the steps including the grouping, you should not get his error.
I am following all the steps but all the results are errors. If anyone has a chance to look into my files, the PBIX and datasource are located here:
https://drive.google.com/file/d/0B1feyxv3WPLBRDBRMUx3SE1YRTg/view?usp=sharing
https://drive.google.com/file/d/0B1feyxv3WPLBM0phRXBEeEluT2c/view?usp=sharing
Hi Tim,
I was able to use your report and get the sentiment results for the first 3500 comments. Did you use the APIkey in GentSentimentResults? I suspect that your key is invalid, or that you have used all your quota.
It finally worked after changing credentials to anonymous. I think its critical to do that. Thanks.
Works like a charm. What I normally do is to slice the integer to 500 to see problematic rows. I also make it a point that I do a keep top values so no ‘null’ happens at the end of the dataset.
Hello, this post is amazing, thank you. One question on a production report I created there are new comments that need to be sent to Azure for processing. My question is does every time I hit refresh on the report and power query goes through the steps, am I sending the same old comments plus new ones to be analyzed and so am I reusing API calls on the same messages every time the report refreshes?
Or does Azure know that messages with the same ID column have been processed before and therefore they don’t get analyzed again? Hope this question makes sense..
Thank you for your posts
Thank you Juan. I am glad that you find this article helpful. You are correct. Every refresh will apply the API on all the dataset. Incremental refresh may be supported by Power BI in the future (As part of Power BI Premium). So hopefully, we will be able to implement your scenario.
Thank you Gil.
To make this a viable solution for large datasets, we not only need incremental data loads but also queries should run only once and not twice when refreshing a table.
Since there is no isolation of queries, API calls are repeated when we have dependencies. For example, if I first use language detection on the text and reference those results to do sentiment analysis, the calls will happen again.
Thanks it Works
Is there a maximum character length for each message which is sent for analysis?
I’m getting this error:
An error occurred in the ‘’ query. DataSource.Error: Web.Contents failed to get contents from ‘https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment’ (413): Request Entity Too Large
Details:
DataSourceKind=Web
DataSourcePath=https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment
Url=https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment
BUT, my groupings are fine (no group has more than 1000 messages in it). I’m analysing product reviews so some messages are quite long.
I think that the maximal length is 10240 characters, but last time I checked you should get a valid response to the 1000 messages, and only the specific long messages will be omitted from the results (and included in the errors field).
So after running some tests on my data set, it appears that the query fails on my dataset when more than 600 characters are sent in a single message. This seems quite limiting for key phase extraction and sentiment analysis. The documentation states that messages can be up to 10kb in size which seems to be ~10,000 characters. Any ideas why there is such a gulf in terms of what the documentation states and what’s working in Power BI?
I’ve found a solution – it must have been throwing the 413 error based on the TOTAL size of the 1000 records which were being sent for analysis. I changed the grouping to 100 instead of 1000 and it worked perfectly. Just posting this up in case it helps others with the same issue. In my example each comment was around 2-3kb so the 1000 record stack was too large (over the 1mb limit).
How to pass more than 1 parameter, to the corrected = Table.Group(#”Inserted Integer-Division”, {“Integer-Division”}, {{“Data”, each Stocks(_), type table}})
Expression.Error: 1 arguments were passed to a function which expects 2.
Details:
Pattern=
Arguments=List
Hi Pato, Not sure I understand. Do you want to group by more columns than just the Integer Division? You can use the UI Group dialog box to do it.
Hi Gil, I have a function that pases 2 parameters. In your example you have only 1. So how will you modify that step = Table.Group(#”Inserted Integer-Division”, {“Integer-Division”}, {{“Data”, each Stocks(_), type record}})
I can send you my book, if you want an extra twist to your post. 🙂
Another question, how to follow up this thread? Can I get a message, when someone replies or add comments?
hello, i just picked up sentiment analysis in power bi and i’ve been following your instruction on how its done but everytime i get to changing the fx formular and hitting enter i get an error, “we couldn’t authenticate with the credentials provided.please try again. i’ve tried multiple times but i’m stuck. i even changed privacy setting at a point but im still stuck. please help.
also to clarify my access web content is on anonymous but it still doesn’t authenticate it. im just stuck there, ive checked my privacy, everything seems good. pls help
Hi Tutu
Please confirm that your Text Analytics resource is still active (perhaps you reached your quota limit). Also ensure you use the right datacenter in the URL of the sentiment analysis M function.
hello, my text analytics resources is still active and i checked the M function multiple times, its using the right data center in the url
Try resetting the data source, and refresh. If it doesn’t work, I recommend you contact Microsoft support.
Hi, I managed to make it work with PowerBI Desktop, but when I publish the report to the PowerBI Web Service, I can’t authenticate the text analytics endpoints for my dataset to refresh (using anonymous, considering that the actual authentication -key- is done in the code)…does that means text analytics only refreshes the dataset in Desktop but not in the web service? I keep getting a “404:resource not found” on the web service, but the endpoint works perfectly in Desktop.
You are right. These techniques are currently not refreshable on the service.
Hi there, just subscribed to your blog today and i am yet to receive details on where i can download the files from. Could you please send me a copy. Thanks
Hi Satianv, Thank you for subscribing. You should receive an email with the details. Check your spam folder as well. If you cannot find it, please contact me at gilra@datachant.com
Hello, Gil. Thank you very much for this blog. It looks useful in solving a problem I have been facing for a while now. However, I am stuck at a certain point in your tutorial.
I am trying to develop a stock portfolio that will track a customised list of stocks. I will confess I am not an M or Power Query expert, though I have some experience with Power BI.
Long story short, here’s the error I get when I replace the text after creating a group on the index:
Table.Group(#”Renamed Columns”, {“Index”}, {{“Data”, each getPriceMetrics(_), type record}})
Error:
An error occurred in the ‘getPriceMetrics’ query. Expression.Error: We cannot apply operator & to types Text and Table.
Details:
Operator=&
Left=https://finnhub.io/api/v1/stock/metric?metric=price&token=XXX&symbol=
Right=[Table]
The ‘getPriceMetrics’ is a function that I am invoked using a parameter called ‘Sym’:
(Sym as any) => let
Source = Json.Document(Web.Contents(“https://finnhub.io/api/v1/stock/metric?metric=price&token=bpcur3frh5rfp0uqilj0&symbol=” & Sym))
in
Source
The parameter ‘Sym’ is based on a “Query” that contains a list of stock symbols.
I would appreciate it if you can offer any insight in how to supply a list argument to the function call when using the ‘each’ keyword. Thanks again for the wonderful post!
Hi Sikander,
When you apply the Table.Group function it returns a table object with the subset of the grouped rows by the matching criteria. You feed the getPriceMetrics function with that table instead of a single symbol. You would need to add a custom column for each row with a symbol and apply the function on the symbol.
Thank you for your reply Gil. For now, I have solved this by adding in a “3-second wait duration” using Function.InvokeAfter as in:
Table.AddColumn(#”Changed Type”, “PriceMetrics”,
each Function.InvokeAfter(()=>getPriceMetrics([Symbol]),#duration(0,0,0,3)))
I will also try what you have suggested and let you know how it goes!