Use DAX and Slicers to Define Thresholds for Sentiment Analysis

Use DAX and Slicers to define thresholds for Negative & Positive Sentiment

Sentiment Analysis in Power BI – Part 3

This is the third part of the Sentiment Analysis series. In the first part we learnt how apply Sentiment Analysis in Power BI without any server side code, or help from IT or Data Scientists. In the second part we refined the solution to support large content beyond 1000 messages.

In today’s post, I would like to focus on a different important aspect of our Sentiment Analysis implementation. According to Microsoft here, the “Text Analytics service returns a score between 0 and 1 denoting overall sentiment in the input text. Scores close to 1 indicate positive sentiment, while scores close to 0 indicate negative sentiment.”

So how should we treat a score of 0.65, or a score of 0.35? Is a score of 0.65 close enough to 1 to be considered as a positive sentiment?  How should we translate the numerical values into textual values: Negative, Neutral and Positive?

This is our main topic for today. We will use DAX and Slicers to give the analyst control in deciding the questions above. Moreover, the techniques we’ll describe today can be generalized and applied on any report where you need to translate numerical values into textual buckets (e.g. Low, Medium, High).

Our tutorial will start with static thresholds that are defined on the query level, and at the last part we will focus on dynamic thresholds that can be changed from slicers, as seen here:

Subscribe to DataChant to get the Power BI Desktop file used in this tutorial.

Subscribe ToDataChant

Join our mailing list to receive exclusive content that was used in the preparation of this article.

Thank you for subscribing to DataChant. An email will be sent to you shortly with more details.

Static Thresholds & Queries

In this section we will focus on static thresholds that will help us to translate between the decimal number to the Positive, Neutral & Negative sentiment ranks. I use the term “static”, because we will hard-code the ranges of the different sentiment ranks.

For example, we will define the range between 0 to 0.3 to define Negative sentiment; 0.3 to 0.7 to define Neutral sentiment, and 0.7 to 1 to define Positive sentiment (Ignore the equal or greater-than nuances for now, we will handle it later in the DAX formulas).

Our first step in this tutorial is to create a static disconnected table in the Data Model.

Let’s Start 🙂

Open the Power BI Desktop file that you created in Part 1 or Part 2 of this series.

Click Enter Data in Power BI Desktop Home tab.

Enter Data button in Power Bi Desktop

Add the columns Rank, Min Score and Order with the following rows (as seen in the screenshot below). When you are done, name the table as Sentiment Types and click EditScreenshot_22

Change the type of Min Score to Decimal Number.

Screenshot_4 - Copy

If you are already subscribed to DataChant, and use the Power BI Desktop file that I shared, the query above is already defined in the file Sentiment Analysis Part 3.pbit.Screenshot_5

Next step is to create a connected table that will be used as a data source for the Chiclet Slicer. Create a new table with the following content. When you are done click Load (or OK, if you clicked Enter Data from the Query Editor).Screenshot_7

Name the query of the table above as Sentiment Images, and click Close & Apply to load the tables to the report.

DAX Measures & a Calculated Column

Add a measure in the table Sentiment Results to measure the Sentiment average score.

Tip: It sounds trivial, and perhaps unnecessary, as you may be used to drag and drop numerical columns to the Values pane and set them as a calculate average, but explicit measures will make your life easier and your DAX more reusable, so avoid using implicit ones (implicit = you drag and drop a column to Values which triggers an implicit measure by Power Pivot).

Screenshot_14

Next, let’s add a measure under Posts to count the number of Posts.

And now let’s focus on the interesting measures –

Select the table Sentiment Types  and add the following new measures:

Screenshot_8 - Copy

The measures above extract the thresholds 0.3 and 0.7 from the table Sentiment Types.

Next step is to create a Calculated Column (not a measure) in the table Sentiment Results. The new column will contain the sentiment rank.

Screenshot_10

The advantage of the calculated column is clear. We can perform a row level calculation and convert the decimal number in Sentiment Score into its textual rank, as we defined in the table above.

Screenshot_23

Important Note: Calculated Columns are updated during Refresh. The ranks we applied here, can be updated after you change the thresholds and refresh the Data Model. But calculated columns will not help us in the “dynamic” approach we will describe later on (We don’t want the user to click Refresh every time he changes the threshold).

Relationships & Disconnected Slicer

The next step is to create the correct relationship, and to avoid the wrong ones (A good tip for life, isn’t it?).

If you try to connect Sentiment Types into Sentiment Results, you will get the following error message:

Screenshot_24

So let’s keep this table disconnected, and instead, create a relationship between Sentiment Images and Sentiment Results.

Screenshot_11 - Copy

Now we can go back to the Report view, and define a Chiclet Slicer as seen in this screenshot.

Add a Chiclet Slicer to your report, Drag and Drop Rank from table Sentiment Images to Category, URL to Image, and Sentiment Avg from table Sentiment Results to Value.   Screenshot_13To ensure that our slicer is sorted correctly (and not by alphabetic order), go to Data view, select the column Rank, click Sort By Column and select Order.

Sort By Column

In the next step we will build a visual that take benefit of our static thresholds.

We will count the number of posts by Negative, Neutral and Positive sentiment.Screenshot_15

Add the Clustered Bar Chart. Drag and drop Candidate column from Candidates table to Legend, Total Posts from Posts to Value and Sentiment Rank to Axis.

Screenshot_16 - Copy

That’s  it. We were able to convert the numerical values between 0 to 1 into textual sentiment ranks, and apply new measures that take the ranks into account. While this approach is relatively straightforward, it has its limitations, and require “hard-coded” update to the thresholds table to change the definition of positive, neutral and negative sentiments. So let’s try a different approach.

Dynamic Thresholds

Wouldn’t it be awesome if our analyst can use slicers to define the numeric ranges for positive, neutral & negative sentiments?

We will now use DAX and disconnected slicers to dynamically define the thresholds for the three sentiment ranks. You can open this report on Power BI, change the thresholds to the right and see how they affect the results.

Subscribe to DataChant to get this Power Bi Desktop file
Subscribe to DataChant to get this Power Bi Desktop file

We will start by adding two new tables to feed the slicers above.

The first table defines the decimal number options for the threshold between Negative and Neutral sentiments. You can use a blank query with the following Power Query expression (M) statement:

The second table defines the decimal number options for the threshold between Neutral and Positive sentiment. You can use a blank query with the following Power Query expression (M) statement:

The next step is to create the following two measures in the table Sentiment Types:

The measure above reads the maximal number that is selected in the Negative Threshold slicer. If you select 0.2, 0.3 and 0.4. We will take 0.4 as the maximal value which should be used as the threshold between Negative & Neutral.

The measure above reads the minimal number that is selected in the Positive Threshold slicer. If you select 0.9, 0.8 and 0.7. We will take 0.7 as the minimal value which should be used as the threshold between Neutral & Positive.

Finally, let’s create three measures to count the number of Negative, Neutral & Positive posts:

We can now move to the visuals, and create two Chiclet Slicers for the Negative Threshold and Positive Threshold tables:

Screenshot_26

Now, we will create a 100% stacked column chart, with Candidate in Axis and the three new measures in Value.

Screenshot_27 Screenshot_28

And finally, two area charts with Date in Axis, Candidate in Legend and Positive Posts in Value of the first chart, and Negative Posts in the Value of the second chart.Screenshot_29

That’s it. I am sure you will find both the static and dynamic approaches useful in many scenarios where you need to translate numerical values into textual buckets.

I hope you enjoyed this tutorial.