Are You Ready for the Debate?
Today is the US Presidential debate. It is also a perfect opportunity to hone our analytical skills and answer some of the most intriguing questions of this unprecedented race. According to CNN (here) we should “expect Clinton vs. Trump to be an Olympic battle of wits”. Well, whether we will get wits or just plain good mudslinging, one thing is certain, we will all get the “Bread and Circuses”.
During this race, both campaigns used negative messaging against their opponent. Did you ever wonder who strikes harder, and how it affects the audience? For the sake of learning new techniques in Excel and Power BI, today we will try to answer the following questions by looking into the candidates’ Facebook posts:
- Who mention the opponent the most? What is the rate of opponent mentions?
- Does the opponent mention rate change over time?
- Do people share more posts when the opponent is mentioned?
- Do people share more posts that have negative sentiment?
- Does the sentiment is more negative in posts in which the opponent is mentioned?
The method
We will use Excel 2016 Get & Transform (aka Power Query in Excel 2013 and 2010), and import Donald Trump’s and Hillary Clinton’s Facebook posts during 2016 (a total of ~4000 posts), including the number of shares each post received. Then we will split the posts of each candidate into two groups: Posts with mentions of the opponent, and posts without such mentions. Finally, we will apply sentiment analysis on the posts using Microsoft Cognitive Services, and answer the questions above. To read previous articles on Sentiment Analysis in Power BI (and Excel) click here.
Sneak Peek on the Results
Before we start the tutorial, let’s jump straight to the results:
Opponent Mention Rate – Who mentions his opponent the most?
~19% of the candidates’ Facebook posts include mentions of the opponent’s name. While Trump’s opponent mention rate is higher (19.40%) than Clinton’s (18.95%), starting from July 2016, Clinton beats Trump in the rate of opponent mentions (Check out the September Rates, where Clinton’s rate is 100% more than Trump’s rate).
Candidate | Opponent Mentions / Posts | Opponent Mentions | Posts |
Clinton | 18.95 % | 388 | 2,048 |
Trump | 19.40 % | 381 | 1,964 |
Grand Total | 19.17 % | 769 | 4,012 |
Who is more negative in sentiment?
While Trump’s Facebook posts are slightly more positive in sentiment than Clinton’s posts (0.75 in comparison to 0.67), both posts are relatively positive (The highest positive sentiment score is 1). However, when both candidates mention the opponent’s name, their sentiment score decreases. Trump’s sentiment decreases to 0.56 and Clinton’s sentiment decreases to 0.57.
While both candidates are in a tie in sentiment when they mention each other, it is clear from the chart above that Trump’s positive sentiment significantly increases when he don’t mention Clinton.
Another interesting finding is the sentiment variance. Trump’s sentiment has higher fluctuations when he mentions Clinton, with a variance of 0.10, while Clinton maintain the same variance of 0.7 in sentiment, no matter if her opponent is mentioned or not.
People share more posts when the opponent is mentioned
Since Trump has a significantly higher number of Facebook fans (~10M fans while Clinton has only ~4M) it is trivial that more people share his posts. Nevertheless, it is amazing to see evidence that the fans from both camps are much more engaged when the opponent is mentioned. Clinton doubles up the number of her Facebook shares, when she mentions Trump (Her average shares rise from 3,021 to 6,527), and Trump increases his average Facebook shares from 9,177 to 14,701.
People share more posts whose sentiment is negative, and one camp is leading
From the previous chart, we can conclude that people are more engaged when they consume negative campaigns. The following chart shows that people are more engaged when they face negative sentiment messages. Clinton’s Shares increase by 17% when her posts have negative sentiment while Trump’s shares increase by 49% when his posts have negative sentiment.
Another interesting findings is that Clinton has the lowest average of Facebook shares when her posts are neutral in sentiment, while Trump’s shares grow linearly as the sentiment gets more and more negative.
Download the Excel file
Would you like to get the Excel file that imports Trump’s and Clinton’s Facebook posts, apply the sentiment analysis and includes the results above? Subscribe to DataChant to get the workbook, or follow the tutorial below.
[rad_rapidology_inline optin_id=optin_5]
The tutorial
We’ll start with Excel 2016 (You can follow the steps with Power BI Desktop or Excel 2010/2013 once Power Query Add-in is installed).
In Data tab, Get & Transform section, click New Query. Select From Online Services, and click From Facebook.
In the Facebook dialog, set donaldtrump as the Object ID, and click OK.
If this is the first time you use the Facebook connector, you will be prompted to provide your Facebook credentials to login and approve Excel/Power BI as a Facebook app. I will skip this step, as its flow is quite trivial.
Make sure you have the formula bar visible in the Query Editor (There is a check box in the View tab that will help you to show this bar). In the formula bar, change the version number from 2.2 to 2.7 and add the suffix that is highlighted below:
= Facebook.Graph("https://graph.facebook.com/v2.7/donaldtrump/posts?limit=100&fields=message,created_time,shares")
Delete column object_link.
Expand the column shares (by clicking the small icon in its header). Uncheck Use original column name as prefix, and click OK.
Change the type of column count to Whole Number by clicking the left icon in the header, and selecting Whole Number in the drop down menu.
Rename the column count to shares.
Change the type of column created_time to Date/Time by clicking the left icon in the header, and selecting Date/Time in the drop down menu.
Right click on the header of column message, and select Duplicate Column.
Rename the new column to text (you will find the new column as the last column to the right).
Right click on the column text, and select Transform in the drop down menu, then click lowercase.
Click on the filter button in the header of column message, and click Remove Empty.
In the Add Column tab, click Add Custom Column.
Set is opponent mentioned as New column name, and paste the following formula into the formula box.
= if Text.Contains([text], "clinton") or Text.Contains([text], "hillary") then 1 else 0
When you are done click OK.
As a result of the last step, on each row that contains a message that includes a mention of Clinton, you will find the number 1 in the column is opponent mentioned (check out the highlighted lines below). We will later use this number to count the number of opponent mentions.
Let’s add another column with the text “Trump” in all the cells.
In Add Column tab, click Add Custom Column, set candidate as New column name, and set the following formula in the formula bar.
= "Trump"
When you are done, click OK.
Now we will rename the query to Trump (To rename it, type the new name in the Query Settins pane, as shown in this screenshot:
It’s Clinton’s Turn
We will now duplicate the query Trump and create a query for Clinton. Open the Queries pane, right click on the query Trump and select Duplicate.
Rename the query to Clinton.
Go to the first step in APPLIED STEPS pane, and change the text in the formula bar from donaldtrump to hillaryclinton.
Find the first Add Custom step, and click on the cog wheel icon to open the Add Custom Column dialog and edit it.
= if Text.Contains([text], "trump") or Text.Contains([text], "donald") then 1 else 0
Since Clinton usually don’t mention Trump by his first name only, you can simplify the formula:
= if Text.Contains([text], "trump") then 1 else 0
When you are done click OK.
Now, edit the last step in query Clinton, and change the formula in the Add Custom Column from:
= "Trump"
to:
= "Clinton"
Appending the queries together
Go back to the Queries pane, right click on query Trump, and select Reference.
Change the new query to Posts.
In Home tab, click Append Queries.
Select Two tables in Append dialog, then select Clinton in the drop down menu, and click OK.
Convert the type of is opponent mentioned to Whole Number.
Click the drop down menu under the Close & Load icon, and select Close & Load To.
Select Only Create Connection in the Load To dialog, and click Load.
Sentiment Analysis
The next part will not be new to you if you read my previous blog posts on Sentiment Analysis, but I’ve never written it explicitly for Excel, so here are the next steps to apply Sentiment Analysis (To follow a more detailed tutorial go here).
The Sentiment Analysis is performed with Microsoft Cognitive Services API. You can get a new API key here.
In the Data tab, click New Query, From Other Sources and then Blank Query.
Paste the API key in the formula bar and rename the query to API_Key.
Create a new blank query (This time it is easier to do it from the Query Editor –> New Source –> Other Sources –> Blank Query).
Click Advanced Query in Home tab, and paste the following formula:
(Source as table) as any => let SelectColumns = Table.SelectColumns(Source,{"text", "id"}), ToRecords = Table.ToRecords(SelectColumns), JsonRecords = Text.FromBinary(Json.FromValue(ToRecords)), jsonRequest = "{""documents"": " & JsonRecords & "}", content = Text.ToBinary(jsonRequest, TextEncoding.Ascii), response = Web.Contents("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment?", [ Headers = [#"Ocp-Apim-Subscription-Key"= API_Key, #"Content-Type"="application/json", Accept="application/json"], Content=content ]), res = Json.Document(response,1252) in res
When you are done click Done.
Rename the new query to FnGetSentiment.
Create a new blank query and paste the following formula in the Advanced Editor:
let Source = Posts, #"Split Column by Position" = Table.SplitColumn(Source,"message",Splitter.SplitTextByPositions({0, 10000}, false),{"text.1", "text.2"}), #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Position",{{"text.1", type text}, {"text.2", type text}}), #"Removed Columns1" = Table.RemoveColumns(#"Changed Type1",{"text.2"}), #"Renamed Columns1" = Table.RenameColumns(#"Removed Columns1",{{"text.1", "text"}}), #"Added Index" = Table.AddIndexColumn(#"Renamed Columns1", "Index", 0, 1), #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each Number.IntegerDivide([Index], 1000)), #"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Index"}), #"Grouped Rows" = Table.Group(#"Removed Columns2", {"Custom"}, {{"Data", each _, type table}}), #"Removed Columns3" = Table.RemoveColumns(#"Grouped Rows",{"Custom"}), #"Added Custom1" = Table.AddColumn(#"Removed Columns3", "Custom", each FnGetSentiment([Data])), #"Removed Columns4" = Table.RemoveColumns(#"Added Custom1",{"Data"}), #"Expanded Custom" = Table.ExpandRecordColumn(#"Removed Columns4", "Custom", {"documents"}, {"Custom.documents"}), #"Expanded Custom.documents" = Table.ExpandListColumn(#"Expanded Custom", "Custom.documents"), #"Expanded Custom.documents1" = Table.ExpandRecordColumn(#"Expanded Custom.documents", "Custom.documents", {"score", "id"}, {"score", "id"}), #"Changed Type2" = Table.TransformColumnTypes(#"Expanded Custom.documents1",{{"score", type number}}) in #"Changed Type2"
When you are done, click Done, and rename to the new query to Sentiment.
We are almost done. We now have a new table with all the post IDs and the Sentiment Analysis score (between 0 = most negative, and 1 = most positive).
We can now load the Posts and Sentiment to Excel or to the Data Model in Excel and start performing the analysis.
You can subscribe below to DataChant to get the actual Excel workbook that I have built, or follow my previous Sentiment Analysis blog posts that explain the steps we have done to create the query Sentiment.
[rad_rapidology_inline optin_id=optin_5]
Few tips before we finish. Since there is a 1:1 relationship between Posts and the Sentiment, I recommend merging the two tables in Power Query before you build your measures. I may write a follow up with more details on the Power Pivot’s part.
I hope you enjoyed this post. We are now prepared for the debate 🙂
Pingback: The Power BI Version: Clinton vs Trump – Opponent-Mentions and Sentiment Analysis - DataChant