Presidential Word Cloud in Power BI – Part 2

As promised, today we will continue working on our Presidential Word Cloud that we have started here, and will focus on two important and useful techniques which are not only relevant for Facebook analysis in Power BI:

  1. We will learn how to extract words from textual columns without splitting the column to many columns.
  2. By splitting messages to single words, our Word Cloud visual in Power BI will work better, we will be able to resize it, and drill down from a single word to its original message.

Oh, and the relation to Ghostbusters will be revealed at the end.

To recap our current challenge – In our last tutorial we have created a Word Cloud which is not resizable.

Screenshot_1

To fix this limitation, we will need to create a new column in our query for the individual words, and feed this column to the Word Cloud visual.

Let’s start.

Right click on column message and click Duplicate Column.

Screenshot_2

Rename the new column to word, right click on the header, and select Transform –> lowercase. This step will lowercase all the messages and treat words like “They” and “they” as the same word in the visualization.Screenshot_3

Now let’s replace all line feeds with spaces. This step will help us to address text such as this one:

Hello world!

How are you?

And we will convert it to:

Hello world! How are you?

Right click on the column word and select Replace Values…Screenshot_27

In the Replace Values dialog, check the box Replace using special characters, and select Line Feed.Screenshot_28

Notice that the text #(lf) was added in the first edit box. #(lf) is the special characters that represent line feed in Power Query. You can use it in the formula bar whenever you want to perform string manipulations that includes line feed.

Enter space character in the Replace With, and click OK.Screenshot_29

Next step is a bit more advanced. It will allow you to split the column word into separate words, without getting into the risky Column operation of Splitting a single column into multiple ones (I explained the risk of splitting columns here).

We start by creating a list of delimiters.

Click Enter Data.

Screenshot_4

In Create Table dialog, enter delimiter as a header, and then on each cell enter different delimiters such as: space, comma, semicolon, colon etc.

Name the table as Delimiters, and click OK.Screenshot_5

Right click on the column delimiter and click Drill Down.Screenshot_9

You will see that your table was transformed into a list. We will use this list in the next steps.

Screenshot_10

Note: If you are an advanced Power Query user, you may prefer using a query such as this one, instead of manually entering the delimiters into a table. Then it may be easier for you to manipulate the list when you need to add new delimiters.

={” “, “,”, “.”, “?”, “!”}

But for basic users, it is easier to manage your delimiters through the Create Table UI as we have done above. You can go back to that list, and modify it by clicking on the cog wheel icon (highlighted below) in the Source step.

Screenshot_26

Now, let’s go back to Query1, right click on the column header word, and select Transform –> lowercase again. We perform this step as a foundation for the split operation with the list of delimiters we have just created. This step has no other significance (Who will perform double lowercase operations on the same data, right?)Screenshot_7

Now, mark the highlighted text in the formula bar and replace Text.Lower with:

Splitter.SplitTextByAnyDelimiter(Delimiters)

Screenshot_8

Here is a screenshot of the modified formula:Screenshot_11

And here is where the magic happens… Power Query transforms the column word from a text into a list. There is no intermediate step that will split your column into multiple ones.

Now, let’s click on the expand button which is highlighted below: Screenshot_12

After the expand step, you can see that each word in the message gets a dedicated row with its message.

Let’s filer out numbers, words that starts with numbers and any string with less then 3 characters.

Click on the filter icon in the column header word. Uncheck the values null and (blank), and click OK.Screenshot_14

In the formula bar add the following formula to the filtering condition to keep words with more than 2 characters:

and Text.Length([word]) > 2

Screenshot_16

Now, let’s add a new filter by clicking again in the filter icon, selecting Text Filters and clicking Does Not Equal… (or any other option).

Screenshot_18

In the Row Filters dialog enter aaa as values, and click OK. This step is an intermediate step. You should not try to figure out its intrinsic logic.Screenshot_19

Now let’s delete the section [word] = “aaa” in the formula and change it to

Value.FromText(Text.At([word], 0)) is number

Screenshot_20

This step select all the rows whose text in the column word starts with a number.

In the next step we will filter out these rows by negating the logic in the formula.Screenshot_21

Negate the condition in the formula by adding not (…) to the condition above.

not (Value.FromText(Text.At([word], 0)) is number)

Here is the finished formula:Screenshot_22

Now you are ready to go. Click Home –> Close & Apply, and rebuild the Word Cloud by associating the field word to Category. 

Screenshot_23

Don’t forget to activate the Stop Words and Defaults in the Visualizations pane as shown below (This step will allow the Word Cloud visual to ignore common words like “the”, “to” and “for”):

Screenshot_24

Now, at last, you can resize your Word Cloud visual 🙂

 

Here are few bonus refinements:

If you prefer to Capitalize each word in your Word Cloud, you can go back to Query1 and apply the following step:

Screenshot_30.png

You may notice that the Word Cloud visual ignore special characters like $. As a result, you will see in the Word Cloud some numbers, even though we filtered out words that started with numbers. For example, the word $100 will be represented as 100 in the report. You can add the $ character to the list of Delimiters to avoid these words. I described above how to modify the list of delimiters.

Finally, you may want to drill down from a single word to the original message. To do it, you can add a new visual like the Multi-Row Card, and assign the field message to the Fields box.

Screenshot_31

Now whenever you click on a single word in the Word Cloud, you will see the original message.Screenshot_32

At this stage you might be wondering why the word Ghostbusters was highlighted in comparison to other words that were used on Donald Trump’s page.

We forgot to define how the Word Cloud should count the word repetition. To do it, simply frag and drop the field word to the Values box, and you are ready to go.

Screenshot_33.png

Have a  happy Word-busting 🙂

 

8 comments

  1. grovelliReply

    I’ve just noticed that you add the Word Count visual (based on the instructions on part 1) but when you close and reopen Power BI desktop it disappears from the visualizations list

    • DataChantReply

      If you open the same .pbix file, the visual will remain. It is only when you open Power BI Desktop on a new or other files, you need to add the custom visual again. This is not an ideal experience, I agree.

  2. grovelliReply

    Interesting, I don’t remember whether Power BI asked me to save at the end of the procedure outlined in part 1 but I don’t think it did because I only have 1 pbix file saved on my pc and it’s not for this project

  3. Pingback: Facebook Post Reactions to Trump and Clinton in Power BI - DataChant

  4. AnonymousReply

    Great! Is there any way where we can highlight the words in the multi card control(as we select from word Cloud)? For Ex: If I select “America” word from word cloud, then its related rows from multi card will show me “America” highlighted!

  5. Gil Raviv Post authorReply

    Unfortunately, you cannot have the highlight effect on the table or matrix visual.

Leave a Reply