Job Market Research
As I was planning to start job hunting again, I had the idea of gathering LinkedIn data. My goal was to have company data I could sort, so I could prioritize which companies to contact, and also to do some analysis.
1. Prioritization factors
1.1. Employee count
One of the prioritization factors is employee count. The problem with this variable depends on how it is counted:
- Count based on owner reporting:
Anyone can report whatever they want. Here is an example:
Learn With Sunny is a company reporting 10,000+ employees, with only 14 followers and a non-functioning website.
https://www.linkedin.com/company/learn-with-sunny
- Count based on users:
It looks more reliable, but there are two reasons why it might be inaccurate:
- Not all employees have a LinkedIn account
- Any user can claim to be working with a company
1.2. Follower count
Follower count is probably the most reliable variable in the LinkedIn data I collected.
Below is a sunburst plot that shows how the market is split across multiple dimensions.
To build it, I had to convert the follower count from a number into categories.
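A minimal sketch of that binning step in Python. The bin edges and labels below are hypothetical; the article doesn't say which thresholds were used for the plot.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical thresholds: each edge is the first value of a new category.
EDGES = [100, 1_000, 10_000, 100_000]
LABELS = ["0-99", "100-999", "1,000-9,999", "10,000-99,999", "100,000+"]


def follower_category(count: int) -> str:
    """Map a raw follower count to a category label."""
    # bisect_right returns how many edges are <= count, which is the label index.
    return LABELS[bisect_right(EDGES, count)]


if __name__ == "__main__":
    counts = [14, 250, 4_800, 4_900, 1_200_000]
    print(Counter(follower_category(c) for c in counts))
```

Fixed thresholds on a roughly logarithmic scale keep the categories readable despite the huge spread in follower counts; quantile-based bins would be the alternative if each category should hold a similar number of companies.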
1.3. Enhancing LinkedIn Data with WHOIS
Being the way I am, I wondered whether I could get each company's creation date, combine it with the follower count into a metric, and then spot the companies that gained the most followers in a short amount of time.
Sadly, that data isn't available on LinkedIn, at least not in the free version; I'm not sure about Premium.
But, lucky me, I know about WHOIS, so I wrote a script that gathers the creation date of each company's domain. With it, I was able to make this plot.
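The script itself isn't shown, so here is a minimal sketch of how such a lookup could work with only the Python standard library. It assumes the raw WHOIS protocol (a plain-text query sent over TCP port 43, as described in RFC 3912), asks whois.iana.org which server handles the domain's TLD, and searches for a few common creation-date labels, which vary between registries. The function names are hypothetical, and `followers_per_year` is just one possible way to combine the two variables, not necessarily the metric behind the plot.

```python
import re
import socket
from datetime import datetime, timezone
from typing import Optional


def whois_query(query: str, server: str = "whois.iana.org") -> str:
    """Send a raw WHOIS query (TCP port 43, query + CRLF) and return the reply text."""
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall((query + "\r\n").encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")


def registry_whois(domain: str) -> str:
    """Ask IANA which WHOIS server handles the TLD, then query that server."""
    iana_reply = whois_query(domain)
    refer = re.search(r"^refer:\s*(\S+)", iana_reply, re.MULTILINE | re.IGNORECASE)
    if refer is None:
        raise LookupError(f"no WHOIS referral for {domain}")
    return whois_query(domain, refer.group(1))


# Common labels only (assumption); some registries use other names or date formats.
CREATED = re.compile(
    r"^[ \t]*(?:Creation Date|Created On|Created|Registered On):[ \t]*(\S+)",
    re.MULTILINE | re.IGNORECASE,
)


def parse_creation_date(reply: str) -> Optional[datetime]:
    """Return the first ISO 8601 creation date in a WHOIS reply as UTC, else None."""
    match = CREATED.search(reply)
    if match is None:
        return None
    try:
        # Older Pythons' fromisoformat() rejects a trailing "Z", so normalize it.
        created = datetime.fromisoformat(match.group(1).replace("Z", "+00:00"))
    except ValueError:
        return None  # e.g. "12-Mar-1998"-style dates are not handled here
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)
    return created


def followers_per_year(followers: int, created: datetime, now: datetime) -> float:
    """Hypothetical growth metric: followers divided by domain age in years."""
    years = (now - created).days / 365.25
    return followers / years if years > 0 else float("nan")
```

Usage would be something like `parse_creation_date(registry_whois("example.com"))`. Keep in mind that a domain's creation date is only a proxy for a company's age: the domain may predate the company or have been registered years after it was founded.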
2. Analysis
2.1. Companies' posting behavior
2.1.1. Plot 1
Days since the last post vs. the number of posts a company has published on LinkedIn.
Observation: a company with more than one post is less likely to go 400 days without posting.
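The two variables in this plot could be derived from the scraped posts roughly as follows. This sketch assumes each post is available as a (company, post date) pair; `posting_summary` and `share_inactive` are hypothetical helper names, and companies with no posts at all would need separate handling.

```python
from collections import defaultdict
from datetime import date


def posting_summary(posts, today):
    """posts: iterable of (company, post_date) pairs.

    Returns {company: (post_count, days_since_last_post)}.
    """
    counts = defaultdict(int)
    last_post = {}
    for company, posted in posts:
        counts[company] += 1
        if company not in last_post or posted > last_post[company]:
            last_post[company] = posted
    return {c: (counts[c], (today - last_post[c]).days) for c in counts}


def share_inactive(summary, min_days=400, single_post=True):
    """Fraction of companies idle for at least min_days, among single-post
    companies (single_post=True) or multi-post companies (single_post=False)."""
    group = [days for n, days in summary.values() if (n == 1) == single_post]
    return sum(d >= min_days for d in group) / len(group) if group else float("nan")


posts = [
    ("Acme", date(2023, 1, 10)),
    ("Acme", date(2023, 6, 1)),
    ("Solo Ltd", date(2022, 3, 15)),
]
summary = posting_summary(posts, today=date(2023, 7, 1))
print(summary)  # {'Acme': (2, 30), 'Solo Ltd': (1, 473)}
```

Comparing `share_inactive(summary, single_post=True)` with `share_inactive(summary, single_post=False)` on the full dataset is a quick numeric check of the 400-day observation.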
2.1.2. Plot 2
Distribution of days since the last post, by follower-count category.
As expected, businesses with more followers post more frequently, apart from some outliers.
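A sketch of the per-category summary behind a distribution plot like this one, using quartiles from Python's `statistics` module. It assumes each company already has a follower-count category and a days-since-last-post value; `days_quartiles` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import quantiles


def days_quartiles(rows):
    """rows: iterable of (follower_category, days_since_last_post).

    Returns {category: (q1, median, q3)}. Categories with fewer than two
    companies are skipped because quantiles() needs at least two data
    points on older Python versions.
    """
    groups = defaultdict(list)
    for category, days in rows:
        groups[category].append(days)
    return {
        category: tuple(quantiles(values, n=4))
        for category, values in groups.items()
        if len(values) >= 2
    }


rows = [("0-99", d) for d in (10, 20, 30, 40, 50)] + [("100,000+", 3)]
print(days_quartiles(rows))  # {'0-99': (15.0, 30.0, 45.0)}
```

The median and interquartile range are less sensitive than the mean to the outliers mentioned above, which is why they are a reasonable summary for each category.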
2.2. Companies' website
2.2.1. Overall
Parallel plot of the different variables I captured when scraping the websites.
2.2.2. z-index
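The z-index is a CSS property that controls the stacking order of elements, i.e. which element is shown in front of another.
The default value is auto, which stacks an element much like 0 would. Let's say element A is currently behind element B: you can give element A a z-index of 1, and it will be shown in front of element B (as long as A is positioned, i.e. its position isn't static, and B doesn't have a higher z-index).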
Some websites assign very high numbers. Three of them almost broke my .parquet database, because the values were so big that I would have needed Int128, which Parquet doesn't currently support.
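For context, Parquet's integer types top out at 64 bits, so a z-index outside the signed 64-bit range can't be stored in a plain integer column. Below is a hedged sketch, not the scraper used here, of how such values could be extracted from CSS text and flagged before writing the file. The regex is a simplification (it ignores CSS comments and other parsing subtleties), and the helper names are hypothetical.

```python
import re

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

# Simplified pattern for declarations like "z-index: 10" or "z-index:-1".
Z_INDEX = re.compile(r"z-index\s*:\s*(-?\d+)", re.IGNORECASE)


def z_indexes(css):
    """Return every integer z-index value found in a CSS/HTML string."""
    # Python ints are arbitrary-precision, so huge values parse without overflow.
    return [int(value) for value in Z_INDEX.findall(css)]


def split_by_int64(values):
    """Separate values that fit in a signed 64-bit integer from those that don't."""
    fits = [v for v in values if INT64_MIN <= v <= INT64_MAX]
    overflow = [v for v in values if not INT64_MIN <= v <= INT64_MAX]
    return fits, overflow


css = ".modal { z-index: 9999; } .chat { z-index: 99999999999999999999999; }"
fits, overflow = split_by_int64(z_indexes(css))
print(max(fits), overflow)
```

Values in `overflow` could then be kept as strings, or dropped, instead of requiring a 128-bit integer column.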
The plots below explore the highest z-index values I encountered.
I assumed that the more successful a company is, the less likely I would be to encounter immense z-index values… I was wrong.