Skills demand analysis based on the data from online HR websites: Using web scraping and text mining applications: IT Sector
1. DATA MINING AND STATISTICAL ANALYSIS SOLUTIONS
Skills demand analysis based on the data from
online HR websites: Using web scraping and text
mining applications: IT Sector
Habet Madoyan
Vahe Movsisyan
Sunday, July 03, 2016
The analysis is funded by the research grant from American University of Armenia.
Presented at:
IX International School-Seminar. Town of Tsakhkadzor, Republic of Armenia
3. Introduction
In recent years, online job ads have become a popular channel for job search,
which is why the research community is increasingly experimenting with
detailed breakdowns of online job ads to study labor market dynamics.
It is estimated that in the USA 60-70 percent of job openings are now posted
on the Internet. However, these job ads are biased toward industries and
occupations that seek high-skilled, “white-collar” workers.
4. Introduction
Job seekers, employers, students, researchers, policymakers, higher education
institutions, career advisors, and curriculum developers now view online job ads
data as a practical source for exploring the dynamics of today’s labor market.
Online job ads can show the relative demand for different types of skills and levels
of education. The real-time nature of job ads data also allows for the early
detection of labor demand trends, which gives job seekers, employers, and
policymakers a forward-looking analytical tool.
Real-time labor market indicators can be particularly useful in aligning education
and training curricula with workforce needs in emerging or rapidly changing
industries, such as healthcare and information technology.
5. Job ads provide an incomplete picture of labor
demand
Online job ads data strongly correlate with job
openings data
8. Synopsis of the study
• Develop an algorithm for web scraping job announcement
data (careercenter.am)
• Text mining and parsing algorithms to structure job
announcements
• Algorithms to assess and track vacancy rates by:
    • Industry
    • Job role
    • Specific skills
9. What was done
• Around 20,000 posts were scraped from the web.
• Posts arrive in a rough, unstructured form. An algorithm was
developed to structure them.
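The structuring step can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes, purely for illustration, that each raw post marks its sections with upper-case labels followed by a colon. The label list, the function name `parse_post`, and the sample post are all hypothetical.

```python
import re

# Hypothetical section labels; real posts may use different ones.
FIELD_LABELS = ["TITLE", "COMPANY", "LOCATION", "REQUIRED QUALIFICATIONS"]


def parse_post(raw_text, labels=FIELD_LABELS):
    """Split one raw post into a {label: text} dict using 'LABEL:' markers."""
    # Longer labels first, so a label that is a prefix of another cannot shadow it.
    ordered = sorted(labels, key=len, reverse=True)
    alternation = "|".join(re.escape(label) for label in ordered)
    pattern = re.compile(r"^[ \t]*(" + alternation + r")[ \t]*:", re.MULTILINE)

    matches = list(pattern.finditer(raw_text))
    record = {}
    for i, match in enumerate(matches):
        # A field runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw_text)
        value = raw_text[match.end():end]
        record[match.group(1)] = " ".join(value.split())  # collapse whitespace
    return record


# Made-up sample post, for illustration only.
sample_post = """
TITLE:  Software Developer
COMPANY: Example LLC
REQUIRED QUALIFICATIONS:
- Knowledge of Java and SQL
- 2 years of experience
"""

record = parse_post(sample_post)
print(record["TITLE"])
```

Posts without recognisable labels yield an empty dictionary under this scheme, so a real pipeline would need a fallback for free-form announcements.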
13. ICT sector and overall economy
Datamotus LLC 13
[Figure, 2005-2015: (1) average yearly wage in the Transport and Communication sector relative to the average yearly wage in RA; (2) weight of the Transport and Communication sector (including the IT sector) in GDP (right scale, in %)]
19. Arules
• Association rule mining is used to analyse the co-occurrence
of programming languages in a job post
• The R packages “arules” and “arulesViz” are used for
the analysis
• The analysis is done for IT jobs only
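Before co-occurrence can be mined, each post has to be turned into a "transaction", that is, the set of programming languages it mentions. The Python sketch below shows one possible way to do this. It is an assumption rather than the method used in the study (which relied on R), and the language list and the function name `languages_in_post` are illustrative only.

```python
import re

# Illustrative list; the study used its own top-20 list, which is not given here.
LANGUAGES = ["C++", "C#", "C", "Java", "JavaScript", "Python", "PHP", "SQL", "R"]

# Characters that may not touch a language name on either side. This keeps
# "C" from matching inside "C++" or "C#", and "Java" from matching "JavaScript".
_BOUNDARY = "A-Za-z0-9+#"

PATTERNS = {
    lang: re.compile(
        r"(?<![" + _BOUNDARY + r"])" + re.escape(lang) + r"(?![" + _BOUNDARY + r"])"
    )
    for lang in LANGUAGES
}


def languages_in_post(text):
    """Return the set of known languages mentioned in one post (case-sensitive)."""
    return {lang for lang, pattern in PATTERNS.items() if pattern.search(text)}


# Made-up post texts, for illustration only.
posts = [
    "Strong C++ and Java skills; JavaScript is a plus.",
    "Experience with C# or C, SQL and R.",
]
transactions = [languages_in_post(post) for post in posts]
print(transactions)
```

Matching is case-sensitive on purpose, since single-letter names such as "C" and "R" would otherwise match ordinary words. Even so, a capital "R" in a phrase like "R&D" would still be counted, so real data would need extra filtering.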
20. Association rules: Measures of rule interestingness
Suppose we have the rule: IF {A} => {B}
Measure 1
$\mathrm{Support} = P(A \cap B)$
Measure 2
$\mathrm{Confidence} = P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$
Measure 3
$\mathrm{Lift} = \dfrac{P(B \mid A)}{P(B)} = \dfrac{P(A \cap B)}{P(A)} \cdot \dfrac{1}{P(B)}$
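As a worked illustration, the three measures can be computed directly from a list of transactions, with probabilities estimated as relative frequencies. The Python sketch below uses a made-up toy dataset and hypothetical function names. It is not the arules implementation.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_measures(transactions, lhs, rhs):
    """Support, confidence and lift for the rule IF lhs => rhs.

    Assumes lhs and rhs each occur in at least one transaction,
    otherwise the divisions below are undefined.
    """
    s_ab = support(transactions, set(lhs) | set(rhs))  # P(A and B)
    s_a = support(transactions, lhs)                   # P(A)
    s_b = support(transactions, rhs)                   # P(B)
    confidence = s_ab / s_a                            # P(B | A)
    lift = confidence / s_b                            # P(B | A) / P(B)
    return {"support": s_ab, "confidence": confidence, "lift": lift}


# Toy data: each set lists the languages named in one (made-up) job post.
toy_posts = [
    {"Java", "SQL"},
    {"Java", "SQL", "JavaScript"},
    {"C++", "C"},
    {"Java"},
    {"SQL"},
]

measures = rule_measures(toy_posts, {"Java"}, {"SQL"})
print({name: round(value, 3) for name, value in measures.items()})
```

For the toy rule IF {Java} => {SQL}, support is 2/5, confidence is (2/5)/(3/5) = 2/3, and lift is (2/3)/(3/5), roughly 1.11. A lift above 1 means the two languages appear together more often than they would if they were independent.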
22. Association Mining for
Programming languages: C++
• A set of association rules is generated for the top 20 programming languages.
• Rules are filtered with a minimum support of 0.01 and a minimum confidence of 0.1.
[Figure panels: rules with two items on the left-hand side; rules with one item on the left-hand side]
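The sketch below shows how rules with one or two items on the left-hand side could be enumerated and filtered with the thresholds quoted above (minimum support 0.01, minimum confidence 0.1). It is a brute-force Python illustration on made-up data, not the Apriori algorithm that the arules package implements. The function name `mine_rules` and the output tuple layout are assumptions.

```python
from itertools import combinations


def mine_rules(transactions, max_lhs=2, min_support=0.01, min_confidence=0.1):
    """Brute-force rule search: every LHS of up to `max_lhs` items, single-item RHS.

    Returns (lhs, rhs, support, confidence, lift) tuples, highest lift first.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def supp(itemset):
        # Relative frequency of transactions containing the whole itemset.
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for k in range(1, max_lhs + 1):
        for lhs in combinations(items, k):
            lhs_set = frozenset(lhs)
            s_lhs = supp(lhs_set)
            if s_lhs == 0:
                continue  # LHS never occurs, so confidence is undefined
            for rhs in items:
                if rhs in lhs_set:
                    continue
                s_rule = supp(lhs_set | {rhs})
                confidence = s_rule / s_lhs
                if s_rule >= min_support and confidence >= min_confidence:
                    lift = confidence / supp(frozenset([rhs]))
                    rules.append((lhs, rhs, s_rule, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Toy data: languages named in five made-up job posts.
toy_posts = [
    {"C++", "C"},
    {"C++", "C", "Java"},
    {"C++", "Java"},
    {"Java", "SQL"},
    {"SQL"},
]

rules = mine_rules(toy_posts)
cpp_rules = [rule for rule in rules if rule[1] == "C++"]
for lhs, rhs, s, c, l in cpp_rules:
    print(lhs, "=>", rhs, round(s, 2), round(c, 2), round(l, 2))
```

Exhaustive enumeration is only practical for a small item list such as the top 20 languages. Larger vocabularies need the pruning that Apriori-style algorithms provide.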
34. Next Steps:
• Develop a machine learning algorithm to classify job ads by sector,
• Develop state-of-the-art text mining and topic modeling algorithms to
predict demand for skills, professions and job roles,
• Create an interactive web dashboard (using R Shiny) to help:
    • Potential job seekers
    • Potential employers
    • Policy makers
    • Universities