Anastasiia Kornilova has over 3 years of experience in data science. She has an MS in Applied Mathematics and runs two blogs. Her interests include recommendation systems, natural language processing, and scalable data solutions. The agenda of her presentation includes defining data science, who data scientists are and what they do, and how to start a career in data science. She discusses the wide availability of data, how data science makes sense of and provides feedback on data, common data science applications, and who employs data scientists. The presentation outlines the typical data science workflow and skills required, including domain knowledge, math/statistics, programming, communication/visualization, and how these skills can be obtained. It provides examples of data science
2. WHO AM I?
3+ years in Data Science
MS in Applied Mathematics
Professional interests: recommendation systems, natural language
processing, scalable data science solutions
Author of two blogs: energyfirefox.blogspot.com,
datascientistdiary.blogspot.com
Fan of online education (20+ finished MOOCs)
3. AGENDA
What is Data Science and why do we need it?
Data Scientists. Who are they and what do they do?
How to start?
Practical case
21. TYPES OF DATA SCIENTISTS
A - Analysis
B - Building
Robert Chang
22. DS TYPE A - ANALYSIS
making sense of data or working with it in a fairly static way.
very similar to a statistician (and may be one)
knows all the practical details of working with data that
aren't taught in the statistics curriculum: data cleaning,
methods for dealing with very large data sets, visualization,
deep knowledge of a particular domain, writing well
about data
23. DS TYPE B - BUILDING
share some statistical background with Type A
very strong coders and may be trained software
engineers
mainly interested in using data in production.
build models which interact with users, often serving
recommendations (products, people you may know, ads,
movies, search results).
26. TYPICAL DATA SCIENCE
WORKFLOW
Preparing to run a model (Gathering, cleaning,
transformation)
Running the model
Interpreting the results
80% of work - Aaron Kimball
Other 80% of the work
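The three workflow stages above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not part of the original slides: the toy data (hours studied vs. exam score), the helper names prepare, fit_line and interpret, and the choice of a simple least-squares line as "the model".

```python
from statistics import mean

# Hypothetical toy data: (hours_studied, exam_score); None marks a missing value.
raw_rows = [(1.0, 52.0), (2.0, None), (3.0, 61.0),
            (4.0, 68.0), (None, 70.0), (5.0, 74.0)]

def prepare(rows):
    """Gathering, cleaning, transformation: drop incomplete rows, split columns."""
    clean = [(x, y) for x, y in rows if x is not None and y is not None]
    xs = [x for x, _ in clean]
    ys = [y for _, y in clean]
    return xs, ys

def fit_line(xs, ys):
    """Running the model: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def interpret(slope, intercept):
    """Interpreting the results: turn coefficients into a plain-language statement."""
    return (f"Each extra hour is associated with about {slope:.1f} more points "
            f"(baseline {intercept:.1f}).")

xs, ys = prepare(raw_rows)
slope, intercept = fit_line(xs, ys)
print(interpret(slope, intercept))
```

Even in this toy case the preparation step is as long as the modelling step, which is the point of the "80%" remark; real projects would typically use a library such as pandas or scikit-learn instead of hand-written helpers.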
28. DOMAIN KNOWLEDGE AND
SOFT SKILLS
Passionate about the business
Curious about data
Influence without authority
Hacker mindset
Problem solver
Strategic, proactive, creative, innovative and collaborative
29. MATH AND STATISTICS
Machine learning
Statistical modelling
Experiment design
Supervised learning
Unsupervised learning
Optimisation
30. PROGRAMMING AND
DATABASES
Computer science fundamentals
Scripting language
Statistical computing language
Databases
Relational algebra
Distributed computations
31. COMMUNICATION AND
VISUALIZATION
Ability to engage with senior management
Storytelling skills
Visual art design
Knowledge of a visualization tool
Translate data-driven insights into decisions and actions