Data Science: An Emerging
Field for Future Jobs
Jian Qin
School of Information Studies
Syracuse University

A presentation for the Graduate School, Syracuse University
February 22, 2013

   Talk points
        • Data science (DS) and data scientists in the context of
           research data
        • Implications and expectations of future research workforce
        • Preparing for the challenges and opportunities

                                         GRADUATION SCHOOL PRESENTATION 2013-2-22   2

   Feeling the pressure
        of data deluge in the
        digital information



       in science research


in our health care


        in our neighborhood

Shift in Science Paradigms

        Thousand years ago
           Science was empirical, describing natural phenomena
        A few hundred years ago
           Theoretical branch using models, generalizations
        A few decades ago
           A computational approach simulating complex phenomena
        Today
           Data exploration (eScience): unify theory, experiment, and simulation
           -- Data captured by instruments or generated by simulator
           -- Processed by software
           -- Information/Knowledge stored in computer
           -- Scientist analyzes database/files using data management and statistics

        Gray, J. & Szalay, A. (2007). eScience: A transformed scientific method.
        http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt

        Research data collections
        Size                        Metadata                   Management

        Larger, discipline-based    Multiple, comprehensive    Organized, institutionalized
        Smaller, team-based         None or random             Inside the team

Emerging concepts

                  that are going to stay and
                      matter to your career

What is data science?

                     An emerging area of work
                   concerned with the collection,
                presentation, analysis, visualization,
                 management, and preservation of
                  large collections of information.

                    Stanton, J. (2012). Introduction to Data Science.

   Data science and scientific research

    Management domain
        Plan, design, consult for, implement, and evaluate data
        management projects and services

    Technical domain
        Ingest, store, organize, merge, filter, and transform data
        and create analysis-ready data
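
    The technical-domain steps (ingest, merge, filter, transform) can be
    illustrated with a minimal Python sketch. Everything in it is an
    assumption made for illustration: the site and temperature data, the
    column names, and the function names are hypothetical and do not come
    from the presentation.

```python
import csv
import io

# Hypothetical raw inputs: two small CSV "files" standing in for research data.
SITES_CSV = """site_id,site_name
S1,Lake North
S2,Lake South
"""

READINGS_CSV = """site_id,temp_f
S1,68.0
S2,
S1,71.6
S3,50.0
"""


def ingest(text):
    """Ingest: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def merge(readings, sites):
    """Merge: attach the site name to each reading, keyed on site_id.

    Readings whose site_id is not in the site table are dropped.
    """
    names = {row["site_id"]: row["site_name"] for row in sites}
    return [
        {**r, "site_name": names[r["site_id"]]}
        for r in readings
        if r["site_id"] in names
    ]


def filter_valid(rows):
    """Filter: keep only rows that actually carry a temperature value."""
    return [r for r in rows if r["temp_f"].strip()]


def transform(rows):
    """Transform: convert Fahrenheit to Celsius, rounded to one decimal."""
    return [
        {
            "site_name": r["site_name"],
            "temp_c": round((float(r["temp_f"]) - 32) * 5 / 9, 1),
        }
        for r in rows
    ]


def build_analysis_ready():
    """Run the steps in order and return the analysis-ready rows."""
    sites = ingest(SITES_CSV)
    readings = ingest(READINGS_CSV)
    return transform(filter_valid(merge(readings, sites)))


analysis_ready = build_analysis_ready()
```

    Keeping each step as a separate function makes the pipeline easy to
    document and audit, which is where the management and technical
    domains meet.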

Data management is essential

Scientific Data Management Specialist
• Design, develop, implement, and manage high-throughput automatic
  data processing infrastructure for large databases in a mature system
• Develop and improve the infrastructure supporting this system
• Interface with multiple data providers to design, build, and
  maintain their customized databases
• Clarify requirements, feature requests and bug reports for software
  developers and assist in testing code.

Laboratory Data Management Specialist
• Administer operational database
• Assure the quality of data database content
• Interact closely with researchers, lab managers, and platform
  coordinators
• Track deliverables against budget and prepare data reports
• Collaborate closely with IT and bioinformatics colleagues
• Assist IT in gathering workflow requirements
• Test changes and updates in IT systems
• Create and maintain app

Data Modeling/Management Specialist
• Work closely with the high performance computing and the IT manager
• Develop a data model for complex multi-scale rocks
• Design and organize a database and complex queries
• Integrate and manage multi-scale rocks subjected to large-scale
  scientific computing applications
  http://www.ingrainrocks.com/data-management-specialist/

        "We're increasingly finding data in
        the wild, and data scientists are
        involved with gathering data,
        massaging it into a tractable form,
        making it tell its story, and presenting
        that story to others."
          Loukides, M. (2011). What is data science? Sebastopol, CA: O'Reilly.


   Emerging job market: Data scientists
        • Data scientists are more likely to be involved across the
           data lifecycle:
            -- Acquiring new data sets: 33%
            -- Parsing data sets: 29%
            -- Filtering and organizing data: 40%
            -- Mining data for patterns: 30%
            -- Advanced algorithms to solve analytical problems: 29%
            -- Representing data visually: 38%
            -- Telling a story with data: 34%
            -- Interacting with data dynamically: 37%
            -- Making business decisions based on data: 40%
Are you ready for the data
        challenges and opportunities?


What is expected of data scientists?

        • Ability to use a wide variety of tools for documentation,
           analysis, and reporting of data
        • Knowledge of a subject domain
        • Data modeling, database and query design
        • OS, Programming
        • Encoding languages
        • Content and repository
        • Communication and coordination

Analytical skills: domain modeling

    Tasks
        • Requirement analysis
        • Workflow analysis
        • Data modeling
        • Data transformation needs analysis
        • Data provenance needs analysis

    Skills
        • Interview skills, analysis and generalization skills
        • Ability to capture components and sequences in workflows
        • Ability to translate domain analysis into data models
        • Ability to envision the data model within the larger system
           architecture

Analytical skills: from data sources to patterns, relationships, and trends

                                    Analytical tools




Data management skills: data lifecycle and infrastructural services

      Metadata standards, Encoding language, Semantic control,
      Identify management, Infrastructural services

      Processed, transformed, derived, calculated, data
        • Common data format
        • Image formats
        • Matrix formats
        • Microarray file formats
        • Communication protocols

      Infrastructural services
        • Data source
        • Data curation
        • Data preservation
        • Data integration and mashup
        • Data citation, publication, and
        • Data linking and
Technology skills with excellent communication


        • Operating systems         • Interviews
        • Repository systems        • Ice breaking
        • Database systems          • Community building
        • Programming languages     • Institutionalization
        • Encoding languages        • Stakeholder buy-in
        • Specialized programming


   Four tracks: choose what you are good at

        Data Science core course: Applied data

        • Data analytics
        • Data storage and management
        • system management
        • Data visualization
The iSchool's version of data science

        Eventually the iSchool data science program will build
        the foundation for super data

        Data scientists:
        • Ability to use a wide variety of tools for documentation,
           analysis, and reporting of data
        • Knowledge of a subject domain
        • Data modeling, database and query design
        • OS, Programming
        • Encoding languages
        • Content and repository
        • communication, and co-ordination


        Thank You!


