Part 2:
What are data?
[Hands-on exercise]
What are Data?
Data are one part of scholarly capital, along with
human capital and instrumentation.
Data have become essential scholarly objects to be
captured, mined, used and reused.
Research in all academic fields relies on data.
Research Data
Christine Borgman
Presidential Chair & Professor of Information Studies,
University of California, Los Angeles
Borgman lays out a clear definition of data and shows how data vary across disciplines in
The Digital Future is Now: A Call to Action for the Humanities
(please read sections 25-44).
[http://www.digitalhumanities.org/dhq/vol/3/4/000077/000077.html]
Definition of data
Technical definition
Definitions associated with archival information systems offer a
useful starting point:
A reinterpretable representation of
information in a formalized manner suitable
for communication, interpretation, or
processing.
Examples of data include a sequence of bits, a
table of numbers, the characters on a page, the
recording of sounds made by a person speaking,
or a moon rock specimen.
Source: Reference Model for an Open Archival Information System (2002), 1-9.
[http://public.ccsds.org/publications/archive/650x0b1s.pdf]
Definition of data
Socio-technical definition
In Buckland’s terms, data are
“alleged evidence”.
Source: Buckland, M. K. (1991). “Information as thing.” Journal of the American Society for Information Science, 42 (5): 351-360.
What are data?
Think about data by their origin.
In the context of cyberinfrastructure, four categories of data are now widely accepted. They were
identified in the influential U.S. policy report Long-lived Data Collections (2005) and incorporated
in the National Science Foundation strategy Cyberinfrastructure Vision for 21st Century Discovery (2007).
1. Observational data include weather measurements and
attitude surveys...
2. Computational data result from executing a computer model
or simulation, whether for physics or cultural virtual reality.
3. Experimental data include results from laboratory studies,
such as measurements of chemical reactions …
4. Records of government, business, and public and private life
yield useful data for scientific, social scientific, and humanistic
research.
Example 1
Audio analyser
Frequency analyser
Intelligent Speech Analyser
MS Excel spreadsheet
Audio clips
Text reports
Certain parts of the content for Example 1 have been removed due to sensitive content
and copyright issues.
Please contact WY for more information.
Video recorders
Voice recorders
Diary
Video clips
Audio clips
Diary entries
Data Variety
To give you a better idea of what can be data, Christine Borgman
later expands on her examples and sources of data and how they
vary by branch of research.
Scientific data
Examples:
- Ecology: weather, ground water, sensor readings, historical record
- Medicine: xrays
- Chemistry: protein structures
- Astronomy: spectral surveys
- Biology: specimens
- Physics: events, objects
- Documentation: lab and field notebooks, spreadsheets
Sources:
- Generate own data
- Acquire from collaborators, other scientists
- Data repository

Social scientific data
Examples:
- Opinion polls
- Surveys, interviews
- Mass media
- Laboratory experiments
- Field experiments
- Demographic records
- Census records
- Voting records
- Economic indicators
Sources:
- Generate own data
- Acquire from other scholars
- Data repositories: Social Surveys
- Government records
- Corporate records

Humanities and arts data
Examples:
- Newspapers
- Photographs
- Letters
- Diaries
- Books, articles
- Birth, death, marriage records
- Church records
- Court records
- School and college yearbooks
- Maps…
Sources:
- Libraries, archives, museums
- Public records
- Corporate records, mass media
- Acquire from other scholars
- Data repositories: Beazley, Arts & Humanities Data Service (UK)

Table: Examples and sources of data from the major research branches. (Borgman)
Example 2
Example 2 has been removed due to sensitive content.
Please contact WY for more information.
Exercise
Instructions:
1. Form a group based on subject or discipline.
[Those without a subject role can join any group.]
2. Hands-on exercise for librarians (please work in groups):
- Use OneSearch/ Databases/ DR-NTU/ Google to find an article published by
one of your faculty members or researchers.
- Quickly go through the research paper, particularly the methodology section.
3. Librarians in the group ask and answer the following questions.
[see next slide]
4. Post the findings (title of the research article, questions and answers) to the PD blog.
1. Who are they? What research community do they belong to?
What larger discipline is that community a part of?
2. What data are they creating (i.e., data types, formats, etc.)?
How are they creating these data?
3. What are the roles of data in their research?
Example sharing
Title: Librarian Class Attendance: Methods, Outcomes and Opportunities
http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1757&context=iatul
http://www.iatul.org/doclibrary/public/Conf_Proceedings/2006/CmorMarshallpaper.pdf
Example: [Before the interview]
1. Who are they? What research community do they belong to? What larger discipline is that
community a part of?
Dianne Cmor and Victoria Marshall. Library science research community. Information Science.
2. What data are they creating (i.e., data types, formats, etc.)? How are they creating these data?
- Diary entries
- Qualitative data from Ethnograph and SPSS
- Reftracker report
- Interview notes
- Survey feedback
The data are mostly textual and numeric, and fall under social scientific and humanities & arts data.
3. What are the roles of data in their research?
The information collected was converted into data. The researchers analysed the
data and drew their findings from it. They examined and evaluated the
findings and then built convincing evidence to answer the questions they had posed
at the start of their research.
Example: [After the interview]
[For reference only]
Who are they? What research community do they belong to? What larger discipline is that community a part of?
Dianne Cmor is the lead researcher for the research project. Victoria Marshall is another member of the project. Dianne
and Victoria are both librarians in a university library.
The project is library-related research on the topic "Librarian class attendance: methods, outcomes and
opportunities". [Library science research community]
The research project belongs to the discipline of Information Science.
What data are they creating (i.e., data types, formats, etc.)? How are they creating these data?
The researchers attended a number of seminars called “Journal club” for about 9 weeks. They jotted down all their
observations from the seminars in a diary. The diary entries were typed out in MS Word and eventually converted into
qualitative data using the Ethnograph and SPSS software.
Reftracker was used each week to document time spent and associated outcomes in relation to meetings with students,
students’ attendance, and the creation of course support content.
The researchers conducted a few interviews with students and faculty members to collect information. A paper survey
form was also created to collect feedback from the students and some faculty members. The researchers typed out all the
notes collected from the interviews and survey in MS Word.
The hard copies of the diaries and survey forms were scanned and saved in PDF format.
The data are mostly textual and numeric, and fall under social scientific and humanities & arts data.
What are the roles of data in their research?
The information collected through observation at various university lectures and seminars/ tutorials, and through the interviews and
survey conducted with students and faculty members, was translated into data. The team analysed the data and drew their findings
from it. They examined and evaluated the findings and then built convincing evidence to answer
the questions they had posed at the start of their research.
Example: Data curation profile
The data table [For reference only. You don’t have to do this]

Columns: Data Stage; Output; # of Files / Typical Size; Format; Other / Notes

Primary Data

Data Stage: Raw
- Output: Diary, interview notes and survey forms
- # of Files / Typical Size: 25 files/ unknown
- Format: Handwritten hard copy

Data Stage: Processed
- Output: Diary and survey forms
- # of Files / Typical Size: 2 files/ < 3MB
- Format: PDF
- Other / Notes: Scanned copy of the diary (1 file) and survey forms (1 file).
- Output: Original data from the diary, interview notes and survey forms
- # of Files / Typical Size: 3 files/ < 3MB
- Format: .doc [MS Word]
- Other / Notes: All entries in the diary, notes & feedback from the interview and survey were typed out in MS Word.

Data Stage: Analyzed
- Output: Qualitative and quantitative data
- # of Files / Typical Size: 2 files/ < 500KB
- Format: .CHN [Ethnograph], .csv [MS Excel]
- Other / Notes: The researchers used Ethnograph software and SPSS software to generate qualitative data. A report generated from RefTracker.

Data Stage: Finalized
- Output: Report [tables and figures]
- # of Files / Typical Size: <100KB
- Format: .csv [MS Excel]

Note: The data specifically designated by the scientist to make publicly available are indicated by the
rows shaded in gray (the “Analyzed” row is shaded here as an example). Empty cells represent cases in
which information was not collected or the scientist could not provide a response.
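If a group wants to keep a data table like this one in machine-readable form, the rows could be recorded roughly as follows. This is only a minimal sketch and not part of the original exercise: the class name ProfileRow, its field names and the two helper functions are hypothetical, the values are copied from the example table, and placing the "Original data" row under the Processed stage is an assumption based on its position in the table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ProfileRow:
    """One row of the example data table (field names are hypothetical)."""
    stage: str
    output: str
    n_files: Optional[int]      # None where the table cell is empty
    typical_size: str
    formats: Tuple[str, ...]
    shared: bool = False        # True for rows shaded in gray (made public)


ROWS = [
    ProfileRow("Raw", "Diary, interview notes and survey forms",
               25, "unknown", ("Handwritten hard copy",)),
    ProfileRow("Processed", "Diary and survey forms",
               2, "< 3MB", ("PDF",)),
    # Assumption: this row belongs to the Processed stage.
    ProfileRow("Processed",
               "Original data from the diary, interview notes and survey forms",
               3, "< 3MB", (".doc",)),
    # The note under the table says the Analyzed row is shaded as an example.
    ProfileRow("Analyzed", "Qualitative and quantitative data",
               2, "< 500KB", (".CHN", ".csv"), shared=True),
    ProfileRow("Finalized", "Report [tables and figures]",
               None, "<100KB", (".csv",)),
]


def files_per_stage(rows):
    """Sum the known file counts per data stage, skipping empty cells."""
    totals = {}
    for row in rows:
        if row.n_files is not None:
            totals[row.stage] = totals.get(row.stage, 0) + row.n_files
    return totals


def shared_outputs(rows):
    """List the outputs designated to be made publicly available."""
    return [row.output for row in rows if row.shared]


if __name__ == "__main__":
    print(files_per_stage(ROWS))
    print(shared_outputs(ROWS))
```

Keeping empty cells as None, rather than guessing a value, mirrors the table note that empty cells mean the information was not collected.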
