20 years of evolution in
data production in health
and life sciences
FEBRUARY 2019
OMICS AND SYSTEMS BIOLOGY DAY
Stéphane LE CROM
stephane.le_crom@sorbonne-universite.fr
@slecrom
IBENS genomics core facility
Our goals are:
1. To make functional genomics technologies available to all laboratories working on eukaryotes;
2. To help researchers manage their high-throughput genomics projects;
3. To disseminate large-scale genomics approaches within the scientific community.
1  IBENS GENOMICS 20 years of evolution in data production
[Timeline, 1999 to 2019 (20 years). Labels: Birth, RNG, IBiSA, RIO, France Génomique, ISO 9001, NF X50-900. Highlighted years: 2009, 2014, 2017.]
20 years of genomics at IBENS
1996: Yeast genome project
1999: Home-made microarrays
2005: Transcriptome microarrays
2010: High-throughput sequencing
2016: Nanopore sequencing
2017: Single cell
1st generation: Sanger sequencing
Sequencing by synthesis, developed in 1977 by Frederick Sanger.
Around 1 kb of DNA per run, over 6-8 hours. One read per sample.
From The Scientist
Capillary sequencers
Array of 96 parallel capillaries (up to 50 cm long). 768 samples, 690 kb of DNA, 3-hour run. The Broad Institute could sequence one human genome in 12 days with 126 devices.
From The Scientist
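As a rough check of the capillary sequencer figures above, the short sketch below multiplies them out. It assumes, which the slide does not state, that each device yields 690 kb per 3-hour run and runs back to back for the whole 12 days. The function name `capillary_fleet_output` and the 3.1 Gb genome size are illustrative assumptions, not part of the original talk.

```python
def capillary_fleet_output(devices, days, run_hours, bases_per_run):
    """Return (runs_per_device, total_bases) for devices running back to back."""
    runs_per_device = (days * 24) // run_hours
    total_bases = devices * runs_per_device * bases_per_run
    return runs_per_device, total_bases


# Figures read from the slide: 126 devices, 12 days, 3-hour runs, 690 kb per run.
runs, total = capillary_fleet_output(
    devices=126, days=12, run_hours=3, bases_per_run=690_000
)

# Assumption (not from the slide): a haploid human genome of about 3.1 Gb.
HUMAN_GENOME_BASES = 3_100_000_000
coverage = total / HUMAN_GENOME_BASES

print(f"{runs} runs per device, {total / 1e9:.2f} Gb in total, "
      f"about {coverage:.1f}x coverage")
```

Under these assumptions the fleet produces a little over 8 Gb in 12 days, that is, fewer than three passes over the genome, which gives a sense of the scale of the effort.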
2nd generation: high-throughput sequencing
Increasing throughput by sequencing a huge number of short reads in parallel.
3rd generation: long reads and real-time sequencing
Oxford Nanopore Technologies
Pacific Biosciences SMRT sequencing
A rich catalogue of HTS applications
Publication date of a representative article describing each method versus the number of citations that the article received. Methods are colored by category, and the size of each data point is proportional to publication rate (citations per month). The inset shows the color key as well as the proportion of methods in each group. For clarity, "seq" has been omitted from the labels.
Reuter et al. (2015) Mol Cell
At various levels
Shendure & Aiden (2012) Nat. Biotech.
Direct user access to genomic data
https://vitagene.com/
https://www.soccergenomics.com/
https://nahibu.com/fr
The genomic data challenges
The time and cost of sequencing a genome fell by a factor of 1 million in less than 10 years. Anyone can now get their whole genome sequenced for a few hundred dollars.
The acquisition, storage, distribution, and analysis of large datasets require unique technological solutions. By 2025, an estimated 40 exabytes (1 exabyte = 10^18 bytes) of storage capacity will be required for human genomic data.
Without proper legislation, companies could use genomic data for many purposes without the consumer's knowledge. Users have to worry about whether companies sell their personal data to the highest bidder.
How safe would it be for citizens to let authorities know about their innermost biological secrets, along with their health risks and future prospects?
https://medicalfuturist.com/the-genomic-data-challenges-of-the-future/
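To give a sense of scale for the 40 exabyte estimate, the sketch below converts it into a number of stored genomes. The figure of 100 GB per genome is a hypothetical round number chosen for illustration, not a value from the talk, and the helper name `genomes_that_fit` is likewise made up.

```python
EXABYTE = 10**18   # bytes
GIGABYTE = 10**9   # bytes


def genomes_that_fit(storage_exabytes: int, gigabytes_per_genome: int) -> int:
    """Number of genomes that fit in the given storage capacity."""
    return (storage_exabytes * EXABYTE) // (gigabytes_per_genome * GIGABYTE)


# Assumption: about 100 GB of data kept per sequenced genome (illustrative only).
print(genomes_that_fit(storage_exabytes=40, gigabytes_per_genome=100))  # 400000000
```

With a different per-genome footprint the count scales inversely, so the estimate is only as good as that assumption.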
2 - UMS PASS: DATA PRODUCTION AND ANALYSIS IN LIFE SCIENCES AND HEALTH
The UMS PASS missions
To support the facilities in their organisation;
To increase the visibility of Sorbonne University facilities;
To promote the skills of the experts working on our platforms;
To bring the technological community together and strengthen the links between staff across our remote sites.
2  UMS PASS 20 years of evolution in data production
THE FACILITIES OF THE UMS DATA PRODUCTION AND ANALYSIS IN LIFE SCIENCES AND HEALTH
Categories: Biological Resource Centres, Molecular Analysis, Histomorphology, Bioinformatics, Genomics, Cytometry, Imaging
Facilities: ICMquant, Histomorphologie, CISASME, ARTbio, CRB APHP.SU, P3S, CyPS
Legend: UMS Facility, Associated Facility
A wide range of technologies that produce large amounts of data
CyPS: Hyperion, an imaging mass cytometry system for highly multiplexed immunohistochemistry (2020)
@CyPS11913876
http://www.cytometrie.pitie-salpetriere.upmc.fr/
P3S: timsTOF Pro, next-generation ion mobility separation for proteomics (2018)
@Plateforme_P3S
https://www.p3s.sorbonne-universite.fr/
The data acquisition challenge
Infra: operation, network
Team: training, expertise
Users: experimental design, samples
R&D: technological updates, new protocols
Storage: analysis, backup
Sharing: collaborators, publications
Storage tailored to users' needs
Storage is a central need.
Data is often maintained on an individual basis: desktop hard drives, NAS servers, clouds (free, academic, commercial).
Not all of the data produced requires the same level of performance and availability:
- Acquisitions and analysis;
- Backups;
- Archives.
FAIR principles
The FAIR principles (Findable, Accessible, Interoperable, Reusable) were designed with data-driven and machine-assisted open science in mind. The final aim is that machines as well as people can reuse each other's research objects.
https://www.go-fair.org/fair-principles/
Data management plans (DMPs)
DMPs are essential.
They need to be tailored to the scientific field in which they are used, so planning them quickly becomes a complex and challenging task.
https://logs-repository.com/articles/essentials-for-a-data-management-plan-for-spectroscopists/
https://www.karaackerman.com/content-management-plan/data-management-plan-in-a-nutshell/
Credit data IDs for data reuse
The benefits of data sharing depend on the data being reused.
Linking people to data will lead to ways to credit them when the data are reused.
This would influence funding and promotion, and incentivize more and better curation and sharing.
Pierce et al. (2019) doi:10.1038/d41586-019-01715-4
Towards a project-centred data storage
A new project-centric data object
Moving away from the current user-centred approach
[Diagram labels: User, Core Facility, Bioinfo, Contributor, Project, Project managers. Relations: links, copy, access. Identifiers: PID, ORCID.]
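One way to picture the project-centric data object is as a small record that ties a project PID to its contributors (identified by ORCID) and its datasets (each with its own PID). The sketch below is hypothetical: the class names, fields, the `credit_map` helper and all identifier values are placeholders invented for illustration, not the design presented in the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Contributor:
    name: str
    orcid: str  # ORCID iD of the person (placeholder values below)
    role: str   # e.g. "project manager", "core facility", "bioinfo"


@dataclass(frozen=True)
class Dataset:
    pid: str       # persistent identifier of the dataset
    location: str  # single reference copy that contributors access


@dataclass
class Project:
    pid: str  # persistent identifier of the project itself
    title: str
    contributors: list[Contributor] = field(default_factory=list)
    datasets: list[Dataset] = field(default_factory=list)

    def credit_map(self) -> dict[str, list[str]]:
        """Map each dataset PID to the ORCID iDs of the project contributors."""
        orcids = [c.orcid for c in self.contributors]
        return {d.pid: list(orcids) for d in self.datasets}


# Placeholder identifiers, for illustration only.
project = Project(pid="pid:example/project-1", title="Example project")
project.contributors.append(
    Contributor("A. Researcher", "0000-0000-0000-0001", "project manager"))
project.contributors.append(
    Contributor("B. Engineer", "0000-0000-0000-0002", "core facility"))
project.datasets.append(
    Dataset("pid:example/dataset-1", "/projects/project-1/raw"))

print(project.credit_map())
```

Because contributors are attached to the project rather than to personal copies of the files, credit for a reused dataset can be derived from the project record itself.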
To conclude
"Progress in science depends on new techniques, new discoveries and new ideas, probably in that order."
Sydney Brenner
Think about data sharing from the beginning of your project.
Promote reproducibility of results in your daily practices.
stephane.le_crom@sorbonne-universite.fr
@slecrom
iStock
SORBONNE-UNIVERSITE.FR
