I share feedback from the last 20 years as scientific head of a genomics core facility and as a facilities manager. I go through the evolution of the high throughput sequencing field and discuss data storage and sharing.
20 years of evolution in data production in health and life sciences
1. 20 years of evolution in data production in health and life sciences
FEBRUARY 2019
OMICS AND SYSTEMS BIOLOGY DAY
Stéphane LE CROM
stephane.le_crom@sorbonne-universite.fr
@slecrom
2. 2
IBENS genomics core facility
Our goals are:
1. To make functional genomics technologies available for all laboratories working on eukaryotes;
2. To help researchers manage their high throughput genomics projects;
3. To disseminate large-scale genomics approaches within the scientific community.
1 IBENS GENOMICS 20 years of evolution in data production
[Timeline of the facility, 1999 to 2019 (20 years). Milestones shown: Birth, RNG, RIO, IBiSA, France Génomique, ISO 9001, NF X50-900. Highlighted years: 2009, 2014, 2017.]
3. 3
20 years of genomics at IBENS
1996: Yeast genome project
1999: Home-made microarrays
2005: Transcriptome microarrays
2010: High throughput sequencing
2016: Nanopore sequencing
2017: Single cell
4. 4
1st generation: Sanger sequencing
Sequencing by synthesis, developed in 1977 by Frederick Sanger.
Around 1 kb of DNA per run, over 6-8 hours. One read per sample.
From The Scientist
5. 5
Capillary sequencers
Array of 96 parallel capillaries (up to 50 cm long). 768 samples, 690 kb of DNA, 3-hour run.
The Broad Institute can sequence 1 human genome in 12 days with 126 devices.
From The Scientist
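As a quick consistency check on the capillary sequencer figures above, the sketch below derives the number of 96-capillary loads and the average read length implied by 768 samples and 690 kb. It assumes one read per sample, 1 kb = 1,000 bases, and that 690 kb is the total over the 768 samples; these assumptions and the variable names are illustrative and not taken from the talk.

```python
# Figures quoted on the slide.
CAPILLARIES = 96
SAMPLES = 768
TOTAL_BASES = 690 * 1_000  # 690 kb, assuming 1 kb = 1,000 bases

# Assumption: one read per sample, and 690 kb is the total over all 768 samples.
loads = SAMPLES // CAPILLARIES
mean_read_length = TOTAL_BASES / SAMPLES

print(f"{loads} loads of {CAPILLARIES} capillaries")
print(f"about {mean_read_length:.0f} bases per read")
```

Under these assumptions each read is a little under 1 kb, which is in line with the "around 1 kb" quoted for the first Sanger runs.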
6. 6
2nd generation: High throughput sequencing
Increasing throughput by sequencing a huge number of short reads in parallel.
7. 7
3rd generation: Long reads & real time sequencing
Oxford Nanopore Technologies
Pacific Biosciences SMRT sequencing
8. 8
A rich catalogue of HTS applications
Publication date of a representative article describing a method versus the number of citations that the article received. Methods are colored by category, and the size of the data point is proportional to publication rate (citations/month). The inset indicates the color key as well as the proportion of methods in each group. For clarity, "seq" has been omitted from the labels.
Reuter et al. (2015) Mol Cell
9. 9
At various
levels
Shendure & Aiden (2012) Nat. Biotech.
10. 10
User direct access to genomic data
https://vitagene.com/
11. 11
User direct access to genomic data
https://www.soccergenomics.com/
12. 12
User direct access to genomic data
https://nahibu.com/fr
13. 13
User direct access to genomic data
https://nahibu.com/fr
14. 14
The genomic data challenges
The time and the cost of sequencing genomes were reduced by a factor of 1 million in less than 10 years.
Anyone can get their whole genome sequenced for a few hundred dollars.
The acquisition, storage, distribution, and analysis of large datasets require unique technological
solutions. By 2025, an estimated 40 exabytes (1 exabyte = 10^18 bytes) of storage capacity will be required for human
genomic data.
Without proper legislation, companies could use genomic data for many purposes outside the sight of
the consumer. Users have to worry about whether or not companies sell their personal data to the highest
bidder.
How safe would it be for citizens to let authorities know about their innermost biological secrets
alongside their health risks and future prospects?
https://medicalfuturist.com/the-genomic-data-challenges-of-the-future/
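To give a sense of scale for the 40 exabyte estimate quoted on this slide, the following sketch converts it into more familiar units. It uses SI decimal prefixes (1 EB = 10^18 bytes). The 100 GB per genome figure is a purely illustrative assumption, not a value from the talk.

```python
# SI decimal prefixes, in bytes.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18

required = 40 * EB  # estimate quoted for human genomic data by 2025

# Hypothetical footprint of one stored genome (illustrative assumption only).
BYTES_PER_GENOME = 100 * GB

petabytes = required // PB
terabytes = required // TB
genomes = required // BYTES_PER_GENOME

print(f"{petabytes:,} PB = {terabytes:,} TB")
print(f"room for about {genomes:,} genomes at 100 GB each")
```

Changing `BYTES_PER_GENOME` shows how strongly the number of genomes that fit depends on what is kept per genome (raw reads, alignments, or only variants).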
15. 15
2 UMS PASS - DATA PRODUCTION AND ANALYSIS IN LIFE SCIENCES AND HEALTH
20 years of evolution in data production
16. 16
The UMS PASS missions
To support the facilities in their organisation;
To increase the visibility of Sorbonne University facilities;
To promote the skills of the experts working on our platforms;
To bring the technological community together and to strengthen the links between staff across our remote sites.
2 UMS PASS 20 years of evolution in data production
17. THE FACILITIES OF THE UMS DATA PRODUCTION AND ANALYSIS IN LIFE SCIENCES AND HEALTH
Biological Resource Centres
Molecular Analysis
Histomorphology
Bioinformatics
Genomics
Cytometry
Imaging
ICMquant
Histomorphologie
CISASME
ARTbio
UMS Facility
Associated Facility
CRB APHP.SU
P3S
CyPS
18. 18
A wide range of technologies that produce large amounts of data
CyPS: Hyperion, an imaging mass cytometry system for highly multiplexed immunohistochemistry / 2020
@CyPS11913876
http://www.cytometrie.pitie-salpetriere.upmc.fr/
P3S: timsTOF Pro, next generation ion mobility separation for proteomics / 2018
@Plateforme_P3S
https://www.p3s.sorbonne-universite.fr/
19. 19
The data acquisition challenge
[Diagram. Labels: Infra, Operation, Network, Team, Training, Expertise, Users, Experimental design, Samples, R&D, Technological updates, New protocols, Storage, Analysis, Backup, Sharing, Collaborators, Publications]
20. 20
Storage tailored to users' needs
Storage is a central need;
Data is often maintained on an individual basis: desktop hard drives, NAS servers, clouds (free, academic, commercial);
Not all the data produced requires the same level of performance and availability:
Acquisitions and analysis;
Backups;
Archives.
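One way to make these three usages explicit is a small lookup of requirements per storage class. The sketch below is only an illustration: the levels assigned to each class, the `Level` scale, and all names are assumptions, not a description of the storage actually deployed.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Requirement:
    performance: Level   # how fast reads and writes need to be
    availability: Level  # how quickly the data must be reachable


# Assumed levels, for illustration only.
REQUIREMENTS = {
    "acquisition_and_analysis": Requirement(Level.HIGH, Level.HIGH),
    "backup": Requirement(Level.MEDIUM, Level.MEDIUM),
    "archive": Requirement(Level.LOW, Level.LOW),
}


def requirement_for(usage: str) -> Requirement:
    """Return the assumed requirement for a usage, or raise ValueError."""
    try:
        return REQUIREMENTS[usage]
    except KeyError:
        raise ValueError(f"unknown usage: {usage!r}") from None


for usage, req in REQUIREMENTS.items():
    print(usage, req.performance.name, req.availability.name)
```

Keeping the requirements in one table makes it easier to justify why fast, highly available storage is reserved for data under acquisition and analysis, while archives can sit on slower media.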
21. 21
FAIR principles
The FAIR principles (Findable, Accessible, Interoperable, Reusable) were designed with data-driven and machine-assisted open science in mind. The final aim is that
machines as well as people can reuse each other's research objects.
https://www.go-fair.org/fair-principles/
22. 22
Data management plans (DMPs)
DMPs are essential.
They need to be tailored to the scientific fields they are to be employed in, and their planning
consequently grows quickly into a complex and challenging task.
https://logs-repository.com/articles/essentials-for-a-data-management-plan-for-spectroscopists/
https://www.karaackerman.com/content-management-plan/data-management-plan-in-a-nutshell/
23. 23
Credit data IDs for data reuse
The benefits of data sharing depend on the data being reused.
Linking people to the data will lead to ways to credit them when data are reused.
This would influence funding and promotion, and incentivize more and better curation and sharing.
Pierce et al. (2019) doi:10.1038/d41586-019-01715-4
24. 24
Towards a project-centred data storage
A new project-centric data object
Moving away from the current user-centred approach
[Diagram comparing the user-centred and project-centred approaches. Labels: User, Core Facility, Bioinfo, Contributor, Project, Project managers; arrows: links, copy, access; identifiers: PID, ORCID]
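The project-centric data object can be pictured as a small data structure in which datasets and people are attached to a project through identifiers rather than copied between users. The sketch below is one possible reading of the slide, not the design actually implemented: the class names, fields, and the `credit` helper are hypothetical, and the PID and ORCID values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Contributor:
    name: str
    role: str   # e.g. "core facility", "bioinfo", "project manager"
    orcid: str  # personal identifier


@dataclass(frozen=True)
class Dataset:
    pid: str          # persistent identifier of the dataset
    produced_by: str  # ORCID of the person who produced it


@dataclass
class Project:
    pid: str
    contributors: Dict[str, Contributor] = field(default_factory=dict)
    datasets: List[Dataset] = field(default_factory=list)

    def add_contributor(self, person: Contributor) -> None:
        self.contributors[person.orcid] = person

    def add_dataset(self, dataset: Dataset) -> None:
        # Data are linked to the project rather than copied per user.
        if dataset.produced_by not in self.contributors:
            raise ValueError("dataset producer is not a project contributor")
        self.datasets.append(dataset)

    def credit(self, dataset_pid: str) -> Contributor:
        """Return who should be credited when a dataset is reused."""
        for dataset in self.datasets:
            if dataset.pid == dataset_pid:
                return self.contributors[dataset.produced_by]
        raise KeyError(dataset_pid)


# Placeholder identifiers, not real records.
project = Project(pid="pid:project-0001")
project.add_contributor(
    Contributor("A. Example", "core facility", "0000-0000-0000-0001")
)
project.add_dataset(
    Dataset(pid="pid:dataset-0001", produced_by="0000-0000-0000-0001")
)
print(project.credit("pid:dataset-0001").name)
```

Because every dataset carries the identifier of its producer, the same structure also supports the crediting of data reuse discussed on the previous slide.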
25. 25
To conclude
"Progress in science depends on new techniques, new discoveries and new ideas, probably in that order." Sydney Brenner
Think about data sharing from the beginning of your project
Promote reproducibility of results in your daily practices
stephane.le_crom@sorbonne-universite.fr
@slecrom
20 years of evolution in data production
iStock