Thoughts on Biological Data
Sustainability
For the Global Biodata Consortium Advisory Board Meeting
Taken from Terence R. Johnson & Philip E. Bourne
The Biological Data Sustainability Paradox
https://arxiv.org/abs/2311.05668
Philip E. Bourne
peb6a@virginia.edu
/pebourne
January 15, 2024
What we propose is not intended as an answer to
the biological data sustainability problem, but
rather a call to think differently in collaboration
with people outside biomedicine who study such
problems.
The Current Situation 
Biological Data Sustainability (BDS) 1.0
 More data, more demand, more money spent, less money for
innovation
 Mixture of aggregated and standalone data resources
 Current culture - sense of entitlement
 Community expects free access to data
 Data providers expect to be funded ad infinitum
 Sense of ownership, sovereignty
Hard to see how this scales in a digital economy
Enter Global Biodata Consortium
 Noble cause
 Chicken and egg situation with funders:
 Funders are in a wait-and-see mode, but seeing results requires more international funders
 GBC core resources
 Identify appropriate resources
 Stimulate discussion and action between them towards sustainability
Hard to see how this moves the needle on sustainability, but
not supporting the GBC only makes it worse
Towards Biological Data Sustainability (BDS) 2.0
 Recognize that data have monetary value - easier in an AI world
 Recognize the value of public-private partnership (PPP)
 Embrace a data economy - one model based on cap and trade
 Data has value expressed in some way, e.g., credits
 Data curation as a service has value that can be traded in credits
Consider an Example - PDB
BDS 1.0
 No PPP
 Fear of funding loss
 Common data representation
 Multiple redundant global sites
 Curation done in house
 Competitive feature creep
BDS 2.0
 PPP
 Encouraged as part of the model
 Common data representation
 Single globally managed site
 Curation possibly outsourced
 Features defined by global
community
BDS 2.0 for PDB Only - Data Credits and Service
Model
 Global cap on how much will be spent on PDB, agreed by funders
 GBC or funders (the broker) create credits in accordance with the cap; credits can be
traded and audited
 Credits distributed according to current data deposition and curation workloads
 Sites processing lots of data will use up their credits and can request, via the broker,
credits from other sites that are less productive
 Sites can subcontract, paying with credits
 New sites can be allocated startup credits by the broker
 Sites that are not productive or do not conform to data standards can be refused
credits
 Private sector heavy data users will be asked to contribute credits via the broker
and can specify who gets the credits initially
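
The PDB-only credit model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a design from the paper: the class name `CreditBroker`, the site names, the cap of 1000 credits, and the rule of splitting credits in proportion to workload are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CreditBroker:
    """Hypothetical broker (GBC or funders) issuing credits under a global cap."""

    cap: int  # total credits agreed by funders (assumed unit: one credit)
    balances: dict = field(default_factory=dict)  # site name -> credits held
    ledger: list = field(default_factory=list)    # (source, destination, amount) audit trail

    def issued(self) -> int:
        """Credits currently in circulation."""
        return sum(self.balances.values())

    def allocate(self, workloads: dict, conformant: set) -> None:
        """Split unissued credits across conformant sites by workload share."""
        eligible = {s: w for s, w in workloads.items() if s in conformant and w > 0}
        total = sum(eligible.values())
        remaining = self.cap - self.issued()
        if total == 0 or remaining <= 0:
            return
        for site, load in eligible.items():
            # Floor division keeps total issuance within the cap.
            share = remaining * load // total
            self.balances[site] = self.balances.get(site, 0) + share
            self.ledger.append(("broker", site, share))

    def transfer(self, source: str, destination: str, amount: int) -> None:
        """Move credits between sites, e.g. to pay for subcontracted curation."""
        if amount <= 0 or self.balances.get(source, 0) < amount:
            raise ValueError("transfer needs a positive amount within the source balance")
        self.balances[source] -= amount
        self.balances[destination] = self.balances.get(destination, 0) + amount
        self.ledger.append((source, destination, amount))


# Assumed example: site_c is non-conformant, so it is refused credits.
broker = CreditBroker(cap=1000)
broker.allocate(
    {"site_a": 600, "site_b": 300, "site_c": 100},
    conformant={"site_a", "site_b"},
)
# The busier site requests credits from the less productive one.
broker.transfer("site_b", "site_a", 100)
```

Every allocation and transfer is appended to `ledger`, which is one simple way to make the credits auditable; a real scheme would also need rules for startup credits and private sector contributions.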
BDS 2.0 for PDB Only - Data Credits and
Service Model - Advantages/Disadvantages
Advantages
 Globalizes the enterprise
 Audits the enterprise
 Encourages competition (for
credits) across sites
 The broker can impose rules that
foster FAIR data across the
enterprise
 Private sector engagement
Disadvantages
 Requires a level of global cooperation not
hitherto seen
 Over-competition might harm
collaboration
BDS 2.0 Global vs PDB - Data Credits and
Service Model
 Goes beyond PDB sites
 Curation is decentralized
 Introduces data users and data producers (i.e., researchers) into the system
for credits
 Data resource becomes a credit broker
 Resource awards credits to data depositor when data are downloaded
 Data user expends credits to download data
 Researchers without credits can curate data to obtain credits
 Researchers with too many credits (through deposition) can offer credits to
other researchers to curate their data
 Researchers can buy and sell credits
https://datascience.virginia.edu/people/alex-gates
https://datascience.virginia.edu/people/terence-johnson
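
The researcher-level credit flow of the global BDS 2.0 model (deposit, curate, download, trade) can be sketched as follows. The class `DataResource`, the researcher and entry names, the download price of 1 credit, and the curation reward of 2 credits are assumptions for illustration only. In particular, having the resource itself issue the curation reward is one possible choice; the slides also allow depositors to pay curators directly, which `transfer` covers.

```python
from collections import defaultdict


class DataResource:
    """Hypothetical data resource acting as a credit broker for researchers."""

    def __init__(self, download_price: int = 1, curation_reward: int = 2):
        self.download_price = download_price    # assumed price per download
        self.curation_reward = curation_reward  # assumed reward per curated entry
        self.balances = defaultdict(int)        # researcher -> credits held
        self.depositor_of = {}                  # entry id -> depositing researcher

    def deposit(self, researcher: str, entry_id: str) -> None:
        """Record who deposited an entry so downloads can be credited to them."""
        self.depositor_of[entry_id] = researcher

    def curate(self, researcher: str, entry_id: str) -> None:
        """Researchers without credits can earn them by curating an entry."""
        if entry_id not in self.depositor_of:
            raise KeyError(entry_id)
        self.balances[researcher] += self.curation_reward

    def download(self, user: str, entry_id: str) -> None:
        """The user spends credits; the depositor of the entry receives them."""
        depositor = self.depositor_of[entry_id]
        if self.balances[user] < self.download_price:
            raise ValueError("not enough credits to download")
        self.balances[user] -= self.download_price
        self.balances[depositor] += self.download_price

    def transfer(self, source: str, destination: str, amount: int) -> None:
        """Researchers can trade credits, e.g. to pay someone to curate their data."""
        if amount <= 0 or self.balances[source] < amount:
            raise ValueError("transfer needs a positive amount within the source balance")
        self.balances[source] -= amount
        self.balances[destination] += amount


resource = DataResource()
resource.deposit("alice", "entry_1")   # alice is the data producer
resource.curate("bob", "entry_1")      # bob earns 2 credits by curating
resource.download("bob", "entry_1")    # bob pays 1 credit, alice receives it
resource.transfer("alice", "carol", 1) # alice passes her credit to carol
```

A researcher with no credits cannot download until they curate or receive credits from someone else, which is the incentive the model relies on.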