This document discusses approaches to improving biological data sustainability. It proposes moving from the current BDS 1.0 model to a BDS 2.0 model. BDS 1.0 is characterized by increasing data and costs but decreasing funds for innovation. BDS 2.0 would recognize the monetary value of data and embrace public-private partnerships and a data economy. It suggests a "data credits" system where data curation is a service with monetary value. The document provides examples of how this could work for the Protein Data Bank (PDB) and more globally. It argues BDS 2.0 could encourage competition, globalization, and private sector engagement to better foster sustainable and FAIR biological data.
Thoughts on Biological Data Sustainability
1. Thoughts on Biological Data Sustainability
For the Global Biodata Consortium Advisory Board Meeting
Taken from Terrence R. Johnson & Philip E. Bourne
The Biological Data Sustainability Paradox
https://arxiv.org/abs/2311.05668
Philip E. Bourne
peb6a@virginia.edu
/pebourne
January 15, 2024
2. What we propose is not intended as an answer to the biological data sustainability problem, but rather a call to think differently in collaboration with people outside biomedicine who study such problems.
3. The Current Situation
Biological Data Sustainability (BDS) 1.0
More data, more demand, more money spent, less money for innovation
Mixture of aggregated and standalone data resources
Current culture - sense of entitlement
Community expects free access to data
Data providers expect to be funded ad infinitum
Sense of ownership, sovereignty
Hard to see how this scales in a digital economy
4. Enter Global Biodata Consortium
Noble cause
Chicken and egg situation with funders:
Funders are in wait-and-see mode, but seeing results needs more international funders
GBC core resources
Identify appropriate resources
Stimulate discussion and action between them towards sustainability
Hard to see how this moves the needle on sustainability, but not supporting the GBC only makes it worse
5. Towards Biological Data Sustainability (BDS) 2.0
Recognize that data have monetary value (easier in an AI world)
Recognize the value of public-private partnership (PPP)
Embrace a data economy; one model is based on cap and trade
Data have value expressed in some way, e.g., credits
Data curation as a service has value that can be traded in credits
6. Consider an Example - PDB
BDS 1.0
No PPP
Fear of funding loss
Common data representation
Multiple redundant global sites
Curation done in house
Competitive feature creep
BDS 2.0
PPP
Encouraged as part of the model
Common data representation
Single globally managed site
Curation possibly outsourced
Features defined by global community
7. BDS 2.0 for PDB Only - Data Credits and Service Model
Global cap on how much will be spent on PDB, agreed by funders
GBC or funders (the broker) create credits in accordance with the cap; credits can be traded and audited
Credits are distributed according to current data deposition and curation workloads
Sites processing lots of data will use up their credits and can request, via the broker, credits from other sites that are less productive
Sites can subcontract, paying with credits
New sites can be allocated startup credits by the broker
Sites that are not productive or do not conform to data standards can be refused credits
Private sector heavy data users will be asked to contribute credits via the broker and can specify who gets the credits initially
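The credit rules on this slide can be sketched as a small ledger. This is only an illustrative sketch, not something the authors specify: the class name `CreditBroker`, the `reserve` held back for startup credits, the proportional allocation rule, and the site names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CreditBroker:
    """Hypothetical broker (e.g., GBC or the funders) issuing capped credits."""

    cap: int  # total credits agreed by funders (assumed to be an integer count)
    balances: dict[str, int] = field(default_factory=dict)
    ledger: list[tuple[str, str, int]] = field(default_factory=list)  # audit trail

    def issued(self) -> int:
        """Credits currently in circulation."""
        return sum(self.balances.values())

    def issue(self, site: str, amount: int) -> None:
        """Create credits for a site (e.g., startup credits) without exceeding the cap."""
        if amount <= 0 or self.issued() + amount > self.cap:
            raise ValueError("invalid amount or cap exceeded")
        self.balances[site] = self.balances.get(site, 0) + amount
        self.ledger.append(("broker", site, amount))

    def allocate(self, workloads: dict[str, int], conformant: set[str],
                 reserve: int = 0) -> None:
        """Split the unissued credits (minus a reserve) in proportion to workload.

        Sites that do not conform to data standards are refused credits.
        """
        eligible = {s: w for s, w in workloads.items() if s in conformant and w > 0}
        total = sum(eligible.values())
        pool = self.cap - self.issued() - reserve
        for site, load in eligible.items():
            share = pool * load // total  # floor division never overshoots the pool
            if share > 0:
                self.issue(site, share)

    def transfer(self, src: str, dst: str, amount: int) -> None:
        """Move credits between sites, e.g., to a busier site or a subcontractor."""
        if amount <= 0 or self.balances.get(src, 0) < amount:
            raise ValueError("invalid amount or insufficient credits")
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount
        self.ledger.append((src, dst, amount))


# Hypothetical sites and workloads, for illustration only.
broker = CreditBroker(cap=1000)
broker.allocate({"site_a": 600, "site_b": 300, "site_c": 100},
                conformant={"site_a", "site_b"}, reserve=100)
broker.issue("site_d", 50)                # startup credits for a new site
broker.transfer("site_b", "site_a", 100)  # busy site obtains credits from another
print(broker.balances)
```

Private sector contributions are left out of the sketch because the slide does not say whether they raise the cap or are drawn from it.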
8. BDS 2.0 for PDB Only - Data Credits and Service Model Advantages/Disadvantages
Advantages
Globalizes the enterprise
Audits the enterprise
Encourages competition (for credits) across sites
The broker can impose rules that foster FAIR data across the enterprise
Private sector engagement
Disadvantages
Level of global cooperation not hitherto seen
Over-competition might impact collaboration
9. BDS 2.0 Global vs PDB - Data Credits and Service Model
Goes beyond PDB sites
Curation is decentralized
Introduces data users and data producers (i.e., researchers) into the system for credits
Data resource becomes a credit broker
Resource awards credits to data depositor when data are downloaded
Data user expends credits to download data
Researchers without credits can curate data to obtain credits
Researchers with too many credits (through deposition) can offer credits to other researchers to curate their data
Researchers can buy and sell credits
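The flow of credits between depositors, users, and curators on this slide can be sketched in the same spirit. Again this is only a sketch under stated assumptions: the class `DataResource`, the prices `download_price` and `curation_reward`, the idea that the resource itself creates the curation reward, and the researcher and entry names are all hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DataResource:
    """Hypothetical data resource acting as a credit broker for researchers."""

    download_price: int = 1   # assumed cost of one download, in credits
    curation_reward: int = 2  # assumed credits earned per curated entry
    balances: dict[str, int] = field(default_factory=dict)
    depositor_of: dict[str, str] = field(default_factory=dict)

    def deposit(self, researcher: str, entry_id: str) -> None:
        """Record who deposited an entry so downloads can be credited to them."""
        self.depositor_of[entry_id] = researcher
        self.balances.setdefault(researcher, 0)

    def download(self, user: str, entry_id: str) -> None:
        """The user expends credits; the depositor is awarded them."""
        depositor = self.depositor_of[entry_id]
        if self.balances.get(user, 0) < self.download_price:
            raise ValueError("not enough credits; curate data to earn some")
        self.balances[user] -= self.download_price
        self.balances[depositor] = self.balances.get(depositor, 0) + self.download_price

    def curate(self, curator: str, entry_id: str) -> None:
        """A researcher without credits curates an existing entry to obtain credits."""
        if entry_id not in self.depositor_of:
            raise KeyError(entry_id)
        self.balances[curator] = self.balances.get(curator, 0) + self.curation_reward

    def transfer(self, src: str, dst: str, amount: int) -> None:
        """Researchers trade credits, e.g., a depositor paying another to curate."""
        if amount <= 0 or self.balances.get(src, 0) < amount:
            raise ValueError("invalid amount or insufficient credits")
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount


# Hypothetical researchers and entry, for illustration only.
resource = DataResource()
resource.deposit("alice", "entry_1")
resource.curate("bob", "entry_1")       # bob starts with no credits and earns some
resource.download("bob", "entry_1")     # bob pays, alice (the depositor) is credited
resource.transfer("alice", "bob", 1)    # alice offers a credit to bob
print(resource.balances)
```

Whether curation rewards are created by the resource, as assumed here, or paid only by depositors is a design choice the slide leaves open.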