Brett Whitty
ICGC Data Coordination Center Curation Manager
Ontario Institute for Cancer Research
Open Cloud Consortium
“Towards a Biomedical Commons Cloud” Working Group
April, 2013
Some Considerations for Enabling Users of
International Cancer Genome Consortium (ICGC)
Data in a Biomedical Compute Cloud

53 projects, 16 countries/regions, >25,000 tumors committed

ICGC Data
Current data (represents ~1/3 of goal):
• ~100GB of gzipped analysis results (open access)
◦ hosted via HTTP(S)/FTP at ICGC DCC data portal
• ~700TB raw sequencing and array datasets* (controlled access)
◦ hosted at EBI EGA repository (and other public repos)
*excluding data from TCGA projects (~50% of ICGC member projects are TCGA projects)

ICGC Data Access
• Blanket access to ICGC data granted by ICGC Data Access & Compliance Office (DACO)
◦ Excludes TCGA data for which access is granted by the TCGA project
• DACO, ICGC.org & DCC support OpenID for authentication
◦ Access to ICGC & TCGA data at NCBI, CGHub, and EBI EGA uses different authentication mechanisms
• ICGC datasets are presently distributed across several public repositories
◦ Presents a challenge to end users
◦ Need to aggregate the data through a single access point, virtually if not physically
• Ideally a single user sign-on method would be recognized by all resources
◦ May be impossible due to technical/organizational challenges

ICGC Computes (1)
• No common ICGC data analysis centers (yet)
• No common ICGC workflow systems (yet)
• No common ICGC pipelines (yet)

ICGC Computes (2)
• Who are the cloud-based data consumers?
◦ What do they need/want?
• Is it sufficient for ICGC to simply provide datasets?
• Does ICGC need to also provide canned analysis pipelines?
◦ Reproduce methods used in ICGC publications?
◦ Who creates/maintains these?
◦ Using which workflow system?

Other Issues
• Can ICGC DACO assure authorization and compliance of cloud-based data consumers?
◦ Auditing, revoking access, etc.
◦ How is this achieved?
• What are the support needs of “ICGC Cloud” users?
◦ How much effort will they require?
◦ From whom?
• What is the minimal metadata we need to collect to make the data useful?
◦ Who ensures this?