Navigating the Marine Geophysical Data Life Cycle: From Acquisition and Synthesis to Publication and Open Data Access
Vicki Ferrini
Lamont-Doherty Earth Observatory
Columbia University

Research Interests
- Mapping seafloor morphology to understand processes at a variety of scales
  - Coastal, deep sea, rivers, lakes
- Techniques for remote seafloor characterization using multibeam sonar
  - Morphology
  - Backscatter intensity
- Multibeam sonar data quality
- Data preservation, integration and access

Increasing Importance of Data Management
- Support science and discovery
- Scientific reproducibility
- Costs of acquisition
- Optimizing operations
- Increasing volumes of data
- Data policies with increasing focus on data sharing
- Data Syntheses
- Data Publication

How can we lessen the burden of data management for the science community?

Integrated Earth Data Applications (IEDA)
A community-based data facility funded by NSF to support, sustain, and advance the geosciences by providing data services for observational solid earth data from the Ocean, Earth, and Polar Sciences.
http://www.iedadata.org/

Domain-specific Repositories & Services
- Investigator-focused
- Ensure fitness for re-use through data stewardship
- Ensure professional data curation services
- Long-term archiving & access
- Persistent, unique identification
- Discoverability (metadata registration)
- Integrate with the scholarly communication ecosystem

Data Curated in IEDA Systems
- Marine Geophysical Data
  - Bathymetry, sidescan, subbottom
  - Academic Seismic Facility (MCS, SCS)
  - Data from AUV, ROV, HOV, Ship
  - Complementary datasets (navigation, bottom photos)
- Sample-based Data
  - Sample Registry (SESAR)
  - Geochemistry
  - Geochronology
- Technical reports

Data Life Cycle: Plan
- Data Management Plan Tool
  - Facilitate assembly
  - Inform investigators
  - Inform downstream repositories
  - Promote dialogue
- Data Acquisition Plan
  - Metadata & data templates
  - Promote & facilitate contemporaneous documentation

Data Life Cycle: Collect & Assure
- Promote best practices
  - What to document
  - How to document
- Tools and workflows to facilitate digital documentation
  - Metadata & data templates
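
The Plan and Collect & Assure slides call for metadata templates that are filled in while data are collected. A minimal sketch of an automated completeness check on such a template, assuming a hypothetical list of required fields and ISO 8601 dates (the actual IEDA/MGDS templates define their own fields):

```python
from datetime import date

# Hypothetical required fields; real repository templates define their own list.
REQUIRED_FIELDS = ("cruise_id", "vessel", "chief_scientist", "instrument",
                   "start_date", "end_date")


def missing_fields(record: dict) -> list[str]:
    """Return required fields that are absent or empty in a metadata record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]


def check_dates(record: dict) -> list[str]:
    """Flag unparseable dates or an end date earlier than the start date."""
    try:
        start = date.fromisoformat(record["start_date"])
        end = date.fromisoformat(record["end_date"])
    except (KeyError, TypeError, ValueError):
        return ["start_date/end_date missing or not ISO 8601"]
    return ["end_date precedes start_date"] if end < start else []


if __name__ == "__main__":
    rec = {"cruise_id": "XX1401", "vessel": "R/V Example",   # hypothetical values
           "instrument": "EM122", "start_date": "2014-04-01",
           "end_date": "2014-03-28"}
    print(missing_fields(rec))  # ['chief_scientist']
    print(check_dates(rec))     # ['end_date precedes start_date']
```

Running checks like these at sea, rather than at submission time, is one way to make "document contemporaneously" cheap for investigators.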

Data Life Cycle: Document & Preserve
- Document & capture data & metadata as soon as they are available
- Simple interfaces & guidelines
- Sample metadata registry
- Link to complementary data & metadata

Data Life Cycle: Analyze
- Tools to:
  - Support domain specialists
  - Make specialist data accessible to non-specialist users
  - Integrate & visualize data
- Quantitative access to data syntheses
- Access to complementary data & resources

Data Life Cycle: Integrate & Share
- Advise on what to preserve & how
  - Data supporting publications
  - Data of value
- Facilitate data preparation
  - Metadata requirements
  - Templates
  - Format guidelines
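
Format guidelines are easiest to follow when they can be checked mechanically before submission. A minimal sketch for a CSV table of point data, assuming hypothetical column names `lon`, `lat`, `depth_m` (a repository's real guidelines may require different columns and units):

```python
import csv
import io

# Hypothetical column names; actual format guidelines may differ.
REQUIRED_COLUMNS = ("lon", "lat", "depth_m")


def validate_table(text: str) -> list[str]:
    """Return human-readable problems found in a CSV table of point data."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Data rows start on line 2 of the file; line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            lon, lat = float(row["lon"]), float(row["lat"])
            float(row["depth_m"])
        except (TypeError, ValueError):  # short row (None) or non-numeric text
            problems.append(f"line {line_no}: non-numeric value")
            continue
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            problems.append(f"line {line_no}: coordinates out of range")
    return problems
```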

Data Life Cycle: Document & Preserve
Develop simple workflows, interfaces & templates to capture sufficient information for:
- Long-term curation & access
- Inclusion in syntheses
- Links to scientific publications
- Data publication
- Data use, discovery & re-use
- Attribution & collaboration
- Data download stats
- Data compliance reporting

Data Compliance Reporting Tool
- Tool for demonstrating compliance:
  - Award-based
  - Informed by the DMP
- Report includes:
  - Data inventory
  - Data release status
  - Links to data
- Save as PDF
http://www.iedadata.org/compliance/
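
The compliance report is organized by award and lists each dataset with its release status and a link. A minimal sketch of that structure, assuming hypothetical field names and plain-text output (the actual tool renders a PDF, and its internals are not described here):

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Dataset:
    title: str
    release_date: Optional[date] = None  # None: release not yet scheduled
    url: str = ""                         # link to the archived data, if any


@dataclass
class AwardReport:
    award_id: str  # hypothetical award identifier
    datasets: list[Dataset] = field(default_factory=list)

    @staticmethod
    def status(ds: Dataset, today: date) -> str:
        """Release status of one dataset relative to the reporting date."""
        if ds.release_date is None:
            return "not scheduled"
        return "released" if ds.release_date <= today else "on hold"

    def render(self, today: date) -> str:
        """Header line, then one inventory line per dataset."""
        lines = [f"Data compliance report: award {self.award_id}, {today.isoformat()}"]
        for ds in self.datasets:
            lines.append(f"- {ds.title}: {self.status(ds, today)} {ds.url}".rstrip())
        return "\n".join(lines)


if __name__ == "__main__":
    report = AwardReport("HYPOTHETICAL-001", [
        Dataset("Multibeam bathymetry", date(2014, 1, 15), "https://example.org/data/1"),
        Dataset("Rock sample geochemistry", date(2015, 6, 1)),
    ])
    print(report.render(date(2014, 4, 15)))
```

Deriving status from a release date, instead of storing it, keeps the report current without manual edits when a proprietary hold expires.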

How do we engage the science community?
http://www.marine-geo.org/

MGDS Search & Data Catalog
- Text & map-based search
- Rich metadata
- Download data files
- Proprietary hold
- Password access
- Attribution
- Links to references
- Data DOIs
- Download stats
- Web services
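
The catalog combines map-based search with a proprietary hold and password access. A minimal sketch of a bounding-box search that hides held entries from anonymous users, assuming a hypothetical record layout (the MGDS schema and web-service API are not shown in the slides):

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    name: str
    west: float
    south: float
    east: float
    north: float
    hold_until: date  # end of proprietary hold (hypothetical field)


def intersects(e: Entry, west: float, south: float, east: float, north: float) -> bool:
    """Axis-aligned overlap test; ignores antimeridian crossing for simplicity."""
    return not (e.east < west or e.west > east or e.north < south or e.south > north)


def search(entries, bbox, today: date, authorized: bool = False) -> list[str]:
    """Names of entries overlapping bbox (west, south, east, north).

    Held entries are returned only when the caller is authorized, which stands
    in for the catalog's password access.
    """
    west, south, east, north = bbox
    return [e.name for e in entries
            if intersects(e, west, south, east, north)
            and (authorized or e.hold_until <= today)]
```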

Data files are great, if you know what to do with them.
How can we make data quantitatively accessible to non-specialists?

GeoMapApp
- Free Java desktop app
- Basic GIS functionality
- Core functionality:
  - GMRT basemap
  - Gridded & tabular data
  - Linked views
- Access online datasets (grids, shapefiles, tables)
- Attribution & links to source data
- Custom portals
  - Underway geophysics, MB sonar, DSDP, PetDB
- Import & export
  - Table, image, grid, KMZ
http://www.geomapapp.org/
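
KMZ, one of the export formats listed above, is a zip archive whose main document is a KML file (conventionally named `doc.kml`). A minimal sketch of exporting a table of named points to KMZ with the Python standard library; this illustrates the file format only and is not GeoMapApp's exporter:

```python
import io
import zipfile
from xml.sax.saxutils import escape

KML_NS = "http://www.opengis.net/kml/2.2"


def points_to_kml(points) -> str:
    """Build a minimal KML document from (name, lon, lat) tuples."""
    placemarks = "".join(
        f"<Placemark><name>{escape(name)}</name>"
        f"<Point><coordinates>{lon},{lat}</coordinates></Point></Placemark>"
        for name, lon, lat in points  # KML coordinate order is lon,lat
    )
    return (f'<?xml version="1.0" encoding="UTF-8"?>'
            f'<kml xmlns="{KML_NS}"><Document>{placemarks}</Document></kml>')


def write_kmz(points, target) -> None:
    """Write a KMZ (zip holding doc.kml) to a path or a binary file object."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("doc.kml", points_to_kml(points))


if __name__ == "__main__":
    buf = io.BytesIO()
    write_kmz([("Station 1", -45.5, 12.25)], buf)  # hypothetical point
    print(len(buf.getvalue()), "bytes of KMZ")
```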

GeoMapApp default basemap: GMRT

GMRT Synthesis
- Multi-resolution synthesis
- Access provided to images & gridded compilation
- 9 resolution levels to 100 m
- Dynamically maintained
- Mask highlights hi-res data
- Attribution to data sources & contributing scientists
- Source data includes:
  - ASTER, NED, IBCAO, BEDMAP, Smith and Sandwell
  - Contributed grids
  - Swath data from > 700 cruises (public domain)

Global Multi-Resolution Topography (GMRT)
- 1992: Ridge Multibeam Synthesis Project
- 2003: Expanded to include US-funded data from the Southern Ocean
- 2004-present: Expanded to include public domain data from throughout the global oceans
  - Ongoing growth by ~80 cruises/yr
- 2009: G-cubed paper (Ryan et al., 2009)
- GMRT v2.6: ~780 cruises (April 2014)
http://gmrt.marine-geo.org/
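
The slides give nine resolution levels with a finest cell size of 100 m but do not state the step between levels. A minimal sketch, assuming the cell size doubles from one level to the next (a common choice for multi-resolution tile pyramids, not confirmed here for GMRT), of listing the levels and picking one for a requested map resolution:

```python
FINEST_M = 100.0  # finest level, per the slide
N_LEVELS = 9      # number of levels, per the slide
FACTOR = 2.0      # ASSUMPTION: cell size doubles from one level to the next


def level_resolutions() -> list[float]:
    """Cell size in metres per level, from level 0 (finest) to the coarsest."""
    return [FINEST_M * FACTOR ** k for k in range(N_LEVELS)]


def pick_level(requested_m: float) -> int:
    """Coarsest level whose cells are no larger than the requested cell size.

    Requests finer than the finest level fall back to level 0.
    """
    best = 0
    for k, res in enumerate(level_resolutions()):
        if res <= requested_m:
            best = k
    return best
```

Under this assumption the coarsest level would be 100 m × 2^8 = 25,600 m, and a 1 km map would be served from the 800 m level.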

GMRT Components
- LDEO 100-m compilation* (raw & processed swath files in public domain)
- Contributed grids (< 500 m res.)
- Global & regional grids (>= 500 m res.), e.g. GEBCO_08, IBCAO
*LDEO team performs QC of ping files and MB file metadata
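
The components form a stack: the LDEO 100 m compilation, then contributed grids, then global and regional grids. A minimal sketch of priority compositing on a shared grid, assuming all layers are already resampled to the same cells and `None` marks no data (GMRT's actual merging, e.g. any blending at seams, may differ). The returned source index is one way to derive the mask that highlights high-resolution coverage:

```python
from typing import Optional, Sequence

Grid = list[list[Optional[float]]]


def composite(layers: Sequence[Grid]) -> tuple[Grid, list[list[int]]]:
    """Merge same-shaped grids; layers[0] has the highest priority.

    Returns the merged grid and, per cell, the index of the layer that supplied
    the value (-1 where no layer has data).
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    merged: Grid = [[None] * cols for _ in range(rows)]
    source = [[-1] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k, layer in enumerate(layers):
                if layer[i][j] is not None:  # first layer with data wins
                    merged[i][j] = layer[i][j]
                    source[i][j] = k
                    break
    return merged, source


if __name__ == "__main__":
    hi_res = [[-1000.0, None], [None, None]]        # hypothetical 100 m data
    contributed = [[-999.0, -1500.0], [None, None]]
    base = [[-990.0, -1490.0], [-2000.0, None]]
    grid, src = composite([hi_res, contributed, base])
    mask = [[s == 0 for s in row] for row in src]   # True where 100 m data used
    print(grid, mask)
```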

GMRT: Access
- Images & gridded data
- Desktop apps
  - GeoMapApp
  - Virtual Ocean
- Web app
  - GMRT MapTool
- iPad/iPhone app
  - Earth Observer
- Web Map Services
  - Images & mask
Export as: NetCDF, Arc ASCII, Binary, Fledermaus, KMZ, PNG, GeoTIFF
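
Arc ASCII, one of the export formats above, is the Esri ASCII raster format: a six-line header (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`, `NODATA_value`) followed by rows of values from north to south. A minimal writer, as an illustration of the format rather than GMRT's exporter; the NODATA value chosen here is arbitrary:

```python
def to_arc_ascii(rows, xll: float, yll: float, cellsize: float,
                 nodata: float = -99999.0) -> str:
    """Serialize a grid (list of rows, northernmost row first) as Esri ASCII raster.

    xll/yll are the coordinates of the lower-left corner of the grid.
    None values are written as the NODATA value.
    """
    nrows, ncols = len(rows), len(rows[0])
    if any(len(r) != ncols for r in rows):
        raise ValueError("all rows must have the same length")
    header = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    body = [" ".join(str(nodata if v is None else v) for v in row) for row in rows]
    return "\n".join(header + body) + "\n"
```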

GMRT: MB Data Reduction & Synthesis
- Bad navigation
- Noisy outer beams
- Attitude problems
- Bad soundings
- Instrument problems
- Bad weather
- Sound velocity
- Slow speed in turns
- Quality assessment for grid weighting and resolution
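
Two of the problems above, noisy outer beams and slow speed in turns, lend themselves to simple rules, and quality assessment feeds grid weighting. A minimal sketch with hypothetical thresholds and a made-up linear weighting, not the LDEO team's actual procedure (real cutoffs depend on sonar model and survey conditions):

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
MAX_BEAM_ANGLE_DEG = 60.0  # drop soundings farther outboard than this
MIN_SPEED_KNOTS = 3.0      # drop pings logged while slowing or turning


@dataclass
class Sounding:
    depth_m: float
    beam_angle_deg: float  # signed angle from vertical
    ship_speed_kn: float


def keep(s: Sounding) -> bool:
    """Reject outer-beam soundings and pings recorded at low ship speed."""
    return abs(s.beam_angle_deg) <= MAX_BEAM_ANGLE_DEG and s.ship_speed_kn >= MIN_SPEED_KNOTS


def weight(s: Sounding) -> float:
    """Illustrative gridding weight: 1.0 at nadir, falling linearly to 0.5 at the cutoff."""
    return 1.0 - 0.5 * abs(s.beam_angle_deg) / MAX_BEAM_ANGLE_DEG


def clean(soundings) -> list[tuple[Sounding, float]]:
    """Kept soundings paired with their weights."""
    return [(s, weight(s)) for s in soundings if keep(s)]
```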

Tracking and Managing MB Content for GMRT
[Diagram: MGDS relational DB, MB data files, GMRT access & web services]
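
MB content for GMRT is tracked in the MGDS relational database, which also supports attribution. A minimal sketch using SQLite with a hypothetical two-table schema (the real MGDS schema is not shown) and a query that lists the cruises contributing gridded data to a region:

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE cruise (
    cruise_id TEXT PRIMARY KEY,
    vessel    TEXT NOT NULL,
    chief_sci TEXT
);
CREATE TABLE mb_file (
    file_id   INTEGER PRIMARY KEY,
    cruise_id TEXT NOT NULL REFERENCES cruise(cruise_id),
    path      TEXT NOT NULL,
    west REAL, south REAL, east REAL, north REAL,
    in_gmrt   INTEGER NOT NULL DEFAULT 0  -- 1 once the file passed QC and was gridded
);
"""


def contributing_cruises(conn, west, south, east, north):
    """Cruises with at least one gridded file whose bounding box overlaps the region."""
    sql = """
        SELECT DISTINCT c.cruise_id, c.vessel, c.chief_sci
        FROM cruise c JOIN mb_file f ON f.cruise_id = c.cruise_id
        WHERE f.in_gmrt = 1
          AND f.east >= ? AND f.west <= ? AND f.north >= ? AND f.south <= ?
        ORDER BY c.cruise_id
    """
    return conn.execute(sql, (west, east, south, north)).fetchall()
```

Keeping the inclusion flag per file, not per cruise, lets a cruise be credited only for the files that actually reached the grid.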

GMRT: Attribution & Access to Source Data

Cruise-level Attribution & Provenance

GMRT: Next Steps
- GMRT Version 2.6 (April 2014)
  - MB data from ~50 more cruises
  - More contributed grids (including LOS)
- Revise GMRT MapTool (web interface)
  - More download format options
- Enhanced web services
  - Gridded content
  - Attribution
- Enhanced accessibility to source data
  - DOIs on processed source data files
  - Search & download multiple processed MB files
- GEBCO High-Res Effort
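
GMRT images and its mask are offered through Web Map Services, and enhanced web services are a next step. WMS GetMap requests follow the OGC key-value convention. A minimal sketch of building a WMS 1.3.0 GetMap URL, where the endpoint and layer name are hypothetical placeholders (a service's GetCapabilities response lists the real ones):

```python
from urllib.parse import urlencode

# Hypothetical endpoint and layer name, for illustration only.
WMS_ENDPOINT = "https://example.org/wms"
LAYER = "topo"


def getmap_url(west, south, east, north, width=800, height=600, fmt="image/png"):
    """Build an OGC WMS 1.3.0 GetMap URL in EPSG:4326.

    WMS 1.3.0 uses latitude/longitude axis order for EPSG:4326,
    so BBOX is written as south,west,north,east.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": LAYER,
        "STYLES": "",
        "CRS": "EPSG:4326",
        "BBOX": f"{south},{west},{north},{east}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{WMS_ENDPOINT}?{urlencode(params)}"
```

The axis-order rule is a frequent source of swapped maps; with WMS 1.1.1 or `CRS:84` the box would instead be given as west,south,east,north.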

How can we optimize the quality of the data being preserved?
(Good data in = good data out)

Complementary Fleet-Wide Efforts
GMRT (1992), R2R (2009), MAC (2011)
GOAL: Well-documented, high-quality, publicly available data

R2R (Rolling Deck to Repository) Data Stewardship
- Focus on raw underway data
  - Instruments permanently installed on ships
- Fleet-wide solution
  - ~500 cruises/year
- Core services
  - Data documentation & preservation
  - Programmatic quality assessment
  - Navigation products
  - Event logger
  - Real-time MET/TSG

MB Raw Data Preservation
- 769 file sets
- 291,673 files
- ~7.6 TB
- from 671 cruises
(as of Apr 15, 2014)
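
At this scale (hundreds of thousands of raw files), preservation usually relies on checksum manifests so that later copies can be verified. The slides do not describe R2R's mechanism; a minimal sketch, assuming SHA-256 checksums and a hypothetical directory layout:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large sonar files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its checksum."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries that are missing or whose checksum changed."""
    bad = []
    for rel, digest in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != digest:
            bad.append(rel)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "raw").mkdir()
        (root / "raw" / "0001.all").write_bytes(b"ping data")  # hypothetical file
        manifest = build_manifest(root)
        (root / "raw" / "0001.all").write_bytes(b"corrupted")
        print(verify(root, manifest))  # ['raw/0001.all']
```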

R2R: Quality Assessment
- Programmatic post-cruise review of data
  - Identify suspicious data
  - Feedback to operators
- Distributable code
  - Leverage existing tools where possible (for MB data: MB-System)
  - Customizable thresholds
- Generate QA report
  - Document QA procedures
  - Provide info for downstream data use
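
The QA slide describes automated post-cruise tests with customizable thresholds that produce a report. A minimal sketch of that pattern, using two hypothetical navigation tests and threshold values; R2R's actual tests are not listed here, and for MB data it builds on MB-System, which this sketch does not call:

```python
from dataclasses import dataclass

# Hypothetical tests and thresholds; operators could override these per instrument.
THRESHOLDS = {
    "nav_valid_fraction_min": 0.95,  # share of navigation fixes that are usable
    "nav_max_gap_s": 60.0,           # longest allowed gap between fixes, seconds
}


@dataclass
class Result:
    test: str
    value: float
    passed: bool


def assess(nav_times_s: list[float], valid_flags: list[bool], thresholds=THRESHOLDS) -> list[Result]:
    """Run the navigation tests against the given thresholds."""
    frac = sum(valid_flags) / len(valid_flags) if valid_flags else 0.0
    gaps = [b - a for a, b in zip(nav_times_s, nav_times_s[1:])]
    max_gap = max(gaps, default=0.0)
    return [
        Result("nav_valid_fraction", frac, frac >= thresholds["nav_valid_fraction_min"]),
        Result("nav_max_gap_s", max_gap, max_gap <= thresholds["nav_max_gap_s"]),
    ]


def report(cruise_id: str, results: list[Result]) -> str:
    """Plain-text QA report with one PASS/FLAG line per test."""
    lines = [f"QA report: {cruise_id}"]
    for r in results:
        lines.append(f"{r.test}: {r.value:.3f} {'PASS' if r.passed else 'FLAG'}")
    return "\n".join(lines)
```

Passing thresholds as data, rather than hard-coding them, is what makes the same distributable code usable across ships with different sensors.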

R2R: QA Dashboard
- By cruise
- By ship
- By instrument
- By test
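
The dashboard views QA outcomes by cruise, ship, instrument, or test. A minimal sketch of that aggregation over flat, hypothetical result records:

```python
from collections import Counter
from typing import NamedTuple


class QARecord(NamedTuple):
    cruise: str
    ship: str
    instrument: str
    test: str
    passed: bool


def pass_rates(records, by: str) -> dict[str, float]:
    """Fraction of passed tests grouped by one field: cruise, ship, instrument or test."""
    if by not in QARecord._fields or by == "passed":
        raise ValueError(f"cannot group by {by!r}")
    totals: Counter = Counter()
    passes: Counter = Counter()
    for r in records:
        key = getattr(r, by)
        totals[key] += 1
        passes[key] += r.passed  # bool counts as 0 or 1
    return {k: passes[k] / totals[k] for k in totals}
```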

R2R: MB Quality Assessment
Lead: S. O'Hara (LDEO)

Multibeam Advisory Committee (MAC)
- Community of stakeholders
- Fleet-wide approach
- Best practices
- Technical resources
- Technical teams
  - Shipboard acceptance
  - Acoustic noise
  - Quality assurance
- Help desk
http://mac.unols.org/
P. Johnson (UNH) & J. Beaudoin (UNH)

MAC Accomplishments
- Test reports gathered and posted
- Tools
  - SVP Editor Tool
  - SVP Mission Planning Tool
- Best practice cookbooks
- Ship visits
  - Acoustic noise testing
  - Quality assurance
  - Sea acceptance
- Assistance to operators & investigators
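
Sound velocity is listed earlier as a source of MB errors, and two MAC tools work with sound velocity profiles (SVPs). The slides do not say how those tools compute sound speed. As an illustrative stand-in only, here is the widely used nine-term empirical equation of Mackenzie (1981) for sound speed in seawater, which is not claimed to be what the SVP Editor or Mission Planning tools use:

```python
def mackenzie_sound_speed(t_c: float, s_psu: float, d_m: float) -> float:
    """Sound speed in seawater (m/s) from Mackenzie (1981).

    t_c: temperature (degC), s_psu: salinity, d_m: depth (m).
    Nominal validity: 2-30 degC, salinity 25-40, depth 0-8000 m.
    """
    ds = s_psu - 35.0
    return (1448.96 + 4.591 * t_c - 5.304e-2 * t_c**2 + 2.374e-4 * t_c**3
            + 1.340 * ds + 1.630e-2 * d_m + 1.675e-7 * d_m**2
            - 1.025e-2 * t_c * ds - 7.139e-13 * t_c * d_m**3)


if __name__ == "__main__":
    # Hypothetical CTD samples: (depth m, temperature degC, salinity)
    cast = [(0.0, 25.0, 35.5), (500.0, 8.0, 34.6), (2000.0, 3.0, 34.9)]
    for d, t, s in cast:
        print(f"{d:7.1f} m  {mackenzie_sound_speed(t, s, d):8.2f} m/s")
```

At 10 degC, salinity 35 and 1000 m depth the formula gives about 1506.3 m/s.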

Reports from Technical Teams

Technical Resources

GMRT, R2R MBQA & MAC

Summary:
- How can we lessen the burden?
  - Simple workflows, interfaces, & guidelines
- How can we engage the science community?
  - High-value content
  - Reward (attribution, citation)
- How can we make data accessible to non-specialists?
  - Data synthesis
- How can we optimize data quality?
  - Best practices at acquisition

How to Navigate the Data Life Cycle
- Know what resources are available
  - Tools to make the process easier
  - Access existing data
- Communicate
  - Upstream
  - Downstream (data managers)
- Plan ahead
- Document contemporaneously
- Treat data as a valuable community resource
- Participate! Input is always needed for:
  - Metadata & data format standards
  - Usability of interfaces

Editor's Notes

  • #2: Seafloor bathymetry data are needed for a broad spectrum of studies, but coverage is exceedingly sparse and all multibeam data are of high value.
  • #32: Seafloor bathymetry data are needed for a broad spectrum of studies, but coverage is exceedingly sparse and all multibeam data are of high value.
  • #39: Through GeoMapApp, the original data source can be identified. Here, clicking on the Italian data set Bouvet-Ligi highlights the survey location with an enclosing box in the map view.
  • #44: Data contributors cannot claim copyright on the synthesized products.