I gave this presentation at the University of New Hampshire's Center for Coastal and Ocean Mapping on April 18, 2014. It describes the marine geophysical data life cycle, a variety of resources available to help investigators navigate the world of data management, and efforts focused on optimizing the quality of publicly available data.
Navigating the Marine Geophysical Data Life Cycle
1. Navigating the Marine Geophysical
Data Life Cycle:
From Acquisition and Synthesis to
Publication and Open Data Access
Vicki Ferrini
Lamont-Doherty Earth Observatory
Columbia University
2. Research Interests
Mapping seafloor morphology to
understand processes at a variety of scales
Coastal, deep sea, rivers, lakes
Techniques for remote seafloor
characterization using multibeam sonar
Morphology
Backscatter intensity
Multibeam sonar data quality
Data preservation, integration and access
3. Increasing Importance of
Data Management
Support science and discovery
Scientific reproducibility
Costs of acquisition
Optimizing operations
Increasing volumes of data
Data policies with increasing focus on data
sharing
Data Syntheses
Data Publication
4. How can we lessen the burden of data
management for the science community?
5. A community-based data facility funded by NSF to
support, sustain, and advance the geosciences by
providing data services for observational solid earth
data from the Ocean, Earth, and Polar Sciences.
http://www.iedadata.org/
Integrated Earth Data Applications
6. Investigator-focused
Ensure Fitness for Re-use through data
stewardship
Ensure professional data curation services
Long-term archiving & access
Persistent, unique identification
Discoverability (metadata registration)
Integrate with the scholarly communication
ecosystem
Domain-specific Repositories & Services
7. Marine Geophysical Data
Bathymetry, sidescan, subbottom
Academic Seismic Facility (MCS, SCS)
Data from AUV, ROV, HOV, Ship
Complementary datasets
Navigation, bottom photos
Sample-based Data
Sample Registry (SESAR)
Geochemistry
Geochronology
Technical reports
Data Curated in IEDA Systems
9. Data Life Cycle: Plan
Data Management Plan Tool
Facilitate assembly
Inform Investigators
Inform down-stream repositories
Promote dialogue
Data Acquisition Plan
Metadata & data templates
Promote & facilitate
contemporaneous
documentation
10. Data Life Cycle: Collect & Assure
Promote Best Practices
What to document
How to document
Tools and workflows
to facilitate digital
documentation
Metadata & Data
Templates
11. Data Life Cycle: Document & Preserve
Document & capture data &
metadata as soon as it is available
Simple interfaces & guidelines
Sample metadata registry
Link to complementary data
& metadata
12. Data Life Cycle: Analyze
Tools to:
Support domain specialists
Make specialist data accessible
to non-specialist users
Integrate & visualize data
Quantitative access to Data
Syntheses
Access to complementary
data & resources
13. Data Life Cycle: Integrate & Share
Advise on what to preserve & how
Data supporting pubs
Data of value
Facilitate data prep.
Metadata requirements
Templates
Format guidelines
14. Data Life Cycle: Document & Preserve
Develop simple workflows, interfaces & templates
to capture sufficient information for:
Long-term curation & access
Inclusion in syntheses
Links to scientific publications
Data Publication
Data use, discovery & re-use
Attribution & collaboration
Data Download Stats
Data Compliance Reporting
15. Data Compliance Reporting Tool
Tool for demonstrating compliance:
Award-based
Informed by DMP
Report includes:
Data Inventory
Data release Status
Links to data
Save as PDF
http://www.iedadata.org/compliance/
18. MGDS Search & Data Catalog
Text & Map-based Search
Rich metadata
Download data files
Proprietary Hold
Password Access
Attribution
Links to Refs
Data DOIs
Download Stats
Web Services
19. MGDS Search & Data Catalog
26. Data files are great - if you know what to do
with them
How can we make data quantitatively
accessible to non-specialists?
31. GMRT Synthesis
-Multi-resolution synthesis
-Access provided to images &
gridded compilation
-9 resolution levels to 100 m
-Dynamically maintained
-Mask highlights hi-res data
-Attribution to data sources &
contributing scientists
-Source Data Includes:
-ASTER, NED, IBCAO, BEDMAP,
Smith and Sandwell
-Contributed grids
-Swath data from > 700 cruises (public domain)
32. 1992 Ridge Multibeam
Synthesis Project
2003 Expanded to
include US-funded data
from Southern Ocean
2004-present
Expanded to include
public domain data from
throughout global oceans
ongoing growth by ~80
cruises/yr
2009 G-cubed paper
(Ryan et al., 2009)
GMRTv2.6 ~780 cruises (April 2014)
Global Multi-Resolution Topography
http://gmrt.marine-geo.org
33. GMRT Components
LDEO 100-m
compilation* (raw &
processed swath files in
public domain)
Contributed
Grids (< 500 m res.)
Global & Regional
Grids (>= 500 m res.)
e.g. GEBCO_08, IBCAO
*LDEO team performs QC of ping files
MB files metadata
35. GMRT: MB Data Reduction & Synthesis
Bad navigation
Noisy outer beams
Attitude problems
Bad soundings
Instrument problems
Bad weather
Sound velocity
Slow speed in turns
Quality assessment
for grid weighting
and resolution
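The idea of using a quality assessment to weight data during gridding can be sketched as below. This is only an illustration and not GMRT's actual algorithm: the function name `weighted_grid`, the per-sounding quality weights, and the 100-unit cell size are all assumptions made for the example.

```python
from collections import defaultdict


def weighted_grid(soundings, cell_size):
    """Average depths per grid cell, weighting each sounding by a quality score.

    soundings: iterable of (x, y, depth, weight) tuples. The weight is a
    hypothetical quality score (higher means more trusted); soundings with a
    non-positive weight are treated as rejected.
    Returns a dict mapping (col, row) to the weighted mean depth.
    """
    sums = defaultdict(float)
    weights = defaultdict(float)
    for x, y, depth, w in soundings:
        if w <= 0:
            continue  # skip soundings flagged as bad
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell] += w * depth
        weights[cell] += w
    return {cell: sums[cell] / weights[cell] for cell in sums}


# Made-up soundings: two in the first cell, two in the second (one rejected).
example = [
    (10.0, 20.0, -2500.0, 1.0),
    (60.0, 40.0, -2520.0, 3.0),
    (150.0, 20.0, -2600.0, 1.0),
    (160.0, 30.0, -9999.0, 0.0),
]
grid = weighted_grid(example, cell_size=100.0)
```

In this sketch a sounding with a higher weight pulls the cell value toward itself, and a zero weight removes a bad sounding from the grid entirely.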
36. Tracking and Managing MB
Content for GMRT
MGDS
Relational DB
MB data files
GMRT Access &
Web services
44. GMRT: Next Steps
GMRT Version 2.6 April 2014
MB Data from ~50 more cruises
More Contributed grids (including LOS)
Revise GMRT MapTool (web interface)
more download format options
Enhanced Web Services
Gridded Content
Attribution
Enhanced Accessibility to Source Data
DOIs on processed source data files
Search & download multiple processed MB files
GEBCO High-Res Effort
45. How can we optimize quality of the
data being preserved?
(Good data in = good data out)
47. Focus on Raw Underway Data
Instruments permanently installed on ships
Fleet-wide solution
~500 cruises/year
Core Services
Data documentation & preservation
Programmatic Quality Assessment
Navigation Products
Event Logger
Real-time MET/TSG
R2R Data Stewardship
48. MB Raw Data Preservation
769 file sets
291,673 files
~ 7.6 TB
from 671
cruises
as of Apr 15, 2014
49. R2R: Quality Assessment
Programmatic post-cruise review of data
Identify suspicious data
Feedback to Operators
Distributable Code
Leverage existing tools where possible
(for MB data: MB-System)
Customizable thresholds
Generate QA Report
Document QA procedures
Provide info for downstream data use
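A programmatic review with customizable thresholds might look roughly like the sketch below. It is not R2R's code and does not use MB-System; the `Thresholds` fields, their default values, and the `assess_navigation` function are hypothetical names chosen for illustration. It flags suspicious navigation fixes (implausible speed, long time gaps) and returns a small report.

```python
import math
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Hypothetical, user-adjustable limits for the checks below.
    max_speed_knots: float = 15.0
    max_gap_seconds: float = 60.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of radius 6371 km."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def assess_navigation(fixes, thresholds):
    """Flag suspicious fixes. fixes: list of (epoch_seconds, lat, lon), time-sorted."""
    flags = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        if dt <= 0:
            flags.append((t1, "non-increasing time"))
            continue
        if dt > thresholds.max_gap_seconds:
            flags.append((t1, "time gap"))
        speed_knots = haversine_m(la0, lo0, la1, lo1) / dt * 1.943844  # m/s to knots
        if speed_knots > thresholds.max_speed_knots:
            flags.append((t1, "speed"))
    return {"n_fixes": len(fixes), "n_flags": len(flags), "flags": flags}


# Made-up track: a position jump at t=20 and a 180 s gap before t=200.
fixes = [(0, 0.0, 0.0), (10, 0.0, 0.0005), (20, 0.0, 0.01), (200, 0.0, 0.0105)]
report = assess_navigation(fixes, Thresholds())
```

Keeping the limits in one small object makes them easy to adjust per vessel or instrument, and the returned dictionary could be written out as part of a QA report for downstream users.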
53. Multibeam Advisory Committee
Community of Stakeholders
Fleet-wide Approach
Best Practices
Technical Resources
Technical Teams
Shipboard Acceptance
Acoustic Noise
Quality Assurance
Help Desk
http://mac.unols.org/
P. Johnson (UNH) & J. Beaudoin (UNH)
54. MAC Accomplishments
Test Reports Gathered and Posted
Tools
SVP Editor Tool
SVP Mission Planning Tool
Best Practice Cookbooks
Ship visits
Acoustic Noise Testing
Quality Assurance
Sea Acceptance
Assistance to Operators & Investigators
58. How can we lessen the burden?
Simple workflows, interfaces, & guidelines
How can we engage the science community?
High-value Content
Reward (attribution, citation)
How can we make data accessible to non-specialists?
Data synthesis
How can we optimize data quality?
Best practices at acquisition
Summary:
59. How to Navigate the
Data Life Cycle
Know what resources are available
Tools to make process easier
Access existing Data
Communicate
Upstream
Downstream (Data Managers)
Plan ahead
Document contemporaneously
Treat data as a valuable community resource
Participate! Input always needed for:
Metadata & data format standards
Usability of interfaces
Editor's Notes
#2: Seafloor bathymetry data are needed for a broad spectrum of studies, but coverage is exceedingly sparse and all multibeam data are of high value.
#32: Seafloor bathymetry data are needed for a broad spectrum of studies, but coverage is exceedingly sparse and all multibeam data are of high value.
#39: Through GeoMapApp the original data source can be identified. Here, clicking on the Italian data set Bouvet-Ligi highlights the location of the survey in an enclosed box on the map view.
#44: Copyright claims cannot be made by data contributors for the synthesized products.