The Lambda Architecture was implemented at Mayo Clinic to optimize an existing natural language processing pipeline and to replace a free-text search facility for colorectal cancer. The architecture uses Storm for real-time processing of up to 1.5 million documents per hour, with an average ingest-to-persistence latency of 60 milliseconds for HL7 messages. It provides a foundation for event-based, real-time, and batch processing as well as data discovery and analytics delivery. The implementation delivers operational benefits such as faster document annotation and millisecond-scale free-text search.
2. Background of Lambda Architecture
Reference architecture for Big Data systems
Designed by Nathan Marz (Twitter)
Defined as a system that runs arbitrary functions on arbitrary data: query = function(all data)
Design Principles
Human fault tolerance, immutability, recomputation
Lambda Layers
Batch - Contains the immutable, constantly growing master dataset.
Speed - Deals only with new data and compensates for the high-latency updates of the serving layer.
Serving - Loads and exposes the combined view of the data so that it can be queried (a minimal sketch of the three layers follows below).
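To make the layers concrete, here is a minimal Python sketch of the pattern (not from the original deck): the master dataset is append-only, the batch view is a pure function over all of the data (query = function(all data)), the speed layer covers only events that arrived after the last batch run, and the serving layer merges the two views. All names and data are illustrative.

```python
from collections import Counter

# Immutable, append-only master dataset (input to the batch layer).
# Each event is a (document_id, annotation) pair; values are illustrative.
master_dataset = [
    ("doc-1", "colorectal_cancer"),
    ("doc-2", "benign_polyp"),
    ("doc-3", "colorectal_cancer"),
]

def batch_view(all_data):
    """Batch layer: recompute the view from scratch over ALL data,
    i.e. query = function(all data). Simple and human fault-tolerant,
    but high latency."""
    return Counter(annotation for _, annotation in all_data)

def speed_view(recent_events):
    """Speed layer: incremental view over only the events that arrived
    after the last batch run, compensating for batch latency."""
    return Counter(annotation for _, annotation in recent_events)

def serve(batch, speed):
    """Serving layer: merge the precomputed batch view with the
    real-time view and expose the combined result to queries."""
    merged = Counter(batch)
    merged.update(speed)
    return merged

# Events that streamed in while the next batch run is still pending.
recent_events = [("doc-4", "colorectal_cancer")]

combined = serve(batch_view(master_dataset), speed_view(recent_events))
print(combined["colorectal_cancer"])  # -> 3
```

In the pattern as the deck describes it, the speed view would be maintained by a streaming engine such as Storm, while the batch view is periodically recomputed over the full, immutable master dataset.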
5. Mayo Clinic History
Every year, more than a million people from all 50 states and nearly 150 countries come for care
Dozens of locations in several states, with major campuses in Rochester, Minn.; Scottsdale and Phoenix, Ariz.; and Jacksonville, Fla.
Mayo Clinic Rochester, Minn. recognized as the top hospital in the nation for 2014-2015 by U.S. News & World Report
6. Why Big Data?
Challenges in Medical Data
Health data tends to be wide, not deep
New data types are becoming more important
Unstructured
Real-time streaming
A general challenge: moving from retrospective BI viewing to event-based and predictive analytics usage
Multiple layers
Lots of events, data
Complex
Lots of different languages and data structures
Difficult to maintain
Lots of moving pieces/components/technologies
Lots of changes in the business
7. Data Discovery
Many Big Data stories start with data discovery
The Data Lake, etc.
But, data discovery is not predictable!
Mayo Clinic needed to define a real operational need that a Big Data technology stack could fulfill
8. Project
Optimize an existing Natural Language Processing pipeline in support of critical Colorectal Surgery (move to tens of thousands of documents processed)
Replace an existing free-text search facility used by the Clinical Web Service for colorectal cancer (move search to milliseconds)
10. Operational Statistics
Current Storm throughput up to 1.5 million documents per hour
Average of 140,000 HL7 messages actually processed per day, with an average latency of 60 milliseconds from ingest to persistence
Average of 50,000 documents passed through annotators per day versus 5,000 historically
Actual annotation of documents up to 6 times faster than previously accomplished
Free-text search use cases that took over 30 minutes on the old infrastructure now complete in milliseconds in Elasticsearch
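As a rough illustration of the kind of free-text query that now completes in milliseconds, the sketch below issues a standard Elasticsearch match query against the REST _search endpoint from Python. The host, index name (clinical_notes), and field name (note_text) are assumptions made for the example, not details given in the deck.

```python
import requests

# Assumed host, index, and field names; the deck does not specify them.
ES_SEARCH_URL = "http://localhost:9200/clinical_notes/_search"

query = {
    "query": {
        "match": {                       # full-text match query
            "note_text": "colorectal cancer"
        }
    },
    "size": 10,                          # return the top 10 hits
}

# POST to the standard _search endpoint; the response includes the
# server-side query time in milliseconds ("took") and the matching hits.
response = requests.post(ES_SEARCH_URL, json=query, timeout=10)
response.raise_for_status()
result = response.json()

print("query time (ms):", result["took"])
print("hits returned:", len(result["hits"]["hits"]))
```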
11. Summary
Benefits
An architecturally-driven, internally-owned technology stack that blends:
- An event-based/real-time processing fabric
- A multi-destination distillation hub
- A foundation for Classic BI delivery techniques
- A foundation for Services-based delivery techniques
- A serendipitous discovery environment
Mutually supportive components that combine to deliver novel clinical solutions
Data continuity
- Historical data can be assessed as algorithms change over time
12. Thank you! We're Hiring!
thinkbigcareers.teradata.com
Altan Khendup (@madmongol)
Altan.khendup@teradata.com
Ron Bodkin (@ronbodkin)
Ron.bodkin@thinkbiganalytics.com
Editor's Notes
#3: Lambda = architectural pattern to talk about the complexity of dealing with real-time and historical datasets
Overall use
Prescriptive/Predictive uses rely on some dimension of real-time
Use cases
CPG (consumer goods): looking at what customers are doing in real time and making adjustments
Medical: real-time medical sensors, treatment, and labs for critical patient care
Financial: credit risk and transaction fraud
Manufacturers: IoT/telematics getting information from their plants and logistics, cross-referencing to inventory, and making adjustments to the supply chain
#4: General architecture that covers how Lambda works overall
Able to address real-time and historical data
Layers
Speed: real-time/current data streams (Spark, Storm, etc.)
Batch: historical data layer
Serving: ability to take the current and historical data, merge the results, and provide that to the organization
Real-world experience/strategy
Do not tackle all of the data but rather necessary segments of business functionality called queries
Data can be tackled per query, hence the idea of query-focused datasets, or QFDs
Allows for more focused results/faster speed gains
#11: HL7: actual processing rate is based on pull requests from users, not actual processing power
HL7 messages are large XML-based documents
Much larger than, say, JSON or other formats (roughly 800 KB-900 KB in size)
They contain significant data related to medical information
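Since the notes describe the HL7 payloads as large XML-based documents, the sketch below shows one plausible way to pull narrative section text out of such a message with Python's standard library before handing it to the NLP annotators. The document structure, element names, and namespace are assumptions loosely modeled on HL7 CDA, not details taken from the deck.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for an HL7 CDA-style XML document; the real messages
# described above run roughly 800-900 KB. Element names are illustrative.
sample_hl7_xml = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component>
    <section>
      <title>Colonoscopy Report</title>
      <text>Findings consistent with colorectal cancer.</text>
    </section>
  </component>
</ClinicalDocument>"""

NS = {"hl7": "urn:hl7-org:v3"}

def extract_section_text(xml_string):
    """Parse the XML and collect the free-text narrative from each
    section, which is what downstream NLP annotators would consume."""
    root = ET.fromstring(xml_string)
    sections = []
    for section in root.findall(".//hl7:section", NS):
        title = section.findtext("hl7:title", default="", namespaces=NS)
        text = section.findtext("hl7:text", default="", namespaces=NS)
        sections.append((title, text))
    return sections

for title, text in extract_section_text(sample_hl7_xml):
    print(f"{title}: {text}")
```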
End goal
An architecturally-driven, internally-owned technology stack that blends:
An event-based processing fabric
A real-time processing framework
A multi-destination distillation hub
Classic BI delivery techniques
Services-based delivery techniques
A serendipitous discovery environment
Mutually supportive components that combine to deliver novel clinical solutions.