The document discusses big data, defining it using the 4 Vs of volume, velocity, variety, and veracity. It describes how the volume of data has grown exponentially in recent years. Tools like Hadoop (batch processing) and Splunk (real-time analytics) are used to analyze large and diverse datasets. Examples are given of how big data impacts industries such as healthcare and retail. Industries are now able to gain insights from large amounts of structured and unstructured data to improve areas such as customer service, risk analysis, and personalized medicine.
2. What is Big Data
From the beginning of human civilization until 2003, the entire world generated 5 exabytes of data.
In 2004, the US alone produced 5 exabytes of data every two days, and the rate of growth is accelerating rapidly.
1 exabyte = 1 million terabytes
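The conversion above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where 1 terabyte is 10^12 bytes and 1 exabyte is 10^18 bytes; the constant and function names are illustrative and do not come from the slides.

```python
# Decimal (SI) byte units. This is an assumption: the slide does not say
# whether decimal or binary prefixes are intended.
TERABYTE = 10**12
EXABYTE = 10**18


def exabytes_to_terabytes(exabytes):
    """Convert a whole number of exabytes to terabytes."""
    return exabytes * EXABYTE // TERABYTE


print(exabytes_to_terabytes(1))  # 1000000
print(exabytes_to_terabytes(5))  # 5000000
```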
3. What's Big Data?
The IT industry defines Big Data using 4 Vs.
Volume: amount of data
Velocity: speed of data arrival
Variety: text, image, video
Veracity: trustworthiness
4. What is Big Data
Volume: petabytes, exabytes, etc.
Variety: structured and unstructured data such as video, Twitter trends, and free-form text
Velocity: streaming data arriving in real time
Veracity: trustworthiness of data, which involves removing biases, noise, and abnormalities
5. Initial Challenges of Big Data
Prohibitively expensive hardware for computing, networking, and storage
Small pool of Big Data experts
Lack of awareness about the benefits of data collected from different sources
Analytics to process real-time data in milliseconds
6. Tools for Big Data
Clusters of distributed computing nodes built on commodity hardware
MapReduce framework to run parallel computation on a Hadoop cluster
NoSQL databases, such as columnar databases and key-value stores, for very fast data retrieval
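To illustrate why key-value storage allows fast retrieval across a cluster, here is a minimal in-memory sketch. It assumes that each key is hashed to pick one partition (standing in for a node), so a lookup touches a single partition instead of scanning all of the data. The class `PartitionedKeyValueStore` and the example key are hypothetical and do not correspond to any particular NoSQL product.

```python
import hashlib


class PartitionedKeyValueStore:
    """Toy key-value store; each dict stands in for one node (illustrative only)."""

    def __init__(self, num_partitions=4):
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        self._partitions = [{} for _ in range(num_partitions)]

    def _partition_for(self, key):
        # A stable hash sends the same key to the same partition on every run.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._partitions)
        return self._partitions[index]

    def put(self, key, value):
        self._partition_for(key)[key] = value

    def get(self, key, default=None):
        # Only one partition is consulted, regardless of total data size.
        return self._partition_for(key).get(key, default)


store = PartitionedKeyValueStore(num_partitions=4)
store.put("sensor:17:temperature", 21.5)
print(store.get("sensor:17:temperature"))  # 21.5
```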
7. Tools For Analyzing Big Data
Hadoop
Batch processing framework that uses a distributed cluster running on low-cost multi-core computers.
8. Tools For Analyzing Big Data
Splunk
Real-time analytics software to process streaming data from millions of sensors.
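As a rough illustration of what processing streaming sensor data can look like, the sketch below keeps a running average over the most recent readings as each one arrives. It is plain Python and does not use or represent Splunk's API; the class name `SlidingWindowAverage` and the sample readings are assumptions made for the example.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average over the most recent `window_size` readings."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self._window = deque(maxlen=window_size)
        self._total = 0.0

    def add(self, reading):
        """Ingest one reading and return the updated window average."""
        if len(self._window) == self._window.maxlen:
            # The oldest reading is about to be evicted by append().
            self._total -= self._window[0]
        self._window.append(reading)
        self._total += reading
        return self._total / len(self._window)


monitor = SlidingWindowAverage(window_size=3)
averages = [monitor.add(reading) for reading in [10, 20, 30, 40]]
print(averages)  # [10.0, 15.0, 20.0, 30.0]
```

Each reading is handled in constant time, which is the property that matters when data arrives continuously.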
9. Tools For Analyzing Big Data
MapReduce
Programming model that distributes small chunks of data across thousands of nodes for parallel processing and combines the output from each node to solve the big data problem.
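The model described above can be sketched as a word count, the usual introductory example. This is a single-process illustration, not Hadoop code: each string in `chunks` stands in for a chunk of data that a real cluster would send to a separate node, and the function names are chosen for the example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def map_reduce(chunks):
    # On a real cluster the map calls would run in parallel on many nodes;
    # here they run one after another.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["big data big insight", "data arrives fast", "big clusters process data"]
counts = map_reduce(chunks)
print(counts["big"], counts["data"])  # 3 3
```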
10. Big Data Impact on All Industries
Health Care
MIT Technology Review report on data-driven health care using The New Medical Data Ecosystem
Analytics and predictive modeling of medical data captured from many sources (insurance claims, public health data, mobile health data, and electronic medical records) to provide personalized patient care and help doctors quickly decide on the best treatments
14. Public Health Data
Insight into community health
patterns from federal and state
data.
15. Mobile Health Data
100,000-plus mobile health apps, plus wearable devices that measure activity and bodily functions, offer a constant read on patients.
16. Electronic Medical Records
Digital records include lab and test results, drug prescriptions, and physicians' reports.
Together, these records create a family health history.
17. Outcome
Analytic algorithms and predictive
modeling mine the layers of data for
patterns and insight (MIT Technology
Review)
18. Outcome
Patients
More precise and personalized diagnosis
and care based on a holistic view may
become possible.
Doctors
Decision-support tools could help quickly
evaluate the best treatments
22. Retail
Wal-Mart
1. Display search results based on the semantics of search items
2. Customer behavior prediction
3. Supply chain management: analysis of millions of point-of-sale records in real time
23. Future Trends
Health Care
Important tool for cost reduction
Adoption of electronic medical records (EMR) for better patient treatment plans
Agriculture
Real-time tracking of farm machinery
24. Future Trends
Internet Protocol becoming the standard in the electricity grid, the oil industry, etc.
IPv6, with its 128-bit addresses, will theoretically allow trillions of trillions of sensors to connect previously unconnected places, people, and things
Digitization of massive amounts of data currently stored in non-digital form
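The IPv6 point above can be sanity-checked with simple arithmetic: a 128-bit address gives 2^128 possible values, which is far more than a trillion trillion (10^24). The sketch below is illustrative only; the constant names are assumptions, and the standard-library `ipaddress` module is used purely as a cross-check.

```python
import ipaddress

IPV6_ADDRESS_BITS = 128
ipv6_addresses = 2 ** IPV6_ADDRESS_BITS

# Cross-check with the standard library: the network ::/0 spans all of IPv6.
assert ipv6_addresses == ipaddress.IPv6Network("::/0").num_addresses

TRILLION = 10 ** 12
trillion_of_trillions = TRILLION * TRILLION  # 10**24

print(ipv6_addresses)  # 340282366920938463463374607431768211456
print(ipv6_addresses > trillion_of_trillions)  # True
```

This counts the theoretical address space only; in practice, parts of it are reserved and are not available for devices.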
25. Citation
Big Data @ Work, Thomas H. Davenport
MIT Technology Review
Harness the Power of Big Data, Paul Zikopoulos, Dirk deRoos, Krishnan Parasuraman, Thomas Deutsch, David Corrigan, James Giles
The Human Face of Big Data, Rick Smolan and Jennifer Erwitt
Editor's Notes
#5: Keep your data clean, and put processes in place to keep dirty data from accumulating in your system.