What is Big Data
• From the beginning of human civilization until 2003, the entire world generated 5 exabytes of data.
• In 2004, the US alone produced 5 exabytes of data every two days, and the rate of growth is accelerating rapidly.
• 1 exabyte = 1 million terabytes
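The unit conversion on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where 1 terabyte is 10^12 bytes and 1 exabyte is 10^18 bytes. The constant and function names are illustrative and do not come from the slides.

```python
# Decimal (SI) byte units. This is an assumption: the slide does not say
# whether it means SI (powers of 10) or binary (powers of 2) units.
TERABYTE = 10**12
EXABYTE = 10**18


def exabytes_to_terabytes(exabytes: int) -> int:
    """Convert a whole number of exabytes to terabytes."""
    return exabytes * EXABYTE // TERABYTE


if __name__ == "__main__":
    print(exabytes_to_terabytes(1))  # terabytes in 1 exabyte
    print(exabytes_to_terabytes(5))  # the 5 exabytes quoted on the slide
```

With binary units (2^40 and 2^60 bytes) the ratio would be 2^20, roughly 1.05 million, so the "1 million" figure is exact only under the SI assumption.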
What's Big Data?
The IT industry defines Big Data using the 4 Vs:
• Volume: amount of data
• Velocity: speed of data arrival
• Variety: text, image, video
• Veracity: trustworthiness
What is Big Data
• Volume: petabytes, exabytes, etc.
• Variety: structured and unstructured data, such as video, Twitter trends, and free-form text
• Velocity: streaming data arriving in real time
• Veracity: trustworthiness of data; removing biases, noise, and abnormalities
Initial Challenges of Big Data
• Prohibitively expensive hardware: computing, networking, and storage
• Small pool of Big Data experts
• Lack of awareness about the benefits of data collected from different sources
• Analytics to process real-time data in milliseconds
Tools for Big Data
• Clusters of distributed computing nodes built from commodity hardware
• MapReduce framework to run parallel computation on a Hadoop cluster
• NoSQL databases: columnar databases using key-value storage for very fast data retrieval
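The key-value idea in the last bullet can be illustrated with a small in-memory sketch. It assumes a layout in which each row key maps to a dictionary of column names and values. The class `TinyColumnStore` and its methods are hypothetical and do not represent the API of any particular NoSQL product.

```python
from typing import Dict, Optional


class TinyColumnStore:
    """Hypothetical in-memory store: row key -> {column name -> value}."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        """Store one value under a row key and column name."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key: str, column: str) -> Optional[str]:
        """Fetch one value, or None if the row or column is missing."""
        # Two hash lookups; on average the cost does not grow with the
        # number of rows stored, which is what makes retrieval fast.
        return self._rows.get(row_key, {}).get(column)


if __name__ == "__main__":
    store = TinyColumnStore()
    store.put("user:1", "name", "Ada")
    store.put("user:1", "city", "London")
    print(store.get("user:1", "name"))
    print(store.get("user:2", "name"))
```

A production store would also partition the row keys across the nodes of the cluster; that part is left out here.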
Tools for Analyzing Big Data
Hadoop
• Batch processing framework that uses a distributed cluster running on low-cost multi-core computers.
Tools for Analyzing Big Data
Splunk
• Real-time analytics software that processes streaming data from millions of sensors.
Tools for Analyzing Big Data
• MapReduce programming model: distributes small chunks of data across thousands of nodes for parallel processing, then combines the output from each node to solve the big data problem
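The map, group, and combine steps described on this slide can be sketched with a word count, the usual introductory example. This is a single-process illustration in plain Python, not Hadoop code: the function names are assumptions, and the chunks are processed one after another rather than on separate nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_chunk(chunk: str) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework does between steps."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: combine the values for each key into one total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks: List[str]) -> Dict[str, int]:
    # On a real cluster each chunk would be mapped on a different node.
    # Here the chunks are handled sequentially in a single process.
    mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data is big", "data arrives fast"]))
```

Because each call to `map_chunk` depends only on its own chunk, the map step can run on as many nodes as there are chunks. Only the grouping step needs to bring values with the same key together.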
Big Data Impact Across All Industries
Health Care
• MIT Technology Review report on data-driven health care using The New Medical Data Ecosystem
• Analytics and predictive modeling of medical data captured from many sources (insurance claims, public health data, mobile health data, and electronic medical records) to provide personalized patient care and help doctors quickly decide on the best treatments
Insurance Claims Data
• Trends in drug and treatment usage
Environmental Data
• Sensors can pick up behavioral information
• Ex: mapping, location, and weather data
Genomic Data
• Less expensive genome sequencing offers insight into the role genetics may play
Public Health Data
• Insight into community health patterns from federal and state data
Mobile Health Data
• 100,000-plus mobile health apps, plus wearable devices that measure activity and bodily function, offer a constant read on patients.
Electronic Medical Records
• Digital records include lab and test results, drug prescriptions, and physicians' reports.
• Together, these records create a family health history.
Outcome
• Analytic algorithms and predictive modeling mine the layers of data for patterns and insight (MIT Technology Review)
Outcome
• Patients: more precise and personalized diagnosis and care, based on a holistic view, may become possible.
• Doctors: decision-support tools could help quickly evaluate the best treatments.
Outcome
• Researchers: detailed information from many patients, along with other data, could lead to new insights into disease and treatment.
Use Cases Across All Industries
• Recommendation Engine
• Customer Sentiment Analysis
• Marketing Campaign Analysis
• Social Network Analysis
• Fraud Detection
• Risk Analysis
Retail
• Macy's Inc.: optimizing pricing of 73 million items based on real-time market data.
Retail
• Wal-Mart
1. Display search results based on the semantics of search items to predict customer behavior
2. Customer behavior prediction
3. Supply chain management: analysis of millions of point-of-sale records in real time
Future Trends
Health Care
• Important tool for cost reduction
• Adoption of EMR (Electronic Medical Records) for better patient treatment plans
Agriculture
• Real-time tracking of farm machinery
Future Trends
• Internet Protocol becoming standard in the electricity grid, the oil industry, etc.
• IPv6, with its 128-bit addresses, will theoretically allow trillions of trillions of sensors to connect previously unconnected places, people, and things
• Digitization of massive data currently stored in non-digital form
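The IPv6 claim on this slide follows from the address width: 128 bits give 2^128 possible addresses. The sketch below works that out and compares it with a trillion trillion (10^24). The function name is illustrative; the standard-library `ipaddress` module is used only as a cross-check.

```python
import ipaddress

IPV6_BITS = 128
TRILLION = 10**12


def ipv6_address_count() -> int:
    """Number of distinct addresses a 128-bit address can express."""
    return 2**IPV6_BITS


if __name__ == "__main__":
    total = ipv6_address_count()
    # Cross-check against the size of the whole IPv6 range, ::/0.
    assert total == ipaddress.ip_network("::/0").num_addresses
    print(total)
    # How many "trillion trillions" (10^24) fit into the address space.
    print(total // (TRILLION * TRILLION))
```

The total is about 3.4 x 10^38, so a trillion trillion sensors would use only a tiny fraction of the space. In practice the usable number is smaller, because parts of the range are reserved and addresses are allocated in large blocks.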
Citation
• Big Data @ Work, Thomas H. Davenport
• MIT Technology Review
• Harness the Power of Big Data, Paul Zikopoulos, Dirk deRoos, Krishna Parasuraman, Thomas Deutsch, David Corrigan, James Giles
• The Human Face of Big Data, Rick Smolan and Jennifer Erwitt


Editor's Notes

  • #5: Keep your data clean, and put a process in place to keep dirty data from accumulating in your system.