Big Data & Analytics
Keshav Tripathy, Bharti Consulting Inc.
Outline
 Big Data
 Gartner Hype Cycle 2012
 Large-scale data processing
 Visual Analytics
 Chances and Challenges
 Discussions
Big Data: The 3 Vs
 Volume: Gigabyte (10^9 bytes), Terabyte (10^12), Petabyte (10^15), Exabyte (10^18), Zettabyte (10^21)
 Variety: structured, semi-structured, unstructured; text, image, audio, video, records
 Velocity: dynamic, sometimes time-varying
Big Data refers to datasets that grow so large that they are difficult to capture, store, manage, share, analyze, and visualize with typical database software tools.
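The Volume line climbs a decimal ladder in which each prefix is 1000 times the previous one. As a small illustrative sketch (the helper name `human_bytes` and the unit list are our own, assuming decimal SI prefixes rather than binary KiB/MiB), a raw byte count can be mapped onto that ladder like this:

```python
# Decimal (SI) unit ladder from the Volume line: each step is a factor of 10^3.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_bytes(n: float) -> str:
    """Format a byte count with decimal prefixes (KB = 10^3 B, ..., ZB = 10^21 B)."""
    value = float(n)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit in the list.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000
    raise AssertionError("unreachable: the last unit always returns")


print(human_bytes(10**12))  # 1.0 TB
```

Binary prefixes (powers of 1024) would give slightly different numbers; the slide's exponents (10^9, 10^12, ...) are decimal, so this sketch follows them.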
Numbers
 How much data is there in the world?
 800 Terabytes, 2000
 160 Exabytes, 2006
 500 Exabytes(Internet), 2009
 2.7 Zettabytes, 2012
 35 Zettabytes by 2020
 How much data is generated in ONE day?
 7 TB, Twitter
 10 TB, Facebook
Big data: The next frontier for innovation, competition, and productivity
McKinsey Global Institute 2011
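Taking the slide's figures at face value, the rise from 2.7 ZB (2012) to the projected 35 ZB (2020) implies a compound annual growth rate of roughly 38%. The short sketch below does that arithmetic, plus a yearly total for Twitter's 7 TB/day, assuming decimal units (1 PB = 1000 TB); the function name `cagr` is our own.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate: the constant yearly rate r with start * (1 + r)**years == end."""
    return (end / start) ** (1 / years) - 1


# Figures from the slide (zettabytes of data in the world).
rate = cagr(2.7, 35.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")

# Twitter's 7 TB/day over a year, in petabytes (decimal units: 1 PB = 1000 TB).
twitter_pb_per_year = 7 * 365 / 1000
print(f"Twitter: about {twitter_pb_per_year:.2f} PB per year")
```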
Why Is Big Data Important?
Gartner Hype Cycle 2012
Large-Scale Visual Analytics
 Definition: Visual analytics is the science of analytical reasoning facilitated by
interactive visual interfaces.
 People use visual analytics tools and techniques to
 Synthesize information and derive insight from massive, dynamic,
ambiguous, and often conflicting data
 Detect the expected and discover the unexpected
 Provide timely, defensible, and understandable assessments
 Communicate assessments effectively for action.
InfoVis Reference Model for Visual Analytics
Applications
 Terrorism and Responses
 Multimedia Visual Analytics
 Situation Surveillance and Awareness in Investigative Analysis
 Disease visual analytics for disease outbreak prediction
 Financial Visual Analytics
 Cybersecurity Visual Analytics
 Visual Analytics for Investigative Analysis on Text Documents
Techniques and Technologies
 A wide variety of techniques and technologies has been developed and adapted for
 Data aggregation
 Data manipulation
 Data analysis
 Data visualization
 These techniques and technologies draw from several fields including
 Statistics
 Computer science
 Applied mathematics
 Economics
Techniques and Applications
 Statistics: A/B testing (split testing/bucket testing), spatial analysis, predictive modeling (regression)
 Machine learning
 Unsupervised learning: cluster analysis
 Supervised learning: classification, support vector machines (SVM), ensemble learning
 Association rule learning
 Data mining and pattern recognition: neural networks, classification, clustering
 Natural language processing (NLP): sentiment analysis
 Dimensionality reduction: PCA, MDS, SVD
 Data fusion and data integration: visual words
 Time series analysis: a combination of statistics and signal processing
 Simulation: Monte Carlo simulations, Markov random fields (MRF)
 Optimization: genetic algorithms
 Visualization: scientific visualization, InfoVis, visual analytics
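A/B testing, the first statistical technique listed, usually comes down to asking whether two conversion rates differ by more than chance. A minimal sketch, assuming the common pooled two-proportion z-test (one of several valid tests; the slides do not name one) and a hypothetical experiment with 2,000 users per bucket:

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare conversion rates of variants A and B.

    Returns (z, two-sided p-value) using the pooled-variance normal approximation,
    which is reasonable when both groups have plenty of successes and failures.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally well
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 200/2000 conversions for A, 250/2000 for B
z, p = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here a 10% versus 12.5% conversion rate gives z of about 2.5 and p of about 0.012, so the difference would usually be called significant at the 5% level.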
Technologies
 Database and Data warehouse
 Google File System (GFS), MapReduce, and Bigtable
 Hadoop: MapReduce and HBase; an open-source Apache project
 Cassandra: an open-source (free) DBMS, originally developed at Facebook and now an Apache Software Foundation project
 Data warehouse: ETL (extract, transform, and load) tools and business intelligence tools
 Business intelligence (BI): data warehouse, reporting, real-time management dashboards
 Cloud computing: services, SOA, etc.
 Metadata: XML
 Stream processing
 R, SAS, and SPSS
 Visualization: tag cloud, clustergram, history flow, ThemeRiver, treemap
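The MapReduce model behind Google's stack and Hadoop splits a job into a map step, a framework-managed shuffle that groups values by key, and a reduce step. The sketch below simulates that flow in a single Python process for the classic word-count job; it illustrates the programming model only and is not Hadoop's API (the function names are our own).

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["Big data big", "data analytics"]))
# {'big': 2, 'data': 2, 'analytics': 1}
```

In a real cluster the mappers and reducers run in parallel on different machines and the shuffle moves data over the network; the per-key logic stays this simple.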
Origin of Information Visualization
InfoVis Techniques
 Scatterplot and Scatterplot Matrix
 Hierarchy Visualization: Node-Link Diagrams, Sunburst, Treemap, Circle-Packing Layouts
 Network Visualization: Force-Directed Layout, Arc Diagrams, Matrix Views
 Multidimensional Visualization: Parallel Coordinates
 Stacked Graphs
 Flow Maps
Scatterplot and Scatterplot Matrix
Tree Visualization (1)
Node-Link Diagrams
Sunburst
Tree Visualization (2)
Treemap
Circle-packing layouts
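The treemap fills a rectangle with nested rectangles whose areas are proportional to node sizes. As a hedged sketch, here is the original slice-and-dice layout (simpler than the squarified treemaps most tools use today); the `Node` class and the sample tree are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    size: float = 0.0                      # used only by leaves
    children: list["Node"] = field(default_factory=list)

    def total(self) -> float:
        """Leaf size, or the sum of all descendant leaf sizes."""
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return (name, x, y, w, h) rectangles for every node in the tree.

    Each node's rectangle is split among its children in proportion to their
    totals; the cut direction alternates between vertical and horizontal by depth.
    """
    if out is None:
        out = []
    out.append((node.name, x, y, w, h))
    total = node.total()
    offset = 0.0
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: vertical strips side by side along the x axis
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            # Odd depth: horizontal strips stacked along the y axis
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


tree = Node("root", children=[
    Node("a", 1),
    Node("b", children=[Node("c", 1), Node("d", 2)]),
])
for name, x, y, w, h in slice_and_dice(tree, 0, 0, 100, 100):
    print(f"{name}: x={x:.1f} y={y:.1f} w={w:.1f} h={h:.1f}")
```

Slice-and-dice preserves sibling order but produces thin slivers for deep or unbalanced trees, which is why squarified variants are usually preferred for large hierarchies.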
Network Visualization
Force-Directed Layout
Arc Diagrams
Matrix Views
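A force-directed layout treats nodes as mutually repelling particles and edges as springs, then lets the system relax. A minimal sketch, assuming a simplified Fruchterman-Reingold scheme in a unit-area frame (no frame clamping, no spatial indexing; the function name and parameters are our own):

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Simplified Fruchterman-Reingold spring embedder in a unit-area frame.

    `nodes` must be a list. Every pair of nodes repels with force k^2/d, every
    edge attracts with d^2/k, and each node moves at most `temperature` per
    iteration, which cools linearly toward zero.
    """
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = math.sqrt(1.0 / len(nodes))   # ideal edge length for a unit-area frame
    t0 = 0.1                          # initial maximum step size

    for it in range(iterations):
        temperature = t0 * (1 - it / iterations)
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion between every pair of nodes (O(N^2) per iteration)
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f

        # Attraction along edges pulls the two endpoints together
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f

        # Move each node along its net force, capped by the current temperature
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temperature)
            pos[n][0] += dx / d * step
            pos[n][1] += dy / d * step

    return {n: tuple(p) for n, p in pos.items()}


layout = force_directed_layout(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The all-pairs repulsion loop is quadratic in the number of nodes, which is why large-graph tools approximate it (for example with Barnes-Hut style grouping).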
Parallel Coordinates
Stacked Graphs
Flow Maps
Examples
Big Data Analytics
Fraud Detection in Bank Wire Transactions
Displays and Views
A classic VA tool
Gapminder [Demo]
Smart Money Map [Demo]
A recent project
Chances and Challenges
 The basic techniques for large-scale simulation and computing are already available
 However, large, time-consuming computing tasks need steering, or visualization of their intermediate results
 Most simulation and computing tasks require tuning hundreds of parameters
 Smart/intelligent data mining and data processing algorithms are available
 However, most data mining algorithms have high computational complexity: O(N^2) rather than O(N log N) or O(N)
 How can automatic computing (machine) be combined with high-level intelligence (human) to gain insight, and how can humans be involved in the computation?
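To make the O(N^2) versus O(N log N) contrast concrete, here is one problem solved both ways: finding the smallest gap between any two values. This example is our own choice, not one from the slides; the point is that a smarter formulation (sort first) removes the need to compare every pair.

```python
import random
from itertools import combinations


def min_gap_quadratic(values):
    """O(N^2): compare every pair of values."""
    return min(abs(a - b) for a, b in combinations(values, 2))


def min_gap_sorted(values):
    """O(N log N): sort once; the closest pair must then be adjacent."""
    s = sorted(values)
    return min(b - a for a, b in zip(s, s[1:]))


rng = random.Random(1)
data = rng.sample(range(10**6), 1000)   # 1,000 distinct random integers
print(min_gap_quadratic(data), min_gap_sorted(data))
```

Both functions return the same answer, but doubling the input roughly quadruples the work of the pairwise version while only slightly more than doubling the work of the sorted version; at big-data scale that difference decides whether a job is feasible at all.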
Recent Research Topics
 Unified visual analytics over heterogeneous data sources (esp. text)
 Structured and semi-structured data fusion framework
 Data indexing and similarity ranking
 Visual analytics for high-dimensional heterogeneous data
 Domain risk management and preventive control through sensor data collection and data mining
 Sensor techniques
 Data warehouse
 Coordinated views integrating visual analytics techniques
 Parallel/distributed computing steering through parameter optimization and visualization
 Parameter tuning and computing optimization
 Intermediate-result visualization and task steering
 Markov chain Monte Carlo (MCMC) simulation
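MCMC simulation draws samples from a distribution that is only known up to a normalizing constant by running a Markov chain whose stationary distribution is that target. As a hedged illustration, here is a minimal random-walk Metropolis sampler, the simplest MCMC variant (the slides do not say which variant the research uses), targeting a standard normal; the function name and tuning values are our own.

```python
import math
import random


def metropolis(log_density, x0, n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler for a 1-D target known up to a constant.

    log_density(x) returns log p(x) + const; proposals are x + N(0, step^2).
    """
    rng = random.Random(seed)
    x, log_p = x0, log_density(x0)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)          # symmetric proposal
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x))
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        if i >= burn_in:                              # discard warm-up draws
            samples.append(x)
    return samples


# Target: standard normal, log p(x) = -x^2 / 2 up to a constant
draws = metropolis(lambda x: -0.5 * x * x, x0=3.0, n_samples=20000, seed=42)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean ≈ {mean:.3f}, variance ≈ {var:.3f}")
```

The estimates should land near 0 and 1; the `step` size trades acceptance rate against how quickly the chain explores, which is exactly the kind of parameter tuning the research topics mention.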
Questions and Thanks!
