2. What is Data?
Data is any set of characters that has been gathered and
translated for some purpose, usually analysis.
It can be any character, including text and numbers, pictures,
sound, or video.
3. What is Digital Data?
Digital data are discrete, discontinuous representations of
information or work.
Digital data is expressed in binary, as sequences of 0s and 1s.
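To make the binary idea concrete, here is a minimal Python sketch that shows the bits behind a short piece of text. The helper names to_bits and from_bits are hypothetical, and UTF-8 is assumed as the text encoding; other encodings would give different bit patterns.

```python
def to_bits(text):
    """Return the UTF-8 bytes of `text` as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits):
    """Invert to_bits: turn space-separated 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


encoded = to_bits("Hi")
print(encoded)             # 01001000 01101001
print(from_bits(encoded))  # Hi
```

Pictures, sound and video are stored the same way in principle: as long sequences of bytes that a program interprets according to a file format.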
4. Types of Digital Data
1. Unstructured Data
2. Semi-Structured Data
3. Structured Data
5. Structured Data
Refers to any data that resides in a fixed field within a record or file.
Typically stored in relational databases, which support ACID properties.
Structured data has the advantage of being easily entered, stored,
queried and analyzed.
Structured data represents only an estimated 5 to 10% of all data.
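As an illustration of fixed fields and ACID behaviour, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The students table and its rows are invented for the example. The second transaction fails halfway through, so neither of its inserts is kept (atomicity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Every record has the same fixed, typed fields.
conn.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, score REAL NOT NULL)"
)

# Using the connection as a context manager commits on success.
with conn:
    conn.executemany(
        "INSERT INTO students (name, score) VALUES (?, ?)",
        [("Asha", 82.5), ("Ben", 67.0), ("Chen", 91.0)],
    )

# The second insert violates NOT NULL, so the whole transaction is rolled back.
try:
    with conn:
        conn.execute("INSERT INTO students (name, score) VALUES (?, ?)", ("Dia", 75.0))
        conn.execute("INSERT INTO students (name, score) VALUES (?, ?)", ("Eli", None))
except sqlite3.IntegrityError:
    pass

# Structured data is easy to query.
rows = conn.execute(
    "SELECT name, score FROM students WHERE score > 80 ORDER BY score DESC"
).fetchall()
count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]

print(rows)   # [('Chen', 91.0), ('Asha', 82.5)]
print(count)  # 3
```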
6. Unstructured Data
Unstructured data is all those things that can't be so readily
classified and fit into a neat box.
Unstructured data represents an estimated 80% of all data.
Techniques: data mining (association rules, regression analysis), text
mining, NLP, etc.
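A very small taste of text mining is counting which terms occur most often in free text. The sketch below is only an assumption about what a first step might look like: the function name top_terms, the stopword list and the sample review are all made up, and real text mining or NLP pipelines do far more than this.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = frozenset({"the", "a", "is", "was", "and", "to", "of", "it"})


def top_terms(text, n=3):
    """Return the n most frequent non-stopword terms with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS).most_common(n)


review = "The delivery was late. Late delivery again, and the box was damaged."
print(top_terms(review, 2))  # [('delivery', 2), ('late', 2)]
```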
7. Semi-Structured Data
Semi-structured data is a cross between the two. It is a form of
structured data that lacks a strict data model.
Semi-structured data is information that doesn't reside in a
relational database but that does have some organizational
properties that make it easier to analyze.
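JSON is a common example of semi-structured data. In the sketch below (the records are invented), every record is tagged with field names, which is the "organizational property", but no two records have exactly the same fields, so there is no fixed schema.

```python
import json

raw = """[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Chen", "address": {"city": "Pune"}}
]"""

records = json.loads(raw)

# The field names are discoverable, but they vary from record to record.
all_fields = sorted({key for record in records for key in record})
print(all_fields)  # ['address', 'email', 'id', 'name', 'phones']

# Code that reads the data has to tolerate missing fields.
cities = [record.get("address", {}).get("city") for record in records]
print(cities)  # [None, None, 'Pune']
```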
8. Characteristics of Data
Composition - What is the structure, type and nature of the
data?
Condition - Can the data be used as it is, or does it need to be
cleansed?
Context - Where was this data generated? Why? How sensitive
is this data? What events are associated with this data?
9. What is Big Data?
Collection of data sets so large and complex that it becomes
difficult to process using on-hand database management tools
or traditional data processing applications.
10. What is Big Data? (cont.)
The data is too big, moves too fast, or doesn't fit the structures
of your database architectures.
The scale, diversity, and complexity of the data require new
architectures, techniques, algorithms, and analytics to manage
it and extract value and hidden knowledge from it.
Big data is the realization of greater business intelligence by
storing, processing, and analyzing data that was previously
ignored due to the limitations of traditional data management
technologies.
11. Why Big Data? What makes Big Data?
Every day we create 2.5 quintillion bytes of data.
90% of the data in the world today has been created in the last
two years.
Key enablers for the growth of Big Data are:
Increase of storage capacities
Increase of processing power
Availability of data
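To put the daily figure in more familiar units, the quick calculation below converts 2.5 quintillion bytes per day into exabytes per day and terabytes per second. It assumes decimal (SI) units, where 1 EB is 10^18 bytes and 1 TB is 10^12 bytes; the variable names are only illustrative.

```python
BYTES_PER_DAY = 2.5e18      # 2.5 quintillion bytes, the figure quoted above
EXABYTE = 10**18            # decimal (SI) exabyte
TERABYTE = 10**12           # decimal (SI) terabyte
SECONDS_PER_DAY = 24 * 60 * 60

exabytes_per_day = BYTES_PER_DAY / EXABYTE
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / TERABYTE

print(exabytes_per_day)                # 2.5
print(round(terabytes_per_second, 1))  # 28.9
```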
12. Where does data come from?
Data come from many quarters.
Science - Medical imaging, Sensor data, Genome
sequencing, Weather data, Satellite feeds
Industry - Financial, Pharmaceutical, Manufacturing,
Insurance, Online, Retail
Legacy - Sales data, customer behavior, product
databases, accounting data, etc.
System data - Log files, status feeds, activity streams,
network messages, spam filters
15. CHALLENGES
More data = more storage space
Data arriving faster
Need to handle various data structures
Agile business requirements
Securing big data
Data consistency & quality
16. What is the importance of Big Data?
The importance of big data lies in how you use the data
you own. Data can be taken from any source and analyzed
to find answers that enable:
1) Cost reductions
2) Time reductions
3) New product development and optimized offerings, and
4) Smart decision making.
17. What is the importance of Big Data? (cont.)
By combining big data with high-powered analytics, you can
have a great impact on your business strategy, for example:
1) Finding the root cause of failures, issues and defects in
real-time operations.
2) Generating coupons at the point of sale based on the customer's
buying habits.
3) Recalculating entire risk portfolios in just minutes.
4) Detecting fraudulent behavior before it affects your
organization.
18. Who uses Big Data technology?
Banking
Government
Education
Health Care
Manufacturing
Retail
19. Storing Big Data
Analyzing your data characteristics
Selecting data sources for analysis
Eliminating redundant data
Establishing the role of NoSQL
Overview of Big Data stores
Data models: key-value, graph, document,
column-family
Hadoop Distributed File System
HBase
Hive
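The four NoSQL data models listed above can be pictured with plain Python dictionaries. This is only a conceptual sketch with invented data and hypothetical names (kv_store, doc_store, cf_store, graph_store, reachable); it does not use or imitate the API of any real NoSQL product.

```python
# Key-value: an opaque value looked up by a single key.
kv_store = {"user:42": '{"name": "Asha", "city": "Pune"}'}

# Document: nested, self-describing records that can be queried by field.
doc_store = [
    {"_id": 42, "name": "Asha", "address": {"city": "Pune"}, "tags": ["premium"]},
    {"_id": 43, "name": "Ben", "address": {"city": "Delhi"}},
]

# Column-family: row key -> column family -> column -> value.
cf_store = {
    "user42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"logins": 17},
    }
}

# Graph: nodes connected by edges, stored here as adjacency lists.
graph_store = {"Asha": ["Ben"], "Ben": ["Chen"], "Chen": []}


def reachable(graph, start, hops):
    """Return the nodes reachable from `start` in at most `hops` edges."""
    seen, frontier = set(), {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph[node]} - seen - {start}
        seen |= frontier
    return seen


print(kv_store["user:42"])
print([d["name"] for d in doc_store if d["address"]["city"] == "Pune"])  # ['Asha']
print(cf_store["user42"]["profile"]["city"])                            # Pune
print(sorted(reachable(graph_store, "Asha", 2)))                        # ['Ben', 'Chen']
```

The trade-off in broad terms: key-value lookups are the simplest, documents and column families allow queries on parts of a record, and graphs make relationship traversals cheap.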
20. Big Data Analytics
It is the process of examining big data to uncover patterns,
unearth trends, and find unknown correlations and other useful
information to make faster and better decisions.
21. Why is big data analytics important?
Big data analytics helps organizations harness their data and
use it to identify new opportunities. That, in turn, leads to
smarter business moves, more efficient operations, higher
profits and happier customers.
22. Types of Analytics
Business Intelligence
Descriptive Analysis
Predictive Analysis
23. Business intelligence (BI)
It is a technology-driven process for analyzing data and presenting
actionable information to help executives, managers and other
corporate end users make informed business decisions.
24. Descriptive Analysis
Descriptive statistics is the term given to the analysis of data that helps
describe, show or summarize data in a meaningful way such that, for
example, patterns might emerge from the data.
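As a small illustration, the sketch below summarizes a made-up list of exam scores with Python's standard statistics module. The data and the choice of summary measures are assumptions for the example.

```python
import statistics

# Invented exam scores for eight students.
scores = [55, 62, 62, 70, 71, 78, 85, 93]

summary = {
    "count": len(scores),
    "min": min(scores),
    "max": max(scores),
    "mean": statistics.mean(scores),      # central tendency
    "median": statistics.median(scores),  # middle value
    "mode": statistics.mode(scores),      # most frequent value
    "stdev": statistics.stdev(scores),    # sample standard deviation (spread)
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

Here the mean is 72, the median is 70.5 and the mode is 62, which already hints that a few high scores pull the average above the typical student.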
25. Predictive Analysis
Predictive analytics is the branch of data mining concerned with the
prediction of future probabilities and trends.
The central element of predictive analytics is the predictor, a variable that
can be measured for an individual or other entity to predict future behavior.
26. Predictive Analysis (cont.)
There are two types of predictive analytics:
Supervised
Supervised analytics is when we know the true outcome for
past observations.
Example: We have historical weather data: the temperature,
humidity, cloud density and weather type (rain, cloudy, or sunny). We can
then predict today's weather based on today's temperature, humidity, and cloud density.
Unsupervised
Unsupervised analytics is when we don't know the true outcome for
past observations. The result is a set of segments that we need to interpret.
Example: We want to segment students
based on their historical exam scores, attendance, and record of lateness.
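The two examples above can be sketched in a few lines of Python. The slides do not name an algorithm, so this sketch assumes a 1-nearest-neighbour rule for the supervised case and a basic k-means loop for the unsupervised case. All data values, function names and starting centroids are invented, and the features are left unscaled for brevity, which a real analysis would usually not do.

```python
import math


def nearest_label(history, query):
    """Supervised: return the label of the closest labelled observation."""
    _, label = min(history, key=lambda row: math.dist(row[0], query))
    return label


def k_means(points, centroids, rounds=10):
    """Unsupervised: group points around centroids; no labels are used."""
    clusters = []
    for _ in range(rounds):
        clusters = [[] for _ in centroids]
        for point in points:
            closest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[closest].append(point)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters


# (temperature in C, humidity %, cloud density %) with a known weather type.
weather_history = [
    ((31, 40, 10), "sunny"), ((29, 45, 20), "sunny"),
    ((24, 70, 75), "cloudy"), ((22, 75, 80), "cloudy"),
    ((20, 92, 95), "rain"), ((19, 95, 98), "rain"),
]
prediction = nearest_label(weather_history, (21, 90, 93))
print(prediction)  # rain

# (exam score, attendance %, number of late arrivals); no labels available.
students = [
    (88, 96, 1), (92, 98, 0), (85, 94, 2),
    (55, 70, 9), (48, 65, 12), (60, 72, 8),
]
segments = k_means(students, centroids=[students[0], students[3]])
for segment in segments:
    print(segment)
```

The supervised function returns a ready-made answer because the history carries labels. The unsupervised function only returns groups, and it is up to the analyst to interpret them, for instance as "regular, high-scoring" and "frequently late, low-scoring" students.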
27. Tools used in Big Data
Where is processing hosted?
Distributed Servers / Cloud (e.g. Amazon EC2)
Where is data stored?
Distributed Storage (e.g. Amazon S3)
What is the programming model?
Distributed Processing (e.g. MapReduce)
How is data stored & indexed?
High-performance schema-free databases (e.g. MongoDB)
What operations are performed on data?
Analytic / Semantic Processing
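The MapReduce programming model mentioned above can be illustrated with the classic word-count example. This is a single-process sketch in plain Python, not Hadoop code: the function names map_phase, shuffle_phase and reduce_phase are hypothetical, and in a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


documents = ["big data is big", "data moves fast"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
```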
28. Top Big Data Technologies
1. Apache Hadoop
Apache Hadoop is a Java-based free software framework that can
effectively store large amounts of data in a cluster.
Hadoop Distributed File System (HDFS) is the storage system of Hadoop,
which splits big data and distributes it across many nodes in a cluster.
It also replicates data in a cluster, thus providing high availability. Hadoop uses
the MapReduce programming model for processing.
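The splitting and replication idea can be sketched as a toy simulation. This is not the HDFS API and not its real placement policy: the function names, the round-robin placement, the tiny block size and the node names are all assumptions made for illustration. The replication factor of 3 mirrors the usual HDFS default.

```python
def place_blocks(data, block_size, nodes, replication=3):
    """Split `data` into blocks and assign each block to `replication` nodes.

    Placement here is simple round-robin; real HDFS placement is more involved.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placement = {
        index: [nodes[(index + r) % len(nodes)] for r in range(replication)]
        for index in range(len(blocks))
    }
    return blocks, placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


nodes = ["node1", "node2", "node3", "node4"]
blocks, placement = place_blocks(b"0123456789", block_size=4, nodes=nodes)

print(blocks)     # [b'0123', b'4567', b'89']
print(placement)  # block index -> nodes holding a copy
print(readable_after_failure(placement, "node3"))  # True
```

Because every block lives on three different nodes, losing any single node leaves the file fully readable, which is the high-availability property described above.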
29. Top Big Data Technologies (cont.)
2. NoSQL
NoSQL (Not Only SQL) is used to handle unstructured data.
NoSQL databases store unstructured data with no particular schema.
NoSQL can give better performance when storing massive amounts of data. There
are many open-source NoSQL databases available for analyzing big data.
30. Top Big Data Technologies (cont.)
3. Apache Spark
Apache Spark is part of the Hadoop ecosystem, but its use has
become so widespread that it deserves a category of its own.
It is an engine for processing big data within Hadoop, and it can be
up to one hundred times faster than the standard Hadoop
engine, MapReduce, for some in-memory workloads.
31. Top Big Data Technologies (cont.)
4. R
R, another open source project, is a programming language
and software environment designed for working with statistics.
Many popular integrated development environments (IDEs),
including Eclipse and Visual Studio, support the language.