This document discusses data mining with big data. It defines data mining as the process of discovering patterns in large data sets and big data as collections of data that are too large to process using traditional software tools. The document notes that 2.5 quintillion bytes of data are created daily and that 90% of data was produced in the past two years. It provides examples of big data like presidential debates and photos. It also discusses challenges of mining big data due to its huge volume and complex, evolving relationships between data points.
2. What is ?
Data Mining
computational process of discovering patterns in
large data sets
Big Data
the term for a collection of data sets so large
and complex that it becomes difficult to process
with traditional software tools
data, both structured and unstructured, is growing
exponentially
3. How much Data exists?
2.5 quintillion bytes of data are created
EVERY DAY
IBM: 90 percent of the data in the world today was
produced within the past two years
Forms of Data?
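To put the daily figure above into more familiar units, here is a small conversion sketch. It assumes "quintillion" means 10^18 (short scale) and uses decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes). The function name daily_volume_summary is hypothetical and only serves the illustration.

```python
# Assumption: "2.5 quintillion bytes" = 2.5 * 10**18 bytes (short scale).
BYTES_PER_DAY = 25 * 10**17
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_summary(bytes_per_day):
    """Convert a daily byte count into exabytes/day and terabytes/second.

    Uses decimal (SI) units: 1 EB = 10**18 bytes, 1 TB = 10**12 bytes.
    """
    return {
        "exabytes_per_day": bytes_per_day / 10**18,
        "terabytes_per_second": bytes_per_day / SECONDS_PER_DAY / 10**12,
    }


summary = daily_volume_summary(BYTES_PER_DAY)
print(summary)
```

Under these assumptions the figure works out to 2.5 exabytes per day, or roughly 29 terabytes every second.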
4. Big Data Examples
October 4th, 2012, the first presidential debate
Flickr and its photos
5. Problem!
Data has grown tremendously
This large amount of data is beyond the ability of
software tools to manage
Exploring the large volume of data and extracting
useful information and knowledge is a challenge,
and sometimes, it is almost infeasible
6. HACE Theorem
Heterogeneous, Autonomous, Complex, Evolving
Big data starts with large volume, heterogeneous,
autonomous sources with distributed and
decentralized control, and seeks to explore
complex and evolving relationships among data
These are characteristics of Big Data
This is a theorem that models Big Data characteristics
8. Huge Data with heterogeneous and diverse
dimensionality
represents the huge volume of data
Autonomous sources with distributed and
decentralized control
a main characteristic of Big Data
Complex and evolving relationships
9. Data Mining Challenges with Big Data
Big Data Mining Platform
Big Data Semantics and Application Knowledge
I. Information Sharing and Data Privacy
II. Domain and Application Knowledge
Big Data Mining Algorithm
I. Local Learning and Model Fusion for Multiple
Information Sources
II. Mining from Sparse, Uncertain, and Incomplete Data
III. Mining Complex and Dynamic Data
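The idea behind "Local Learning and Model Fusion for Multiple Information Sources" can be illustrated with a minimal sketch. The slides do not name a specific algorithm, so this example swaps in a simple nearest-centroid classifier as a stand-in: each autonomous site fits a model on its own data and shares only per-class counts and centroids, and a central step fuses them by count-weighted averaging. All function names, site names, and sample values are hypothetical.

```python
from collections import defaultdict


def learn_local(samples):
    """Fit a nearest-centroid model on one site's data.

    samples: list of (feature_vector, label) pairs.
    Returns {label: (count, centroid)}; the raw samples never leave the site.
    """
    sums = {}
    counts = defaultdict(int)
    for x, y in samples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: (counts[y], [s / counts[y] for s in sums[y]]) for y in sums}


def fuse(local_models):
    """Fuse local models by count-weighted averaging of class centroids."""
    totals = defaultdict(int)
    weighted = {}
    for model in local_models:
        for y, (n, centroid) in model.items():
            if y not in weighted:
                weighted[y] = [0.0] * len(centroid)
            for i, v in enumerate(centroid):
                weighted[y][i] += n * v
            totals[y] += n
    return {y: [w / totals[y] for w in weighted[y]] for y in weighted}


def predict(centroids, x):
    """Return the label whose fused centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))


# Hypothetical data held by two autonomous sites.
site_a = [([1.0, 1.0], "low"), ([2.0, 1.0], "low"), ([8.0, 9.0], "high")]
site_b = [([9.0, 8.0], "high"), ([1.0, 2.0], "low")]

global_model = fuse([learn_local(site_a), learn_local(site_b)])
print(predict(global_model, [1.5, 1.5]), predict(global_model, [8.0, 8.0]))
```

For this particular model the fused centroids match what a single pass over the pooled data would give, which is why only the compact summaries need to be exchanged. More complex models generally do not fuse this cleanly, so this should be read as an illustration of the pattern rather than a general recipe.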