Data Mining and Data Warehouse
 INTRODUCTION
 DATA MINING
 WHY DATA MINING
 APPLICATION OF DATA MINING
 STEPS OF DATA MINING
 DATA MINING TECHNIQUES
 THREAT OF DATA MINING
 SOLUTION OF THREAT
 ROLE OF DATA MINING
 DATA WAREHOUSE
 OLTP & OLAP
 DATA MINING TOOLS
 LATEST RESEARCH
INTRODUCTION
Data mining, the extraction of hidden predictive information
from large databases, is a powerful new technology with great
potential to help companies focus on the most important
information in their data warehouses.
DATA MINING
It is the extraction of previously unknown, valid, and understandable
information or patterns from data in repositories or sources such as:
 Databases
 Text files
 Social networks
 Computer simulation
The information obtained should be such that it can be used by any
organization or enterprise for business decision making.
Why Data Mining?
Data, data everywhere, yet:
 I can't find the data I need
 I can't get the data I need
 I can't understand the data I found
 I can't use the data I found
 Data explosion problem
Advanced data collection tools and database technology lead to
tremendous amounts of data stored in databases.
 We are drowning in data, but starving for
knowledge!
 Solution: Data warehousing and Data mining
- Data warehousing and on-line analytical processing.
- Extraction of interesting knowledge using data mining.
APPLICATION OF DATA MINING
Data mining is primarily used today by companies with a strong
consumer focus: retail, financial, communication, and marketing
organizations.
1. FINANCE INDUSTRY
Credit Card Analysis
2. INSURANCE INDUSTRY
Claims and Fraud Analysis
3. TELECOMMUNICATION
Call Record Analysis
4. TRANSPORT
Logistics Management
5. CONSUMER GOODS
Promotion Analysis
6. SCIENTIFIC RESEARCH
Image, Video, Speech
7. UTILITIES
Power Usage Analysis
STEPS OF DATA MINING
 Data integration
 Data selection
 Data transformation
 Data mining
 Pattern evaluation
 Knowledge presentation
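The six steps above can be sketched as a chain of small functions. This is only an illustration: the sales records, field names, and the `min_count` threshold are invented for the example, and a real project would replace each function with something far more substantial.

```python
from collections import Counter

# Hypothetical raw records from two sources (invented for illustration).
store_sales = [
    {"customer": "a", "item": "bread", "amount": "2.50"},
    {"customer": "b", "item": "milk", "amount": "1.20"},
]
web_sales = [
    {"customer": "a", "item": "milk", "amount": "1.30"},
    {"customer": "c", "item": "bread", "amount": "2.40"},
    {"customer": "c", "item": "gift card", "amount": "0.00"},
]


def integrate(*sources):
    """Data integration: combine records from several sources."""
    return [record for source in sources for record in source]


def select(records):
    """Data selection: keep only the fields relevant to the task."""
    return [(r["item"], r["amount"]) for r in records]


def transform(rows):
    """Data transformation: convert amounts from text to numbers."""
    return [(item, float(amount)) for item, amount in rows]


def mine(rows):
    """Data mining: count how often each item is sold."""
    return Counter(item for item, _ in rows)


def evaluate(counts, min_count=2):
    """Pattern evaluation: keep items sold at least min_count times."""
    return {item: n for item, n in counts.items() if n >= min_count}


def present(patterns):
    """Knowledge presentation: format the patterns as readable lines."""
    return [f"{item}: sold {n} times" for item, n in sorted(patterns.items())]


rows = transform(select(integrate(store_sales, web_sales)))
report = present(evaluate(mine(rows)))
print("\n".join(report))
```

Each function maps to one step, so a step can be swapped out (for example, a different mining method) without touching the others.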
DATA MINING TECHNIQUES
Classification and Prediction
Example: Focused Hiring
Cluster Analysis
Example: Market Segmentation
Outlier Analysis
Example: Fraud Detection
Association Analysis
Example: Market Basket Analysis
Evolution Analysis
Example: Forecasting a stock market index using time series analysis
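As one illustration of the techniques listed above, market basket analysis can be sketched with two standard measures: support (how often items appear together) and confidence (how often one item appears given another). The baskets and function names below are made up for the example; production systems typically use algorithms such as Apriori over much larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (invented for illustration).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing antecedent, the share with consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


support = pair_support(baskets)
print(support[("bread", "milk")])            # support of {bread, milk}
print(confidence(baskets, "bread", "milk"))  # confidence of bread -> milk
```

Here bread and milk appear together in two of the four baskets, and two of the three baskets with bread also contain milk.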
Threat To Privacy From Data Mining
Companies mine data about your buying habits and the sites you visit so that
they can personalize your search results when you use their search engine.
This is frightening, but on the other hand it is, in theory, a way for
companies to tailor your online experience. The problem, of course, is that
although the data generally isn't scoured by humans, it is still used by machines.
SOLUTION OF DATA MINING THREAT
SOLUTIONS :
 Purpose Specification & Use Limitation
 Openness
 Security Measures like Encryption
ROLE OF DATA MINING IN IT
[Diagram labels: Business Intelligence; Model, Tool, Method; Behavioral Basics; Information Technology; Data; Problem; Decision]
DATA WAREHOUSE
Data warehousing is a technology that aggregates
structured data from one or more sources so that it can
be compared and analyzed for greater business
intelligence.
DATA WAREHOUSE
 A data warehouse provides the enterprise with a
memory.
 Data mining provides the enterprise with intelligence.
OLTP & OLAP
On-Line Transaction Processing (OLTP)
Short, simple, frequent queries and modifications
Each involving a small number of tuples
Example: answering queries from a web interface, sales at cash registers,
selling airline tickets.
On-Line Analytical Processing (OLAP)
Few but complex queries, which may run for hours.
Queries do not depend on having an absolutely up-to-date
database.
Example: an analyst at Wal-Mart looks for items with increasing sales in some
region.
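The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. The `sales` table, its columns, and the figures are hypothetical. A real deployment would usually keep OLTP and OLAP data in separate systems, but a single in-memory database keeps the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, region TEXT, item TEXT, year INTEGER, amount REAL)"
)
# Hypothetical historical sales (invented for illustration).
history = [
    ("north", "umbrella", 2022, 100.0),
    ("north", "umbrella", 2023, 150.0),
    ("north", "sandals", 2022, 80.0),
    ("north", "sandals", 2023, 60.0),
    ("south", "umbrella", 2022, 40.0),
    ("south", "umbrella", 2023, 30.0),
]
insert_sql = "INSERT INTO sales (region, item, year, amount) VALUES (?, ?, ?, ?)"
conn.executemany(insert_sql, history)

# OLTP-style work: record one sale at the register, then read it back.
# Short, simple, and touching a single row.
cursor = conn.execute(insert_sql, ("south", "umbrella", 2023, 5.0))
one_sale = conn.execute(
    "SELECT item, amount FROM sales WHERE id = ?", (cursor.lastrowid,)
).fetchone()

# OLAP-style work: scan and aggregate the whole table to find items
# whose sales increased in a region from 2022 to 2023.
growing = conn.execute(
    """
    SELECT region, item
    FROM (
        SELECT region, item,
               SUM(CASE WHEN year = 2022 THEN amount ELSE 0 END) AS total_2022,
               SUM(CASE WHEN year = 2023 THEN amount ELSE 0 END) AS total_2023
        FROM sales
        GROUP BY region, item
    )
    WHERE total_2023 > total_2022
    ORDER BY region, item
    """
).fetchall()

print(one_sale)
print(growing)
```

The first query touches one row by its key. The second reads every row and groups it, which is why such queries are few, heavy, and tolerant of slightly stale data.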
DATA MINING TOOLS
 Microsoft SQL Server 2005
 Microsoft SQL Server 2008
 Oracle Data Mining
 DB Miner
Latest Research and Reviews on Data Mining
1. Systematic discovery of mutation-specific synthetic lethals by mining
pan-cancer human primary tumor data.
2. Multi-label Learning for Predicting the Activities of Antimicrobial
Peptides.
3. Semantic correction system - A little complex but interesting. Retrieved
text often contains semantic errors, which lead to wrong results. Applying
semantic correction as a preprocessing step leads to better outcomes.
4. Syntactic correction system - Much needed nowadays. Non-native English
speakers make many syntactic errors. It can also be used as a
preprocessing step in many projects. Your algorithm should
automatically detect such errors and suggest correct grammar.
5. Search engine for Wikipedia - Wikipedia data is available as a dump file.
Check DBpedia for reference. Apply indexing techniques and build a
small search engine for wiki pages. Wikipedia already provides this
functionality, but you can work on a better user experience and result
optimization.
6. Twitter tweet classifier - Fairly easy and interesting too. Create a
learning system for categories such as sports, entertainment,
business, politics, and Hollywood. Train the classifier (naive Bayes,
SVM) and predict the category of incoming tweets.
7. Sentiment analysis for Twitter, reviews, and conversations - A few
packages available in R can help with this job. One needs to add a
few additional features on top of them to make the results more intuitive.
NLTK and Stanford's tools are good open source options for the same task.
8. Spam mail detection - Again a learning-based classification system. Train
the classifier on mail that users have already marked as spam so that it can
classify new incoming mail. If users mark new mail as spam, retrain
(there may be a better option).
9. Sarcasm detection - This can be a very interesting one. In sentiment
analysis we identify a user's sentiment regarding something; here we identify
sarcasm expressed by users. Check out the page on psu.edu: Sarcasm detection
on Twitter.
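The tweet classifier and spam detector ideas above both train a classifier on labelled text and predict a category for new text. A minimal sketch of that idea follows, using a multinomial naive Bayes model with add-one smoothing written in plain Python. The class name, the training sentences, and the whitespace tokenizer are assumptions made for the example; a real project would more likely use an established library and far more data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocabulary = set()

    def train(self, documents):
        """documents is an iterable of (text, label) pairs."""
        for text, label in documents:
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        """Return the label with the highest log-probability score."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # class prior
            total_words = sum(self.word_counts[label].values())
            denominator = total_words + len(self.vocabulary)
            for word in text.lower().split():
                # Add one so unseen words do not zero out the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled examples (invented for illustration).
training_data = [
    ("the team won the match", "sports"),
    ("great goal in the final match", "sports"),
    ("stocks fell as the market closed", "business"),
    ("the company reported strong profit", "business"),
]

classifier = NaiveBayesTextClassifier()
classifier.train(training_data)
print(classifier.predict("the team scored a late goal"))
print(classifier.predict("market profit fell"))
```

For spam detection the same class could be trained with labels such as "spam" and "not spam", and retrained when users mark new mail.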