際際滷

際際滷Share a Scribd company logo
Data Mining With SQL Server
Nguyen To Hoan Phuc
Pham Huu Khanh
2
Objectives
 Understand What is Data Mining
 Learn the Data Mining Process
 How to work with Data Mining
3
Agenda
 Data vs. Information
 What is Data Mining?
 Technical Platform
 Data Mining Process Overview
 Key Concepts and Terminology
 DEMO: Data Mining Process in Detail Using DMX
4
Data vs. Information
5
6
7
8
What is Data Mining?
9
Data Mining
 Technologies for analysis of data and discovery of
(very) hidden patterns
 Uses a combination of statistics, probability analysis
and database technologies
 Fairly young (<20 years old) but clever algorithms
developed through database research
10
What does Data Mining Do?
Explores
Your Data
Finds
Patterns -
Trends
Performs
Predictions
11
Data Mining Tasks
 Classification
 Ph但n loa味i, x棚p ha味ng
 Regression
 H担i quy
 Segmentation
 Ph但n khuc
 Association
 Li棚n k棚t
 Sequence Analysis
 Ph但n tich chu担i, day
12
The Technical Platform
13
 Data acquisition and
integration from
multiple sources
 Data transformation
and synthesis
 Knowledge and
pattern detection
through Data Mining
 Data enrichment with
logic rules and
hierarchical views
 Data presentation
and distribution
 Data publishing for
mass recipients
Integrate Analyze Report
SQL Server
We Need More Than Just Database Engine
14
DM  Part of Microsoft SQL Server
15
Server Mining Architecture
Analysis Services
Server
Mining Model
Data Mining Algorithm Data
Source
Excel/Visio/SSRS/Your App
OLE DB/ADOMD/XMLA
Deploy
BIDS
Excel
Visio
SSMS
App
Data
16
Data Mining Process Overview
17
Mining Model Mining ModelMining Model
Mining Process
DM EngineDM Engine
Training data
Data to be
predictedMining Model
With
predictions
18
Steps for Building a DM Model
1. Model Creation
 Define columns for cases: visually (BIDS), using DMX, or from PMML
2. Model Training
 Feed lots of data from a real database, or from a system log
Congratulations! We now have a model
3. Model Testing
 Test on sample data to check predictions.
 Testing data must be different from training
 If we get nonsense, adjust the algorithm, its parameters, model design, or even
data
4. Model Use (Exploration and Prediction)
 Use the model on new data to predict outcomes
19
Many Approaches
 Work the way you like:
 Database experts and SQL veterans:
 Write queries in DMX (similar to T-SQL)
 Everyone else:
 Use Business Intelligence Development Studio (BIDS)  rich GUI
included with SSAS
 Hosted in Visual Studio (included!)
 You dont have to program  click-click instead
 Use Excel 2007 with Data Mining Add-Ins
 The Data Mining tab has everything you need
 Table Analysis tab is easier but simplified
20
Key Concepts and Terminology
21
Mining Structure
 Describes data to be mined
 Columns from a data source and their:
 Data Type
 Content Type
 Contains Mining Models
 Often we build several different models in one structure
 Holds training data, known as Cases (if required)
 Holds testing data, known as Holdout (in SQL 2008)
22
Mining Structure
23
Data Mining Model
 Container of patterns discovered by a Data Mining
Algorithm amongst the training Cases
 A table containing patterns
 Expressed by visualisers
 Specifies usage of columns already defined in the
Mining Structure
24
Cases: The Things We Study
 Case  set of columns (attributes) you want to analyse
 Age, Gender, Region, Annual Spending
 Case Key  unique ID of a case
25
DEMO
Data Mining Process in Detail
Using DMX
26
Data Mining Extensions
DMX
 T-SQL for Data Mining
 Easy! Like scripting for IT Pros
 Two types of statements:
 Data Definition
 CREATE, ALTER, EXPORT, IMPORT, DROP
 Data Manipulation
 INSERT INTO, SELECT, DELETE
27
DMX  Just Like T-SQL
CREATE MINING MODEL CreditRisk
(CustID LONG KEY,
Gender TEXT DISCRETE,
Income LONG CONTINUOUS,
Profession TEXT DISCRETE,
Risk TEXT DISCRETE PREDICT)
USING Microsoft_Decision_Trees
INSERT INTO CreditRisk
(CustId, Gender, Income, Profession,
Risk)
Select
CustomerID, Gender, Income,
Profession,Risk
From Customers
Select NewCustomers.CustomerID, CreditRisk.Risk,
PredictProbability(CreditRisk.Risk)
FROM CreditRisk PREDICTION JOIN NewCustomers
ON CreditRisk.Gender=NewCustomer.Gender
AND CreditRisk.Income=NewCustomer.Income
AND CreditRisk.Profession=NewCustomer.Profession
28
Demos Steps
1
 Create Mining Structure
2
 Create Mining Model
3
 Process Mining Model
4
 Test Model
5
 Execute Prediction
29
30

More Related Content

Data Mining With SQL Server

  • 1. Data Mining With SQL Server Nguyen To Hoan Phuc Pham Huu Khanh
  • 2. 2 Objectives Understand What is Data Mining Learn the Data Mining Process How to work with Data Mining
  • 3. 3 Agenda Data vs. Information What is Data Mining? Technical Platform Data Mining Process Overview Key Concepts and Terminology DEMO: Data Mining Process in Detail Using DMX
  • 5. 5
  • 6. 6
  • 7. 7
  • 8. 8 What is Data Mining?
  • 9. 9 Data Mining Technologies for analysis of data and discovery of (very) hidden patterns Uses a combination of statistics, probability analysis and database technologies Fairly young (<20 years old) but clever algorithms developed through database research
  • 10. 10 What does Data Mining Do? Explores Your Data Finds Patterns - Trends Performs Predictions
  • 11. 11 Data Mining Tasks Classification Ph但n loa味i, x棚p ha味ng Regression H担i quy Segmentation Ph但n khuc Association Li棚n k棚t Sequence Analysis Ph但n tich chu担i, day
  • 13. 13 Data acquisition and integration from multiple sources Data transformation and synthesis Knowledge and pattern detection through Data Mining Data enrichment with logic rules and hierarchical views Data presentation and distribution Data publishing for mass recipients Integrate Analyze Report SQL Server We Need More Than Just Database Engine
  • 14. 14 DM Part of Microsoft SQL Server
  • 15. 15 Server Mining Architecture Analysis Services Server Mining Model Data Mining Algorithm Data Source Excel/Visio/SSRS/Your App OLE DB/ADOMD/XMLA Deploy BIDS Excel Visio SSMS App Data
  • 17. 17 Mining Model Mining ModelMining Model Mining Process DM EngineDM Engine Training data Data to be predictedMining Model With predictions
  • 18. 18 Steps for Building a DM Model 1. Model Creation Define columns for cases: visually (BIDS), using DMX, or from PMML 2. Model Training Feed lots of data from a real database, or from a system log Congratulations! We now have a model 3. Model Testing Test on sample data to check predictions. Testing data must be different from training If we get nonsense, adjust the algorithm, its parameters, model design, or even data 4. Model Use (Exploration and Prediction) Use the model on new data to predict outcomes
  • 19. 19 Many Approaches Work the way you like: Database experts and SQL veterans: Write queries in DMX (similar to T-SQL) Everyone else: Use Business Intelligence Development Studio (BIDS) rich GUI included with SSAS Hosted in Visual Studio (included!) You dont have to program click-click instead Use Excel 2007 with Data Mining Add-Ins The Data Mining tab has everything you need Table Analysis tab is easier but simplified
  • 20. 20 Key Concepts and Terminology
  • 21. 21 Mining Structure Describes data to be mined Columns from a data source and their: Data Type Content Type Contains Mining Models Often we build several different models in one structure Holds training data, known as Cases (if required) Holds testing data, known as Holdout (in SQL 2008)
  • 23. 23 Data Mining Model Container of patterns discovered by a Data Mining Algorithm amongst the training Cases A table containing patterns Expressed by visualisers Specifies usage of columns already defined in the Mining Structure
  • 24. 24 Cases: The Things We Study Case set of columns (attributes) you want to analyse Age, Gender, Region, Annual Spending Case Key unique ID of a case
  • 25. 25 DEMO Data Mining Process in Detail Using DMX
  • 26. 26 Data Mining Extensions DMX T-SQL for Data Mining Easy! Like scripting for IT Pros Two types of statements: Data Definition CREATE, ALTER, EXPORT, IMPORT, DROP Data Manipulation INSERT INTO, SELECT, DELETE
  • 27. 27 DMX Just Like T-SQL CREATE MINING MODEL CreditRisk (CustID LONG KEY, Gender TEXT DISCRETE, Income LONG CONTINUOUS, Profession TEXT DISCRETE, Risk TEXT DISCRETE PREDICT) USING Microsoft_Decision_Trees INSERT INTO CreditRisk (CustId, Gender, Income, Profession, Risk) Select CustomerID, Gender, Income, Profession,Risk From Customers Select NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk) FROM CreditRisk PREDICTION JOIN NewCustomers ON CreditRisk.Gender=NewCustomer.Gender AND CreditRisk.Income=NewCustomer.Income AND CreditRisk.Profession=NewCustomer.Profession
  • 28. 28 Demos Steps 1 Create Mining Structure 2 Create Mining Model 3 Process Mining Model 4 Test Model 5 Execute Prediction
  • 29. 29
  • 30. 30

Editor's Notes

  • #6: V但n棚 l動utr動, kinh phi, quanly, baoquanPhatsinhngaycangnhi棚uKho khnchovi棚味c hi棚u Nhuc但udoanhnghi棚味p: ruttrich動董味cth担ng tin t動 d動 li棚味u, h担 tr董味 raquy棚ti味nh, tngtinhca味nhtranhtr棚nthi味 tr動董ng.
  • #8: Nhuc但uv棚 Information: d動味 oan, h担 tr董味 raquy棚ti味nh
  • #10: Statistics: thongkeProbability: xacsuatLa mot linhvuc con khatre, duocphattriencach day chua den 20 namnhungcacthuattoanduocphattrienkharo rang
  • #12: Gi畉i thu畉t ph但n lo畉i (Classification Algorithm) d畛 o叩n ra m畛t ho畉c nhi畛u gi叩 tr畛 bi畉n r畛i r畉c, d畛a tr棚n c叩c thu畛c t鱈nh kh叩c c畛a t畉p d畛 li畛u. i畛n h狸nh l gi畉i thu畉t C但y Quy畉t 畛nh Microsoft Decision Trees Algorithm.Gi畉i thu畉t 畛 qui (Regression Algorithm) d畛 o叩n m畛t ho畉c nhi畛u bi畉n gi叩 tr畛 li棚n t畛c, nh動 l畛i nhu畉n v gi叩 tr畛 thua l畛, d畛a tr棚n c叩c thu畛c t鱈nh d畛 li畛u kh叩c trong t畉p d畛 li畛u. i畛n h狸nh l gi畉i thu畉t chu畛i th畛i gian Microsoft Time Series Algorithm.Gi畉i thu畉t ph但n o畉n (Segmentation Algorithm) ph但n chia d畛 li畛u thnh nhi畛u nh坦m g畛m c叩c thnh ph畉n c坦 thu畛c t鱈nh t動董ng t畛 nhau. Gi畉i thu畉t i畛n h狸nh l Microsoft Clustering Algorithm.Gi畉i thu畉t t動董ng quan (Assocication Algorithm) t狸m s畛 t動董ng quan gi畛a c叩c thu畛c t鱈nh trong c畛ng t畉p d畛 li畛u. 畛ng d畛ng ph畛 bi畉n nh畉t c畛a gi畉i thu畉t ny l x但y d畛ng c叩c lu畉t t動董ng quan, ph但n t鱈ch gi畛 hng. Gi畉i thu畉t i畛n h狸nh lo畉i gi畉i thu畉t ny l Microsoft Assocciation AlgorithmGi畉i thu畉t ph但n t鱈ch tuy畉n t鱈nh (Sequence Analysis Allgorithm) t畛ng k畉t c叩c chu畛i ho畉c m畉ng d畛 li畛u trong t畉p d畛 li畛u. i畛n h狸nh cho lo畉i gi畉i thu畉t ny l Microsoft Sequence Clustering Algorithm
  • #16: SSMS: SQL Server Management StudioC担ng cu味 棚 ta味oracac Mining Model. Cacc担ng cu味 動董味c Microsoft cungc但pg担m co: Business Inteligence Development Studio, Excel, Visio, SQL Server Management Studio. Saukhita味oracac Mining Model, c但nphaitri棚nkhail棚nh棚味 th担ng Analysis Services (A.S). Analysis Service la n董iv但味nhanh, quanly cac Model.L動u y rngcac Model saukhi動董味ctri棚nkhail棚n A.S chi la cac Model r担ng. 棚 co th棚 動avaos動 du味ng, c但nphai qua m担味t qua trinhgo味i la Training Model (ho味c Process Model). Vi th棚 c但n棚nthanhph但nth動 3 o la Data Source. Data source la n董ich動ad動 li棚味uc但nthi棚tchovi棚味c Training Model va ca qua trinh Test Model. Vi th棚 c但nphai chia l動董味ng Data thanh 2 ph但nri棚ngbi棚味t棚 phu味c vu味 cho 2 tac vu味 tr棚n.Thanhph但nth動 4, o la cac動ngdu味ngkhaithaccac Mining Model a 動董味cx但yd動味ng. Cac動ngdu味ng co th棚 la cacph但nm棚m動董味c Microsoft cungc但pnh動 Excel, Visio ho味c動ngdu味ng do ng動董idungx但yd動味ng. Cac動ngdu味ngnayg董id動 li棚味ucuaminhxu担ng Analysis Service va nh但味nphanh担i la k棚t qua cua qua trinh Data Mining tr董 la味i.
  • #20: Approaches: ph動董ngphapti棚pc但味n
  • #21: Concepts:khaini棚味mTerminology: thu但味tng動
  • #22: Mining Structures (Analysis Services - Data Mining)The mining structure defines the data from which mining models are built: it specifies the source data view, the number and type of columns, and an optional partition into training and testing sets. A single mining structure can support multiple mining models that share the same domain. The following diagram illustrates the relationship of the data mining structure to the data source, and to its constituent data mining models.
  • #23: The mining structure in the diagram is based on a data source that contains multiple tables, joined on the CustomerID field. One table contains information about customers, such as the geographical region, age, income and gender, while the related nested table contains multiple rows of additional information about each customer, such as products the customer has purchased. The diagram shows that multiple models can be built on one mining structure, and that the models can use different columns from the structure. Model 1 Uses CustomerID, Income, Age, Region, and filters the data on Region.Model 2 Uses CustomerID, Income, Age, Region and filters the data on Age.Model 3 Uses CustomerID, Age, Gender, and the nested table, with no filter.Because the models use different columns for input, and because two of the models additionally restrict the data that is used in the model by applying a filter, the models might have very different results even though they are based on the same data. Note that the CustomerID column is required in all models because it is the only available column that can be used as the case key.This section explains the basic architecture of data mining structures. For more information about how to create, manage, modify, or view data mining structures, see Managing Data Mining Structures and Models.
  • #24: A data mining model gets data from a mining structure and then analyzes that data by using a data mining algorithm. The mining structure and mining model are separate objects. The mining structure stores information that defines the data source. A mining model stores information derived from statistical processing of the data, such as the patterns found as a result of analysis. A mining model is empty until the data provided by the mining structure has been processed and analyzed. After a mining model has been processed, it contains metadata, results, and bindings back to the mining structure.