Knowledge Discovery & Representation
1. Introduction
 Knowledge discovery is the process of automatically searching large
volumes of data for patterns that can be considered knowledge about the data.
 It can be categorized according to
1) what kind of data is searched
2) the form in which the result of the search is represented.
 Knowledge discovery developed out of the data mining domain and is closely
related to it in both methodology and terminology.
 Knowledge representation is a formalism for representing, at a minimum, the data,
information, and knowledge in an application.
 Knowledge can be represented either as programs in an imperative language or
as rules in a declarative language.
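The contrast between the two representations can be sketched in a few lines. This is an illustrative toy, not something taken from the slides: the shopping rules, the `RULES` list, and both function names are assumptions made up for the example.

```python
# Hypothetical toy knowledge: "bread and butter suggests milk", "tea suggests sugar".

# Imperative form: the knowledge is baked into the program's control flow.
def suggest_imperative(basket):
    if "bread" in basket and "butter" in basket:
        return "milk"
    if "tea" in basket:
        return "sugar"
    return None

# Declarative form: the knowledge is data (condition -> conclusion),
# and one generic interpreter applies the first rule that matches.
RULES = [
    ({"bread", "butter"}, "milk"),
    ({"tea"}, "sugar"),
]

def suggest_declarative(basket, rules=RULES):
    items = set(basket)
    for condition, conclusion in rules:
        if condition <= items:  # every required item is in the basket
            return conclusion
    return None

basket = ["bread", "butter", "jam"]
print(suggest_imperative(basket), suggest_declarative(basket))
```

In the declarative version a new rule is added by extending `RULES`, without touching the interpreter.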
2. Knowledge Discovery
 It is also known as Knowledge Discovery in Databases (KDD).
 The knowledge discovery process turns data into useful information.
 It requires much elapsed time.
Five steps of the KDD process
3. Data Mining
 Data mining involves many different algorithms to accomplish different tasks.
 Data mining algorithms can be characterized as consisting of three parts:
 Model: the purpose of the algorithm is to fit a model to the data.
 Preference: some criterion must be used to prefer one model over another.
 Search: all algorithms require some technique to search the data.
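As a rough illustration of the three parts, the sketch below fits a one-parameter line to a few points. The choice of model (`y = w * x`), preference (sum of squared errors), and search (a scan over a small grid) are all assumptions picked for brevity, and the data points are made up.

```python
# Model: a one-parameter line y = w * x (illustrative choice).
def predict(w, x):
    return w * x

# Preference: sum of squared errors; a lower value means a preferred model.
def squared_error(w, points):
    return sum((y - predict(w, x)) ** 2 for x, y in points)

# Search: exhaustive scan over a small grid of candidate parameters.
def grid_search(points, candidates):
    return min(candidates, key=lambda w: squared_error(w, points))

points = [(1, 2.1), (2, 3.9), (3, 6.2)]       # made-up data
candidates = [i / 10 for i in range(0, 41)]   # w = 0.0, 0.1, ..., 4.0
best_w = grid_search(points, candidates)
print(best_w)
```

Swapping any one part (a different model family, error measure, or search strategy) gives a different algorithm while the overall structure stays the same.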
4. Classification of Data Mining
5. Working of Data Mining
 Data mining provides a link between separate transaction and analytical systems.
 Data mining software analyzes relationships and patterns in stored transaction data
based on user queries.
 Generally, four types of relationships are sought: classes, clusters, associations, and
sequential patterns.
 Data mining consists of the following elements:
1) Extract, transform, and load transaction data
2) Store and manage the data
3) Provide data access to business analysts and IT professionals
4) Analyze the data by application software
5) Present the data in a useful format
6. Clustering
What is a cluster?
 A cluster is a collection of objects that are similar to one another
and dissimilar to the objects belonging to other clusters.
What is clustering?
 The process of organizing objects into groups whose members are
similar in some way.
 Distance-based clustering and conceptual clustering are among
the types of clustering.
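For distance-based clustering, "similar" can be read as "close under some distance measure". The sketch below checks that reading on two hand-made groups of 2-D points; the points, the use of Euclidean distance, and the function names are assumptions for illustration only.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, available since Python 3.8

cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]  # made-up points
cluster_b = [(8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]

def mean_intra_distance(cluster):
    """Average distance between pairs of objects in the same cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(c1, c2):
    """Average distance between objects taken from two different clusters."""
    pairs = list(product(c1, c2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

intra = max(mean_intra_distance(cluster_a), mean_intra_distance(cluster_b))
inter = mean_inter_distance(cluster_a, cluster_b)
print(intra < inter)  # members are closer to each other than to the other group
```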
Possible applications of clustering
 Marketing
 Biology
 Libraries
 World Wide Web
Problems of clustering
 Clustering techniques cannot address all requirements adequately.
 A large number of data items can be problematic because of time complexity.
 The result can be interpreted in different ways.
 If an obvious distance measure does not exist, defining one is not easy.
Classification of clustering algorithms
 Exclusive
 Overlapping
 Hierarchical
 Probabilistic
K-means Clustering
[Figure: clustering on the mouse data set, showing the original data and the k-means clustering]
 K-means is an iterative clustering algorithm in which items are moved
among sets of clusters until the desired set is reached.
 This definition assumes that each tuple has only one numeric value,
as opposed to a tuple with many attribute values.
K-means algorithm
Input:
 D = {t1, t2, ..., tn} // set of elements
 k // number of desired clusters
Output:
 K // set of clusters
Assign initial values for means m1, m2, ..., mk;
Repeat
 Assign each item ti to the cluster which has the closest mean;
 Calculate the new mean for each cluster;
Until the convergence criterion is met;
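The pseudocode above can be sketched in Python for the single-numeric-value case. Two details are assumptions, since the slide does not state them: the loop stops when the cluster assignments no longer change, and an item equidistant from two means goes to the lower-numbered cluster. The function name `k_means_1d` is also made up for this sketch.

```python
def k_means_1d(items, initial_means):
    """Cluster numeric items around k means; returns (means, clusters)."""
    means = list(initial_means)
    previous = None
    while True:
        # Assign each item to the cluster which has the closest mean
        # (ties go to the lower-indexed cluster).
        clusters = [[] for _ in means]
        for t in items:
            nearest = min(range(len(means)), key=lambda j: abs(t - means[j]))
            clusters[nearest].append(t)
        # Assumed stopping rule: assignments did not change since last pass.
        if clusters == previous:
            return means, clusters
        previous = clusters
        # Calculate the new mean for each cluster; an empty cluster
        # keeps its old mean so the division is always defined.
        means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]

# The deck's example input, with the first two items as initial means.
means, clusters = k_means_1d([2, 4, 10, 12, 3, 20, 30, 11, 25], [2, 4])
print(means, clusters)
```

Other initial means can lead to a different final clustering, so the starting values are part of the input in practice.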
---Example---
Input: k = 2, D = {2, 4, 10, 12, 3, 20, 30, 11, 25}
Output:
m1     m2     K1                  K2
2      4      {2,3}               {4,10,12,20,30,11,25}
2.5    16     {2,3,4}             {10,12,20,30,11,25}
3      18     {2,3,4,10}          {12,20,30,11,25}
4.75   19.6   {2,3,4,10,11,12}    {20,30,25}
7      25     {2,3,4,10,11,12}    {20,30,25}
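Each row's means are the averages of the clusters in the row before it. Written out as a check of the arithmetic (the notation $m_1$, $m_2$ follows the table; nothing here goes beyond the values already shown):

```latex
\begin{align*}
m_1 &= \tfrac{2+3}{2} = 2.5, & m_2 &= \tfrac{4+10+12+20+30+11+25}{7} = \tfrac{112}{7} = 16 \\
m_1 &= \tfrac{2+3+4}{3} = 3, & m_2 &= \tfrac{10+12+20+30+11+25}{6} = \tfrac{108}{6} = 18 \\
m_1 &= \tfrac{2+3+4+10}{4} = 4.75, & m_2 &= \tfrac{12+20+30+11+25}{5} = \tfrac{98}{5} = 19.6 \\
m_1 &= \tfrac{2+3+4+10+11+12}{6} = 7, & m_2 &= \tfrac{20+30+25}{3} = 25
\end{align*}
```

With means 7 and 25 the assignment of every item is unchanged, which is why the last two rows show the same clusters and the iteration stops.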
Pictorial Representation
So we conclude with...
Knowledge Discovery & Representation
Thank You