WEKA
BY: Keshab Kumar Gaurav
(ISSA, DRDO)
INTRODUCTION TO WEKA
 A collection of open source data mining and
machine learning algorithms, including
> Pre-processing of data
> Classification
> Clustering
> Association rule extraction
> 3D visualization
 Developed by researchers at the University
of Waikato in New Zealand
 Written in pure Java (also open source).
Weka Main Features
 71 data pre-processing tools
 52 classification/regression algorithms
 7 clustering algorithms
 9 attribute/subset evaluators + 3 search
algorithms for feature selection.
 3 algorithms for finding association rules
 3 graphical user interfaces
The Explorer
The Experimenter
The Knowledge Flow
Weka: Download and Installation
 Download Weka (the stable version) from
http://www.cs.waikato.ac.nz/ml/weka/
 Choose a self-extracting executable (including
Java VM)
 After the download is complete, run the
self-extracting file to install Weka, and keep the
default settings.
GOAL
The program aims to build a state-of-the-art
facility for developing machine learning
techniques and investigating their application in
key areas.
Specifically, we will create a workbench for
machine learning, determine the factors that
contribute to its successful application in
agriculture, industry, and scientific research,
and develop new methods for machine
learning and ways of assessing their
effectiveness.
Start Weka
From the Windows desktop
 click Start, choose All Programs
 choose Weka 3.7.9 to start Weka
Then the first interface window appears:
Weka GUI Chooser
WEKA APPLICATION INTERFACES
 Explorer
 Environment for exploring data with WEKA. It gives
access to all the facilities using menu selection and
form filling.
 Experimenter
 It can be used to get the answer for a question: Which
methods and parameter values work best for the given
problem?
 Knowledge Flow
 Same functions as the Explorer. Supports incremental
learning. It allows designing configurations for
streamed data processing. Incremental algorithms can
be used to process very large datasets.
 Simple CLI
 It provides a simple Command Line Interface for
directly executing WEKA commands.
WEKA Application Interface
WEKA FUNCTIONS AND TOOLS
 Preprocessing Filters
 Attribute selection
 Classification/Regression
 Clustering
 Association discovery
 Visualization
LOAD DATA FILE AND PREPROCESSING
 Load data file in formats: ARFF, CSV,
C4.5, binary
 Import from URL or SQL database (using
JDBC)
 Preprocessing filters
o Adding/removing attributes
o Attribute value substitution
o Discretization
o Time series filters (delta, shift)
o Sampling, randomization
o Missing value management
o Normalization and other numeric
transformations.
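Two of the filters listed above, normalization and discretization, can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Weka's filter code; the function names and the sample temperatures are made up for the example.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is nothing to spread out.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Replace each value with the index of one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical attribute values, used only for illustration.
temperatures = [64, 68, 72, 80, 85]
normalized = min_max_normalize(temperatures)
bins = equal_width_bins(temperatures, 3)
```

Here `normalized` runs from 0.0 to 1.0 and `bins` holds one interval index per value.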
WEKA DATA FORMATS
FOUR FORMATS
 ARFF (Attribute Relation File Format) has two sections
 The Header information defines attribute name, type and
relations.
 The Data section lists the data records.
 CSV: Comma Separated Values (text file)
 C4.5: A format used by the decision tree induction algorithm C4.5,
requires two separate files
 Names file: defines the names of the attributes
 Data file: lists the records (samples)
 Binary
 Data can also be read from a URL or from an SQL database
(using JDBC).
ATTRIBUTE RELATION FILE FORMAT (ARFF)
An ARFF file consists of two distinct sections
 The Header section defines the relation name and the
attribute names and types. Each declaration starts with a keyword:
@relation <data-name>
@attribute <attribute-name> <type> or {list of nominal values}
 The Data section lists the data records. It starts with
@data, followed by the list of data instances
Example
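A small ARFF file and a minimal reader for it might look like the sketch below. The data values are illustrative and not taken from the slides, and `parse_arff` is a hypothetical helper that handles only unquoted names and values; it is not part of Weka.

```python
ARFF_TEXT = """\
% Illustrative data, not taken from the slides
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    Only unquoted names and values are supported; lines starting
    with '%' are comments and are skipped.
    """
    relation = None
    attributes = []  # list of (name, type) pairs from the header
    rows = []        # list of value lists from the data section
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()  # ARFF keywords are case-insensitive
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, attr_type = rest.strip().partition(" ")
            attributes.append((name, attr_type.strip()))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
```

The header lines end up in `relation` and `attributes`, and every line after `@data` becomes one entry in `rows`.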
WEKA SYSTEM HIERARCHY
Weka : A machine learning algorithms for data mining
Role of WEKA
INPUT
Raw data
Data Mining by WEKA
Pre-processing
Classification
Regression
Clustering
Association Rules
Visualization
OUTPUT
Result
KDD Process of WEKA
Data -> Selection -> Preprocessing -> Transformation ->
Data Mining -> Interpretation/Evaluation -> Knowledge
CLASSIFICATION
 Predicted target must be categorical
 Implemented methods
 decision trees (J48) and rules
 Naive Bayes
 neural networks
 instance-based classifiers
 Evaluation methods
 test data set
 cross validation
 (Example)
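The cross validation method listed above splits the data into k folds and uses each fold once as the test set while training on the rest. A minimal sketch of the index bookkeeping, assuming a hypothetical helper `k_fold_indices` and no stratification by class, could be:

```python
import random


def k_fold_indices(n_instances, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_instances))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 10 instances, 5 folds: each instance is tested exactly once.
splits = list(k_fold_indices(10, 5))
```

A classifier would be trained on each `train` list and scored on the matching `test` list, and the k scores averaged.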
CLUSTERING
 Clustering lets a user group the data in order to
find patterns in it.
 Clustering is useful when the data set is
defined and a general pattern needs to be
determined from the data.
 The user can create a specific number of groups,
depending on business needs.
 One defining benefit of clustering over classification
is that every attribute in the data set is used to
analyze the data (whereas in the classification
method, only a subset of the attributes is used in
the model).
Clustering SimpleKMeans
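The k-means idea behind SimpleKMeans can be sketched in plain Python as follows. This is not Weka's implementation: the points are invented, and taking the first k points as initial centroids is an assumption made only to keep the example deterministic.

```python
def k_means(points, k, iterations=10):
    """Cluster 2-D points into k groups with the basic k-means loop.

    Initial centroids are simply the first k points (an assumption
    made to keep this sketch deterministic).
    """
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments


# Hypothetical data: two well-separated groups of points.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, assignments = k_means(points, k=2)
```

The number of groups `k` is chosen by the user, and every attribute (here both coordinates) contributes to the distance.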
ASSOCIATION
There are a few association rule algorithms
implemented in WEKA. They try to find
associations between different attributes instead
of trying to predict the value of the class
attribute.
Association Rules (A=>B)
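A rule A=>B is usually judged by its support and confidence. The sketch below computes both for a tiny, made-up set of transactions; the helper names are hypothetical and this is not how Weka's association algorithms are implemented.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions containing A that also contain B, for A => B.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market-basket data, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

For the rule bread=>milk, two of the four transactions contain both items, and two of the three transactions with bread also contain milk.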
3D Visualising
Conclusion
The overall goal of Weka is to build a
state-of-the-art facility for developing machine
learning (ML) techniques and to allow people to
apply them to real-world data mining
problems.
Thank You !!!
