Weka is a collection of machine learning algorithms for data mining tasks. The name "Weka" stands for "Waikato Environment for Knowledge Analysis," as it was developed at the University of Waikato in New Zealand. Weka provides a graphical user interface (GUI) that makes it easy to experiment with various machine learning algorithms on datasets.
Weka: Machine learning algorithms for data mining
2. INTRODUCTION TO WEKA
A collection of many open-source data
mining and machine learning algorithms,
including
> Pre-processing on data
> Classification
> Clustering
> Association rule extraction
> 3D visualization
Developed by researchers at the University
of Waikato in New Zealand
Pure Java based (also open source).
3. Weka Main Features
71 data pre-processing tools
52 classification/regression algorithms
7 clustering algorithms
9 attribute/subset evaluators + 3 search
algorithms for feature selection.
3 algorithms for finding association rules
3 graphical user interfaces
The Explorer
The Experimenter
The Knowledge Flow
4. Weka : Download and Installation
Download Weka (the stable version) from
http://www.cs.waikato.ac.nz/ml/weka/
Choose a self-extracting executable (including
Java VM)
After the download is complete, run the
self-extracting file to install Weka, and keep
the default settings.
5. GOAL
The program aims to build a state-of-the-art
facility for developing machine learning
techniques and investigating their application
in key areas.
Specifically, we will create a workbench for
machine learning, determine the factors that
contribute to its successful application in
agriculture, industry, and scientific research,
and develop new methods for machine
learning and ways of assessing their
effectiveness.
6. Start Weka
From the Windows desktop,
click Start, choose All Programs
Choose Weka 3.7.9 to start Weka
Then the first interface window appears:
Weka GUI Chooser
8. Explorer
Environment for exploring data with WEKA. It gives
access to all the facilities using menu selection and
form filling.
Experimenter
It can be used to get the answer for a question: Which
methods and parameter values work best for the given
problem?
Knowledge Flow
Same functionality as the Explorer. Supports incremental
learning. It allows designing configurations for
streamed data processing. Incremental algorithms can
be used to process very large datasets.
9. Simple CLI
It provides a simple Command Line Interface for
directly executing WEKA commands.
WEKA Application Interface
13. Load data files in these formats: ARFF, CSV,
C4.5, binary
Import from URL or SQL database (using
JDBC)
Preprocessing filters
o Adding/removing attributes
o Attribute value substitution
o Discretization
o Time series filters (delta, shift)
o Sampling, randomization
o Missing value management
o Normalization and other numeric
transformations.
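Two of the filters listed above, normalization and discretization, can be sketched in plain Python. This is only an illustration of the ideas, not Weka's filter code: the function names `min_max_normalize` and `equal_width_bins` and the sample values are assumptions made for this example.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:               # constant column: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Discretize numeric values into n_bins intervals of equal width.

    Returns the bin index (0 .. n_bins - 1) for each value.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin instead of overflowing
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


temperatures = [64, 70, 75, 80, 85]      # made-up sample values
print(min_max_normalize(temperatures))
print(equal_width_bins(temperatures, 3))  # [0, 0, 1, 2, 2]
```

Both functions work on a single numeric attribute; a real filter would apply the same step to each numeric column of the dataset.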
15. FOUR FORMATS
ARFF (Attribute-Relation File Format) has two sections
The Header section defines the relation name and the
attribute names and types.
The Data section lists the data records.
CSV: Comma Separated Values (text file)
C4.5: A format used by the decision tree induction algorithm C4.5;
requires two separate files
Name file: defines the names of the attributes
Data file: lists the records (samples)
Binary
Data can also be read from a URL or from an SQL database
(using JDBC).
16. ATTRIBUTE-RELATION FILE FORMAT (ARFF)
An ARFF file consists of two distinct sections
The Header section defines the relation name and the
attribute names and types; each declaration starts with a keyword.
@relation <data-name>
@attribute <attribute-name> <type> or {list of nominal values}
The Data section lists the data records; it starts with
@data, followed by the list of data instances
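As an illustration of the two sections, here is a small ARFF text and a minimal reader for it, written in plain Python. The weather-style data is made up, and `parse_arff` is a hypothetical helper that handles only the simple dense subset shown here (no quoted names, no sparse rows); it is not the parser Weka uses.

```python
SAMPLE_ARFF = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Return (relation_name, attributes, rows) from a simple ARFF string."""
    relation = None
    attributes = []  # (name, type); type is "numeric" or a list of nominal labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and % comments
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()  # ARFF keywords are case-insensitive
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, attr_type = rest.strip().partition(" ")
            attr_type = attr_type.strip()
            if attr_type.startswith("{"):
                # nominal attribute: {a, b, c} -> ["a", "b", "c"]
                attr_type = [v.strip() for v in attr_type.strip("{}").split(",")]
            attributes.append((name, attr_type))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation)        # weather
print(attributes[0])   # ('outlook', ['sunny', 'overcast', 'rainy'])
print(rows[0])         # ['sunny', '85', 'no']
```

Values are kept as strings; converting numeric columns is left out to keep the sketch short.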
23. Predicted target must be categorical
Implemented methods
decision trees (J48) and rules
Naive Bayes
neural networks
instance-based classifiers
Evaluation methods
test data set
cross validation
(Example)
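The cross-validation idea can be sketched in plain Python as follows. This is an assumption-laden illustration, not Weka's evaluation code: the helper names are hypothetical, the labels are made up, and a trivial majority-class baseline stands in for a real classifier such as J48.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=1):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate_majority(labels, k=5, seed=1):
    """Estimate accuracy of a majority-class baseline with k-fold cross-validation."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for test_fold in folds:
        held_out = set(test_fold)
        train_labels = [labels[i] for i in range(len(labels)) if i not in held_out]
        # "Training": remember the most frequent label in the training part
        prediction = Counter(train_labels).most_common(1)[0][0]
        # "Testing": count held-out records whose label matches the prediction
        correct += sum(1 for i in test_fold if labels[i] == prediction)
    return correct / len(labels)


labels = ["yes"] * 9 + ["no"] * 5      # made-up class labels
print(cross_validate_majority(labels, k=7))
```

Each record is used for testing exactly once and for training in the other k - 1 rounds, which is what distinguishes cross-validation from evaluating on a single separate test set.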
27. Clustering allows a user to make groups of data to
determine patterns from the data.
Clustering has its advantages when the data set is
defined and a general pattern needs to be
determined from the data.
You can create a specific number of groups,
depending on your business needs.
28. One defining benefit of clustering over classification
is that every attribute in the data set is used to
analyze the data (whereas in classification, the model
may end up using only a subset of the attributes).
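A minimal k-means sketch in plain Python shows both points: the number of groups is chosen up front, and the distance is computed over every attribute. This is a generic textbook version with made-up points and hand-picked starting centroids, not the clusterer shipped with Weka.

```python
def squared_distance(a, b):
    """Squared Euclidean distance over every attribute of two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centroids, max_iter=100):
    """Plain k-means: the number of groups equals len(centroids)."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        assignment = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)   # keep an empty cluster's centroid
        if new_centroids == centroids:      # converged: nothing moved
            break
        centroids = new_centroids
    return assignment, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
assignment, centroids = k_means(points, centroids=[points[0], points[3]])
print(assignment)   # [0, 0, 0, 1, 1, 1]
```

Passing two starting centroids asks for two groups; passing three would ask for three.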
31. There are a few association rule algorithms
implemented in WEKA. They try to find
associations between different attributes instead
of trying to predict the value of the class
attribute.
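Association rules are usually judged by their support and confidence. The plain-Python sketch below computes both for rules over made-up records; the helper names `support` and `confidence` are assumptions for this example, and the code does not search for rules the way a full algorithm such as Apriori does.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent together with consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Made-up records written as attribute=value items
transactions = [
    {"outlook=sunny", "windy=false", "play=no"},
    {"outlook=sunny", "windy=true", "play=no"},
    {"outlook=overcast", "windy=false", "play=yes"},
    {"outlook=rainy", "windy=false", "play=yes"},
]

print(support(transactions, {"outlook=sunny"}))                  # 0.5
print(confidence(transactions, {"outlook=sunny"}, {"play=no"}))  # 1.0
print(confidence(transactions, {"windy=false"}, {"outlook=rainy"}))
```

The last rule has no class attribute on its right-hand side, which is the point made above: any attribute can appear on either side of a rule.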
34. Conclusion
The overall goal of Weka is to build a state-
of-the-art facility for developing machine
learning (ML) techniques and allow people to
apply them to real-world data mining
problems.