IT445
Decision Support Systems
College of Computing and Informatics
Week 10
Data analytics: Getting started with Orange
Contents
o Orange Workflow Overview.
o Visualizing data from input data file.
o Analyzing data with regression models and decision trees.
o Analyzing data with deep learning models.
o Text processing and classification.
Weekly Learning Outcomes
1. Understand the data workflow in Orange.
2. Inspect data with Orange visualisations.
3. Develop predictive models in Orange, including regression, decision trees
and deep learning models.
4. Perform textual data analysis with Orange.
5. Assess the quality of various prediction methods.
Required Reading
• Orange Visual Programming Documentation (Release 3). Orange (2021).
https://buildmedia.readthedocs.org/media/pdf/orange-visual-programming/latest/orange-visual-programming.pdf
  • Chapter 1 – section 1.2, 1.3 (subsection: 1-2), 1.4, 1.5, Chapter 2 – section 2.1 (subsection: 1-10, 34), 2.2
  (subsection: 1,3,4,5,16), 2.3 (subsection: 5,9,10,13) and section 2.4 (subsection: 2,4,6)
• AJDA. (2017, August 4). Text Analysis: New Features. Orangedatamining.Com.
https://orangedatamining.com/blog/2017/08/04/text-analysis-new-features/
Recommended Readings
• Zupan, D. (2018, May). Introduction to Data Mining: Working notes for the hands-on course with Orange Data Mining. University of
Ljubljana. https://file.biolab.si/notes/2018-05-intro-to-datamining-notes.pdf
  • Lessons: 1,2,3,4,5,6,7,8,10,14,17,31,32.
• Foong, N. W. (2019, August 7). Data Science Made Easy: Interactive Data Visualization using Orange. Medium,
Towardsdatascience.com. https://towardsdatascience.com/data-science-made-easy-interactive-data-visualization-using-orange-de8d5f6b7f2b
• Analytics Vidhya, A. (2017, September 7). Building Machine Learning Model is fun using Orange. Analytics Vidhya.
https://www.analyticsvidhya.com/blog/2017/09/building-machine-learning-model-fun-using-orange/
• Foong, N. W. (2019b, August 29). Data Science Made Easy: Image Analytics using Orange. Medium, Towardsdatascience.com.
https://towardsdatascience.com/data-science-made-easy-image-analytics-using-orange-ad4af375ca7a
This presentation is mainly based on the above resources.
Recommended Videos
• Getting Started with Orange 01: Welcome to Orange (2015, December 21). [Video]. YouTube.
https://www.youtube.com/channel/UClKKWBe2SCAEyv7ZNGhIe4g
• Orange Data Mining tool. (2016, May 5). [Video]. YouTube.
https://www.youtube.com/watch?v=rrsRBSCHDXw
• Getting Started with Orange 16: Text Preprocessing (2017, Jun 20). [Video]. YouTube.
youtube.com/watch?v=V70UwJZWkZ8
• Text Mining: Twitter Data Analysis (2020, August 4). [Video]. YouTube.
https://www.youtube.com/watch?v=HDkI6G4slzQ
• Getting Started with Orange 18: Text Classification (2017, Jun 28). [Video]. YouTube.
https://www.youtube.com/watch?v=zO_zwKZCULo
Orange Workflow Overview
Orange workflow
“Orange is a component-based data mining software. It includes a range of data
visualization, exploration, preprocessing and modeling techniques. It can be used
through a nice and intuitive user interface or, for more advanced users, as a
module for the Python programming language.” (Orange official GitHub page).
Orange workflow
• The core principle of Orange is visual programming.
• The basic processing units of data manipulation in Orange are called widgets.
• Each analytical step/action is contained within a widget.
• Widgets communicate by sending information along a communication channel: the output from
one widget is used as input to another.
• A workflow is the sequence of steps/actions performed to accomplish a particular task.
• Widgets are placed on the canvas and connected into an analytical workflow.
• An Orange analytical workflow is executed from left to right and never passes data backwards.
• Orange workflows consist of components that:
  • Read data
  • Process data
  • Visualize data
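The widget-and-channel idea above can be mimicked in plain Python to make the data flow concrete. This is only a sketch and not Orange's API: `read_rows`, `select_rows`, `summarize` and `run_workflow` are hypothetical names, the table is invented, and each function merely stands in for a widget whose output becomes the next widget's input.

```python
from functools import reduce

def read_rows():
    """Stand-in for a reading widget: returns a small hard-coded table."""
    return [
        {"name": "a", "age": 34},
        {"name": "b", "age": 51},
        {"name": "c", "age": 47},
    ]

def select_rows(min_age):
    """Stand-in for a processing widget: keeps rows with age >= min_age."""
    def widget(rows):
        return [row for row in rows if row["age"] >= min_age]
    return widget

def summarize(rows):
    """Stand-in for a visualizing widget: reduces rows to a small summary."""
    ages = [row["age"] for row in rows]
    return {"count": len(ages), "mean_age": sum(ages) / len(ages)}

def run_workflow(source, *widgets):
    """Pass data left to right; each output is the next widget's input."""
    return reduce(lambda data, widget: widget(data), widgets, source())

result = run_workflow(read_rows, select_rows(40), summarize)
print(result)
```

Data only ever moves forward through the chain, which mirrors the left-to-right execution described above.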
Simple Orange workflow - Files and Data Tables
• File widget: reads the data.
• Data Table widget: a viewer that shows the data in a spreadsheet. It passes
onwards only the selection.
• The data is always available in the File widget.
Workflow with two connected widgets
Simple Orange workflow - Files and Data Tables
• Most Orange workflows would probably start with the File widget.
• Orange can import comma-separated, tab-delimited or .xlsx data files, or load data from a URL.
Example:
The File widget reads the data, which is sent to both the Data Table and the Box
Plot widgets.
Workflows with subsets
• Visualizations in Orange are interactive, which means the user can select data
instances from the plot and pass them downstream.
Example:
• Selecting subsets
  • Step 1: Place the File widget on the canvas.
  • Step 2: Connect Scatter Plot to it.
  • Step 3: Click and drag a rectangle around a subset of points.
  • Step 4: Connect Data Table to Scatter Plot → Data Table will show the selected points.
• Highlighting workflows
  • Connect Data Table to Scatter Plot.
  • Select a subset of points from the Data Table → Scatter Plot will highlight the selected points.
Workflows - data exploration
• The Feature Statistics widget provides a quick way to inspect and find interesting
features in a given data set.
• Example: Heart-disease data exploration
  • Select a subset of potentially interesting features, or simply select the features we
  want to keep.
  • The widget outputs a new data set with only the selected features.
Workflows with Models
• Predictive models are evaluated in the Test and Score widget.
• Test and Score accepts several inputs:
1. Data (data set for evaluating models).
2. Learners (algorithms to use for training the model).
3. Optional preprocessor (for normalization or feature selection).
Workflows with Models
The widget does two things:
1. It shows evaluation results (results of testing different
classification/regression algorithms).
2. It outputs evaluation results, which can be used by other widgets for analysing
the performance of classifiers, such as Confusion Matrix.
Sampling settings (e.g., cross-validation or another train-and-test procedure).
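To make the cross-validation idea concrete, here is a minimal pure-Python sketch. It is not how Test and Score is implemented: the helper names (`k_fold_indices`, `majority_learner`, `cross_validate`) and the toy labels are assumptions, the "learner" just predicts the most frequent training class, and the folds are contiguous and unshuffled for simplicity.

```python
from collections import Counter

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def majority_learner(train_labels):
    """Toy learner: returns a model that always predicts the majority label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _x: most_common

def cross_validate(xs, ys, k):
    """Mean accuracy over k rounds; each fold is held out for testing once."""
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_labels = [ys[i] for i in range(len(ys)) if i not in held_out]
        model = majority_learner(train_labels)
        correct = sum(model(xs[i]) == ys[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

xs = list(range(10))                 # invented feature values
ys = ["yes"] * 7 + ["no"] * 3        # invented class labels
mean_accuracy = cross_validate(xs, ys, k=5)
print(round(mean_accuracy, 3))
```

Every instance is used for testing exactly once and never appears in the training data of the round that tests it, which is the point of the procedure.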
Workflows with Models - Evaluation
• Confusion Matrix widget: shows proportions between the predicted and
actual class.
Inputs:
• Evaluation results: results of testing classification algorithms.
Outputs:
• Selected Data: data subset selected from the confusion matrix.
• Data: data with additional information on whether a data instance was selected.
The test results are fed into the Confusion Matrix, where we can observe how many
instances were misclassified and in which way.
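A confusion matrix is simply a count of (actual, predicted) pairs. The sketch below builds one with the standard library; the labels are invented for illustration and the function name `confusion_matrix` is our own, not an Orange call.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs; pairs that differ are misclassifications."""
    return Counter(zip(actual, predicted))

# Invented labels, for illustration only.
actual    = ["cat", "cat", "dog", "dog", "dog", "cat"]
predicted = ["cat", "dog", "dog", "dog", "cat", "cat"]

matrix = confusion_matrix(actual, predicted)
labels = sorted(set(actual) | set(predicted))

# One row per actual class, one column per predicted class.
for a in labels:
    print(a, [matrix[(a, p)] for p in labels])

misclassified = sum(n for (a, p), n in matrix.items() if a != p)
accuracy = 1 - misclassified / len(actual)
```

The off-diagonal cells show not just how many instances were misclassified but in which direction, for example actual "dog" predicted as "cat".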
Workflows with Models
• Predictions on new data are made in the Predictions widget:
  • The training data is first passed to the model.
  • Once the model is trained, it is passed to Predictions.
  • The Predictions widget also needs data to predict on (test data), which is passed as a
  second input.
Workflows with Models
• Predictive models can be saved and reused in different Orange workflows.
• To save a model:
1. Models first require data for training.
2. They output a trained model, which can be saved with the Save
Model widget in the pickle format.
• A trained model can be loaded and used in Predictions and/or elsewhere.
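The save-and-reload cycle can be illustrated with Python's standard `pickle` module. This is a sketch under stated assumptions: the "model" is a hypothetical plain dict of learned parameters, not an Orange model, and `train` and `predict` are made-up helpers.

```python
import pickle
import tempfile
from pathlib import Path

def train(values):
    """Toy 'training': learn a threshold equal to the mean of the values."""
    return {"kind": "threshold", "threshold": sum(values) / len(values)}

def predict(model, value):
    """Predict 1 when the value exceeds the learned threshold, else 0."""
    return int(value > model["threshold"])

model = train([1.0, 2.0, 3.0, 6.0])  # learned threshold is 3.0

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    with open(path, "wb") as f:
        pickle.dump(model, f)        # save the trained model
    with open(path, "rb") as f:
        restored = pickle.load(f)    # load it back, e.g. in another workflow

print(predict(restored, 5.0), predict(restored, 2.0))
```

The restored object behaves like the original, so training does not have to be repeated. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.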
Visualizing data from input data file
Visualizations in Orange
• Visualizations are an essential part of data science.
• Visualizations in Orange are interactive.
Visualizations in Orange
Interactive visualization workflow example
Visualizations in Orange
Visualization workflow of data subsets example
Visualizations in Orange
• Exercise: Build a simple workflow with File and Scatter Plot for the Iris dataset.
The scatter plot shows petal width (x-axis) against length (y-axis) for three species of
Iris flowers. The relationship between them is increasing and roughly linear.
Visualizations in Orange
• Scatter Plot supports zooming in and out of part of the plot and manual
selection of data instances.
• Example: Explorative data analysis using the Iris dataset.
1. Select data instances from a rectangular region in Scatter Plot.
2. Send them to the Data Table widget.
3. Explore the relationship between any two variables.
Visualizations in Orange
• Basic data exploration from an input data file.
Example: heart_disease
How does the data look?
Cont…
Example: heart_disease
Do standard visualizations tell us anything interesting about the data?
• The Box Plot widget is most commonly used immediately
after the File widget to observe the statistical properties of
a dataset and discover any anomalies (such as duplicated
values, outliers, …).
• The Scatter Plot widget provides a 2-dimensional scatter
plot visualization for continuous attributes.
• The Distributions widget displays the value distribution of
discrete or continuous attributes.
Cont…
Example: heart_disease
Box plot for attribute 'age' grouped by 'gender'
Max HR decreases with age.
Distribution of 'chest pain' with columns split by 'gender'
Cont…
Example: heart_disease
• Data can also be split by the value of a feature and each part analysed separately.
• Split data by gender: use the Select Rows widget.
Choose the female patients in the Select Rows widget
Selecting data instances works well together with visualising data
distributions and exploring the data.
Visualizations in Orange
• Reports
  • Reports make it possible to trace back analytical steps, because each report segment
  saves the workflow at the point where it was created.
  • Reports can be saved in .html, .pdf or .report format.
Analyzing data with regression models and decision trees
Analyzing data with decision trees
• The decision tree is one of the oldest, but still popular, machine learning
methods.
• Decision trees workflow example
• Decision trees in Orange do not use any data pre-processing.
Analyzing data with decision trees
Tree Viewer
• This widget can be used for visualizing decision trees.
• To enable explorative data analysis, select a node, which instructs the widget to output the
data associated with that node.
• If both the viewer and Tree are open, any re-run of the tree induction algorithm will
immediately affect the visualization.
Explore how the parameters of the decision tree algorithm influence the structure of the
resulting tree.
Analyzing data with decision trees
Tree parameters:
• Induce binary tree: build a binary tree (split into two child nodes).
• Min. number of instances in leaves: if checked, the algorithm will never construct a split
which would put fewer than the specified number of training examples into any of the branches.
• Do not split subsets smaller than: forbids the algorithm to split nodes with fewer than the
given number of instances.
• Limit the maximal tree depth: limits the depth of the classification tree to the specified
number of node levels.
• Stop when majority reaches [%]: stop splitting the nodes after a specified majority
threshold is reached.
Analyzing data with decision trees
Example: Using sailing data, predict the conditions under which a
friend skipper went sailing
Load the data → Build a tree → Visualize it in the Tree Viewer
Analyzing data with decision trees
Example: Using sailing data, predict the conditions under which a
friend skipper went sailing
• Trees place the most useful feature at the root.
• The most useful feature is the one that splits the data into the two purest possible subsets.
• These are then split further, again by the most informative features.
• This process of breaking up the data subsets into smaller ones repeats until we reach
subsets where all data belongs to the same class.
• These subsets are represented by leaf nodes in strong blue or red.
• The process of data splitting can also terminate when it runs out of data instances or
out of useful features (the two leaf nodes in white).
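One common way to quantify which feature gives the "purest" subsets is information gain, the reduction in class entropy after a split. The sketch below assumes that measure for illustration; the slides do not say which criterion Orange's Tree widget uses, and the rows are invented rather than taken from the sailing data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels (0 for a pure subset)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on feature."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Invented toy rows, for illustration only.
rows = [
    {"outlook": "sunny", "company": "big",  "sail": "yes"},
    {"outlook": "sunny", "company": "none", "sail": "no"},
    {"outlook": "rainy", "company": "big",  "sail": "yes"},
    {"outlook": "rainy", "company": "none", "sail": "no"},
]

gains = {f: information_gain(rows, f, "sail") for f in ("outlook", "company")}
best_feature = max(gains, key=gains.get)
print(gains, best_feature)
```

Here splitting on `company` yields two pure subsets, so it would be placed at the root, while `outlook` leaves both subsets as mixed as the original data.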
Analyzing data with decision trees
Example: Using sailing data, predict the conditions under which a
friend skipper went sailing
According to the decision tree results:
• It looks like this skipper is a social person; as soon as there’s company,
the probability of her sailing increases.
• When joined by a smaller group of individuals, there is no sailing if there is
rain. (Thunderstorms? Too dangerous?)
• When she has a smaller company, but the boat at her disposal is big, there is
no sailing either.
Analyzing data with decision trees
Example: Using sailing data, predict the conditions under which a
friend skipper went sailing
• What is the “most useful” feature?
• Rank widget: estimates the quality of data features and ranks them
according to how much information they carry.
Analyzing data with decision trees
Model inspection example
• To inspect a model, combine Tree and Scatter Plot widgets to display instances
taken from a chosen decision tree node.
Iris dataset: Model inspection example
Analyzing data with decision trees
• Decision trees also work for regression tasks.
Decision tree for housing dataset example
Prediction models
• On its own, the Predictions widget shows the data but makes no predictions.
• To analyse data with a prediction model, a predictive model is needed.
• The Predictions widget uses the model to make predictions about the data and shows them in the
table.
• E.g.:
1. The data is fed into the model widget to infer a
predictive model.
2. The Predictions widget gets the data from the
File widget and also a predictive model from
the model widget.
The connection from the model widget is a channel that carries a model.
Analyzing data with regression models
Two regression models are available:
• Linear Regression works with continuous data.
• Logistic Regression only works for classification tasks (it learns a logistic regression
model from the data).
Example: Demonstrate prediction results with Logistic Regression on the hayes-roth
dataset.
Training:
1. Load the hayes-roth dataset in the File widget.
2. Pass the data to the Logistic Regression model for training.
3. Pass the trained model to the Predictions widget.
Testing: predict class values on a new dataset
1. Load hayes-roth_test in the second File widget.
2. Connect it to Predictions.
3. Observe the class values predicted with Logistic Regression directly in Predictions.
Analyzing data with regression models
Example: Predict Iris flower type using Logistic Regression.
Analyzing data with regression models
• The Linear Regression widget constructs a learner/predictor that learns a linear
function from its input data.
• The model can identify the relationship between a predictor x and the response
variable y.
• Linear regression works only on regression tasks.
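For a single predictor x and response y, "learning a linear function" can be sketched with ordinary least squares in a few lines. This is a hand-rolled illustration on invented numbers, not the widget's implementation; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the shared 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented data that lies exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
prediction = intercept + slope * 5.0   # predict y for a new x
print(intercept, slope, prediction)
```

The fitted slope describes how much y changes per unit of x, which is the relationship between predictor and response that the model identifies.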
Analyzing data with regression models
• Example: Train a Linear Regression on the housing dataset and evaluate its
performance in Test & Score.
Analyzing data with deep learning models
Analyzing data with deep learning models
• The Neural Network widget is a multi-layer perceptron (MLP) algorithm with
backpropagation.
Inputs
• Data: input dataset
• Preprocessor: preprocessing method(s)
Outputs
• Learner: multi-layer perceptron learning algorithm
• Model: trained model
A neural network with 3 layers can be defined as 2, 3, 2 (the number of neurons in each hidden layer).
Analyzing data with deep learning models
• Neural Network uses default preprocessing when no other preprocessors
are given. It executes them in the following order:
1. Removes instances with unknown target values
2. Continuizes categorical variables (with one-hot-encoding)
3. Removes empty columns
4. Imputes missing values with mean values
5. Normalizes the data by centering to mean and scaling to standard deviation of 1
• To remove default preprocessing, connect an empty Preprocess widget to
the learner.
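Three of the default preprocessing steps listed above (one-hot encoding, mean imputation, and normalization) are easy to sketch in plain Python. The helper names and the sample columns are assumptions made for illustration, missing values are represented as `None`, and the population standard deviation is assumed for scaling since the slides do not specify which variant is used.

```python
from statistics import mean, pstdev

def one_hot(column):
    """Continuize a categorical column: one 0/1 indicator per category."""
    categories = sorted(set(column))
    return [[1 if value == c else 0 for c in categories] for value in column]

def impute_mean(column):
    """Replace missing values (None) with the mean of the known values."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]

def standardize(column):
    """Center to mean 0 and scale to standard deviation 1.

    Assumes the column is not constant (otherwise sigma would be 0).
    """
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

# Invented columns, for illustration only.
colour = ["red", "blue", "red"]
age = [2.0, None, 4.0, 6.0]

encoded = one_hot(colour)          # columns are ordered: blue, red
imputed = impute_mean(age)
scaled = standardize(imputed)
print(encoded, imputed, scaled)
```

Running the steps in this order matters: imputing first means the mean and standard deviation used for scaling are computed on a complete column.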
Analyzing data with deep learning models
Example: Neural Network workflow for a classification task on the iris data.
Analyzing data with deep learning models
Example: Neural Network workflow for a prediction task on the iris data.
1. Input the Neural Network prediction model into Predictions.
2. Observe the predicted values.
Analyzing data with deep learning models
Example: image analytics workflow on a domestic animal image dataset using the
Image Analytics add-on.
1. Import the image data via the Import Images widget.
2. Display all of the loaded images using the Image Viewer widget.
3. For image data analysis, the Image Embedding widget must be used, because
classification and regression tasks accept data in the form of numbers.
The most important parameter of the Image Embedding interface is the Embedder. Supported deep
network embedders: SqueezeNet, Inception v3, VGG-16, VGG-19, Painters, DeepLoc.
Analyzing data with deep learning models
The Image Embedding widget converts images to vectors of numbers.
Analyzing data with deep learning models
• The Image Grid widget displays images from a dataset in a similarity grid such that
images with similar content are placed closer to each other.
• The Image Grid widget can be used for image comparison, while looking for
similarities or discrepancies between selected data instances.
Analyzing data with deep learning models
Example: Workflow for image analytics for a classification task on the animal
image dataset.
1. Pass Image Embedding to Test and Score.
2. Use a Neural Network learner with 3 layers of 10 neurons each.
3. Input the learner into Test and Score.
4. Observe the predicted values.
Analyzing data with deep learning models
Example: Workflow for image analytics for a classification task on the animal
image dataset.
The model achieved 94% accuracy.
Misclassification: the model predicted the image as cat instead of dog.
Analyzing data with deep learning models
Example: Workflow for image analytics for a classification task on the
animal image dataset.
Use Image Viewer to investigate the misclassified example.
Select the misclassified example.
Explanation of the error: in the misclassified image, the dog looks like a cat.
Text processing and classification
Textual data analysis
• Orange supports textual data analysis through the Text add-on.
• Common text widgets:
  • Text preprocessing: preprocesses text (e.g., removing stopwords, lowercasing, …).
  • Corpus Viewer: views corpus content.
  • Sentiment Analysis: enables basic sentiment analysis of corpora.
  • Topic Modelling: uncovers the hidden thematic structure in a corpus.
  • Word Cloud: displays word frequency.
• Typical textual data analysis workflows:
Textual data analysis
Typical text pre-process workflow example:
This workflow uses simple preprocessing for creating tokens from documents:
1. It applies lowercase.
2. It splits text into words.
3. It removes frequent stopwords.
Results of preprocessing can be observed in a Word Cloud.
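The three preprocessing steps above map directly onto a few lines of standard-library Python. This is a sketch, not the Text add-on's implementation: the stopword list is a tiny hand-picked set, the tokenizer is a simple regular expression, and the two sentences are invented.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "and", "of", "in", "to", "was"}

def preprocess(document, stopwords=STOPWORDS):
    """Lowercase, split into word tokens, and drop stopwords."""
    lowered = document.lower()                 # step 1: lowercase
    words = re.findall(r"[a-z']+", lowered)    # step 2: split text into words
    return [w for w in words if w not in stopwords]  # step 3: remove stopwords

docs = [
    "The King and the Queen lived in a castle.",
    "A fox was in the forest of the King.",
]

tokens = [preprocess(d) for d in docs]
counts = Counter(token for doc in tokens for token in doc)
print(tokens)
print(counts.most_common(3))
```

The resulting token counts are the kind of word frequencies that a word cloud visualizes.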
Textual data analysis
Typical Sentiment workflow example:
Yellow represents a high, positive score, while blue
represents a low, negative score.
Textual data analysis
Typical Topic modelling workflow example:
Topic Modelling, for example, colors words by their
weights: positive weights are colored green and
negative weights red.
Uncover latent topics in the data
Textual data analysis
• Tweets are a valuable source of information for social scientists, marketing
managers, linguists, economists, and so on.
Twitter Data Analysis workflow example:
Text classification
• Predictive models can be used to classify documents by authorship, type,
sentiment and so on.
Text classification workflow example:
• Data: Grimm tales data
• Task: classify documents by the topic of the tale.
• Prediction models: Logistic Regression and Decision Tree.
Load Grimm tales data
Text classification
Text classification workflow example:
Given two tales of different classes, Logistic Regression can correctly distinguish between
them in over 90% of the cases. Better than Decision Tree!
Predictive model quality
Predictive model quality
Example: Explore the performance of different predictive models on the iris
dataset.
Logistic Regression outperforms the other classifiers
Main Reference
• Orange Visual Programming Documentation (Release 3). Orange (2021).
https://buildmedia.readthedocs.org/media/pdf/orange-visual-programming/latest/orange-visual-programming.pdf
• AJDA. (2017, August 4). Text Analysis: New Features. Orangedatamining.Com.
https://orangedatamining.com/blog/2017/08/04/text-analysis-new-features/
• Zupan, D. (2018, May). Introduction to Data Mining: Working notes for the hands-on course with
Orange Data Mining. University of Ljubljana. https://file.biolab.si/notes/2018-05-intro-to-datamining-notes.pdf
• Foong, N. W. (2019, August 7). Data Science Made Easy: Interactive Data Visualization using
Orange [Post]. Medium, Towardsdatascience.com. https://towardsdatascience.com/data-science-made-easy-interactive-data-visualization-using-orange-de8d5f6b7f2b
• Analytics Vidhya, A. (2017, September 7). Building Machine Learning Model is fun using Orange.
Analytics Vidhya. https://www.analyticsvidhya.com/blog/2017/09/building-machine-learning-model-fun-using-orange/
• Orange Data Mining. Workflows, accessed 10 August 2021.
https://orangedatamining.com/workflows/
• Foong, N. W. (2019b, August 29). Data Science Made Easy: Image Analytics using Orange.
Medium, Towardsdatascience.com. https://towardsdatascience.com/data-science-made-easy-image-analytics-using-orange-ad4af375ca7a
This presentation is mainly based on the above resources.
Week self-review exercises
• Download Orange and understand its workflow:
  • https://orangedatamining.com/getting-started/
  • https://orangedatamining.com/workflows/
• Get hands-on practice using Orange for data analysis, visualisation, developing predictive models
and textual data analysis: https://orangedatamining.com/blog/
Thank You

IT445_Week_10_Part2.pdf

  • 1. IT445 Decision Support Systems College of Computing and Informatics
  • 2. Week 10 Data analytics: Getting started with Orange
  • 3. Contents o Orange Workflow Overview. o Visualizing data from input data file. o Analyzing data with regression models and decision trees. o Analyzing data with deep learning models. o Text processing and classification.
  • 4. Weekly Learning Outcomes 1. Understand Data workflow in orange. 2. Inspect data with orange visualisation. 3. Develop predictive models in orange, including regression, decision trees and deep learning models. 4. Perform Textual data analysis with orange. 5. Assess quality of various predication methods.
  • 5. Required Reading ? Orange Visual Programming Documentation (Release 3). Orange (2021). https://buildmedia.readthedocs.org/media/pdf/orange-visual-programming/latest/orange-visual- programming.pdf ? Chapter 1 – section 1.2, 1.3 (subsection: 1-2), 1.4, 1.5, Chapter 2 – section 2.1 (subsection: 1-10, 34) 2.2 (subsection: 1,3,4,5,16) 2.3 (subsection: 5,9,10,13) and section 2.4, (subsection: 2,4,6) ? AJDA. (2017, August 4). Text Analysis: New Features. Orangedatamining.Com. https://orangedatamining.com/blog/2017/08/04/text-analysis-new-features/ Recommended Readings ? Zupan, D. (2018, May). Introduction to Data Mining: Working notes for the hands-on course with Orange Data Mining. University of Ljubljana. https://file.biolab.si/notes/2018-05-intro-to-datamining-notes.pdf ? Lesson: 1,2,3,4,5,6,7,8,10,14,17,31, 32. ? Foong, N. W. (2019, August 7). Data Science Made Easy: Interactive Data Visualization using Orange. Medium, Towardsdatascience.com. https://towardsdatascience.com/data-science-made-easy-interactive-data-visualization-using-orange- de8d5f6b7f2b ? Analytics Vidhya, A. (2017, September 7). Building Machine Learning Model is fun using Orange. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2017/09/building-machine-learning-model-fun-using-orange/ ? Foong, N. W. (2019b, August 29). Data Science Made Easy: Image Analytics using Orange. Medium, Towardsdatascience.com. https://towardsdatascience.com/data-science-made-easy-image-analytics-using-orange-ad4af375ca7a This Presentation is mainly dependent on the above recourses.
  • 6. Recommended Videos ? Getting Started with Orange 01: Welcome to Orange (2015, December 21). [Video]. YouTube. https://www.youtube.com/channel/UClKKWBe2SCAEyv7ZNGhIe4g ? Orange Data Mining tool. (2016, May 5). [Video]. YouTube. https://www.youtube.com/watch?v=rrsRBSCHDXw ? Getting Started with Orange 16: Text Preprocessing (2017, Jun 20). [Video]. YouTube. youtube.com/watch?v=V70UwJZWkZ8 ? Text Mining: Twitter Data Analysis (2020, August 4). [Video]. YouTube. https://www.youtube.com/watch?v=HDkI6G4slzQ ? Getting Started with Orange 18: Text Classification (2017, Jun 28). [Video]. YouTube. https://www.youtube.com/watch?v=zO_zwKZCULo
  • 8. Orange workflow “Orange is a component-based data mining software. It includes a range of data visualization, exploration, preprocessing and modeling techniques. It can be used through a nice and intuitive user interface or, for more advanced users, as a module for the Python programming language.”(Orange official GitHub page).
  • 9. Orange workflow ? The core principle of Orange is visual programming. ? The basic processing unit of any data manipulation in Orange are called widgets. ? Each analytical step/action is contained within a widget. ? Widgets communicate by sending information along with a communication channel and the output from one widget is used as input to another. ? A workflow is the sequence of steps/actions that is performed to accomplish a particular task. ? Widgets are are placed on the canvas and connected into an analytical workflow. ? Orange analytical workflow is executed from left to right and never passes data backwards. ? Orange workflows consist of components that ? Read ? Process ? Visualize data
  • 10. Simple Orange workflow - Files and Data Tables ? File widget: reads the data. ? Data Table widget: a viewer and shows the data in a spreadsheet. It passes onwards only the selection. ? The data is always available in the File widget. Workflow with two connected widgets
  • 11. Simple Orange workflow - Files and Data Tables ? Most Orange workflows would probably start with the File widget. ? Orange can import any comma, .xlsx or tab-delimited data file or URL. Example: File widget is used to read the data that is sent to both the Data Table and the Box Plot widgets.
  • 12. Workflows with subsets ? Visualizations in Orange are interactive, which means the user can select data instances from the plot and pass them downstream. Example: ? Selecting subsets ? Step 1: Place File widget on the canvas. ? Step 2: Connect Scatter Plot to it. ? Step 3: Click and drag a rectangle around a subset of points. ? Step 4: Connect Data Table to Scatter Plot ? Data Table will show selected points. ? Highlighting workflows ? Connect Data Table to Scatter Plot. ? Select a subset of points from the Data Table ? Scatter Plot will highlight selected points.
  • 13. Workflows - data exploration ? Feature Statistics widget provides a quick way to inspect and find interesting features in a given data set. ? Example: Heart-disease data exploration ? Select a subset of potentially interesting features, or simply select the features we want to keep. ? The widget will outputs a new data set with only these selected feature.
  • 14. Workflows with Models ? Predictive models are evaluated in Test and Score widget. ? Test and Score accepts several inputs: 1. Data (data set for evaluating models). 2. Learners (algorithms to use for training the model). 3. Optional preprocessor (for normalization or feature selection).
  • 15. Workflows with Models The widget does two things: 1. It shows evaluation results (results of testing different classification/regression algorithms). 2. It outputs evaluation results, which can be used by other widgets for analysing the performance of classifiers, such as confusion matrix. Sampling setting (e.g., performs cross-validation or some other train-and-test procedures).
  • 16. Workflows with Models - Evaluation ? Confusion matrix widget: show proportions between the predicted and actual class. Inputs: ? Evaluation results: results of testing classification algorithms. Outputs: ? Selected Data: data subset selected from confusion matrix. ? Data: data with the additional information on whether a data instance was selected. The test results are fed into the Confusion Matrix, where we can observe how many instances were misclassified and in which way.
  • 17. Workflows with Models
    - Predictions on new data are made in the Predictions widget:
      The training data is first passed to the model.
      Once the model is trained, it is passed to Predictions.
      The Predictions widget also needs data to predict on (test data), which is passed as a second input.
  • 18. Workflows with Models
    - Predictive models can be saved and reused in different Orange workflows.
    - To save a model:
      1. Models first require data for training.
      2. They output a trained model, which can be saved with the Save Model widget in the pickle format.
    - A trained model can be loaded and used in Predictions and/or elsewhere.
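The train, save, load, predict cycle can be illustrated with Python's standard `pickle` module. This is only a sketch of the idea: the "model" is a hypothetical one-parameter threshold classifier stored as a dict, and the file name `model.pkl` is an assumption, not what the Save Model widget writes.

```python
import os
import pickle
import tempfile


def train(xs, ys):
    """'Train' a toy model: threshold halfway between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return {"threshold": (mean0 + mean1) / 2}


def predict(model, x):
    """Predict class 1 when x reaches the learned threshold, else class 0."""
    return 1 if x >= model["threshold"] else 0


def save_model(model, path):
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def round_trip(model):
    """Save the model to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "model.pkl")  # hypothetical file name
        save_model(model, path)
        return load_model(path)
```

The reloaded model makes the same predictions as the original, which is what allows a saved model to be reused in another workflow without retraining.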
  • 19. Visualizing data from input data file
  • 20. Visualizations in Orange
    - Visualizations are an essential part of data science.
    - Visualizations in Orange are interactive.
  • 21. Visualizations in Orange Interactive visualization workflow example
  • 22. Visualizations in Orange Visualization workflow of data subsets example
  • 23. Visualizations in Orange
    - Exercise: Build a simple workflow with File and Scatter Plot for the Iris dataset.
    - The scatter plot shows petal width (x-axis) against length (y-axis) for three species of Iris flowers. The two increase together in a roughly linear relationship.
  • 24. Visualizations in Orange
    - Scatter Plot supports zooming in and out of part of the plot and manual selection of data instances.
    - Example: explorative data analysis using the Iris dataset.
      1. Select data instances from a rectangular region on the Scatter Plot.
      2. Send them to the Data Table widget.
      3. Explore the relationship between any two variables.
  • 25. Visualizations in Orange
    - Basic data exploration from an input data file. Example: heart_disease. How does the data look?
  • 26. Cont… Example: heart_disease
    Do standard visualizations tell us anything interesting?
    - The Box Plot widget is most commonly used immediately after the File widget to observe the statistical properties of a dataset and discover anomalies (duplicated values, outliers, …).
    - The Scatter Plot widget provides a 2-dimensional scatter plot visualization for continuous attributes.
    - The Distributions widget displays the value distribution of discrete or continuous attributes.
  • 27. Cont… Example: heart_disease
    - Box plot for attribute 'age' grouped by 'gender'.
    - Max HR decreases with age.
    - Distribution of 'chest pain' with columns split by 'gender'.
  • 28. Cont… Example: heart_disease
    - Data can also be split by the value of a feature and each part analysed separately.
    - Split data by gender: use the Select Rows widget, e.g., choose the female patients in Select Rows.
    - Selecting data instances combines well with visualising the data distribution and exploring the data.
  • 29. Visualizations in Orange
    - Reports
      Reports make it possible to trace back analytical steps, because a report saves the workflow state at which each report segment was created.
      Reports can be saved in .html, .pdf or .report format.
  • 30. Analyzing data with regression models and decision trees
  • 31. Analyzing data with decision trees
    - The decision tree is one of the oldest, but still popular, machine learning methods.
    - Decision trees workflow example.
    - Decision trees in Orange do not use any data pre-processing.
  • 32. Analyzing data with decision trees
    Tree Viewer
    - This widget can be used for visualizing decision trees.
    - To enable explorative data analysis, select a node; this instructs the widget to output the data associated with that node.
    - If both the Tree Viewer and Tree are open, any re-run of the tree induction algorithm immediately affects the visualization.
    Explore how the parameters of the decision tree algorithm influence the structure of the resulting tree.
  • 33. Analyzing data with decision trees
    Tree parameters:
    - Induce binary tree: build a binary tree (split into two child nodes).
    - Min. number of instances in leaves: if checked, the algorithm will never construct a split which would put fewer than the specified number of training examples into any of the branches.
    - Do not split subsets smaller than: forbids the algorithm from splitting nodes with fewer than the given number of instances.
    - Limit the maximal tree depth: limits the depth of the classification tree to the specified number of node levels.
    - Stop when majority reaches [%]: stop splitting the nodes after a specified majority threshold is reached.
    Explore how the parameters of the decision tree algorithm influence the structure of the resulting tree.
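As a rough illustration of how three of these parameters act as stopping rules, the sketch below checks whether a node should become a leaf. The function name, argument names and the example thresholds are hypothetical and are not Orange's internals or defaults. The "Min. number of instances in leaves" rule is left out because it constrains candidate splits rather than the node itself.

```python
from collections import Counter


def should_stop(labels, depth, min_subset_size, max_depth, majority_threshold):
    """Return True when a node with these class labels should not be split.

    labels             -- class labels of the training instances in the node
    depth              -- depth of the node (root = 0)
    min_subset_size    -- "Do not split subsets smaller than"
    max_depth          -- "Limit the maximal tree depth"
    majority_threshold -- "Stop when majority reaches" as a fraction in [0, 1]
    """
    if not labels or len(labels) < min_subset_size:
        return True
    if depth >= max_depth:
        return True
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels) >= majority_threshold
```

Tightening any of the three thresholds makes the function return True earlier, which yields a shallower tree.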
  • 34. Analyzing data with decision trees
    Example: Using sailing data, predict the conditions under which a skipper friend went sailing.
    Load the data → build a tree → visualize it in the Tree Viewer.
  • 35. Analyzing data with decision trees
    Example: Using sailing data, predict the conditions under which a skipper friend went sailing.
    - Trees place the most useful feature at the root.
    - The most useful feature is the one that splits the data into the two purest possible subsets.
    - These are then split further, again by the most informative features.
    - This process of breaking up the data subsets into smaller ones repeats until we reach subsets where all data belongs to the same class.
    - These subsets are represented by leaf nodes in strong blue or red.
    - The process of data splitting can also terminate when it runs out of data instances or out of useful features (the two leaf nodes in white).
  • 36. Analyzing data with decision trees
    Example: Using sailing data, predict the conditions under which a skipper friend went sailing.
    According to the decision tree results:
    - It looks like this skipper is a social person; as soon as there's company, the probability of her sailing increases.
    - When joined by a smaller group of individuals, there is no sailing if there is rain (Thunderstorms? Too dangerous?).
    - When she has a smaller company, but the boat at her disposal is big, there is no sailing either.
  • 37. Analyzing data with decision trees
    Example: Using sailing data, predict the conditions under which a skipper friend went sailing.
    - What are the most useful features?
    - Rank widget: estimates the quality of data features and ranks them according to how much information they carry.
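One common way to score how much information a feature carries is information gain, the reduction in class entropy after splitting on that feature. The sketch below ranks features this way on a tiny made-up table. The feature names and values are hypothetical stand-ins, not the actual sailing data, and information gain is only one of several scoring measures such a ranking could use.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Feature names sorted from most to least informative."""
    scored = [(information_gain(rows, labels, f), f) for f in rows[0]]
    return [f for _, f in sorted(scored, reverse=True)]


# Hypothetical toy data: did the skipper sail?
rows = [
    {"company": "big", "outlook": "sunny"},
    {"company": "big", "outlook": "rainy"},
    {"company": "none", "outlook": "sunny"},
    {"company": "none", "outlook": "rainy"},
]
labels = ["yes", "yes", "no", "no"]
ranking = rank_features(rows, labels)
```

In this toy table "company" separates the classes perfectly, so it ranks first and would be placed at the root of a tree.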
  • 38. Analyzing data with decision trees
    Model inspection example (Iris dataset)
    - To inspect a model, combine the Tree and Scatter Plot widgets to display the instances that belong to a chosen decision tree node.
  • 39. Analyzing data with decision trees
    - Decision trees also work for regression tasks.
    Decision tree for housing dataset example
  • 40. Prediction models
    - Without a model, the Predictions widget shows the data but makes no predictions.
    - To analyse data with a prediction model, a predictive model is needed.
    - The Predictions widget uses the model to make predictions about the data and shows them in a table.
    - E.g.:
      1. The data is fed into the model widget to infer a predictive model.
      2. The Predictions widget gets the data from the File widget and a predictive model from the model widget.
    - The link from the model widget is a channel that carries a model.
  • 41. Analyzing data with regression models
    Two regression models are available:
    - Linear Regression works with continuous data.
    - Logistic Regression only works for classification tasks (it learns a logistic regression model from the data).
    Example: Demonstrate prediction results with logistic regression on the hayes-roth dataset.
    Training:
      1. Load the hayes-roth dataset in a File widget.
      2. Pass the data to the Logistic Regression model for training.
      3. Pass the trained model to the Predictions widget.
    Testing: predict class values on a new dataset
      1. Load hayes-roth_test in a second File widget.
      2. Connect it to Predictions.
      3. Observe the class values predicted with Logistic Regression directly in Predictions.
  • 42. Analyzing data with regression models Example: Predict Iris flower type using Logistic Regression.
  • 43. Analyzing data with regression models
    - The Linear Regression widget constructs a learner/predictor that learns a linear function from its input data.
    - The model can identify the relationship between a predictor x and the response variable y.
    - Linear regression works only on regression tasks.
  • 44. Analyzing data with regression models
    - Example: Train a Linear Regression model on the housing dataset and evaluate its performance in Test & Score.
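For a single predictor x, "learning a linear function" can be sketched with ordinary least squares in plain Python. This is an illustrative stand-in, not the widget's code: `fit_line`, `predict` and `rmse` are hypothetical names and the numbers are made up.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted line to a new predictor value."""
    return slope * x + intercept


def rmse(actual, predicted):
    """Root mean squared error, a common regression score."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]           # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
```

With more than one predictor the same idea applies, but the fit is usually done with matrix algebra rather than these scalar sums.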
  • 45. Analyzing data with deep learning models
  • 46. Analyzing data with deep learning models
    - The Neural Network widget is a multi-layer perceptron (MLP) algorithm with backpropagation.
    - Inputs
      Data: input dataset
      Preprocessor: preprocessing method(s)
    - Outputs
      Learner: multi-layer perceptron learning algorithm
      Model: trained model
    - The number of neurons per hidden layer is given as a list: a neural network with 3 hidden layers can be defined as 2, 3, 2.
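To see what a setting such as 2, 3, 2 implies, the sketch below lists the weight-matrix shapes of a fully connected network and counts its trainable parameters. It assumes the three numbers are hidden-layer sizes, that every layer has a bias per neuron, and (as an example only) 4 input features and 3 output classes, as in the iris data. The helper names are hypothetical.

```python
def layer_shapes(n_inputs, hidden, n_outputs):
    """(inputs, outputs) of each fully connected layer, from first to last."""
    sizes = [n_inputs, *hidden, n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(n_inputs, hidden, n_outputs):
    """Total weights plus one bias per neuron in every non-input layer."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in layer_shapes(n_inputs, hidden, n_outputs))


shapes = layer_shapes(4, (2, 3, 2), 3)
total = count_parameters(4, (2, 3, 2), 3)
```

Adding neurons or layers increases this count, which is one reason larger networks need more data and training time.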
  • 47. Analyzing data with deep learning models
    - Neural Network uses default preprocessing when no other preprocessors are given. It executes them in the following order:
      1. Removes instances with unknown target values
      2. Continuizes categorical variables (with one-hot encoding)
      3. Removes empty columns
      4. Imputes missing values with mean values
      5. Normalizes the data by centering to mean and scaling to standard deviation of 1
    - To remove default preprocessing, connect an empty Preprocess widget to the learner.
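Most of these steps are simple column operations. The plain-Python sketch below illustrates steps 1, 2, 4 and 5 on lists, with `None` standing for a missing value. The function names are hypothetical, and the use of the population standard deviation is an assumption; this is not Orange's implementation.

```python
from statistics import mean, pstdev


def drop_unknown_targets(rows, targets):
    """Step 1: keep only instances whose target value is known."""
    kept = [(r, t) for r, t in zip(rows, targets) if t is not None]
    return [r for r, _ in kept], [t for _, t in kept]


def one_hot(values):
    """Step 2: turn one categorical column into 0/1 indicator columns."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


def impute_mean(column):
    """Step 4: replace missing values with the mean of the known values."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Step 5: centre to mean 0 and scale to standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)  # population std is an assumption
    if sigma == 0:
        return [0.0 for _ in column]          # constant column: nothing to scale
    return [(v - mu) / sigma for v in column]
```

Standardizing matters for neural networks because features on very different scales would otherwise dominate the weight updates.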
  • 48. Analyzing data with deep learning models Example: Neural Network Workflow for classification task on the iris data.
  • 49. Analyzing data with deep learning models
    Example: Neural Network workflow for a prediction task on the iris data.
      1. Input the Neural Network prediction model into Predictions.
      2. Observe the predicted values.
  • 50. Analyzing data with deep learning models
    Example: image analytics workflow on a domestic animal image dataset using the Image Analytics add-on.
      1. Import the image data via the Import Images widget.
      2. Display all of the loaded images using the Image Viewer widget.
      3. For image data analysis, the Image Embedding widget must be used, because classification and regression tasks accept data in the form of numbers.
    The most important parameter of the Image Embedding interface is the Embedder. Supported deep network embedders: SqueezeNet, Inception v3, VGG-16, VGG-19, Painters, DeepLoc.
  • 51. Analyzing data with deep learning models
    The Image Embedding widget converts images to vectors of numbers.
  • 52. Analyzing data with deep learning models
    - The Image Grid widget displays images from a dataset in a similarity grid, such that images with similar content are placed closer to each other.
    - The Image Grid widget can be used for image comparison, when looking for similarities or discrepancies between selected data instances.
  • 53. Analyzing data with deep learning models
    Example: Workflow for image analytics for a classification task on the animal image dataset.
      1. Pass the Image Embedding output to Test and Score.
      2. Use a Neural Network learner with 3 layers of 10 neurons each.
      3. Input the learner into Test and Score.
      4. Observe the predicted values.
  • 54. Analyzing data with deep learning models
    Example: Workflow for image analytics for a classification task on the animal image dataset.
    The model achieved 94% accuracy. Misclassification: the model predicted the image as cat instead of dog.
  • 55. Analyzing data with deep learning models
    Example: Workflow for image analytics for a classification task on the animal image dataset.
    Use Image Viewer to investigate the misclassification: select the misclassified example. Error justification: in the misclassified image, the dog looks like a cat.
  • 56. Text processing and classification
  • 57. Textual data analysis
    - Orange supports textual data analysis through the Text add-on.
    - Common text widgets:
      Preprocess Text: preprocessing text (e.g., removing stopwords, lowercasing, …).
      Corpus Viewer: to view corpus content.
      Sentiment Analysis: enables basic sentiment analysis of corpora.
      Topic Modelling: uncovers the hidden thematic structure in a corpus.
      Word Cloud: displays word frequency.
    - Typical textual data analysis workflows:
  • 58. Textual data analysis
    Typical text preprocessing workflow example:
    This workflow uses simple preprocessing to create tokens from documents:
      1. It applies lowercasing.
      2. It splits text into words.
      3. It removes frequent stopwords.
    The results of preprocessing can be observed in a Word Cloud.
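The three preprocessing steps and the word counts behind a word cloud can be sketched with the Python standard library. The stopword list here is a tiny illustrative assumption, the tokenizer keeps only runs of ASCII letters, and the function names are hypothetical. The Text add-on's own preprocessing is more configurable.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the add-on's list).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "was"}


def preprocess(document, stopwords=STOPWORDS):
    """Lowercase, split into word tokens, and drop stopwords."""
    lowered = document.lower()                    # 1. lowercase
    tokens = re.findall(r"[a-z]+", lowered)       # 2. split text into words
    return [t for t in tokens if t not in stopwords]  # 3. remove stopwords


def word_frequencies(documents):
    """Token counts over a corpus, i.e. the numbers a word cloud displays."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts


freqs = word_frequencies(["The wolf ran.", "A wolf and a fox."])
```

Removing stopwords before counting keeps words like "the" from dominating the cloud.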
  • 59. Textual data analysis
    Typical sentiment workflow example:
    Yellow represents a high, positive score, while blue represents a low, negative score.
  • 60. Textual data analysis
    Typical topic modelling workflow example: uncover latent topics in the data.
    Topic Modelling colors words by their weights: positive weights are colored green and negative weights red.
  • 61. Textual data analysis
    - Tweets are a valuable source of information for social scientists, marketing managers, linguists, economists, and so on.
    Twitter data analysis workflow example:
  • 62. Text classification
    - Predictive models can be used to classify documents by authorship, type, sentiment and so on.
    Text classification workflow example:
    - Data: Grimm tales data (load the Grimm tales data).
    - Task: classify documents by the topic of the tale.
    - Prediction models: Logistic Regression and Decision Tree.
  • 63. Text classification
    Text classification workflow example:
    Given two tales of different classes, Logistic Regression correctly distinguishes between them in over 90% of the cases. Better than Decision Tree!
  • 65. Predictive model quality
    Example: Explore the performance of different predictive models on the iris dataset.
    Logistic Regression outperforms the other classifiers.
  • 66. Main Reference
    - Orange Visual Programming Documentation (Release 3). Orange (2021). https://buildmedia.readthedocs.org/media/pdf/orange-visual-programming/latest/orange-visual-programming.pdf
    - AJDA. (2017, August 4). Text Analysis: New Features. Orangedatamining.Com. https://orangedatamining.com/blog/2017/08/04/text-analysis-new-features/
    - Zupan, D. (2018, May). Introduction to Data Mining: Working notes for the hands-on course with Orange Data Mining. University of Ljubljana. https://file.biolab.si/notes/2018-05-intro-to-datamining-notes.pdf
    - Foong, N. W. (2019, August 7). Data Science Made Easy: Interactive Data Visualization using Orange [Post]. Medium, Towardsdatascience.com. https://towardsdatascience.com/data-science-made-easy-interactive-data-visualization-using-orange-de8d5f6b7f2b
    - Analytics Vidhya, A. (2017, September 7). Building Machine Learning Model is fun using Orange. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2017/09/building-machine-learning-model-fun-using-orange/
    - Orange Data Mining. Workflows, accessed 10 August 2021. https://orangedatamining.com/workflows/
    - Foong, N. W. (2019b, August 29). Data Science Made Easy: Image Analytics using Orange. Medium, Towardsdatascience.com. https://towardsdatascience.com/data-science-made-easy-image-analytics-using-orange-ad4af375ca7a
    This presentation is mainly dependent on the above resources.
  • 67. Week self-review exercises
    - Download Orange and understand its workflow:
      https://orangedatamining.com/getting-started/
      https://orangedatamining.com/workflows/
    - Hands-on practice using Orange for data analysis, visualisation, developing predictive models and textual data analysis: https://orangedatamining.com/blog/