The document provides an overview of sequence models used in machine learning, detailing types like RNNs, LSTMs, GRUs, and transformer models. It highlights their applications in processing sequential data, addressing issues like the vanishing gradient problem, and the significance of self-attention mechanisms in transformers. Additionally, it includes tutorial content and homework assignments related to implementing these models in PyTorch.
The document serves as a tutorial on data visualization techniques, emphasizing the importance of storytelling and clarity in presenting data. It covers various chart types, the significance of visual simplicity, and the steps involved in selecting the appropriate visualizations based on data types and audience needs. Additionally, the document highlights practical applications using Python and Orange 3 for data analysis and visualization, including insights from a Titanic dataset.
2023 Supervised Learning for Orange3 from scratch - FEG
This document provides an overview of supervised learning and decision tree models. It discusses supervised learning techniques for classification and regression. Decision trees are explained as a method that uses conditional statements to classify examples based on their features. The document reviews node splitting criteria like information gain that help determine the most important features. It also discusses evaluating models for overfitting/underfitting and techniques like bagging and boosting in random forests to improve performance. Homework involves building a classification model on a healthcare dataset and reporting the results.
This document provides an overview of unsupervised learning techniques including k-means clustering and association rule mining. It begins with introductions to the speaker and tutorial topics. It then contrasts supervised vs unsupervised learning, describing how k-means is used for clustering without labels and how association rules can discover relationships between items. The document provides examples of applying these techniques in domains like retail, sports, email marketing and healthcare. It also includes visualizations and discusses important concepts for k-means like data transformation and for association rules like support, confidence and lift. Homework questions are asked about preparing data for these algorithms in Orange.
202312 Exploration Data Analysis Visualization (English version) - FEG
This document provides an overview of exploratory data analysis (EDA) and visualization techniques that can be performed before building a machine learning model. It introduces the Iris dataset as an example and outlines the key steps of EDA, including loading the data, examining correlations, creating scatter plots, and generating distribution and box plots to understand feature statistics. As homework, students are asked to explore another dataset with a numeric target feature called "housing.tab" and explain the visualizations.
202312 Exploration of Data Analysis Visualization - FEG
This document provides a tutorial on data visualization and analysis using Orange 3. It discusses different types of charts like pie charts, line charts, histograms, bar charts, scatter plots, box plots, and pivot tables. It demonstrates how to visualize survival rates from the Titanic dataset based on features like sex, passenger class, age, and fare paid. Key findings are that women and higher class passengers had higher survival rates, and survival rates also depended on combinations of these features.
The document discusses transfer learning, particularly using pretrained models like ResNet50 and VGG16 for image classification tasks. It outlines the benefits of transfer learning, including faster model training and effective feature extraction from existing neural networks. It also provides practical resources, exercises, and code links to help practitioners implement transfer learning using Keras.
2. About me
- Education
  - NCU (MIS), NCCU (CS)
- Experiences
  - Telecom big data innovation
  - Retail Media Network (RMN)
  - Customer Data Platform (CDP)
  - Know-your-customer (KYC)
  - Digital Transformation
- Research
  - Data Ops (ML Ops)
  - Business Data Analysis, AI
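30. Common decision tree questions
- General hyperparameters
  - Minimum samples for a node split (how many samples must a node hold before it can be split into new nodes?)
  - Minimum samples for a terminal node (leaf) (how many samples does a node need, at minimum, to become a leaf?)
  - Maximum depth of tree (vertical depth) (how many levels deep may the tree grow?)
  - Maximum number of terminal nodes (how many leaf nodes may the final tree have?)
  - Maximum features to consider for split (how many features are considered, at most, when a node is split?)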
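As a rough sketch, the five hyperparameters above map onto scikit-learn's `DecisionTreeClassifier` as shown here. The choice of scikit-learn, the Iris data, and every numeric value are assumptions made for illustration; the slides themselves work in Orange3.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative values only; none of them come from the slides.
clf = DecisionTreeClassifier(
    min_samples_split=10,  # a node needs at least 10 samples before it may split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    max_depth=3,           # at most 3 levels of splits below the root
    max_leaf_nodes=5,      # at most 5 terminal nodes in the final tree
    max_features=None,     # None means all features are considered at each split
    random_state=0,
)
clf.fit(X, y)

print("depth:", clf.get_depth())
print("leaves:", clf.get_n_leaves())
```

Tightening any of these limits makes the tree smaller, which is the usual first lever against overfitting.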
42. 42
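1. There are 3 classes in total: Setosa, Virginica, and Versicolor (the three colors).
2. Nodes are numbered from node #0 to node #8 and are generated depth-first (DFS).
3. By feature importance, petal.length ranks first, followed by petal.width and sepal.width.
4. Note the number of samples in each node.
5. To some extent, splitting down to node #5 is enough.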
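The points above (feature importance, node numbering, samples per node) can be inspected in code as well. The following is a minimal sketch that assumes scikit-learn rather than Orange3, so the thresholds and node count it produces need not match the tree on the slide. When `max_leaf_nodes` is not set, scikit-learn grows the tree depth-first, so node ids follow the same DFS order.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# max_depth=3 is an illustrative choice, not a value taken from the slides.
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=0)
tree_clf.fit(iris.data, iris.target)

# Feature importances, highest first.
ranked = sorted(
    zip(iris.feature_names, tree_clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")

# Number of training samples that reach each node, in node-id order.
t = tree_clf.tree_
for node_id in range(t.node_count):
    kind = "leaf" if t.children_left[node_id] == -1 else "split"
    print(f"node #{node_id} ({kind}): samples = {t.n_node_samples[node_id]}")
```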
Generating the SQL
There are five leaf nodes, so the SQL has five CASE WHEN branches:
select case
when "petal.length" <= 2.35 then 0
when "petal.length" > 2.35 and "petal.length" <= 5.05 and "petal.width" <= 1.75 then 1
when "petal.length" > 2.35 and "petal.length" <= 5.05 and "petal.width" > 1.75 and "sepal.width" <= 3.1 then 2
when "petal.length" > 2.35 and "petal.length" <= 5.05 and "petal.width" > 1.75 and "sepal.width" > 3.1 then 1
when "petal.length" > 5.05 then 2
end as predicted_class -- the alias and the table name below are placeholders
from iris
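The same rules can be produced mechanically by walking the tree and emitting one WHEN branch per leaf. The sketch below hand-copies the splits from the rules above into a nested tuple. The tuple layout, the helper names `leaf_rules` and `to_sql`, the table name `iris`, and the alias `predicted_class` are all hypothetical choices for illustration.

```python
# Hand-written copy of the tree implied by the CASE WHEN rules above.
# A split is (feature, threshold, left_subtree, right_subtree); a leaf is a class id.
TREE = (
    "petal.length", 2.35,
    0,
    ("petal.length", 5.05,
     ("petal.width", 1.75,
      1,
      ("sepal.width", 3.1, 2, 1)),
     2),
)


def leaf_rules(node, path=()):
    """Yield (conditions, class_id) for every leaf, visiting left branches first."""
    if not isinstance(node, tuple):
        yield list(path), node
        return
    feature, threshold, left, right = node
    yield from leaf_rules(left, path + (f'"{feature}" <= {threshold}',))
    yield from leaf_rules(right, path + (f'"{feature}" > {threshold}',))


def to_sql(tree, table="iris"):
    """Build a SELECT with one WHEN branch per leaf (table name is a placeholder)."""
    lines = ["SELECT CASE"]
    for conditions, class_id in leaf_rules(tree):
        lines.append(f"  WHEN {' AND '.join(conditions)} THEN {class_id}")
    lines.append("END AS predicted_class")
    lines.append(f"FROM {table}")
    return "\n".join(lines)


sql = to_sql(TREE)
print(sql)
```

Each branch repeats the full path from the root, so the branches are mutually exclusive and their order in the CASE expression does not matter.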
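Taking petal.length as an example, the splits bin the data as follows:
Bin 1: 1.0 to 2.35
Bin 2: 2.35 to 5.05
Bin 3: 5.05 to 6.9
What could this be used for?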
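The binning can be reproduced with the tree's two petal.length thresholds as cut points. This is a small sketch using only the Python standard library; the function name `petal_length_bin` and the sample values are made up for illustration. A value equal to a threshold falls into the lower bin, matching the `<=` tests in the tree.

```python
from bisect import bisect_left

# Inner bin edges: the two petal.length thresholds used by the tree.
INNER_EDGES = [2.35, 5.05]


def petal_length_bin(value):
    """Return bin 1, 2 or 3; a value equal to a threshold goes to the lower bin."""
    return bisect_left(INNER_EDGES, value) + 1


samples = [1.0, 2.35, 3.0, 5.05, 6.9]
bins = [petal_length_bin(v) for v in samples]
print(bins)  # [1, 1, 2, 2, 3]
```

One possible use is to turn a continuous feature into a small ordinal one whose cut points were learned from the labels.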