The document summarizes the results of using naive Bayes and complementary naive Bayes classifiers on Japanese text data. The naive Bayes classifier correctly classified around 94% of instances while complementary naive Bayes correctly classified around 72% of instances. Confusion matrices are provided to show the classification breakdown between different categories for each model.
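The sketch below is a minimal pure-Python version of multinomial naive Bayes and complement naive Bayes that makes the comparison concrete. It is not Mahout's implementation and does not reproduce the accuracy figures above. It assumes documents are already tokenized (for Japanese text, some word-segmentation step is assumed to have run first) and uses additive smoothing with a hypothetical `alpha` parameter. It follows the basic complement formulation of Rennie et al. (2003), without the class prior or their weight-normalization step. The class names `MultinomialNB` and `ComplementNB` are illustrative only.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial naive Bayes over pre-tokenized documents (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace/Lidstone) smoothing; must be > 0

    def fit(self, docs, labels):
        self.counts = defaultdict(Counter)  # class -> token -> count
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
            self.vocab.update(tokens)
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n / len(labels)) for c, n in n_docs.items()}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def _log_theta(self, counts, total, token):
        # smoothed log P(token) under the token distribution given by `counts`
        return math.log((counts[token] + self.alpha) /
                        (total + self.alpha * len(self.vocab)))

    def score(self, tokens, c):
        known = [t for t in tokens if t in self.vocab]  # drop out-of-vocabulary tokens
        return self.log_prior[c] + sum(
            self._log_theta(self.counts[c], self.totals[c], t) for t in known)

    def predict(self, tokens):
        return max(self.counts, key=lambda c: self.score(tokens, c))


class ComplementNB(MultinomialNB):
    """Complement NB: estimate each class from the documents of all *other* classes."""

    def fit(self, docs, labels):
        super().fit(docs, labels)
        all_counts = Counter()
        for cnt in self.counts.values():
            all_counts.update(cnt)
        grand_total = sum(all_counts.values())
        # complement counts for class c: every class's tokens except c's
        self.comp = {c: all_counts - cnt for c, cnt in self.counts.items()}
        self.comp_totals = {c: grand_total - self.totals[c] for c in self.counts}
        return self

    def score(self, tokens, c):
        known = [t for t in tokens if t in self.vocab]
        # negated: the best class is the one whose complement explains the document worst
        return -sum(self._log_theta(self.comp[c], self.comp_totals[c], t) for t in known)
```

For example, `ComplementNB(alpha=1.0).fit(docs, labels).predict(tokens)` returns the label whose complement estimate gives the document the lowest likelihood. The two models differ only in which counts each class's parameters are estimated from. This is why complement NB is often suggested when classes have very different amounts of training text.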
This document discusses Hadoop, HBase, Mahout, naive Bayes classification, and analyzing web content. It provides an example of using Mahout to train a naive Bayes classifier on web content stored in Hadoop and HBase. Evaluation results are presented, showing over 90% accuracy in classifying different types of web content. The effects of parameters like alpha values, n-grams, and feature selection are also explored.
Apache Mahout - Random Forests - #TokyoWebmining #8 (Koichi Hamada)
The document discusses social media, social graphs, personality modeling, data mining, machine learning, and random forests. It covers how individuals connect through social graphs, how personality can be modeled objectively, how data mining and machine learning techniques extract patterns from data, and the random forests algorithm introduced by Leo Breiman in 2001.
Data Mining: Concepts and Techniques, Chapter 07: Advanced Frequent Pattern M... (Salah Amean)
Chapter 7 of 'Data Mining: Concepts and Techniques' covers advanced frequent pattern mining, including multi-level and multi-dimensional pattern mining, mining quantitative associations, and constraint-based mining. It also covers mining rare and negative patterns and argues for user-directed mining through constraints. Finally, it addresses efficient mining techniques and the challenge of finding interesting patterns in large datasets.
24. Processing Speed
● Junjie Hou, Chunping Li, "A Pattern Growth Method Based on Memory Indexing for Frequent Patterns Mining," International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'05), vol. 1, pp. 663-668, 2005.
[Figure: run time (sec.) vs. support threshold (%) on dataset D1, FP-growth vs. Apriori; x-axis 0 to 3%, y-axis 0 to 100 sec.]
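The slide compares the run time of FP-growth and Apriori as the support threshold changes. The sketch below is a minimal in-memory Python version of both algorithms, not the code measured in the cited paper. It assumes transactions are lists of string items and takes support as an absolute count `min_count`. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise Apriori: join, prune, then one database scan per level."""
    db = [frozenset(t) for t in transactions]
    item_counts = Counter(i for t in db for i in t)
    current = {frozenset([i]) for i, n in item_counts.items() if n >= min_count}
    result = {s: item_counts[next(iter(s))] for s in current}
    k = 2
    while current:
        # join: unions of two frequent (k-1)-itemsets that have exactly k items
        cands = {a | b for a in current for b in current if len(a | b) == k}
        # prune: every (k-1)-subset of a candidate must itself be frequent
        cands = {c for c in cands
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # count: a full pass over the data for this level
        level = Counter(c for t in db for c in cands if c <= t)
        current = {c for c, n in level.items() if n >= min_count}
        result.update((c, level[c]) for c in current)
        k += 1
    return result


class _Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def _build_tree(weighted, min_count):
    """Build an FP-tree from (items, count) pairs; return header table and item supports."""
    freq = Counter()
    for items, n in weighted:
        for item in items:
            freq[item] += n
    freq = {item: n for item, n in freq.items() if n >= min_count}
    root, header = _Node(None, None), {}
    for items, n in weighted:
        # keep frequent items only, most frequent first; ties broken by item name
        ordered = sorted((x for x in items if x in freq), key=lambda x: (-freq[x], x))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = _Node(item, node)
                header.setdefault(item, []).append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, freq


def _mine(weighted, min_count, suffix):
    header, freq = _build_tree(weighted, min_count)
    result = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        result[pattern] = freq[item]
        # conditional pattern base: the prefix path above every node holding `item`
        base = []
        for node in nodes:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        result.update(_mine(base, min_count, pattern))
    return result


def fp_growth(transactions, min_count):
    """FP-growth: compress the data into a prefix tree, then mine it recursively."""
    return _mine([(set(t), 1) for t in transactions], min_count, frozenset())
```

Both functions return a dict mapping `frozenset` itemsets to support counts, so their outputs can be compared directly. A percentage threshold like the chart's x-axis maps to `min_count = math.ceil(threshold / 100 * len(transactions))`. Apriori rescans the data once per level, and its candidate sets grow as the threshold falls. FP-growth instead builds the tree in two passes and then mines conditional trees without generating candidates. This is the usual explanation for Apriori slowing down faster at low support.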