Two sentences are tokenized and encoded by a BERT model. The first sentence describes two kids playing with a green crocodile float in a swimming pool. The second sentence describes two kids pushing an inflatable crocodile around in a pool. The tokenized sentences are passed through the BERT model, which outputs the encoded representations of the token sequences.
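This encoding step can be sketched with the Hugging Face `transformers` library — a minimal sketch, assuming the `transformers` and `torch` packages and the public `bert-base-uncased` checkpoint (the sentences below paraphrase the ones described above):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "Two kids are playing with a green crocodile float in a swimming pool.",
    "Two kids push an inflatable crocodile around in a pool.",
]
# tokenize both sentences into one padded batch of token ids
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)
# one contextual vector per token: (batch_size, seq_len, hidden_size)
embeddings = outputs.last_hidden_state
```

The `last_hidden_state` tensor holds the encoded representation of every token in both sequences, which is what downstream token-level comparisons operate on.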
1) Canonical correlation analysis (CCA) is a statistical method that analyzes the correlation relationship between two sets of multidimensional variables.
2) CCA finds linear transformations of the two sets of variables so that their correlation is maximized. This can be formulated as a generalized eigenvalue problem.
3) The number of dimensions of the transformed variables is determined using Bartlett's test, which tests the eigenvalues against a chi-squared distribution.
This document discusses designing lessons for the "Information I and II" courses based on Peirce's theory of inquiry stages. It first provides background on the goals and structure of the Information courses according to the Ministry of Education's curriculum guidelines. It then reviews previous work on structuring education and on problem-solving and Peirce's theory of abduction, deduction, and induction. Finally, it proposes mapping the problem-solving methods covered in Information (information design, programming, data utilization) to Peirce's three stages of inquiry.
See https://github.com/saireya/thesis/tree/master/2021jaeis-peirce for details.
In the common subject "Information", each course has an introductory unit, and the remaining units each teach an individual problem-solving method. However, because the relationship between the contents of the units is unclear, there is little rational explanation for why these particular methods, out of the many approaches to problem solving, are the ones covered. This paper organizes the characteristics of each unit based on Peirce's classification of inference and his theory of the stages of inquiry, and proposes guidelines for systematically developing lessons in the common subject "Information".
3. Gensim
Gensim
A library that makes topic models (pLSA, LDA) and deep learning (word2vec) easy to use [2][3]
The tutorial on the official site is somewhat hard to follow
Usage is covered in detail in [4] and [1]
Figure: Mentioned by the author:)
3 / 11
4. Gensim
[Flow diagram] documents (e.g. "System and human system ...") → morphological analysis → texts (e.g. ['system', 'and', 'human']) → dic = corpora.Dictionary() builds the token-to-id dictionary ({'and': 19, 'minors': 37, ...}), saved with dic.save() to dict.dic → dic.doc2bow() maps each document to (token id, tf) pairs, giving the corpus ([(10, 2), (19, 1), (3, 1), ...]), serialized with MmCorpus.serialize() to corpus.mm → models (tf-idf, LSA, LDA, HDP, RP, log entropy, word2vec) are trained on the corpus and saved with model.save() to lda.model (later reloaded via dic.load(), MmCorpus(), and model.load()) → similarities judges document similarity, and model.show_topics() extracts the topics of a document.
Figure: An example of a processing pipeline using Gensim
5. Gensim
Step0. documents
Prepare the original documents as a list:

# original documents
documents = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
    "Relation of user perceived response time to error measurement",
    "The generation of random binary unordered trees",
    "The intersection graph of paths in trees",
    "Graph minors IV Widths of trees and well quasi ordering",
    "Graph minors A survey"]
6. Gensim
Step1. Morphological analysis

def parse(doc):
    # for Japanese text, run morphological analysis here
    # remove stopwords
    stoplist = set('for a of the and to in'.split())
    text = [word for word in doc.lower().split() if word not in stoplist]
    return text

texts = [parse(doc) for doc in documents]
print(texts)
''' [
['human', 'machine', 'interface', ...],
['survey', 'user', 'opinion', ...],
...] '''