MM text team
蔡捷恩
莊文立
溫鈺瑋
2015@Delta Research Center
Fully automatic F/T matrix
analysis from patent data
蔡捷恩
Function/Technology Matrix
Using keyword “ ”
“The Patent-Classification Technology/Function Matrix - A Systematic Method for Design Around”, Cheng et al. Mar-2013, CSIR
Problem reduction
• Detecting problem/solution pairs in a patent document
“Automatic Discovery of Technology Trends from Patent”, Y. Kim et al. 2009, ACMSAC
Problem term detection
• Step 1. Finding key frames
• Step 2. Feature extraction
  – Unsupervised feature
  – Supervised feature
• Step 3. Classifier training
Step 1. Key frame detection
• We define key frames to be “ ”
Step 2 – Unsupervised feature
(language model)
• The model:
Maximum likelihood estimation (MLE)
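The slide does not show the model itself, so the following is only a minimal sketch of what maximum likelihood estimation means for a language model, assuming a plain unigram model (the paper's actual model may differ). The function name `mle_unigram` is hypothetical.

```python
from collections import Counter

def mle_unigram(tokens):
    """MLE for a unigram model: P(w) = count(w) / total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Toy token sequence (made up for illustration).
model = mle_unigram("the problem of the method".split())
print(model["the"])  # relative frequency of "the"
```

Under this assumption, a candidate term can be scored by how probable it is under a model estimated from problem-describing text.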
Step 2 – Supervised feature
(linguistic model)
• Based on part-of-speech (POS) statistics over labeled patents
Step 2 – Supervised feature
(linguistic model)
• The model:
The delta function equals 1 only when the current key frame matches the given pattern, and 0 otherwise
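The delta function described above can be sketched as an indicator over POS patterns. This is an illustration only: it assumes "matches" means an exact match of the POS tag sequence, and the patterns and tags below are hypothetical, not taken from the paper.

```python
def delta(key_frame_pos, pattern):
    """Return 1 only when the key frame's POS sequence equals the pattern."""
    return 1 if tuple(key_frame_pos) == tuple(pattern) else 0

def pattern_features(key_frame_pos, patterns):
    """One binary feature per known pattern."""
    return [delta(key_frame_pos, p) for p in patterns]

# Hypothetical POS patterns learned from labeled patents.
PATTERNS = [("VB", "NN"), ("JJ", "NN"), ("NN", "IN", "NN")]

print(pattern_features(["JJ", "NN"], PATTERNS))
```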
Step 3. Classifier training
• Simply concatenate the features mentioned above => LIBSVM
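As a rough sketch of the concatenation step, the feature groups can be joined into one vector and written in LIBSVM's sparse text format (`label index:value ...`, with 1-based indices and zero entries omitted). The helper name `to_libsvm_line` and the sample values are assumptions for illustration.

```python
def to_libsvm_line(label, *feature_groups):
    """Concatenate feature groups and format them as one LIBSVM input line."""
    features = [value for group in feature_groups for value in group]
    # LIBSVM indices start at 1; zero-valued features are left out.
    pairs = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)

# One unsupervised score followed by three binary pattern features.
print(to_libsvm_line(1, [0.25], [1, 0, 1]))
```

Lines in this format can be collected into a training file for the LIBSVM command-line tools.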
Solution term detection
• Step 1. Key frame detection
• Step 2. Feature extraction
  – Unsupervised feature
  – Supervised feature: based on problem terms
• Step 3. Classifier training
Problems
• Lack of labeled data? => the linguistic model proposed in the paper seems general enough => adopt it directly, with Porter stemming
Further improvement
• Coreference resolution
  – “the method solves the problem of overfitting.”
• Semantic-based clustering
  – Okapi BM25: “The Probabilistic Relevance Framework: BM25 and Beyond”, Robertson et al., 2009
  – Word vector: “Efficient Estimation of Word Representations in Vector Space”, T. Mikolov, ICLR, 2013
  – Document vector: “Distributed Representations of Words and Phrases and their Compositionality”, NIPS, 2013
In my opinion: Okapi BM25 > word vector > document vector
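For reference, a minimal sketch of Okapi BM25 scoring over tokenized documents. It assumes the common parameter defaults `k1=1.5`, `b=0.75` and the non-negative IDF variant `log((N - df + 0.5) / (df + 0.5) + 1)`; the toy documents are made up for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every tokenized document in docs against the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = 1 - b + b * len(d) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

docs = [["reduce", "noise", "signal"],
        ["improve", "battery", "life"],
        ["noise", "filter"]]
print(bm25_scores(["noise"], docs))
```

Pairwise scores of this kind could serve as the similarity measure for clustering problem or solution terms.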
Thank you
Chinese domain term extraction
溫鈺瑋
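Examples
× 目前 此車 铣 設備 由 绮 發 機械 提供
✓ 目前 此 車铣 設備 由 绮發機械 提供
(“Currently, this turning-milling equipment is supplied by 绮發機械.”)
× L 固定 板會 有 擺動 過大 疑慮
✓ L固定板 會 有 擺動 過大 疑慮
(“There is a concern that the L fixing plate swings too much.”)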
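Method
• Collocation
  – Use mutual information (MI) to estimate how likely “character + character” and “word + character” combinations are to form a word, i.e. the internal cohesion of a candidate word
  – Example: c = “自然語言處理”, a = “自然語言處”, b = “然語言處理”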
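The slide does not show the formula, so the sketch below assumes one common formulation for this setup: with a = c minus its last character and b = c minus its first character, cohesion(c) = f(c) / (f(a) + f(b) - f(c)), where f is corpus frequency. The function names and the toy corpus are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def cohesion(corpus, c):
    """Assumed score: f(c) / (f(a) + f(b) - f(c)), for len(c) >= 2."""
    a, b = c[:-1], c[1:]  # drop the last / the first character
    f_a, f_b, f_c = (count_occurrences(corpus, s) for s in (a, b, c))
    denom = f_a + f_b - f_c
    return f_c / denom if denom else 0.0

corpus = "自然語言處理很有趣,自然語言處理很實用"
print(cohesion(corpus, "自然語言處理"))  # a and b never occur outside c
print(cohesion(corpus, "理很有"))        # "理很" also occurs in another context
```

A score near 1 means a and b almost always occur together as c, which suggests c is a genuine term rather than a chance combination.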
Method
• Adaptation
目前 此車 铣 設備 由 绮 發 機械 提供
b e b e s b e s s s b e b e
目前此車铣設備由绮發機械提供
CKIP, Stanford, jieba…
Manual adjustment
目前 此 車铣 設備 由 绮發機械 提供
b e s b e b e s b m m e b e
CRF-based DELTA word segmentor
Input : L 固定 板會 有 擺動 過大 疑慮
Output : L固定板 會 有 擺動 過大 疑慮
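The b/m/e/s tag rows above can be derived mechanically from a space-separated segmentation. A minimal sketch, assuming b = begin, m = middle, e = end and s = single-character word; the function names are hypothetical.

```python
def to_tags(segmented):
    """Turn a space-separated segmentation into per-character b/m/e/s tags."""
    tags = []
    for word in segmented.split():
        if len(word) == 1:
            tags.append("s")
        else:
            tags.extend(["b"] + ["m"] * (len(word) - 2) + ["e"])
    return " ".join(tags)

def from_tags(chars, tags):
    """Rebuild the segmentation from raw characters and their tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags.split()):
        current += ch
        if tag in ("e", "s"):  # a word ends on "e" or "s"
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return " ".join(words)

print(to_tags("目前 此 車铣 設備 由 绮發機械 提供"))
# → b e s b e b e s b m m e b e
```

Manually corrected segmentations converted this way give the per-character labels a CRF segmenter is trained on.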
Thank you
Knowledge extraction from Delta data
莊文立
Information Extraction
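• Named Entity Recognition (NER)
  – Identify and classify named entities
    • Companies, people, products, locations, etc.
• Relation Extraction (RE)
  – Find the relations between named entities in text, for example
    • Competition
    • Cooperation
    • Customer
    • Upstream supplier
  – Usually expressed as (subject, relation, object) triples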
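Sales visit record:
Regarding the pricing of the BV3418 project, the response from Manager Yao of 欣特協寶 was that General Manager Zhou considers Delta's price to be higher than that of Siemens' 808 low-end NC controller.
• NER
  • Siemens (西門子)/Organization
  • 欣特協寶/Organization
  • Delta (台達)/Organization
  • Manager Yao (姚經理)/Person
  • General Manager Zhou (周總)/Person
• RE
#  Subject    Relation       Object
1  Delta      COMPETE_WITH   Siemens
2  Delta      IS_VENDOR      欣特協寶
3  Siemens    IS_VENDOR      欣特協寶
4  欣特協寶   SUBORDINATE    Manager Yao
5  欣特協寶   SUBORDINATE    General Manager Zhou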
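The five extracted relations can be held as (subject, relation, object) triples and queried directly. A small sketch, with entity names rendered in English (Delta = 台達, Siemens = 西門子, Manager Yao = 姚經理, General Manager Zhou = 周總); the `Triple` type and `objects_of` helper are hypothetical.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str

# The five relations from the sales visit record.
TRIPLES = [
    Triple("Delta", "COMPETE_WITH", "Siemens"),
    Triple("Delta", "IS_VENDOR", "欣特協寶"),
    Triple("Siemens", "IS_VENDOR", "欣特協寶"),
    Triple("欣特協寶", "SUBORDINATE", "Manager Yao"),
    Triple("欣特協寶", "SUBORDINATE", "General Manager Zhou"),
]

def objects_of(triples, subject, relation):
    """All objects linked to the subject by the given relation."""
    return [t.obj for t in triples
            if t.subject == subject and t.relation == relation]

print(objects_of(TRIPLES, "欣特協寶", "SUBORDINATE"))
```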
Named Entity Recognition
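• Data processing
  – Chinese requires good word segmentation results
  – Manual labeling
• Model: Conditional Random Fields (CRF)
  – Learn the usage patterns of named entities from the features of each word
    • The word itself and its part of speech
    • Context words and their parts of speech
    • Syntactic parse tree
    • Collocations
    • Titles and surnames
    • Named-entity database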
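A minimal sketch of per-token feature extraction for a CRF tagger, covering a few of the feature types listed above (the word and its POS, context words and POS, titles and surnames). The lexicons, POS tags and feature names are hypothetical placeholders; parse-tree, collocation and database features are left out.

```python
TITLES = {"經理", "總"}    # hypothetical title lexicon
SURNAMES = {"姚", "周"}    # hypothetical surname lexicon

def token_features(words, pos_tags, i):
    """Build the feature dict for the token at position i."""
    feats = {
        "word": words[i],
        "pos": pos_tags[i],
        "is_surname": words[i] in SURNAMES,
        "is_title": words[i] in TITLES,
    }
    if i > 0:
        feats["prev_word"] = words[i - 1]
        feats["prev_pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(words) - 1:
        feats["next_word"] = words[i + 1]
        feats["next_pos"] = pos_tags[i + 1]
        feats["next_is_title"] = words[i + 1] in TITLES
    else:
        feats["EOS"] = True  # end of sentence
    return feats

words = ["欣特協寶", "姚", "經理"]
pos_tags = ["Nb", "Nb", "Na"]  # illustrative tags only
print(token_features(words, pos_tags, 1))
```

A surname followed by a title, as in 姚 + 經理, is the kind of cue such features let the model pick up for Person entities.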
Relation Extraction
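• Still needs manual labeling?
• Deep Learning!
  – Let the machine discover the most suitable representation by itself
• Recursive Neural Network
  – “Climb” up along the syntactic parse tree
  – Each word is represented by a matrix + a vector
    • The vector represents the word's own meaning
    • The matrix represents context information
  – The vector output at the node where the two named entities meet is fed into a classifier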
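The composition step can be sketched as follows. This assumes the matrix-vector recursive network formulation, which matches the description above but is not named on the slide: for children (a, A) and (b, B), the parent vector is p = tanh(W [Ba; Ab]) and the parent matrix is P = W_M [A; B]. All weights below are toy values for illustration.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(X, Y):
    """Multiply matrix X by matrix Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def compose(a, A, b, B, W, W_M):
    """Merge two child nodes into their parent (vector p, matrix P)."""
    # Each child's vector is first transformed by its sibling's matrix.
    mixed = matvec(B, a) + matvec(A, b)        # length 2n
    p = [math.tanh(z) for z in matvec(W, mixed)]
    P = matmul(W_M, A + B)                     # A stacked over B: 2n x n
    return p, P

I2 = [[1, 0], [0, 1]]
W = [[1, 0, 0, 0], [0, 0, 0, 1]]    # toy n x 2n weights
W_M = [[1, 0, 0, 0], [0, 1, 0, 0]]  # toy n x 2n weights
p, P = compose([1, 0], I2, [0, 1], I2, W, W_M)
print(p, P)
```

Applying `compose` bottom-up along the parse tree yields a vector at the node joining the two named entities, which is what the classifier receives.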
[Figure: recursive neural network composing word vectors up the parse tree; the resulting vector is fed into a Classifier]
Future work
• Cross sentence
• Cross document
• Cross language
Thank you

Multimedia-text team report_2015-07-31