The document discusses techniques for extracting information from Chinese language documents, including named entity recognition and relation extraction. It describes using conditional random fields to recognize named entities and recursive neural networks to identify relationships between entities by representing words as vectors and modeling the context using matrices. Future work could involve extracting information across multiple sentences, documents, and languages.
3. Function/Technology Matrix
Using keyword “ ”
“The Patent-Classification Technology/Function Matrix - A Systematic Method for Design Around”, Cheng et al. Mar-2013, CSIR
4. Problem reduction
• Detecting problem/solution pairs in a patent document
“Automatic Discovery of Technology Trends from Patent”, Y. Kim et al. 2009, ACMSAC
5. Problem term detection
• Step1. Finding key frames
• Step2. Feature extraction
– Unsupervised feature
– Supervised feature
• Step3. Classifier training
6. Step1. Key frame detection
• We define key frames to be “ ”
7. Step2 – unsupervised feature
(language model)
• The model:
Maximum likelihood estimation (MLE)
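The formula on this slide did not survive extraction. As a rough sketch of what a maximum-likelihood language-model feature can look like, assuming a plain unigram model (the paper's actual model may differ), the MLE probability of a word is its count divided by the total number of tokens. The function names and the toy corpus are hypothetical.

```python
from collections import Counter
import math

def train_unigram_mle(tokens):
    """Return MLE unigram probabilities: P(w) = count(w) / total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def log_likelihood(model, tokens):
    """Sum of log P(w) over tokens; unseen words give -inf (no smoothing)."""
    return sum(math.log(model[w]) if w in model else float("-inf") for w in tokens)

# Toy corpus (hypothetical, for illustration only).
corpus = "the method solves the problem of overfitting".split()
model = train_unigram_mle(corpus)
```

Pure MLE assigns zero probability to unseen words, so a practical system would normally add smoothing.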
8. Step2 – supervised feature
(linguistic model)
• Based on part-of-speech (POS) statistics over labeled patents
9. Step2 – supervised feature
(linguistic model)
• The model:
The delta function equals 1 only when the current key frame matches the given pattern, and 0 otherwise
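The delta function described above can be sketched as an indicator feature. This is only an illustration: representing a pattern as a tuple of POS tags is an assumption, and the example patterns are hypothetical rather than taken from the paper.

```python
def delta(key_frame_pos, pattern):
    """Indicator: 1 if the key frame's POS sequence equals the pattern, else 0."""
    return 1 if tuple(key_frame_pos) == tuple(pattern) else 0

def linguistic_features(key_frame_pos, patterns):
    """One indicator per pattern, in the order the patterns are given."""
    return [delta(key_frame_pos, p) for p in patterns]

# Hypothetical POS patterns (Penn Treebank tags), not taken from the paper.
patterns = [("NN", "IN", "NN"), ("VBZ", "DT", "NN"), ("JJ", "NN")]
features = linguistic_features(["VBZ", "DT", "NN"], patterns)
```

In the slide's setting the patterns would presumably come from the POS statistics gathered on labeled patents.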
10. Step3. Classifier training
• Simply concatenate the features mentioned above => LIBSVM
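A minimal sketch of the concatenation step, assuming the features are written out in LIBSVM's sparse text format ("<label> <index>:<value> ..." with 1-based ascending indices). The helper name and the feature values are hypothetical.

```python
def to_libsvm_line(label, *feature_groups):
    """Concatenate feature groups and format one LIBSVM training line.

    Indices are 1-based and ascending; zero-valued features are omitted,
    which the sparse format allows.
    """
    vector = [value for group in feature_groups for value in group]
    pairs = [f"{i}:{v:g}" for i, v in enumerate(vector, start=1) if v != 0]
    return " ".join([str(label)] + pairs)

# Hypothetical feature values for one candidate key frame.
unsupervised = [0.25]      # e.g. a language-model score
supervised = [0, 1, 0]     # e.g. POS-pattern indicators
line = to_libsvm_line(+1, unsupervised, supervised)
```

A file of such lines, one per labeled key frame, could then be passed to LIBSVM's training tool.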
11. Solution term detection
• Step1. Key frame detection
• Step2. Feature extraction
– Unsupervised feature
– Supervised feature: based on problem terms
• Step3. Classifier training
12. Problems
• Lack of labeled data? => The linguistic model proposed in the paper seems general enough => trust it and apply it directly, with Porter stemming
13. Further improvement
• Coreference resolution
– “the method solves the problem of overfitting.”
• Semantic-based clustering
– Okapi BM25: “The Probabilistic Relevance Framework: BM25 and Beyond”, Robertson et al., 2009
– Word vector: “Efficient Estimation of Word Representations in Vector Space”, T. Mikolov, ICLR, 2013
– Document vector: “Distributed Representations of Words and Phrases and their Compositionality”, NIPS, 2013
In my opinion: Okapi BM25 > word vector > document vector
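For the Okapi BM25 option, a clustering step would need some document-to-document or query-to-document score. The following is a minimal sketch of BM25 scoring, not the slide author's implementation: k1 = 1.2 and b = 0.75 are commonly used defaults chosen here as an assumption, the IDF is the non-negative "+1" variant, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average docs are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [d.split() for d in [
    "method reduces overfitting in neural network training",
    "device for cooling a battery pack",
    "overfitting problem solved by regularization method",
]]
scores = bm25_scores("overfitting method".split(), docs)
```

The second document shares no query terms and scores zero; the third outranks the first only because it is shorter.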
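17. Method
• Collocation
– Use mutual information (MI) to estimate how likely “character + character” and “word + character” combinations are to form a word, i.e. the internal cohesion strength of a word
– Example: c = “自然語言處理”, a = “自然語言處”, b = “然語言處理”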
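The slide does not show its exact formula. As a sketch under that caveat, one association score that fits this a/b/c setup is f(c) / (f(a) + f(b) - f(c)), where f is a frequency count, a is c without its last character, and b is c without its first character. The choice of this score, the function names, and the toy text are all assumptions.

```python
def count_occurrences(text, sub):
    """Count possibly overlapping occurrences of sub in text."""
    return sum(1 for i in range(len(text) - len(sub) + 1) if text.startswith(sub, i))

def cohesion(text, c):
    """Association score f(c) / (f(a) + f(b) - f(c)) for a candidate c.

    a is c without its last character, b is c without its first character.
    Intended for candidates of at least two characters.
    """
    a, b = c[:-1], c[1:]
    f_a, f_b, f_c = (count_occurrences(text, s) for s in (a, b, c))
    denom = f_a + f_b - f_c
    return f_c / denom if denom > 0 else 0.0

# Toy text (hypothetical): the full candidate always appears as a whole unit.
text = "自然語言處理很有趣。我喜歡自然語言處理。"
score_full = cohesion(text, "自然語言處理")  # a and b never occur outside c
score_part = cohesion(text, "語言處理很")    # candidate crosses a word boundary
```

A score near 1 means the substrings a and b rarely appear without c, which suggests c behaves as a single unit.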
18. Method
• Adaptation
(tags: b = begin, m = middle, e = end, s = single-character word)
Raw sentence:
目前此車铣設備由绮發機械提供
Segmented by CKIP, Stanford, jieba…:
目前 此車 铣 設備 由 绮 發 機械 提供
b e b e s b e s s s b e b e
After manual adjustment:
目前 此 車铣 設備 由 绮發機械 提供
b e s b e b e s b m m e b e
CRF-based DELTA word segmentor
Input : L 固定 板會 有 擺動 過大 疑慮
Output : L固定板 會 有 擺動 過大 疑慮
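The b/m/e/s tag rows above follow directly from the word boundaries. A minimal sketch of the conversion in both directions, which is the kind of preprocessing a CRF segmenter's training data needs (the function names are hypothetical):

```python
def words_to_tags(words):
    """Map segmented words to per-character tags: b/m/e for multi-character words, s for singles."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("s")
        else:
            tags.extend(["b"] + ["m"] * (len(word) - 2) + ["e"])
    return tags

def tags_to_words(chars, tags):
    """Rebuild words from characters and their b/m/e/s tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("e", "s"):
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without e/s
        words.append(current)
    return words

# The manually adjusted segmentation from the slide.
adjusted = "目前 此 車铣 設備 由 绮發機械 提供".split()
tags = words_to_tags(adjusted)
```

Applying `tags_to_words` to the raw sentence and these tags recovers the adjusted segmentation, so predicted tags from a trained model can be decoded the same way.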