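[word2vec: 2016 Presidential Election News]
Speaker: 施旭峰
Organizer: 蜂巢數據 (Beehive Data Group)
word2vec is an open-source project that Google released in mid-2013 under the Apache 2.0 license, and it is often grouped under deep learning. At this dinner session, we will share how we applied word2vec to news data collected during the 2016 presidential election. Come join us for dinner!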
10. Mathematical model
It is the New York City driver’s public shame — a sentence of solitary
front-seat confinement levied against those for whom subways, buses and
taxis are insufficient. For at least 90 minutes each week, residents move
their vehicles from their curbside berths, slide into formation behind a row
of double-parked neighbors and moor together in a singular urban traffic
jam, beholden to a hulking contraption whose distinguishing feature
appears to be this: It swirls plastic bags and cigarette stubs briefly before
returning them to the earth. But the ignominy of alternate-side-of-the-street
parking, which allows city workers to clean roadways without the
obstruction of parked cars, could soon be eased. A bill that will have a
hearing before the City Council on Monday would allow drivers to return to
parking spaces once the street sweepers pass, causing a potentially
significant reduction in wait times for those doomed to mornings in their
cars. Councilman Ydanis Rodriguez, a Democrat from Manhattan and the
bill’s sponsor, said the legislation would prevent accidents by reducing the
duration of double-parking; help the environment, with fewer cars idling or
driving in search of spaces; and save New Yorkers “millions of dollars” in
lost time.
Mathematical model
17. Word2Vec - Reference
Paper
1. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of
Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.
2. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed
Representations of Words and Phrases and their Compositionality. In Proceedings of
NIPS, 2013.
3. Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous
Space Word Representations. In Proceedings of NAACL HLT, 2013.
18. Word2Vec - Reference
Other
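1. Deep Learning实战之word2vec [Deep Learning in Practice: word2vec]
http://techblog.youdao.com/?p=915
2. 用中文資料測試 word2vec [Testing word2vec with Chinese Data]
http://city.shaform.com/blog/2014/11/04/word2vec.html
3. 用中文把玩Google开源的Deep-Learning项目word2vec [Playing with Google's Open-Source Deep Learning Project word2vec in Chinese]
http://www.cnblogs.com/wowarsenal/p/3293586.html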
29. Results and Testing
Linear relationship inference
./word-analogy vectors.bin
Enter three words (EXIT to break): 蔡英文 陳建仁 宋楚瑜
Word: 蔡英文 Position in vocabulary: 9
Word: 陳建仁 Position in vocabulary: 147
Word: 宋楚瑜 Position in vocabulary: 109
Word Distance
------------------------------------------------------------------------
徐欣瑩 0.605165
親民黨 0.565923
宋瑩配 0.564386
昨也用 0.536368
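
The query above asks: 蔡英文 is to 陳建仁 as 宋楚瑜 is to which word? The following is a minimal sketch of how such an analogy query can be scored. It assumes the usual approach of building the target vector vec(b) - vec(a) + vec(c) from unit-length word vectors and ranking the remaining words by cosine similarity, so the "Distance" column would then be a similarity where higher is closer. The function names and the three-dimensional toy vectors are invented for illustration; they are not taken from the trained election-news model.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def analogy(vectors, a, b, c, topn=3):
    """Rank words by cosine similarity to vec(b) - vec(a) + vec(c).

    The three query words are excluded from the candidates.
    """
    unit = {w: normalize(v) for w, v in vectors.items()}
    target = normalize(
        [vb - va + vc for va, vb, vc in zip(unit[a], unit[b], unit[c])]
    )
    scores = [
        (w, sum(x * y for x, y in zip(target, v)))  # dot product of unit vectors
        for w, v in unit.items()
        if w not in (a, b, c)
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:topn]


# Hypothetical toy vectors: axes are (party 1, party 2, "running mate" role).
toy = {
    "leader_1": [1.0, 0.0, 0.0],
    "deputy_1": [1.0, 0.0, 1.0],
    "leader_2": [0.0, 1.0, 0.0],
    "deputy_2": [0.0, 1.0, 1.0],
    "party_2": [0.0, 1.0, 0.2],
    "other": [0.5, 0.5, 0.0],
}

result = analogy(toy, "leader_1", "deputy_1", "leader_2")
for word, score in result:
    print(f"{word}\t{score:.6f}")
```

With these toy vectors, deputy_2 ranks first and party_2 second, which matches the shape of the result above: the other candidate's running mate comes out on top, followed by the related party.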