1. Retrofitting Word Vectors to
Semantic Lexicons
Manaal Faruqui, Jesse Dodge, Sujay K. Jauhar,
Chris Dyer, Eduard Hovy, Noah A. Smith
NAACL 2015
Presenter: Sho Takase
Knowledge Acquisition Study Group, 2015/4/21
5. Proposed Method
• We want two things:
– Keep each vector similar to the vector obtained from the corpus (the input)
– Give similar vectors to words that are related in the external knowledge
• Relations: synonyms, hypernym/hyponym pairs, paraphrases
• Objective function
– Minimize the Euclidean distance between the vectors we want to be similar
• First term: corpus information (stay close to the input vector)
• Second term: external knowledge (move closer to words related in the lexicon)
– E: the set of edges drawn between words that are related in the external knowledge
– α, β: hyperparameters (α = 1; β = 1 / degree, i.e., one over the number of neighbors)
[Figure from the paper: a word graph with edges between related words. White nodes are the inferred word vectors to be retrofitted; shaded nodes are labeled with the corresponding vectors in Q̂, which are observed. The graph can be interpreted as a Markov random field (Kindermann and Snell, 1980).]

The distance between a pair of vectors is defined to be the Euclidean distance. Since we want the inferred word vector to be close to the observed value q̂_i and close to its neighbors q_j, ∀j such that (i, j) ∈ E, the objective to be minimized becomes:

$$\Psi(Q) = \sum_{i=1}^{n} \left[ \alpha_i \lVert q_i - \hat{q}_i \rVert^2 + \sum_{(i,j) \in E} \beta_{ij} \lVert q_i - q_j \rVert^2 \right]$$

where the α and β values control the relative strengths of associations (more details in §6.1).

(Slide annotations: q̂_i is the vector obtained from the corpus (the input); q_i is the retrofitted vector.)
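To make the objective concrete, here is a minimal NumPy sketch of Ψ(Q). It is my own illustration, not the authors' code; the array shapes, the `adj` adjacency list, and the choices α_i = 1, β_ij = 1/deg(i) from the slide are assumptions.

```python
import numpy as np

# Sketch of the retrofitting objective Psi(Q) (illustrative, not the authors' code).
#   q_hat : (n, d) array of corpus-derived (input) vectors, observed
#   q     : (n, d) array of retrofitted vectors, inferred
#   adj   : adjacency list; adj[i] lists the words related to word i in the
#           external lexicon (synonyms, hypernyms/hyponyms, paraphrases)
# Per the slide, alpha_i = 1 and beta_ij = 1 / deg(i).
def objective(q, q_hat, adj):
    total = 0.0
    for i in range(len(q)):
        # First term: stay close to the corpus vector (alpha_i = 1).
        total += np.sum((q[i] - q_hat[i]) ** 2)
        # Second term: stay close to lexicon neighbors (beta_ij = 1/deg(i)).
        for j in adj[i]:
            total += np.sum((q[i] - q[j]) ** 2) / len(adj[i])
    return total
```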
6. How to Solve It
• Find the solution by iterative updates
– For each q_i, repeatedly update it to the value that minimizes the objective (a sketch follows the update rule below)
– Each q_i is initialized with the input vector
• Empirically, 10 iterations bring the change in Euclidean distance between the vectors being pulled together below 0.01
Update rule (excerpt from the paper):

A solution can be found by solving a system of linear equations. To do so, we use an efficient iterative updating method (Bengio et al., 2006; Subramanya et al., 2010; Das and Petrov, 2011; Das and Smith, 2011). The vectors in Q are initialized to be equal to the vectors in Q̂. We take the first derivative of Ψ with respect to one q_i vector, and by equating it to zero arrive at the following online update:

$$q_i = \frac{\sum_{j:(i,j) \in E} \beta_{ij} q_j + \alpha_i \hat{q}_i}{\sum_{j:(i,j) \in E} \beta_{ij} + \alpha_i} \qquad (1)$$

In practice, running this procedure for 10 iterations converges to changes in Euclidean distance of adjacent vertices of less than 10⁻². The retrofitting approach described above is modular; it can be applied to word vector representations obtained from any word vector training method.
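A minimal sketch of the whole procedure built around update (1), under the same assumptions as above (α_i = 1, β_ij = 1/deg(i)); again an illustration, not the authors' released implementation.

```python
import numpy as np

# Iterative retrofitting via the online update (Eq. 1); illustrative sketch.
def retrofit(q_hat, adj, iterations=10):
    q = q_hat.copy()                    # Q is initialized to Q-hat
    for _ in range(iterations):         # ~10 iterations suffice per the paper
        for i in range(len(q)):
            if not adj[i]:              # no lexicon neighbors: q_i stays at q_hat_i
                continue
            beta = 1.0 / len(adj[i])    # beta_ij = 1/deg(i)
            # Eq. (1): weighted average of lexicon neighbors and the corpus vector
            numer = beta * sum(q[j] for j in adj[i]) + q_hat[i]
            denom = beta * len(adj[i]) + 1.0  # = 2 with these alpha/beta choices
            q[i] = numer / denom
    return q
```

With these choices of α and β, each update is simply the midpoint between the corpus vector and the mean of the neighbors' current vectors, which is why so few iterations are needed in practice.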