【DL輪読会】Efficiently Modeling Long Sequences with Structured State Spaces (Deep Learning JP)
This document summarizes a research paper on modeling long-range dependencies in sequence data using structured state space models and deep learning. The proposed S4 model (1) derives recurrent and convolutional representations of state space models, (2) improves long-term memory using HiPPO matrices, and (3) efficiently computes state space model convolution kernels. Experiments show S4 outperforms existing methods on various long-range dependency tasks, achieves fast and memory-efficient computation comparable to efficient Transformers, and performs competitively as a general sequence model.
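To make the convolutional view concrete, here is a minimal numpy sketch of the generic state space model pipeline S4 builds on: discretize x' = Ax + Bu, y = Cx with the bilinear transform, unroll the recurrence into a convolution kernel, and convolve it with the input. The random A below stands in for the HiPPO matrix, the naive kernel materialization stands in for S4's structured fast algorithm, and the sizes and step size are arbitrary choices.

```python
# Minimal sketch of the SSM-as-convolution view underlying S4
# (illustrative only: random A instead of HiPPO, naive kernel computation).
import numpy as np

def discretize(A, B, step):
    """Bilinear (Tustin) discretization of x' = Ax + Bu."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (step / 2.0) * A)
    Ad = inv @ (I + (step / 2.0) * A)
    Bd = (inv * step) @ B
    return Ad, Bd

def ssm_kernel(Ad, Bd, C, L):
    """Unroll the recurrence into the length-L convolution kernel
    K = (C Bd, C Ad Bd, C Ad^2 Bd, ...)."""
    K, x = [], Bd
    for _ in range(L):
        K.append((C @ x).item())
        x = Ad @ x
    return np.array(K)

N, L = 4, 16                      # state size, sequence length
rng = np.random.default_rng(0)
A = rng.normal(size=(N, N)) / N   # stand-in for a HiPPO matrix
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))

Ad, Bd = discretize(A, B, step=1.0 / L)
K = ssm_kernel(Ad, Bd, C, L)

u = rng.normal(size=L)            # input sequence
y = np.convolve(u, K)[:L]         # causal convolution = SSM output
print(y.shape)                    # (16,)
```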
This document discusses building a finite state transducer (FST) for efficient dictionary lookups during tokenization. It describes constructing the FST by iterating through a word list, freezing states once word suffixes diverge, and merging equivalent states. The built FST is then compiled into a program that a virtual machine executes to look up words. The program represents the FST as a list of instructions including transition characters and output values; by running the program backwards, it simulates traversing the FST from a word to its output.
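As an illustration of the compile-and-execute idea, here is a minimal sketch that flattens a dictionary into a list of instructions and looks words up by stepping through them with a tiny virtual machine. It is only a loose analogue of the scheme described above: it builds a plain trie rather than a minimized FST, attaches output values to final states instead of pushing them along arcs, and traverses forward rather than backwards.

```python
# Minimal sketch: compile a word->value dictionary into a flat instruction
# list and look words up with a tiny virtual machine (trie-based, not a
# minimized FST).

def build_trie(words):
    """words: dict mapping word -> integer output value."""
    root = {}
    for word, value in words.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = value                              # "" marks a final state
    return root

def compile_state(node, program):
    """Flatten a trie node into contiguous instructions; return its address."""
    addr = len(program)
    items = sorted(node.items(), key=lambda kv: kv[0])  # "" sorts first
    program.extend([None] * len(items))                 # reserve slots
    for i, (ch, sub) in enumerate(items):
        last = (i == len(items) - 1)
        if ch == "":
            program[addr + i] = ("final", sub, last)    # sub is the value
        else:
            target = compile_state(sub, program)
            program[addr + i] = ("char", ch, target, last)
    return addr

def lookup(program, start, word):
    """Run the instruction list against a word; return its value or None."""
    pc, i = start, 0
    while True:
        inst = program[pc]
        if inst[0] == "final":
            if i == len(word):
                return inst[1]
            if inst[2]:                  # last instruction of this state
                return None
            pc += 1
        else:
            _, ch, target, last = inst
            if i < len(word) and ch == word[i]:
                pc, i = target, i + 1    # jump to the next state
            elif last:
                return None              # no matching transition
            else:
                pc += 1

words = {"cat": 1, "cats": 2, "dog": 3}
program = []
start = compile_state(build_trie(words), program)
print(lookup(program, start, "cats"))    # 2
print(lookup(program, start, "ca"))      # None
```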
This document discusses key concepts in natural language processing including parse trees, part-of-speech tags, and dependency trees. It also contains mathematical formulas for Charles' Law and the ideal gas law, along with their variables and constants described in short phrases.
Technical term extraction aims to automatically identify important terms in scientific papers to help analyze the meaning of texts. It uses a conditional random field (CRF) model that leverages existing scientific text corpora and bilingual lexicons to recognize terms. The identified terms are then applied in natural language processing tasks involving scientific papers.
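A minimal sketch of what CRF-based term extraction looks like in practice, framed as BIO tagging with the sklearn-crfsuite library; the sentence, labels, and features below are toy placeholders rather than the paper's actual setup.

```python
# Toy CRF term extraction as BIO tagging with sklearn-crfsuite.
import sklearn_crfsuite

def word_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_title": w.istitle(),
        "suffix3": w[-3:],
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

train_sents = [["We", "apply", "conditional", "random", "fields", "."]]
train_tags = [["O", "O", "B-TERM", "I-TERM", "I-TERM", "O"]]

X = [[word_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, train_tags)
print(crf.predict(X)[0])   # predicted BIO tags for the training sentence
```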
The document discusses semantic enrichment of mathematical expressions: semantic tags are attached using MathML, which can describe both the structure and the content of an expression. Statistical machine translation is then applied to automatically extract translation rules, and segmentation rules are introduced to split expressions; combining both kinds of rules strengthens the translation system and improves over prior rule-based systems.
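For reference, MathML itself separates the two kinds of description mentioned above: presentation markup encodes how an expression is laid out, while content markup encodes what it means. A simple encoding of a + b in each style:

```xml
<!-- Presentation markup: layout of "a + b" -->
<mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>

<!-- Content markup: meaning (application of plus to a and b) -->
<apply><plus/><ci>a</ci><ci>b</ci></apply>
```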
This document discusses using eye tracking data to diagnose cognitive attributes and readability levels. It examines how factors like technicality, lexical perplexity, syntactic complexity, semantic consistency, background knowledge, native language, emotional state and working memory can influence eye movements and aid in recognizing personal attributes. The diagnosis also considers how these various cognitive and contextual elements impact readability.
This document discusses composing word meanings from sub-word components using deep learning. It notes that while word vectors can be used to construct a word space, every out-of-vocabulary word receives the same representation and thus appears identical. Humans, by contrast, can generalize the meaning of a new word like "minced-tuna" from the individual meanings of "mince" and "tuna". The document suggests using deep learning to compose a word's meaning from its sub-word parts to better represent new words.
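A minimal sketch of the contrast: instead of mapping every unseen word to one shared UNK vector, compose a vector for "minced-tuna" from the vectors of its known parts. Simple averaging is used here; the deep-learning approach the document describes would replace it with a learned composition network, and the toy vocabulary and hyphen-based splitting are assumptions.

```python
# Composing a vector for an unseen compound from known sub-word vectors,
# instead of falling back to a single shared <UNK> representation.
import numpy as np

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=8) for w in ["mince", "minced", "tuna", "fish"]}
UNK = np.zeros(8)

def embed(word):
    if word in vocab:
        return vocab[word]
    parts = word.split("-")                     # crude sub-word split
    known = [vocab[p] for p in parts if p in vocab]
    if not known:
        return UNK                              # truly unknown: fall back
    return np.mean(known, axis=0)               # compose from known parts

v = embed("minced-tuna")
sim = v @ vocab["tuna"] / (np.linalg.norm(v) * np.linalg.norm(vocab["tuna"]))
print(round(float(sim), 3))                     # similarity to "tuna" > 0
```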
This document discusses associating gaze information with human reading strategies. It describes using natural language processing technologies and reading-behavior cues such as word length and frequency to predict reading strategies like fixation and skipping, reaching 95% similarity to observed reader data. The goal is to understand general reading strategies regardless of individual differences. It also discusses using a conditional random field model with gaze features to optimize comma placement in text for improved readability.
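A minimal sketch of the prediction step, assuming skip-vs-fixate can be treated as binary classification over simple word-level cues; the features, data, and logistic-regression model below are stand-ins for the richer NLP features and the CRF model the document describes.

```python
# Toy skip-vs-fixate prediction from word length and corpus frequency.
from sklearn.linear_model import LogisticRegression

# features: [word length, log corpus frequency]; label: 1 = skipped
X = [[2, 9.1], [3, 8.7], [9, 3.2], [11, 2.5], [4, 7.9], [8, 4.0]]
y = [1, 1, 0, 0, 1, 0]

clf = LogisticRegression().fit(X, y)
print(clf.predict([[10, 3.0], [3, 8.0]]))   # likely [0, 1]: fixate, skip
```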
1) The paper proposes a co-ranking framework to adapt graph-based ranking to tweet recommendation by simultaneously ranking tweets and their authors.
2) The co-ranking algorithm combines popularity, personalization based on user interests, and a diversity term that keeps high scores from concentrating on closely connected nodes (a sketch of the coupled iteration follows this list).
3) An evaluation on a large Twitter dataset from 2011 shows the co-ranking approach improves tweet recommendation over baselines by 18.3% in DCG and 7.8% in MAP.
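A minimal sketch of the coupled iteration behind point 1: two PageRank-style walks, one over tweets and one over authors, linked through an authorship matrix so that each side's scores reinforce the other's. The toy graphs, the coupling weight, and the omission of the personalization and diversity terms from point 2 are all simplifications.

```python
# Co-ranking sketch: coupled random walks over tweets and authors.
import numpy as np

def col_normalize(M):
    s = M.sum(axis=0, keepdims=True)
    return M / np.where(s == 0, 1, s)

rng = np.random.default_rng(0)
T = col_normalize(rng.random((5, 5)))   # tweet-tweet graph (e.g. similarity)
A = col_normalize(rng.random((3, 3)))   # author-author graph (e.g. follows)
W = col_normalize(rng.integers(0, 2, size=(5, 3)).astype(float))  # authorship

t = np.full(5, 1 / 5)                   # tweet scores
a = np.full(3, 1 / 3)                   # author scores
lam = 0.8                               # intra-graph vs coupling weight
for _ in range(50):
    t_new = lam * T @ t + (1 - lam) * W @ a      # authors boost their tweets
    a_new = lam * A @ a + (1 - lam) * W.T @ t    # tweets boost their authors
    t, a = t_new / t_new.sum(), a_new / a_new.sum()

print(np.argsort(-t))                   # tweets ranked by co-ranking score
```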
30. Main language model toolkits
• CMU-Cambridge language model toolkit
– Not recommended, since it lacks Kneser-Ney smoothing
• SRI language model toolkit (SRILM)
– Has Kneser-Ney smoothing
– Bengio et al. also used it; probably the most widely used?
– Fast!
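For reference, a typical SRILM invocation for training a Kneser-Ney-smoothed trigram model and measuring held-out perplexity looks like this (train.txt, test.txt, and model.lm are placeholder file names):

```sh
# Train a trigram LM with modified Kneser-Ney smoothing...
ngram-count -text train.txt -order 3 -kndiscount -interpolate -lm model.lm
# ...then score a held-out set with it.
ngram -lm model.lm -ppl test.txt
```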