LLaMA: Open and Efficient Foundation Language Models

This document summarizes the LLaMA model, an open and efficient foundation language model.
[1] LLaMA achieves state-of-the-art performance on various tasks while being trained exclusively on publicly available data, and its 13B variant can run on a single GPU at inference time, making it more accessible than other large models.
[2] Key architectural choices in LLaMA include pre-normalization (RMSNorm), the SwiGLU activation, rotary position embeddings, and efficient implementation techniques; a brief sketch of these components follows this summary. The largest models were trained on 1.4 trillion tokens of publicly available data, using roughly 2048 A100 GPUs over about 5 months of development.
[3] Evaluation shows LLaMA outperforms other models on common sense reasoning, question answering, and reading comprehension benchmarks.
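As a concrete reference for the architectural choices above, here is a minimal PyTorch sketch, written from this summary rather than from the released code, of a pre-norm decoder block with RMSNorm, a SwiGLU feed-forward, and rotary position embeddings. Module names, hidden sizes, and dimensions are illustrative assumptions, not LLaMA's exact configuration.

```python
# Minimal sketch (not the reference implementation) of the components named in the
# summary: RMSNorm pre-normalization, a SwiGLU feed-forward, and rotary embeddings.
# Dimensions and module names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Pre-normalization variant used in place of LayerNorm."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * x * rms


class SwiGLU(nn.Module):
    """Gated feed-forward: silu(x W1) * (x W3), projected back with W2."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)
        self.w3 = nn.Linear(dim, hidden, bias=False)

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


def rotary(x, base=10000.0):
    """Apply rotary position embeddings to a (batch, seq, heads, head_dim) tensor."""
    b, s, h, d = x.shape
    half = d // 2
    freqs = torch.pow(base, -torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(s, dtype=torch.float32)[:, None] * freqs[None, :]  # (s, half)
    cos, sin = angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class DecoderBlock(nn.Module):
    """Pre-norm block: x + Attn(RMSNorm(x)), then x + SwiGLU(RMSNorm(x))."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.attn_norm, self.ffn_norm = RMSNorm(dim), RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.ffn = SwiGLU(dim, hidden=4 * dim)  # LLaMA's actual ratio differs; illustrative

    def forward(self, x):
        b, s, d = x.shape
        q, k, v = self.qkv(self.attn_norm(x)).chunk(3, dim=-1)
        shape = (b, s, self.heads, self.head_dim)
        q, k, v = (t.reshape(shape) for t in (q, k, v))
        q, k = rotary(q), rotary(k)
        att = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
        ).transpose(1, 2).reshape(b, s, d)
        x = x + self.proj(att)
        return x + self.ffn(self.ffn_norm(x))


if __name__ == "__main__":
    block = DecoderBlock()
    print(block(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```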
Packed Levitated Marker for Entity and Relation Extraction

PL-Marker is a span representation method that uses packed levitated markers to consider the interrelations between spans for named entity recognition and relation extraction tasks. It strategically inserts solid and levitated markers into the encoder input to represent spans and span pairs. In experiments, PL-Marker achieved state-of-the-art results on several NER and RE datasets, with the gains attributed to modeling the interrelations between spans that share the same subject or object entities.
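To make the packing idea concrete, below is a small, library-free sketch, written from this summary rather than from the authors' code, of how levitated marker pairs can be appended after a sentence so that they reuse the position ids of their span's boundary tokens and stay invisible to the text tokens. The marker token names and the mask layout are illustrative assumptions.

```python
# Sketch of "packed levitated markers": each candidate span gets a marker pair appended
# after the sentence; the markers share position ids with the span boundaries, and the
# attention mask keeps them invisible to text tokens while letting each marker see the
# text and its partner. Token names and mask convention are assumptions.

def pack_levitated_markers(tokens, spans, start_marker="[O]", end_marker="[/O]"):
    """tokens: list of word pieces; spans: list of (start, end) index pairs (inclusive)."""
    n_text = len(tokens)
    packed = list(tokens)
    position_ids = list(range(n_text))
    marker_slots = []  # (start_marker_idx, end_marker_idx) in the packed sequence

    for start, end in spans:
        s_idx, e_idx = len(packed), len(packed) + 1
        packed += [start_marker, end_marker]
        # Levitated markers reuse the position ids of the span boundary tokens.
        position_ids += [start, end]
        marker_slots.append((s_idx, e_idx))

    n_total = len(packed)
    # attention_mask[i][j] == 1 means token i may attend to token j.
    attention_mask = [[0] * n_total for _ in range(n_total)]
    for i in range(n_text):              # text tokens attend only to text tokens
        for j in range(n_text):
            attention_mask[i][j] = 1
    for s_idx, e_idx in marker_slots:    # each marker sees the text, itself, and its partner
        for j in range(n_text):
            attention_mask[s_idx][j] = attention_mask[e_idx][j] = 1
        attention_mask[s_idx][e_idx] = attention_mask[e_idx][s_idx] = 1
        attention_mask[s_idx][s_idx] = attention_mask[e_idx][e_idx] = 1

    return packed, position_ids, attention_mask


if __name__ == "__main__":
    tokens = ["Barack", "Obama", "visited", "Berlin"]
    spans = [(0, 1), (3, 3)]  # candidate entity spans
    packed, pos, mask = pack_levitated_markers(tokens, spans)
    print(packed)  # ['Barack', 'Obama', 'visited', 'Berlin', '[O]', '[/O]', '[O]', '[/O]']
    print(pos)     # [0, 1, 2, 3, 0, 1, 3, 3]
```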
Scaling Instruction-Finetuned Language Models

The document discusses improving the performance of language models on unseen tasks through instruction finetuning, in which models are finetuned on a large collection of tasks phrased as natural-language instructions. It finds that scaling both the number of finetuning tasks and the size of the model improves performance, and that finetuning on chain-of-thought annotations particularly helps the model's reasoning abilities. Instruction finetuning is shown to generalize across model families and to improve usability while mitigating potential harms.
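As a concrete illustration of instruction-style formatting, the sketch below renders a supervised example as a (prompt, target) pair, with an optional chain-of-thought rationale. The templates are illustrative assumptions, not the exact FLAN templates.

```python
# Illustrative rendering of a labeled example as instruction-finetuning text, with an
# optional chain-of-thought annotation. Templates here are assumptions for illustration.

def to_instruction_example(question, answer, rationale=None, use_cot=False):
    """Return a (prompt, target) pair for instruction finetuning."""
    if use_cot and rationale:
        prompt = ("Answer the following question. Give your reasoning first.\n\n"
                  f"Question: {question}")
        target = f"{rationale} So the answer is {answer}."
    else:
        prompt = f"Answer the following question.\n\nQuestion: {question}"
        target = answer
    return prompt, target


if __name__ == "__main__":
    q = "A bag holds 3 red and 5 blue marbles. How many marbles are there in total?"
    print(to_instruction_example(q, "8"))
    print(to_instruction_example(q, "8", rationale="There are 3 + 5 = 8 marbles.",
                                 use_cot=True))
```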
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections

mPLUG is a vision-language pre-trained model that achieves state-of-the-art performance on a range of vision-language tasks through an asymmetric architecture built on novel cross-modal skip connections. Its skip-connected fusion blocks are designed to address the information asymmetry and computational inefficiency of conventional multi-modal fusion. mPLUG is pre-trained with contrastive learning on image-text pairs together with masked language modeling, and shows strong zero-shot transfer on tasks such as image captioning and image-text retrieval. Evaluation shows mPLUG outperforms prior work on visual question answering, image captioning, image-text retrieval, visual grounding, and visual reasoning.
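The following is a loose PyTorch sketch of the skip-connected fusion idea as described in this summary, not the released mPLUG code: a few asymmetric layers let text tokens cross-attend to visual features while the visual stream is left untouched, and a skip connection then feeds the original visual tokens into one joint self-attention layer. Layer counts, dimensions, and module names are assumptions.

```python
# Loose sketch of an asymmetric, skip-connected fusion block, written from the summary.
# All hyperparameters and module names are illustrative assumptions.
import torch
import torch.nn as nn


class AsymmetricCoAttention(nn.Module):
    """Text self-attention + cross-attention to (fixed) visual features + FFN."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.n1, self.n2, self.n3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, text, vision):
        t = self.n1(text)
        text = text + self.self_attn(t, t, t)[0]
        text = text + self.cross_attn(self.n2(text), vision, vision)[0]  # vision not updated
        return text + self.ffn(self.n3(text))


class SkipConnectedFusionBlock(nn.Module):
    """Several asymmetric layers, then joint self-attention over [vision ; fused text]."""
    def __init__(self, dim=256, heads=4, n_asym=2):
        super().__init__()
        self.asym = nn.ModuleList([AsymmetricCoAttention(dim, heads) for _ in range(n_asym)])
        self.joint = nn.TransformerEncoderLayer(dim, heads, dim_feedforward=4 * dim,
                                                batch_first=True)

    def forward(self, text, vision):
        fused = text
        for layer in self.asym:
            fused = layer(fused, vision)
        # Skip connection: the original visual tokens re-enter at the joint layer.
        joint = self.joint(torch.cat([vision, fused], dim=1))
        n_vis = vision.shape[1]
        return joint[:, n_vis:], joint[:, :n_vis]  # updated text, updated vision


if __name__ == "__main__":
    block = SkipConnectedFusionBlock()
    text, vision = torch.randn(2, 12, 256), torch.randn(2, 49, 256)
    t, v = block(text, vision)
    print(t.shape, v.shape)  # torch.Size([2, 12, 256]) torch.Size([2, 49, 256])
```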