This deck discusses using large language models (LLMs) such as GPT-3 and BERT in research. It gives a brief history of LLMs, describing how they build on transformers, massive datasets, and GPUs; popular examples include GPT-3, BERT, LLaMA, and T5. It suggests tools such as Sentence-BERT, T5, and GPT-3, alongside applications like Elicit, for research tasks such as summarization and question answering. Finally, it provides further resources on LLMs and references recent papers on models such as GPT-4, LaMDA, PaLM, and Chinchilla.
You and Your Research -- LLMs Perspective
1. You and Your Research
LLMs Perspective
Dr Mohamed Elawady
Department of Computer and Information Sciences
University of Strathclyde
4th ML/AI Workshop
14th Sep 2023
2. Agenda
Introduction: LLMs
History of LLMs
LLMs + Chatbots
LLMs + Research
https://www.reddit.com/r/ChatGPTMemes/comments/102mvys/yours_sincerely_chatgpt/?rdt=43569
"I visualise a time when we will be to robots what dogs are to humans, and I'm rooting for the machines."
Claude Shannon (1916-2001)
3. Introduction: LLMs
Large Language Model (LLM): Natural Language Processing (NLP) + Deep Learning (DL)
Basic: input (text), output (text)
How: self-supervised and semi-supervised training over massive text datasets (terabytes of data); chat-oriented models are then typically fine-tuned with reinforcement learning from human feedback (RLHF). A minimal sketch of the self-supervised objective follows below.
https://lifearchitect.ai/models/
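To make the self-supervised objective concrete, here is a minimal sketch using the Hugging Face transformers library (the same library behind the demos in the resources slide). The small gpt2 checkpoint is an illustrative stand-in, not one of the models discussed; full-scale LLMs optimise the same next-token loss over terabytes of text.

```python
# pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is a small stand-in checkpoint chosen for illustration (assumption);
# the deck's models are orders of magnitude larger.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Large language models take text in and produce text out."
inputs = tokenizer(text, return_tensors="pt")

# Self-supervised objective: passing the input ids as labels makes the model
# predict each token from the ones before it -- no human annotation needed.
outputs = model(input_ids=inputs.input_ids, labels=inputs.input_ids)
print(f"Next-token prediction loss: {outputs.loss.item():.2f}")
```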
4. History of LLMs
4
Zhao, Wayne Xin, et al. "A survey of large language models." arXiv preprint arXiv:2303.18223 (2023).
What's behind them: transformers (attention, sketched below), massive data, GPUs
Popular: OpenAI GPT-3/4, Google Bard, Meta LLaMA, Google T5, BLOOM
Coming soon: DeepMind Gemini, OpenAI GPT-5
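The "transformers" ingredient above refers to the attention mechanism at the core of all these models. Below is a minimal, illustrative sketch of scaled dot-product attention with toy shapes and random values; it is a conceptual aid, not code from any of the cited papers.

```python
# pip install numpy
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: each position attends to every other."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V                              # weighted mix of value vectors

# Toy example: 3 tokens with 4-dimensional embeddings (illustrative values).
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # -> (3, 4)
```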
7. More Resources
LLM Introduction: Learn Language Models, GitHub Gist: https://gist.github.com/rain-1/eebd5e5eb2784feecf450324e3341c8d
Awesome-LLM: a curated list of Large Language Model, GitHub: https://github.com/Hannibal046/Awesome-LLM
Demos on the Hugging Face platform (sign-up required); a sketch for running them locally follows the list:
Text-to-Text Generation: https://huggingface.co/google/flan-t5-base
Text Summarization: https://huggingface.co/facebook/bart-large-cnn
Text Generation: https://huggingface.co/bigscience/bloom
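The demo pages above can also be reproduced locally with the transformers library. A hedged sketch follows; the model IDs come from the links above, while the prompts and generation parameters (max_length, min_length) are illustrative assumptions.

```python
# pip install transformers torch
from transformers import pipeline

# Text summarization with the facebook/bart-large-cnn checkpoint linked above.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = (
    "Large language models combine natural language processing with deep "
    "learning. They are trained on massive text datasets using transformers "
    "and GPUs, and they power chatbots as well as research assistants."
)
# max_length / min_length are illustrative values, not tuned recommendations.
summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])

# Text-to-text generation with the google/flan-t5-base checkpoint linked above.
generator = pipeline("text2text-generation", model="google/flan-t5-base")
print(generator("Translate English to German: Good morning")[0]["generated_text"])
```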
8. References
(GPT-3) Brown, Tom, et al. "Language models are few-shot learners." Advances in Neural Information Processing Systems 33 (2020): 1877-1901.
(GPT-4) OpenAI. "GPT-4 Technical Report." arXiv preprint arXiv:2303.08774 (2023).
(LaMDA) Thoppilan, Romal, et al. "LaMDA: Language models for dialog applications." arXiv preprint arXiv:2201.08239 (2022).
(SciBERT) Beltagy, Iz, Kyle Lo, and Arman Cohan. "SciBERT: A pretrained language model for scientific text." arXiv preprint arXiv:1903.10676 (2019).
(Sentence-BERT) Reimers, Nils, and Iryna Gurevych. "Sentence-BERT: Sentence embeddings using Siamese BERT-networks." arXiv preprint arXiv:1908.10084 (2019).
(T5) Raffel, Colin, et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." The Journal of Machine Learning Research 21.1 (2020): 5485-5551.
(LLaMA) Touvron, Hugo, et al. "LLaMA: Open and efficient foundation language models." arXiv preprint arXiv:2302.13971 (2023).
(BLOOM) Scao, Teven Le, et al. "BLOOM: A 176B-parameter open-access multilingual language model." arXiv preprint arXiv:2211.05100 (2022).
(PaLM) Chowdhery, Aakanksha, et al. "PaLM: Scaling language modeling with pathways." arXiv preprint arXiv:2204.02311 (2022).
(Chinchilla) Hoffmann, Jordan, et al. "Training compute-optimal large language models." arXiv preprint arXiv:2203.15556 (2022).