You and Your Research
LLMs Perspective
Dr Mohamed Elawady
Department of Computer and Information Sciences
University of Strathclyde
4th ML/AI Workshop
14th Sep 2023
Agenda
• Introduction: LLMs
• History of LLMs
• LLMs + Chatbots
• LLMs + Research
2
https://www.reddit.com/r/ChatGPTMemes/comments/102mvys/yours_sincerely_chatgpt/?rdt=43569
"I visualise a time when we will be to robots what dogs are to humans, and I'm rooting for the machines."
Claude Shannon (1916-2001)
Introduction: LLMs
Large Language Model (LLM): Natural Language Processing (NLP) + Deep Learning (DL)
• Basic: Input (text), Output (text)
• How: self-supervised and semi-supervised training over massive text datasets (terabytes in size); chat-tuned variants are further aligned with reinforcement learning from human feedback (RLHF). (See the sketch below.)
3
https://lifearchitect.ai/models/
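As a concrete illustration of the text-in, text-out interface above, here is a minimal sketch using the Hugging Face transformers library, with the small GPT-2 model standing in for a full-scale LLM (the model choice and prompt are illustrative, not part of the original slides):

# Minimal text-in/text-out demo with a small causal language model.
# GPT-2 is used only because it is small; larger LLMs expose the same interface,
# learned by self-supervised next-token prediction over large text corpora.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Large language models are"
result = generator(prompt, max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])

Larger models behave the same way; only the scale of parameters, data and compute changes.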
History of LLMs
4
Zhao, Wayne Xin, et al. "A survey of large language models." arXiv preprint arXiv:2303.18223 (2023).
• What's behind
  • Transformers (attention sketch below)
  • Massive data
  • GPUs
• Popular
  • OpenAI GPT-3/4
  • Google Bard
  • Meta LLaMA
  • Google T5
  • BLOOM
• Coming soon!
  • DeepMind Gemini
  • OpenAI GPT-5
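To make the "Transformers" ingredient concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside every Transformer block (the shapes and random values are purely illustrative):

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                         # weighted mix of the values

# Toy example: 3 tokens, 4-dimensional embeddings
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)

Stacking many such attention layers, training them on massive text corpora and running them on GPUs is, in outline, what the models listed above have in common.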
LLMs + Chatbots
• GPT-3.5/4 + ChatGPT (OpenAI)
• LaMDA + Bard (Google)
• GPT-4 + Bing (Microsoft)
• GPT-4 + YouChat (You.com)
• Claude + Claude AI (Anthropic)
• GPT-4 + ChatSonic (Writesonic)
5
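Each pairing above is a chat front end wrapped around an underlying LLM. A minimal sketch of such a call, using the OpenAI Python client as it existed around the time of this talk (the pre-1.0 ChatCompletion interface); the model name, prompt and environment variable are assumptions for illustration:

# Minimal chat call: the chatbot UI essentially wraps a request like this
# around the underlying LLM.
import os
import openai  # openai<1.0 interface, as current at the time of this talk

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumes an API key is set in the environment

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "user", "content": "Summarise what a large language model is in one sentence."},
    ],
)
print(response["choices"][0]["message"]["content"])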
LLMs + Research
• Sentence-BERT / T5 / GPT-3 + Elicit
• SciBERT + Scite Assistant
• GPT-4 + Consensus
6
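Research assistants of this kind typically embed a question and candidate paper abstracts and rank them by similarity. A minimal sketch of that retrieval step with the sentence-transformers (Sentence-BERT) library; the model name and example abstracts are illustrative assumptions:

# Semantic search over abstracts with Sentence-BERT-style embeddings,
# roughly the retrieval step behind literature assistants such as Elicit.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small illustrative model

query = "How are large language models trained?"
abstracts = [
    "We train a 175B-parameter language model with self-supervised learning on web text.",
    "We study symmetry detection in natural images using wavelet features.",
]

query_emb = model.encode(query, convert_to_tensor=True)
abstract_embs = model.encode(abstracts, convert_to_tensor=True)

scores = util.cos_sim(query_emb, abstract_embs)[0]  # cosine similarity to each abstract
best = int(scores.argmax())
print(abstracts[best])

A generative model (e.g. GPT-3/4) can then summarise or answer questions from the top-ranked abstracts.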
More Resources
• LLM Introduction: Learn Language Models, GitHub Gist:
  https://gist.github.com/rain-1/eebd5e5eb2784feecf450324e3341c8d
• Awesome-LLM: a curated list of Large Language Model resources, GitHub:
  https://github.com/Hannibal046/Awesome-LLM
• Demos on the Hugging Face platform (signup required; see the local sketch below)
  • Text-to-Text Generation: https://huggingface.co/google/flan-t5-base
  • Text Summarization: https://huggingface.co/facebook/bart-large-cnn
  • Text Generation: https://huggingface.co/bigscience/bloom
7
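The Hugging Face demos above can also be run locally in a few lines. A minimal sketch of the summarization demo with facebook/bart-large-cnn (the input text and generation parameters are illustrative):

# Local version of the Hugging Face summarization demo linked above.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = (
    "Large language models combine natural language processing with deep learning. "
    "They are trained on terabytes of text with self-supervised objectives and can "
    "generate, summarise and answer questions about text."
)
summary = summarizer(text, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])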
References
• (GPT-3) Brown, Tom, et al. "Language models are few-shot learners." Advances in Neural Information Processing Systems 33 (2020): 1877-1901.
• (GPT-4) OpenAI. "GPT-4 Technical Report." arXiv preprint arXiv:2303.08774 (2023).
• (LaMDA) Thoppilan, Romal, et al. "LaMDA: Language models for dialog applications." arXiv preprint arXiv:2201.08239 (2022).
• (SciBERT) Beltagy, Iz, Kyle Lo, and Arman Cohan. "SciBERT: A pretrained language model for scientific text." arXiv preprint arXiv:1903.10676 (2019).
• (Sentence-BERT) Reimers, Nils, and Iryna Gurevych. "Sentence-BERT: Sentence embeddings using Siamese BERT-networks." arXiv preprint arXiv:1908.10084 (2019).
• (T5) Raffel, Colin, et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." The Journal of Machine Learning Research 21.1 (2020): 5485-5551.
• (LLaMA) Touvron, Hugo, et al. "LLaMA: Open and efficient foundation language models." arXiv preprint arXiv:2302.13971 (2023).
• (BLOOM) Scao, Teven Le, et al. "BLOOM: A 176B-parameter open-access multilingual language model." arXiv preprint arXiv:2211.05100 (2022).
• (PaLM) Chowdhery, Aakanksha, et al. "PaLM: Scaling language modeling with pathways." arXiv preprint arXiv:2204.02311 (2022).
• (Chinchilla) Hoffmann, Jordan, et al. "Training compute-optimal large language models." arXiv preprint arXiv:2203.15556 (2022).
8