{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "# unit 3.2 - Transformer and LLM examples\n",
    "\n",
    "\n",
    "## Transformer implementation in PyTorch\n",
    "\n",
    "For a well developed and commented Transformer implementation, see [this repo](https://github.com/karpathy/nanoGPT) or [this one](https://github.com/karpathy/minGPT/).\n",
    "\n",
    "In particular [this repo](https://github.com/karpathy/minGPT/) has a [great example](https://github.com/karpathy/minGPT/tree/master/projects/adder) that you can run and train on your laptop very quickly.\n",
    "\n",
    "This example learns to add numbers as a string of text. You can see below results after training for < 10 minutes on a 2023 Apple Macbook Pro. \n",
    "\n",
    "```\n",
    "GPT claims that 12 + 75 = 77 but gt is 87\n",
    "GPT claims that 21 + 28 = 39 but gt is 49\n",
    "GPT claims that 15 + 14 = 19 but gt is 29\n",
    "GPT claims that 9 + 19 = 18 but gt is 28\n",
    "GPT claims that 83 + 40 = 133 but gt is 123\n",
    "test final score: 482/500 = 96.40% correct\n",
    "...\n",
    "iter_dt 12.52ms; iter 9490: train loss 0.05852\n",
    "iter_dt 12.13ms; iter 9500: train loss 0.02480\n",
    "train final score: 9500/9500 = 100.00% correct\n",
    "test final score: 500/500 = 100.00% correct\n",
    "...\n",
    "```\n",
    "\n",
    "## Large Language models - LLM\n",
    "\n",
    "What are Large Language Models or LLM? They are the core that powers ChatGPT, Gemini and many other modern AI tools (in the years 2024).\n",
    "\n",
    "As we have seen a Transformer neural network is composed of an encoder and decoder. The Transformer encoder is often used to encoded entire sentences, and it is useful to turn language into embeddings. On the other hand, a Transformer decoder is capable of producing language, and thus is often referred as a \"language model\". \n",
    "\n",
    "The Transformer decoder, scaled to more and more parameters than in the original Transformer papers gave rise to \"GPT\" or Generative Pre-trained Transformer. These models, including GPT-2, GPT-3, etc. are decoder-only models pretrained on large-scale unsupervised text data. They are trained to predict the next word (token) from a series of words (tokens).\n",
    "\n",
    "These models eventually scaled up to Trillion of parameters such as GPT-4 and beyond. They are the core that powers the LLM revolution of the last few years.\n",
    "\n",
    "\n",
    "### LLM visualization\n",
    "\n",
    "See this interesting [LLMvisualization](https://bbycroft.net/llm).\n",
    "\n",
    "\n",
    "### Tokenization examples\n",
    "\n",
    "Learn how LLM encode sentences with this [tokenizer](https://platform.openai.com/tokenizer) tool.\n",
    "\n",
    "\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}