
ML 101 - How LLMs Generate Text


Understanding LLMs Using a Kitchen Analogy

Imagine you're running a high-end restaurant, where every dish served is an AI-generated response.

  • The chef represents the AI model (LLM).
  • The ingredients are the tokens (words, subwords).
  • The recipe book represents the model's training data.
  • The cooking process is the text generation pipeline.

Let's go step by step and see how the kitchen (LLM) operates.

1. Tokenization = Preparing Ingredients

Before cooking begins, a chef prepares ingredients by chopping vegetables and measuring out the spices a dish needs. LLMs follow a similar process: they break text down into smaller units called tokens.

Example:

  • Input sentence:
    "We're revolutionizing grocery shopping."
  • Chopping process: Breaking it into ingredients (tokens):
    ["We're", "revolutionizing", "grocery", "shopping"]
  • Each token gets a numeric ID, like labeling ingredients with barcodes for inventory.

👉 Just like a chef doesn't use whole vegetables but cuts them into usable pieces, LLMs split text into tokens for efficient processing.
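
For readers who want to see this concretely, here is a minimal sketch using Hugging Face's transformers library with the GPT-2 tokenizer (one tokenizer among many; the exact splits and IDs vary by model):

```python
# A minimal tokenization sketch using the GPT-2 tokenizer from Hugging Face.
# The exact token splits and IDs vary from model to model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "We're revolutionizing grocery shopping."

tokens = tokenizer.tokenize(text)  # chop the sentence into tokens
ids = tokenizer.encode(text)       # label each token with a numeric ID ("barcode")

print(tokens)  # e.g. ['We', "'re", 'Ġrevolution', 'izing', 'Ġgrocery', 'Ġshopping', '.']
print(ids)     # the matching numeric IDs
```

Note that a real tokenizer works at the subword level, so a long word like "revolutionizing" is chopped into smaller pieces (the Ġ marks a leading space).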

2. Logits & Softmax = Deciding the Next Ingredient Based on Taste

A chef doesn't just throw in random spices — they taste the dish and decide which ingredient will make it better. Similarly, LLMs predict the most likely next token.

  • The chef has a list of possible next ingredients (logits) ranked by suitability.
  • They smell and taste each candidate, turning those raw scores into probabilities (Softmax) before choosing.

Example: The chef considers:

  • Salt (low probability) – Might overpower the dish.
  • Garlic (medium probability) – Could add depth.
  • Basil (high probability) – Complements the dish well.

After converting those scores into probabilities (Softmax) and sampling, the chef selects Basil because it best enhances the dish.

👉 Just as a chef refines flavors with seasoning, LLMs select the next word based on learned probabilities.
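
Here's a toy sketch of that decision in code: raw suitability scores (logits) become probabilities via softmax, and the next "ingredient" is sampled. The candidates and scores below are invented purely for illustration:

```python
# A toy sketch: logits -> probabilities (softmax) -> sampling.
# The candidate "ingredients" and their scores are invented for illustration.
import numpy as np

candidates = ["Salt", "Garlic", "Basil"]
logits = np.array([0.5, 1.5, 3.0])  # raw suitability scores from the model

# Softmax: exponentiate and normalize so the scores sum to 1.
exp_scores = np.exp(logits - logits.max())
probs = exp_scores / exp_scores.sum()
print(dict(zip(candidates, probs.round(3).tolist())))
# {'Salt': 0.063, 'Garlic': 0.171, 'Basil': 0.766}

# Sample the next "ingredient" in proportion to its probability.
rng = np.random.default_rng(seed=0)
print(rng.choice(candidates, p=probs))  # usually "Basil"
```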

3. Autoregressive Generation = Cooking Step-by-Step

A chef doesn't prepare a dish by mixing everything at once. Instead, they add ingredients in a specific order, letting each step build upon the last.

  • The chef starts with a base (prompt):

    "We're revolutionizing grocery shopping."
  • They follow a structured cooking process, adding one ingredient at a time:

    1. Add "with" → Stir well.
      • Current dish: "We're revolutionizing grocery shopping with"
    2. Add "same-day" → Let it cook.
      • Current dish: "We're revolutionizing grocery shopping with same-day"
    3. Add "delivery" → Adjust seasoning.
      • Current dish: "We're revolutionizing grocery shopping with same-day delivery"
    4. Add "powered by AI" → Final plating.
      • Current dish: "We're revolutionizing grocery shopping with same-day delivery powered by AI"
  • The dish is complete when a special stop signal appears (like the [EOS] token in AI).
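
In code, that step-by-step loop looks roughly like this (a greedy-decoding sketch with GPT-2; real systems usually sample rather than always picking the top token):

```python
# A minimal autoregressive loop with GPT-2: predict one token, add it to the
# "dish", and repeat until the end-of-sequence token (or a length cap).
# Greedy decoding for simplicity; production systems usually sample.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer.encode("We're revolutionizing grocery shopping", return_tensors="pt")

with torch.no_grad():
    for _ in range(10):                               # cap on new tokens
        logits = model(ids).logits                    # scores for every candidate next token
        next_id = logits[0, -1].argmax()              # greedy: take the top-scoring token
        if next_id.item() == tokenizer.eos_token_id:  # the special stop signal
            break
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)  # add the "ingredient", repeat

print(tokenizer.decode(ids[0]))
```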

4. Transformer Architecture = A Well-Organized Kitchen

A well-organized kitchen doesn't rely on a single chef doing everything. Instead, it follows a highly efficient workflow, where different kitchen staff members specialize in different tasks. This is exactly how the Transformer model works—processing information in parallel rather than sequentially.

4.1 Self-Attention = A Waiter Managing Multiple Tables at Once

A great waiter doesn't just focus on one table at a time. Instead, they keep track of all their tables, making sure each one gets the right service at the right time. Similarly, self-attention allows an LLM to analyze all words in a sentence at once, rather than just looking at the last word.

Example: Suppose a waiter is serving multiple tables:

  • Table 1 orders an appetizer.
  • Table 2 asks for a drink refill.
  • Table 3 needs the check.

Instead of serving one table at a time, a skilled waiter manages all tables simultaneously, prioritizing based on urgency.

👉 Self-attention ensures the model doesn't just focus on the last word but considers the entire sentence at once.
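
Under the hood, this is scaled dot-product attention. Here's a bare-bones numpy sketch, with small random matrices standing in for a real model's learned projections:

```python
# A bare-bones scaled dot-product self-attention sketch.
# Random Q, K, V stand in for a real model's learned projections.
import numpy as np

rng = np.random.default_rng(seed=0)
seq_len, d = 4, 8                      # 4 tokens ("tables"), 8-dim embeddings
Q = rng.standard_normal((seq_len, d))  # queries: what each token is looking for
K = rng.standard_normal((seq_len, d))  # keys: what each token offers
V = rng.standard_normal((seq_len, d))  # values: the content to blend

scores = Q @ K.T / np.sqrt(d)          # every token attends to every token at once
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax

output = weights @ V                   # each token's output is a weighted blend
print(weights.round(2))                # each row: how one "table" splits its attention
```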


4.2 Positional Encoding = The Right Order of Courses

In a restaurant, a meal has a specific order—you wouldn't serve dessert before the main course. Similarly, LLMs use positional encoding to keep track of word order, ensuring that sentences are structured correctly.

Correct order:

  1. Serve the appetizer.
  2. Bring the main course.
  3. Deliver dessert.

Wrong order:

  1. Deliver dessert first.
  2. Bring the main course.
  3. Serve the appetizer.

Even though the same dishes are served, the experience is completely wrong if the order is mixed up. Similarly, an AI model ensures the correct sentence structure by encoding word positions.
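
One standard recipe for this is the sinusoidal encoding from the original Transformer paper, where each position gets a unique pattern of sine and cosine values added to its token embedding. A small sketch:

```python
# A sketch of sinusoidal positional encoding ("Attention Is All You Need"):
# each position gets a unique sine/cosine fingerprint, added to its token
# embedding so the model can tell the first course from dessert.
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, None]           # 0, 1, 2, ...
    dims = np.arange(0, d_model, 2)[None, :]          # even embedding dimensions
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # even dims: sine
    pe[:, 1::2] = np.cos(angles)                      # odd dims: cosine
    return pe

print(positional_encoding(seq_len=3, d_model=8).round(2))  # one row per position
```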


4.3 Feedforward Layers = Final Presentation

Before a dish is served to the customer, it goes through a final quality check—the chef ensures the presentation is perfect, adds final garnishes, and makes sure the seasoning is balanced. Similarly, feedforward layers refine token embeddings, making sure the model's predictions are polished and well-formed.

Example: A chef checks a dish before serving:

  • Too bland? → Add a final sprinkle of seasoning.
  • Messy plating? → Rearrange for better presentation.
  • Overcooked steak? → Adjust for future orders.

This last step ensures the final dish meets high standards.
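
As a rough sketch, a standard Transformer feedforward block expands each token's representation, applies a nonlinearity, and compresses it back, refining every token independently (the dimensions below are toy-sized):

```python
# A sketch of a Transformer feedforward block: expand, adjust, compress back.
# Dimensions are toy-sized; real models use much larger d_model and d_hidden.
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, d_model: int = 8, d_hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),  # expand: room to "taste and adjust"
            nn.GELU(),                     # nonlinearity: the actual refinement
            nn.Linear(d_hidden, d_model),  # compress back to the embedding size
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

tokens = torch.randn(4, 8)          # 4 token embeddings, dimension 8
print(FeedForward()(tokens).shape)  # torch.Size([4, 8]): same shape, refined
```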