Guidance

Control LLM output with regex and grammars for guaranteed valid generation


Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance, Microsoft Research's constrained-generation framework.

constrained-generation prompt-engineering structured-output json-validation grammar microsoft-research format-enforcement multi-step-workflows
Repository

See It In Action


User Prompt

Generate a user profile with name, age, and email fields in valid JSON format

Agent Response

Structured JSON output with regex-validated email format, numeric age, and proper string formatting - guaranteed to be parseable

Quick Start (3 Steps)

Get up and running in minutes

1. Install

claude-code skill install guidance

2. Config

3. First Trigger

@guidance help

Commands

Command                                          | Description                                                                     | Required Args
@guidance guaranteed-json-generation             | Generate valid JSON objects with enforced field formats using regex constraints | None
@guidance multi-step-reasoning-agent             | Build ReAct-style agents with tool selection and structured thought processes   | None
@guidance data-extraction-with-format-validation | Extract structured entities from text with guaranteed format compliance        | None

Typical Use Cases

Guaranteed JSON Generation

Generate valid JSON objects with enforced field formats using regex constraints

Multi-Step Reasoning Agent

Build ReAct-style agents with tool selection and structured thought processes

Data Extraction with Format Validation

Extract structured entities from text with guaranteed format compliance

Overview

Guidance: Constrained LLM Generation

When to Use This Skill

Use Guidance when you need to:

  • Control LLM output syntax with regex or grammars
  • Guarantee valid JSON/XML/code generation
  • Reduce latency vs traditional prompting approaches
  • Enforce structured formats (dates, emails, IDs, etc.)
  • Build multi-step workflows with Pythonic control flow
  • Prevent invalid outputs through grammatical constraints

GitHub Stars: 18,000+ | From: Microsoft Research

Installation

# Base installation
pip install guidance

# With specific backends
pip install guidance[transformers]  # Hugging Face models
pip install guidance[llama_cpp]     # llama.cpp models

Quick Start

Basic Example: Structured Generation

from guidance import models, gen

# Load model (supports OpenAI, Transformers, llama.cpp)
lm = models.OpenAI("gpt-4")

# Generate with constraints
result = lm + "The capital of France is " + gen("capital", max_tokens=5)

print(result["capital"])  # "Paris"

With Anthropic Claude

from guidance import models, gen, system, user, assistant

# Configure Claude
lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Use context managers for chat format
with system():
    lm += "You are a helpful assistant."

with user():
    lm += "What is the capital of France?"

with assistant():
    lm += gen(max_tokens=20)

Core Concepts

1. Context Managers

Guidance uses Pythonic context managers for chat-style interactions.

from guidance import models, system, user, assistant, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# System message
with system():
    lm += "You are a JSON generation expert."

# User message
with user():
    lm += "Generate a person object with name and age."

# Assistant response
with assistant():
    lm += gen("response", max_tokens=100)

print(lm["response"])

Benefits:

  • Natural chat flow
  • Clear role separation
  • Easy to read and maintain

2. Constrained Generation

Guidance ensures outputs match specified patterns using regex or grammars.

Regex Constraints

from guidance import models, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Constrain to valid email format
lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Constrain to date format (YYYY-MM-DD)
lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}")

# Constrain to phone number
lm += "Phone: " + gen("phone", regex=r"\d{3}-\d{3}-\d{4}")

print(lm["email"])  # Guaranteed valid email
print(lm["date"])   # Guaranteed YYYY-MM-DD format

How it works:

  • Regex converted to grammar at token level
  • Invalid tokens filtered during generation
  • Model can only produce matching outputs
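The filtering idea can be sketched in plain Python. This is an illustration only, not Guidance's implementation: the `allowed` helper and the hard-coded YYYY-MM-DD template are assumptions for the sketch, and real constrained decoding operates on tokenizer vocabularies rather than characters.

```python
# Illustrative sketch of token filtering: a candidate token survives only if
# the text generated so far plus that token can still extend to a full match
# of the pattern. Here the pattern YYYY-MM-DD is modeled as a per-position
# character template ("d" = digit, "-" = literal dash).
DATE_TEMPLATE = ["d"] * 4 + ["-"] + ["d"] * 2 + ["-"] + ["d"] * 2

def allowed(prefix: str, token: str) -> bool:
    """Return True if prefix + token is still a valid start of YYYY-MM-DD."""
    s = prefix + token
    if len(s) > len(DATE_TEMPLATE):
        return False  # already too long to ever match
    for ch, kind in zip(s, DATE_TEMPLATE):
        if kind == "d" and not ch.isdigit():
            return False
        if kind == "-" and ch != "-":
            return False
    return True

# At the first step, only digit-leading candidates survive the filter
candidates = ["20", "ab", "19", "-"]
print([t for t in candidates if allowed("", t)])  # ['20', '19']
```

Each decoding step repeats this check over the whole candidate set, so the model literally cannot emit a token that breaks the format.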

Selection Constraints

from guidance import models, select

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Constrain to specific choices
lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")

# Multiple-choice selection
lm += "Best answer: " + select(
    ["A) Paris", "B) London", "C) Berlin", "D) Madrid"],
    name="answer"
)

print(lm["sentiment"])  # One of: positive, negative, neutral
print(lm["answer"])     # One of the four option strings

3. Token Healing

Guidance automatically “heals” token boundaries between prompt and generation.

Problem: Tokenization creates unnatural boundaries.

# Without token healing
prompt = "The capital of France is "
# Last token: " is "
# First generated token might be " Par" (with a leading space)
# Result: "The capital of France is  Paris" (double space!)

Solution: Guidance backs up one token and regenerates.

from guidance import models, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Token healing enabled by default
lm += "The capital of France is " + gen("capital", max_tokens=5)
# Result: "The capital of France is Paris" (correct spacing)

Benefits:

  • Natural text boundaries
  • No awkward spacing issues
  • Better model performance (sees natural token sequences)
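The problem and the fix can be reproduced with plain strings. This is a toy illustration: real token healing operates on token IDs, backing up over the prompt's final token and regenerating across the boundary.

```python
# Toy illustration of the token-healing idea with plain strings. Real healing
# works on tokenizer vocabularies, not characters: the engine backs up over
# the prompt's trailing token and lets the model re-emit a natural token that
# spans the boundary.
prompt = "The capital of France is "  # ends with a space
continuation = " Paris"               # a typical token carries its own leading space

naive = prompt + continuation
print(naive)    # "The capital of France is  Paris" (double space)

healed = prompt.rstrip(" ") + continuation  # back up over the trailing space first
print(healed)   # "The capital of France is Paris"
```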

4. Grammar-Based Generation

Define complex structures using context-free grammars.

from guidance import models, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Compose a grammar in Python: the fixed text pins the JSON structure, and
# each gen() call constrains one field. The whole expression forms a grammar
# the model must follow.
lm += '{\n'
lm += '    "name": "' + gen("name", regex=r"[A-Za-z ]+", max_tokens=20) + '",\n'
lm += '    "age": ' + gen("age", regex=r"[0-9]+", max_tokens=3) + ',\n'
lm += '    "email": "' + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", max_tokens=50) + '"\n'
lm += '}'

print(str(lm))  # Guaranteed valid JSON structure

Use cases:

  • Complex structured outputs
  • Nested data structures
  • Programming language syntax
  • Domain-specific languages
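To see why this guarantees parseability: the braces, keys, and quotes are fixed text the model never generates, and only the constrained field values vary. A quick check with a hypothetical output string (the values below are made up for illustration):

```python
import json
import re

# Hypothetical output in the shape the constrained template enforces; with
# Guidance only the constrained field values vary, so parsing cannot fail.
output = '{"name": "Ada Lovelace", "age": 36, "email": "ada@example.com"}'

data = json.loads(output)  # parses, because the structure was fixed text
assert re.fullmatch(r"[A-Za-z ]+", data["name"])
assert re.fullmatch(r"[0-9]+", str(data["age"]))
assert re.fullmatch(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", data["email"])
print("all fields valid")
```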

5. Guidance Functions

Create reusable generation patterns with the @guidance decorator.

from guidance import models, gen, guidance

@guidance
def generate_person(lm):
    """Generate a person with name and age."""
    lm += "Name: " + gen("name", max_tokens=20, stop="\n")
    lm += "\nAge: " + gen("age", regex=r"[0-9]+", max_tokens=3)
    return lm

# Use the function: decorated functions are added to the model with +=
lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm += generate_person()

print(lm["name"])
print(lm["age"])

Stateful Functions:

from guidance import gen, select, guidance

@guidance(stateless=False)
def react_agent(lm, question, tools, max_rounds=5):
    """ReAct agent with tool use."""
    lm += f"Question: {question}\n\n"

    for i in range(max_rounds):
        # Thought
        lm += f"Thought {i+1}: " + gen("thought", stop="\n")

        # Action
        lm += "\nAction: " + select(list(tools.keys()), name="action")

        # Execute the selected tool
        tool_result = tools[lm["action"]]()
        lm += f"\nObservation: {tool_result}\n\n"

        # Check if done
        lm += "Done? " + select(["Yes", "No"], name="done")
        if lm["done"] == "Yes":
            break

    # Final answer
    lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
    return lm

Backend Configuration

Anthropic Claude

from guidance import models

lm = models.Anthropic(
    model="claude-sonnet-4-5-20250929",
    api_key="your-api-key"  # Or set ANTHROPIC_API_KEY env var
)

OpenAI

lm = models.OpenAI(
    model="gpt-4o-mini",
    api_key="your-api-key"  # Or set OPENAI_API_KEY env var
)

Local Models (Transformers)

from guidance.models import Transformers

lm = Transformers(
    "microsoft/Phi-4-mini-instruct",
    device="cuda"  # Or "cpu"
)

Local Models (llama.cpp)

from guidance.models import LlamaCpp

lm = LlamaCpp(
    model_path="/path/to/model.gguf",
    n_ctx=4096,
    n_gpu_layers=35
)

Common Patterns

Pattern 1: JSON Generation

from guidance import models, gen, system, user, assistant

lm = models.Anthropic("claude-sonnet-4-5-20250929")

with system():
    lm += "You generate valid JSON."

with user():
    lm += "Generate a user profile with name, age, and email."

with assistant():
    lm += """{
    "name": """ + gen("name", regex=r'"[A-Za-z ]+"', max_tokens=30) + """,
    "age": """ + gen("age", regex=r"[0-9]+", max_tokens=3) + """,
    "email": """ + gen("email", regex=r'"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"', max_tokens=50) + """
}"""

print(lm)  # Valid JSON guaranteed

Pattern 2: Classification

from guidance import models, gen, select

lm = models.Anthropic("claude-sonnet-4-5-20250929")

text = "This product is amazing! I love it."

lm += f"Text: {text}\n"
lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")
lm += "\nConfidence: " + gen("confidence", regex=r"[0-9]+", max_tokens=3) + "%"

print(f"Sentiment: {lm['sentiment']}")
print(f"Confidence: {lm['confidence']}%")

Pattern 3: Multi-Step Reasoning

from guidance import models, gen, guidance

@guidance
def chain_of_thought(lm, question):
    """Generate an answer with step-by-step reasoning."""
    lm += f"Question: {question}\n\n"

    # Generate multiple reasoning steps
    for i in range(3):
        lm += f"Step {i+1}: " + gen(f"step_{i+1}", stop="\n", max_tokens=100) + "\n"

    # Final answer
    lm += "\nTherefore, the answer is: " + gen("answer", max_tokens=50)

    return lm

lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm += chain_of_thought("What is 15% of 200?")

print(lm["answer"])

Pattern 4: ReAct Agent

from guidance import models, gen, select, guidance

@guidance(stateless=False)
def react_agent(lm, question):
    """ReAct agent with tool use."""
    tools = {
        "calculator": lambda expr: eval(expr),  # demo only: eval is unsafe on untrusted input
        "search": lambda query: f"Search results for: {query}",
    }

    lm += f"Question: {question}\n\n"

    for _ in range(5):
        # Thought
        lm += "Thought: " + gen("thought", stop="\n") + "\n"

        # Action selection (constrained to valid tool names or "answer")
        lm += "Action: " + select(["calculator", "search", "answer"], name="action")

        if lm["action"] == "answer":
            lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
            break

        # Action input
        lm += "\nAction Input: " + gen("action_input", stop="\n") + "\n"

        # Execute the selected tool
        result = tools[lm["action"]](lm["action_input"])
        lm += f"Observation: {result}\n\n"

    return lm

lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm += react_agent("What is 25 * 4 + 10?")
print(lm["answer"])

Pattern 5: Data Extraction

from guidance import models, gen, guidance

@guidance
def extract_entities(lm, text):
    """Extract structured entities from text."""
    lm += f"Text: {text}\n\n"

    # Extract person
    lm += "Person: " + gen("person", stop="\n", max_tokens=30) + "\n"

    # Extract organization
    lm += "Organization: " + gen("organization", stop="\n", max_tokens=30) + "\n"

    # Extract date
    lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}", max_tokens=10) + "\n"

    # Extract location
    lm += "Location: " + gen("location", stop="\n", max_tokens=30) + "\n"

    return lm

text = "Tim Cook announced at Apple Park on 2024-09-15 in Cupertino."

lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm += extract_entities(text)

print(f"Person: {lm['person']}")
print(f"Organization: {lm['organization']}")
print(f"Date: {lm['date']}")
print(f"Location: {lm['location']}")

Best Practices

1. Use Regex for Format Validation

# ✅ Good: Regex ensures valid format
lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# ❌ Bad: Free generation may produce invalid emails
lm += "Email: " + gen("email", max_tokens=50)

2. Use select() for Fixed Categories

# ✅ Good: Guaranteed valid category
lm += "Status: " + select(["pending", "approved", "rejected"], name="status")

# ❌ Bad: May generate typos or invalid values
lm += "Status: " + gen("status", max_tokens=20)

3. Leverage Token Healing

# Token healing is enabled by default
# No special action needed - just concatenate naturally
lm += "The capital is " + gen("capital")  # Automatic healing

4. Use stop Sequences

# ✅ Good: Stop at newline for single-line outputs
lm += "Name: " + gen("name", stop="\n")

# ❌ Bad: May generate multiple lines
lm += "Name: " + gen("name", max_tokens=50)

5. Create Reusable Functions

# ✅ Good: Reusable pattern
@guidance
def generate_person(lm):
    lm += "Name: " + gen("name", stop="\n")
    lm += "\nAge: " + gen("age", regex=r"[0-9]+")
    return lm

# Use multiple times
lm += generate_person()
lm += "\n\n"
lm += generate_person()

6. Balance Constraints

# ✅ Good: Reasonable constraints
lm += gen("name", regex=r"[A-Za-z ]+", max_tokens=30)

# ❌ Too strict: May fail or be very slow
lm += gen("name", regex=r"^(John|Jane)$", max_tokens=10)

Comparison to Alternatives

Feature             | Guidance | Instructor | Outlines   | LMQL
Regex Constraints   | ✅ Yes   | ❌ No      | ✅ Yes     | ✅ Yes
Grammar Support     | ✅ CFG   | ❌ No      | ✅ CFG     | ✅ CFG
Pydantic Validation | ❌ No    | ✅ Yes     | ✅ Yes     | ❌ No
Token Healing       | ✅ Yes   | ❌ No      | ✅ Yes     | ❌ No
Local Models        | ✅ Yes   | ⚠️ Limited | ✅ Yes     | ✅ Yes
API Models          | ✅ Yes   | ✅ Yes     | ⚠️ Limited | ✅ Yes
Pythonic Syntax     | ✅ Yes   | ✅ Yes     | ✅ Yes     | ❌ SQL-like
Learning Curve      | Low      | Low        | Medium     | High

When to choose Guidance:

  • Need regex/grammar constraints
  • Want token healing
  • Building complex workflows with control flow
  • Using local models (Transformers, llama.cpp)
  • Prefer Pythonic syntax

When to choose alternatives:

  • Instructor: Need Pydantic validation with automatic retrying
  • Outlines: Need JSON schema validation
  • LMQL: Prefer declarative query syntax

Performance Characteristics

Latency Reduction:

  • 30-50% faster than traditional prompting for constrained outputs
  • Token healing reduces unnecessary regeneration
  • Grammar constraints prevent invalid token generation

Memory Usage:

  • Minimal overhead vs unconstrained generation
  • Grammar compilation cached after first use
  • Efficient token filtering at inference time
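The compile-once behavior can be illustrated with a small memoization sketch. This shows the idea, not Guidance's internal cache:

```python
import re
from functools import lru_cache

# Sketch of the compile-once idea: each constraint pattern is compiled on
# first use and reused afterwards (Guidance caches compiled grammars in the
# same spirit, amortizing compilation across generations).
@lru_cache(maxsize=None)
def compiled(pattern: str) -> re.Pattern:
    return re.compile(pattern)

date_re = compiled(r"\d{4}-\d{2}-\d{2}")       # first use: compile (cache miss)
assert compiled(r"\d{4}-\d{2}-\d{2}") is date_re  # second use: same object (cache hit)
print(compiled.cache_info().hits)  # 1
```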

Token Efficiency:

  • Prevents wasted tokens on invalid outputs
  • No need for retry loops
  • Direct path to valid outputs
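For contrast, this is the retry loop that unconstrained prompting typically requires and that constrained generation eliminates. `fake_llm` is a made-up stand-in that alternates between malformed and valid responses, as a free-form model might:

```python
import json
from itertools import cycle

# Stand-in "model" that alternates malformed and valid JSON responses.
_responses = cycle(['Sure! Here is your JSON: {"oops', '{"status": "ok"}'])

def fake_llm(prompt: str) -> str:
    return next(_responses)

def generate_json_with_retries(prompt: str, max_retries: int = 5) -> dict:
    """Free-form generation forces parse-and-retry; every failed attempt wastes tokens."""
    for _ in range(max_retries):
        out = fake_llm(prompt)
        try:
            return json.loads(out)
        except json.JSONDecodeError:
            continue  # whole response discarded; ask again
    raise RuntimeError("no valid JSON after retries")

print(generate_json_with_retries("Return a status object"))  # succeeds on attempt 2
```

With grammar constraints the first attempt is valid by construction, so the loop (and its wasted tokens) disappears.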

Resources

See Also

  • references/constraints.md - Comprehensive regex and grammar patterns
  • references/backends.md - Backend-specific configuration
  • references/examples.md - Production-ready examples


Environment Matrix

Dependencies

guidance (Python package)
transformers (optional, for Hugging Face models)
llama-cpp-python (optional, for local GGUF models)

Framework Support

  • Anthropic Claude ✓ (recommended)
  • OpenAI GPT models ✓
  • Hugging Face Transformers ✓
  • llama.cpp GGUF models ✓

Context Window

Token Usage ~2K-8K tokens for typical structured generation tasks


Information

Author
davila7
Updated
2026-01-30
Category
productivity-tools