Evals Context

Navigate the Roo Code evals system structure with confidence



Provides context about the Roo Code evals system structure in this monorepo. Use when tasks mention "evals", "evaluation", "eval runs", "eval exercises", or working with the evals infrastructure. Helps distinguish between the evals execution system (packages/evals, apps/web-evals) and the public website evals display page (apps/web-roo-code/src/app/evals).

Tags: evals, debugging, monorepo, infrastructure, evaluation, system-architecture, codebase-navigation, developer-tools

Example Interaction

User prompt: The eval system is failing to run exercises properly. I need to debug the execution infrastructure.

Agent response: Clear guidance on whether to look in packages/evals (execution system), apps/web-evals (management UI), or apps/web-roo-code (public display).

Quick Start (3 Steps)

Get up and running in minutes

1. Install

claude-code skill install evals-context

2. Config

3. First Trigger

@evals-context help

Commands

Command                                      Description                                                              Required Args
@evals-context debug-eval-execution-issues   Identify the correct location when eval runs fail or need modifications  None
@evals-context add-new-eval-exercise         Understand the complete flow for adding new coding evaluation exercises  None
@evals-context modify-public-evals-display   Update how eval results are displayed on the public website              None

Typical Use Cases

Debug Eval Execution Issues

Identify the correct location when eval runs fail or need modifications

Add New Eval Exercise

Understand the complete flow for adding new coding evaluation exercises

Modify Public Evals Display

Update how eval results are displayed on the public website

Overview

Evals Codebase Context

When to Use This Skill

Use this skill when the task involves:

  • Modifying or debugging the evals execution infrastructure
  • Adding new eval exercises or languages
  • Working with the evals web interface (apps/web-evals)
  • Modifying the public evals display page on roocode.com
  • Understanding where evals code lives in this monorepo

When NOT to Use This Skill

Do NOT use this skill when:

  • Working on unrelated parts of the codebase (extension, webview-ui, etc.)
  • The task is purely about the VS Code extension’s core functionality
  • Working on the main website pages that don’t involve evals

Key Disambiguation: Two “Evals” Locations

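This monorepo has two distinct evals-related areas that are easy to confuse: the evals execution system (packages/evals/ together with its management UI in apps/web-evals/) and the public website evals page (apps/web-roo-code/src/app/evals/). The coding exercises themselves live in a separate repository: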

Component                 Path                                Purpose
Evals Execution System    packages/evals/                     Core eval infrastructure: CLI, DB schema, Docker configs
Evals Management UI       apps/web-evals/                     Next.js app for creating/monitoring eval runs (localhost:3446)
Website Evals Page        apps/web-roo-code/src/app/evals/    Public roocode.com page displaying eval results
External Exercises Repo   Roo-Code-Evals                      Actual coding exercises (NOT in this monorepo)
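
The table above can be expressed as a small lookup, which may help when deciding where a changed file belongs. This is a minimal sketch: the function name classifyEvalsPath and the area labels are hypothetical and are not part of the Roo Code codebase; only the three path prefixes come from the table.

```typescript
// Hypothetical helper (not from the repo): maps a repo-relative path to the
// evals area it belongs to, using the path prefixes from the table above.
type EvalsArea =
  | "execution-system" // packages/evals/
  | "management-ui" // apps/web-evals/
  | "public-page" // apps/web-roo-code/src/app/evals/
  | "not-evals";

const AREA_PREFIXES: ReadonlyArray<[string, EvalsArea]> = [
  ["packages/evals/", "execution-system"],
  ["apps/web-evals/", "management-ui"],
  ["apps/web-roo-code/src/app/evals/", "public-page"],
];

function classifyEvalsPath(path: string): EvalsArea {
  // Normalize Windows separators and a leading "./" so prefixes compare cleanly.
  const normalized = path.replace(/\\/g, "/").replace(/^\.\//, "");
  for (const [prefix, area] of AREA_PREFIXES) {
    if (normalized.startsWith(prefix)) return area;
  }
  return "not-evals";
}

console.log(classifyEvalsPath("packages/evals/src/cli/runTask.ts")); // execution-system
console.log(classifyEvalsPath("apps/web-roo-code/src/app/evals/plot.tsx")); // public-page
console.log(classifyEvalsPath("webview-ui/src/App.tsx")); // not-evals
```

The external Roo-Code-Evals repository is deliberately absent from the lookup, since its files never appear under a path in this monorepo.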

Directory Structure Reference

packages/evals/ - Core Evals Package

packages/evals/
├── ARCHITECTURE.md          # Detailed architecture documentation
├── ADDING-EVALS.md          # Guide for adding new exercises/languages
├── README.md                # Setup and running instructions
├── docker-compose.yml       # Container orchestration
├── Dockerfile.runner        # Runner container definition
├── Dockerfile.web           # Web app container
├── drizzle.config.ts        # Database ORM config
├── src/
│   ├── index.ts             # Package exports
│   ├── cli/                 # CLI commands for running evals
│   │   ├── runEvals.ts      # Orchestrates complete eval runs
│   │   ├── runTask.ts       # Executes individual tasks in containers
│   │   ├── runUnitTest.ts   # Validates task completion via tests
│   │   └── redis.ts         # Redis pub/sub integration
│   ├── db/
│   │   ├── schema.ts        # Database schema (runs, tasks)
│   │   ├── queries/         # Database query functions
│   │   └── migrations/      # SQL migrations
│   └── exercises/
│       └── index.ts         # Exercise loading utilities
└── scripts/
    └── setup.sh             # Local macOS setup script

apps/web-evals/ - Evals Management Web App

apps/web-evals/
├── src/
│   ├── app/
│   │   ├── page.tsx         # Home page (runs list)
│   │   ├── runs/
│   │   │   ├── new/         # Create new eval run
│   │   │   └── [id]/        # View specific run status
│   │   └── api/runs/        # SSE streaming endpoint
│   ├── actions/             # Server actions
│   │   ├── runs.ts          # Run CRUD operations
│   │   ├── tasks.ts         # Task queries
│   │   ├── exercises.ts     # Exercise listing
│   │   └── heartbeat.ts     # Controller health checks
│   ├── hooks/               # React hooks (SSE, models, etc.)
│   └── lib/                 # Utilities and schemas

apps/web-roo-code/src/app/evals/ - Public Website Evals Page

apps/web-roo-code/src/app/evals/
├── page.tsx      # Fetches and displays public eval results
├── evals.tsx     # Main evals display component
├── plot.tsx      # Visualization component
└── types.ts      # EvalRun type (extends packages/evals types)

This page displays eval results on the public roocode.com website. It imports types from @roo-code/evals but does NOT run evals.

Architecture Overview

The evals system is a distributed evaluation platform that runs AI coding tasks in isolated VS Code environments:

┌─────────────────────────────────────────────────────────────┐
│  Web App (apps/web-evals)  ──────────────────────────────── │
│        │                                                    │
│        ▼                                                    │
│  PostgreSQL ◄────► Controller Container                     │
│        │               │                                    │
│        ▼               ▼                                    │
│     Redis ◄───► Runner Containers (1-25 parallel)           │
└─────────────────────────────────────────────────────────────┘

Key components:

  • Controller: Orchestrates eval runs, spawns runners, manages task queue (p-queue)
  • Runner: Isolated Docker container with VS Code + Roo Code extension + language runtimes
  • Redis: Pub/sub for real-time events (NOT task queuing)
  • PostgreSQL: Stores runs, tasks, metrics
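
To illustrate what a bounded task queue does for the controller, here is a minimal concurrency limiter in plain TypeScript. It is a stand-in written for this document, not the controller's actual code and not the p-queue API; the names clampConcurrency and runWithLimit are hypothetical. The only values taken from the diagram are the 1 and 25 bounds on parallel runners.

```typescript
// Hypothetical stand-in for the controller's queue (the real system uses p-queue).
const MIN_RUNNERS = 1;
const MAX_RUNNERS = 25;

// Keep the requested parallelism inside the 1-25 range shown in the diagram.
function clampConcurrency(requested: number): number {
  if (!isFinite(requested)) return MIN_RUNNERS;
  return Math.min(MAX_RUNNERS, Math.max(MIN_RUNNERS, Math.floor(requested)));
}

// Run all tasks with at most `requestedConcurrency` in flight at once.
// Results are returned in the same order as the input tasks.
async function runWithLimit<T>(
  tasks: ReadonlyArray<() => Promise<T>>,
  requestedConcurrency: number,
): Promise<T[]> {
  const concurrency = clampConcurrency(requestedConcurrency);
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(concurrency, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: six fake tasks, two at a time; `peak` records the highest overlap seen.
let active = 0;
let peak = 0;
const demoTasks = Array.from({ length: 6 }, (_, i) => async () => {
  active++;
  peak = Math.max(peak, active);
  await Promise.resolve(); // yield, as a real task would while it waits on I/O
  active--;
  return i * 2;
});

runWithLimit(demoTasks, 2).then((results) => {
  console.log(results, "peak concurrency:", peak);
});
```

In the real system each task would presumably start a runner container rather than resolve immediately, while Redis carries progress events separately from this queueing step.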

Common Tasks Quick Reference

Adding a New Eval Exercise

  1. Add exercise to Roo-Code-Evals repo (external)
  2. See packages/evals/ADDING-EVALS.md for structure

Modifying Eval CLI Behavior

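Edit files in packages/evals/src/cli/:

  • runEvals.ts: Orchestrates complete eval runs
  • runTask.ts: Executes individual tasks in containers
  • runUnitTest.ts: Validates task completion via tests
  • redis.ts: Redis pub/sub integration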

Modifying the Evals Web Interface

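Edit files in apps/web-evals/src/:

  • app/: Pages and routes (runs list, new run, run status, SSE endpoint)
  • actions/: Server actions (runs, tasks, exercises, heartbeat)
  • hooks/: React hooks (SSE, models, etc.)
  • lib/: Utilities and schemas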

Modifying the Public Evals Display Page

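Edit files in apps/web-roo-code/src/app/evals/:

  • page.tsx: Fetches and displays public eval results
  • evals.tsx: Main evals display component
  • plot.tsx: Visualization component
  • types.ts: EvalRun type (extends packages/evals types)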

Database Schema Changes

  1. Edit packages/evals/src/db/schema.ts
  2. Generate migration: cd packages/evals && pnpm drizzle-kit generate
  3. Apply migration: pnpm drizzle-kit migrate

Running Evals Locally

# From repo root
pnpm evals

# Opens web UI at http://localhost:3446

Ports (defaults):

  • PostgreSQL: 5433
  • Redis: 6380
  • Web: 3446
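
When scripting against a local setup, the default ports above can be kept in one place. The sketch below is illustrative only: the environment variable names (EVALS_POSTGRES_PORT, EVALS_REDIS_PORT, EVALS_WEB_PORT) and the helper names are assumptions made for this example and may not match what the repo actually reads; only the three default port numbers and the web URL come from this document.

```typescript
// Default ports as listed above. Everything else here is a hypothetical sketch.
interface EvalsPorts {
  postgres: number;
  redis: number;
  web: number;
}

const DEFAULT_PORTS: EvalsPorts = { postgres: 5433, redis: 6380, web: 3446 };

// Parse a port override, falling back to the default when it is missing or invalid.
function parsePort(value: string | undefined, fallback: number): number {
  const n = Number(value);
  return value !== undefined && Number.isInteger(n) && n > 0 && n < 65536 ? n : fallback;
}

// NOTE: the env variable names below are assumptions, not documented settings.
function resolvePorts(env: Record<string, string | undefined>): EvalsPorts {
  return {
    postgres: parsePort(env["EVALS_POSTGRES_PORT"], DEFAULT_PORTS.postgres),
    redis: parsePort(env["EVALS_REDIS_PORT"], DEFAULT_PORTS.redis),
    web: parsePort(env["EVALS_WEB_PORT"], DEFAULT_PORTS.web),
  };
}

// Build host:port strings for local services and the web UI URL.
function localEndpoints(ports: EvalsPorts) {
  return {
    postgres: `localhost:${ports.postgres}`,
    redis: `localhost:${ports.redis}`,
    web: `http://localhost:${ports.web}`,
  };
}

console.log(localEndpoints(resolvePorts({})));
// { postgres: 'localhost:5433', redis: 'localhost:6380', web: 'http://localhost:3446' }
```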

Testing

# packages/evals tests
cd packages/evals && npx vitest run

# apps/web-evals tests
cd apps/web-evals && npx vitest run

Key Types/Exports from @roo-code/evals

The package exports are defined in packages/evals/src/index.ts:

  • Database queries: getRuns, getTasks, getTaskMetrics, etc.
  • Schema types: Run, Task, TaskMetrics
  • Used by both apps/web-evals and apps/web-roo-code

Environment Matrix

Dependencies

  • PostgreSQL (default port 5433)
  • Redis (default port 6380)
  • Docker and Docker Compose
  • Node.js with the pnpm package manager

Framework Support

  • Next.js ✓ (apps/web-evals)
  • Drizzle ORM ✓ (database)
  • React ✓ (web interfaces)
  • p-queue ✓ (task orchestration)

Context Window

Token usage: ~3K-8K tokens for comprehensive codebase navigation

Information

Author: RooCodeInc
Updated: 2026-01-30
Category: debugging