CocoIndex

Build real-time AI data transformation pipelines with incremental processing


Comprehensive toolkit for developing with the CocoIndex library. Use when users need to create data transformation pipelines (flows), write custom functions, or operate flows via CLI or API. Covers building ETL workflows for AI data processing, including embedding documents into vector databases, building knowledge graphs, creating search indexes, or processing data streams with incremental updates.

etl data-transformation vector-database embeddings knowledge-graph ai-pipeline incremental-processing real-time


Quick Start (3 Steps)

Get up and running in minutes

1. Install

   claude-code skill install cocoindex

2. Config

   Set COCOINDEX_DATABASE_URL and any LLM API keys (see the environment setup workflow below).

3. First trigger

   @cocoindex help

Commands

| Command | Description | Required Args |
| --- | --- | --- |
| `@cocoindex document-vector-search-index` | Create embeddings for documents and store in vector database for semantic search | None |
| `@cocoindex code-repository-embedding` | Index code files with language detection and embeddings for code search | None |
| `@cocoindex knowledge-graph-from-documents` | Extract structured entities and relationships using LLMs and build knowledge graphs | None |

Typical Use Cases

  • Document Vector Search Index - Create embeddings for documents and store in vector database for semantic search
  • Code Repository Embedding - Index code files with language detection and embeddings for code search
  • Knowledge Graph from Documents - Extract structured entities and relationships using LLMs and build knowledge graphs

Overview

CocoIndex is an ultra-performant real-time data transformation framework for AI with incremental processing. This skill enables building indexing flows that extract data from sources, apply transformations (chunking, embedding, LLM extraction), and export to targets (vector databases, graph databases, relational databases).

Core capabilities:

  1. Write indexing flows - Define ETL pipelines using Python
  2. Create custom functions - Build reusable transformation logic
  3. Operate flows - Run and manage flows using CLI or Python API

Key features:

  • Incremental processing (only processes changed data)
  • Live updates (continuously sync source changes to targets)
  • Built-in functions (text chunking, embeddings, LLM extraction)
  • Multiple data sources (local files, S3, Azure Blob, Google Drive, Postgres)
  • Multiple targets (Postgres+pgvector, Qdrant, LanceDB, Neo4j, Kuzu)
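Conceptually, incremental processing comes down to fingerprinting source content and reprocessing only what changed since the last run. The sketch below illustrates that idea in plain Python; it is not CocoIndex's internal mechanism, and names like `plan_incremental_update` are made up for this example:

```python
import hashlib

def content_fingerprint(data: bytes) -> str:
    """Stable fingerprint of a source item's content."""
    return hashlib.sha256(data).hexdigest()

def plan_incremental_update(
    previous: dict[str, str], current: dict[str, bytes]
) -> dict[str, list[str]]:
    """Compare last-run fingerprints against current source content:
    reprocess new/changed items, skip unchanged ones, and delete
    target entries whose source item disappeared."""
    plan: dict[str, list[str]] = {"process": [], "skip": [], "delete": []}
    for key, data in current.items():
        if previous.get(key) == content_fingerprint(data):
            plan["skip"].append(key)
        else:
            plan["process"].append(key)  # new or changed
    plan["delete"] = [k for k in previous if k not in current]
    return plan
```

A live-update loop would re-run this plan whenever the source reports changes, which is why only the delta is paid for on each cycle.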

For detailed documentation: https://cocoindex.io/docs/
Search documentation: https://cocoindex.io/docs/search?q=url%20encoded%20keyword

When to Use This Skill

Use when users request:

  • “Build a vector search index for my documents”
  • “Create an embedding pipeline for code/PDFs/images”
  • “Extract structured information using LLMs”
  • “Build a knowledge graph from documents”
  • “Set up live document indexing”
  • “Create custom transformation functions”
  • “Run/update my CocoIndex flow”

Flow Writing Workflow

Step 1: Understand Requirements

Ask clarifying questions to understand:

Data source:

  • Where is the data? (local files, S3, database, etc.)
  • What file types? (text, PDF, JSON, images, code, etc.)
  • How often does it change? (one-time, periodic, continuous)

Transformations:

  • What processing is needed? (chunking, embedding, extraction, etc.)
  • Which embedding model? (SentenceTransformer, OpenAI, custom)
  • Any custom logic? (filtering, parsing, enrichment)

Target:

  • Where should results go? (Postgres, Qdrant, Neo4j, etc.)
  • What schema? (fields, primary keys, indexes)
  • Vector search needed? (specify similarity metric)

Step 2: Set Up Dependencies

Guide user to add CocoIndex with appropriate extras to their project based on their needs:

Required dependency:

  • cocoindex - Core functionality, CLI, and most built-in functions

Optional extras (add as needed):

  • cocoindex[embeddings] - For SentenceTransformer embeddings (when using SentenceTransformerEmbed)
  • cocoindex[colpali] - For ColPali image/document embeddings (when using ColPaliEmbedImage or ColPaliEmbedQuery)
  • cocoindex[lancedb] - For LanceDB target (when exporting to LanceDB)
  • cocoindex[embeddings,lancedb] - Multiple extras can be combined

What’s included:

  • Base package: Core functionality, CLI, most built-in functions, Postgres/Qdrant/Neo4j/Kuzu targets
  • embeddings extra: SentenceTransformers library for local embedding models
  • colpali extra: ColPali engine for multimodal document/image embeddings
  • lancedb extra: LanceDB client library for LanceDB vector database support

Users can install using their preferred package manager (pip, uv, poetry, etc.) or add to pyproject.toml.
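For example, a project using the SentenceTransformer and LanceDB extras might declare its dependencies like this (illustrative `pyproject.toml` fragment; the project name is hypothetical, and `python-dotenv` is included because the examples below use `load_dotenv`):

```toml
[project]
name = "my-indexing-app"  # hypothetical
dependencies = [
    "cocoindex[embeddings,lancedb]",
    "python-dotenv",
]
```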

For installation details: https://cocoindex.io/docs/getting_started/installation

Step 3: Set Up Environment

Check existing environment first:

  1. Check if COCOINDEX_DATABASE_URL exists in environment variables

    • If not found, use default: postgres://cocoindex:cocoindex@localhost/cocoindex
  2. For flows requiring LLM APIs (embeddings, extraction):

    • Ask user which LLM provider they want to use:
      • OpenAI - Both generation and embeddings
      • Anthropic - Generation only
      • Gemini - Both generation and embeddings
      • Voyage - Embeddings only
      • Ollama - Local models (generation and embeddings)
    • Check if the corresponding API key exists in environment variables
    • If not found, ask user to provide the API key value
    • Never create simplified examples without LLM - always get the proper API key and use the real LLM functions

Guide user to create .env file:

```
# Database connection (required - internal storage)
COCOINDEX_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex

# LLM API keys (add the ones you need)
OPENAI_API_KEY=sk-...          # For OpenAI (generation + embeddings)
ANTHROPIC_API_KEY=sk-ant-...   # For Anthropic (generation only)
GOOGLE_API_KEY=...             # For Gemini (generation + embeddings)
VOYAGE_API_KEY=pa-...          # For Voyage (embeddings only)
# Ollama requires no API key (local)
```

For more LLM options: https://cocoindex.io/docs/ai/llm

Create basic project structure:

```python
# main.py
from dotenv import load_dotenv
import cocoindex

@cocoindex.flow_def(name="FlowName")
def my_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Flow definition here
    pass

if __name__ == "__main__":
    load_dotenv()
    cocoindex.init()
    my_flow.update()
```

Step 4: Write the Flow

Follow this structure:

```python
@cocoindex.flow_def(name="DescriptiveName")
def flow_name(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # 1. Import source data
    data_scope["source_name"] = flow_builder.add_source(
        cocoindex.sources.SourceType(...)
    )

    # 2. Create collector(s) for outputs
    collector = data_scope.add_collector()

    # 3. Transform data (iterate through rows)
    with data_scope["source_name"].row() as item:
        # Apply transformations
        item["new_field"] = item["existing_field"].transform(
            cocoindex.functions.FunctionName(...)
        )

        ...

        # Nested iteration (e.g., chunks within documents)
        with item["nested_table"].row() as nested_item:
            # More transformations
            nested_item["embedding"] = nested_item["text"].transform(...)

            # Collect data for export
            collector.collect(
                field1=nested_item["field1"],
                field2=item["field2"],
                generated_id=cocoindex.GeneratedField.UUID
            )

    # 4. Export to target
    collector.export(
        "target_name",
        cocoindex.targets.TargetType(...),
        primary_key_fields=["field1"],
        vector_indexes=[...]  # If needed
    )
```

Key principles:

  • Each source creates a field in the top-level data scope
  • Use .row() to iterate through table data
  • CRITICAL: Always assign transformed data to row fields - Use item["new_field"] = item["existing_field"].transform(...), NOT local variables like new_field = item["existing_field"].transform(...)
  • Transformations create new fields without mutating existing data
  • Collectors gather data from any scope level
  • Export must happen at top level (not within row iterations)

Common mistakes to avoid:

Wrong: Using local variables for transformations

```python
with data_scope["files"].row() as file:
    summary = file["content"].transform(...)  # ❌ Local variable
    summaries_collector.collect(filename=file["filename"], summary=summary)
```

Correct: Assigning to row fields

```python
with data_scope["files"].row() as file:
    file["summary"] = file["content"].transform(...)  # ✅ Field assignment
    summaries_collector.collect(filename=file["filename"], summary=file["summary"])
```

Wrong: Creating unnecessary dataclasses to mirror flow fields

```python
from dataclasses import dataclass

@dataclass
class FileSummary:  # ❌ Unnecessary - CocoIndex manages fields automatically
    filename: str
    summary: str
    embedding: list[float]

# This dataclass is never used in the flow!
```

Step 5: Design the Flow Solution

IMPORTANT: The patterns listed below are common starting points, but you cannot exhaustively enumerate all possible scenarios. When user requirements don’t match existing patterns:

  1. Combine elements from multiple patterns - Mix and match sources, transformations, and targets creatively
  2. Review additional examples - See https://github.com/cocoindex-io/cocoindex?tab=readme-ov-file#-examples-and-demo for diverse real-world use cases (face recognition, multimodal search, product recommendations, patient form extraction, etc.)
  3. Think from first principles - Use the core APIs (sources, transforms, collectors, exports) and apply common sense to solve novel problems
  4. Be creative - CocoIndex is flexible; unique combinations of components can solve unique problems

Common starting patterns (use references for detailed examples):

For text embedding: Load references/flow_patterns.md and refer to “Pattern 1: Simple Text Embedding”

For code embedding: Load references/flow_patterns.md and refer to “Pattern 2: Code Embedding with Language Detection”

For LLM extraction + knowledge graph: Load references/flow_patterns.md and refer to “Pattern 3: LLM-based Extraction to Knowledge Graph”

For live updates: Load references/flow_patterns.md and refer to “Pattern 4: Live Updates with Refresh Interval”

For custom functions: Load references/flow_patterns.md and refer to “Pattern 5: Custom Transform Function”

For reusable query logic: Load references/flow_patterns.md and refer to “Pattern 6: Transform Flow for Reusable Logic”

For concurrency control: Load references/flow_patterns.md and refer to “Pattern 7: Concurrency Control”

Example of pattern composition:

If a user asks to “index images from S3, generate captions with a vision API, and store in Qdrant”, combine:

  • AmazonS3 source (from S3 examples)
  • Custom function for vision API calls (from custom functions pattern)
  • EmbedText to embed the captions (from embedding patterns)
  • Qdrant target (from target examples)

No single pattern covers this exact scenario, but the building blocks are composable.

Step 6: Test and Run

Guide user through testing:

```sh
# 1. Run with setup
cocoindex update --setup -f main   # -f: force setup without confirmation prompts

# 2. Start a server and redirect users to CocoInsight
cocoindex server -ci main
# Then open CocoInsight at https://cocoindex.io/cocoinsight
```

Data Types

CocoIndex has a type system independent of programming languages. All data types are determined at flow definition time, making schemas clear and predictable.

IMPORTANT: When to define types:

  • Custom functions: Type annotations are required for return values (these are the source of truth for type inference)
  • Flow fields: Type annotations are NOT needed - CocoIndex automatically infers types from sources, functions, and transformations
  • Dataclasses/Pydantic models: Only create them when they’re actually used (as function parameters/returns or ExtractByLlm output_type), NOT to mirror flow field schemas

Type annotation requirements:

  • Return values of custom functions: Must use specific type annotations - these are the source of truth for type inference
  • Arguments of custom functions: Relaxed - can use Any, dict[str, Any], or omit annotations; engine already knows the types
  • Flow definitions: No explicit type annotations needed - CocoIndex automatically infers types from sources and functions

Why specific return types matter: Custom function return types let CocoIndex infer field types throughout the flow without processing real data. This enables creating proper target schemas (e.g., vector indexes with fixed dimensions).

Common type categories:

  1. Primitive types: str, int, float, bool, bytes, datetime.date, datetime.datetime, uuid.UUID

  2. Vector types (embeddings): Specify dimension in return type if you plan to export as vectors to targets, as most targets require a fixed vector dimension

    • cocoindex.Vector[cocoindex.Float32, typing.Literal[768]] - 768-dim float32 vector (recommended)
    • list[float] without dimension also works
  3. Struct types: Dataclass, NamedTuple, or Pydantic model

    • Return type: Must use specific class (e.g., Person)
    • Argument: Can use dict[str, Any] or Any
  4. Table types:

    • KTable (keyed): dict[K, V] where K = key type (primitive or frozen struct), V = Struct type
    • LTable (ordered): list[R] where R = Struct type
    • Arguments: Can use dict[Any, Any] or list[Any]
  5. Json type: cocoindex.Json for unstructured/dynamic data

  6. Optional types: T | None for nullable values

Examples:

```python
from dataclasses import dataclass
from typing import Any, Literal
import cocoindex

@dataclass
class Person:
    name: str
    age: int

# ✅ Vector with dimension (recommended for vector search)
@cocoindex.op.function(behavior_version=1)
def embed_text(text: str) -> cocoindex.Vector[cocoindex.Float32, Literal[768]]:
    """Generate 768-dim embedding - dimension needed for vector index."""
    # ... embedding logic ...
    return embedding  # numpy array or list of 768 floats

# ✅ Struct return type, relaxed argument
@cocoindex.op.function(behavior_version=1)
def process_person(person: dict[str, Any]) -> Person:
    """Argument can be dict[str, Any], return must be specific Struct."""
    return Person(name=person["name"], age=person["age"])

# ✅ LTable return type
@cocoindex.op.function(behavior_version=1)
def filter_people(people: list[Any]) -> list[Person]:
    """Return type specifies list of specific Struct."""
    return [p for p in people if p.age >= 18]

# ❌ Wrong: dict[str, str] is not a valid specific CocoIndex type
# @cocoindex.op.function(...)
# def bad_example(person: Person) -> dict[str, str]:
#     return {"name": person.name}
```

For comprehensive data types documentation: https://cocoindex.io/docs/core/data_types

Custom Functions

When users need custom transformation logic, create custom functions.

Decision: Standalone vs Spec+Executor

Use standalone function when:

  • Simple transformation
  • No configuration needed
  • No setup/initialization required

Use spec+executor when:

  • Needs configuration (model names, API endpoints, parameters)
  • Requires setup (loading models, establishing connections)
  • Complex multi-step processing

Creating Standalone Functions

```python
@cocoindex.op.function(behavior_version=1)
def my_function(input_arg: str, optional_arg: int | None = None) -> dict:
    """
    Function description.

    Args:
        input_arg: Description
        optional_arg: Optional description
    """
    # Transformation logic
    return {"result": f"processed-{input_arg}"}
```

Requirements:

  • Decorator: @cocoindex.op.function()
  • Type annotations on all arguments and return value
  • Optional parameters: cache=True for expensive ops, behavior_version (required with cache)

Creating Spec+Executor Functions

```python
# 1. Define configuration spec
class MyFunction(cocoindex.op.FunctionSpec):
    """Configuration for MyFunction."""
    model_name: str
    threshold: float = 0.5

# 2. Define executor
@cocoindex.op.executor_class(cache=True, behavior_version=1)
class MyFunctionExecutor:
    spec: MyFunction  # Required: link to spec
    model = None      # Instance variables for state

    def prepare(self) -> None:
        """Optional: run once before execution."""
        # Load model, setup connections, etc.
        self.model = load_model(self.spec.model_name)

    def __call__(self, text: str) -> dict:
        """Required: execute for each data row."""
        # Use self.spec for configuration
        # Use self.model for loaded resources
        result = self.model.process(text)
        return {"result": result}
```

When to enable cache:

  • LLM API calls
  • Model inference
  • External API calls
  • Computationally expensive operations

Important: Increment behavior_version when function logic changes to invalidate cache.
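The idea can be illustrated with a toy cache key: because the version participates in the key, bumping it makes every previously cached result unreachable. This is a conceptual sketch, not CocoIndex's actual cache implementation:

```python
import hashlib
import json

def cache_key(func_name: str, behavior_version: int, args: dict) -> str:
    """Key cache entries by function identity, behavior version, and input.
    Bumping behavior_version changes every key, so stale results computed
    by the old logic are never served again."""
    payload = json.dumps(
        {"func": func_name, "version": behavior_version, "args": args},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```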

For detailed examples and patterns, load references/custom_functions.md.

For more on custom functions: https://cocoindex.io/docs/custom_ops/custom_functions

Operating Flows

CLI Operations

Setup flow (create resources):

```sh
cocoindex setup main
```

One-time update:

```sh
cocoindex update main

# With auto-setup
cocoindex update --setup main

# Force reset everything before setup and update
cocoindex update --reset main
```

Live update (continuous monitoring):

```sh
cocoindex update main.py -L

# Requires refresh_interval on source or source-specific change capture
```

Drop flow (remove all resources):

```sh
cocoindex drop main.py
```

Inspect flow:

```sh
cocoindex show main.py:FlowName
```

Test without side effects:

```sh
cocoindex evaluate main.py:FlowName --output-dir ./test_output
```

For complete CLI reference, load references/cli_operations.md.

For CLI documentation: https://cocoindex.io/docs/core/cli

API Operations

Basic setup:

```python
from dotenv import load_dotenv
import cocoindex

load_dotenv()
cocoindex.init()

@cocoindex.flow_def(name="MyFlow")
def my_flow(flow_builder, data_scope):
    # ... flow definition ...
    pass
```

One-time update:

```python
stats = my_flow.update()
print(f"Processed {stats.total_rows} rows")

# Async
stats = await my_flow.update_async()
```

Live update:

```python
# As context manager
with cocoindex.FlowLiveUpdater(my_flow) as updater:
    # Updater runs in background
    # Your application logic here
    pass

# Manual control
updater = cocoindex.FlowLiveUpdater(
    my_flow,
    cocoindex.FlowLiveUpdaterOptions(
        live_mode=True,
        print_stats=True
    )
)
updater.start()
# ... application logic ...
updater.wait()
```

Setup/drop:

```python
my_flow.setup(report_to_stdout=True)
my_flow.drop(report_to_stdout=True)
cocoindex.setup_all_flows()
cocoindex.drop_all_flows()
```

Query with transform flows:

```python
@cocoindex.transform_flow()
def text_to_embedding(text: cocoindex.DataSlice[str]) -> cocoindex.DataSlice[list[float]]:
    return text.transform(
        cocoindex.functions.SentenceTransformerEmbed(model="...")
    )

# Use in flow for indexing
doc["embedding"] = text_to_embedding(doc["content"])

# Use for querying
query_embedding = text_to_embedding.eval("search query")
```

For complete API reference and patterns, load references/api_operations.md.

For API documentation: https://cocoindex.io/docs/core/flow_methods

Built-in Functions

Text Processing

SplitRecursively - Chunk text intelligently

```python
doc["chunks"] = doc["content"].transform(
    cocoindex.functions.SplitRecursively(),
    language="markdown",  # or "python", "javascript", etc.
    chunk_size=2000,
    chunk_overlap=500
)
```

ParseJson - Parse JSON strings

```python
data = json_string.transform(cocoindex.functions.ParseJson())
```

DetectProgrammingLanguage - Detect language from filename

```python
file["language"] = file["filename"].transform(
    cocoindex.functions.DetectProgrammingLanguage()
)
```

Embeddings

SentenceTransformerEmbed - Local embedding model

```python
# Requires: cocoindex[embeddings]
chunk["embedding"] = chunk["text"].transform(
    cocoindex.functions.SentenceTransformerEmbed(
        model="sentence-transformers/all-MiniLM-L6-v2"
    )
)
```

EmbedText - LLM API embeddings

This is the recommended way to generate embeddings using LLM APIs (OpenAI, Voyage, etc.).

```python
chunk["embedding"] = chunk["text"].transform(
    cocoindex.functions.EmbedText(
        api_type=cocoindex.LlmApiType.OPENAI,
        model="text-embedding-3-small",
    )
)
```

ColPaliEmbedImage - Multimodal image embeddings

```python
# Requires: cocoindex[colpali]
image["embedding"] = image["img_bytes"].transform(
    cocoindex.functions.ColPaliEmbedImage(model="vidore/colpali-v1.2")
)
```

LLM Extraction

ExtractByLlm - Extract structured data with LLM

This is the recommended way to use LLMs for extraction and summarization tasks. It supports both structured outputs (dataclasses, Pydantic models) and simple text outputs (str).

```python
import dataclasses

# For structured extraction
@dataclasses.dataclass
class ProductInfo:
    name: str
    price: float
    category: str

item["product_info"] = item["text"].transform(
    cocoindex.functions.ExtractByLlm(
        llm_spec=cocoindex.LlmSpec(
            api_type=cocoindex.LlmApiType.OPENAI,
            model="gpt-4o-mini"
        ),
        output_type=ProductInfo,
        instruction="Extract product information"
    )
)

# For text summarization/generation
file["summary"] = file["content"].transform(
    cocoindex.functions.ExtractByLlm(
        llm_spec=cocoindex.LlmSpec(
            api_type=cocoindex.LlmApiType.OPENAI,
            model="gpt-4o-mini"
        ),
        output_type=str,
        instruction="Summarize this document in one paragraph"
    )
)
```

Common Sources and Targets

Browse all sources: https://cocoindex.io/docs/sources/
Browse all targets: https://cocoindex.io/docs/targets/

Sources

LocalFile:

```python
cocoindex.sources.LocalFile(
    path="documents",
    included_patterns=["*.md", "*.txt"],
    excluded_patterns=["**/.*", "node_modules"]
)
```

AmazonS3:

```python
cocoindex.sources.AmazonS3(
    bucket="my-bucket",
    prefix="documents/",
    aws_access_key_id=cocoindex.add_transient_auth_entry("..."),
    aws_secret_access_key=cocoindex.add_transient_auth_entry("...")
)
```

Postgres:

```python
cocoindex.sources.Postgres(
    connection=cocoindex.add_auth_entry("conn", cocoindex.sources.PostgresConnection(...)),
    query="SELECT id, content FROM documents"
)
```

Targets

Postgres (with vector support):

```python
collector.export(
    "target_name",
    cocoindex.targets.Postgres(),
    primary_key_fields=["id"],
    vector_indexes=[
        cocoindex.VectorIndexDef(
            field_name="embedding",
            metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY
        )
    ]
)
```

Qdrant:

```python
collector.export(
    "target_name",
    cocoindex.targets.Qdrant(collection_name="my_collection"),
    primary_key_fields=["id"]
)
```

LanceDB:

```python
# Requires: cocoindex[lancedb]
collector.export(
    "target_name",
    cocoindex.targets.LanceDB(uri="lancedb_data", table_name="my_table"),
    primary_key_fields=["id"]
)
```

Neo4j (nodes):

```python
collector.export(
    "nodes",
    cocoindex.targets.Neo4j(
        connection=neo4j_conn,
        mapping=cocoindex.targets.Nodes(label="Entity")
    ),
    primary_key_fields=["id"]
)
```

Neo4j (relationships):

```python
collector.export(
    "relationships",
    cocoindex.targets.Neo4j(
        connection=neo4j_conn,
        mapping=cocoindex.targets.Relationships(
            rel_type="RELATES_TO",
            source=cocoindex.targets.NodeFromFields(
                label="Entity",
                fields=[cocoindex.targets.TargetFieldMapping(source="source_id", target="id")]
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Entity",
                fields=[cocoindex.targets.TargetFieldMapping(source="target_id", target="id")]
            )
        )
    ),
    primary_key_fields=["id"]
)
```

Common Issues and Solutions

“Flow not found”

  • Check APP_TARGET format: cocoindex show main.py
  • Use --app-dir if not in project root
  • Verify flow name matches decorator

“Database connection failed”

  • Check .env has COCOINDEX_DATABASE_URL
  • Test connection: psql $COCOINDEX_DATABASE_URL
  • Use --env-file to specify custom location

“Schema mismatch”

  • Re-run setup: cocoindex setup main.py
  • Drop and recreate: cocoindex drop main.py && cocoindex setup main.py

“Live update exits immediately”

  • Add refresh_interval to source
  • Or use source-specific change capture (Postgres notifications, S3 events)

“Out of memory”

  • Add concurrency limits on sources: max_inflight_rows, max_inflight_bytes
  • Set global limits in .env: COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS
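As a sketch, the global limits could look like this in `.env`. The values are illustrative, and the byte-based variable is an assumption mirroring the per-source `max_inflight_bytes` option; verify both names against the settings documentation:

```
# Global backpressure limits (illustrative values)
COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS=256
# Assumed byte-based counterpart of the per-source max_inflight_bytes option:
COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES=268435456
```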

Reference Documentation

This skill includes comprehensive reference documentation for common patterns and operations:

  • references/flow_patterns.md - Complete examples of common flow patterns (text embedding, code embedding, knowledge graphs, live updates, concurrency control, etc.)
  • references/custom_functions.md - Detailed guide for creating custom functions with examples (standalone functions, spec+executor pattern, LLM calls, external APIs, caching)
  • references/cli_operations.md - Complete CLI reference with all commands, options, and workflows
  • references/api_operations.md - Python API reference with examples for programmatic flow control, live updates, queries, and application integration patterns

Load these references when users need:

  • Detailed examples of specific patterns
  • Complete API documentation
  • Advanced usage scenarios
  • Troubleshooting guidance

For comprehensive documentation: https://cocoindex.io/docs/
Search specific topics: https://cocoindex.io/docs/search?q=url%20encoded%20keyword


Environment Matrix

Dependencies

cocoindex (core functionality)
cocoindex[embeddings] (for SentenceTransformer models)
cocoindex[colpali] (for multimodal image/document embeddings)
cocoindex[lancedb] (for LanceDB vector database)
PostgreSQL database (for internal storage)
Python 3.8+

Framework Support

PostgreSQL ✓ (recommended for internal storage)
Qdrant ✓ (vector database)
LanceDB ✓ (vector database)
Neo4j ✓ (graph database)
Kuzu ✓ (graph database)

Context Window

Token usage: varies based on document size and LLM operations (~2K-8K tokens typical)


Information

Author
davila7
Updated
2026-01-30
Category
productivity-tools