Local-first code search that finds intent, not strings. Open source.

Search code by meaning.
Rebuild legacy systems with confidence.

ogrep is semantic code search for AI agents and humans: it chunks your repository, embeds those chunks, stores them in a single local SQLite index, and retrieves the most relevant code for any question. In v0.12.0, the MCP server picks up new files on every query and supports optional background indexing to keep your index fresh during long coding sessions. Zero-config, always up to date.

Token savings: ogrep uses embeddings for indexing + retrieval. It does not require a chat model. Chat/completion tokens are only spent if you choose to have an LLM interpret the retrieved snippets.

Quick start: CLI or Claude Code Skill
# Install with AST support
pip install "ogrep[ast]"

# Index (AST chunking is now default)
export OPENAI_API_KEY="sk-..."
ogrep index .

# Ask questions (JSON output is default)
ogrep query "where is authentication handled?" -n 12
ogrep query "how are API errors mapped?" -M hybrid

What's New in v0.12.0

MCP queries now auto-detect new files, the server can keep indexes fresh in the background, and the ogrep_index tool is simplified to incremental-only.

Auto-refresh fix • Background indexing • Simplified MCP • MCP-native • Zero-config

Auto-Refresh Catches New Files

MCP queries with refresh=True (the default) now run a complete incremental index pass. They pick up files that were never indexed, not just modified or deleted ones. No more stale results after adding code.

ogrep_query("auth") # auto-indexes new files
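The difference between catching only modified or deleted files and also catching new ones comes down to how the incremental pass compares disk state with index state. The sketch below is a minimal illustration of that comparison. It assumes a hypothetical mapping of relative paths to content hashes recorded at the last index; ogrep's actual bookkeeping is not documented here, and the function names are ours.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, used as a cheap change detector."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def classify_changes(root: Path, indexed: dict[str, str], pattern: str = "*.py"):
    """Compare files under `root` with digests recorded at the last index.

    `indexed` is a hypothetical {relative_path: digest} map; a real indexer
    may store mtimes, hashes, or both. Returns sorted (new, modified, deleted).
    """
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob(pattern)
        if p.is_file()
    }
    new = sorted(set(current) - set(indexed))      # on disk, never indexed
    deleted = sorted(set(indexed) - set(current))  # indexed, now gone
    modified = sorted(
        path for path in set(current) & set(indexed)
        if current[path] != indexed[path]          # content changed
    )
    return new, modified, deleted
```

An incremental pass would then embed only the `new` and `modified` files and drop chunks for `deleted` ones. A refresh that skips the `new` list is the gap this release closes.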

Background Refresh

Set OGREP_REFRESH_INTERVAL=600 and the MCP server re-indexes all known repos every 10 minutes in the background. Keeps your index fresh during long coding sessions without any manual intervention.

OGREP_REFRESH_INTERVAL=600 # seconds
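A periodic refresh like this is usually a daemon thread that sleeps on a stop event between passes. The sketch below is an assumption-laden illustration, not the server's code. `reindex` and `repos` are hypothetical stand-ins for the server's incremental indexer and its list of known repositories. Treating an unset, zero, or malformed value as "off" is also our assumption.

```python
import logging
import math
import os
import threading


def start_background_refresh(reindex, repos, env=os.environ):
    """Periodically call reindex(repo) for every repo on a daemon thread.

    `reindex` and `repos` are hypothetical stand-ins for the MCP server's own
    incremental indexer and its list of known repositories. Returns
    (thread, stop_event), or None when the interval is unset or invalid.
    """
    try:
        interval = float(env.get("OGREP_REFRESH_INTERVAL", ""))
    except ValueError:
        return None  # unset or malformed: background refresh stays off
    if not math.isfinite(interval) or interval <= 0:
        return None  # assumption: zero or negative also means "off"

    stop = threading.Event()

    def loop():
        # wait() returns True as soon as stop is set, so shutdown is prompt.
        while not stop.wait(interval):
            for repo in list(repos):
                try:
                    reindex(repo)
                except Exception:
                    # One failing repo must not kill the refresh thread.
                    logging.exception("background refresh failed for %s", repo)

    thread = threading.Thread(target=loop, name="ogrep-refresh", daemon=True)
    thread.start()
    return thread, stop
```

Using `Event.wait` instead of `time.sleep` lets the server stop the thread immediately on shutdown rather than waiting out the full interval.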

Simplified ogrep_index

The MCP ogrep_index tool is now purely incremental: creates if missing, updates changed files, skips unchanged. The destructive “nuke and rebuild” stays CLI-only (ogrep reindex).

ogrep_index() # always safe, always incremental

MCP-Native & Token-Efficient

5 native tools, persistent process, warm caches. Structured dicts (200-500 tokens) instead of raw CLI output (2,000+). Works with Claude Code, Gemini, or any MCP client.

# ~85% fewer tokens than raw CLI output

AST + Voyage AI

AST-aware chunking (default) + Voyage AI code-optimized embeddings. The best search quality in our benchmarks (MRR 0.717).

ogrep index . -m voyage-code-3

Benchmark-Driven Defaults

JSON output by default. No reranking for strong embeddings. FlashRank only for local models. All based on our MRR benchmarks.

Architecture: MCP = Data Layer, Agent = Orchestrator, Skill = Router

Three layers work together. The skill tells Claude when to search. The agent decides how (summarize→narrow→drill). MCP tools execute the actual queries against SQLite.

Skill (when to use ogrep)
  → Agent (summarize → narrow → drill)
       → ogrep_query(summarize=true)    # file-level overview
       → ogrep_query(glob="src/*.py")    # narrow to files
       → ogrep_chunk(ref, context=1)     # expand context

Direct use (simple queries):
  Claude → ogrep_query("where is auth?")
  Claude → ogrep_status()
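The summarize and narrow steps amount to post-processing a list of chunk hits. The sketch below is a hypothetical client-side illustration; the function names are ours. It does not show how the ogrep-search agent or the MCP server actually works. It relies only on the `chunk_ref` and `score` fields that appear in ogrep's JSON output.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def file_of(chunk_ref: str) -> str:
    """Return the file part of a chunk_ref like 'src/db.py:2'."""
    return chunk_ref.rsplit(":", 1)[0]


def summarize_by_file(results):
    """Collapse chunk hits into a file-level overview, best-scoring files first."""
    files = defaultdict(lambda: {"best_score": float("-inf"), "hits": 0})
    for r in results:
        entry = files[file_of(r["chunk_ref"])]
        entry["hits"] += 1
        entry["best_score"] = max(entry["best_score"], r["score"])
    return sorted(files.items(), key=lambda kv: kv[1]["best_score"], reverse=True)


def narrow(results, glob: str):
    """Keep chunks whose file matches a shell-style glob.

    Note: with fnmatch, '*' also matches '/', so 'src/*.py' matches nested files.
    """
    return [r for r in results if fnmatchcase(file_of(r["chunk_ref"]), glob)]
```

The file-level overview is cheap because it needs one line per file instead of the full chunk text. That is why it comes before the drill-down step.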

Key Finding: Reranking Hurts Strong Embeddings

Our benchmarks show that reranking degrades results for Voyage and OpenAI embeddings (MRR drops 12-21%). Only use --rerank with local embeddings like Nomic. High-quality embeddings are already well-calibrated.

Open Source on GitHub

ogrep is MIT licensed. Star the repo, report issues, contribute, or fork it for your own use.


AST-Aware Chunking

Now the default. Respects function, class, and method boundaries for better search accuracy.

Default in v0.8+

Without AST (line-based chunking, via --no-ast):
Lines 55-115 (one chunk):
  - End of ClassA
  - Start of ClassB  ← Semantic mixing!
  - Beginning of method foo()

With AST chunking (default), chunks follow coherent boundaries:
Chunk 1: ClassA (complete)
Chunk 2: ClassB.foo() method
Chunk 3: ClassB.bar() method

Supported Languages

Python, JavaScript, TypeScript, Go, Rust, C/C++, Java, Ruby, Bash, C#

Install with pip install "ogrep[ast]" for core languages (Python, JS, TS, Go, Rust) or pip install "ogrep[ast-all]" for all. Unsupported languages fall back to line-based chunking automatically.

Recommended Configurations

Based on benchmarks with 10 ground-truth queries. MRR = Mean Reciprocal Rank (higher is better).
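For each query, MRR takes the reciprocal of the rank at which the first relevant result appears, or 0 if none appears, and averages that over all queries. Here is a minimal sketch of the metric. The data structures are our choice, and this is not ogrep's benchmark harness.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Mean Reciprocal Rank over a set of queries.

    ranked_results: one ranked list of result ids per query.
    relevant: one set of ground-truth relevant ids per query.
    Each query contributes 1/rank of its first relevant hit, or 0 if none.
    """
    if len(ranked_results) != len(relevant):
        raise ValueError("need one ground-truth set per query")
    if not ranked_results:
        return 0.0
    total = 0.0
    for results, truth in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item in truth:
                total += 1.0 / rank
                break
    return total / len(ranked_results)
```

For example, three queries whose first relevant hits land at rank 1, rank 2, and nowhere score (1 + 0.5 + 0) / 3 = 0.5.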

🥇 Best Quality: Voyage AI

MRR: 0.717 • Code-optimized • No reranking needed

pip install "ogrep[ast,voyage]"
export VOYAGE_API_KEY="pa-..."
ogrep index . -m voyage-code-3
ogrep query "your search"

Best for production systems where search quality matters most.

🥈 Best Value: OpenAI

MRR: 0.700 • 3x cheaper • No reranking needed

pip install "ogrep[ast]"
export OPENAI_API_KEY="sk-..."
ogrep index . -m small
ogrep query "your search"

Only 2.4% quality drop vs Voyage. Great balance of cost and quality.

🥉 Offline/Free: Nomic + FlashRank

MRR: ~0.63 • Free • Reranking helps

pip install "ogrep[ast,rerank-light]"
export OGREP_BASE_URL=http://localhost:1234/v1
ogrep index . -m nomic
ogrep query "your search" --rerank

Zero API costs. Works offline. FlashRank compensates for weaker embeddings.

Configuration | Quality | Cost | Rerank? | Best for
Voyage + AST | MRR 0.717 | $0.06/M tokens | ❌ Skip | Production quality
OpenAI + AST | MRR 0.700 | $0.02/M tokens | ❌ Skip | Budget-conscious
Nomic + FlashRank | MRR ~0.63 | Free | ✅ Use | Offline/privacy
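Embedding cost scales linearly with the number of tokens embedded, so the "3x cheaper" figure carries straight through to indexing cost. Below is a back-of-the-envelope sketch using the per-million-token prices listed in the table. Check the providers' current pricing before relying on them. The 5-million-token repository is a made-up example size.

```python
# USD per million embedded tokens, as listed in the table above.
PRICE_PER_M_TOKENS = {
    "voyage-code-3": 0.06,
    "text-embedding-3-small": 0.02,
    "nomic (local)": 0.0,
}


def embedding_cost(tokens: int, model: str) -> float:
    """Estimated one-off cost in USD to embed `tokens` tokens with `model`."""
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]


# Hypothetical repository of about 5 million tokens.
for model in PRICE_PER_M_TOKENS:
    print(f"{model}: ${embedding_cost(5_000_000, model):.2f}")
```

At these prices, indexing such a repository costs about $0.30 with Voyage and $0.10 with OpenAI. Queries are embedded too, but a short query is only a handful of tokens.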

Reranking: When to Use It

Reranking helps weak embeddings but hurts strong ones. Use it selectively.


⚠️ Don't Rerank Strong Embeddings

Our benchmarks show reranking degrades Voyage and OpenAI results by 12-21%. These embeddings are already well-calibrated. Only use --rerank with local embeddings like Nomic.

Embedding | Without rerank | With FlashRank | Change | Recommendation
Voyage | MRR 0.717 | MRR ~0.60 | -16% | ❌ Skip reranking
OpenAI | MRR 0.700 | MRR 0.550 | -21% | ❌ Skip reranking
Nomic (local) | MRR 0.545 | MRR 0.633 | +16% | ✅ Use reranking
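The "Change" column is the relative change in MRR, (after − before) / before. Here is a quick check of the table's figures, using the same formula for all three rows:

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# MRR without vs. with FlashRank, copied from the table above.
for name, before, after in [("Voyage", 0.717, 0.60), ("OpenAI", 0.700, 0.550), ("Nomic", 0.545, 0.633)]:
    print(f"{name}: {relative_change(before, after):+.0f}%")
# Voyage: -16%
# OpenAI: -21%
# Nomic: +16%
```

The same formula gives `relative_change(0.717, 0.700)` ≈ -2.4. That is the 2.4% quality gap quoted for OpenAI versus Voyage.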

FlashRank (Recommended)

Lightweight ONNX model (~4MB). Parallel-safe, no file locking. Best balance of speed and quality for local embeddings.

pip install "ogrep[rerank-light]"

Voyage Reranker

Voyage AI's rerank-2.5 model. 32K context, instruction-following. Cloud API, requires VOYAGE_API_KEY.

--rerank-model voyage

sentence-transformers

Heavy PyTorch models (90-300MB). bge-m3 is slow on CPU (~30s/query). Use only with GPU acceleration.

pip install "ogrep[rerank]"

Performance & Requirements

Reranking uses cross-encoder models that benefit from GPU acceleration. Here's what to expect.

Hardware | Reranking speed | Notes
NVIDIA GPU (CUDA) | ~10x faster | Requires CUDA 12.x drivers. Best experience.
Apple Silicon (MPS) | ~3-5x faster | Automatic on macOS 12.3+. No setup needed.
CPU only | Baseline | Works, but slower. Expect 2-5 seconds per query.

Model Downloads

The reranker model (bge-reranker-v2-m3) is ~300MB, downloaded on first use. AST parsers add ~5-15MB per language. All cached locally after first download.

Check Your Hardware

Run ogrep device to detect GPU/CPU capabilities and get recommendations. JSON output is the default; use --no-json for text.

ogrep device

CPU-Only Tips

Without GPU: use --rerank-top 20 for faster response, or skip --rerank entirely. Hybrid search without reranking is still very accurate and fast.

Graceful Degradation

If reranking fails (missing dependencies, GPU issues), ogrep automatically falls back to non-reranked results. The JSON output includes "rerank_skipped": true and a "suggestion" field explaining why and what to do. Your queries always return results; they never fail because of reranking issues.
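This fallback pattern can be written as a wrapper that tries to rerank the top candidates and, on any error, returns retrieval order with an explanation. The sketch below is a hypothetical illustration. The function, the `rerank` callable, and the `top_n` default are ours, loosely echoing `--rerank-top`. Only the `rerank_skipped` and `suggestion` field names come from ogrep's output.

```python
def search_with_optional_rerank(query, candidates, rerank=None, top_n=20):
    """Rerank the top `top_n` candidates if possible, else keep retrieval order.

    `candidates`: dicts with at least "text", already sorted by retrieval score.
    `rerank(query, texts)`: hypothetical callable returning one score per text.
    """
    output = {"results": list(candidates), "rerank_skipped": False}
    if rerank is None:
        return output  # reranking not requested
    head, tail = candidates[:top_n], candidates[top_n:]
    try:
        scores = list(rerank(query, [c["text"] for c in head]))
        if len(scores) != len(head):
            raise ValueError("reranker returned the wrong number of scores")
    except Exception as exc:  # missing deps, GPU errors, API failures, ...
        output["rerank_skipped"] = True
        output["suggestion"] = (
            f"Reranking failed ({type(exc).__name__}: {exc}); "
            "showing non-reranked results."
        )
        return output
    # Stable sort: candidates with equal rerank scores keep retrieval order.
    ranked = sorted(zip(scores, head), key=lambda pair: pair[0], reverse=True)
    output["results"] = [c for _, c in ranked] + tail
    return output
```

Reranking only the head of the list is what keeps CPU-only setups responsive. Candidates beyond `top_n` are appended in their original retrieval order.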

Real-world scenarios

This is where ogrep shines: legacy code archaeology, behavior reconstruction, and fast intent-level navigation.

Legacy archaeology → rebuild by outcome

Instead of rewriting in place and fighting old architecture, use ogrep to extract what the system does: flows, invariants, edge cases, and the real source-of-truth logic. Build a clean replacement that mimics the original behavior while enabling modern development.

Stop the grep → paste → token black hole loop

Index once, then retrieve small, high-signal snippets. This reduces the need to shovel entire files into a chat model. With local embeddings (LM Studio), indexing is fully local and cost-free; with cloud embeddings, you still avoid repeated read-everything prompts.

Understand "meaning", not naming

Names lie in legacy repos. ogrep helps you find the intent behind code: where auth truly happens, how state transitions work, where validation is enforced, or which code actually sends emails.

How it works

A simple pipeline: chunk, embed, store, retrieve. The MCP server runs as a persistent process: Claude calls native tools directly, and the agent orchestrates multi-step workflows. In v0.12.0, queries auto-refresh the index, and optional background indexing keeps it current.

Index: SQLite • Embeddings: Voyage AI, OpenAI, or LM Studio • MCP: native tools
Step 1: Index. Scan the repo, chunk by AST boundaries, embed, and store in local SQLite via ogrep_index.
Step 2: Dispatch. Claude auto-dispatches the ogrep-search agent, or calls MCP tools directly for simple queries.
Step 3: Summarize. The agent calls ogrep_query(summarize=true) for a cheap file-level overview.
Step 4: Drill. It narrows with ogrep_query(glob=...) and expands context with ogrep_chunk(context=1).
Step 5: Synthesize. It returns concise findings with file:line references. No raw JSON clutter.
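The embed, store, and retrieve core of this pipeline fits in a deliberately tiny, self-contained sketch. It takes already-chunked text, "embeds" it with a hashed bag-of-words vector, stores the vectors in SQLite, and retrieves by cosine similarity. The table layout, the function names, and the toy embedding are all our assumptions. ogrep's real schema and its calls to Voyage AI, OpenAI, or LM Studio are not shown. Unlike a real embedding model, the toy vector matches shared words, not meaning.

```python
import hashlib
import math
import re
import sqlite3
from array import array

DIM = 256  # toy vector size; real embedding models use far more dimensions


def toy_embed(text: str) -> list[float]:
    """Hashed bag-of-words vector, L2-normalized (stand-in for a real model).

    blake2b is used because Python's built-in hash() changes between runs.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z_]+", text.lower()):
        digest = hashlib.blake2b(token.encode(), digest_size=8).digest()
        vec[int.from_bytes(digest, "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(db: sqlite3.Connection, chunks: dict[str, str]) -> None:
    """Embed every chunk and store (ref, text, vector) rows in one table."""
    db.execute("CREATE TABLE IF NOT EXISTS chunks (ref TEXT PRIMARY KEY, text TEXT, emb BLOB)")
    for ref, text in chunks.items():
        blob = array("f", toy_embed(text)).tobytes()  # float32 bytes
        db.execute("INSERT OR REPLACE INTO chunks VALUES (?, ?, ?)", (ref, text, blob))
    db.commit()


def search(db: sqlite3.Connection, question: str, n: int = 3) -> list[tuple[float, str]]:
    """Brute-force cosine similarity (dot product of unit vectors) over all chunks."""
    q = toy_embed(question)
    scored = []
    for ref, _text, blob in db.execute("SELECT ref, text, emb FROM chunks"):
        emb = array("f")
        emb.frombytes(blob)
        scored.append((sum(a * b for a, b in zip(q, emb)), ref))
    scored.sort(reverse=True)
    return scored[:n]
```

Usage: `build_index(sqlite3.connect(":memory:"), {...})` followed by `search(db, "password login")` returns `(score, chunk_ref)` pairs, best match first. A brute-force scan is fine for a sketch; larger indexes usually need a vector index or at least batched math.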

Embedding Providers

Three options: Voyage AI (best quality), OpenAI (best value), or local (free/offline).

Provider | Model | Quality | Cost | Setup
Voyage AI | voyage-code-3 | MRR 0.717 | $0.06/M tokens | Set VOYAGE_API_KEY
OpenAI | text-embedding-3-small | MRR 0.700 | $0.02/M tokens | Set OPENAI_API_KEY
LM Studio (local) | nomic-embed-text-v1.5 | MRR ~0.63* | Free | Set OGREP_BASE_URL

*With FlashRank reranking. Without reranking: MRR ~0.55.

Voyage AI (best quality, code-optimized)
pip install "ogrep[voyage]"
export VOYAGE_API_KEY="pa-..."
ogrep index . -m voyage-code-3
OpenAI (best value, 3x cheaper)
export OPENAI_API_KEY="sk-..."
ogrep index . -m small
LM Studio (free/offline, privacy first)
export OGREP_BASE_URL=http://localhost:1234/v1
ogrep index . -m nomic

Install

Use the CLI directly, or integrate as a Claude Code plugin with MCP server and agentic search.

CLI (recommended: pip)

Simple pip install. Add optional extras for AST and reranking.

Install via pip (recommended)
# Basic install
pip install ogrep

# With AST chunking (recommended)
pip install "ogrep[ast]"

# With reranking
pip install "ogrep[rerank]"

# Full install (AST + reranking)
pip install "ogrep[ast,rerank]"

Claude Code (MCP + agent)

Marketplace plugin with MCP server and dedicated search agent. Claude calls ogrep tools natively and dispatches the agent for deep exploration.

Claude Code marketplace
/plugin marketplace add gplv2/ogrep-marketplace
/plugin install ogrep@ogrep-marketplace

Get Started

Pick your embedding provider. Index once. Query forever. AST chunking is automatic.

Quick start (copy/paste)
# Install with AST support
pip install "ogrep[ast]"

# Choose your embedding provider:

## Option A: Voyage AI (best quality, MRR 0.717)
pip install "ogrep[voyage]"
export VOYAGE_API_KEY="pa-..."
ogrep index . -m voyage-code-3
ogrep query "where is authentication handled?"

## Option B: OpenAI (best value, MRR 0.700)
export OPENAI_API_KEY="sk-..."
ogrep index . -m small
ogrep query "where is authentication handled?"

## Option C: Local/Free (MRR ~0.63 with reranking)
pip install "ogrep[rerank-light]"
export OGREP_BASE_URL=http://localhost:1234/v1
ogrep index . -m nomic
ogrep query "where is authentication handled?" --rerank

# Check index status
ogrep status

JSON Output is Default

All commands output JSON by default. Use --no-json for human-readable text. The JSON includes confidence scoring, language detection, and search stats.

JSON response example (structured for parsing)
{
  "query": "database connections",
  "results": [
    {
      "rank": 1,
      "chunk_ref": "src/db.py:2",
      "score": 0.72,
      "confidence": {"level": "high", "relative_pct": 100.0},
      "language": "python",
      "text": "def connect_to_database(config):\n    ..."
    }
  ],
  "stats": {
    "search_mode": "hybrid",
    "reranked": false,
    "ast_mode": true
  }
}
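Because the output is plain JSON, a script can consume it with the standard library alone. The minimal sketch below keeps only results with a given confidence level. It relies only on the `rank`, `chunk_ref`, and `confidence.level` fields shown above, and the function name is ours. Only the level "high" appears in the example, so check real output for other level names before filtering on them.

```python
import json


def refs_with_confidence(payload: str, levels=("high",)) -> list[str]:
    """Return chunk_refs whose confidence level is in `levels`, in rank order."""
    data = json.loads(payload)
    hits = sorted(data.get("results", []), key=lambda r: r["rank"])
    return [
        r["chunk_ref"]
        for r in hits
        if r.get("confidence", {}).get("level") in levels
    ]


# Usage with the CLI (JSON is the default output), e.g.:
#   import subprocess
#   out = subprocess.run(["ogrep", "query", "database connections"],
#                        capture_output=True, text=True, check=True).stdout
#   print(refs_with_confidence(out))
```

Parsing structured fields this way avoids the fragile text scraping that plain grep output requires.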

Fast

Local SQLite index. Semantic queries ~200ms. Fulltext queries ~5ms.

Accurate

AST chunking preserves code structure. Hybrid search combines meaning + keywords.
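This page does not say how ogrep merges its semantic and keyword result lists. One common technique for that kind of merge is reciprocal rank fusion (RRF), sketched below purely as an illustration, not as ogrep's method. The constant k = 60 is the value conventionally used with RRF, not an ogrep setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(*rankings, k: int = 60):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


semantic = ["auth.py:1", "session.py:3", "db.py:2"]  # ranked by embedding similarity
keyword = ["login.py:0", "auth.py:1"]                 # ranked by fulltext match
print(reciprocal_rank_fusion(semantic, keyword))
```

A chunk that ranks well in both lists (here `auth.py:1`) rises to the top. RRF works on ranks alone, so the two lists never need comparable score scales.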

Cost-aware

Index once, reuse forever. Embedding tokens only. No chat model required.

FAQ

Short answers to the questions people ask immediately.

What changed in v0.12.0?

MCP queries now auto-detect new files (not just modified/deleted), so the index stays current without manual reindexing. A new background refresh thread can periodically re-index all known repos (OGREP_REFRESH_INTERVAL=600). The ogrep_index MCP tool was simplified to incremental-only — the destructive rebuild stays CLI-only (ogrep reindex).

What changed in v0.10.0?

ogrep added an MCP server with 5 native tools: ogrep_query, ogrep_chunk, ogrep_index, ogrep_status, and ogrep_health. Claude calls these directly — no shell spawning, no CLI parsing. The server runs as a persistent process, keeping SQLite connections, reranking models, and tree-sitter parsers warm in memory.

Why MCP instead of just Bash?

Every ogrep query via Bash spawns a fresh Python process (~500ms startup), loads models from disk, opens a new SQLite connection, and returns raw text that Claude must parse. The MCP server starts once and keeps everything warm: sub-second reranking instead of 5-30s cold starts, structured data instead of text parsing, and ~85% fewer tokens per search. For simple queries, Claude can call MCP tools directly without even dispatching the agent.

Does ogrep use tokens?

ogrep uses embeddings for indexing and query retrieval. With local embeddings, there is no per-token bill. With cloud embeddings, indexing costs embedding tokens (and a small amount per query). ogrep does not require a chat model; chat/completion tokens are only spent if you choose to have an LLM interpret the retrieved snippets.

Is AST chunking automatic now?

Yes. As of v0.8, AST chunking is enabled by default when tree-sitter is installed. Install with pip install "ogrep[ast]", then run ogrep index . as usual. Use --no-ast to disable it if needed.

Should I use --rerank?

Only with local embeddings like Nomic. Our benchmarks show reranking hurts Voyage and OpenAI results by 12-21%. If you're using local embeddings, --rerank with FlashRank helps. Otherwise, skip it.

Which embedding model should I use?

Voyage AI (voyage-code-3) for best quality. OpenAI (text-embedding-3-small) for best value. Nomic (local) for free/offline use. See the recommendations section for benchmarks.

Where does the index live?

By default: .ogrep/index.sqlite. It is a single local file, so it is easy to keep per repo or per profile.
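Because the index is an ordinary SQLite file, any SQLite client can inspect it. The small read-only sketch below uses Python's standard `sqlite3` module. It discovers table names at runtime instead of assuming ogrep's schema, which this page does not document. The function name is ours.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def describe_index(path: str = ".ogrep/index.sqlite") -> dict[str, int]:
    """Map each table in the index file to its row count, opening it read-only."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"  # read-only: never modifies the index
    with closing(sqlite3.connect(uri, uri=True)) as db:
        tables = [row[0] for row in db.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for table in tables:
            quoted = table.replace('"', '""')  # escape identifier quotes
            counts[table] = db.execute(f'SELECT COUNT(*) FROM "{quoted}"').fetchone()[0]
        return counts
```

Opening with `mode=ro` means a running MCP server and this inspection can coexist without the script ever writing to the index.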

How do I get better results?

1. Use AST chunking: pip install "ogrep[ast]" (enabled by default once installed)
2. Use reranking with local embeddings: --rerank (skip for Voyage/OpenAI)
3. Tune chunk size: ogrep tune .
4. Use fulltext mode for exact identifiers: --mode fulltext

What does the JSON hint mean?

When querying an index built without AST chunking, the JSON output includes a hint: "hint": "Index was built without AST chunking. For better semantic boundaries, run: ogrep reindex .". This tells AI tools to suggest rebuilding the index with AST (now the default) for better results.

ogrep v0.12.0: semantic grep for codebases (local-first, SQLite-backed, MCP + agentic Claude Code integration)
GitHub • MIT License