Projects explorer

A map of the work

Projects, the topics they touch, and the outputs they produce, drawn as a graph. Knowledge graphs are how I think about information, so here is mine. Click a node for detail, filter by type, or read the structured list below.

AI agents

Systems that plan, call tools, and act over multiple steps to accomplish a goal. The throughline of my current work at Sourcegraph and across SciX.

Projects

  • Agent Diagnostics — A behavioral taxonomy, annotation framework, and shareable dataset backend for analyzing why coding agents succeed or fail on benchmark tasks.
  • Agent Tidal Wave — A booth game for the AI World's Fair. Guess how much code AI agents are writing on GitHub, and a wave of agent-written code crashes in.
  • Cross-Repo Invariant Verifier — A background agent that checks organization-wide code invariants across every repository indexed by Sourcegraph, triggered by PR events and a weekly cron.
  • Code Intelligence Digest — Aggregates feeds and presents curated weekly and monthly digests of code intelligence, tools, and AI agents using hybrid LLM + BM25 + recency scoring.
  • CodeProbe — Benchmarks AI coding agents against your own codebase by mining evaluation tasks from its git history, so the suite can't be contaminated by training data.
  • CodeScaleBench — A benchmark suite for evaluating how AI coding agents use external context-retrieval tools on realistic developer tasks in large, enterprise-scale codebases.
  • EnterpriseBench — A benchmark for evaluating how well coding agents understand and navigate code across large, distributed enterprise codebases.
  • Gas City Dashboard — A dashboard for Gas City multi-agent orchestrations.
  • Gas City Packs — Reusable packs for Gas City. The PR-pipeline and Slack packs are mine.
  • Coding Agent Workflows — Coding standards, agent roles, skills, and multi-step workflows that read the same whether you drive Claude Code, Codex, Amp, or anything that reads an AGENTS.md.
  • Gas City — An orchestration-builder SDK for multi-agent coding workflows. I'm a maintainer.
  • Livedocs — Keep docs in sync with code. Livedocs extracts structural claims from source into per-repo SQLite databases that AI agents query over MCP, with no expensive grep-and-read cycles.
  • mem — Build and benchmark agentic memory using a multi-agent orchestrator's own work traces as the evaluation corpus, where every unit of work has a verifiable outcome.
  • mcp-ax — An evaluation framework for the agentic experience of MCP tools, measuring how usable they actually are for agents.
  • SciX Agent — An agentic research assistant over the NASA SciX / ADS corpus, bridging AI agents with scholarly search infrastructure.
  • ToM-SWE — A theory-of-mind agent for Claude Code that learns your coding preferences, interaction style, and project patterns across sessions.

Agent memory

How agents store, retrieve, and forget context across turns and sessions. Memory architectures, retrieval over history, and design tradeoffs.

Projects

  • Literature Explorers — Curated, navigable surveys of recent research, organized into thematic maps rather than linear reading lists. Built on SciX MCP and code-intel sources.
  • mem — Build and benchmark agentic memory using a multi-agent orchestrator's own work traces as the evaluation corpus, where every unit of work has a verifiable outcome.
  • ToM-SWE — A theory-of-mind agent for Claude Code that learns your coding preferences, interaction style, and project patterns across sessions.

Code intelligence

Understanding codebases at scale: search, navigation, and agents that reason over source. The domain of my work at Sourcegraph.

Projects

  • Cross-Repo Invariant Verifier — A background agent that checks organization-wide code invariants across every repository indexed by Sourcegraph, triggered by PR events and a weekly cron.
  • Code Intelligence Digest — Aggregates feeds and presents curated weekly and monthly digests of code intelligence, tools, and AI agents using hybrid LLM + BM25 + recency scoring.
  • CodeProbe — Benchmarks AI coding agents against your own codebase by mining evaluation tasks from its git history, so the suite can't be contaminated by training data.
  • CodeScaleBench — A benchmark suite for evaluating how AI coding agents use external context-retrieval tools on realistic developer tasks in large, enterprise-scale codebases.
  • EnterpriseBench — A benchmark for evaluating how well coding agents understand and navigate code across large, distributed enterprise codebases.
  • Gas City Dashboard — A dashboard for Gas City multi-agent orchestrations.
  • Gas City Packs — Reusable packs for Gas City. The PR-pipeline and Slack packs are mine.
  • Coding Agent Workflows — Coding standards, agent roles, skills, and multi-step workflows that read the same whether you drive Claude Code, Codex, Amp, or anything that reads an AGENTS.md.
  • Gas City — An orchestration-builder SDK for multi-agent coding workflows. I'm a maintainer.
  • Livedocs — Keep docs in sync with code. Livedocs extracts structural claims from source into per-repo SQLite databases that AI agents query over MCP, with no expensive grep-and-read cycles.
  • Migration Evals — A tiered-oracle funnel for evaluating automated code migrations end to end (Java 8 to 17, Python 2 to 3), with a pluggable ecosystem.

Evaluation & benchmarks

Measuring whether AI systems work. Benchmarks, evals, and honest comparison of search engines and agents.

Projects

  • Agent Diagnostics — A behavioral taxonomy, annotation framework, and shareable dataset backend for analyzing why coding agents succeed or fail on benchmark tasks.
  • CodeProbe — Benchmarks AI coding agents against your own codebase by mining evaluation tasks from its git history, so the suite can't be contaminated by training data.
  • CodeScaleBench — A benchmark suite for evaluating how AI coding agents use external context-retrieval tools on realistic developer tasks in large, enterprise-scale codebases.
  • EnterpriseBench — A benchmark for evaluating how well coding agents understand and navigate code across large, distributed enterprise codebases.
  • GEO — Generative Engine Optimization — Measuring how LLM-powered tools discover, recommend, and describe products. GEO is the AI equivalent of SEO.
  • mem — Build and benchmark agentic memory using a multi-agent orchestrator's own work traces as the evaluation corpus, where every unit of work has a verifiable outcome.
  • mcp-ax — An evaluation framework for the agentic experience of MCP tools, measuring how usable they actually are for agents.
  • Migration Evals — A tiered-oracle funnel for evaluating automated code migrations end to end (Java 8 to 17, Python 2 to 3), with a pluggable ecosystem.

Knowledge graphs

Entity extraction, linking, and graph-structured representations of knowledge. The method behind this site's own projects explorer.

Projects

  • Literature Explorers — Curated, navigable surveys of recent research, organized into thematic maps rather than linear reading lists. Built on SciX MCP and code-intel sources.

Retrieval

Finding the right information at the right time. Embeddings, ranking, hybrid search, and retrieval-augmented systems over scientific and code corpora.

Projects

  • Code Intelligence Digest — Aggregates feeds and presents curated weekly and monthly digests of code intelligence, tools, and AI agents using hybrid LLM + BM25 + recency scoring.
  • CodeScaleBench — A benchmark suite for evaluating how AI coding agents use external context-retrieval tools on realistic developer tasks in large, enterprise-scale codebases.
  • Literature Explorers — Curated, navigable surveys of recent research, organized into thematic maps rather than linear reading lists. Built on SciX MCP and code-intel sources.
  • GEO — Generative Engine Optimization — Measuring how LLM-powered tools discover, recommend, and describe products. GEO is the AI equivalent of SEO.
  • Livedocs — Keep docs in sync with code. Livedocs extracts structural claims from source into per-repo SQLite databases that AI agents query over MCP, with no expensive grep-and-read cycles.
  • NLS Fine-tune (SciX) — Fine-tuning infrastructure for converting natural language into ADS / SciX scientific-literature search queries.
  • SciX Agent — An agentic research assistant over the NASA SciX / ADS corpus, bridging AI agents with scholarly search infrastructure.
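
For readers unfamiliar with hybrid ranking, here is a minimal sketch of how an LLM relevance score, a BM25 lexical score, and a recency score might be blended into one ranking. It is an illustration under stated assumptions, not the implementation behind any project listed here: the `Item` fields, the weights, the seven-day half-life, and the idea that the LLM score arrives precomputed in [0, 1] are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    text: str
    age_days: float   # days since publication (hypothetical field)
    llm_score: float  # assumed precomputed LLM relevance in [0, 1]


def tokenize(s: str) -> list[str]:
    # Deliberately naive: lowercase and split on whitespace.
    return s.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def recency(age_days: float, half_life_days: float = 7.0) -> float:
    # Exponential decay: the score halves every half_life_days.
    return 0.5 ** (age_days / half_life_days)


def rank(query: str, items: list[Item],
         w_llm: float = 0.5, w_bm25: float = 0.3, w_recency: float = 0.2):
    """Return (score, item) pairs sorted best first. Weights are illustrative."""
    raw = bm25_scores(query, [it.text for it in items])
    top = max(raw, default=0.0)
    # Scale BM25 into [0, 1] so the three signals are comparable.
    bm25 = [r / top if top > 0 else 0.0 for r in raw]
    scored = [
        (w_llm * it.llm_score + w_bm25 * s + w_recency * recency(it.age_days), it)
        for it, s in zip(items, bm25)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Scaling BM25 by the best score in the batch keeps the lexical signal on the same footing as the other two. In practice the weights and the half-life would need tuning against real feedback.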