Work
Projects
Things I'm building across agents, scientific search, code intelligence, and evaluation. For how they connect, try the interactive graph.
-
An orchestration-builder SDK for multi-agent coding workflows. I'm a maintainer.
Go, Agents, Orchestration
-
A benchmark for evaluating how well coding agents understand and navigate code across large, distributed enterprise codebases.
Python, Evaluation, Agents
-
Curated, navigable surveys of recent research, organized into thematic maps rather than linear reading lists. Built on SciX MCP and code-intel sources.
Knowledge graphs, Retrieval, SciX MCP
-
Keep docs in sync with code. Livedocs extracts structural claims from source into per-repo SQLite databases that AI agents query over MCP, avoiding expensive grep-and-read cycles.
Go, MCP, tree-sitter
-
A benchmark suite for evaluating how AI coding agents use external context-retrieval tools on realistic developer tasks in large, enterprise-scale codebases.
C++, Evaluation, Retrieval
-
A theory-of-mind agent for Claude Code that learns your coding preferences, interaction style, and project patterns across sessions.
TypeScript, Agents, Memory
-
Aggregates feeds and presents curated weekly and monthly digests of code intelligence, tools, and AI agents using hybrid LLM + BM25 + recency scoring.
TypeScript, Retrieval, LLM
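One way to picture the hybrid LLM + BM25 + recency scoring named in that entry is a weighted sum of three signals. The sketch below is illustrative only and is not taken from the project: the weights, the seven-day half-life, the max-normalization of BM25, and the `llm_score`, `tokens`, and `age_days` field names are all assumptions, and the LLM relevance score is treated as a precomputed number in [0, 1]. It is written in Python for brevity.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against the query terms.

    Assumes docs is a non-empty list of token lists.
    """
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def recency(age_days, half_life_days=7.0):
    """Exponential decay: 1.0 for a brand-new item, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def hybrid_scores(query, items, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of LLM relevance, max-normalized BM25, and recency.

    Each item is a dict with hypothetical keys: 'tokens', 'llm_score', 'age_days'.
    """
    w_llm, w_bm25, w_rec = weights
    raw = bm25_scores(query, [item["tokens"] for item in items])
    top = max(raw) or 1.0  # avoid dividing by zero when nothing matches
    return [
        w_llm * item["llm_score"]
        + w_bm25 * (r / top)
        + w_rec * recency(item["age_days"])
        for item, r in zip(items, raw)
    ]
```

Normalizing BM25 by the top score keeps all three signals on a comparable 0 to 1 scale, so the weights can be read as relative importance. A real digest would likely tune the weights and half-life separately for weekly and monthly windows.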
-
A background agent that checks organization-wide code invariants across every repository indexed by Sourcegraph, triggered by PR events and a weekly cron.
TypeScript, Agents, MCP
-
A behavioral taxonomy, annotation framework, and shareable dataset backend for analyzing why coding agents succeed or fail on benchmark tasks.
Python, Evaluation, Agents
-
An agentic research assistant over the NASA SciX / ADS corpus, bridging AI agents with scholarly search infrastructure.
Python, Agents, MCP, Retrieval
-
Coding standards, agent roles, skills, and multi-step workflows that read the same whether you drive Claude Code, Codex, Amp, or anything that reads an AGENTS.md.
JavaScript, Agents
-
A tiered-oracle funnel for evaluating automated code migrations end to end, such as Java 8 to 17 and Python 2 to 3, with a pluggable ecosystem.
Python, Evaluation, Migration
-
An evaluation framework for the agentic experience of MCP tools, measuring how usable they actually are for agents.
Python, Evaluation, MCP
-
Measuring how LLM-powered tools discover, recommend, and describe products. GEO is the AI equivalent of SEO.
Evaluation, Retrieval, LLM
-
Benchmarks AI coding agents against your own codebase by mining evaluation tasks from its git history, so the suite can't be contaminated by training data.
Python, Evaluation, Agents
-
Fine-tuning infrastructure for converting natural language into ADS / SciX scientific-literature search queries.
Python, NLP, SciX
-
Reusable packs for Gas City. The PR-pipeline and Slack packs are mine.
Go, Agents
-
A dashboard for Gas City multi-agent orchestrations.
TypeScript, React
-
A booth game for the AI World's Fair. Guess how much code AI agents are writing on GitHub, and a wave of agent-written code crashes in.
Node, Express, Postgres
-
A Wheel of Fortune practice app built with React Native and Expo, with three game modes, real puzzle packs, and seeding for repeatable practice.
React Native, Expo, TypeScript
-
Build and benchmark agentic memory using a multi-agent orchestrator's own work traces as the evaluation corpus, where every unit of work has a verifiable outcome.
TypeScript, Evaluation, Agent memory