Information scientist · AI agent advocate
Stephanie Jarmak — information scientist and AI agent advocate
I work on multi-agent orchestration and code intelligence: getting teams of agents to reliably understand and change large codebases, and evaluating whether they actually help. Currently at Sourcegraph and a research affiliate with NASA SciX.
Currently
- AI agent advocate / applied research scientist, Sourcegraph
- Research affiliate, NASA Science Explorer (SciX)
Selected work
CodeScaleBench
A benchmark suite for evaluating how AI coding agents use external context-retrieval tools on realistic developer tasks in large, enterprise-scale codebases.
C++ · Evaluation · Retrieval
EnterpriseBench
A benchmark for evaluating how well coding agents understand and navigate code across large, distributed enterprise codebases.
Python · Evaluation · Agents
SciX Agent
An agentic research assistant over the NASA SciX / ADS corpus, bridging AI agents with scholarly search infrastructure.
Python · Agents · MCP · Retrieval
Code Intelligence Digest
Aggregates feeds and presents curated weekly and monthly digests covering code intelligence, tools, and AI agents, using hybrid LLM + BM25 + recency scoring.
TypeScript · Retrieval · LLM
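One way to picture "hybrid LLM + BM25 + recency scoring" is as a weighted blend of three signals. The minimal Python sketch below is an illustration under stated assumptions, not the Digest's actual implementation (which is written in TypeScript). It assumes each item already carries an LLM relevance score in [0, 1], tokenizes on whitespace, scores text with Okapi BM25 normalized by the best score in the batch, and applies an exponential recency decay. The `Item` class, the 0.5/0.3/0.2 weights, and the 7-day half-life are hypothetical choices.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    age_days: float   # days since publication
    llm_score: float  # hypothetical: precomputed LLM relevance judgment in [0, 1]


def bm25_scores(docs, query, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc against the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query:
            if q not in tf:
                continue
            idf = math.log((n - df[q] + 0.5) / (df[q] + 0.5) + 1)
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)  # length normalization
            s += idf * tf[q] * (k1 + 1) / norm
        scores.append(s)
    return scores


def hybrid_rank(items, query, w_llm=0.5, w_bm25=0.3, w_recency=0.2, half_life_days=7.0):
    """Rank items by a weighted sum of LLM score, normalized BM25, and recency decay."""
    if not items:
        return []
    docs = [it.text.lower().split() for it in items]
    raw = bm25_scores(docs, query.lower().split())
    top = max(raw) or 1.0  # avoid dividing by zero when nothing matches
    ranked = []
    for it, s in zip(items, raw):
        recency = 0.5 ** (it.age_days / half_life_days)  # halves every half_life_days
        score = w_llm * it.llm_score + w_bm25 * s / top + w_recency * recency
        ranked.append((score, it))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


items = [
    Item("new code search agent release", age_days=1, llm_score=0.9),
    Item("gardening tips", age_days=0, llm_score=0.1),
    Item("code search benchmark", age_days=30, llm_score=0.6),
]
for score, item in hybrid_rank(items, "code search"):
    print(f"{score:.3f}  {item.text}")
```

With these example items, the fresh, on-topic release ranks first, the month-old benchmark second, and the unrelated item last even though it is the newest. It gets no BM25 credit and only a low LLM score. A weighted sum keeps each signal's contribution easy to tune, and normalizing BM25 by the batch maximum puts it on the same [0, 1] scale as the other two terms.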
Agent Diagnostics
A behavioral taxonomy, annotation framework, and shareable dataset backend for analyzing why coding agents succeed or fail on benchmark tasks.
Python · Evaluation · Agents
Coding Agent Workflows
Coding standards, agent roles, skills, and multi-step workflows that work the same whether you drive Claude Code, Codex, Amp, or any other agent that reads an AGENTS.md.
JavaScript · Agents
Gas City Dashboard
A dashboard for Gas City multi-agent orchestrations.
TypeScript · React
CodeProbe
Benchmarks AI coding agents against your own codebase by mining evaluation tasks from its git history, so the suite can't be contaminated by training data.
Python · Evaluation · Agents
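Here is a minimal Python sketch of what mining evaluation tasks from git history could look like. It is an illustration, not CodeProbe's actual pipeline. It assumes that a usable task is any non-merge commit that changes both source files and test files. The parent commit is the starting state, the commit subject stands in for the task description, and the touched tests serve as the check. `TaskCandidate`, `mine_tasks`, and the test-file naming heuristic are hypothetical. The `git log` options (`--name-only`, `%H`, `%P`, `%s`, `%x00`) are standard git.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class TaskCandidate:
    commit: str              # reference change the agent should reproduce
    parent: str              # repository state the agent starts from
    prompt: str              # commit subject reused as the task description
    test_files: list[str]    # tests touched by the commit, used to verify a fix
    source_files: list[str]  # non-test files the reference change modified


def is_test_file(path: str) -> bool:
    """Hypothetical heuristic: a tests/ directory or pytest-style file names."""
    name = path.rsplit("/", 1)[-1]
    return "/tests/" in "/" + path or name.startswith("test_") or name.endswith("_test.py")


def parse_git_log(raw: str):
    """Parse output of `git log --name-only --format=%x00%H%x1f%P%x1f%s`."""
    commits = []
    for chunk in raw.split("\x00"):  # each commit record starts with a NUL byte
        lines = [ln for ln in chunk.split("\n") if ln.strip()]
        if not lines:
            continue
        sha, parents, subject = lines[0].split("\x1f", 2)
        commits.append((sha, parents.split(), subject, lines[1:]))
    return commits


def mine_tasks(raw: str) -> list[TaskCandidate]:
    tasks = []
    for sha, parents, subject, files in parse_git_log(raw):
        if len(parents) != 1:  # skip root commits and merges
            continue
        tests = [f for f in files if is_test_file(f)]
        sources = [f for f in files if not is_test_file(f)]
        if tests and sources:  # need both a code change and a check for it
            tasks.append(TaskCandidate(sha, parents[0], subject, tests, sources))
    return tasks


def read_git_log(repo: str) -> str:
    """Run git log on a local checkout and return its raw text output."""
    return subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--format=%x00%H%x1f%P%x1f%s"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Calling `mine_tasks(read_git_log("path/to/repo"))` on a local checkout yields candidate tasks. A real harness would still need to check out each parent, let the agent make its change, and run the listed tests. Requiring a test change alongside the source change is what makes each mined task automatically verifiable.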
mem
Builds and benchmarks agentic memory using a multi-agent orchestrator's own work traces as the evaluation corpus, where every unit of work has a verifiable outcome.
TypeScript · Evaluation · Agent memory