Work

Projects

Things I'm building across agents, scientific search, code intelligence, and evaluation. For how they connect, try the interactive graph.

Open the knowledge graph →

Gas City

An orchestration-builder SDK for multi-agent coding workflows. I'm a maintainer.

Go, Agents, Orchestration

EnterpriseBench

A benchmark for evaluating how well coding agents understand and navigate code across large, distributed enterprise codebases.

Python, Evaluation, Agents

Livedocs

Keep docs in sync with code. Livedocs extracts structural claims from source into per-repo SQLite databases that AI agents query over MCP, avoiding expensive grep-and-read cycles.

Go, MCP, tree-sitter
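
To make "structural claims in SQLite" concrete, here is a minimal sketch of the idea. It is not Livedocs' implementation (the project is Go with tree-sitter); it swaps in Python's standard `ast` and `sqlite3` modules, and the `claims` schema and the `extract_claims` and `build_db` helpers are hypothetical names chosen for illustration.

```python
import ast
import sqlite3

# Toy source file standing in for a real repository (illustrative only).
SOURCE = '''
def load(path, strict=False):
    pass

class Index:
    def query(self, term):
        pass
'''

def extract_claims(source: str, filename: str):
    """Yield (file, kind, name, line, arity) rows for each def/class found."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Arity counts positional parameters, including `self`.
            yield (filename, "function", node.name, node.lineno, len(node.args.args))
        elif isinstance(node, ast.ClassDef):
            yield (filename, "class", node.name, node.lineno, None)

def build_db(rows):
    """Store the extracted rows in an in-memory SQLite table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE claims (file TEXT, kind TEXT, name TEXT, line INTEGER, arity INTEGER)"
    )
    db.executemany("INSERT INTO claims VALUES (?, ?, ?, ?, ?)", rows)
    return db

db = build_db(extract_claims(SOURCE, "example.py"))

# An agent-facing tool could answer "which functions exist?" with one query
# instead of grepping and re-reading files.
functions = db.execute(
    "SELECT name, arity FROM claims WHERE kind = 'function' ORDER BY name"
).fetchall()
print(functions)
```

The point of the design is that the extraction cost is paid once per change, while each agent question becomes a cheap indexed lookup.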

CodeScaleBench

A benchmark suite for evaluating how AI coding agents use external context-retrieval tools on realistic developer tasks in large, enterprise-scale codebases.

Python, Evaluation, Retrieval, Docker

ToM-SWE

A theory-of-mind agent for Claude Code that learns your coding preferences, interaction style, and project patterns across sessions.

TypeScript, Agents, Memory

Code Intelligence Digest

Aggregates feeds and presents curated weekly and monthly digests of code intelligence, tools, and AI agents using hybrid LLM + BM25 + recency scoring.

TypeScript, Retrieval, LLM
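
One way a hybrid LLM + BM25 + recency score could be combined is sketched below. This is an assumption-laden illustration, not the digest's actual scoring code (which is TypeScript): the weights, the seven-day half-life, the max-normalization of BM25, and the precomputed `llm_score` field are all hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query terms with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores

def recency(age_days, half_life_days=7.0):
    """Exponential decay: an item loses half its recency score every half-life."""
    return 0.5 ** (age_days / half_life_days)

def hybrid_rank(query, items, weights=(0.5, 0.3, 0.2)):
    """Rank items (dicts with title, tokens, llm_score in [0, 1], age_days)."""
    bm25 = bm25_scores(query, [it["tokens"] for it in items])
    top = max(bm25) or 1.0  # normalize BM25 into [0, 1]; avoid dividing by zero
    w_llm, w_bm25, w_rec = weights
    scored = [
        (w_llm * it["llm_score"] + w_bm25 * (s / top) + w_rec * recency(it["age_days"]),
         it["title"])
        for it, s in zip(items, bm25)
    ]
    return sorted(scored, reverse=True)

# Hypothetical feed items; llm_score stands in for a model's relevance rating.
items = [
    {"title": "A", "tokens": ["code", "search", "agents"], "llm_score": 0.9, "age_days": 1},
    {"title": "B", "tokens": ["cooking", "recipes"], "llm_score": 0.1, "age_days": 0},
    {"title": "C", "tokens": ["code", "review"], "llm_score": 0.5, "age_days": 30},
]
ranked = hybrid_rank(["code", "agents"], items)
print([title for _, title in ranked])
```

Blending the three signals lets a fresh but off-topic item lose to an older item that both the lexical match and the model rate as relevant.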

Cross-Repo Invariant Verifier

A background agent that checks organization-wide code invariants across every repository indexed by Sourcegraph, triggered by PR events and a weekly cron.

TypeScript, Agents, MCP

Agent Diagnostics

A behavioral taxonomy, annotation framework, and shareable dataset backend for analyzing why coding agents succeed or fail on benchmark tasks.

Python, Evaluation, Agents

SciX Agent

An agentic research assistant over the NASA SciX / ADS corpus, bridging AI agents with scholarly search infrastructure.

Python, Agents, MCP, Retrieval

Coding Agent Workflows

Coding standards, agent roles, skills, and multi-step workflows that read the same whether you drive Claude Code, Codex, Amp, or anything that reads an AGENTS.md.

JavaScript, Agents

Migration Evals

A tiered-oracle funnel for evaluating automated code migrations end to end, such as Java 8 to 17 or Python 2 to 3, with a pluggable ecosystem.

Python, Evaluation, Migration
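
One reading of a tiered-oracle funnel is that cheap checks run first and a candidate only reaches the expensive checks if it survives the earlier ones. The sketch below illustrates that reading for a Python 2 to 3 migration. It is an assumption about the shape of the idea, not the project's code: `Tier`, `run_funnel`, and both example oracles are hypothetical.

```python
from typing import Callable, NamedTuple

class Tier(NamedTuple):
    name: str
    check: Callable[[str], bool]  # returns True when the candidate passes

def run_funnel(candidate: str, tiers: list[Tier]) -> tuple[bool, list[str]]:
    """Run tiers in order (cheapest first) and stop at the first failure."""
    passed = []
    for tier in tiers:
        if not tier.check(candidate):
            return False, passed
        passed.append(tier.name)
    return True, passed

def parses(src: str) -> bool:
    """Tier 1 (cheap): does the migrated source compile under Python 3?"""
    try:
        compile(src, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def behaves(src: str) -> bool:
    """Tier 2 (costlier): run the code and check one expected behavior."""
    namespace = {}
    exec(src, namespace)
    return namespace["half"](3) == 1.5

TIERS = [Tier("parses", parses), Tier("behaves", behaves)]

good = "def half(x):\n    return x / 2\n"
not_migrated = "print 'hi'\n"                      # Python 2 syntax, rejected by tier 1
wrong_division = "def half(x):\n    return x // 2\n"  # compiles, but fails tier 2

for src in (good, not_migrated, wrong_division):
    print(run_funnel(src, TIERS))
```

Recording which tiers passed, not just the final verdict, shows where in the funnel a migration tool tends to lose candidates.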

mcp-ax

An evaluation framework for the agentic experience of MCP tools, measuring how usable they actually are for agents.

Python, Evaluation, MCP

GEO: Generative Engine Optimization

Measuring how LLM-powered tools discover, recommend, and describe products. GEO is the AI equivalent of SEO.

Evaluation, Retrieval, LLM

CodeProbe

Benchmarks AI coding agents against your own codebase by mining evaluation tasks from its git history, so the suite can't be contaminated by training data.

Python, Evaluation, Agents
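
As a rough sketch of what mining tasks from git history can look like, the code below parses `git log --name-only` output and keeps commits that change both tests and non-test code, on the assumption that such commits pair a code change with a way to check it. The selection heuristic, the `@` marker in the log format, and every function name here are hypothetical and are not taken from CodeProbe.

```python
import subprocess

def read_history(repo: str) -> str:
    """Return one '@<sha>' line per commit, followed by the files it touched."""
    return subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--pretty=format:@%H"],
        capture_output=True, text=True, check=True,
    ).stdout

def parse_history(log_text: str):
    """Turn the log text into a list of (sha, [paths]) tuples."""
    commits = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("@"):
            commits.append((line[1:], []))
        elif line and commits:
            commits[-1][1].append(line)
    return commits

def is_test(path: str) -> bool:
    """Hypothetical heuristic: test_*.py files or anything under a tests/ directory."""
    name = path.rsplit("/", 1)[-1]
    return name.startswith("test_") or "/tests/" in f"/{path}"

def candidate_tasks(commits):
    """Keep commits that touch both test files and non-test files."""
    return [
        sha for sha, files in commits
        if any(is_test(f) for f in files) and any(not is_test(f) for f in files)
    ]

# Sample log text in the same shape read_history() would return.
SAMPLE = """@a1
src/parser.py
tests/test_parser.py

@b2
README.md

@c3
tests/test_cli.py
"""

tasks = candidate_tasks(parse_history(SAMPLE))
print(tasks)
```

Against a real repository, `candidate_tasks(parse_history(read_history("path/to/repo")))` would produce the candidate list; the sample string keeps the example runnable without one.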

Embertide

A browser co-op deck-builder in the Slay-the-Spire / Ascension lineage, rendered entirely in hand-authored stained-glass art and tuned to be readable by a six-year-old.

React, Zustand, Vite, TypeScript

NLS Fine-tune (SciX)

Fine-tuning infrastructure for converting natural language into ADS / SciX scientific-literature search queries.

Python, NLP, SciX

Gas City Packs

Reusable packs for Gas City. The PR-pipeline and Slack packs are mine.

Go, Python, Agents

Gas City Dashboard

A dashboard for Gas City multi-agent orchestrations.

TypeScript, React

Agent Tidal Wave

A booth game for the AI World's Fair. Guess how much code AI agents are writing on GitHub, and a wave of agent-written code crashes in.

Node, Express, Postgres

Wheel Practice App

A Wheel of Fortune practice app built with React Native and Expo, with three game modes, real puzzle packs, and seeding for repeatable practice.

React Native, Expo, TypeScript

mem

Build and benchmark agentic memory using a multi-agent orchestrator's own work traces as the evaluation corpus, where every unit of work carries a real lifecycle outcome and a full trace.

TypeScript, Python, Evaluation, Agent memory

Sourcegraph GTM Assistant

A stateless MCP server on Cloud Run that gives any authenticated Sourcegraph employee, through claude.ai, one tool surface over curated per-account research (GCS corpus) and live internal data (Salesforce, Looker, PostHog, HubSpot via cost-safeguarded databot). The tools span account discovery, intelligence, lead scoring, and voice-checked outreach drafting.

Python, Agents, Slack, LLM

Personal Website

sjarmak.ai: an Astro static site whose content collections form a typed knowledge graph. This very project entry is one node in it.

Astro, TypeScript, Zod, Cytoscape, MDX

Agent Oriented Architecture Toolkit

Measures whether a repository actually works for AI coding agents by running an agent against tasks mined from its own git history and scoring what it did, instead of checking for the presence of files like AGENTS.md.

Rust, Evaluation, Agents

