Research

Digest

A running library of digest issues: newsletters and podcasts on agentic coding, evals, multi-agent orchestration, agent memory, and information retrieval. Most come out of my code-intelligence-digest pipeline; a few I curate by hand. Filter by cadence, topic, or format.

4 issues

  • Daily Jun 9, 2026 🎧 8 min

    The week the benchmarks broke

    Opus 4.8 scores 13.8% on FrontierCode Diamond, and METR says over half of passing SWE-bench results are unmergeable slop. The field spent the week rebuilding its measuring sticks: cheating-resistant evals, exploration and memory benchmarks, and the finding that orchestration is a skill distinct from coding.

    evals, agentic coding, information retrieval, agent memory, multi agent orchestration

    6 links

  • Weekly Jun 9, 2026 🎧 36 min

    Agents Get Graded on Process, Not Just Pass/Fail

    A week of instrumentation: benchmarks broke the binary resolved/unresolved score into exploration, maintainability, and handoff cost, while a Sonnet 4.6 judge that flags agents contradicting their own reasoning predicted failure 94% of the time. Memory research converged on agent-controlled storage over fixed pipelines, self-evolving agents started learning from their own traces, and multi-agent orchestration finally got a cost accounting. Adoption more than doubled in the same window.

    evals, agent memory, multi agent, agentic coding, information retrieval

    16 links

  • Curated Jun 9, 2026

    Enhancing Developer Productivity with Google Colab CLI and Agentic Observability

    Four things worth your time: Google's Colab CLI, which requests a GPU and runs scripts from the terminal; agentic observability from DevOps.com, automating asset management and root-cause triage; SWE-Marathon, an ADS benchmark of 20 long-horizon tasks averaging 27.2M tokens each; and MEnvAgent, reporting 8.6% higher success and 43% lower cost from giving coding agents verifiable environments.

    developer productivity, evals, agent memory, infrastructure, knowledge bases, benchmarks, reliability

    0 links

  • Weekly Jun 8, 2026 🎧 45 min

    Weekly: the orchestration stack consolidates

    This week, multi-agent orchestration tooling started to converge on a few shared patterns: typed message contracts, deterministic fan-out, and adversarial review as a default stage. Also: a strong week for coding-agent benchmarks and a quietly important retrieval-eval release.

    multi agent, agentic coding, evals, information retrieval

    4 links