Tag: testing

5 articles tagged testing.

AI Agent Evaluation Suites in 2026: Real Metrics That Matter

How to evaluate AI agents using Promptfoo, Langfuse, Helicone, and custom test suites. Real metrics, real failure modes, and what to actually measure.

Apr 8, 2026 · Editorial Team · ai-agents evaluation testing

TDD with AI Coding Agents: The Write-Failing-Test Pattern

How to do test-driven development with AI coding agents. The failing-test-first workflow with Claude Code and Copilot. Real examples, real pitfalls.

Mar 25, 2026 · Editorial Team · tdd testing ai-coding

How to Use GitHub Copilot to Write Better Unit Tests

Use GitHub Copilot's /tests command, edge-case prompting, and table-driven patterns to write unit tests faster without sacrificing quality.

Mar 18, 2026 · Editorial Team · github-copilot testing unit-tests

How to Use Claude Code to Add Tests to an Untested Codebase

How to point Claude Code at a legacy module, generate Vitest or pytest tests, iterate on coverage, and set conventions via CLAUDE.md.

Mar 11, 2026 · Editorial Team · claude-code testing vitest

AI Agent Evaluation: Benchmarks, Custom Evals, and What Actually Matters

How to evaluate AI agents using SWE-bench, WebArena, GAIA, and custom evals. What the numbers mean, what they miss, and how to measure what matters.

Feb 12, 2026 · Editorial Team · evaluation benchmarks ai-fundamentals