Initial release of the AI agent evaluation toolkit for Copilot Studio.
What's included
- 6 skills: /eval-guide, /eval-suite-planner, /eval-generator, /eval-result-interpreter, /eval-triage-and-improvement, /eval-faq
- Interactive HTML dashboards for reviewing and editing eval artifacts at each stage
- Architecture-aware eval scoping — automatically adjusts test depth for prompt-level, RAG, and agentic architectures
- Single-response and conversation (multi-turn) test case generation
- Copilot Studio CSV import — generated test cases import directly into Copilot Studio
- Works in both Claude Code and GitHub Copilot
Install
Claude Code:
claude plugin add microsoft/eval-guide
GitHub Copilot:
npx skills add microsoft/eval-guide