v1.0.0

serenaxxiee released this on 11 Apr at 21:21 (commit 8ff56ef)

Initial release of the AI agent evaluation toolkit for Copilot Studio.

What's included

  • 6 skills: /eval-guide, /eval-suite-planner, /eval-generator, /eval-result-interpreter, /eval-triage-and-improvement, /eval-faq
  • Interactive HTML dashboards for reviewing and editing eval artifacts at each stage
  • Architecture-aware eval scoping — automatically adjusts test depth for prompt-level, RAG, and agentic architectures
  • Single-response and conversation (multi-turn) test case generation
  • Copilot Studio CSV import — generated test cases import directly into Copilot Studio
  • Works in both Claude Code and GitHub Copilot

Install

Claude Code:

claude plugin add microsoft/eval-guide

GitHub Copilot:

npx skills add microsoft/eval-guide