Evaluations in Opal overview

Evaluations measure how well an Optimizely Opal agent performs against your quality standards. Use evaluations to score agent outputs, catch unexpected behavior, and improve agent quality over time.

Opal supports the following two evaluation approaches:

  • Quality tab – Automated scoring and runtime safeguards available on every specialized agent.
  • Evaluation agents – Specialized agents built to evaluate content with custom logic and tools.

Quality tab

The Quality tab adds automated quality scoring and runtime safeguards to every specialized agent. It scores each run against your criteria and detects behavior that deviates from your agent's baseline.

Screenshot showing the Quality tab highlighted

The Quality tab includes the following three features:

  • Output Evaluation – Scores every run against criteria, examples, and a baseline evaluation score you define.
  • Execution Guardrails – Detects runs that deviate from your agent's normal behavior.
  • Execution Advisor – Intervenes during runs to recover from anomalies, retry failed tool calls, and review sensitive tool calls.

See Quality tab overview for details on how the three features work together.

Evaluation agents

When your evaluation needs go beyond the Quality tab, build a specialized agent as a custom evaluator. An evaluation agent uses targeted tools and criteria to assess content. Common evaluation areas include grammar, search engine optimization, and brand voice.

Use an evaluation agent as part of a workflow that revises outputs until they meet your quality threshold.
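
The revise-until-passing pattern can be sketched in a few lines of Python. This is a minimal illustration of the control flow only, not Opal's implementation. This article does not describe a code interface for Opal, so every name here (Evaluation, revise_until_passing, toy_evaluate, toy_revise), the 0.0 to 1.0 score scale, and the threshold and max_rounds defaults are assumptions made for the example. In a real workflow, the evaluate step would be your evaluation agent and the revise step would be the agent that produces the content.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Evaluation:
    score: float   # assumed scale: 0.0 (worst) to 1.0 (best)
    feedback: str  # what the evaluator wants changed


def revise_until_passing(
    draft: str,
    evaluate: Callable[[str], Evaluation],
    revise: Callable[[str, str], str],
    threshold: float = 0.8,   # hypothetical quality threshold
    max_rounds: int = 3,      # cap so the loop always terminates
) -> Tuple[str, Evaluation, int]:
    """Evaluate a draft, then revise and re-evaluate until it passes."""
    result = evaluate(draft)
    rounds = 0
    while result.score < threshold and rounds < max_rounds:
        draft = revise(draft, result.feedback)
        result = evaluate(draft)
        rounds += 1
    return draft, result, rounds


# Toy stand-ins for the two agents (hypothetical, for illustration only).
REQUIRED_TERMS = ["Opal", "evaluation"]


def toy_evaluate(text: str) -> Evaluation:
    # Score is the fraction of required terms present in the text.
    missing = [term for term in REQUIRED_TERMS if term not in text]
    score = 1 - len(missing) / len(REQUIRED_TERMS)
    return Evaluation(score=score, feedback=", ".join(missing))


def toy_revise(text: str, feedback: str) -> str:
    # Address one piece of feedback per round by appending the first missing term.
    first_missing = feedback.split(", ")[0]
    return f"{text} {first_missing}"


final_text, final_eval, rounds_used = revise_until_passing(
    "Draft copy", toy_evaluate, toy_revise
)
print(final_text, final_eval.score, rounds_used)
```

The max_rounds cap is a design choice worth keeping in any version of this loop: without it, content that can never reach the threshold would be revised indefinitely.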

See Create a specialized agent for instructions on building an agent.

When to use each approach

Choose an approach based on whether you need automated scoring, custom evaluation logic, or both.

  • Quality tab – Automatically score every run and detect anomalies in agent behavior.
  • Evaluation agent – Build custom evaluation logic for specialized domain needs.

Both approaches work together on the same agent. The Quality tab scores every run automatically, while an evaluation agent assesses specialized content quality dimensions, such as brand voice, that need custom evaluation logic.

If you use Opti ID, administrators can turn off generative AI in the Opti ID Admin Center. See Turn generative AI off across Optimizely applications.