Evaluate Your AI Agents Faster and More Effectively with DigitalOcean Gradient™ Platform Updates

Product Updates

DigitalOcean's Gradient AI Platform introduces enhanced agent evaluations, offering goal-oriented metrics, example datasets, clear error messages, and trace integration for faster debugging and more reliable AI agent development.

Evaluate Your AI Agents Faster and More Effectively

Evaluating AI agents presents unique challenges, particularly when existing tools don't align with intuitive workflows. To address this, we're thrilled to announce significant updates to the agent evaluations experience in the DigitalOcean Gradient™ AI Platform. These enhancements make it faster and simpler to evaluate AI agents, interpret results, and debug issues.

What's Changed for Agent Evaluations?

While the initial evaluations feature was robust, it posed certain adoption hurdles for developers. This redesign directly addresses those pain points with several key improvements:

  • Goal-Oriented Metric Grouping: Metrics are now intuitively organized into goal-oriented categories such as Safety & Security, Correctness, and RAG Performance. The Safety & Security group is preselected to facilitate quick and confident starts for developers.
  • Example Datasets: A curated list of example datasets is now available for common evaluations, enabling developers to rapidly and efficiently create their own custom datasets.
  • Clear, Persistent Error Messaging: Upload errors are now explicit, persistent, and highly specific, providing messages like "Validation Error: 'query' column is missing." This clarity empowers developers to easily comprehend and resolve issues, significantly reducing friction in the testing process.
  • Interpretable Results with Trace Integration: Evaluation results are presented according to the same metric groups used during setup, complete with tooltips explaining each metric and its scoring. A deep integration with observability tools allows developers to directly navigate from a low score to the complete trace, enabling rapid debugging and iterative improvement.

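The upload validation described above can be approximated locally before you upload a dataset. The sketch below is a hedged illustration, not the platform's actual validator: only the `query` column requirement is taken from the example error message, and the check logic, function name, and any other column names are assumptions.

```python
import csv
import io

# 'query' comes from the example error message; any other required columns
# would need to be confirmed against the platform documentation.
REQUIRED_COLUMNS = {"query"}

def validate_dataset(csv_text: str) -> list[str]:
    """Return a list of validation errors for an evaluation dataset CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = set(reader.fieldnames or [])
    missing = sorted(REQUIRED_COLUMNS - fieldnames)
    if missing:
        # Mirror the explicit, specific style of the platform's messages.
        return [f"Validation Error: '{col}' column is missing." for col in missing]
    return [
        f"Validation Error: row {i} has an empty 'query'."
        for i, row in enumerate(reader, start=2)
        if not (row.get("query") or "").strip()
    ]

print(validate_dataset("prompt,answer\nHi,Hello"))
```

Running a quick check like this before uploading catches the most common schema mistakes while keeping the fix loop entirely local.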
Why You Should Use Evaluations

Agent evaluations are crucial for systematically testing and enhancing your AI agents, streamlining the process of identifying issues and optimizing performance. For new users, the preselected Safety & Security metrics and readily available dataset examples allow for swift checks against common concerns such as unsafe or biased outputs, fostering greater confidence in your agent's behavior.

For developers scaling their agents, the platform offers advanced capabilities, including custom test cases, specialized metric groups like RAG Performance, and bespoke dataset uploads, for deeper insight into agent performance. Integrated trace functionality enables precise debugging: drill down from a low score to pinpoint and fix the underlying issue. Together, these evaluations turn results into actionable improvements, helping developers at every stage build safer, more reliable, production-ready AI agents.

How to Get Started with Agent Evaluations

Ready to rigorously test your AI agents? Initiating evaluations within the DigitalOcean Gradient™ AI Platform is straightforward:

  1. Access the Evaluations Tab: Navigate to your agent's evaluations tab in the Cloud Console.
  2. Create a New Test Case: Establish a new test case and assign it a unique, descriptive name that clearly reflects its goal or context for easy retrieval later.
  3. Select Metrics: Choose the specific metrics essential for evaluating your agent's critical qualities.
  4. Choose a Dataset: Select an appropriate dataset. If creating your own, refer to the documentation examples for quickly generating a CSV file.
  5. Run and Review: Execute the evaluation and meticulously review the results. Utilize the trace integration to investigate any low scores and efficiently debug your agent.
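For step 4, a custom dataset is just a CSV file. The snippet below is a minimal sketch of generating one with Python's standard library; the column names (`query`, `expected_output`) and row contents are illustrative assumptions, so check the platform documentation for the exact schema your chosen metrics expect.

```python
import csv

# Hypothetical schema: confirm the required column names in the docs.
rows = [
    {"query": "What regions does my agent support?",
     "expected_output": "List the currently supported regions."},
    {"query": "How do I rotate my API token?",
     "expected_output": "Describe token rotation in account settings."},
]

with open("eval_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["query", "expected_output"])
    writer.writeheader()
    writer.writerows(rows)
```

A small script like this makes datasets reproducible: regenerate or extend the CSV as your agent's test cases grow, then upload the file when creating a test case.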

For a comprehensive, step-by-step walkthrough, we recommend consulting our dedicated tutorial, which provides guidance on creating test cases, selecting metrics, and interpreting evaluation results.

Take proactive control of your AI's performance. Begin evaluating your agents today to swiftly identify issues, optimize behavior, and deploy reliable, production-ready systems faster than ever before.


About the Author

Grace Morgan