Anthropic expands developer console for creating and evaluating AI prompts
Generative artificial intelligence startup Anthropic PBC has rolled out new features in its developer console that allow developers and technical teams to generate, test and evaluate AI prompts.
The company announced Tuesday that developers can now access a built-in prompt generator, powered by Anthropic’s Claude 3.5 Sonnet model, that lets users describe the task at hand and have the large language model create a high-quality prompt for them. Descriptions can be anything, such as “Write me a triage document for inbound customer support requests.”
The generated prompts include input variables, such as the inbound customer support message or other information expected to be part of issue management. This feeds into the second feature Anthropic announced: test suites for prompts.
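These pieces map onto Anthropic’s public Messages API. The sketch below shows how a generated template with one input variable might be filled in and run from Python; the template text and the {{SUPPORT_MESSAGE}} variable name are illustrative assumptions, not the console’s actual output:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical template of the kind the generator might produce;
# {{SUPPORT_MESSAGE}} is an input variable filled in per request.
PROMPT_TEMPLATE = """You are a customer support triage assistant.
Read the inbound message below and produce a triage document with
a category, a priority level and a suggested next step.

<message>
{{SUPPORT_MESSAGE}}
</message>"""

def run_prompt(support_message: str) -> str:
    # Substitute the input variable and send the prompt to Claude 3.5 Sonnet.
    prompt = PROMPT_TEMPLATE.replace("{{SUPPORT_MESSAGE}}", support_message)
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```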
Users can manually add or import test cases that supply data for those variables, or ask Claude to auto-generate test cases, so prompts can be run to see how the LLM will respond.
This is useful because coming up with realistic test data is time-consuming and difficult, especially if there’s not much of it already on hand. Even with a small number of customer support messages to start from, an LLM such as Claude can generate a wide variety of convincing synthetic test data that robustly covers many cases. Users can also manage the test cases as needed, so if technical teams know of specific issues that might fall through the cracks, they can add them.
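Outside the console, the same idea can be approximated by asking Claude for synthetic variable values directly. A minimal sketch, assuming a hypothetical JSON-array response format rather than Anthropic’s own generation prompt:

```python
import json
import anthropic

def generate_test_cases(client: anthropic.Anthropic, n: int = 5) -> list[dict]:
    # Ask Claude for synthetic inbound support messages as a JSON array.
    # The instruction wording here is an assumption, not Anthropic's prompt.
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                f"Generate {n} realistic inbound customer support messages "
                "for a software product. Respond with only a JSON array of "
                'objects shaped like {"SUPPORT_MESSAGE": "..."}.'
            ),
        }],
    )
    # In practice the model's output should be validated before use.
    return json.loads(response.content[0].text)
```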
Once test cases are created, users can view and adjust them to match their needs, and refine the way they are generated. This applies to both manually added and auto-generated test cases.
Evaluating models and refining prompts is now quicker, since users can create new versions of a prompt and re-run the test suite to iterate rapidly. Test cases from previous prompt versions are retained, so their variable sets can quickly be run again against a modified prompt.
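In code, the equivalent of this re-run step is a small harness that replays every saved variable set against each prompt version. The version labels and template format below are assumptions carried over from the earlier sketch:

```python
import anthropic

def run_suite(
    client: anthropic.Anthropic,
    prompt_versions: dict[str, str],   # hypothetical label -> template text
    test_cases: list[dict[str, str]],  # variable name -> value, per case
) -> dict[str, list[str]]:
    # Replay every saved variable set against each prompt version so
    # results can be compared after each edit.
    results: dict[str, list[str]] = {}
    for version, template in prompt_versions.items():
        outputs = []
        for case in test_cases:
            prompt = template
            for name, value in case.items():
                prompt = prompt.replace("{{" + name + "}}", value)
            response = client.messages.create(
                model="claude-3-5-sonnet-20240620",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
            outputs.append(response.content[0].text)
        results[version] = outputs
    return results
```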
Anthropic also added the ability to compare the output of two or more prompts side by side so performance can be evaluated, making it easier to see how a modification to a prompt changed the output. For example, if the triage prompt above were adjusted to elevate the priority of certain types of customer requests over others, the comparison would show whether a rewording of that instruction actually got the model’s attention.
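A side-by-side view can be approximated by pairing the outputs of two versions per test case, as in this short sketch building on the harness above:

```python
def compare(results: dict[str, list[str]], a: str, b: str) -> None:
    # Print the outputs of two prompt versions side by side, one test case
    # at a time, so the effect of a wording change is easy to spot.
    for i, (out_a, out_b) in enumerate(zip(results[a], results[b])):
        print(f"--- Test case {i} ---")
        print(f"[{a}]\n{out_a}\n")
        print(f"[{b}]\n{out_b}\n")
```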
The company said subject matter experts can also grade responses from AI models on a five-point scale, showing whether changes have improved response quality.
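For completeness, a hypothetical record for capturing such a grade might look like the following; Anthropic has not published the console’s actual schema:

```python
from dataclasses import dataclass

@dataclass
class Grade:
    # Hypothetical record of one expert rating; the console's real
    # data model is not public.
    test_case: int        # index of the test case that was graded
    prompt_version: str   # which prompt version produced the response
    score: int            # 1 to 5, matching the console's five-point scale
    note: str = ""        # optional free-form comment from the grader
```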
Anthropic said that the prompt generation and output comparison features are now available to all users on the console. Documentation and tutorials are available on the company’s website.
Image: Anthropic