Response quality metrics help you measure how well your AI system answers user questions, follows instructions, and provides useful information. These metrics are key for building reliable, helpful, and user-friendly AI applications. Use these metrics when you want to:
  • Ensure your AI’s responses are factually correct and complete.
  • Check that the model follows instructions and uses retrieved information effectively.
  • Evaluate how well your system grounds answers in context or source material.
Below is a quick reference table of all response quality metrics:
| Name | Description | Supported Nodes | When to Use | Example Use Case |
| --- | --- | --- | --- | --- |
| Chunk Attribution Utilization | Assesses whether the response uses the retrieved chunks and properly attributes information to source documents. | Retriever span | When implementing RAG systems and you want to ensure proper attribution and that retrieved information is used efficiently. | A legal research assistant that must cite specific cases and statutes when providing legal information. |
| Chunk Relevance | Measures whether each retrieved chunk contains information that could help answer the user’s query. | Retriever span | When evaluating the relevance of individual retrieved chunks to the query. | A RAG system that needs to ensure each retrieved document chunk contributes useful information toward answering user questions. |
| Completeness | Measures how thoroughly the response covers the relevant information available in the provided context. | LLM span | When evaluating whether responses fully address the user’s intent. | A healthcare chatbot that, when given a patient’s medical record as context, must include all relevant critical information from that record in its response. |
| Context Adherence | Measures how well the response aligns with the provided context. | LLM span | When you want to ensure the model grounds its responses in the provided context. | A financial advisor bot that must base investment recommendations on the client’s specific financial situation and goals. |
| Context Precision | Measures the percentage of relevant chunks in the retrieved context, weighted by their position in the retrieval order. | Retriever span | When evaluating the overall quality of your retrieval system’s results and ranking effectiveness. | A document search system that needs to ensure retrieved chunks are relevant and properly ranked by importance. |
| Context Relevance (Query Adherence) | Evaluates whether the retrieved context is relevant to the user’s query. | Retriever span | When assessing the quality of your retrieval system’s results. | An internal knowledge base search that retrieves company policies relevant to specific employee questions. |
| Correctness (Factuality) | Evaluates the factual accuracy of information provided in the response. | LLM span | When accuracy of information is critical to your application. | A medical information system providing drug interaction details to healthcare professionals. |
| Ground Truth Adherence | Measures how well the response aligns with established ground truth. This metric is only available for experiments because it requires ground truth set in your dataset. | Trace | When evaluating model responses against known correct answers. | A customer service AI that must provide accurate product specifications from an official catalog. |
| Instruction Adherence | Assesses whether the model followed the instructions in your prompt template. | LLM span | When using complex prompts and you need to verify the model is following all instructions. | A content generation system that must follow specific brand guidelines and formatting requirements. |
| Precision @ K | Measures the percentage of relevant chunks among the top K retrieved chunks at a specific rank position. | Retriever span | When determining the optimal number of chunks to retrieve (Top K) and evaluating ranking quality at specific positions (see the sketch after this table). | A RAG system that needs to optimize retrieval parameters to balance capturing all relevant chunks against retrieving irrelevant ones. |
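For intuition on the two ranking-based retriever metrics above (Context Precision and Precision @ K), here is a minimal sketch of how such scores can be computed from binary chunk-relevance labels in retrieval order. The function names and the position-weighted formula are illustrative assumptions for this sketch, not the exact implementation behind the hosted metrics.

```python
from typing import List

def precision_at_k(relevance: List[int], k: int) -> float:
    """Fraction of relevant chunks among the top-k retrieved chunks.

    `relevance` holds 1/0 labels in retrieval order (rank 1 first).
    """
    top_k = relevance[:k]
    if not top_k:
        return 0.0
    return sum(top_k) / len(top_k)

def context_precision(relevance: List[int]) -> float:
    """Position-weighted precision over the whole retrieved context.

    Averages precision@k at every rank that holds a relevant chunk, so
    relevant chunks ranked earlier contribute more. This is a common
    formulation; the hosted metric may weight positions differently.
    """
    total_relevant = sum(relevance)
    if not total_relevant:
        return 0.0
    scores = [
        precision_at_k(relevance, k)
        for k, rel in enumerate(relevance, start=1)
        if rel
    ]
    return sum(scores) / total_relevant

# Retrieval order: relevant, irrelevant, relevant, irrelevant, relevant
labels = [1, 0, 1, 0, 1]
print(precision_at_k(labels, 3))   # 2/3 ≈ 0.67
print(context_precision(labels))   # (1/1 + 2/3 + 3/5) / 3 ≈ 0.76
```

In this example, moving a relevant chunk earlier in the retrieval order raises the position-weighted score even though the plain top-K precision stays the same, which is what makes the metric useful for evaluating ranking quality rather than just recall.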

Next steps