Context relevance
Context Relevance measures whether the retrieved context contains enough information to answer the user query. High Context Relevance values indicate strong confidence that the context is sufficient to answer the question. Low Context Relevance values are a sign that you need to increase your Top K, modify your retrieval strategy, or use better embeddings.
Context Relevance is distinct from Context Adherence. Context Relevance evaluates whether the retrieved context is relevant to the user’s query, whereas Context Adherence evaluates how well the response aligns with the provided context.
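The following toy sketch of embedding-based top-k retrieval illustrates the Top K advice. It is not how Context Relevance itself is computed. The chunk texts, the three-dimensional vectors, and the `retrieve_top_k` helper are all hypothetical; real systems use an embedding model and a vector store. With `k=1`, the retrieved context misses the fact that refund requests need an order number. Raising `k` to 2 brings that chunk in.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Hypothetical helper: return the texts of the k chunks most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy (text, embedding) pairs; a real system would embed text with a model.
chunks = [
    ("Refunds are issued within 14 days.", [0.9, 0.1, 0.0]),
    ("Refund requests need an order number.", [0.7, 0.3, 0.1]),
    ("Our office is closed on Sundays.", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.0]  # stands in for "How do I get a refund?"

print(retrieve_top_k(query, chunks, k=1))  # only the 14-day chunk
print(retrieve_top_k(query, chunks, k=2))  # both refund chunks
```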
Best practices
Use for Results Assessment
Use Context Relevance to assess the quality of your retrieval system’s results and how well the retrieved context matches each query.
Combine with Other Metrics
Use Context Relevance alongside the Context Adherence, Correctness, and Completeness metrics for a comprehensive view of response quality.
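One way to combine metrics is to use them for triage. The sketch below assumes per-sample scores in [0, 1] and an arbitrary 0.5 threshold. `MetricScores` and `triage` are hypothetical names, not part of any product API. Following the distinction between the two metrics described above, low Context Relevance points at retrieval. High relevance combined with low Context Adherence points at generation.

```python
from dataclasses import dataclass


@dataclass
class MetricScores:
    """Hypothetical per-sample scores, assumed to lie in [0, 1]."""
    context_relevance: float
    context_adherence: float


def triage(scores: MetricScores, threshold: float = 0.5) -> str:
    """Suggest where to look first; the 0.5 threshold is an arbitrary assumption."""
    if scores.context_relevance < threshold:
        # The retrieved context probably cannot answer the query: fix retrieval first.
        return "retrieval: raise Top K, change the retrieval strategy, or improve embeddings"
    if scores.context_adherence < threshold:
        # The context was sufficient, but the response did not stay grounded in it.
        return "generation: the response does not align with the provided context"
    return "ok"


print(triage(MetricScores(context_relevance=0.2, context_adherence=0.9)))
print(triage(MetricScores(context_relevance=0.9, context_adherence=0.3)))
print(triage(MetricScores(context_relevance=0.9, context_adherence=0.9)))
```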
Performance Benchmarks
We evaluated Context Relevance against human expert labels on an internal dataset of RAG samples, using several frontier models.
| Model | F1 (True class) |
|---|---|
| GPT-4.1 | 0.82 |
| GPT-4.1-mini (judges=3) | 0.85 |
| Claude Sonnet 4.5 | 0.81 |
| Gemini 3 Flash | 0.81 |
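The `judges=3` entry presumably combines the verdicts of three judge calls. This page does not say how those verdicts are combined. A simple majority vote is one common choice, and it is assumed here purely for illustration. The sketch uses a hypothetical `majority_vote` helper.

```python
from collections import Counter


def majority_vote(verdicts: list[bool]) -> bool:
    """Combine boolean judge verdicts by simple majority (an assumed aggregation rule)."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    if len(verdicts) % 2 == 0:
        raise ValueError("use an odd number of judges to avoid ties")
    counts = Counter(verdicts)
    return counts[True] > counts[False]


print(majority_vote([True, False, True]))   # True
print(majority_vote([False, False, True]))  # False
```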
GPT-4.1 Classification Report
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| False | 0.82 | 0.99 | 0.89 |
| True | 0.97 | 0.71 | 0.82 |
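F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R). The sketch below is plain Python with hypothetical helper names. It computes the three scores from one class's confusion-matrix counts. It also checks that the True row's precision of 0.97 and recall of 0.71 give the reported F1 of 0.82.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one class, given its true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Reproduce the True-class F1 from the reported precision and recall.
print(round(f1_from(0.97, 0.71), 2))  # 0.82
```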
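Confusion Matrix (Normalized)
Rows are actual labels and columns are predicted labels; each row sums to 1.
| Actual / Predicted | True | False |
|---|---|---|
| True | 0.708 | 0.292 |
| False | 0.014 | 0.986 |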
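A row-normalized confusion matrix divides each row of raw counts by that row's total, so each row sums to 1. The diagonal entries are then the per-class recall values: 0.708 and 0.986, which match the Recall column of the classification report after rounding. The counts below are hypothetical, chosen only to reproduce the normalized values, because the dataset's actual class sizes are not given.

```python
def normalize_rows(matrix: list[list[int]]) -> list[list[float]]:
    """Divide each row (one actual class) by its total so every row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Hypothetical raw counts: rows = actual [True, False], columns = predicted [True, False].
counts = [
    [708, 292],
    [14, 986],
]

for label, row in zip(["True", "False"], normalize_rows(counts)):
    print(label, [round(x, 3) for x in row])
```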
Benchmarks based on internal evaluation dataset. Performance may vary by use case.
Related Resources
If you would like to dive deeper or start implementing Context Relevance, check out the following resources:
Examples
- Context Relevance Examples - Log in and explore the “Context Relevance” Log Stream in the “Preset Metric Examples” Project to see this metric in action.