Context adherence with Luna-2
You can also use Galileo's proprietary Evaluation SLMs to calculate context adherence. Context Adherence Luna is computed with Galileo's in-house small language models (Luna-2) and is a cost-effective way to scale up your RAG evaluation workflows. To use Luna-2 for context adherence or other metrics, reach out to our team.

Performance Benchmarks
We evaluated Context Adherence against human expert labels on an internal dataset of RAG samples using top frontier models.

| Model | F1 (True) |
|---|---|
| GPT-4.1 | 0.90 |
| GPT-4.1-mini (judges=3) | 0.90 |
| Claude Sonnet 4.5 | 0.89 |
| Gemini 3 Flash | 0.89 |
GPT-4.1 Classification Report
| Label | Precision | Recall | F1-Score |
|---|---|---|---|
| False | 0.90 | 0.89 | 0.89 |
| True | 0.89 | 0.90 | 0.90 |
Confusion Matrix (Normalized)
| | Predicted True | Predicted False |
|---|---|---|
| Actual True | 0.898 | 0.102 |
| Actual False | 0.108 | 0.892 |
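
The GPT-4.1 classification report can be recovered from the normalized confusion matrix above. The following is a minimal sketch, not Galileo's evaluation code. It assumes the matrix is row-normalized (each row of actual labels sums to 1) and that both classes have equal support; the source does not state the class balance. The function name `classification_report` is illustrative.

```python
# Row-normalized confusion matrix from the GPT-4.1 benchmark:
# outer keys are actual labels, inner keys are predicted labels.
confusion = {
    "True": {"True": 0.898, "False": 0.102},
    "False": {"True": 0.108, "False": 0.892},
}


def classification_report(matrix):
    """Compute per-label precision, recall, and F1.

    Assumption (not stated in the source): both classes have equal
    support, so row-normalized rates can be summed across rows.
    """
    report = {}
    for label in matrix:
        true_positive = matrix[label][label]
        # Column sum: everything predicted as `label`.
        predicted_total = sum(matrix[actual][label] for actual in matrix)
        # Row sum: everything whose actual label is `label`.
        actual_total = sum(matrix[label].values())
        precision = true_positive / predicted_total
        recall = true_positive / actual_total
        f1 = 2 * precision * recall / (precision + recall)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


report = classification_report(confusion)
for label, scores in report.items():
    print(label, {name: round(value, 2) for name, value in scores.items()})
```

Under the equal-support assumption, the rounded values agree with the classification report: 0.89 / 0.90 / 0.90 for True and 0.90 / 0.89 / 0.89 for False.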
Benchmarks are based on an internal evaluation dataset. Performance may vary by use case.
Related Resources
If you would like to dive deeper or start implementing Context Adherence, check out the following resources:

Examples
- Context Adherence Examples - Log in and explore the “Context Adherence” Log Stream in the “Preset Metric Examples” Project to see this metric in action.