Sexism Detection flags responses that contain sexist content. The output is a binary classification: sexist or not sexist.
Calculation method
Sexism detection is computed through a specialized process:
Model Architecture
The detection system is built on a Small Language Model (SLM) trained on a combination of open-source datasets and carefully curated internal datasets to identify various forms of sexist content.
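Below is a minimal sketch of how such a binary classifier might be invoked, assuming a Hugging Face-style text-classification checkpoint. The model name `your-org/sexism-detection-slm` and the `sexist` label are placeholders, not the actual internal model or its label mapping.

```python
from transformers import pipeline

# Hypothetical checkpoint; substitute the SLM your deployment actually uses.
classifier = pipeline(
    "text-classification",
    model="your-org/sexism-detection-slm",
)

def is_sexist(response: str) -> bool:
    """Return True if the response is classified as sexist."""
    result = classifier(response, truncation=True)[0]
    # Label names depend on the checkpoint's config; "sexist" is assumed here.
    return result["label"] == "sexist"
```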
Optimizing your AI system
Addressing Sexism in Your System
Identify responses that contain sexist comments and take preventive measures to ensure fair and unbiased AI interactions:
- Implement guardrails: Flag responses before they are served to keep sexist content from reaching users (see the sketch after this list).
- Fine-tune models: Adjust model behavior to reduce sexist outputs.
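The guardrail pattern from the list above can be as simple as a gate in the serving path. This sketch reuses the hypothetical `is_sexist` helper from the earlier example; the fallback message and logging hook are illustrative choices, not a prescribed implementation.

```python
import logging

logger = logging.getLogger("guardrails")

FALLBACK_MESSAGE = "I'm not able to share that response."

def serve_response(response: str) -> str:
    """Gate a model response through the sexism detector before serving it."""
    if is_sexist(response):  # hypothetical helper from the earlier sketch
        # Flagged: log for later review and return a safe fallback instead.
        logger.warning("Response flagged by Sexism Detection; fallback served.")
        return FALLBACK_MESSAGE
    return response
```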
Performance Benchmarks
We evaluated Sexism Detection against gold labels on the “test” split of the TomData/TG-sexism_balanced open-source dataset using top frontier models.
| Model | F1 (True) |
|---|---|
| GPT-4.1 | 0.91 |
| GPT-4.1-mini (judges=3) | 0.89 |
| Claude Sonnet 4.5 | 0.87 |
| Gemini 3 Flash | 0.89 |
GPT-4.1 Classification Report
| | Precision | Recall | F1-Score |
|---|---|---|---|
| False | 0.93 | 0.88 | 0.90 |
| True | 0.88 | 0.94 | 0.91 |
Confusion Matrix (Normalized)
| Actual \ Predicted | True | False |
|---|---|---|
| True | 0.938 | 0.062 |
| False | 0.123 | 0.877 |
Benchmarks based on the TomData/TG-sexism_balanced open-source dataset. Performance may vary by use case.
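To reproduce an evaluation like this against your own detector, the sketch below shows one way to score predictions on the dataset's test split with scikit-learn. It assumes the dataset exposes `text` and boolean `label` columns (check the actual schema) and reuses the hypothetical `is_sexist` helper from earlier.

```python
from datasets import load_dataset
from sklearn.metrics import classification_report, confusion_matrix

# Assumed schema: a "text" column and a boolean "label" column; adjust to
# the actual layout of TomData/TG-sexism_balanced if it differs.
test = load_dataset("TomData/TG-sexism_balanced", split="test")

y_true = [bool(row["label"]) for row in test]
y_pred = [is_sexist(row["text"]) for row in test]  # hypothetical detector helper

print(classification_report(y_true, y_pred, target_names=["False", "True"]))
# Row-normalized confusion matrix, ordered [True, False] to match the table above.
print(confusion_matrix(y_true, y_pred, labels=[True, False], normalize="true"))
```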
Related Resources
If you would like to dive deeper or start implementing Sexism Detection, check out the following resources:
Examples
- Sexism Examples - Log in and explore the “Sexism” Log Stream in the “Preset Metric Examples” Project to see this metric in action.