
Sexism Detection flags whether a response contains sexist content. The output is a binary classification: the response is either sexist or not sexist.

Calculation method

Sexism detection is computed through a specialized process:
1. Model Architecture

The detection system is built on a Small Language Model (SLM) that combines training from both open-source datasets and carefully curated internal datasets to identify various forms of sexist content.
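For illustration, the sketch below shows how a small binary text classifier of this kind could be applied to a response using the Hugging Face `transformers` pipeline. The model id and label names are hypothetical placeholders; the production SLM described above is internal and not published.

```python
# A minimal sketch, assuming a hypothetical checkpoint and label scheme;
# the production SLM is internal, so this is illustrative only.
from transformers import pipeline

# "your-org/sexism-detector-slm" is a placeholder model id, not a real checkpoint.
classifier = pipeline("text-classification", model="your-org/sexism-detector-slm")

response_text = "Example model response to evaluate."
result = classifier(response_text)[0]       # e.g. {"label": "sexist", "score": 0.97}

# Map the classifier label to the metric's binary output.
is_sexist = result["label"] == "sexist"     # assumed label name
print(is_sexist, round(result["score"], 3))
```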
2. Performance Validation

The model demonstrates robust detection capabilities, with an 83% accuracy rate when tested against the Explainable Detection of Online Sexism (EDOS) dataset, a widely recognized benchmark for sexism detection.
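As a rough illustration of this validation step, the sketch below computes accuracy against gold labels from a local copy of the EDOS data. The file path, column names, and the `predict_sexist` stub are placeholders; the actual evaluation pipeline and model are internal.

```python
# A minimal sketch of the validation step, under the assumption that a labelled
# EDOS CSV is available locally and that a detector callable exists.
import pandas as pd
from sklearn.metrics import accuracy_score

def predict_sexist(text: str) -> bool:
    """Placeholder: replace with a call to your sexism detector; returns True when flagged."""
    raise NotImplementedError

edos = pd.read_csv("edos_labelled_aggregated.csv")    # hypothetical local copy of EDOS
gold = (edos["label_sexist"] == "sexist").tolist()    # assumed gold-label column
preds = [predict_sexist(text) for text in edos["text"]]

print(f"Accuracy: {accuracy_score(gold, preds):.2%}")  # the SLM reaches ~83% on this benchmark
```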

Optimizing your AI system

Addressing Sexism in Your System

When sexist content is detected in your system, consider these approaches:
Implement guardrails: Flag responses before they are served so sexist content never reaches users (see the guardrail sketch after this list).
Fine-tune models: Adjust model behavior to reduce sexist outputs.
Monitor responses: Identify responses that contain sexist content and take preventive measures to ensure fair and unbiased AI interactions.
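The sketch below shows one way a guardrail could wrap the detector: the detection call is passed in as a function, and flagged responses are replaced with a safe fallback before being served. The function names and fallback message are illustrative, not part of the product API.

```python
# A minimal guardrail sketch, assuming the detector is exposed as a callable
# that returns True when a text is classified as sexist.
from typing import Callable

SAFE_FALLBACK = "I'm not able to share that response. Let me rephrase it respectfully."

def guarded_response(candidate: str, detect_sexism: Callable[[str], bool]) -> str:
    """Check a candidate response before it is served; block it when sexism is detected."""
    if detect_sexism(candidate):
        # Block the flagged response (and optionally log it for review or fine-tuning data).
        return SAFE_FALLBACK
    return candidate

# Example usage with a trivial stand-in detector (replace with the real metric call).
print(guarded_response("Hello! How can I help today?", detect_sexism=lambda text: False))
```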

Performance Benchmarks

We evaluated Sexism Detection against gold labels on the “test” split of the TomData/TG-sexism_balanced open-source dataset using top frontier models.
| Model | F1 (True) |
| --- | --- |
| GPT-4.1 | 0.91 |
| GPT-4.1-mini (judges=3) | 0.89 |
| Claude Sonnet 4.5 | 0.87 |
| Gemini 3 Flash | 0.89 |

GPT-4.1 Classification Report

| | Precision | Recall | F1-Score |
| --- | --- | --- | --- |
| False | 0.93 | 0.88 | 0.90 |
| True | 0.88 | 0.94 | 0.91 |
Confusion Matrix (Normalized)

| Actual \ Predicted | True | False |
| --- | --- | --- |
| True | 0.938 | 0.062 |
| False | 0.123 | 0.877 |
Benchmarks based on the TomData/TG-sexism_balanced open-source dataset. Performance may vary by use case.
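If you want to reproduce figures like these for your own judge, the sketch below shows how the reported numbers map onto standard scikit-learn calls: F1 on the True class, a per-class classification report, and a row-normalized confusion matrix. The `gold` and `preds` lists are placeholders for labels and predictions on the dataset's test split.

```python
# A minimal sketch of the benchmark computation, assuming boolean gold labels
# and predictions from the dataset's test split (placeholder values below).
from sklearn.metrics import classification_report, confusion_matrix, f1_score

gold = [True, False, True, False]   # gold labels (placeholder)
preds = [True, False, True, True]   # judge predictions (placeholder)

print(f"F1 (True): {f1_score(gold, preds, pos_label=True):.2f}")
print(classification_report(gold, preds, target_names=["False", "True"]))

# Rows are actual classes (True first), normalized so each row sums to 1.
print(confusion_matrix(gold, preds, labels=[True, False], normalize="true"))
```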
If you would like to dive deeper or start implementing Sexism Detection, check out the following resources:

Examples

  • Sexism Examples - Log in and explore the “Sexism” Log Stream in the “Preset Metric Examples” Project to see this metric in action.