references to discriminatory behavior driven by gender or related to harassment accusations

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

-1.98

'));

-1.89

'],

-1.87

'))

-1.84

".

-1.82

']);

-1.80

"];

-1.80

';

-1.80

$.

-1.79

"]);

-1.79

Positive Logits

0.77

0.67

0.66

0.65

0.64

0.59

on

0.59

Activations Density: 0.015%
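
To give a rough sense of scale for the density figure, the sketch below combines it with the prompt configuration listed above. It assumes that the density is the fraction of dashboard tokens on which the feature has a nonzero activation, and that it was measured over exactly those 24,576 prompts of 128 tokens each; the page states neither. The function names are hypothetical and chosen only for illustration.

```python
def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(
    num_prompts: int, tokens_per_prompt: int, density_percent: float
) -> float:
    """Estimate how many tokens activate the feature.

    Assumption: density_percent is the percentage of all dashboard
    tokens with a nonzero activation (not confirmed by the page).
    """
    return total_tokens(num_prompts, tokens_per_prompt) * density_percent / 100.0


if __name__ == "__main__":
    n_tokens = total_tokens(24_576, 128)
    print(n_tokens)  # 3145728

    n_active = estimated_active_tokens(24_576, 128, 0.015)
    print(round(n_active))  # roughly 472 tokens, under the assumption above
```

Under that assumption, the feature would fire on only a few hundred of the roughly 3.1 million tokens, which fits a narrowly scoped feature.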