Explanations
references to racism and racial prejudices
Negative Logits
Token            Logit
Personendaten    -0.67
يتيمه            -0.64
DockStyle        -0.63
faptul           -0.62
TagMode          -0.60
Clik             -0.59
DSS              -0.59
OutputType       -0.57
parse            -0.56
perine           -0.55
Positive Logits
Token            Logit
racist           0.99
Racism           0.94
racism           0.91
racist           0.88
Racism           0.80
discriminatory   0.74
discrimin        0.74
Discrimin        0.72
prejudices       0.71
Prejudice        0.70
Activations Density 0.024%