INDEX
Explanations
references to discrimination based on sexual orientation
Negative Logits
uckland      -0.74
Purg         -0.68
jam          -0.67
answered     -0.65
jet          -0.65
nan          -0.64
Edit         -0.64
upd          -0.63
Loop         -0.63
cean         -0.62
Positive Logits
nationality  1.12
ethnicity    1.07
gender       0.96
resemblance  0.87
colour       0.87
creed        0.85
whim         0.85
whims        0.85
likeness     0.85
merit        0.83
Activations Density 0.239%