INDEX
Explanations
references to minority groups or concepts related to being a minority
Negative Logits
amine      -0.19
changer    -0.17
_MPI       -0.16
butt       -0.16
arme       -0.15
eding      -0.15
acier      -0.15
andon      -0.15
en         -0.15
eller      -0.15
Positive Logits
itized     0.28
league     0.23
(<         0.22
-league    0.21
itarian    0.19
-major     0.18
leagues    0.17
/no        0.17
league     0.17
itty       0.16
Activations Density 0.016%