Explanations
references to male athletes or male-related achievements in sports
Negative Logits
er      -0.19
殿      -0.16
dy      -0.16
起      -0.15
nev     -0.15
zt      -0.14
볨      -0.14
oui     -0.14
ezi     -0.14
elyn    -0.14
Positive Logits
heimer   0.22
enheim   0.17
heim     0.17
NB       0.17
iche     0.16
agement  0.16
405      0.16
boro     0.16
chaft    0.16
esk      0.15
Activation Density: 0.005%
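A density of 0.005% means the feature fires very rarely, roughly once per 20,000 tokens. The sketch below assumes that density is the percentage of token positions where the feature's activation is above zero. The function name `activation_density` and its `threshold` parameter are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its activation
    is strictly greater than `threshold` (zero by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 1 token out of 20,000.
acts = [0.0] * 19_999 + [3.2]
print(f"{activation_density(acts):.3f}%")
```

Under this definition, one firing in 20,000 tokens gives 100 × 1 / 20,000 = 0.005%.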