INDEX
Explanations
references to male athletes or sportsmen
Negative Logits
er
-0.17
जन
-0.16
edii
-0.15
dy
-0.15
iais
-0.15
ty
-0.15
rette
-0.15
skl
-0.15
RYPTO
-0.15
πη
-0.15
Positive Logits
heimer
0.24
Islands
0.19
heim
0.18
agement
0.18
lig
0.17
chor
0.17
nen
0.16
ually
0.16
ematik
0.15
sàng
0.15
Activations Density 0.004%