Explanations
references to the term "man" or its variations
Negative Logits
kilde         -0.48
yll           -0.46
orios         -0.45
inmediato     -0.45
xiu           -0.44
TokenNameR    -0.41
uque          -0.41
reshold       -0.41
笈            -0.41
Bildungs      -0.41
Positive Logits
Man           1.20
Man           1.14
man           1.14
man           1.02
MAN           0.98
MAN           0.88
Manning       0.82
Mann          0.81
Woman         0.79
woman         0.74
Activations Density 0.013%