Explanations
the word "man"
Negative Logits
Token     Logit
man       -2.86
Man       -1.98
Man       -1.94
man       -1.89
MAN       -1.75
MAN       -1.42
mans      -1.30
hombre    -1.23
homem     -1.19
mann      -1.13
Positive Logits
Token     Logit
Men       0.59
Men       0.52
Regel     0.49
men       0.47
MEN       0.47
of        0.45
✭         0.44
msub      0.43
'         0.43
Oby       0.43
Activations Density: 0.272%