Explanations
the word "man" in various contexts and frequencies
NEGATIVE LOGITS
tone      -0.19
imler     -0.17
yaw       -0.17
sher      -0.17
shire     -0.16
ermen     -0.16
онах      -0.16
ammers    -0.15
shal      -0.15
Msp       -0.15
POSITIVE LOGITS
made       0.32
agements   0.28
tras       0.28
ufac       0.28
hattan     0.28
agment     0.26
ouver      0.25
-made      0.24
eu         0.23
hole       0.23
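The positive and negative logit lists above rank vocabulary tokens by how strongly this feature pushes the next-token prediction toward or away from them. A common way to produce such lists is to project the feature's decoder direction through the model's unembedding matrix and take the top and bottom k entries. The sketch below assumes exactly that simplified direct path (the final LayerNorm is ignored). The function name `top_logit_effects`, the variables `decoder_vec` and `W_U`, and the toy vocabulary are hypothetical, for illustration only.

```python
import numpy as np


def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Rank tokens by the direct effect a feature's decoder direction has on their logits.

    Assumptions (hypothetical names, simplified model):
      decoder_vec: shape (d_model,), the feature's decoder row.
      W_U:         shape (d_model, n_vocab), the unembedding matrix.
      vocab:       list of n_vocab token strings.
    The final LayerNorm is ignored, so this is only the direct-path effect.
    """
    effects = decoder_vec @ W_U            # (n_vocab,) logit change per token
    order = np.argsort(effects)            # token indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: identity unembedding, so each effect equals the decoder entry.
vocab = ["made", "tone", "hattan", "the"]
W_U = np.eye(4)
decoder_vec = np.array([0.32, -0.19, 0.28, 0.05])
neg, pos = top_logit_effects(decoder_vec, W_U, vocab, k=2)
print(pos)  # [('made', 0.32), ('hattan', 0.28)]
print(neg)  # [('tone', -0.19), ('the', 0.05)]
```

With a real model, `W_U` would be the trained unembedding and `k=10` would give lists of the length shown above.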
Activation density: 0.014%
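Activation density is typically the fraction of token positions in a sample on which the feature fires. At 0.014%, that is about 14 positions per 100,000 tokens, so this is a very sparse feature. A minimal sketch of the arithmetic, assuming "fires" means an activation strictly above zero (the threshold and the function name `activation_density` are assumptions, and the activations are synthetic):

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Synthetic activations: the feature fires on 14 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:14] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.014%
```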