INDEX
Explanations
occurrences of the word "man" and its derivatives in various contexts
Negative Logits
ts        -0.17
alars     -0.16
gor       -0.15
imler     -0.14
inz       -0.14
ÏĢη       -0.14
icket     -0.14
elm       -0.14
Larson    -0.14
ügen      -0.14
Positive Logits
agements  0.28
hattan    0.26
tras      0.26
iscal     0.25
agment    0.23
ifest     0.23
handled   0.22
ifold     0.22
fred      0.22
uka       0.22
Activations Density: 0.028%