INDEX
Explanations
occurrences of the word "Man" or variations related to it
Negative Logits
istrovstvÃŃ  -0.17
imler        -0.16
genic        -0.16
ç¶           -0.14
gs           -0.14
rez          -0.14
klass        -0.14
graphics     -0.14
side         -0.14
leanor       -0.14
Positive Logits
ifold      0.26
iac        0.25
tras       0.25
hattan     0.25
agements   0.24
iscal      0.24
fred       0.24
handled    0.23
ificent    0.22
agment     0.21
Activations Density 0.028%
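
As an informal sanity check on the explanation, the sketch below prefixes each positive-logit token listed above with "Man" and prints the result. The prefix is an assumption taken from the explanation text, and the names `positive_logits`, `PREFIX`, and `complete` are hypothetical helpers, not part of any tool. It only concatenates strings; it does not query a model.

```python
# Top positive-logit tokens and their values, copied from the list above.
positive_logits = {
    "ifold": 0.26,
    "iac": 0.25,
    "tras": 0.25,
    "hattan": 0.25,
    "agements": 0.24,
    "iscal": 0.24,
    "fred": 0.24,
    "handled": 0.23,
    "ificent": 0.22,
    "agment": 0.21,
}

# Assumption: the prefix comes from the feature explanation ("Man").
PREFIX = "Man"


def complete(prefix, tokens):
    """Concatenate the prefix with each candidate continuation token."""
    return [prefix + token for token in tokens]


completions = complete(PREFIX, positive_logits)

# Print each concatenation next to the logit value of its token.
for word, value in zip(completions, positive_logits.values()):
    print(f"{word:<14}{value:+.2f}")
```

Several of the concatenations read as familiar words or names (for example "Manifold", "Manhattan", "Manfred"), which is at least consistent with the explanation; others do not form standard words, so this is a rough check rather than evidence.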