INDEX
Explanations
proper nouns or titles related to individuals or characters
mentions of the word "man"
Negative Logits
CAST      -0.77
ritical   -0.76
irtual    -0.74
rss       -0.73
¥ŀ        -0.73
iesel     -0.72
ython     -0.71
daily     -0.71
insured   -0.70
Seym      -0.70
Positive Logits
hunt      1.22
hood      1.13
gling     0.98
uscript   0.94
volent    0.92
liness    0.90
agers     0.90
ifest     0.89
liest     0.88
nered     0.81
Activations Density 0.061%