Explanations
full names of people
proper nouns, especially names
Negative Logits
selection      -0.64
akedown        -0.63
noon           -0.60
cessive        -0.59
Round          -0.58
geries         -0.56
berra          -0.55
levant         -0.55
intervening    -0.55
lockout        -0.55
Positive Logits
said           1.14
joked          1.08
exclaimed      1.07
says           1.03
explained      1.03
told           1.02
replied        1.00
remarked       1.00
laughed        0.99
wrote          0.96
Activations Density: 0.179%