INDEX
Explanations
phrases related to statements and opinions made by a specific individual
instances of the pronoun "he" referring to a male subject
Negative Logits
noon           -0.80
etheless       -0.69
cannabin       -0.67
cious          -0.66
BALL           -0.64
interfering    -0.64
berra          -0.63
rocket         -0.63
Operation      -0.62
visible        -0.62
Positive Logits
said            1.19
wrote           1.11
joked           1.04
says            1.03
tweeted         1.01
said            0.99
told            0.96
'd              0.96
explained       0.91
added           0.91
Activations Density: 0.056%