Explanations
phrases related to opinions or statements made by a specific person
references to the pronoun "he"
Negative Logits
Token        Logit
noon         -0.84
etheless     -0.75
acters       -0.72
rocket       -0.71
evidence     -0.64
Operation    -0.64
selection    -0.60
totality     -0.60
fractions    -0.60
lihood       -0.57
Positive Logits
Token        Logit
said         1.43
wrote        1.33
says         1.24
joked        1.17
exclaimed    1.15
said         1.14
tweeted      1.14
explained    1.13
told         1.12
replied      1.08
Activations Density 0.065%
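
The density figure is most naturally read as the fraction of sampled tokens on which the feature activates at all. The page does not define it, so this reading is an assumption. Below is a minimal sketch of that calculation under the same assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) activation; the source page does not state this.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 10,000 tokens, 7 of which activate the feature.
sample = [0.0] * 9993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.065% would mean the feature fires on roughly 6 to 7 of every 10,000 tokens, which fits a narrow feature keyed to speech-attribution contexts.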