INDEX
Explanations
words expressing personal opinions or statements made by individuals
instances of speech or quotations in the text
Negative Logits
Role      -0.74
taboola   -0.69
mol       -0.66
Charge    -0.66
oil       -0.66
eneg      -0.65
graded    -0.64
shut      -0.63
isol      -0.62
otin      -0.61
Positive Logits
orically  0.73
sarcast   0.69
Donna     0.68
Joyce     0.68
Dave      0.68
anecd     0.67
heny      0.67
Stef      0.66
bluntly   0.66
Angela    0.66
Activations Density 0.056%