INDEX
Explanations
dialogue segments encapsulated in quotation marks
repeated quotation marks in the text
Negative Logits
fres      -0.79
prec      -0.74
sprint    -0.73
endeav    -0.72
valued    -0.72
adjud     -0.71
care      -0.70
spr       -0.70
repro     -0.70
pir       -0.70
Positive Logits
Obviously  1.25
Certainly  1.25
That       1.25
Because    1.25
What       1.24
But        1.24
We         1.23
Indeed     1.22
They       1.22
Yeah       1.22
Activations Density 0.119%