INDEX
Explanations
short phrases or sentences expressing certainty or confidence
punctuation marks and quotation marks in the text
Negative Logits
intentional    -0.56
planned        -0.55
automated      -0.54
stray          -0.53
planning       -0.53
advanced       -0.52
leve           -0.51
formally       -0.51
wildlife       -0.50
morp           -0.49
Positive Logits
↵Âł            0.88
Otherwise      0.85
Therefore      0.85
Anyway         0.81
Thus           0.79
<|endoftext|>  0.79
Similarly      0.79
Likewise       0.79
Nevertheless   0.78
Moreover       0.76
Activations Density 0.688%