INDEX
Explanations
punctuation marks, particularly periods and exclamation points
Negative Logits
rall             -0.81
targ             -0.76
volunte          -0.75
uninterrupted    -0.73
skelet           -0.72
perty            -0.72
teasp            -0.72
accomp           -0.72
answ             -0.72
behavi           -0.70
Positive Logits
Didn             1.03
Exactly          1.02
What             1.02
Lots             1.00
That             0.98
Yeah             0.98
Okay             0.97
Maybe            0.96
Something        0.96
Anyway           0.96
Activations Density: 0.046%