INDEX
Explanations
discourse markers and words that indicate reasoning or contrast
punctuation
Negative Logits
^(@)
-0.96
/>";
-0.96
)|^{
-0.90
]";
-0.87
$")
-0.84
%";
-0.82
NUMX
-0.82
_))
-0.81
_
-0.81
%");
-0.80
Positive Logits
.
1.04
,
0.85
?
0.79
!
0.73
;
0.67
…
0.60
—
0.60
..
0.58
:
0.57
(
0.54
Activations Density 5.635%