INDEX
Explanations
expressions conveying uncertainty or disagreement
yes or no, doubt, or exclamation
Negative Logits
[@BOS@]       -0.75
<unused17>    -0.75
<pad>         -0.75
<unused68>    -0.74
<unused42>    -0.74
<unused3>     -0.74
<unused14>    -0.74
<unused23>    -0.74
<unused16>    -0.74
<unused8>     -0.74
Positive Logits
!             0.42
OMITBAD       0.35
Yes           0.33
probably      0.33
indeed        0.31
surely        0.31
EMPTY         0.30
It            0.30
very          0.29
regler        0.28
Activations Density 0.035%