INDEX
Explanations
punctuation marks, particularly commas
descriptive adjectives
Negative Logits
AddTagHelper  -0.96
zwiſchen  -0.90
<unused43>  -0.90
<unused28>  -0.89
<unused23>  -0.89
<unused8>  -0.89
<unused14>  -0.89
<unused51>  -0.89
[@BOS@]  -0.89
<pad>  -0.89
Positive Logits
,  0.42
,  0.33
[unrendered token]  0.33
cool  0.32
big  0.32
、  0.32
:  0.31
old  0.31
"  0.31
[unrendered token]  0.30
Activations Density 0.023%