Explanations
punctuation marks and formatting symbols in the text
specific following tokens
Negative Logits
Harms        -0.39
enumi        -0.39
.            -0.36
quema        -0.34
Several      -0.33
this         -0.33
descendre    -0.33
táctil       -0.33
relsen       -0.32
리           -0.32
Positive Logits
Vidite       0.71
<unused28>   0.69
<unused52>   0.69
<unused23>   0.69
<unused41>   0.69
<unused16>   0.69
ſſung        0.69
<unused8>    0.69
<unused3>    0.69
[@BOS@]      0.69
Activations Density: 0.038%