INDEX
Explanations
segments of text containing repeated patterns or structures
code, math, and text data
Negative Logits
ſelf         -0.87
faſt         -0.85
<unused14>   -0.82
$_"          -0.82
<unused16>   -0.82
<unused8>    -0.82
<unused52>   -0.82
<unused68>   -0.82
<unused51>   -0.82
[@BOS@]      -0.82
Positive Logits
1.48
0.99
0.95
0.89
0.89
0.88
0.87
0.86
0.76
0.75
Activations Density 0.042%