Explanations
sequences of underscores in code
Negative Logits
queſta             -1.34
Administrativna    -1.27
[@BOS@]            -1.26
<unused8>          -1.26
<unused52>         -1.26
<unused79>         -1.26
<unused51>         -1.26
<unused41>         -1.26
<unused16>         -1.26
<unused23>         -1.26
Positive Logits
(blank in source)  0.71
↵↵                 0.50
↵                  0.46
(blank in source)  0.45
$                  0.43
start              0.42
/                  0.39
T                  0.39
Start              0.38
n                  0.38
Activations Density 0.001%