INDEX
Explanations
References to module definitions and specific data-handling structures in code
Negative Logits
<                    -0.51
1                    -0.50
2                    -0.47
↵                    -0.45
(unrendered token)   -0.45
3                    -0.44
><                   -0.40
>                    -0.40
0                    -0.40
}^{                  -0.39
Positive Logits
beſte        0.91
Geſch        0.90
<unused8>    0.88
[@BOS@]      0.88
<unused52>   0.88
<unused42>   0.87
<unused43>   0.87
<unused41>   0.87
<unused16>   0.87
<unused23>   0.87
Activation Density: 0.581%