Explanations
quotations and associated dialogue
Negative Logits
leo       -0.17
STRU      -0.17
blas      -0.15
antro     -0.15
ectl      -0.15
inalg     -0.15
ADDE      -0.14
-gnu      -0.14
ureka     -0.14
ëĭĪëĭ¤    -0.14
Positive Logits
åºŃ       0.18
Peak      0.16
peak      0.15
Gio       0.15
lines     0.14
íģ        0.14
ration    0.14
020       0.13
spread    0.13
credit    0.13
Activations Density 0.003%