Explanations
references to reports and rebuttals in a discussion or commentary context
Negative Logits
ñana       -0.18
163        -0.16
ìłķ        -0.16
Nova       -0.15
éal        -0.15
563        -0.14
TTY        -0.14
silk       -0.14
Silk       -0.14
Jenkins    -0.14
Positive Logits
below      0.16
oulos      0.16
abaixo     0.15
bedo       0.15
-IN        0.15
imizi      0.15
UNUSED     0.15
Below      0.15
Below      0.15
vyk        0.14
Activations Density 0.099%