INDEX
Explanations
mathematical or technical terminology and symbols used in formal contexts
Negative Logits
<unused23>    -1.54
<unused41>    -1.54
<unused16>    -1.53
<unused8>     -1.53
<unused42>    -1.53
<unused79>    -1.53
<unused74>    -1.53
<unused51>    -1.53
<unused43>    -1.53
<pad>         -1.52
Positive Logits
.             0.76
↵↵            0.65
,             0.64
(blank)       0.60
(             0.53
1             0.52
-             0.49
;             0.48
"             0.48
-             0.48
Activations Density 0.217%