INDEX
Explanations
mathematical symbols and expressions related to formal logic or quantifiers
mathematical notation and logic symbols
Negative Logits
<unused68>    -1.09
<pad>         -1.09
[@BOS@]       -1.08
<unused3>     -1.08
<unused23>    -1.08
<unused28>    -1.08
<unused17>    -1.08
<unused16>    -1.08
<unused8>     -1.08
<unused14>    -1.08
Positive Logits
(        0.39
$        0.33
false    0.31
<eos>    0.29
f        0.29
$\       0.29
!        0.29
x        0.28
h        0.28
S        0.28
Activations Density 0.783%