Explanations
structured proof elements in mathematical writing
Negative Logits
Token       Logit
elta        -0.17
oran        -0.15
ista        -0.15
Grim        -0.14
yps         -0.14
thon        -0.14
ateg        -0.14
ahr         -0.14
ort         -0.13
activation  -0.13
Positive Logits
Token       Logit
ichern      0.14
neod        0.14
?}",        0.14
COMPARE     0.14
compareTo   0.13
å®ı         0.13
.dw         0.13
tür         0.13
706         0.13
tility      0.13
Activation Density: 0.032%