Explanations
arguments and discussions around errors and reasoning in logical contexts
Negative Logits
orable      -0.44
ove         -0.44
assertAll   -0.43
indexOf     -0.43
onika       -0.43
quoque      -0.42
Total       -0.42
quai        -0.41
Reve        -0.41
Total       -0.40
Positive Logits
minus              0.82
without            0.82
modified           0.79
scaled             0.78
#+#                0.77
MessageTagHelper   0.73
plus               0.73
tweaked            0.71
upside             0.70
WITHOUT            0.70
Activations Density 0.560%