Explanations
mathematical symbols and expressions related to functions and operations in abstract settings
Negative Logits
Token   Logit
(x      -0.41
x       -0.39
[x      -0.38
,x      -0.38
/x      -0.37
=x      -0.36
|x      -0.35
xv      -0.34
xx      -0.32
$x      -0.32
Positive Logits
Token   Logit
X       0.82
X       0.66
X       0.53
*X      0.51
_X      0.51
>X      0.49
.X      0.49
-X      0.48
=X      0.47
(X      0.47
Activations Density: 0.135%