Explanations
math-related symbols and variables in equations
Negative Logits
\)        -0.16
)).↵      -0.15
umer      -0.15
igon      -0.14
Hockey    -0.14
Congress  -0.14
")).      -0.14
"         -0.13
ingu      -0.13
););↵     -0.13
Positive Logits
}$        0.48
)$        0.47
]$        0.43
>$        0.33
">$       0.30
'>$       0.29
"$        0.28
)$/       0.28
"$        0.27
|$        0.27
Activation Density: 0.111%