INDEX
Explanations
mathematical symbols and code placeholders
Negative Logits
B    0.98
A    0.97
C    0.97
A    0.89
F    0.87
B    0.86
Y    0.81
X    0.81
X    0.81
P    0.76
Positive Logits
m    1.84
d    1.55
r    1.52
v    1.52
q    1.51
n    1.51
g    1.48
s    1.47
h    1.47
p    1.44
Activations Density 0.685%