INDEX
Explanations
various identifiers and codes associated with articles and publications
Negative Logits
Token   Logit
!       -0.19
?       -0.19
'       -0.17
-       -0.17
l       -0.17
.       -0.17
,       -0.17
Eg      -0.16
v       -0.16
sm      -0.15
Positive Logits
Token   Logit
ed      0.23
ae      0.23
f       0.22
ede     0.22
ac      0.22
/GPL    0.21
af      0.21
b       0.21
acea    0.21
a       0.21
Activations Density 0.038%