Explanations
references to academic citations and authorship
Negative Logits
Gam         -0.17
.spawn      -0.16
LAS         -0.15
GAM         -0.15
gam         -0.15
SAS         -0.14
urat        -0.14
McCartney   -0.14
Bing        -0.14
geist       -0.14
Positive Logits
omaly       0.18
dual        0.18
ories       0.16
Dual        0.16
Dual        0.16
uluk        0.16
Myers       0.16
theories    0.16
SYM         0.15
akov        0.15
Activations Density: 0.027%