INDEX
Explanations
obscure characters or formatting in the text
Negative Logits
gramm     -0.16
avery     -0.15
ushima    -0.15
angl      -0.14
Gram      -0.14
zb        -0.14
addon     -0.14
iffe      -0.14
eron      -0.14
jem       -0.14
Positive Logits
ILLS      0.15
_tcb      0.14
NG        0.14
imedia    0.14
(*)       0.14
Backdrop  0.14
Ring      0.13
achuset   0.13
ills      0.13
.xticks   0.13
Activations Density: 0.002%