Explanations
phrases that emphasize or introduce lists or examples
Negative Logits
aki        -0.16
verte      -0.15
ola        -0.15
verter     -0.15
eba        -0.14
upertino   -0.14
kaar       -0.14
ZR         -0.14
using      -0.14
duct       -0.14
Positive Logits
instance    0.25
unately     0.20
example     0.20
cing        0.20
getting     0.19
ged         0.18
instance    0.18
ced         0.17
bid         0.17
decades     0.17
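Each logit list ranks tokens by a single score. The following is a minimal sketch of how lists like these could be produced. It assumes, although the dashboard does not say so, that each score is the dot product of the feature's decoder direction with the matching column of the model's unembedding matrix. The function names, vocabulary, and numbers are hypothetical toy values, not this feature's weights.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction raises or lowers their logits.
# Assumption: score[t] = sum_i d_dec[i] * w_u[i][t]  (decoder vector @ unembedding).

def logit_effects(d_dec, w_u):
    """Return one score per vocabulary token.

    d_dec: feature decoder direction, length d_model.
    w_u:   unembedding matrix as d_model rows, each of length n_vocab.
    """
    n_vocab = len(w_u[0])
    return [
        sum(d_dec[i] * w_u[i][t] for i in range(len(d_dec)))
        for t in range(n_vocab)
    ]


def top_and_bottom(scores, vocab, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]  # most negative first
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]  # most positive first
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
vocab = ["alpha", "beta", "gamma", "delta"]
d_dec = [1.0, 0.5]
w_u = [
    [0.2, 0.1, -0.1, 0.0],
    [0.0, 0.3, 0.0, -0.3],
]
positive, negative = top_and_bottom(logit_effects(d_dec, w_u), vocab, k=2)
print([tok for tok, _ in positive])  # ['beta', 'alpha']
print([tok for tok, _ in negative])  # ['delta', 'gamma']
```

Under this assumption, a token can appear twice (as "instance" does above) when the vocabulary holds two distinct tokens that render the same way, for example with and without a leading space.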
Activation Density: 0.063%
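A density of 0.063% means the feature is active on roughly 63 of every 100,000 tokens (0.063 / 100 = 0.00063). The following is a minimal sketch of that calculation. It assumes density is the share of sampled tokens with an activation strictly above zero. The dashboard does not state its threshold or sample size, and the function name and sample are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of sampled
# tokens on which a feature fires.
# Assumption: a token counts as "active" when its activation is > threshold (0.0 here).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy sample: 63 active tokens out of 100,000 gives 0.063%.
sample = [0.0] * 99_937 + [1.5] * 63
print(f"{activation_density(sample):.3f}%")  # 0.063%
```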