INDEX
Explanations
comparative and superlative terms indicating improvement or preference
Negative Logits
ATO      -0.17
ato      -0.16
duk      -0.16
ittle    -0.15
engo     -0.15
avan     -0.14
zend     -0.14
ieri     -0.14
mere     -0.13
zeich    -0.13
Positive Logits
-su       0.21
idge      0.20
avoided   0.18
suited    0.18
served    0.16
err       0.15
Su        0.15
suite     0.14
suicide   0.14
sticking  0.14
Activations Density 0.058%
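
The density figure is most naturally read as the share of sampled token positions on which the feature activates. The page does not state the exact definition, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a strictly
    positive activation; the source page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 6 of which activate.
sample = [0.0] * 9994 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.058% would mean the feature fires on roughly 6 of every 10,000 tokens, i.e. it is very sparse.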