INDEX
Explanations
the word "rather" and its variations indicating preference or comparison
Negative Logits
ery        -0.17
smith      -0.17
sert       -0.16
tar        -0.15
sw         -0.15
urator     -0.15
system     -0.15
ys         -0.15
eat        -0.15
entre      -0.15
Positive Logits
than       0.18
-than      0.16
ODE        0.15
вÑģего     0.15
711        0.15
ìĦľ        0.15
_than      0.15
apy        0.15
UNIX       0.15
rière      0.15
Activations Density: 0.016%