Explanations
The phrase "don't" and its variants, indicating a negative imperative or advice.
Negative Logits
erno      -0.18
ffe       -0.16
fter      -0.15
ual       -0.15
.freeze   -0.15
clide     -0.14
šak       -0.14
ually     -0.14
intl      -0.14
.pivot    -0.14
Positive Logits
cel       0.16
indh      0.15
afari     0.14
аза       0.14
olith     0.14
ipers     0.14
ascus     0.14
itored    0.13
argout    0.13
chant     0.13
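Dashboards of this kind commonly obtain the "logits" lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token effects. The sketch below shows that ranking step under that assumption only; the vectors, token names, and the helpers `logit_effects` and `top_k` are hypothetical toy values, not taken from this feature or from any specific tool.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: the effect on token t is the dot product of the feature's
# decoder vector with column t of the unembedding matrix W_U.
# All numbers below are toy values for illustration only.

def logit_effects(decoder_vec, unembed_cols):
    """Return {token: dot(decoder_vec, W_U[:, token])}."""
    return {
        tok: sum(d * w for d, w in zip(decoder_vec, col))
        for tok, col in unembed_cols.items()
    }

def top_k(effects, k, positive=True):
    """Return the k tokens with the largest (positive=True) or most negative effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]

decoder_vec = [0.5, -0.2, 0.1]   # hypothetical decoder direction
unembed_cols = {                 # hypothetical unembedding columns
    "tok_a": [0.2, 0.1, 0.0],
    "tok_b": [-0.3, 0.4, 0.1],
    "tok_c": [0.1, -0.5, 0.2],
}

effects = logit_effects(decoder_vec, unembed_cols)
print(top_k(effects, 2, positive=True))   # tokens the feature promotes most
print(top_k(effects, 2, positive=False))  # tokens the feature suppresses most
```

Here `tok_c` (about 0.17) is the strongest positive entry and `tok_b` (about -0.22) the strongest negative one, mirroring how the two lists above are sorted.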
Activation Density: 0.077%
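A density of 0.077% means the feature is active on roughly 77 of every 100,000 tokens. The sketch below computes that figure, assuming "active" means an activation strictly above zero; the threshold, the toy data, and the helper name `activation_density` are hypothetical, and the real dashboard may use a different threshold or sample.

```python
# Hypothetical sketch: activation density = percentage of tokens on which
# a feature fires. Assumption: "fires" means activation > threshold (0 by default).

def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly above `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy data: 77 nonzero activations among 100,000 tokens.
acts = [0.0] * 100_000
for i in range(77):
    acts[i * 1000] = 1.5

print(f"{activation_density(acts):.3f}%")  # 0.077%
```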