INDEX
Explanations
negations and phrases indicating impossibility or refusal
Negative Logits
anes        -0.19
abi         -0.15
ÑģÑĤаÑĢа      -0.15
auses       -0.14
ýv          -0.14
esini       -0.14
æ¬          -0.14
avs         -0.14
apper       -0.14
ntax        -0.13
Positive Logits
be           0.26
anymore      0.26
necessarily  0.25
able         0.23
ever         0.23
unless       0.22
mind         0.21
EVER         0.20
-ever        0.19
ability      0.18
Activations Density: 0.087%