Explanations
negative phrases that express doubt or criticism
Negative Logits
Token     Logit
chner     -0.19
既        -0.15
unden     -0.15
بیشتری    -0.14
λι        -0.14
ambos     -0.14
olid      -0.14
ELLOW     -0.14
99        -0.14
uren      -0.13
Positive Logits
Token         Logit
particularly  0.26
anywhere      0.25
nearly        0.23
worth         0.23
going         0.21
what          0.20
very          0.20
gonna         0.20
terribly      0.20
remotely      0.20
Activations Density 0.175%
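
The density figure above is a percentage. A minimal sketch of how such a number could be computed follows, assuming density means the share of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the zero threshold, and the toy data are illustrative assumptions, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the fraction of sampled tokens on which
    the feature fires, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data (hypothetical): the feature fires on 7 of 4000 tokens.
toy_activations = [0.0] * 3993 + [1.2] * 7
density = activation_density(toy_activations)
print(f"{density:.3f}%")
```

Under that reading, a density of 0.175% corresponds to the feature firing on roughly 7 of every 4000 tokens.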