Explanations
words indicating suspicion or doubt
Negative Logits
Token      Logit
iry        -0.17
lean       -0.15
.mods      -0.15
OTA        -0.15
mage       -0.14
ird        -0.14
otas       -0.14
ancy       -0.14
.motion    -0.14
veau       -0.14
Positive Logits
Token      Logit
Sharp      0.16
itra       0.16
iale       0.15
éĹ         0.15
ovich      0.15
ITTE       0.15
Cole       0.14
Engel      0.14
ãĤ¥        0.14
enko       0.14
Activations Density: 0.002%