INDEX
Explanations
words related to stealth or subtlety
NEGATIVE LOGITS
erase       -0.17
796         -0.17
747         -0.16
quam        -0.15
readcrumb   -0.15
794         -0.15
SES         -0.14
ruh         -0.14
cri         -0.14
aversal     -0.14
POSITIVE LOGITS
uth         0.29
azy         0.24
UTH         0.24
aze         0.23
dd          0.21
Sle         0.20
AZE         0.19
ight        0.18
igh         0.17
emo         0.16
Activations Density 0.004%