Explanations
phrases indicating a lack of awareness or being disconnected from reality
Negative Logits
jid     -0.17
embro   -0.17
anki    -0.17
amber   -0.16
ems     -0.15
áž      -0.15
ávÄĽ    -0.15
_gem    -0.14
маÑģÑĤ  -0.14
adge    -0.14
Positive Logits
enna    0.16
itta    0.15
ights   0.15
eness   0.14
err     0.14
Hath    0.14
ató     0.14
fol     0.14
iyat    0.14
467     0.14
Activations Density 0.105%
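
A minimal sketch of how a density figure like the one above could be computed, assuming that "density" means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions for illustration and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    `activations` is a sequence of per-token activation values for one
    feature. An empty sequence yields 0.0 rather than raising.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 2000 token positions, 2 of which fire.
sample = [0.0] * 1998 + [0.7, 1.3]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.100%"
```

Under this reading, a density of 0.105% would mean the feature fires on roughly one token in every thousand.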