Explanations
indications of delusions and beliefs related to mental disorders
Negative Logits
ugin       -0.16
ogui       -0.15
elon       -0.15
laus       -0.15
zcze       -0.15
trys       -0.15
illon      -0.15
ugen       -0.14
apphire    -0.14
niên       -0.14
Positive Logits
entitlement    0.19
superior       0.18
unique         0.18
kowski         0.16
inf            0.15
entitled       0.15
victim         0.15
unique         0.15
Unique         0.15
convinced      0.14
Activations Density: 0.183%