INDEX
Explanations
descriptors associated with toxic or narcissistic behavior
Negative Logits
Token        Logit
.gb          -0.16
xAA          -0.15
ên           -0.14
ollen        -0.14
éĩ           -0.14
·¸           -0.14
oub          -0.14
Pregnancy    -0.14
.logic       -0.14
éľĬ          -0.13
Positive Logits
Token        Logit
Narc         0.33
narciss      0.30
personality  0.26
Border       0.24
Cluster      0.24
Personality  0.24
soci         0.23
traits       0.22
borderline   0.22
Border       0.21
Activations Density: 0.024%