INDEX
Explanations
terms related to narcissism and self-centered behaviors
Negative Logits
elsen             -0.17
ÐŁÐļ              -0.15
era               -0.15
ÙĨØ´              -0.15
.scalablytyped    -0.15
nen               -0.15
roken             -0.15
erver             -0.14
wise              -0.14
erd               -0.14
Positive Logits
Narc              0.29
narciss           0.27
Personality       0.22
personality       0.21
narc              0.21
istic             0.19
Border            0.18
Nar               0.16
/self             0.16
PD                0.16
Activations Density: 0.009%