Explanations
expressions of emotional and impulsive tendencies
Negative Logits
ilip         -0.14
hea          -0.13
Delicious    -0.13
ahat         -0.13
\a           -0.13
elsen        -0.13
кÑĥÑģ        -0.13
Conscious    -0.13
ign          -0.13
ãĥ           -0.13
Positive Logits
intro         0.33
ext           0.29
Intro         0.27
intro         0.27
Type          0.25
Intro         0.25
outgoing      0.24
extrav        0.23
perfection    0.21
analytical    0.21
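On dashboards of this kind, the negative and positive logit lists are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and sorting the per-token scores. The sketch below assumes that method; the helper name `top_bottom_logits`, the toy vocabulary, and the toy weights are hypothetical illustrations, not the data behind this page.

```python
from typing import List, Tuple


def top_bottom_logits(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper (assumed method): score each vocabulary token by the
    dot product of the feature's decoder direction with that token's
    unembedding column, then return the k lowest and k highest scores."""
    d_model = len(decoder_dir)
    # unembed has shape d_model x vocab_size; column j belongs to vocab[j].
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending so the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example with a 2-dimensional residual stream and 5 tokens.
neg, pos = top_bottom_logits(
    decoder_dir=[1.0, 0.5],
    unembed=[[0.3, 0.2, 0.1, -0.1, 0.0],
             [0.1, 0.1, 0.2, -0.1, -0.2]],
    vocab=["intro", "outgoing", "analytical", "Delicious", "ign"],
    k=2,
)
print([t for t, _ in neg])  # ['Delicious', 'ign']
print([t for t, _ in pos])  # ['intro', 'outgoing']
```

Because only the ranking matters for the display, the raw dot products are shown directly; some tools additionally normalize the decoder direction first, which rescales the scores without changing their order.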
Activation density: 0.108%
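Activation density is usually reported as the percentage of sampled tokens on which the feature fires, that is, where its activation is above zero. A minimal sketch assuming that definition follows; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumes density means 'fraction of tokens with activation > threshold',
    expressed as a percentage.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 108 firing tokens out of 100,000 sampled tokens.
acts = [1.0] * 108 + [0.0] * (100_000 - 108)
print(activation_density(acts))  # 0.108
```

A density this low means the feature is silent on almost every token, which is typical of a narrowly scoped feature.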