INDEX
Explanations
phrases related to personal connections and social networks
Negative Logits
mouths      -0.18
hearts      -0.17
assembly    -0.17
noses       -0.16
faces       -0.16
Hearts      -0.15
faces       -0.15
heads       -0.15
797         -0.15
orna        -0.15
Positive Logits
personal       0.43
personal       0.35
Personal       0.31
Personal       0.28
career         0.28
个人           0.26
personally     0.25
personality    0.25
persona        0.24
лич            0.24
Activations Density: 0.019%