Explanations
Sentiments about social preferences and interactions, with a focus on introversion and extroversion.
Negative Logits
Token     Logit
orate     -0.14
orian     -0.14
ipay      -0.14
лади      -0.13
ogg       -0.13
åį        -0.13
strand    -0.13
aku       -0.13
inde      -0.13
nth       -0.13
Positive Logits
Token     Logit
social    0.53
social    0.44
Social    0.41
Social    0.38
-social   0.37
sociale   0.37
.social   0.36
SOCIAL    0.36
_social   0.36
soci      0.35
Activations Density: 0.328%
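
A minimal sketch of how a density figure like this could be computed, assuming it means the fraction of token positions on which the feature's activation is above zero. The source does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as "firing" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical activations for one feature over 1,000 token positions:
# 3 positions fire, the rest are zero.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.300%"
```

Under this reading, a density of 0.328% would correspond to the feature firing on roughly 3 of every 1,000 tokens.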