INDEX
Explanations
phrases related to emotional responses and interpersonal relationships
Negative Logits
เอง         -0.17
amins       -0.17
sebou       -0.17
entials     -0.17
him         -0.16
Humans      -0.15
mình        -0.15
éº          -0.14
อบ          -0.14
gregar      -0.14
Positive Logits
their       0.43
THEIR       0.36
peoples     0.33
their       0.32
everyone    0.31
Their       0.31
Their       0.30
deren       0.30
theirs      0.30
everybody   0.29
Activation Density: 0.547%