Explanations
Phrases related to social dynamics and interpersonal relationships
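Negative Logits
Token       Logit   Note
them        -0.71
Them        -0.68
selves      -0.67
ſelves      -0.61   archaic long-s spelling of "selves"
hennes      -0.61   Swedish/Norwegian, "her, hers"
herself     -0.58
Them        -0.58
給我        -0.56   Chinese, "give me"
Him         -0.56
Yourself    -0.55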
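Positive Logits
Token       Logit   Note
we           1.58
they         1.52
that         1.38
you          1.25
he           1.17
she          1.03
everyone     0.89
mà           0.87   Vietnamese connective, roughly "but" or "that"
someone      0.84
mình         0.83   Vietnamese pronoun, "oneself" or an informal "I/we"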
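Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that convention; the dashboard does not state its exact procedure. The function name `top_logit_effects` and the toy matrices are hypothetical, and the numbers are invented for illustration, not taken from the lists above.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive tokens for one feature.

    Assumption: the logit effect is the feature's decoder row (d_model,)
    multiplied by the unembedding matrix (d_model, vocab_size).
    """
    logits = w_dec_row @ w_unembed           # one score per vocabulary token
    order = np.argsort(logits)               # indices sorted ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens (values are made up).
vocab = ["we", "them", "they", "Him"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -0.2, 1.5, -0.7],
                      [9.0, 9.0, 9.0, 9.0]])
negative, positive = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(negative)  # [('Him', -0.7), ('them', -0.2)]
print(positive)  # [('they', 1.5), ('we', 0.5)]
```

Because only the first coordinate of the toy decoder row is nonzero, each token's score is just its entry in the first row of `w_unembed`, which makes the ranking easy to check by hand.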
Activation Density: 1.215%
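Activation density is usually read as the fraction of tokens, over some sample of text, on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0; the dataset and threshold behind the 1.215% figure are not stated here, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy activations for 8 tokens; the feature fires on 2 of them.
toy = [0.0, 0.0, 0.3, 0.0, 1.2, 0.0, 0.0, 0.0]
density = activation_density(toy)
print(f"{100 * density:.3f}%")  # 25.000%
```

Under this reading, a density of 1.215% means the feature is active on roughly 1 token in 82 (1 / 0.01215 ≈ 82.3).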