INDEX
Explanations
emotional responses and actions related to care and interpersonal relationships
Negative Logits
她们    -0.20
它们    -0.18
THEY    -0.15
uada    -0.15
usi    -0.14
ategorized    -0.14
вони    -0.14
okers    -0.14
-fw    -0.13
uchos    -0.13
Positive Logits
him    1.55
him    1.09
lui    1.05
Him    0.98
ihn    0.93
ihm    0.91
HIM    0.79
него    0.74
ему    0.69
нему    0.67
Activations Density 0.978%