Explanations
expressions of emotional support and familial relationships
Negative Logits
him        -0.17
herself    -0.17
เอง        -0.16
oneself    -0.16
him        -0.15
insanlar   -0.14
iy         -0.14
iry        -0.14
Him        -0.14
šku        -0.14
Positive Logits
her        0.38
their      0.32
Her        0.30
HER        0.30
deren      0.28
Her        0.25
Their      0.24
ihr        0.23
her        0.23
their      0.23
Activations Density 0.336%