INDEX
Explanations
phrases related to personal relationships and emotional dependencies
NEGATIVE LOGITS
themselves
-0.42
itself
-0.35
kita
-0.17
ald
-0.16
خودش
-0.16
ěl
-0.15
esin
-0.15
alm
-0.15
himself
-0.15
os
-0.15
POSITIVE LOGITS
yourself
0.82
Yourself
0.55
yourselves
0.54
your
0.46
your
0.43
你的
0.39
Your
0.30
ваш
0.29
можете
0.28
votre
0.27
Activations Density 1.150%