INDEX
Explanations
phrases related to reflection and self-awareness
Negative Logits
conmigo     -1.02
comigo      -0.96
me          -0.90
Itself      -0.88
asegurarse  -0.82
私に        -0.79
dirinya     -0.78
magát       -0.78
itself      -0.78
私の        -0.75
Positive Logits
ourselves   2.71
our         1.18
можем       1.13
знаем       1.02
будем       0.96
نحن         0.92
our         0.92
هستیم       0.92
видим       0.91
we          0.89
Activations Density 0.459%