INDEX
Explanations
references to self-awareness and identity
Negative Logits
eyh           -0.18
yapılır       -0.15
bulunur       -0.15
yapar         -0.15
icari         -0.14
_ping         -0.13
dont          -0.13
:size         -0.13
Bec           -0.13
kullanılır    -0.13
Positive Logits
đang          0.48
正在          0.42
is            0.40
are           0.38
กำล           0.35
estamos       0.31
está          0.30
是在          0.29
是            0.28
是            0.28
Activations Density 0.678%