INDEX
Explanations
keywords and concepts related to misunderstandings and their implications in various contexts
Negative Logits
however  -0.22
jednak   -0.17
ιά       -0.16
ẩu       -0.16
gado     -0.15
ěl       -0.15
bah      -0.15
HOWEVER  -0.15
правда   -0.15
ajes     -0.14
Positive Logits
they    0.24
they    0.22
it      0.21
Ù쨥ÙĨ  0.17
he      0.17
来说    0.16
они     0.16
ula     0.15
became  0.15
al      0.15
Activations Density 0.290%
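
The density figure above can be read as the share of token positions on which the feature fires. The following is a minimal sketch under that assumption; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 29 active positions out of 10,000 tokens.
sample = [1.5] * 29 + [0.0] * 9971
print(f"{activation_density(sample):.3%}")  # prints 0.290%
```

A density this low would mean the feature is sparse: it stays silent on the vast majority of tokens.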