INDEX
Explanations
phrases related to assertions and claims, particularly regarding beliefs and statements
Negative Logits
they      -0.21
we        -0.20
you       -0.17
they      -0.16
it        -0.15
otron     -0.15
они       -0.15
you       -0.15
Logic     -0.15
someone   -0.14
Positive Logits
that      0.28
rằng      0.27
bahwa     0.25
ότι       0.24
daß       0.24
dass      0.24
that      0.23
that      0.22
että      0.22
že        0.21
Activations Density 0.266%