INDEX
Explanations
phrases indicating conviction or certainty
Negative Logits
did     -0.21
did     -0.19
DID     -0.19
Did     -0.18
Did     -0.18
shall   -0.17
does    -0.16
ahl     -0.16
didn    -0.16
λλι     -0.15
Positive Logits
is          0.33
are         0.32
ARE         0.28
was         0.28
是          0.26
adalah      0.26
является    0.25
являются    0.24
_are        0.24
是          0.23
Activations Density 0.170%
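
Non-Latin tokens in logit lists like this one are often shown in byte-level BPE form, so that the Chinese copula 是 appears as `æĺ¯`. The sketch below assumes the tokenizer uses the GPT-2 style byte-to-character table, which the page itself does not state. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2 style byte -> printable-character table (assumed here)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are mapped to code points starting at U+0100.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Recover readable text from a byte-level token string."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Assumption: characters outside the table are already decoded text.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["æĺ¯", "ÑĤÑģÑı", "adalah"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as `adalah` pass through unchanged, so the helper can be applied to every row of a logit list without checking its script first.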