Explanations
interrogative phrases or questions
Negative Logits
they          -0.17
they          -0.15
kop           -0.15
сты           -0.15
itsu          -0.14
escription    -0.14
atile         -0.14
it            -0.14
wor           -0.13
eso           -0.13
Positive Logits
do            0.29
did           0.22
does          0.22
are           0.21
是我          0.20
do            0.18
.do           0.18
about         0.17
did           0.17
Does          0.17
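The page does not say how the logit lists are produced. One common way to get such lists is to project a feature's output direction through the model's unembedding and rank tokens by the resulting score. The sketch below illustrates that idea only. It assumes a toy two-dimensional model, and `top_logit_effects`, `direction`, `unembed`, and `vocab` are hypothetical names with made-up values, not data from this page.

```python
def top_logit_effects(direction, unembed, vocab, k=2):
    """Rank tokens by the dot product of a feature direction with each
    token's unembedding vector (assumed method, toy data)."""
    effects = [
        (token, sum(d * w for d, w in zip(direction, weights)))
        for token, weights in zip(vocab, unembed)
    ]
    ranked = sorted(effects, key=lambda pair: pair[1])
    most_negative = ranked[:k]          # lowest scores first
    most_positive = ranked[::-1][:k]    # highest scores first
    return most_negative, most_positive


# Hypothetical toy values: one unembedding vector per vocabulary token.
vocab = ["do", "did", "they", "it"]
unembed = [[0.29, 0.1], [0.22, -0.3], [-0.17, 0.5], [-0.14, 0.2]]
direction = [1.0, 0.0]

negative, positive = top_logit_effects(direction, unembed, vocab)
print("negative:", [token for token, _ in negative])
print("positive:", [token for token, _ in positive])
```

Tokens are kept as a list of pairs rather than a dictionary because a list like the ones above can contain the same surface string more than once.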
Activations Density 0.060%
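A density figure like this is usually read as the fraction of sampled token positions on which the feature is active. The page gives no definition, so the sketch below rests on that assumption. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`
    (assumed definition of density)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 6 of which are active.
sample = [0.0] * 9994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.060%
```

Under this reading, 0.060% corresponds to roughly 6 active positions per 10,000 tokens.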