Explanations
possessive forms or contractions of the word "is."
Negative Logits
вор        -0.15
нар        -0.14
uat        -0.13
вою        -0.13
ewater     -0.13
ville      -0.13
bestimm    -0.12
thon       -0.12
these      -0.12
arrow      -0.12
Positive Logits
how         0.31
where       0.30
what        0.29
why         0.27
hoping      0.26
another     0.26
a           0.23
an          0.23
my          0.23
something   0.21
Activations Density 0.028%
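
The page does not define how the density figure is computed. A minimal sketch follows, assuming it is the fraction of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the synthetic data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (number of active tokens) / (tokens sampled).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data (not from the source): 7 active tokens out of 25,000.
acts = [0.0] * 24_993 + [1.5] * 7
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.028%
```

Under this reading, a density of 0.028% means the feature fires on roughly 3 of every 10,000 tokens.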