Explanations
phrases indicating ownership or possession
Negative Logits
these     -0.17
вор       -0.16
EITHER    -0.15
THESE     -0.14
them      -0.13
uat       -0.13
uko       -0.13
_BOTH     -0.13
queda     -0.13
osto      -0.13
Positive Logits
how           0.30
hoping        0.29
what          0.26
why           0.25
how           0.22
where         0.22
everything    0.20
some          0.19
links         0.19
proof         0.18
Activations Density 0.023%