INDEX
Explanations
references to temporal concepts and the likelihood of events or conditions occurring
Negative Logits
him      -0.23
нього    -0.21
them     -0.21
него     -0.20
THEM     -0.19
eux      -0.19
lui      -0.18
них      -0.18
him      -0.18
нему     -0.18
POSITIVE LOGITS
they        0.47
we          0.38
there       0.35
someone     0.34
it          0.34
that        0.31
things      0.31
somebody    0.29
she         0.29
something   0.29
Activations Density 0.200%