INDEX
Explanations
instances of punctuation or symbols
Negative Logits
eux         -0.18
lui         -0.17
them        -0.17
THEM        -0.16
нього       -0.15
них         -0.15
him         -0.14
Otherwise   -0.14
него        -0.13
とも        -0.13
Positive Logits
there       0.52
it          0.49
there       0.35
we          0.35
they        0.30
you         0.30
many        0.28
nothing     0.28
it          0.26
this        0.26
Activations Density 0.526%
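
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero; the page does not state its exact definition, and the function name, threshold, and sample data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is taken to be (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 5 of which activate the feature.
acts = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
print(f"{activation_density(acts):.3%}")  # 0.500%
```

Under that assumption, a density of 0.526% would correspond to the feature firing on roughly 5 of every 1000 sampled tokens.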