Explanations
instances of the word "them."
Negative Logits
Token      Logit
itself     -0.26
اÙĨÙĩ       -0.20
ibly       -0.19
ovna       -0.17
_DECREF    -0.16
quine      -0.15
(es        -0.15
taire      -0.15
odge       -0.15
bucks      -0.15
Positive Logits
Token      Logit
/us        0.48
/her       0.43
self       0.38
atically   0.35
/th        0.34
elves      0.32
zelf       0.28
SELF       0.26
selves     0.25
SEL        0.24
Activations Density 0.156%
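
The page does not say how the density figure is computed. A common reading, assumed here, is that it is the fraction of sampled token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled token positions on
    which the feature fires. The dashboard itself does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 10,000 token positions, feature fires on 16 of them.
sample = [0.0] * 9984 + [1.2] * 16
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.156% would mean the feature fires on roughly 1 or 2 of every 1,000 sampled tokens, which fits a feature tied to a narrow set of word forms.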