INDEX
Explanations
instances of the word "the" and variations of it
Negative Logits
Token     Logit
ered      -0.16
own       -0.15
ours      -0.14
ld        -0.14
led       -0.13
own       -0.13
ned       -0.13
(ed       -0.13
seek      -0.13
ishly     -0.13
Positive Logits
Token      Logit
ses        0.29
same       0.25
latter     0.20
following  0.20
(ir        0.18
likes      0.18
orex       0.18
osoph      0.18
entire     0.18
same       0.18
Activations Density: 3.593%