INDEX
Explanations
frequency and patterns of the word "the" in various contexts throughout the text
Negative Logits
own        -0.15
ours       -0.15
ered       -0.14
own        -0.13
(ed        -0.13
less       -0.12
ld         -0.12
ishly      -0.12
ord        -0.12
liest      -0.12
Positive Logits
ses         0.29
same        0.26
following   0.21
latter      0.20
entire      0.19
likes       0.18
(ir         0.18
odore       0.18
sse         0.18
osoph       0.18
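A minimal sketch of how logit lists like these can be produced, assuming each value is the feature's decoder direction multiplied by the model's unembedding matrix (ignoring the final LayerNorm). The functions `logit_effects` and `top_and_bottom`, the d_model x vocab layout, and the toy numbers are hypothetical illustrations, not the dashboard's own code.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Project a feature direction onto an unembedding matrix.

    Assumed (hypothetical) layout: unembed[i][j] is the weight from
    residual dimension i to vocabulary token j, so the effect on token j
    is sum_i direction[i] * unembed[i][j].
    """
    effects = [0.0] * len(vocab)
    for d_i, row in zip(direction, unembed):
        for j, w in enumerate(row):
            effects[j] += d_i * w
    return dict(zip(vocab, effects))


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional residual stream and 3-token vocabulary (made up).
    toy = logit_effects([1.0, -0.5],
                        [[0.2, 0.1, -0.1], [0.0, 0.4, 0.2]],
                        ["same", "own", "ours"])
    pos, neg = top_and_bottom(toy, k=1)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.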
Activation Density: 3.631%
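A minimal sketch of the density figure, assuming "activation density" means the percentage of sampled tokens on which the feature's activation is above zero. The function `activation_density` and its `threshold` parameter are hypothetical. Under that reading, 3.631% means the feature fired on roughly 36 of every 1,000 sampled tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # count the feature as "firing" on this token
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Made-up activations for four tokens; the feature fires on one of them.
    print(activation_density([0.0, 1.2, 0.0, 0.0]))
```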