Explanations
the word "the" in various contexts throughout the document
Negative Logits
Token     Logit
erten     -0.15
ura       -0.14
iverse    -0.14
erties    -0.14
eler      -0.14
hone      -0.14
mq        -0.13
embre     -0.13
apas      -0.13
ansa      -0.13
Positive Logits
Token     Logit
sake      0.42
purposes  0.35
geries    0.21
feit      0.21
bidden    0.20
instance  0.19
aging     0.18
cing      0.18
ney       0.18
reasons   0.18
Activations Density: 0.172%