Explanations
occurrences of the word "the."
Negative Logits
slightly    -0.18
est         -0.18
orst        -0.16
achuset     -0.16
Zus         -0.15
deadliest   -0.15
stown       -0.15
alc         -0.14
brightest   -0.14
anship      -0.14
Positive Logits
sooner    0.28
more      0.24
więcej    0.21
greater   0.21
更多      0.21
MORE      0.21
більше    0.19
больше    0.19
MORE      0.18
fewer     0.18
Activations Density 0.010%