Explanations
occurrences of the word "the."
Negative Logits
Token    Logit
apan     -0.15
andi     -0.15
pu       -0.14
iera     -0.14
ccoli    -0.13
ायन      -0.13
Atl      -0.13
sub      -0.13
nie      -0.13
zug      -0.13
Positive Logits
Token    Logit
isl      0.14
bih      0.14
raquo    0.14
iná      0.14
fulness  0.14
orex     0.13
icha     0.13
CHED     0.13
à¥ģण     0.13
FR       0.13
Activations Density 0.027%