INDEX
Explanations
instances of the word "the."
Negative Logits
imbus      -0.18
enstein    -0.15
airo       -0.14
enci       -0.14
erland     -0.13
eness      -0.13
rhs        -0.13
åłĤ        -0.13
ivre       -0.13
rig        -0.13
Positive Logits
oret        0.22
uc          0.14
ologically  0.14
attest      0.14
jen         0.14
abre        0.14
G           0.13
isay        0.13
of          0.13
LOAT        0.13
Activations Density: 0.190%