INDEX
Explanations
foreign language articles/pronouns
Negative Logits
g       1.63
ing     1.58
u       1.19
ine     1.15
é       1.14
ın      1.13
and     1.09
ون      1.07
AND     1.07
im      1.06
Positive Logits
at                 1.59
an                 1.32
[no token shown]   1.15
jenigen            1.14
,                  1.14
was                1.13
マ                 1.12
ما                 1.06
מו                 1.05
.                  1.04
Activations Density 0.008%