INDEX
Explanations
lunch break or conversation
Negative Logits
an      1.78
on      1.60
in      1.32
u       1.27
en      1.25
ap      1.04
st      1.01
com     1.01
model   1.00
it      0.99
Positive Logits
:       1.14
ي       1.14
не      1.12
ع       1.05
for     1.03
м       1.03
يات     1.02
з       1.02
ك       1.02
й       1.00
Activations Density: 0.003%