INDEX
Explanations
committee meetings and work
Negative Logits
่          1.32
it         1.31
ние        1.26
can        1.16
'          1.16
be         1.14
are        1.10
ка         1.02
garantia   1.00
we         0.98
Positive Logits
᱘          1.23
ut         1.16
           1.14
at         1.14
ت          1.13
The        1.10
ל          1.06
ו          1.05
is         1.05
一         1.05
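Feature dashboards often build lists like the two above by projecting the feature's decoder direction through the model's unembedding matrix, then taking the highest and lowest entries. The sketch below assumes that approach. The function `logit_effects`, the names `decoder_direction` and `W_U`, and the toy vocabulary are all hypothetical. This dashboard may compute or display its values differently; for example, it may report magnitudes rather than signed values for the negative list.

```python
import numpy as np

def logit_effects(decoder_direction, W_U, vocab, k=10):
    """Return the k highest and k lowest direct logit effects of a feature.

    Hypothetical sketch: decoder_direction has shape (d_model,),
    W_U has shape (d_model, vocab_size), and vocab maps index -> token string.
    """
    # Direct effect of the feature's output direction on every token's logit
    effects = decoder_direction @ W_U  # shape (vocab_size,)
    order = np.argsort(effects)  # ascending order
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: a 3-dimensional residual stream and a 4-token vocabulary
vocab = ["it", "at", "The", "is"]
W_U = np.array([
    [1.0, 0.0, 0.5, -1.0],
    [0.0, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
direction = np.array([1.0, 0.5, 0.0])
positive, negative = logit_effects(direction, W_U, vocab, k=2)
print(positive)
print(negative)
```

In a real model, `W_U` would span the full vocabulary. That is why tokens from unrelated scripts can appear side by side in such lists.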
Activation Density 0.001%
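A density of 0.001% corresponds to a fraction of 0.00001, roughly one firing per 100,000 tokens. The sketch below assumes density means the percentage of sampled tokens on which the feature's activation exceeds zero. The function name `activation_density` and the `threshold` parameter are hypothetical, and the dashboard does not state how many tokens it sampled.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature's activation exceeds threshold."""
    activations = np.asarray(activations)
    # Count tokens where the feature fires
    firing = np.count_nonzero(activations > threshold)
    return 100.0 * firing / activations.size

# Toy example: the feature fires on exactly 1 of 100,000 tokens
acts = np.zeros(100_000)
acts[42] = 3.7
density = activation_density(acts)
print(density)
```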