INDEX
Explanations
adverbial and adjectival suffixes
NEGATIVE LOGITS
Token        Value
'            1.19
I            0.92
-            0.83
AT           0.79
스는         0.79
(            0.78
רים          0.76
Prothorax    0.74
ot           0.73
Burgund      0.73
POSITIVE LOGITS
Token        Value
and          1.30
be           1.27
in           1.21
σ            1.16
of           1.09
ง            1.09
ق            1.06
ة            1.03
of           1.00
were         0.99
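Lists like the two above are often built by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. This page does not confirm that construction, so the sketch below is an assumption; the names `logit_effects`, `top_and_bottom`, `decoder_dir`, and `unembed`, and the toy values, are all hypothetical.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical: score each vocab token as dot(decoder_dir, unembed column).

    `unembed` is d_model x vocab_size, stored as a list of rows.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str], k: int):
    """Return the k highest- and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative effects first
    top = ranked[::-1][:k]     # most positive effects first
    return top, bottom


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, 1.0, -0.3, 0.0],
           [0.4, -0.2, 0.6, 1.0]]
tokens = ["and", "be", "AT", "I"]
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), tokens, k=2)
print([t for t, _ in top], [t for t, _ in bottom])
```

With these toy numbers the scores are roughly 0.0, 1.1, -0.6 and -0.5, so "be" and "and" rank highest and "AT" and "I" lowest.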
Activation density: 0.745%
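Activation density is commonly reported as the percentage of token positions in a sample where the feature fires, meaning its activation is above zero or above some threshold. The sketch below assumes that definition; the function name `activation_density` and the `threshold` parameter are hypothetical, and the page does not state which token sample produced the 0.745% figure.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical: percentage of token positions where activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# One of four positions fires, giving a density of 25%.
print(activation_density([0.0, 0.0, 2.3, 0.0]))
```

At 0.745%, a feature defined this way would fire on roughly 7 or 8 of every 1,000 tokens.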