INDEX
Explanations
certainly followed by affirmation or description
Negative Logits
that      1.16
with      1.08
by        1.05
(         1.00
which     0.96
ك         0.95
ERVER     0.90
ουν       0.88
ب         0.88
of        0.87
Positive Logits
in         1.09
certainly  1.03
Certainly  0.97
Certainly  0.88
обратить   0.77
।          0.73
appears    0.73
。         0.72
。",       0.71
᱘          0.70
Activations Density 0.002%