Explanations
rich, requires, increased, quantitative
Negative Logits
{         0.79
I         0.78
ذلك       0.75
of        0.74
Aby       0.72
ب         0.71
(         0.71
وأ        0.71
StartZ    0.71
氇        0.69
Positive Logits
t         1.16
л         1.10
ä         1.03
ą         0.97
ા         0.96
il        0.95
a         0.95
ı         0.94
is        0.92
f         0.91
Activation Density: 0.077%