Negative Logits
ari  0.68
ING  0.63
art  0.59
ă  0.59
ara  0.57
)।  0.55
ة  0.54
September  0.54
Author  0.52
Assistant  0.52
Positive Logits
(+)  1.09
(+  0.87
(+)  0.83
+,  0.82
$+  0.82
++  0.81
,+  0.80
.+  0.79
">+</  0.78
/+  0.78
Activations Density 0.081%