INDEX
Explanations
phrases indicating the acceptance of manuscripts for publication
accepted for publication
Negative Logits
tur         -0.36
tit         -0.35
demon       -0.35
pin         -0.35
emp         -0.34
ؤلاء        -0.34
=("         -0.34
̥           -0.33
evening     -0.33
feeling     -0.33
Positive Logits
Accepted    1.27
Accepted    1.23
accepted    1.07
accepted    1.03
ACCEPTED    0.88
Acceptance  0.84
Accept      0.80
accepte     0.79
Acceptance  0.77
ACCEPT      0.75
Activations Density 0.002%
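
As a rough illustration of what a density figure like this could mean, the sketch below assumes that activation density is the fraction of token positions on which the feature fires at all. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; they are not taken from the page itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: a position counts as active when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 2 of 100,000 token positions.
sample = [0.0] * 99_998 + [3.1, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.002%
```

Under this assumed definition, a density of 0.002% corresponds to roughly one active position in every 50,000 tokens, which would fit a feature tied to a narrow phrase such as "accepted for publication".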