Explanations
occupations and roles
researchers and developers
Negative Logits
at: 1.34
to: 1.17
ری: 1.17
た: 1.15
ش: 1.12
ع: 1.11
t: 1.09
as: 1.07
ت: 1.07
س: 1.04
Positive Logits
’”: 0.86
lerce: 0.81
li: 0.76
professionnels: 0.73
liğini: 0.72
preneurs: 0.71
锏: 0.70
politici: 0.69
الذين: 0.68
мощности: 0.68
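The page does not say how these logit lists were produced. A common approach in sparse-autoencoder dashboards, assumed here, is to project the feature's decoder direction onto each token's unembedding column and keep the k largest and k smallest scores. The minimal sketch below illustrates that assumption; the function `logit_effects` and its parameters `decoder_dir`, `unembed_cols`, and `k` are hypothetical names. It returns signed scores, whereas the lists above show the negative values without a sign.

```python
from __future__ import annotations

from typing import Sequence


def logit_effects(
    decoder_dir: Sequence[float],
    unembed_cols: dict[str, Sequence[float]],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical sketch: rank tokens by the dot product of a feature's
    decoder direction with each token's unembedding column.

    Returns (top-k positive, top-k negative) lists of (token, score) pairs.
    """
    # Direct logit effect of the feature on each token (assumed method).
    scores = {
        tok: sum(d * w for d, w in zip(decoder_dir, col))
        for tok, col in unembed_cols.items()
    }
    # Ascending order: most negative scores come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative
```

With a toy two-dimensional model, `logit_effects([1.0, 0.0], {"a": [2.0, 0.0], "b": [-3.0, 0.0]}, k=1)` returns `([("a", 2.0)], [("b", -3.0)])`.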
Activation Density: 0.918%
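Activation density is usually read as the percentage of sampled token positions at which the feature fires. The sketch below assumes that definition and a firing threshold of zero; the page does not state its sample set or threshold. `activation_density` and `threshold` are hypothetical names.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of token positions whose activation
    exceeds `threshold` (assumed definition of activation density)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

Under this definition, a density of 0.918% means the feature fired on about 918 of every 100,000 sampled tokens.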