Explanations
government actions and policies
Negative Logits
ك          0.85
ка         0.79
larda      0.74
português  0.71
humana     0.70
كيف        0.69
ように     0.68
ді         0.68
predom     0.68
distint    0.67
Positive Logits
is          1.17
ad          0.97
H           0.90
ine         0.88
government  0.82
id          0.82
Government  0.80
V           0.79
S           0.77
(           0.77
Activations Density: 0.010%