INDEX
Explanations
police and associated actions
Negative Logits
i  0.85
و  0.80
the  0.75
is  0.74
nější  0.74
to  0.73
r  0.73
ो  0.71
νέ  0.68
zwią  0.68
Positive Logits
อ  0.80
you  0.76
৬  0.75
স  0.71
৯  0.71
им  0.69
ο  0.69
א  0.67
об  0.64
ти  0.63
Activations Density: 0.003%