INDEX
Negative Logits
'               0.66
Wit             0.43
r               0.43
εν              0.42
unsupervised    0.41
Trinity         0.40
输入            0.40
iciencia        0.39
目的地          0.39
l               0.39
Positive Logits
violations      0.77
violations      0.72
violation       0.70
Violation       0.68
حقوق            0.65
अधिकारों        0.64
rights          0.61
direitos        0.59
उल्लंघन         0.59
violated        0.57
Activations Density 0.035%