Negative Logits
ди            0.79
지            0.70
ك             0.70
خ             0.69
ڈ             0.66
ية            0.64
ul            0.64
مي            0.64
dı            0.64
ق             0.64
Positive Logits
).            0.59
的同时        0.59
ethnicities   0.59
              0.55
Of            0.53
ethnicity     0.51
Preventing    0.51
<             0.50
Racial        0.50
Detecting     0.49
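The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The source does not say how the values were computed. A common convention in sparse-autoencoder feature dashboards, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and then take the k smallest and k largest entries. The NumPy sketch below illustrates that convention. `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and the toy inputs are invented for illustration, not taken from this feature.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumed inputs (hypothetical, for illustration):
      decoder_dir: (d_model,) decoder direction of one feature
      W_U:         (d_model, vocab_size) unembedding matrix
      vocab:       list of vocab_size token strings
    Returns (negative, positive), each a list of (token, effect) pairs.
    """
    effects = decoder_dir @ W_U          # (vocab_size,) signed logit effects
    order = np.argsort(effects)          # token indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy case: identity unembedding, so the effects equal decoder_dir itself.
    vocab = ["a", "b", "c", "d"]
    W_U = np.eye(4)
    decoder_dir = np.array([0.5, -0.2, 0.9, -0.7])
    neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
    print(neg)  # [('d', -0.7), ('b', -0.2)]
    print(pos)  # [('c', 0.9), ('a', 0.5)]
```

This sketch returns signed values. Ranking by this dot product captures only the feature's direct path to the logits. It ignores any effect routed through later layers.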
Activation Density: 0.022%
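Activation density is usually read as the percentage of tokens in an evaluation corpus on which the feature is active. The source does not define "active". The sketch below assumes it means an activation strictly above zero and reports the result as a percentage. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and not taken from the source.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


if __name__ == "__main__":
    # Toy data: the feature fires on 22 of 100,000 tokens.
    acts = np.zeros(100_000)
    acts[:22] = 1.0
    print(activation_density(acts))  # 0.022
```

Under these assumptions, a density of 0.022% means the feature fires on roughly 22 of every 100,000 tokens.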