Negative Logits
discriminate    -0.08
yaptı           -0.07
getters         -0.07
slaughtered     -0.07
superheroes     -0.07
뤠              -0.06
(encoder        -0.06
                -0.06
controversies   -0.06
สาย             -0.06
Positive Logits
Kul             0.07
ik              0.07
我相信          0.07
Paragraph       0.07
𝚃               0.07
CONDITION       0.07
Western         0.06
opportun        0.06
POL             0.06
monitoring      0.06
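Lists like these are commonly built by scoring every vocabulary token by how much the feature raises or lowers its logit, then keeping the extremes. How this dashboard computes each token's effect (for example, projecting the feature's decoder direction through the unembedding) is an assumption here, since the page does not say. A minimal sketch of the ranking step only, given a hypothetical token-to-effect mapping:

```python
def top_and_bottom_logits(
    effects: dict[str, float], k: int = 10
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs.

    `effects` is a hypothetical token -> logit-effect mapping; how those
    effects are computed is assumed, not taken from the dashboard.
    """
    # Ascending sort: most negative effects come first.
    negative = sorted(effects.items(), key=lambda item: item[1])[:k]
    # Descending sort (stable, so ties keep their input order).
    positive = sorted(effects.items(), key=lambda item: item[1], reverse=True)[:k]
    return negative, positive


# Usage with a few values copied from the lists above.
sample = {
    "discriminate": -0.08,
    "yaptı": -0.07,
    "Kul": 0.07,
    "Western": 0.06,
    "monitoring": 0.06,
}
neg, pos = top_and_bottom_logits(sample, k=2)
print(neg)  # [('discriminate', -0.08), ('yaptı', -0.07)]
print(pos)  # [('Kul', 0.07), ('Western', 0.06)]
```

Because the values shown are rounded to two decimals, many tokens tie; the stable sort keeps tied tokens in their input order rather than imposing an arbitrary one.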
Activation density: 0.001%
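Activation density is usually read as the fraction of tokens on which the feature fires at all; a density of 0.001% means roughly one token in 100,000. The sketch below assumes "fires" means an activation strictly above zero, and the `threshold` parameter is a hypothetical knob, not something the dashboard exposes:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens where the feature is active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1 active token out of 100,000.
acts = [0.0] * 100_000
acts[42] = 3.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.001%
```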