INDEX
Negative Logits
Token        Logit
gagner       0.71
risk         0.63
grij         0.60
위험         0.59
dominated    0.58
rd           0.57
Risk         0.57
términos     0.57
raim         0.57
ঝুঁক         0.57
Positive Logits
Token        Logit
supposed     3.78
supposedly   3.06
seharusnya   2.46
meant        2.46
purportedly  2.44
allegedly    2.35
intended     2.32
suppose      2.24
sollen       2.18
should       2.16
Activations Density: 0.304%