INDEX
NEGATIVE LOGITS
reen        -0.80
moda        -0.80
oka         -0.79
сен         -0.75
фом         -0.73
Россий      -0.72
Hwang       -0.72
ngoài       -0.71
スカート    -0.71
privati     -0.71
POSITIVE LOGITS
disdain         2.25
contempt        2.09
looked          1.93
condescending   1.84
bel             1.80
瞧              1.77
look            1.72
down            1.70
despise         1.69
underestimate   1.67
Activations Density 0.027%