INDEX
NEGATIVE LOGITS
sexism      0.44
sexist      0.44
misog       0.42
chival      0.42
ecologists  0.40
Saj         0.38
rescaling   0.38
scaling     0.38
))(         0.38
nem         0.38
POSITIVE LOGITS
ත්      0.45
يش      0.44
фонд    0.42
било    0.38
σουμε   0.38
inträ   0.37
arın    0.37
لے      0.36
浔      0.36
ilikom  0.35
Activations Density 0.001%