INDEX
NEGATIVE LOGITS
Token           Logit
POP             -0.07
Nelson          -0.06
faker           -0.06
ompson          -0.06
masturbation    -0.06
KUR             -0.06
pastor          -0.06
modificar       -0.06
ako             -0.06
Rush            -0.06

POSITIVE LOGITS
Token           Logit
dye             0.07
dying           0.07
益              0.07
آذ              0.07
Nay             0.06
distinctive     0.06
ublished        0.06
dece            0.06
odyn            0.06
icine           0.06

Activations Density: 0.005%