INDEX
Negative Logits
Enforcement   -0.07
essentially   -0.06
澤            -0.06
κέ            -0.06
эй            -0.06
(n            -0.06
Welfare       -0.06
DEAD          -0.05
deeply        -0.05
collapse      -0.05
Positive Logits
Ran           0.07
Svens         0.07
Shir          0.06
皆            0.06
phận          0.06
istrar        0.06
Bengal        0.06
истории       0.06
ireccion      0.06
lacağ         0.06
Activations Density 0.005%