Negative Logits
alloween    -0.07
".$         -0.07
toddlers    -0.07
ころ        -0.07
earable     -0.06
topic       -0.06
Rural       -0.06
시          -0.06
acin        -0.06
.sponge     -0.06
Positive Logits
lead        0.06
Nhap        0.06
Scottish    0.06
加入        0.06
asserting   0.06
allure      0.06
crunchy     0.06
Amend       0.06
of          0.06
aide        0.06
Activations Density: 0.008%