INDEX
NEGATIVE LOGITS
cola        -0.07
Week        -0.07
blatantly   -0.06
958         -0.06
دهه         -0.06
_ok         -0.06
haha        -0.06
pred        -0.06
باغ         -0.06
جز          -0.06

POSITIVE LOGITS
yeterli     0.07
Hor         0.06
butcher     0.06
sonrası     0.06
Mant        0.06
liced       0.06
δε          0.06
tamp        0.06
/person     0.06
work        0.06
Activations Density 0.000%