Negative Logits
oldValue	-0.08
isset	-0.07
viol	-0.07
(e	-0.07
Rules	-0.07
gördüğü	-0.07
wollte	-0.07
蒄	-0.07
_PULL	-0.07
территории	-0.06
Positive Logits
坐	0.07
politically	0.07
�	0.07
慧	0.07
�	0.07
etak	0.06
�	0.06
詈	0.06
listed	0.06
�	0.06
Activations Density 0.002%