INDEX
NEGATIVE LOGITS
دف
-0.07
�
-0.07
Rule
-0.07
disp
-0.06
Warranty
-0.06
confused
-0.06
klim
-0.06
Chapters
-0.06
_draw
-0.06
obvious
-0.06
POSITIVE LOGITS
([
0.09
{[
0.08
(([
0.08
<[
0.07
([^
0.07
/car
0.07
,{
0.07
хови
0.07
(/*
0.07
(({
0.07
Activations Density 0.008%