Negative Logits
punishments     -0.07
ific            -0.07
movers          -0.07
SUN             -0.07
hoạt            -0.07
ilig            -0.07
temperatures    -0.06
.edu            -0.06
public          -0.06
�               -0.06
Positive Logits
kurum           0.07
SQ              0.06
igslist         0.06
(""             0.06
Iter            0.06
exert           0.06
wrong           0.06
.ss             0.06
.MIN            0.06
ness            0.06
Activations Density: 0.081%