NEGATIVE LOGITS
Chord          -0.09
leftovers      -0.09
brot           -0.09
cof            -0.09
(Color         -0.08
butterknife    -0.08
leftover       -0.08
aguj           -0.08
warum          -0.08
cudd           -0.08
POSITIVE LOGITS
绩              0.15
accountability  0.15
penal           0.14
奖励            0.13
rewarding       0.13
penalties       0.13
incentiv        0.13
Accountability  0.13
reward          0.12
績              0.12
Activations Density 0.026%