Negative Logits
Info        -0.07
EOS         -0.07
reen        -0.07
core        -0.07
kernel      -0.06
strides     -0.06
legends     -0.06
Pro         -0.06
-dot        -0.06
(column     -0.06
Positive Logits
punishment   0.14
punish       0.13
punished     0.12
punishments  0.10
punishing    0.09
Pun          0.08
yasak        0.08
rewarded     0.07
наказ        0.07
punishable   0.07
Activations Density 0.006%