INDEX
NEGATIVE LOGITS
astr        -0.10
Spending    -0.09
pond        -0.09
spending    -0.09
æĮĩ导       -0.09
èĥĨ         -0.09
atten       -0.09
apore       -0.08
NU          -0.08
ieves       -0.08
POSITIVE LOGITS
reward      0.24
rewards     0.21
Reward      0.19
Rewards     0.18
Reward      0.17
reward      0.17
rewarded    0.16
rewarding   0.15
_reward     0.13
Find        0.13
Activations Density: 0.054%