NEGATIVE LOGITS
affairs     -0.09
plonge      -0.08
জানা        -0.08
кан         -0.08
בנ          -0.07
nio         -0.07
جات         -0.07
-src        -0.07
berlin      -0.07
-defense    -0.07
POSITIVE LOGITS
rewarded    0.14
reward      0.14
奖励        0.13
Reward      0.13
.reward     0.12
rewarding   0.12
incentiv    0.12
rewards     0.12
Rewards     0.12
Reward      0.11
Activations Density 0.013%