INDEX
Explanations
references to rewards or rewards-related terms
terms related to rewards or incentives
Negative Logits
abad      -0.77
abases    -0.75
Ange      -0.72
Osw       -0.69
lander    -0.69
obiles    -0.68
enium     -0.67
Sloven    -0.66
ahime     -0.64
inx       -0.63
Positive Logits
reward         1.08
rewards        1.01
rewarded       0.97
rewarding      0.79
Reward         0.78
recipients     0.75
tiers          0.74
Reward         0.73
incentive      0.73
fulfillment    0.72
Activations Density 0.020%