INDEX
Explanations
terms related to rewards or benefits offered to encourage specific behaviors or actions
repeated mentions of the word "incentive."
Negative Logits
rooms    -0.91
angers   -0.76
agn      -0.75
â̲        -0.73
rup      -0.72
gaard    -0.72
room     -0.70
thing    -0.69
asp      -0.69
afa      -0.67
Positive Logits
incentive     1.30
incent        1.11
incentives    1.10
incentiv      0.97
compensation  0.90
compel        0.85
Reviewer      0.85
compulsion    0.85
induce        0.84
rewarded      0.83
Activations Density 0.008%
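
For a sense of scale, the density figure can be read as the fraction of token positions on which the feature fires at all. The sketch below assumes that reading (activation greater than zero counts as firing). The function name, the threshold parameter, and the sample data are hypothetical illustrations, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero activation.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 8 of 100,000 tokens.
acts = [0.0] * 99_992 + [1.5] * 8
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.008%
```

Under this reading, a density of 0.008% corresponds to roughly 8 active tokens per 100,000, which fits a feature that responds to a narrow family of words such as "incentive".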