INDEX
Explanations
narratives centered around self-interest and exploitation, particularly in the context of power dynamics and financial gain
Acting in one's own interest
selfish interests and gain
Negative Logits
Token           Logit
ModelState      -0.53
LabelTagHelper  -0.53
شهاد            -0.51
Innoc           -0.49
unarmed         -0.49
Descriere       -0.46
esía            -0.45
sério           -0.43
innoc           -0.43
noh             -0.43
Positive Logits
Token           Logit
selfish         1.27
interests       1.25
Interests       1.12
selfish         1.11
profit          1.10
interests       1.06
selfishness     1.03
ego             1.00
greed           1.00
Interests       0.97
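The two logit lists above read as the most negative and most positive entries of a per-token score for this feature. The sketch below shows only that selection step, assuming the scores are already available as (token, value) pairs; how the dashboard computes them is not stated on this page. The function name `top_logits` and the shortened `scores` list are hypothetical, with values copied from the lists above.

```python
import heapq

def top_logits(scores, k=2):
    """Return the k highest-scoring and k lowest-scoring (token, value) pairs."""
    positive = heapq.nlargest(k, scores, key=lambda pair: pair[1])
    negative = heapq.nsmallest(k, scores, key=lambda pair: pair[1])
    return positive, negative

# A few (token, logit) pairs taken from the lists above (illustrative subset).
scores = [
    ("selfish", 1.27),
    ("profit", 1.10),
    ("greed", 1.00),
    ("ModelState", -0.53),
    ("unarmed", -0.49),
    ("noh", -0.43),
]

positive, negative = top_logits(scores, k=2)
print(positive)  # [('selfish', 1.27), ('profit', 1.1)]
print(negative)  # [('ModelState', -0.53), ('unarmed', -0.49)]
```

Tokens that appear twice in the positive list (for example "selfish") are kept as separate entries here, since they may be distinct vocabulary items whose differences were lost in display.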
Activations Density 0.418%
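An activation density like the one above is usually read as the fraction of sampled tokens on which the feature fires at all. A minimal sketch under that assumption follows; the page does not give the exact definition, and `activation_density`, the threshold, and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: 1000 tokens, 4 of which activate the feature.
sample = [0.0] * 996 + [0.7, 1.3, 0.2, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")  # 0.400%
```

On this reading, a density of 0.418% would mean the feature is active on roughly 4 of every 1000 sampled tokens.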