    Explanations

    narratives centered around self-interest and exploitation, particularly in the context of power dynamics and financial gain

    Acting in one's own interest

    selfish interests and gain

    Negative Logits
    Token               Logit
    "ModelState"        -0.53
    "LabelTagHelper"    -0.53
    "شهاد"              -0.51
    "Innoc"             -0.49
    " unarmed"          -0.49
    "Descriere"         -0.46
    "esía"              -0.45
    " sério"            -0.43
    "innoc"             -0.43
    "noh"               -0.43
    Positive Logits
    Token               Logit
    " selfish"          1.27
    " interests"        1.25
    " Interests"        1.12
    "selfish"           1.11
    " profit"           1.10
    "interests"         1.06
    " selfishness"      1.03
    " ego"              1.00
    " greed"            1.00
    "Interests"         0.97
    Activation Density: 0.418%

    No Known Activations