    Explanations

    instances of manipulation and deception in societal contexts

    Negative Logits
     AssemblyProduct   -0.69
    httphttps          -0.60
    ніципалі           -0.60
    addCriterion       -0.57
    hyrchwyd           -0.57
     lenker            -0.57
    DotNetBar          -0.56
     AssemblyTitle     -0.56
    Personensuche      -0.55
    RSpec              -0.55
    Positive Logits
     fooled            1.34
     gul               1.24
     deceived          1.18
     unsuspecting      1.11
     naive             1.06
     fool              1.05
     fools             1.05
     tricked           1.05
     foolish           1.04
     fall              1.02
    Activation Density: 0.242%

    No Known Activations