INDEX
    Explanations

    instances of fraudulent and deceptive behavior

    terms related to fraudulent and deceptive practices

    Negative Logits
    arium   -0.77
    nat     -0.75
    hung    -0.74
    area    -0.73
    mun     -0.73
    bur     -0.72
    alist   -0.72
    hed     -0.72
    resent  -0.71
    raq     -0.70
    Positive Logits
     fraudulent    0.88
     scam          0.85
     unsuspecting  0.85
     fraud         0.84
     scams         0.82
     dece          0.82
     deceive       0.78
     manipulative  0.77
     deception     0.76
     cheat         0.75
    Activation Density: 0.025%

    No Known Activations