    NEGATIVE LOGITS
    0.81  "it"
    0.80  " "
    0.77  "و"
    0.68  "ના"
    0.66  "ல்"
    0.66  "ون"
    0.66  "س"
    0.65  "D"
    0.64  "ियों"
    0.64  "ských"
    POSITIVE LOGITS
    1.12  " defraud"
    0.95  " scams"
    0.84  " frauds"
    0.75  " Fraud"
    0.75  " fraudulent"
    0.74  " fraude"
    0.74  [token missing]
    0.68  " fraud"
    0.68  " attaques"
    0.66  "Fraud"
    Activation Density: 0.004%

    No Known Activations