    NEGATIVE LOGITS
    territ              -0.07
    ">↵                 -0.07
     взаим              -0.06
    aştır               -0.06
    stoup               -0.06
    ispens              -0.06
    ."','".$            -0.06
    webtoken            -0.06
     руках              -0.06
     horas              -0.06
    POSITIVE LOGITS
     fraud              0.12
     Fraud              0.11
    ===============↵    0.07
     grid               0.07
    -standing           0.07
     frag               0.06
     док                0.06
     theft              0.06
     fan                0.06
     Prom               0.06
    Activation Density: 0.002%

    No Known Activations