    Negative Logits
    Token       Logit
    " izvo"     -0.08
    " verg"     -0.08
    " Assume"   -0.08
    "ायल"       -0.07
    " XR"       -0.07
    "оград"     -0.07
    " تاب"      -0.07
    "УР"        -0.07
    "XR"        -0.07
    " TILE"     -0.07
    Positive Logits
    Token              Logit
    "Spam"             0.10
    " contamination"   0.10
    " combating"       0.09
    " preventing"      0.09
    " contaminants"    0.09
    ".prevent"         0.09
    " Spam"            0.09
    "prevent"          0.09
    " scammers"        0.09
    "Prevent"          0.09
    Activation Density: 0.006%

    No Known Activations