    Explanations

    phrases related to internet user behavior and online community management

    references to spam and platform moderation policies

    Negative Logits
     Token           Logit
     サーティワン    -0.92
     Clarke          -0.80
     Jennings        -0.77
     Bundy           -0.76
     Desmond         -0.76
     ensemble        -0.74
     Quart           -0.73
     Calder          -0.71
     Roland          -0.69
     Bauer           -0.69
    Positive Logits
     Token           Logit
     spam            1.85
     scams           1.34
     abusive         1.25
     scam            1.25
     bots            1.24
     harassing       1.24
     slander         1.23
     malicious       1.20
     imperson        1.17
     annoying        1.17
    Act Density 0.464%

    No Known Activations