INDEX
    Explanations

    mentions of trolling behavior or trolls in online interactions

    Negative Logits
    ++++++++++++++++   -0.48
    iyah               -0.46
     kosher            -0.46
    erald              -0.45
     preservation      -0.45
    riott              -0.45
    hani               -0.45
    enture             -0.45
    ••••               -0.45
    clusive            -0.44
    Positive Logits
     trolls            0.63
     troll             0.62
    ツ                 0.61
    hattan             0.57
    bag                0.57
    tro                0.56
    bags               0.55
     trolling          0.54
     Troll             0.52
    boxes              0.49
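    Lists like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix. The lists then keep the tokens whose logits that direction raises or lowers most. Below is a minimal sketch under that assumption; this page does not confirm the exact procedure or any normalization it applies. `top_logits`, `decoder_dir`, `W_U` and all toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_logits(
    decoder_dir: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project a feature direction onto the unembedding and return the
    k most positive and k most negative (token, logit) pairs.

    decoder_dir: length d_model (hypothetical decoder row for one feature)
    W_U: d_model x d_vocab unembedding matrix as nested lists (hypothetical)
    vocab: token strings indexed by token id, leading spaces preserved
    """
    d_model, d_vocab = len(decoder_dir), len(vocab)
    # Logit contribution to every vocabulary token: decoder_dir @ W_U
    logits = [
        sum(decoder_dir[m] * W_U[m][t] for m in range(d_model))
        for t in range(d_vocab)
    ]
    # Token ids ordered from most negative to most positive logit
    order = sorted(range(d_vocab), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in order[:k]]        # most negative first
    positive = [(vocab[t], logits[t]) for t in order[::-1][:k]]  # most positive first
    return positive, negative


# Toy example: d_model = 2, four-token vocabulary (made-up numbers)
vocab = [" troll", " trolls", " kosher", " preservation"]
W_U = [
    [1.0, 0.9, -0.5, -0.4],
    [0.0, 0.1, -0.1, -0.2],
]
decoder_dir = [0.6, 0.1]
positive, negative = top_logits(decoder_dir, W_U, vocab, k=2)
print(positive)
print(negative)
```

    In this toy setup " troll" and " trolls" head the positive list and " kosher" heads the negative list. A real dashboard would run the same projection over the full vocabulary with k = 10.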
    Activation Density: 10.973%
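    Activation density is usually read as the share of tokens on which the feature fires. At 10.973% that is roughly one token in nine. Below is a minimal sketch, assuming that "active" means an activation strictly above zero. This page does not state the threshold or the dataset the figure was measured on. `activation_density` and the sample activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions on which the feature is active.

    Assumes 'active' means activation > threshold (0 by default, as for a
    ReLU-style sparse autoencoder); this threshold is an assumption.
    """
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations of one feature over ten tokens
acts = [0.0, 1.2, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"Activation Density: {activation_density(acts):.3f}%")  # Activation Density: 20.000%
```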

    No Known Activations