INDEX
    Explanations

    references to tabloid news outlets and sensational stories

    Negative Logits
    "esser"               -0.19
    " CBC"                -0.15
    "olen"                -0.15
    "olis"                -0.14
    " UsersController"    -0.14
    " Olymp"              -0.14
    "ost"                 -0.14
    "oufl"                -0.13
    "ะ"                   -0.13
    "ullen"               -0.13
    Positive Logits
    "vio"                 0.18
    "#ad"                 0.16
    "iid"                 0.15
    "iw"                  0.14
    "wik"                 0.14
    "idity"               0.14
    "lal"                 0.14
    "над"                 0.14
    " Lal"                0.14
    "atchet"              0.14
    Activation Density: 0.018%

    No Known Activations