INDEX
    Explanations

    expressions of criticism and negative sentiment towards individuals or groups

    Negative Logits
    omu         -0.16
     Nar        -0.15
    856         -0.15
    otland      -0.15
    ador        -0.15
     mou        -0.15
    oen         -0.14
    iali        -0.14
    uka         -0.14
     Bale       -0.14

    Positive Logits
    inh         0.16
     Thur       0.14
    жи          0.14
    á»Ĩ         0.14
    itecture    0.14
    ÏĦο         0.14
    .disk       0.14
    ISO         0.14
    elt         0.14
    Exited      0.14
    Act Density 0.408%

    No Known Activations