    Explanations

    instances of strong negative emotions or harsh language

    Negative Logits
    Token             Logit
    ")|^{"            -0.69
    " GIP"            -0.63
    "abestanden"      -0.61
    "— "              -0.61
    "eradish"         -0.61
    " &___"           -0.59
    " Sanger"         -0.59
    "зульта"          -0.59
    " snippetHide"    -0.58
    " quí"            -0.58
    Positive Logits
    Token             Logit
    "ine"             0.74
    "SourceChecksum"  0.66
    " ویکی‌پدی"        0.64
    "boarding"        0.57
    "master"          0.55
    " quedarse"       0.55
    "baga"            0.54
    " belast"         0.54
    "+#+#"            0.54
    "mm"              0.53
    Activation Density: 0.068%

    No Known Activations