INDEX
    Explanations

    words related to negative actions or emotions

    complex themes related to human emotions and societal issues

    Negative Logits
    Token             Logit
    "arnaev"          -0.72
    "eatures"         -0.68
    " bag"            -0.63
    " version"        -0.61
    " Achievements"   -0.57
    "uscript"         -0.56
    " SAY"            -0.55
    " Tracks"         -0.55
    " Cases"          -0.54
    " LOT"            -0.54
    Positive Logits
    Token             Logit
    "lessness"        1.20
    "fulness"         1.07
    "iness"           0.94
    "liness"          0.92
    "thood"           0.91
    "ulence"          0.85
    "smanship"        0.83
    "ism"             0.80
    "ality"           0.79
    "ishment"         0.79
    Act Density 0.412%

    No Known Activations