INDEX
    Explanations

    words related to vulnerability or lack of control

    expressions of helplessness and powerlessness

    Negative Logits
    ickr        -0.81
    issue       -0.74
    edia        -0.74
    anners      -0.64
    aters       -0.64
    WAYS        -0.63
    cius        -0.63
    rosso       -0.62
    dule        -0.62
    ramid       -0.62
    Positive Logits
     helpless           1.38
    nesses              1.10
    ness                1.08
    NESS                1.01
    ingly               0.83
     strugg             0.83
     powerless          0.83
     redes              0.80
     hopeless           0.79
    TPPStreamerBot      0.76
    Activation Density: 0.016%

    No Known Activations