INDEX
    Explanations

    words related to negative events or situations

    terms related to feelings of shame or social failure

    NEGATIVE LOGITS
     corrid   -0.80
    ramer     -0.78
    bors      -0.77
    eways     -0.76
    estone    -0.75
    cium      -0.72
    rame      -0.69
     Aires    -0.69
     livest   -0.68
    nda       -0.68
    POSITIVE LOGITS
     embarrassment  1.16
     certs          0.80
    ously           0.75
    è£ħ             0.75
    èª              0.74
     dishon         0.73
    ãĥĭ             0.73
    é¾įå¥ij士         0.73
    UAL             0.72
    lessly          0.72
    Act Density 0.015%

    No Known Activations