    Explanations

    expressions of strong emotions or reactions, especially negative ones

    Negative Logits
    Token             Logit
    "imum"            -0.91
    " Formation"      -0.86
    "atari"           -0.79
    "aceutical"       -0.77
    "itivity"         -0.73
    "omi"             -0.73
    "ulton"           -0.72
    "ancial"          -0.69
    " Attribution"    -0.68
    " Virtue"         -0.68
    Positive Logits
    Token             Logit
    " ashamed"        1.59
    " humiliated"     1.48
    " frustrated"     1.47
    " disgusted"      1.45
    " saddened"       1.44
    " embarrassed"    1.43
    " depressed"      1.42
    " afraid"         1.39
    " powerless"      1.39
    " confused"       1.39
    Activation Density: 0.214%

    No Known Activations