INDEX
    Explanations

    words expressing negative emotions or outcomes

    expressions of regret or misfortune

    NEGATIVE LOGITS
    arnaev      -0.78
    addons      -0.69
    rouse       -0.67
    ĸļ          -0.67
    rounder     -0.67
    aver        -0.66
    aeda        -0.64
    kefeller    -0.64
    afort       -0.64
    arij        -0.64
    POSITIVE LOGITS
    ,           0.91
     enough     0.90
    ,...        0.80
     for        0.74
     though     0.74
     neither    0.73
     however    0.70
     alas       0.69
     there      0.68
     none       0.67
    Activation Density: 0.058%

    No Known Activations