    Explanations

    words related to strong negative emotions or reactions

    expressions of anger or frustration

    Negative Logits
    Token        Logit
    "acea"       -0.88
    "erva"       -0.85
    "cius"       -0.82
    "oak"        -0.81
    "arette"     -0.78
    "querque"    -0.77
    "win"        -0.77
    "elia"       -0.76
    "pty"        -0.75
    "ynski"      -0.75
    Positive Logits
    Token        Logit
    "idious"     0.74
    " ultras"    0.70
    " Furious"   0.68
    " Attacks"   0.64
    " Dug"       0.64
    "ãĥ£"        0.62
    "quished"    0.61
    "icago"      0.61
    " furious"   0.61
    " err"       0.60
    Activation Density: 0.040%

    No Known Activations