    Explanations

    instances of the word "upset," indicating emotions of dissatisfaction or distress

    Negative Logits
    Token           Logit
    "glas"          -0.85
    " livest"       -0.81
    "ographies"     -0.75
    "audi"          -0.70
    "istered"       -0.70
    "gravity"       -0.69
    "haar"          -0.69
    "perty"         -0.69
    "estones"       -0.68
    "atures"        -0.67
    Positive Logits
    Token           Logit
    "dy"            0.86
    "der"           0.80
    " stomach"      0.75
    "ingly"         0.74
    "quished"       0.72
    " upset"        0.72
    "NESS"          0.70
    "bur"           0.70
    " roy"          0.69
    "ful"           0.67
    Activation Density: 0.008%

    No Known Activations