INDEX
    Explanations

    emotional responses, particularly feelings of disappointment, anger, shock, and happiness

    Negative Logits
    Token          Logit
    "eds"          -0.16
    "416"          -0.15
    "ader"         -0.15
    "oss"          -0.14
    "lor"          -0.14
    "rop"          -0.14
    "onth"         -0.14
    "andalone"     -0.14
    "θα"           -0.14
    "berman"       -0.13
    Positive Logits
    Token            Logit
    " about"         0.20
    " withObject"    0.17
    "contres"        0.16
    "ingly"          0.15
    " hearing"       0.15
    "åIJ¬åΰ"         0.15
    "ajar"           0.15
    "isque"          0.15
    "/dist"          0.15
    " bahwa"         0.15
    Activation Density: 0.151%

    No Known Activations