    Explanations

    emotional reactions and feelings conveyed in the text

    Negative Logits
    Token           Logit
    "erus"          -0.15
    "anela"         -0.14
    "/Peak"         -0.14
    "zw"            -0.14
    "iente"         -0.14
    "izedName"      -0.14
    "isz"           -0.14
    "ahat"          -0.13
    " spoiler"      -0.13
    "elon"          -0.13
    Positive Logits
    Token           Logit
    "echo"          0.15
    " others"       0.15
    "ãĥ©ãĥĥãĤ¯"     0.15
    "Ñĥг"           0.15
    " us"           0.14
    "achers"        0.14
    "others"        0.14
    "eÄį"           0.14
    ".bootstrap"    0.13
    "utr"           0.13
    Activation Density 0.170%

    No Known Activations