INDEX
    Explanations

    emotional responses or expressions related to feelings

    Negative Logits
    Token           Logit
    "<unused52>"    -0.82
    "<unused68>"    -0.82
    "<unused14>"    -0.82
    "<unused79>"    -0.82
    "[@BOS@]"       -0.82
    "<unused16>"    -0.82
    "<unused28>"    -0.82
    "<unused8>"     -0.82
    "<unused3>"     -0.81
    "<unused21>"    -0.81
    Positive Logits
    Token            Logit
    " fanfic"        0.76
    " fanart"        0.76
    " fanfiction"    0.75
    " OC"            0.70
    " canon"         0.63
    " fandom"        0.63
    " tumblr"        0.62
    " AU"            0.59
    "OC"             0.54
    " fandoms"       0.54
    Activation Density: 0.370%

    No Known Activations