INDEX
    Explanations

    expressions related to emotions, especially deep feelings like gratitude, loss, and determination

    phrases connected to personal emotions and introspection

    Negative Logits
    "iste"        -0.65
    " GOODMAN"    -0.59
    "bley"        -0.53
    "inth"        -0.52
    "semb"        -0.50
    "nia"         -0.50
    " WATCHED"    -0.50
    "ANS"         -0.49
    "going"       -0.49
    "ymm"         -0.49
    Positive Logits
    " your"       1.14
    "their"       1.14
    "his"         1.14
    " his"        1.13
    " YOUR"       1.09
    " their"      1.08
    "your"        1.07
    " my"         1.02
    " THEIR"      0.99
    " HIS"        0.96
    Activation Density: 0.691%

    No Known Activations