    Explanations

    sentences with positive sentiment, expressions of gratitude, and requests for feedback

    Negative Logits
     instinct        -0.80
     questioning     -0.80
     elevated        -0.78
     rallying        -0.77
     imperson        -0.75
     tricked         -0.74
     undermining     -0.74
     escaping        -0.73
     raiding         -0.73
    ensibly          -0.73
    Positive Logits
     Lastly          1.63
    <|endoftext|>    1.60
     Additionally    1.42
     Alternatively   1.39
     Also            1.38
     Anyway          1.30
     Finally         1.24
     Please          1.22
     Enjoy           1.21
     Otherwise       1.19
    Activation Density 4.496%

    No Known Activations