INDEX
    Explanations

    punctuation marks and quotation marks, suggesting a focus on dialogue or spoken expressions in the text

    Negative Logits
     drawn         -0.18
     taken         -0.17
     driven        -0.17
     risen         -0.15
     eaten         -0.15
    itas           -0.15
     undertaken    -0.15
     given         -0.15
     flown         -0.15
     arisen        -0.14
    Positive Logits
    didn      0.36
    couldn    0.32
    had       0.30
    went      0.29
    Didn      0.28
    felt      0.27
    took      0.27
    did       0.27
    fell      0.27
    forgot    0.27
    Act Density 0.054%

    No Known Activations