    Explanations

    Occurrences of the word "every".

    Negative Logits
        Token        Logit
        "ulate"      -0.17
        "eworthy"    -0.17
        "st"         -0.15
        "ilitary"    -0.14
        "kee"        -0.14
        "ilent"      -0.14
        "arith"      -0.14
        "incare"     -0.14
        "side"       -0.14
        "atcher"     -0.14

    Positive Logits
        Token        Logit
        "/all"        0.21
        "hone"        0.19
        "THING"       0.18
        "things"      0.17
        "where"       0.17
        "ones"        0.17
        "thin"        0.16
        " einzel"     0.16
        "though"      0.16
        "ied"         0.15
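    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists can be produced, assuming (this is not stated on the page) that each token's score is the dot product of the feature's decoder direction with that token's column of an unembedding matrix `W_U`. The function names, shapes, and toy values are hypothetical.

```python
def logit_effects(decoder_direction, W_U):
    """Project a feature's decoder direction onto every vocabulary token.

    Assumption: W_U is a list of d_model rows, each holding one value per token.
    """
    n_vocab = len(W_U[0])
    return [
        sum(decoder_direction[d] * W_U[d][t] for d in range(len(decoder_direction)))
        for t in range(n_vocab)
    ]


def top_and_bottom_logits(effects, vocab, k=10):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    pairs = sorted(zip(vocab, effects), key=lambda p: p[1])  # ascending by effect
    negative = pairs[:k]        # most negative first
    positive = pairs[::-1][:k]  # most positive first
    return negative, positive


# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
W_U = [[0.1, -0.2, 0.3, 0.0],
       [5.0, 5.0, 5.0, 5.0]]
effects = logit_effects([1.0, 0.0], W_U)
neg, pos = top_and_bottom_logits(effects, vocab, k=2)
print(neg)  # [('b', -0.2), ('d', 0.0)]
print(pos)  # [('c', 0.3), ('a', 0.1)]
```

    With k=10 and a real vocabulary, the two returned lists would have the same shape as the Negative and Positive Logits tables.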
    Activation Density: 0.047%
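    Activation density is the fraction of token positions on which the feature fires, so 0.047% corresponds to roughly 470 tokens per million. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 470 firing positions out of one million tokens.
acts = [0.0] * 1_000_000
for i in range(470):
    acts[i] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.047%
```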

    No Known Activations