    Explanations

    frequency and patterns of the word "the" in various contexts throughout the text

    Negative Logits
    "own"          -0.15
    " ours"        -0.15
    "ered"         -0.14
    " own"         -0.13
    "(ed"          -0.13
    "less"         -0.12
    "ld"           -0.12
    "ishly"        -0.12
    "ord"          -0.12
    "liest"        -0.12

    Positive Logits
    "ses"           0.29
    " same"         0.26
    " following"    0.21
    " latter"       0.20
    " entire"       0.19
    " likes"        0.18
    "(ir"           0.18
    "odore"         0.18
    "sse"           0.18
    "osoph"         0.18
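Lists like the two above are commonly produced by projecting a feature's decoder direction onto each token's unembedding vector and ranking the resulting per-token values from both ends. The sketch below assumes that procedure; it is not this dashboard's actual code. The function names `logit_effects` and `top_logits`, the toy vectors, and the default `k=10` are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float], unembed: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Hypothetical helper: dot product of the feature's decoder direction
    with each token's unembedding vector (assumed procedure)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_logits(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional example; real models use d_model-sized vectors.
    effects = logit_effects(
        [1.0, 0.0],
        {"a": [0.3, 5.0], "b": [-0.2, 1.0], "c": [0.1, 0.0], "d": [-0.5, 2.0]},
    )
    neg, pos = top_logits(effects, k=2)
    for tok, val in neg + pos:
        print(f"{tok!r:>6} {val:+.2f}")
```

Quoting tokens with `!r` keeps leading spaces visible, which matters here because `"own"` and `" own"` are distinct vocabulary entries.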
    Activation Density: 3.631%
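Activation density is usually reported as the share of sampled token positions on which the feature fires, that is, where its activation is above zero. The minimal sketch below assumes that definition and a threshold of 0. The function name `activation_density` is hypothetical, and the token sample this dashboard used is not stated.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        return 0.0  # no tokens sampled, so report zero density
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Under this definition, 3,631 firing positions out of 100,000 gives 3.631%.
    print(activation_density([1.0] * 3631 + [0.0] * (100_000 - 3631)))
```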

    No Known Activations