    Explanations

    instances of the word "the" and variations of it

    Negative Logits
    Token      Logit
    "ered"     -0.16
    "own"      -0.15
    " ours"    -0.14
    "ld"       -0.14
    "led"      -0.13
    " own"     -0.13
    "ned"      -0.13
    "(ed"      -0.13
    "seek"     -0.13
    "ishly"    -0.13
    Positive Logits
    Token          Logit
    "ses"          0.29
    " same"        0.25
    " latter"      0.20
    " following"   0.20
    "(ir"          0.18
    " likes"       0.18
    "orex"         0.18
    "osoph"        0.18
    " entire"      0.18
    "same"         0.18
    Activation Density: 3.593%

    No Known Activations