    Explanations

    occurrences of the word "the" with high activation values

    the definite article "the"

    Negative Logits
    Token           Logit
    "iffe"          -0.77
    "thood"         -0.63
    "aba"           -0.60
    "leeve"         -0.58
    " Pastebin"     -0.58
    "claw"          -0.57
    "craft"         -0.57
    "bee"           -0.56
    "Edit"          -0.55
    " assume"       -0.55
    Positive Logits
    Token           Logit
    " same"         1.16
    "ses"           1.08
    " longest"      1.05
    " hardest"      1.01
    " fastest"      1.00
    " entire"       0.96
    " entirety"     0.96
    " quickest"     0.95
    " slightest"    0.94
    " latter"       0.94
    Activation Density: 0.244%

    No Known Activations