    Explanations

    proper nouns or specific entities

    the word "the" in various contexts

    Negative Logits
    "thood"         -0.72
    "iffe"          -0.71
    "earch"         -0.70
    "leeve"         -0.69
    "rehend"        -0.68
    "Background"    -0.67
    "rade"          -0.67
    "verage"        -0.66
    "eno"           -0.64
    "hire"          -0.64
    Positive Logits
    "oret"          1.23
    " latter"       1.18
    " longest"      1.16
    " shortest"     1.14
    " same"         1.11
    " fastest"      1.11
    " biggest"      1.09
    " smallest"     1.08
    " largest"      1.07
    " simplest"     1.07
    Act Density 0.168%

    No Known Activations