INDEX
    Explanations

    phrases related to news headlines or current events

    occurrences of the word "the"

    Negative Logits
    Token          Logit
    "thood"        -0.74
    "leeve"        -0.64
    "iffe"         -0.63
    " Rahul"       -0.60
    " assume"      -0.59
    " Yuri"        -0.58
    " Gerard"      -0.58
    "aba"          -0.57
    "ALT"          -0.57
    " suppose"     -0.57

    Positive Logits
    Token          Logit
    "ses"           1.25
    " same"         1.16
    " longest"      1.06
    " entire"       1.05
    " fastest"      1.04
    " slightest"    1.03
    " latter"       1.03
    " smallest"     1.01
    " hardest"      0.97
    " widest"       0.97

    Activation Density: 0.332%

    No Known Activations