INDEX
    Explanations

    references to specific dates or historical events

    occurrences of the word "the."

    NEGATIVE LOGITS
    .''                 -0.55
    .                   -0.54
     without            -0.52
    .</                 -0.52
    †                   -0.50
     with               -0.49
    SPONSORED           -0.48
    ."                  -0.48
    .-                  -0.48
    leeve               -0.48
    POSITIVE LOGITS
     same               0.96
     latter             0.96
     aforementioned     0.95
     slightest          0.91
     entirety           0.90
     smallest           0.88
     entire             0.88
     simplest           0.87
    oret                0.86
     latest             0.86
    Act Density 1.540%

    No Known Activations