INDEX
    Explanations

    references to authoritative figures or institutions

    occurrences of the word "the"

    Negative Logits
    Token            Logit
    " thereof"       -0.62
    " thereby"       -0.57
    " respectively"  -0.57
    "thood"          -0.57
    "."              -0.55
    "iffe"           -0.55
    "wen"            -0.55
    "elaide"         -0.54
    " namely"        -0.54
    "âĢł"            -0.54
    Positive Logits
    Token            Logit
    " simplest"      1.08
    " slightest"     1.07
    " same"          1.03
    " smallest"      1.02
    "oret"           0.99
    " widest"        0.95
    " easiest"       0.93
    " vast"          0.93
    " entirety"      0.92
    " largest"       0.92
    Activation Density: 1.479%

    No Known Activations