INDEX
    Explanations

    phrases containing the word "the" followed by a specific word.

    phrases that indicate comparison or contrast involving the word "the."

    Negative Logits
    Token              Logit
    "iband"            -0.73
    "frey"             -0.72
    " Became"          -0.70
    " respectively"    -0.70
    " Accessed"        -0.69
    " anew"            -0.66
    "isin"             -0.66
    " apiece"          -0.65
    "fw"               -0.64
    " exceeded"        -0.62
    Positive Logits
    Token              Logit
    " rest"            1.40
    " ones"            1.24
    " originals"       1.17
    " usual"           1.14
    " others"          1.14
    " original"        1.09
    " previous"        1.09
    " likes"           1.06
    " aforementioned"  1.02
    " norm"            1.02
    Act Density 0.187%

    No Known Activations