INDEX
    Explanations

    occurrences of the definite article "the."

    Negative Logits
    "urious"       -1.51
    "uel"          -1.48
    " racket"      -1.48
    "ULAR"         -1.47
    "same"         -1.44
    "bia"          -1.43
    " spacing"     -1.37
    " friendly"    -1.37
    "eeee"         -1.37
    "ea"           -1.35
    Positive Logits
    "«"     2.63
    "·"     2.14
    "¨"     2.03
    "Ļª"    2.02
    "ŀ"     2.00
    "©"     1.99
    "¬"     1.99
    "ļ"     1.90
    "²"     1.90
    "ĻĤ"    1.86
    Act Density 2.924%

    No Known Activations