INDEX
    Explanations

    repeated use of the word "the."

    Negative Logits
        Token           Logit
        "ihar"          -0.16
        "oun"           -0.14
        "urrect"        -0.14
        "ponsible"      -0.14
        "IRD"           -0.13
        " OTHERWISE"    -0.13
        "orny"          -0.13
        "nar"           -0.13
        "ing"           -0.13
        "è¶"            -0.13

    Positive Logits
        Token           Logit
        " few"           0.27
        "few"            0.21
        " Few"           0.19
        " pret"          0.18
        "Few"            0.17
        " many"          0.16
        " nhiều"         0.15
        " liv"           0.15
        " rare"          0.15
        "emap"           0.15
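Lists of positive and negative logits like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The page does not state its exact method, so the sketch below assumes that plain projection (no layer norm or bias). The toy vocabulary, matrices, and the names `logit_effects` and `top_and_bottom` are hypothetical and for illustration only; they are not a Neuronpedia or library API.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab]
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Dot the decoder direction with each token's unembedding column."""
    d_model = len(decoder_dir)
    effects = []
    for j, tok in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        effects.append((tok, score))
    return effects


def top_and_bottom(
    effects: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens and the k lowest-scoring tokens."""
    ranked = sorted(effects, key=lambda te: te[1])  # ascending by effect
    bottom = ranked[:k]        # lowest effects, most negative first
    top = ranked[::-1][:k]     # highest effects, most positive first
    return top, bottom


if __name__ == "__main__":
    # Hypothetical 2-dimensional model with a 3-token vocabulary.
    direction = [1.0, 0.0]
    W_U = [[0.3, -0.2, 0.1], [0.5, 0.5, 0.5]]
    toks = [" few", " the", " many"]
    top, bottom = top_and_bottom(logit_effects(direction, W_U, toks), 2)
    print("positive:", top)
    print("negative:", bottom)
```

Note that `bottom` holds the lowest k scores, which need not all be negative; a dashboard that labels them "negative logits" may filter by sign as well.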
    Activation density: 0.059%
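Read as the share of tokens on which the feature fires, a density of 0.059% means roughly 59 of every 100,000 tokens, or about 1 in 1,700. The sketch below assumes density is the percentage of sampled tokens whose activation exceeds zero; the sample and the `threshold` parameter are assumptions, and `activation_density` is a hypothetical helper, not part of any library.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 59 of 100,000 tokens.
    acts = [1.0] * 59 + [0.0] * (100_000 - 59)
    print(f"{activation_density(acts):.3f}%")  # prints 0.059%
```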

    No Known Activations