INDEX
    Explanations

    phrases related to specific entities or concepts

    the definite article "the."

    Negative Logits
    Token          Logit
    "Ò"            -0.81
    "thood"        -0.77
    "elaide"       -0.74
    " because"     -0.72
    "leground"     -0.72
    "aba"          -0.71
    "eno"          -0.68
    "!!!!"         -0.66
    "arate"        -0.66
    "rage"         -0.65
    Positive Logits
    Token          Logit
    "resa"         1.07
    " simplest"    1.04
    " slightest"   1.02
    " biggest"     1.02
    "oret"         1.00
    " latter"      0.99
    " vast"        0.98
    " majority"    0.98
    " entire"      0.98
    " oldest"      0.97
    Activation Density: 0.256%

    No Known Activations