    Explanations

    Articles and determiners, particularly variations of "the".

    Negative Logits
    Token       Logit
    "izer"      -0.17
    "ani"       -0.17
    "ader"      -0.16
    "anni"      -0.15
    "ives"      -0.15
    "iser"      -0.15
    "agn"       -0.15
    "kovi"      -0.15
    " Hart"     -0.14
    "aly"       -0.14

    Positive Logits
    Token       Logit
    "バイ"      0.15
    "utta"      0.15
    "utsch"     0.15
    "üstü"      0.14
    "/loose"    0.14
    "gons"      0.14
    " Už"       0.14
    ":init"     0.14
    "richt"     0.14
    " CAUSED"   0.14
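    Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix (W_U) and keeping the most negative and most positive entries. The sketch below assumes that procedure; the function name `top_logit_effects` and its inputs are hypothetical, and plain Python lists stand in for real model weights.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: decoder_dir is the feature's decoder vector (length d_model) and
    unembed is W_U given as d_model rows, each holding len(vocab) floats.
    """
    d_model = len(decoder_dir)
    # logits[j] = sum_i decoder_dir[i] * W_U[i][j]  (direct effect on each vocab logit)
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by logit so the most negative effects come first.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # descending: most positive first
    return negative, positive
```

    Under these assumptions, a token whose unembedding column points along the decoder direction lands in the positive list, and one pointing against it lands in the negative list.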
    Activation density: 0.050%
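    Activation density is the share of tokens on which the feature fires. The sketch below assumes a token counts as "firing" when the feature's activation is strictly above zero (the `threshold` parameter and the function name are illustrative, not taken from the dashboard). For example, one firing token out of 2,000 gives 0.05%, the density reported above.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: activations is a flat list with one value per token.
    """
    if not activations:
        return 0.0  # no tokens observed, so report zero density
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)
```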

    No Known Activations