INDEX
    Explanations

    foreign language articles/pronouns

    Negative Logits
    "g"        1.63
    "ing"      1.58
    "u"        1.19
    "ine"      1.15
    "é"        1.14
    "ın"       1.13
    "and"      1.09
    "ون"       1.07
    "AND"      1.07
    "im"       1.06
    Positive Logits
    " at"          1.59
    " an"          1.32
    " "            1.15
    "jenigen"      1.14
    (unrendered)   1.14
    " was"         1.13
    (unrendered)   1.12
    "ما"           1.06
    "מו"           1.05
    "."            1.04
    Activation Density: 0.008%

    No Known Activations