    Explanations

    languages and their endings

    Negative Logits
    "a"            1.33
    "an"           1.28
    "as"           1.20
    "i"            1.07
    "on"           1.04
    " در"          0.93
    "en"           0.89
    " decentral"   0.89
    " σε"          0.88
    " στη"         0.88
    Positive Logits
    "ס"                 1.16
    (token not shown)   1.13
    "."                 1.13
    "لی"                1.10
    (token not shown)   0.91
    (token not shown)   0.91
    "ל"                 0.89
    (token not shown)   0.89
    "س"                 0.86
    "נ"                 0.85
    Act Density 0.002%

    No Known Activations