INDEX
    Explanations

    "American" or "English" followed by a noun

    Negative Logits
    Token     Value
    ه         1.60
    tiden     1.53
    ن         1.51
              1.50
    ের        1.48
    пример    1.48
    ات        1.37
    ானா       1.37
    ként      1.34
    لية       1.34
    Positive Logits
    Token     Value
    ic        1.74
    ası       1.64
    ll        1.59
    th        1.50
    ln        1.50
    row       1.45
    ning      1.45
    an        1.40
    lar       1.40
    nt        1.36
    Activation Density: 0.251%

    No Known Activations