    Explanations

    various language word endings

    Negative Logits
    ğini     2.08
             1.84
    ing      1.80
             1.72
    л        1.71
    तून      1.66
    س        1.64
    ff       1.58
             1.58
    an       1.57
    Positive Logits
    ных      2.13
    THING    1.93
    ları     1.91
    ların    1.87
    ные      1.83
    ين       1.75
    ный      1.70
    lık      1.70
    िकी      1.66
    dS       1.63
    Act Density 0.132%

    No Known Activations