    Explanations

    former roles or entities

    Negative Logits
    Token     Value
    ال        1.66
    ری        1.55
    ع         1.38
    ك         1.32
    این       1.31
    ای        1.27
    ב         1.27
    هم        1.23
    其他      1.20
    ת         1.19
    Positive Logits
    Token     Value
    " ("      1.55
    ess       1.09
    ations    1.07
    ena       1.03
    ore       0.97
    elle      0.92
    ethe      0.92
    ми        0.90
    ationen   0.88
    ility     0.87
    Activation Density: 0.004%

    No Known Activations