    Negative Logits
    ج  1.82
    та  1.38
    ник  1.35
    é  1.35
    لي  1.32
    ق  1.32
    ون  1.31
    ot  1.30
    س  1.26
    ص  1.26
    Positive Logits
    '  1.72
    UR  1.20
    (no visible token)  1.16
    V  1.06
    스를  1.05
    dine  1.04
    (no visible token)  1.03
    liği  1.02
    LE  1.02
    (no visible token)  0.98
    Act Density 0.022%

    No Known Activations