INDEX
    NEGATIVE LOGITS
    س                     1.69
    ك                     1.59
    ка                    1.51
    я                     1.33
    ק                     1.27
    ی                     1.21
    (no visible token)    1.19
    اب                    1.17
    ف                     1.11
    ς                     1.11
    POSITIVE LOGITS
    n                     1.73
    for                   1.36
    7                     1.31
    0                     1.30
    w                     1.28
    el                    1.24
    1                     1.23
    (no visible token)    1.23
    IC                    1.22
    u                     1.22
    Act Density 0.170%

    No Known Activations