INDEX
    Explanations
    No Explanations Found
    Negative Logits
    s     2.33
    m     1.98
    t     1.91
    d     1.88
    al    1.75
    g     1.52
    h     1.36
    ll    1.28
    es    1.25
    y     1.23
    Positive Logits
    ка    1.59
    د     1.42
    ва    1.34
    مي    1.25
    ל     1.24
    р     1.22
    ?     1.21
    то    1.16
          1.16
    ي     1.09
    Act Density 0.000%

    No Known Activations