    NEGATIVE LOGITS
    Token       Logit
    "ك"         1.55
    "ين"        1.55
    "માં"        1.50
    ","         1.40
    "لي"        1.37
    "يك"        1.23
    "يب"        1.23
    " destac"   1.21
    " trein"    1.16
    "ش"         1.13
    POSITIVE LOGITS
    Token       Logit
    " for"      1.62
    "a"         1.59
    " on"       1.51
    "us"        1.46
    "d"         1.37
    "st"        1.35
    "s"         1.33
    "5"         1.32
    "negative"  1.31
    "9"         1.30
    Activation Density: 0.045%

    No Known Activations