INDEX
    Explanations

    champion, winner, expert

    Negative Logits
    Token       Value
    "س"         1.23
    "های"       1.10
    "ا"         1.10
    (blank)     1.10
    "א"         1.10
    "ي"         1.09
    " as"       1.05
    " are"      1.04
    "のは"      1.04
    (blank)     1.03
    Positive Logits
    Token            Value
    " for"           1.31
    "arın"           1.30
    " Champion"      1.30
    "w"              1.25
    " championed"    1.22
    " champion"      1.19
    "0"              1.16
    "for"            1.11
    "st"             1.05
    "inin"           1.04
    Act Density 0.004%

    No Known Activations