INDEX
    Explanations

    Russian, Korean, and other non-English words

    Negative Logits
    Value  Token
    1.20   " as"
    1.18   "to"
    1.17   "ра"
    1.02   "йс"
    1.00   "рс"
    0.98   "are"
    0.98   " той"
    0.98   (token not shown)
    0.96   " то"
    0.95   (token not shown)
    POSITIVE LOGITS
    Value  Token
    1.88   "א"
    1.74   "ת"
    1.66   "EN"
    1.63   "О"
    1.55   "لی"
    1.48   "Z"
    1.43   "ב"
    1.42   (token not shown)
    1.40   "ال"
    1.39   "IC"
    Act Density 0.000%

    No Known Activations