    Negative Logits
        Token           Logit
        "i"             1.38
        "t"             1.28
        "Y"             1.13
        " niya"         1.12
        " 불구하고"       1.12
        (whitespace)    1.10
        "tops"          1.09
        "u"             1.07
        "highway"       1.06
        "er"            1.01
    Positive Logits
        Token              Logit
        "ামুটি"             2.06
        (not recovered)    1.95
        (not recovered)    1.95
        "ных"              1.77
        "ב"                1.77
        "ا"                1.67
        (not recovered)    1.60
        "не"               1.58
        "ل"                1.56
        "му"               1.55
    Activation Density: 0.112%

    No Known Activations