    NEGATIVE LOGITS
    Token             Value
    " "               1.29
    " are"            1.05
    " is"             0.89
    "،"               0.84   (Arabic comma)
    [not rendered]    0.84
    [not rendered]    0.82
    "{"               0.80
    "}{"              0.78
    "2"               0.76
    "ется"            0.74   (Russian verb ending)
    POSITIVE LOGITS
    Token             Value
    "u"               1.94
    "an"              1.85
    "il"              1.63
    "et"              1.59
    "in"              1.49
    "on"              1.46
    "ad"              1.46
    "на"              1.43   (Russian "on")
    "as"              1.40
    "z"               1.39
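    The two lists above rank tokens by how strongly this feature pushes their next-token logits up or down. A common way dashboards produce such lists is to project the feature's decoder direction through the model's unembedding matrix and take the top-k and bottom-k entries. The sketch below assumes that method; it is not necessarily how this page computed its values. The names `decoder_direction`, `W_U`, `vocab` and `top_logit_effects` are hypothetical. The sketch returns signed values, so its negative list carries minus signs, whereas the negative-logit values shown above appear without a sign.

```python
import numpy as np

def top_logit_effects(decoder_direction, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumed (hypothetical) inputs:
      decoder_direction: shape (d_model,), the feature's decoder vector
      W_U:               shape (d_model, vocab_size), the unembedding matrix
      vocab:             list of token strings, length vocab_size
    Returns (positive, negative): lists of (token, effect) pairs.
    """
    # Direct logit effect of adding the feature direction to the residual stream.
    effects = decoder_direction @ W_U            # shape (vocab_size,)
    order = np.argsort(effects)                  # indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with d_model = 2 and a 4-token vocabulary (made-up numbers).
pos, neg = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[2.0, -1.0, 0.5, -3.0],
              [0.0,  0.0, 0.0,  0.0]]),
    ["u", "are", "an", "is"],
    k=2,
)
print(pos)  # [('u', 2.0), ('an', 0.5)]
print(neg)  # [('is', -3.0), ('are', -1.0)]
```

    Real pipelines may also fold in a final layer norm before the unembedding. The sketch omits that step for brevity.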
    Activation density: 0.006%
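    Activation density is usually reported as the percentage of token positions in a sample where the feature's activation is above zero. Under that definition, 0.006% means roughly 6 firing positions per 100,000 tokens, so this is a very sparse feature. The sketch below assumes a threshold of 0. The function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of positions where the feature fires (activation > threshold).

    `threshold=0.0` is an assumption; some tools use a small positive cutoff.
    """
    activations = np.asarray(activations)
    active = np.count_nonzero(activations > threshold)
    return 100.0 * active / activations.size

# 6 active positions out of 100,000 tokens
sample = [0.0] * 99_994 + [1.5] * 6
print(activation_density(sample))  # 0.006
```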

    No Known Activations