INDEX
    Explanations
    Negative Logits
        " as"            1.20
        "ला"             0.91
        "SON"            0.89
        " cosmopolitan"  0.88
        " a"             0.84
        " and"           0.81
        "eli"            0.80
        " at"            0.80
        " are"           0.80
        "ре"             0.79
    Positive Logits
        "in"  1.43
        "is"  1.34
        "m"   1.30
        "i"   1.28
        "at"  1.23
        "u"   1.15
        "ي"   1.07
        "r"   1.06
        "ar"  1.01
        "ad"  1.01
    Activation Density: 0.011%

    No Known Activations