INDEX
    Explanations
    Negative Logits
    "ك"       1.86
    " can"    1.66
    " an"     1.58
    " be"     1.53
    " are"    1.33
    "ка"      1.24
    " isn"    1.17
    " on"     1.14
    " is"     1.13
    "ה"       1.13
    Positive Logits
    "n"             1.34
    "and"           1.33
    "is"            1.24
    "categories"    1.23
    "at"            1.19
    "re"            1.13
    "as"            1.10
    "ig"            1.07
    "en"            1.06
    "u"             1.06
    Activation Density: 0.082%

    No Known Activations