INDEX
    Explanations
    Negative Logits
        Token         Logit
        is            1.21
        en            1.20
        in            1.07
        y             1.06
        an            1.04
        oretically    1.03
        o             1.01
        er            1.00
        as            0.99
        es            0.99
    Positive Logits
        Token         Logit
        0             0.80
        रिडोर         0.66
        Й             0.62
        بعض           0.62
        Ι             0.62
        Ч             0.61
        Ы             0.60
                      0.60
        ባድ            0.60
        Ε             0.59
    Activation Density: 0.337%

    No Known Activations