INDEX
    Explanations

    basic categories and types

    Negative Logits
    Token     Value
    "۹"       1.35
    "F"       1.20
    "Ι"       1.18
    " by"     1.16
    "P"       1.16
    "G"       1.16
    "Y"       1.14
    "H"       1.09
    "Ա"       1.08
    "E"       1.06
    Positive Logits
    Token     Value
    "س"       1.64
    "с"       1.51
    "f"       1.30
    "basic"   1.29
    "es"      1.23
    "a"       1.20
    "an"      1.16
    "ur"      1.13
    "b"       1.13
    "d"       1.12
    Act Density 0.028%

    No Known Activations