    Negative Logits
    Token (quoted to show leading spaces)    Logit
    'ти'                                     1.17
    'م'                                      1.08
    'يد'                                     1.05
    (missing)                                1.04
    'اف'                                     1.02
    ' أب'                                    1.02
    ' is'                                    1.00
    (missing)                                0.99
    '𝙖'                                      0.96
    '"'                                      0.95
    Positive Logits
    Token (quoted to show leading spaces)    Logit
    'that'                                   1.34
    ' ('                                     1.30
    ' that'                                  1.23
    'G'                                      1.11
    'Y'                                      1.10
    'de'                                     1.07
    'X'                                      1.07
    'F'                                      1.02
    ' که'                                    1.01
    'Co'                                     0.96
    Activation Density: 0.010%

    No Known Activations