INDEX
    NEGATIVE LOGITS
    ۰  1.13
     ν  0.91
     Și  0.85
    𝕝  0.84
    [no visible token]  0.84
     إدارة  0.83
     το  0.82
     Oekra  0.82
     trám  0.82
     Alic  0.82
    POSITIVE LOGITS
    g  2.11
    t  2.02
    z  1.84
    s  1.84
    d  1.70
    k  1.70
    n  1.67
    a  1.62
    c  1.46
    j  1.46
    Activation Density: 0.035%

    No Known Activations