    Negative Logits
        Token          Logit
        " produz"      1.06
        "د"            1.06
        "ريق"          1.05
        "را"           1.04
                       1.02
        " première"    1.01
        "ва"           0.99
        "دين"          0.98
        " tým"         0.98
        "ш"            0.98
    Positive Logits
        Token          Logit
        "h"            1.89
        "k"            1.50
        "a"            1.44
        "f"            1.31
        "g"            1.30
        "o"            1.29
        "harth"        1.24
        "m"            1.24
        "ap"           1.23
        "ING"          1.20
    Activation Density: 0.000%

    No Known Activations