    NEGATIVE LOGITS
    Token      Value
    (blank)    1.26
    2          1.12
    3          1.11
    ،          1.09
    leri       1.05
    dir        1.04
    die        1.00
    I          1.00
    daniel     0.99
    dır        0.95
    POSITIVE LOGITS
    Token      Value
    ak         1.36
    ad         1.33
    as         1.28
    ap         1.20
    ing        1.17
    (blank)    1.17
    at         1.15
    ia         1.15
    ня         1.15
    ag         1.10
    Act Density 0.000%

    No Known Activations