    Negative Logits
    م    1.23
    ovať    1.01
    bene    0.97
    кий    0.96
     erhö    0.95
    ую    0.95
    nehmen    0.93
     právě    0.93
    ل    0.93
    ють    0.90
    Positive Logits
    𝑛    1.28
     dozen    1.28
     swirls    1.25
    𝑣    1.24
    𝑏    1.24
     gång    1.20
     swirl    1.18
    [token not shown]    1.17
    Rt    1.17
    עה    1.17
    Activation Density: 0.005%

    No Known Activations