    Negative Logits
    Token                   Logit
    "۰"                     1.37
    (token not rendered)    1.15
    "на"                    1.13
    "е"                     1.13
    "ø"                     1.13
    (token not rendered)    1.11
    "f"                     1.08
    "h"                     1.06
    "о"                     1.06
    " affectionately"       1.04
    Positive Logits
    Token                   Logit
    "د"                     1.21
    " بطور"                 1.16
    "𝘀"                     1.15
    " theres"               1.13
    "كن"                    1.12
    (token not rendered)    1.12
    "ものの"                 1.11
    " decim"                1.11
    " buil"                 1.10
    "𝗻"                     1.09
    Act Density 0.140%

    No Known Activations