INDEX
    Explanations
    NEGATIVE LOGITS
    graphHead
    0.53
    ݢ
    0.49
     толькі
    0.46
    ڽ
    0.46
    𝘸
    0.45
    𝐰
    0.45
     ظِلِّ
    0.45
    ITTING
    0.45
    addAlignment
    0.44
    <unused639>
    0.43
    POSITIVE LOGITS
    ası
    0.56
     özel
    0.53
     ş
    0.53
    0.53
     bir
    0.52
     yaş
    0.52
    ı
    0.52
     yap
    0.52
     anlam
    0.51
     kaynak
    0.51
    Activation Density: 0.002%

    No Known Activations