INDEX
    NEGATIVE LOGITS
    ни    1.77
    ن    1.72
    ных    1.68
    ب    1.66
    ský    1.66
    ные    1.62
    níci    1.59
    ش    1.59
    ská    1.58
    ق    1.58
    POSITIVE LOGITS
    its    1.75
    д    1.52
    ef    1.38
    up    1.26
     k    1.25
    aty    1.24
    শিকান্ত    1.16
    ism    1.11
    ीकरण    1.11
    using    1.10
    Activation Density: 0.001%

    No Known Activations