    Negative Logits
        Token              Value
        "с"                1.45
        "ю"                1.22
        "к"                1.21
        "ات"               1.18
        "िया"              1.16
        (token missing)    1.10
        (token missing)    1.07
        (token missing)    1.06
        "да"               1.05
        "ان"               1.01
    Positive Logits
        Token              Value
        "and"              1.63
        ":"                1.47
        "ur"               1.44
        " by"              1.42
        "IR"               1.42
        "IS"               1.39
        "d"                1.33
        (token missing)    1.26
        "dh"               1.23
        "CM"               1.19
    Activation Density: 0.002%

    No Known Activations