INDEX
    Negative Logits
    1.44
    ной
    1.23
    да
    1.20
    ни
    1.17
    are
    1.16
    ва
    1.14
    ic
    1.11
    1.09
    ियों
    1.09
    ных
    1.08
    Positive Logits
    1.61
    _
    1.45
    x
    1.34
    ב
    1.29
    1.27
    ;
    1.27
    D
    1.23
    b
    1.20
    1.16
    C
    1.13
    Act Density 0.021%

    No Known Activations