    Negative Logits
    Token   Logit
    "י"     1.30
    "er"    1.17
    "as"    1.08
    "ו"     1.05
    "e"     0.98
    "i"     0.94
    "in"    0.93
    "o"     0.93
    "م"     0.93
    "ق"     0.91
    Positive Logits
    Token           Logit
                    0.83
    " sebagainya"   0.82
    "ке"            0.80
    "-"             0.79
    "вается"        0.77
    "%"             0.73
    " in"           0.73
    "İN"            0.73
    " 쓰는"          0.73
    "\"             0.73
    Act Density 0.000%

    No Known Activations