INDEX
    Explanations

    retain leading, more, or existing

    Negative Logits
    Token                 Logit
    '                     1.16
    वरिश                  1.02
     is                   0.98
    [token not shown]     0.95
    ות                    0.94
    ''.                   0.92
    [token not shown]     0.89
    ک                     0.86
    ിയ                    0.86
    "                     0.85

    Positive Logits
    Token                 Logit
    सम्म                  1.21
    mış                   1.14
    retain                1.13
    W                     1.09
    T                     1.09
    ם                     1.08
    ur                    1.06
     retains              1.05
    D                     1.05
    e                     1.05

    Activation Density: 0.003%

    No Known Activations