    Explanations

    numbers associated with model sizes

    Negative Logits
    ین
    1.41
    ιού
    1.09
    1.04
    ان
    1.03
    an
    1.01
    ва
    1.01
    inės
    1.00
    عی
    0.98
    iin
    0.98
    iados
    0.98
    Positive Logits
     a
    1.44
     by
    1.19
    7
    1.19
    '.
    1.13
    3
    1.10
     i
    1.09
    ri
    1.08
    的外
    1.08
     for
    1.07
    8
    1.07
    Activation Density: 0.080%

    No Known Activations