INDEX
    NEGATIVE LOGITS
    Token       Value
    "1"         0.78
    " of"       0.73
    " a"        0.70
    " I"        0.66
    "s"         0.62
    " with"     0.60
    "lement"    0.59
    " A"        0.58
    " was"      0.58
    "=\"        0.58
    POSITIVE LOGITS
    Token       Value
    "ла"        0.93
    "на"        0.89
    "ли"        0.88
    "د"         0.87
    "ला"        0.81
    (not shown) 0.81
    "↵↵"        0.80
    "ری"        0.79
    "ان"        0.78
    "να"        0.78
    Activation Density: 2.031%

    No Known Activations