INDEX
    NEGATIVE LOGITS
    Token         Value
    "il"          1.11
    "in"          1.06
    "ك"           1.03
    "на"          1.01
    "ل"           0.92
    "Z"           0.89
                  0.88
    "N"           0.84
    " expedit"    0.83
    ","           0.82
    POSITIVE LOGITS
    Token         Value
    "of"          1.44
    "'"           1.38
    " of"         1.35
    ")"           1.12
    "was"         1.09
    "ного"        1.09
    " که"         1.02
    " was"        1.01
    " "           0.97
    "ные"         0.95
    Activation Density: 1.199%

    No Known Activations