    NEGATIVE LOGITS
    Token      Logit
    "ه"        1.11
    "ти"       1.06
    " when"    1.01
    " for"     1.00
    " if"      0.91
    "ці"       0.91
    "𝙠"        0.91
    "𝙧"        0.90
    "ط"        0.90
    "³"        0.88
    POSITIVE LOGITS
    Token      Logit
    "in"       1.54
    "at"       1.36
    "as"       1.18
    "ou"       1.13
    "ance"     1.10
    "ur"       1.04
    " "        1.04
    "on"       1.02
    "or"       1.00
    "اب"       0.96
    Activation Density: 0.005%

    No Known Activations