    Negative Logits
    Token          Value
    "ا"            1.58
    "س"            1.49
    " are"         1.36
    "ش"            1.36
    "ور"           1.28
    (not shown)    1.24
    (not shown)    1.22
    " in"          1.19
    "یم"           1.15
    "ă"            1.08

    Positive Logits
    Token          Value
    "л"            1.38
    "("            1.20
    (not shown)    1.00
    "an"           0.96
    "РА"           0.91
    (not shown)    0.91
    "з"            0.89
    "loed"         0.89
    "at"           0.85
    "stwo"         0.85

    Act Density 0.125%

    No Known Activations
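
To put the activation density figure above in more concrete terms, the short sketch below converts the percentage into "one activating token per N tokens". It assumes that "Act Density" means the fraction of sampled tokens on which the feature has a nonzero activation; the page itself does not define the term, and the function names are illustrative only.

```python
from fractions import Fraction

# Assumption: "Act Density" is the share of sampled tokens on which the
# feature fires (nonzero activation), reported as a percentage.


def tokens_per_activation(density_percent: str) -> Fraction:
    """Average number of tokens seen per activating token."""
    fraction_active = Fraction(density_percent) / 100
    return 1 / fraction_active


def expected_active_tokens(density_percent: str, n_tokens: int) -> Fraction:
    """Expected count of activating tokens in a sample of n_tokens."""
    return Fraction(density_percent) / 100 * n_tokens


if __name__ == "__main__":
    # Density value taken from the page: 0.125%
    print(tokens_per_activation("0.125"))
    print(expected_active_tokens("0.125", 1_000_000))
```

Under that assumption, a density of 0.125% corresponds to roughly one activating token in every 800. Exact fractions are used rather than floats so that the conversion does not pick up rounding error.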