    NEGATIVE LOGITS
    AndEndTag    -1.26
     виправивши    -1.17
     שוליים    -1.13
     $_"    -1.09
    ItemBackground    -1.05
     Paglinawan    -1.04
    IsContent    -1.04
     ویکی‌پدیا    -1.03
     myſelf    -1.02
     propOrder    -1.01
    POSITIVE LOGITS
    ,    0.51
    [token missing]    0.51
     to    0.46
    .    0.46
    ↵↵    0.41
    </b>    0.40
     -    0.40
    </strong>    0.38
    9    0.38
    <eos>    0.36
    Activation Density: 0.001%

    No Known Activations