INDEX
    Explanations

    punctuation marks and symbols, particularly parentheses and question marks

    Negative Logits
     ویکی‌پدیا    -0.72
     banc          -0.65
    ον             -0.65
     виправивши    -0.65
    ilit           -0.65
    gany           -0.64
     squee         -0.63
    führt          -0.62
    Abit           -0.60
     newBuilder    -0.60
    Positive Logits
     (                   1.22
    [token not shown]    1.18
    :(                   1.07
    》(                  0.99
    [token not shown]    0.97
    )(                   0.97
    [token not shown]    0.94
    !(                   0.92
    出版年               0.90
    ”(                   0.89
    Act Density 0.058%

    No Known Activations