INDEX
    Explanations

    parentheses and related punctuation in text

    Negative Logits
    " ویکی‌پدیا"      -0.82
    "Skocz"           -0.64
    " referenties"    -0.64
    " wij"            -0.60
    "depend"          -0.60
    " \"\n"           -0.59
    "teto"            -0.59
    "<blockquote>"    -0.59
    "idxs"            -0.59
    "tsz"             -0.58
    Positive Logits
                      1.50
    " ("              1.33
    "”("              1.14
    "』("              1.11
    "》("              1.08
    ")("              1.06
    "!("              1.05
    "!("              1.04
    "?("              1.03
    "」("              0.98
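    Feature dashboards of this kind commonly build the negative and positive logit lists by multiplying the feature's decoder vector by the model's unembedding matrix W_U and keeping the most negative and most positive entries. The page does not state how these lists were produced, so the sketch below assumes that method. `logit_effects`, `top_and_bottom`, and the toy weights are hypothetical illustrations, not the dashboard's actual code or the real model's numbers.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_row: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project one feature's decoder direction (length d_model) through an
    unembedding matrix W_U of shape (d_model, d_vocab): one score per token.
    Assumption: no layer norm or bias is applied before W_U."""
    d_model = len(decoder_row)
    if len(w_u) != d_model:
        raise ValueError("W_U must have one row per model dimension")
    d_vocab = len(w_u[0])
    return [
        sum(decoder_row[i] * w_u[i][t] for i in range(d_model))
        for t in range(d_vocab)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative


# Hypothetical toy weights: d_model = 2, vocabulary of 3 tokens.
decoder_row = [1.0, -0.5]
w_u = [[0.2, 1.0, -0.3],
       [0.4, -0.6, 0.2]]
vocab = [" wij", " (", "tsz"]

scores = logit_effects(decoder_row, w_u)
positive, negative = top_and_bottom(scores, vocab, k=1)
print(positive)  # token " (" with a score of about 1.3
print(negative)  # token "tsz" with a score of about -0.4
```

    Read under that assumption, the positive list says the feature's direction most raises the logits of tokens ending in an opening parenthesis, which fits the explanation given for this feature.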
    Activation density: 0.035%
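    Activation density is usually the fraction of token positions on which the feature is active. The page gives neither the activation threshold nor the size of the token sample, so the sketch below assumes "active" means an activation strictly above zero. Under that assumption, 0.035% = 0.00035, and 0.00035 × 100,000 = 35, i.e. about 35 firing positions per 100,000 tokens. `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold.
    Assumption: the threshold is 0.0 (any positive activation counts as firing)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 35 firing positions out of 100,000 tokens.
acts = [0.0] * 99_965 + [1.2] * 35
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.035%
```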

    No Known Activations