INDEX
    Explanations

    punctuation marks, particularly periods and question marks

    Negative Logits
    irez          -0.17
    inan          -0.16
                  -0.15
     “â̦         -0.15
    iano          -0.15
    Ŀ             -0.15
    óst           -0.14
    umd           -0.14
    rier          -0.14
    iente         -0.14

    Positive Logits
     them         0.19
    them          0.16
    -INF          0.16
    787           0.14
    yth           0.14
    ationToken    0.14
    "It           0.14
    "I            0.14
    "They         0.14
     concession   0.14
    Activation Density: 0.174%

    No Known Activations