INDEX
    Explanations

    punctuation marks, particularly periods and quotation marks

    Negative Logits
    ...↵↵                  -0.15
     (↵↵                   -0.14
    —↵↵                    -0.14
    [token not rendered]   -0.14
    بÙĬÙĨ                  -0.14
    â̦↵↵                   -0.14
    &amp                   -0.14
    --↵↵                   -0.13
    [token not rendered]   -0.13
    .lua                   -0.13

    Positive Logits
    .↵                      0.20
    ा.↵                     0.16
    â̬↵                     0.15
    ."↵                     0.14
    ).↵                     0.13
    avo                     0.13
    comed                   0.13
    à¥Ī.↵                   0.13
    ี↵                      0.13
    ë§¹                     0.13
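    The page does not say how these logit lists were produced. One common approach for dashboards like this, assumed here rather than confirmed, is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive tokens. The sketch below shows that idea in plain Python; `feature_logits`, `decoder_dir`, and `W_U` are hypothetical names, and the toy numbers are illustrative only.

```python
# Hypothetical sketch: direct effect of one SAE feature on the output logits.
# Assumes decoder_dir has length d_model and W_U is a d_model x vocab matrix
# (list of rows). These names are illustrative, not taken from the dashboard.
def feature_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs."""
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        score = sum(d * row[j] for d, row in zip(decoder_dir, W_U))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by logit
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]  # descending by logit
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary.
    neg, pos = feature_logits(
        [1.0, -0.5],
        [[0.2, -0.1, 0.0], [0.0, 0.2, -0.2]],
        [".", "(", "avo"],
        k=1,
    )
    print(neg, pos)
```

    With real weights the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the inner loop; the ranking logic stays the same.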
    Activation Density 1.024%
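    The dashboard does not define activation density. Assuming it means the percentage of token positions where the feature's activation is above zero, a value of 1.024% corresponds to roughly one token in a hundred. A minimal sketch under that assumption (`activation_density` is a hypothetical helper):

```python
# Hypothetical helper, assuming "activation density" is the share of token
# positions where the feature fires (activation strictly greater than zero).
def activation_density(activations):
    """Return the percentage of positions with a positive activation."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # One active position out of four gives a density of 25%.
    print(activation_density([0.0, 0.3, 0.0, 0.0]))
```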

    No Known Activations