INDEX
    Explanations

    punctuation marks indicating the start and end of sentences

    Negative Logits
    <![       -0.15
    anger     -0.15
    ษ         -0.15
    iddle     -0.14
    uro       -0.14
    رÙĪÛĮ      -0.14
    ó         -0.14
    mares     -0.13
    å£        -0.13
    ivity     -0.13
    Positive Logits
    âĢ¢       0.32
     âĢ¢      0.28
    .âĢ¢      0.19
    \$        0.18
    atan      0.16
    ç¾        0.16
    ISA       0.15
    ÑĨов      0.15
     swe      0.14
    |↵↵       0.14
    Act Density 0.130%

    No Known Activations