INDEX
    Explanations

    punctuation marks, particularly periods and quotation marks

    Negative Logits
    ï¼ļ↵↵      -0.18
    ï¼ļ↵       -0.15
     :↵↵       -0.15
    :↵↵↵       -0.15
    :↵↵↵↵      -0.15
    ['__       -0.15
    :");↵      -0.14
    :↵↵        -0.14
    tek        -0.14
    elles      -0.14
    Positive Logits
     Adds      0.27
    "And       0.26
    "But       0.25
    “And       0.22
     "         0.22
    “But       0.21
    Adds       0.21
     added     0.20
     Added     0.20
    Added      0.19
    Activation Density: 0.107%

    No Known Activations