INDEX
    Explanations

    punctuation marks and sentence-ending symbols

    Negative Logits
    -0.80
      
    -0.67
     (
    -0.67
     K
    -0.64
     S
    -0.60
    -
    -0.55
     A
    -0.54
     -
    -0.53
     R
    -0.53
     P
    -0.52
    Positive Logits
    Token   Logit
    .:      1.90
    .-      1.81
    ./      1.67
    .):     1.66
    .!      1.64
    .–      1.60
    .).     1.58
    .);     1.57
    .—      1.56
    .;      1.55
    Activation Density: 0.211%

    No Known Activations