INDEX
    Explanations

    punctuation marks and specific formatting characters in the text

    Negative Logits
    Token         Logit
    </caption>    -0.99
     Efq          -0.98
    )";           -0.98
    "){           -0.93
    */;           -0.93
    "):           -0.88
    ",{           -0.88
    $.            -0.87
     IBRARY       -0.87
     ARXIV        -0.86
    Positive Logits
    Token         Logit
     The          1.10
    The           0.90
     This         0.80
    <eos>         0.78
     In           0.72
     *            0.72
     When         0.68
     A            0.66
     •            0.66
     What         0.65
    Activation Density: 0.341%

    No Known Activations