INDEX
    Explanations

    symbols, punctuation, and formatting elements in text

    Negative Logits
    131         -0.17
    .TabStop    -0.16
    132         -0.15
    /REC        -0.15
    iller       -0.15
    grave       -0.15
    17          -0.14
    ertools     -0.14
    arel        -0.14
    gnore       -0.14
    Positive Logits
                       
    0.42
                        
    0.40
                      
    0.31
                         
    0.30
    --------------------
    0.26
    0.24
                        č↵
    0.24
     --------------------
    0.23
    ↵                    ↵
    0.23
    ................
    0.22
    Act Density 0.005%

    No Known Activations