INDEX
    Explanations

    punctuation and formatting characters in text

    Negative Logits
    Token                Logit
    "Č"                  -0.23
    "ayscale"            -0.16
    "camp"               -0.15
    " Fried"             -0.15
    "aland"              -0.15
    " Edition"           -0.14
    "itchens"            -0.14
    " Butt"              -0.14
    "Occurred"           -0.14
    "úb"                 -0.14
    Positive Logits
    Token                                Logit
    "č↵č↵č↵"                             0.22
    "↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵"                   0.22
    "###"                                0.21
    "##"                                 0.21
    "####"                               0.18
    "----------↵↵"                       0.17
    "---↵↵"                              0.16
    "↵↵↵↵↵↵↵"                            0.16
    "алом"                               0.16
    "↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵"   0.15
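    The page does not say how these logit lists were computed. A common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and keep the k most positive and k most negative tokens. The sketch below assumes that approach. The names `decoder_direction`, `unembed`, `feature_logits`, and `top_and_bottom` are hypothetical, and the toy numbers are illustrative rather than taken from this feature. Real dashboards may also account for the final layer norm, which this sketch omits.

```python
def feature_logits(decoder_direction, unembed):
    """Project a (hypothetical) feature decoder direction onto every vocab token.

    decoder_direction: list of d_model floats.
    unembed: d_model x vocab_size matrix given as a list of rows.
    Returns one logit contribution per vocabulary token.
    """
    d_model = len(decoder_direction)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_direction[d] * unembed[d][t] for d in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(logits, vocab, k):
    """Return the k most positive and k most negative (token, logit) pairs."""
    pairs = list(zip(vocab, logits))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 tokens (illustrative values only).
vocab = ["###", "\n\n", " Fried", "camp"]
unembed = [
    [0.5, 0.4, -0.3, -0.1],
    [0.1, 0.2, -0.1, -0.2],
]
direction = [0.4, 0.2]

logits = feature_logits(direction, unembed)
positive, negative = top_and_bottom(logits, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    Sorting the full list twice keeps the sketch simple. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would avoid a full sort.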
    Activation Density: 0.356%
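    Activation density is usually reported as the fraction of sampled tokens on which the feature's activation is strictly positive. At 0.356%, the feature fires on roughly 3.56 tokens per thousand. The minimal sketch below assumes that definition. The function name and the optional `threshold` parameter are hypothetical, and the activations are toy values.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    `threshold` is a hypothetical knob; 0.0 means "fires at all".
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy data: the feature fires on 2 of 8 tokens.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.7, 0.0, 0.0]
density = activation_density(acts)
print(f"Activation density {density:.3%}")
```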

    No Known Activations