    Explanations

    instances of punctuation or formatting marks in text

    Negative Logits
    Token          Logit
    "arakter"      -0.15
    "å½¹"          -0.14
    "msp"          -0.14
    "/*č↵"         -0.13
    ".TabStop"     -0.13
    "leich"        -0.13
    "mise"         -0.12
    "ãĤ¤ãĥī"       -0.12
    "contres"      -0.12
    "-↵↵"          -0.12

    Positive Logits
    Token          Logit
    " is"          0.20
    " has"         0.20
    " are"         0.17
    " can"         0.16
    " will"        0.16
    " was"         0.15
    "ling"         0.15
    " cannot"      0.15
    " could"       0.15
    " must"        0.15
    Activation Density: 0.149%

    No Known Activations