    Explanations

    punctuation marks, particularly periods

    Negative Logits
    Token         Logit
    "idge"        -0.17
    "rens"        -0.15
    "ibase"       -0.15
    "дÑĢом"       -0.14
    "ills"        -0.14
    "à¥įवव"       -0.14
    ".gnu"        -0.13
    "_scaling"    -0.13
    "ople"        -0.13
    "uania"       -0.13
    Positive Logits
    Token         Logit
    "637"         0.20
    " "           0.15
    "lest"        0.15
    " official"   0.15
    "arer"        0.15
    "avery"       0.15
    " pret"       0.15
    " Abs"        0.14
    "669"         0.14
    "ags"         0.14
    Activation Density: 0.005%

    No Known Activations