INDEX
    Explanations

    punctuation marks, particularly periods

    Negative Logits
    Token       Logit
    "rovers"    -0.15
    "iros"      -0.15
    " ins"      -0.15
    "co"        -0.15
    "aman"      -0.14
    "ed"        -0.14
    " Grade"    -0.14
    "030"       -0.14
    "2"         -0.14
    " cu"       -0.14
    Positive Logits
    Token       Logit
    "VIC"       0.15
    ".Flush"    0.15
    " mastur"   0.15
    "аÑĢÑħ"    0.15
    "ttp"       0.15
    "¶Į"        0.14
    "ÐĽÐŀ"      0.14
    "TK"        0.14
    "fal"       0.14
    " cazzo"    0.14
    Activation Density: 0.006%

    No Known Activations