INDEX
    Explanations

    various forms of punctuation and formatting in the text

    Negative Logits
    "dit"        -0.20
    "egend"      -0.16
    "ishi"       -0.15
    "ummer"      -0.15
    "xD"         -0.14
    "ãģŁãĤī"     -0.14
    "awn"        -0.13
    "ottes"      -0.13
    " ÑħоÑĤÑı"   -0.13
    "eli"        -0.13

    Positive Logits
    " Nam"        0.20
    " thanks"     0.18
    " Jud"        0.17
    " judging"    0.17
    " such"       0.17
    "Nam"         0.16
    "اÙĦØ¥ÙĨجÙĦÙĬزÙĬØ©"  0.16
    " Talking"    0.16
    " handjob"    0.16
    " Thanks"     0.15

    Act Density 0.045%

    No Known Activations