INDEX
    Explanations

    explicit language and profanity

    Negative Logits
    746         -0.14
    las         -0.14
     Kare       -0.13
    iev         -0.13
    å±ķ         -0.13
    æī¿         -0.13
    dest        -0.13
     Bakan      -0.13
    ector       -0.13
     Hillary    -0.13

    Positive Logits
    èĦĤ         0.16
    illy        0.15
    伦          0.15
    adge        0.15
    iffe        0.15
    à¸Ĺรà¸ĩ      0.14
    wik         0.14
     pokoj      0.14
    ignon       0.14
    illon       0.14

    Act Density 0.028%

    No Known Activations