INDEX
    Explanations

    negative phrases related to abuse, misconduct, and the protection of people in vulnerable situations

    Negative Logits
    Token          Logit
    " hardly"      -0.17
    " actually"    -0.17
    "lus"          -0.15
    "Ø£Ùĥ"          -0.15
    "asons"        -0.15
    " rất"         -0.15
    "actually"     -0.15
    " almost"      -0.14
    " quite"       -0.14
    " nearly"      -0.14
    Positive Logits
    Token          Logit
    " EVER"        0.18
    "ictim"        0.17
    " cave"        0.17
    " lightly"     0.17
    " succ"        0.17
    "imbus"        0.16
    " politic"     0.16
    " cherry"      0.15
    "AGAIN"        0.15
    "ukkit"        0.15
    Activation Density: 0.196%

    No Known Activations