INDEX
    Explanations

    instances of physical violence and abuse, including terms like "beaten," "raped," "burned," and "assaulted."

    instances of physical abuse or violence

    Negative Logits
    Token           Logit
    "hang"          -0.70
    "ARE"           -0.69
    "istries"       -0.66
    " formation"    -0.65
    "FK"            -0.65
    "alities"       -0.64
    "allows"        -0.63
    "tions"         -0.63
    "Zone"          -0.63
    "rium"          -0.62
    Positive Logits
    Token                 Logit
    " by"                 0.88
    " merciless"          0.88
    " aback"              0.85
    "ĸļ"                  0.80
    " unfairly"           0.78
    " hostage"            0.72
    "nikov"               0.71
    " Sapphire"           0.71
    " unnecessarily"      0.71
    " inappropriately"    0.70
    Activation Density: 0.194%

    No Known Activations