    Explanations

    phrases related to harmful actions towards individuals

    terms related to violence and abusive actions, particularly harassment and murder

    Negative Logits
    Token           Logit
    "issue"         -0.81
    "worthiness"    -0.79
    "ffic"          -0.72
    "translation"   -0.72
    "alach"         -0.71
    "money"         -0.71
    "issues"        -0.70
    " emphasis"     -0.69
    "marked"        -0.69
    " regimen"      -0.69
    Positive Logits
    Token           Logit
    "ĸļ"            0.80
    " adolesc"      0.70
    "ModLoader"     0.69
    " Parenthood"   0.68
    " Penguin"      0.67
    "Ô"             0.65
    "ï¸"            0.65
    "Ò"             0.64
    "nesday"        0.64
    " Prometheus"   0.64
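    The dashboard does not state how these logit values are derived. One common approach for sparse-autoencoder features, assumed here and not confirmed by the source, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then rank tokens by the result. The sketch below illustrates only that ranking step. The function names `logit_effects` and `top_and_bottom`, the token names, and the toy vectors are all hypothetical.

```python
# Hypothetical sketch: rank tokens by a feature's direct logit effect.
# ASSUMPTION: effect = dot(feature decoder vector, token unembedding vector).
# The dashboard above does not confirm this definition or any normalization.
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_vec: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the decoder vector with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; vectors are invented for illustration only.
    decoder = [1.0, -0.5]
    vocab = {
        "tok_a": [-0.6, 0.4],
        "tok_b": [0.5, -0.3],
        "tok_c": [-0.4, 0.6],
        "tok_d": [0.4, 0.1],
    }
    pos, neg = top_and_bottom(logit_effects(decoder, vocab), k=2)
    print("positive:", [t for t, _ in pos])
    print("negative:", [t for t, _ in neg])
```

    With these toy vectors, "tok_b" and "tok_d" rank as the most positive and "tok_a" and "tok_c" as the most negative, matching the two-list layout used above.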
    Activation Density: 0.097%
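    Activation density is presumably the share of sampled tokens on which the feature fires. The exact definition, threshold, and dataset are not given here, so the sketch below assumes "fires" means an activation strictly above zero. The function name `activation_density` is hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens whose
# activation exceeds a threshold. ASSUMPTION: threshold = 0.0; the dashboard
# does not state its definition or the token sample it was measured on.
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of entries in `activations` that are strictly above `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 97 active tokens out of 100,000 gives the 0.097% figure shown above.
    acts = [0.0] * 100_000
    for i in range(97):
        acts[i * 1000] = 1.0
    print(f"{activation_density(acts):.3f}%")  # prints 0.097%
```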

    No Known Activations