    Explanations

    references to violent actions and injuries, particularly involving facial harm

    Negative Logits
    "abe"           -0.17
    "elp"           -0.17
    "687"           -0.15
    " Bened"        -0.14
    "inne"          -0.14
    "pute"          -0.14
    ".jetbrains"    -0.14
    "ige"           -0.13
    "idget"         -0.13
    "lor"           -0.13
    Positive Logits
    " Bund"         0.15
    "ána"           0.15
    " withString"   0.15
    " bund"         0.15
    " Roose"        0.14
    ".parameter"    0.14
    " thereby"      0.14
    "ooled"         0.14
    "|required"     0.14
    "ormap"         0.13
    Act Density 0.037%

    No Known Activations