INDEX
    Explanations

    words related to causing physical harm or damage

    action words indicating processes or activities

    Negative Logits
    Token        Logit
    "ULTS"       -0.68
    "cius"       -0.64
    " ACTION"    -0.63
    "aimon"      -0.61
    " Pwr"       -0.61
    " Nare"      -0.59
    " sidx"      -0.59
    " Harlem"    -0.56
    " brim"      -0.56
    "Ear"        -0.55
    Positive Logits
    Token        Logit
    "ing"        2.70
    "ership"     1.29
    "ING"        1.28
    "ments"      1.25
    "ging"       1.25
    "ingham"     1.24
    "eering"     1.24
    "edIn"       1.21
    "ingly"      1.20
    "ning"       1.13
    Activation Density: 0.237%

    No Known Activations