    Explanations

    words related to causing harm or injury

    instances of the word "hurt."

    Negative Logits
    Token         Logit
    "aut"         -0.76
    "uther"       -0.75
    "clerosis"    -0.71
    "arch"        -0.69
    "aer"         -0.67
    "atching"     -0.67
    "vironment"   -0.65
    "gran"        -0.65
    " liner"      -0.65
    "au"          -0.65
    Positive Logits
    Token         Logit
    " hurt"       1.14
    " hurting"    0.90
    " hurts"      0.90
    "onies"       0.87
    "ful"         0.82
    "lehem"       0.81
    " losers"     0.81
    "igue"        0.78
    " badly"      0.77
    "ting"        0.76
    Activation Density: 0.008%

    No Known Activations