INDEX
    Explanations

    references to violent events and incidents

    Negative Logits
        " Father"         -0.30
        " fathers"        -0.30
        " grandfather"    -0.29
        " himself"        -0.29
        " boy"            -0.29
        " guy"            -0.28
        " masculinity"    -0.28
        " brothers"       -0.28
        " gentleman"      -0.28
        " Fathers"        -0.28
    Positive Logits
        " woman"        0.44
        " women"        0.44
        " herself"      0.42
        " female"       0.41
        " actresses"    0.40
        "woman"         0.39
        " girl"         0.38
        " Women"        0.38
        "women"         0.37
        " females"      0.37
    Activation Density: 1.475%

    No Known Activations