INDEX
    Explanations

    concepts related to health and safety concerns

    Negative Logits
    Token         Logit
    "them"        -0.92
    "Them"        -0.80
    " Them"       -0.75
    "selves"      -0.69
    " henne"      -0.66
    " honom"      -0.66
    "herself"     -0.65
    "Him"         -0.63
    " THEM"       -0.63
    " hennes"     -0.63
    Positive Logits
    Token         Logit
    " we"         1.16
    " they"       1.04
    " that"       1.02
    " you"        0.86
    " everyone"   0.84
    " someone"    0.81
    " he"         0.81
    " anyone"     0.81
    " everybody"  0.73
    " people"     0.73
    Act Density 0.716%

    No Known Activations