Explanations
concepts related to health and safety concerns
before "that"
concepts and outcomes
Negative Logits
them      -0.92
Them      -0.80
Them      -0.75
selves    -0.69
henne     -0.66
honom     -0.66
herself   -0.65
Him       -0.63
THEM      -0.63
hennes    -0.63
Positive Logits
we         1.16
they       1.04
that       1.02
you        0.86
everyone   0.84
someone    0.81
he         0.81
anyone     0.81
everybody  0.73
people     0.73
Activations Density: 0.716%