INDEX
Explanations
references to gender-related employment issues
Comes after certain words (it, can, serum)
biological models and effects
Negative Logits
.",     -1.05
."]     -1.01
NUMX    -0.98
".      -0.98
.")     -0.97
.[/     -0.96
.’”     -0.94
.";     -0.91
`,      -0.88
’.”     -0.88
Positive Logits
=       0.89
ppl     0.87
->      0.86
:       0.85
!!      0.83
!!!!    0.83
&       0.82
!!!     0.82
govt    0.81
!!!!!   0.76
Activations Density 0.253%