INDEX
Explanations
references to female pronouns
Head Attr Weights
0:0.11
1:0.02
2:0.27
3:0.06
4:0.07
5:0.04
6:0.07
7:0.08
8:0.04
9:0.03
10:0.07
11:0.09
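The head attribution weights above appear to be per-head shares of the feature. The twelve values sum to roughly 0.95, and the gap is plausibly rounding. The sketch assumes one common weight-based scheme for attention-output SAEs. In that scheme, the feature's decoder vector is the concatenation of one `d_head`-sized slice per head. Each head's weight is the L2 norm of its slice divided by the sum of all slice norms. The function name `head_attr_weights` and the head-by-head layout are assumptions for illustration, not the dashboard's confirmed method.

```python
import math
from typing import List, Sequence


def head_attr_weights(decoder_row: Sequence[float], n_heads: int) -> List[float]:
    """Hypothetical per-head attribution for one SAE feature.

    Assumes decoder_row has length n_heads * d_head and is laid out head by
    head (concatenated per-head outputs). Each head's weight is the L2 norm of
    its slice divided by the sum of all slice norms, so the weights sum to 1.
    """
    if n_heads <= 0 or len(decoder_row) % n_heads != 0:
        raise ValueError("decoder_row length must be a multiple of n_heads")
    d_head = len(decoder_row) // n_heads
    # L2 norm of each head's contiguous slice of the decoder vector.
    norms = [
        math.sqrt(sum(x * x for x in decoder_row[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    if total == 0:
        # A zero decoder vector gives no attribution to any head.
        return [0.0] * n_heads
    return [n / total for n in norms]


# Toy example: 3 heads, d_head = 2; slice norms are 5, 0 and 10.
weights = head_attr_weights([3.0, 4.0, 0.0, 0.0, 6.0, 8.0], n_heads=3)
print([round(w, 2) for w in weights])  # [0.33, 0.0, 0.67]
```

Normalising by the sum of norms, rather than squared norms, is one choice among several. A dashboard could use either, so the sketch only shows the general shape of a weight-based head attribution.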
Negative Logits
confidentiality     -1.82
urgency             -1.74
ZIP                 -1.72
estrogen            -1.63
guiActiveUnfocused  -1.56
Buckingham          -1.54
respectively        -1.53
inoc                -1.50
Appalach            -1.50
contraceptives      -1.49
Positive Logits
usky                2.18
Archdemon           2.10
BUG                 1.92
antom               1.90
/                   1.89
��                  1.84
urch                1.84
utt                 1.82
zek                 1.78
kinson              1.76
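Token lists like the negative and positive logits are typically produced by projecting the feature's output direction onto the unembedding matrix and ranking the per-token contributions. This sketch assumes the direction is already in the residual stream. For an attention-output feature, that would mean passing it through the output projection first. The sketch also ignores any final LayerNorm scaling. `top_logit_effects` and the toy matrices are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    direction: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) tokens by direct logit effect.

    direction: residual-stream vector of length d_model (assumed to be already
    projected out of the SAE decoder). W_U: d_model x n_vocab unembedding,
    given as a list of rows. No LayerNorm scaling is applied.
    """
    d_model = len(direction)
    if len(W_U) != d_model:
        raise ValueError("W_U must have d_model rows")
    # The logit effect on token t is the dot product of the direction with column t of W_U.
    effects = [
        sum(direction[i] * W_U[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example with d_model = 2 and a three-token vocabulary.
neg, pos = top_logit_effects(
    direction=[1.0, -1.0],
    W_U=[[2.0, 0.0, 1.0], [0.0, 1.0, 1.0]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)  # [('b', -1.0)] [('a', 2.0)]
```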
Activation Density: 0.000%
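Activation density is usually the percentage of sampled tokens on which the feature's activation is above zero. A reading of 0.000% at three decimal places therefore does not necessarily mean the feature never fired. It only means the feature fired on fewer than 0.0005% of the sampled tokens. The helper is a hypothetical sketch assuming a strict `> 0` threshold.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    The strict "> 0" default is an assumption about how density is defined.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# One firing in a million tokens still rounds to 0.000% at three decimals.
acts = [0.0] * 999_999 + [1.2]
print(f"{activation_density(acts):.3f}%")  # 0.000%
```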