INDEX
Explanations
references to feminism and its criticisms
Head Attr Weights
0:0.12
1:0.02
2:0.17
3:0.09
4:0.03
5:0.11
6:0.07
7:0.07
8:0.13
9:0.08
10:0.03
11:0.02
Negative Logits
Poké         -3.19
Ulster       -2.88
NAS          -2.83
Newark       -2.72
Delaware     -2.70
Antar        -2.70
Reviewer     -2.70
phabet       -2.70
Tourism      -2.69
Wilmington   -2.67
Positive Logits
feminism     8.16
feminists    8.12
feminist     7.74
femin        7.67
Femin        7.62
Feminist     7.53
femin        6.46
patriarchy   6.03
sexist       5.86
sexism       5.78
Activations Density: 0.058%