INDEX
Explanations
terms related to sexism, misogyny, and patriarchy
references to misogyny and patriarchal themes
Negative Logits
Token               Logit
++++++++++++++++    -0.80
Solitaire           -0.79
Package             -0.79
xxxxxxxx            -0.79
Mint                -0.76
HER                 -0.71
EVA                 -0.71
Lent                -0.69
Assembly            -0.69
Lago                -0.66
Positive Logits
Token               Logit
misogyn             1.11
misogyny            0.99
ogyn                0.95
sexist              0.91
ataka               0.79
volent              0.76
jokes               0.74
oir                 0.74
offenders           0.74
stereotyp           0.72
Activations Density 0.012%
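
To make the density figure concrete, here is a minimal sketch, assuming that "Activations Density" means the fraction of token positions on which the feature has a nonzero activation. The source does not state this definition. The function name, the threshold parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is "share of tokens where the feature fires";
    this definition is not confirmed by the page itself.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 12 of which activate the feature.
acts = [0.0] * 99_988 + [1.5] * 12
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.012%
```

Under that assumption, a density of 0.012% corresponds to roughly 12 active positions per 100,000 tokens, which fits a feature that fires only on a narrow set of terms.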