INDEX
Explanations
references to women and female-related terms in various contexts
Negative Logits
.googleapis  -0.16
nik          -0.15
cup          -0.15
aro          -0.15
uem          -0.15
uels         -0.14
auc          -0.14
dirty        -0.14
aja          -0.14
еÑĢин        -0.14
Positive Logits
Äįi          0.15
elts         0.15
rial         0.15
rary         0.14
quarters     0.14
cott         0.13
ears         0.13
ÙĨدا         0.13
roupon       0.13
alike        0.13
Activations Density 0.021%