Explanations
references to women and their social issues
Negative Logits
Token      Logit
çĶ         -0.08
ç¾Ĭ        -0.07
andaÅŁ     -0.07
hại        -0.07
داÙħ       -0.07
sovere     -0.07
anne       -0.07
ër         -0.06
izr        -0.06
Ł¥         -0.06
Positive Logits
Token        Logit
rights       0.09
empowerment  0.07
wings        0.07
Rights       0.07
welfare      0.07
rights       0.07
wing         0.07
Emp          0.07
ardin        0.07
Wid          0.07
Activations Density: 0.018%