Explanations
references to female individuals, especially when the subject is a woman or girl and is referred to with feminine pronouns or names.
Negative Logits
способен    0.57
сам         0.56
sám         0.56
Escolhido   0.55
आला         0.54
равен       0.52
نفسه        0.52
который     0.51
должен      0.50
शकतो        0.50
Positive Logits
herself        1.44
woman          1.05
actresses      1.02
girl           1.01
businesswoman  1.01
heroine        1.00
women          0.97
xinh           0.97
femenina       0.97
نفسها          0.96
Activations Density 0.431%