INDEX
Explanations
references to specific individuals and their roles or actions
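Negative Logits
  Token            Value   Gloss
  وفاته            -0.75   Arabic, "his death"
  męski            -0.75   Polish, "masculine"
  himself          -0.69
  męskie           -0.65   Polish, "masculine" (inflected form)
  móg              -0.65   Polish word fragment, as in "mógł" ("he could")
  himself          -0.64
  zijne            -0.64   Dutch, "his"
  sám              -0.63   Czech/Slovak, "alone, by himself"
  boyhood          -0.61
  ♂️                -0.61   male sign emoji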
Positive Logits
  Token            Value
  herself           0.97
  businesswoman     0.68
  lesbian           0.65
  feminist          0.63
  herself           0.63
  woman             0.61
  motherhood        0.60
  womanhood         0.58
  girl              0.57
  goddess           0.57
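Dashboards of this kind typically build the two lists above by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the most positive and most negative scores. The sketch below illustrates that idea in plain Python, assuming a decoder vector `w_dec` (length d_model) and an unembedding matrix `W_U` (d_model rows by vocabulary-size columns). Both names and all numbers in the usage example are hypothetical toy values, not this feature's real weights, and real pipelines may also fold in normalization layers before projecting.

```python
from typing import List, Tuple

def logit_effects(w_dec: List[float], W_U: List[List[float]]) -> List[float]:
    """Score each vocabulary token as w_dec @ W_U (hypothetical names).

    w_dec: the feature's decoder vector, length d_model.
    W_U:   unembedding matrix, d_model rows x vocab_size columns.
    """
    vocab_size = len(W_U[0])
    return [
        sum(w_dec[i] * W_U[i][j] for i in range(len(w_dec)))
        for j in range(vocab_size)
    ]

def top_k(scores: List[float], vocab: List[str], k: int,
          largest: bool = True) -> List[Tuple[str, float]]:
    """Return the k highest (largest=True) or lowest (largest=False) tokens."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=largest)
    return [(vocab[j], round(scores[j], 2)) for j in order[:k]]

# Toy example (illustrative values only): d_model = 2, vocabulary of 4 tokens.
vocab = ["herself", "woman", "himself", "boyhood"]
w_dec = [1.0, 0.5]
W_U = [[0.6, 0.4, -0.5, -0.3],
       [0.7, 0.4, -0.4, -0.6]]
scores = logit_effects(w_dec, W_U)
print(top_k(scores, vocab, 2))                 # [('herself', 0.95), ('woman', 0.6)]
print(top_k(scores, vocab, 2, largest=False))  # [('himself', -0.7), ('boyhood', -0.6)]
```

Because the score is linear in the decoder vector, a feature whose direction points toward feminine tokens will, in such a projection, also tend to push down their masculine counterparts, which is the pattern visible in the two lists.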
Activation Density: 2.304%
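Activation density is commonly reported as the percentage of sampled token positions on which the feature's activation is above zero; under that assumption, 2.304% means the feature fires on roughly 23 of every 1,000 tokens. A minimal sketch of the calculation, where `acts` is a hypothetical list of per-token activations and `threshold` is an assumed cutoff (0.0 by default):

```python
from typing import Sequence

def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

# Toy example: 3 nonzero activations out of 100 tokens.
print(activation_density([0.0] * 97 + [1.2, 0.4, 3.1]))  # 3.0
```

A low density like this one indicates a fairly selective feature; the exact figure depends on the sample of text the dashboard used.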