Explanations
References to characters and roles often associated with masculinity or traditional gender roles
Negative Logits
ig          -0.16
ime         -0.16
ala         -0.16
IME         -0.15
ux          -0.15
uent        -0.15
erten       -0.14
arda        -0.14
guy         -0.14
eden        -0.14
Positive Logits
.toObject   0.15
panion      0.15
unto        0.14
ailable     0.14
esseract    0.14
时候        0.14
actionDate  0.14
बनन         0.14
BOVE        0.13
पक          0.13
Activation density: 0.203%