Explanations
references to female pronouns and possessive forms
Negative Logits
emin        -0.15
utin        -0.15
ilo         -0.14
inar        -0.14
ifer        -0.14
Pag         -0.14
ogui        -0.14
ingly       -0.13
Examiner    -0.13
apg         -0.13
Positive Logits
alike       0.15
_userid     0.14
Beaut       0.14
Montserrat  0.13
ulner       0.13
Vend        0.13
andbox      0.13
Lorem       0.13
津          0.13
ifold       0.13
Activations Density 0.006%