Explanation
References to gender pronouns in a narrative context
Negative Logits
Fac                    -0.16
fac                    -0.15
phia                   -0.15
彦                     -0.15
awn                    -0.15
anni                   -0.14
arily                  -0.14
ox                     -0.14
Jenn                   -0.14
faithful               -0.14
Positive Logits
autos                   0.16
odore                   0.15
also                    0.15
همچنین (Persian "also") 0.14
irma                    0.14
潮                      0.14
ayrıca (Turkish "also") 0.14
InternalServerError     0.14
<0xE5 0xB1> (incomplete UTF-8 byte sequence) 0.14
also                    0.14
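Dashboards of this kind often build the two logit lists by projecting the feature's decoder direction onto the model's unembedding and keeping the most negative and most positive tokens. The page does not confirm this, so the sketch below is an assumption: `dec_dir`, `unembed`, `logit_effects` and `top_and_bottom` are hypothetical names, and the toy vectors are made up rather than taken from this feature.

```python
# Hypothetical sketch of direct logit attribution for one SAE feature.
# Assumes dec_dir is the feature's decoder vector (length d_model) and
# unembed maps each token to its unembedding vector (also length d_model).

def logit_effects(dec_dir, unembed):
    """Return {token: dot(dec_dir, unembed[token])} for every token."""
    return {
        tok: sum(d * u for d, u in zip(dec_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked[-k:]))  # most positive first
    return negative, positive

# Toy example: a 2-dimensional "model" with four made-up token vectors.
dec_dir = [1.0, -0.5]
unembed = {
    "also": [0.2, 0.1],
    "autos": [0.3, 0.0],
    "Fac": [-0.1, 0.1],
    "ox": [0.0, 0.2],
}
neg, pos = top_and_bottom(logit_effects(dec_dir, unembed), k=2)
print("negative:", neg)
print("positive:", pos)
```

Real implementations typically do this with a single matrix-vector product over the whole vocabulary; the dictionary form here only keeps the toy example readable.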
Activation Density 0.122%
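Activation density is usually read as the share of tokens on which the feature fires (activation strictly above zero). Assuming that definition, which the page itself does not state, 0.122% corresponds to roughly 1.22 tokens per 1,000. The sketch below uses a hypothetical helper, `activation_density`, and a made-up activation list.

```python
# Hypothetical sketch: activation density as the percentage of tokens
# on which the feature's activation is strictly positive.

def activation_density(activations):
    """Percentage of tokens with activation > 0 (0.0 for an empty input)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)

# Toy example: the feature fires on 1 of 1,000 tokens.
acts = [0.0] * 999 + [2.3]
print(f"Activation Density {activation_density(acts):.3f}%")
```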