INDEX
Explanations
pronouns related to gendered third-person references
Negative Logits
Token    Logit
umpt     -0.15
lez      -0.15
yers     -0.14
Äģn      -0.14
umper    -0.14
cosine   -0.14
edral    -0.14
agues    -0.14
ask      -0.13
dela     -0.13
Positive Logits
Token        Logit
isman        0.17
òi           0.16
stÅĻÃŃ       0.16
HEEL         0.15
ston         0.14
æĺ¯æĪij       0.14
dog          0.14
ron          0.14
PackageName  0.14
itemprop     0.14
Activations Density: 0.706%