INDEX
Explanations
gendered nouns and their associated articles in a variety of contexts
Negative Logits
Token     Logit
amient    -0.15
Cabr      -0.15
ista      -0.14
otland    -0.14
Tribe     -0.14
adium     -0.14
onn       -0.14
shed      -0.14
togg      -0.14
véd       -0.14
Positive Logits
Token     Logit
warn      0.15
оля       0.14
wart      0.14
arro      0.14
itto      0.14
une       0.14
anny      0.14
anson     0.14
unto      0.14
dum       0.14
Activations Density 0.074%
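
The density figure above can be read as the fraction of token positions on which the feature fires. The sketch below assumes that reading; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active; the dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 74 of which activate the feature.
acts = [0.0] * (100_000 - 74) + [1.0] * 74
density = activation_density(acts)
print(f"{density:.3%}")
```

With these made-up counts the result lands on the same order of magnitude as the reported value, which illustrates how sparse the feature is: it is active on fewer than one token in a thousand.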