INDEX
Explanations
the words "ladies" and "gentlemen"
references to gender-specific terms, particularly 'ladies' and 'gentlemen'
Negative Logits
ramid      -0.74
enf        -0.72
Fuel       -0.72
yrus       -0.69
aya        -0.69
erenn      -0.67
aeda       -0.66
Emb        -0.65
osta       -0.64
ilon       -0.64
Positive Logits
gentleman  1.03
gentlemen  0.98
entle      0.76
traveler   0.73
Gentleman  0.72
men        0.72
gery       0.71
owship     0.70
utenant    0.68
maid       0.65
Activations Density 0.030%