INDEX
Explanations
terms addressing audiences in a formal setting
mentions of "ladies" and "gentlemen."
Negative Logits
Ds          -0.79
osta        -0.70
yrus        -0.68
onis        -0.68
aya         -0.67
ython       -0.67
icted       -0.66
Emb         -0.65
sequence    -0.65
aeda        -0.65
Positive Logits
gentlemen   0.89
maid        0.85
gentleman   0.83
utenant     0.77
men         0.75
woman       0.75
owship      0.75
Gaga        0.74
bugs        0.72
Toast       0.70
Activations Density 0.015%