INDEX
Explanations
gender-related words and workplace terms
New Auto-Interp
Negative Logits
Buster
-0.65
vich
-0.65
Warwick
-0.64
Everything
-0.62
Corpus
-0.61
Byr
-0.61
Sandwich
-0.60
Lavrov
-0.60
RAW
-0.60
forcer
-0.59
POSITIVE LOGITS
clustered
0.80
perished
0.80
individually
0.77
outright
0.76
eligible
0.75
either
0.74
ebted
0.71
voluntarily
0.70
classified
0.70
surveyed
0.70
Activations Density 3.952%