Explanations
phrases referring to diverse groups of people
references to inclusivity across various demographics
Negative Logits
LESS            -0.83
potion          -0.69
atl             -0.60
parser          -0.60
unn             -0.59
PP              -0.59
IPM             -0.57
lessly          -0.57
pmwiki          -0.57
Mole            -0.57
Positive Logits
faiths           1.34
genders          1.31
backgrounds      1.27
ages             1.24
stripes          1.16
genres           1.11
age              1.11
sexes            1.11
religions        1.10
denominations    1.10
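As a rough illustration of where lists like the two above can come from, here is a minimal sketch. It assumes (this is not stated on the page) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector, and that the dashboard then takes the k highest and k lowest scores. The functions `logit_effects` and `top_k` and the toy vectors are hypothetical; real pipelines may also fold in normalization or scaling steps.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: score each token by the dot product of the
    feature's decoder direction with that token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_k(effects: Dict[str, float],
          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return (most positive, most negative) k tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # lowest scores, most negative first
    positive = ranked[::-1][:k]  # highest scores, most positive first
    return positive, negative


# Toy 2-dimensional vectors, invented for illustration only (not real model weights).
decoder_dir = [1.0, 0.0]
unembed = {
    "faiths": [1.3, 0.2],
    "genders": [1.2, -0.5],
    "LESS": [-0.8, 0.1],
    "parser": [-0.6, 0.9],
}
positive, negative = top_k(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # faiths, then genders
print(negative)  # LESS, then parser
```

Under this assumption, a positive score means that writing the feature's direction into the residual stream raises that token's logit, and a negative score means it lowers that logit.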
Activation density: 0.191%
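To make the density figure concrete, here is a minimal sketch. It assumes that activation density means the percentage of sampled tokens on which the feature's activation is above zero. The function name `activation_density` and its `threshold` parameter are hypothetical. On that reading, 0.191% corresponds to about 1,910 firing tokens per 1,000,000 tokens sampled.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose feature activation
    exceeds `threshold` (assumed definition of activation density)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 2 of 1,000 tokens, giving a density of 0.2%.
acts = [0.0] * 998 + [2.5, 0.7]
print(activation_density(acts))
```

A density this low indicates a sparse feature that stays silent on the vast majority of tokens, which fits its narrow theme of phrases about diverse groups of people.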