INDEX
Explanations
references to inclusivity or collective terms
Negative Logits
hran        -0.87
yip         -0.72
culosis     -0.69
Gw          -0.63
IND         -0.62
KH          -0.61
artz        -0.61
hz          -0.60
isu         -0.60
Nap         -0.59
Positive Logits
usions      1.00
iances      0.98
uding       0.96
attendant   0.94
udes        0.93
kinds       0.91
associated  0.84
alike       0.84
iance       0.83
ocating     0.83
Activations Density: 0.051%