Explanations
inclusive language referencing diverse groups or categories
Negative Logits
consin        -0.52
<=",          -0.50
ligible       -0.49
nowrap        -0.49
clara         -0.48
Sti           -0.47
podes         -0.47
UTERS         -0.46
JsonFormat    -0.46
inform        -0.45
Positive Logits
AssemblyCulture     0.77
végét               0.67
ⓧ                   0.66
تضيفلها             0.65
rophoresis          0.64
وعة                 0.63
déput               0.61
+:+                 0.60
StoreMessageInfo    0.60
ویکیپدیا            0.59
Activations Density 0.093%