INDEX
Explanations
terms and phrases related to unfairness and equity issues
Negative Logits
heel         -0.16
.crm         -0.16
completo     -0.15
ÙĩرÙĩ        -0.15
iens         -0.14
><![         -0.14
ANDLE        -0.14
icit         -0.14
VERRIDE      -0.14
rgan         -0.14
Positive Logits
erli          0.15
sla           0.14
etooth        0.14
Gomez         0.14
lu            0.14
usercontent   0.13
avenport      0.13
gni           0.13
******/       0.13
anguard       0.13
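The logit tables list the vocabulary tokens whose output logits this feature most decreases and most increases. A common way to build such tables, assumed here because the page does not state its method, is to multiply the feature's decoder direction by the model's unembedding matrix and sort the resulting scores. The pure-Python sketch below does exactly that on made-up toy numbers. The function name `top_logits`, the 2-dimensional vectors and the four-token vocabulary are all hypothetical.

```python
def top_logits(decoder_vec, unembed, vocab, k=3):
    """Score every token by projecting a feature's decoder direction onto the unembedding.

    decoder_vec: list of d_model floats (hypothetical feature direction).
    unembed: d_model rows, each a list of vocab_size floats (hypothetical W_U).
    vocab: list of vocab_size token strings.
    Returns (most_negative, most_positive), each a list of k (token, score) pairs.
    """
    d_model = len(decoder_vec)
    # Score for token j is the dot product of the decoder vector with column j of W_U.
    scores = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]          # lowest scores first
    most_positive = ranked[::-1][:k]    # highest scores first
    return most_negative, most_positive


# Toy usage with invented numbers: d_model = 2, vocabulary of four tokens.
decoder_vec = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.1, 0.4, -0.2, 0.0],
]
vocab = ["a", "b", "c", "d"]
neg, pos = top_logits(decoder_vec, unembed, vocab, k=2)
for token, score in pos:
    print(f"{token:<6}{score:6.2f}")
```

With these numbers the scores are a = 0.15, b = -0.30, c = 0.10 and d = 0.30, so `b` heads the negative list and `d` heads the positive list.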
Activation Density 0.001%
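Activation density is the share of token positions on which the feature is active. A density of 0.001% is 1e-5, or roughly one firing per 100,000 tokens. The sketch below assumes "active" means an activation strictly above zero, since the page does not state the threshold it uses. The function `activation_density` and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: the feature fires on 1 of 100,000 token positions.
acts = [0.0] * 99_999 + [2.5]
print(f"{activation_density(acts):.3f}%")
```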