Explanations
phrases related to equality and fairness in treatment
Negative Logits
ьаж           -0.56
bootstrapcdn  -0.46
лтемелер      -0.46
ویکیپدیای     -0.45
balleur       -0.45
Habits        -0.44
AndEndTag     -0.44
Qual          -0.43
виправивши    -0.43
NPS           -0.43
Positive Logits
fairness      1.34
unfair        1.20
inequ         1.09
Fairness      1.06
injustice     1.02
inequality    1.02
fairer        1.01
justice       1.00
inequalities  0.99
unequal       0.99
Activations Density: 0.730%