INDEX
Explanations
references to fairness and unfairness in discussions
fairness and concessions
Negative Logits
Token           Logit
SharedDtor      -0.49
Wikispecies     -0.45
GEBURTSDATUM    -0.44
"}";            -0.43
dieß            -0.43
hibited         -0.43
➊               -0.42
oarece          -0.41
afficheront     -0.41
ویکیپدی         -0.41
Positive Logits
Token           Logit
fairness        0.89
fair            0.85
fair            0.81
Fairness        0.81
unfair          0.77
Fair            0.75
Fair            0.73
fairer          0.65
fairest         0.63
FAIR            0.61
Activations Density 0.012%