Explanations
phrases concerning equitable distribution or sharing of resources and opportunities
Negative Logits
bij        -0.14
wil        -0.14
Rank       -0.14
atorial    -0.14
dis        -0.14
asto       -0.14
vant       -0.14
qi         -0.14
utr        -0.13
awe        -0.13
Positive Logits
fair           0.28
fair           0.26
Fair           0.23
Fair           0.22
fairly         0.17
áce            0.16
Brotherhood    0.15
faire          0.15
649            0.15
IMPLEMENT      0.15
Activations Density 0.024%