INDEX
    Explanations

    references to fairness and unfairness in discussions

    fairness and concessions

    Negative Logits
    SharedDtor
    -0.49
     Wikispecies
    -0.45
    GEBURTSDATUM
    -0.44
    "}";
    -0.43
     dieß
    -0.43
    hibited
    -0.43
    -0.42
    oarece
    -0.41
    afficheront
    -0.41
     ویکی‌پدی
    -0.41
    Positive Logits
     fairness
    0.89
     fair
    0.85
    fair
    0.81
     Fairness
    0.81
     unfair
    0.77
    Fair
    0.75
     Fair
    0.73
     fairer
    0.65
     fairest
    0.63
     FAIR
    0.61
    Activation Density: 0.012%

    No Known Activations