    Explanations

    fair wages, compensation, or admissions

    Negative Logits
    Token              Logit
    "忘记"             0.46
    " evoking"         0.45
    [token missing]    0.45
    " ignore"          0.45
    " corrobor"        0.43
    " компании"        0.43
    "忽略"             0.43
    " общества"        0.43
    "boxylate"         0.43
    " imaginations"    0.42
    Positive Logits
    Token              Logit
    " fairness"        0.69
    " Fairness"        0.57
    " fairer"          0.55
    "fair"             0.50
    "Fair"             0.47
    " fair"            0.44
    " tax"             0.44
    " unfair"          0.43
    " tema"            0.42
    "tema"             0.42
    Activation Density: 0.010%

    No Known Activations