INDEX
    Explanations

    words related to disparities and discrepancies

    terms related to inequality and differences between groups

    Negative Logits
     sworn              -0.69
    cend                -0.68
     guided             -0.66
    safe                -0.63
     bark               -0.63
     Kiss               -0.63
     Commands           -0.63
    assad               -0.63
    help                -0.62
    Tag                 -0.62
    Positive Logits
     disparity          3.01
     discrepancy        2.88
     discrepancies      2.68
     disparities        2.54
     imbalance          2.39
     mismatch           2.09
     inconsistencies    2.07
     inconsistency      2.00
     disproportion      1.96
     discrep            1.92
    Activation Density: 0.035%

    No Known Activations