INDEX
    Explanations

    terms related to affirmative action and socioeconomic disparities

    Negative Logits
     myſelf       -0.82
     houſe        -0.80
     Theſe        -0.78
     Monfieur     -0.77
     Houſe        -0.73
     Diſ          -0.73
     pleaſure     -0.72
     faſt         -0.72
     ſtate        -0.72
     Reſ          -0.71
    Positive Logits
     to            0.75
     or            0.74
     and           0.73
     versus        0.60
     nonetheless   0.51
     rather        0.51
     nevertheless  0.50
     vs            0.49
    tagHelperRunner  0.48
     through       0.47
    Activation Density: 0.400%

    No Known Activations