    Explanations

    terms related to institutions and historical figures associated with oppression or inequality

    Negative Logits
    ================================================================
    (token not shown)  -0.18
    ../../../          -0.17
    .Unicode           -0.15
    numer              -0.15
    Æł                 -0.14
    stroy              -0.14
    áŀ¶                -0.13
    ancybox            -0.13
    íĴĪ                -0.13
    ents               -0.13

    Positive Logits
    ================================================================
    erman      0.18
    itan       0.17
    ier        0.17
    /he        0.17
    berg       0.15
    -water     0.15
    efeller    0.15
    ermann     0.15
    lear       0.15
    DTD        0.15
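The logit lists above resemble those on sparse-autoencoder feature dashboards. On such dashboards, a feature's decoder direction is typically projected through the model's unembedding matrix, and the highest- and lowest-scoring vocabulary tokens are reported. This page does not say how its scores were computed, so the sketch below simply assumes that procedure. The function name `top_logits` and its inputs are hypothetical, and plain Python lists stand in for real weight tensors.

```python
def top_logits(decoder_vec, unembed, vocab, k=10):
    """Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.

    decoder_vec: list of d_model floats (the feature's decoder direction, assumed).
    unembed: d_model x vocab_size matrix as a list of rows (assumed W_U).
    vocab: list of token strings, length vocab_size.
    Returns (top_positive, top_negative), each a list of (token, score) pairs.
    """
    d_model = len(decoder_vec)
    # Score for token t is the dot product of the decoder direction with column t of W_U.
    scores = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending, so the most negative scores come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


if __name__ == "__main__":
    pos, neg = top_logits([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
    print(pos, neg)
```

Under this assumption, a token such as `efeller` appears in the positive list because the feature's direction points toward that token's unembedding column.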
    Activation Density: 0.767%
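Activation density is commonly read as the share of sampled tokens on which the feature fires. The following is a minimal sketch that makes three assumptions: "fires" means an activation strictly above a threshold of 0, density is reported as a percentage, and the input is a flat list of per-token activations. The helper `activation_density` and its `threshold` parameter are hypothetical, and this page does not describe the dashboard's actual sampling scheme.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: percentage of token positions where activation > threshold."""
    if not activations:
        return 0.0
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    print(activation_density([0.0, 0.3, 0.0, 0.0]))
```

Under these assumptions, a density of 0.767% would mean the feature fires on roughly 767 of every 100,000 sampled tokens.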

    No Known Activations