INDEX
    Explanations

    references to social justice and inequality

    Negative Logits
    Token     Logit
    "eyse"    -0.15
    "eyn"     -0.14
    "EG"      -0.14
    "rous"    -0.13
    "ench"    -0.13
    "apat"    -0.13
    "mere"    -0.13
    "Ñĸж"     -0.13
    "afc"     -0.13
    "ึ"       -0.13
    Positive Logits
    Token           Logit
    "coni"          0.17
    "plit"          0.14
    "reh"           0.14
    "ominated"      0.14
    " Duplicate"    0.14
    " Dit"          0.14
    "itage"         0.14
    " etc"          0.14
    "uil"           0.13
    "aData"         0.13
    Activation Density: 0.173%

    No Known Activations