INDEX
    Explanations

    references to historical injustices and their societal impact

    Negative Logits
    Token        Logit
    "津"         -0.17
    " sm"        -0.14
    " Cro"       -0.14
    " cro"       -0.14
    " bir"       -0.14
    " danger"    -0.14
    "unbind"     -0.14
    " vice"      -0.13
    "ерж"        -0.13
    "ζί"         -0.13
    Positive Logits
    Token        Logit
    "iesen"      0.18
    "aryl"       0.15
    "oodle"      0.14
    "cek"        0.14
    "ύν"         0.14
    "OUTH"       0.14
    "ậu"         0.13
    "istring"    0.13
    "flation"    0.13
    "raries"     0.13
    Activation Density: 0.333%

    No Known Activations