INDEX
    Explanations

    references to systemic injustice and commentary on societal inequalities

    Negative Logits
    ertas        -0.16
    ubar         -0.15
    ρω           -0.14
    (strtolower  -0.14
    idth         -0.14
    bjerg        -0.13
    oldt         -0.13
    iesz         -0.13
    サイ         -0.13
    мит          -0.13
    Positive Logits
     simply       0.28
     while        0.27
     without      0.26
     merely       0.24
     solely       0.24
     even         0.22
     despite      0.22
     mere         0.21
     using        0.21
     under        0.21
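Lists like the positive and negative logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the highest- and lowest-scoring vocabulary tokens. The sketch below assumes that procedure; the names `w_dec_row`, `w_u`, `vocab`, and `feature_logit_effects` are hypothetical, and a real dashboard may also fold in layer norm or other details not shown here.

```python
import numpy as np

def feature_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Return the k most-promoted and k most-suppressed tokens for one feature.

    Assumption: logit effect = decoder direction (d_model,) @ unembedding (d_model, vocab_size).
    """
    logits = w_dec_row @ w_u            # one score per vocabulary token
    order = np.argsort(logits)          # ascending: most negative first
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top, bottom

# Toy example with a hypothetical 2-dim model and 4-token vocabulary.
w_u = np.array([[0.3, -0.1, 0.2, 0.0],
                [0.5,  0.5, 0.5, 0.5]])
vocab = [" simply", "ertas", " while", "idth"]
top, bottom = feature_logit_effects(np.array([1.0, 0.0]), w_u, vocab, k=2)
# top    → [(' simply', 0.3), (' while', 0.2)]
# bottom → [('ertas', -0.1), ('idth', 0.0)]
```

Tokens with a leading space (such as " simply") are distinct vocabulary entries from their space-less counterparts, which is why the positive list above keeps the space.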
    Activation Density: 0.763%
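Activation density is presumably the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# One active token out of four gives a density of 25%.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # prints 25.0
```

Under this definition, a density of 0.763% means the feature is active on roughly 8 of every 1,000 tokens.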

    No Known Activations