INDEX
    Explanations

    themes related to oppression and injustices faced by marginalized groups

    Negative Logits
    "lify"        -0.17
    "iero"        -0.15
    "isphere"     -0.14
    "ーフ"        -0.14
    "uyu"         -0.13
    "pane"        -0.13
    "allen"       -0.13
    "uest"        -0.13
    " aç"         -0.13
    "erno"        -0.13
    Positive Logits
    " experienced"   0.19
    " suffered"      0.18
    " Experienced"   0.17
    "uous"           0.17
    "/errors"        0.17
    "/problems"      0.17
    "ulence"         0.16
    " visited"       0.16
    "ishment"        0.16
    "/error"         0.15
    Activation Density: 0.117%
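Activation density is commonly read as the fraction of token positions in a sample corpus where the feature's activation is positive. Under that reading, 0.117% means the feature fires on roughly 1 token in 850. The sketch below assumes that definition and a zero threshold. The dashboard does not state its exact threshold or sample corpus, and the activation data here is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature fires (activation > threshold).

    Assumption: "fires" means strictly above a zero threshold; real dashboards
    may use a different cutoff or corpus.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: a feature that fires on 117 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(0, 117 * 800, 800):  # 117 evenly spaced firing positions
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.117%
```

A sparse feature like this one is silent on the vast majority of tokens, which is why its density is reported as a small percentage rather than a raw count.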

    No Known Activations