    Explanations

    phrases about societal structures and inequalities

    Negative Logits
    " otherwise"    -0.22
    " Otherwise"    -0.18
    " enough"       -0.17
    "awa"           -0.16
    " OTHERWISE"    -0.16
    " dernier"      -0.16
    " last"         -0.16
    "Otherwise"     -0.16
    "lint"          -0.15
    "ady"           -0.15
    Positive Logits
    "orative"       0.18
    "legate"        0.18
    "world"         0.17
    "ramework"      0.17
    "892"           0.16
    "ihn"           0.16
    "lessness"      0.16
    " opat"         0.15
    "igator"        0.15
    "suite"         0.15
    Activation Density: 0.027%

    No Known Activations