INDEX
    Explanations

    Phrases related to social justice and equality issues

    Negative Logits
    "otal"              -0.17
    "eland"             -0.16
    " *__"              -0.14
    "alue"              -0.14
    "…and"              -0.14
    "------+------+"    -0.14
    "rawn"              -0.13
    "ium"               -0.13
    " Sind"             -0.13
    "074"               -0.13
    Positive Logits
    "rv"                 0.14
    "LAS"                0.14
    "ADIUS"              0.14
    "inas"               0.14
    "dio"                0.14
    "errick"             0.14
    "elik"               0.14
    "bdd"                0.14
    "icl"                0.13
    " lidi"              0.13
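Dashboards of this kind commonly produce the negative and positive logit lists by projecting the feature's decoder direction through the model's unembedding matrix. They then report the tokens whose logits the feature pushes down or up the most. The page itself does not state this method, so treat the following as a minimal sketch under that assumption. The function name `top_logit_effects` and the toy matrices are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive logit effects.

    Assumption: decoder_dir is the feature's decoder vector (length d_model)
    and unembed is a d_model x len(vocab) unembedding matrix.
    """
    d_model = len(decoder_dir)
    # Logit effect on token j: dot product of the decoder direction
    # with column j of the unembedding matrix.
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    # Largest positive effects first, mirroring a "top positive" listing.
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive


# Hypothetical toy model: 2-dimensional residual stream, 3-token vocabulary.
neg, pos = top_logit_effects(
    decoder_dir=[1.0, -0.5],
    unembed=[[0.2, -0.1, 0.0], [0.4, 0.2, -0.2]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

In this toy case, token "b" receives the most negative effect and "c" the most positive. A real dashboard would run the same projection over the full vocabulary with k set to 10.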
    Activation Density: 0.424%
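The density figure is usually read as the percentage of sampled tokens on which the feature is active. The sketch below assumes that "active" means an activation strictly above zero and that the denominator is every token in the sample. Both choices are assumptions; the function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: `activations` holds one activation value per sampled token.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample in which the feature fires on 424 of 100,000 tokens.
acts = [0.0] * (100_000 - 424) + [0.5] * 424
print(activation_density(acts))  # → 0.424
```

At a density of 0.424%, the feature fires on roughly 4 tokens in every 1,000, so it is sparse but not rare.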

    No Known Activations