INDEX
    Explanations

    discussions of awareness and acknowledgment of social issues and inequalities, particularly those related to race and history

    Negative Logits
    Token    Logit
    -wow     -0.15
    很多     -0.14
    -many    -0.14
    一定     -0.14
    کیل      -0.13
    ampo     -0.13
    noho     -0.13
    许多     -0.13
    deniz    -0.13
    ạ        -0.13
    Positive Logits
    Token       Logit
     these      0.29
     and        0.25
     the        0.25
     those      0.24
     reality    0.23
     what       0.23
     how        0.23
     their      0.23
     this       0.22
     or         0.20
    Act Density 0.516%

    No Known Activations