INDEX
Explanations
discussions around awareness and acknowledgment of social issues and inequalities, particularly related to race and history
Negative Logits
-wow
-0.15
很多
-0.14
-many
-0.14
一定
-0.14
کیل
-0.13
ampo
-0.13
noho
-0.13
许多
-0.13
deniz
-0.13
ạ
-0.13
Positive Logits
these
0.29
and
0.25
the
0.25
those
0.24
reality
0.23
what
0.23
how
0.23
their
0.23
this
0.22
or
0.20
Activations Density 0.516%