INDEX
Explanations
phrases related to social justice and equality issues
Negative Logits
otal              -0.17
eland             -0.16
*__               -0.14
alue              -0.14
â̦and             -0.14
------+------+    -0.14
rawn              -0.13
ium               -0.13
Sind              -0.13
074               -0.13
Positive Logits
rv                0.14
LAS               0.14
ADIUS             0.14
inas              0.14
dio               0.14
errick            0.14
elik              0.14
bdd               0.14
icl               0.13
lidi              0.13
Activations Density: 0.424%