Explanations
language related to social justice, particularly efforts to combat various forms of injustice and inequality
Negative Logits
Token           Logit
řes             -0.17
blame           -0.17
кот             -0.16
rieving         -0.16
imposs          -0.14
xit             -0.14
ipc             -0.14
mana            -0.14
ysi             -0.14
attent          -0.14
Positive Logits
Token           Logit
poverty         0.22
Poverty         0.20
discrimination  0.20
hunger          0.18
homelessness    0.18
/address        0.18
climate         0.17
rampant         0.16
exploitation    0.16
/mit            0.16
Activations Density 0.222%