INDEX
Explanations
words related to social justice, activism, and policy change
key social issues and barriers that contribute to violence and discrimination
Negative Logits
çͰ          -0.71
oln         -0.69
mathemat    -0.67
estial      -0.66
chart       -0.65
ITNESS      -0.61
furt        -0.61
okin        -0.61
Kash        -0.61
ISO         -0.60
Positive Logits
menace      0.81
Rhino       0.73
prejudice   0.70
terness     0.69
mong        0.68
Survivors   0.68
erno        0.67
worms       0.66
worm        0.64
rene        0.64
Activations Density: 0.613%