Explanations
phrases related to raising awareness about social issues or causes
Negative Logits
Token       Logit
--+         -0.74
verages     -0.68
ulla        -0.68
glide       -0.67
Iterator    -0.66
ée          -0.64
OE          -0.63
*/(         -0.63
keys        -0.62
oult        -0.62
Positive Logits
Token            Logit
injust           1.05
wrongdoing       1.00
atrocities       0.93
misogyny         0.90
injustice        0.89
issues           0.88
corruption       0.88
homosexuality    0.87
crimes           0.86
sexism           0.86
Activations Density: 0.373%