INDEX
Explanations
references to social injustice and its impact on marginalized communities
Negative Logits
İngilizce    -0.15
legate       -0.14
ãĥ³ãĤ¸       -0.14
bau          -0.14
avian        -0.13
æķħ          -0.13
_defs        -0.13
ulton        -0.13
murderers    -0.13
SLOT         -0.13
POSITIVE LOGITS
affected     0.40
affected     0.34
Affected     0.30
afect        0.28
victim       0.27
victims      0.27
affect       0.26
targets      0.25
targeted     0.25
vulnerable   0.25
Activations Density 0.185%