INDEX
Explanations
concepts related to social justice issues, including discrimination and various forms of abuse
Negative Logits
à¸Ĺà¸ĺ      -0.16
appa        -0.15
Platforms   -0.14
ifen        -0.14
velope      -0.14
gether      -0.14
hindsight   -0.14
wl          -0.14
adio        -0.13
unci        -0.13
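The first entry under Negative Logits, `à¸Ĺà¸ĺ`, looks like a token printed in the byte-level BPE alphabet rather than as readable text. The sketch below shows one way to map such a string back to UTF-8. It assumes the tokenizer uses the GPT-2-style byte-to-character table, which this listing does not state; `bytes_to_unicode` and `decode_token` are hypothetical helper names chosen for illustration, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2-style table mapping each byte (0-255) to a printable character.

    Assumption: the tokenizer behind this listing uses this table.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes are assigned code points starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Convert a byte-level token string back to text (hypothetical helper).

    Raises KeyError if the string contains characters outside the table,
    which would suggest it is not in byte-level form.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A token may end in the middle of a multi-byte character, hence errors="replace".
    return raw.decode("utf-8", errors="replace")


print(decode_token("à¸Ĺà¸ĺ"))
```

Under that assumption the string decodes to two Thai consonants (U+0E17, U+0E18). Plain ASCII tokens such as `appa` pass through unchanged, and a leading `Ġ` in a raw vocabulary entry decodes to a space.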
Positive Logits
charges      0.29
ring         0.28
committed    0.28
rings        0.26
cases        0.26
claims       0.24
charge       0.23
committed    0.23
prevention   0.23
allegations  0.22
Activations Density 0.124%