Explanations
words related to social justice issues
Negative Logits
Token        Logit
版           -0.81
cade         -0.80
ulum         -0.74
liest        -0.74
amera        -0.72
ixel         -0.72
ainer        -0.71
eters        -0.70
gue          -0.70
itatively    -0.69
Positive Logits
Token            Logit
persecution      1.07
extremism        1.06
violence         1.05
degradation      1.02
aggression       1.01
terrorism        1.01
sexism           1.01
criminality      1.00
mayhem           1.00
misinformation   1.00
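Lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring vocabulary entries. The sketch below assumes that procedure; it is not confirmed by this page. The helper names `logit_effects` and `top_and_bottom`, and the d_model x vocab_size layout of the unembedding, are hypothetical, and any normalization the dashboard applies to the scores is not modeled.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumed (hypothetical) layout: decoder_dir has length d_model and
    unembed is a d_model x vocab_size matrix. Returns one score per token.
    """
    vocab_size = len(unembed[0])
    scores = [0.0] * vocab_size
    for i, d_i in enumerate(decoder_dir):
        row = unembed[i]
        for t in range(vocab_size):
            # Accumulate the dot product decoder_dir . unembed[:, t]
            scores[t] += d_i * row[t]
    return scores


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]                  # most negative first
    top = list(reversed(ranked[-k:]))    # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three made-up tokens.
    direction = [1.0, -0.5]
    W_U = [[0.2, 1.0, -0.4],
           [0.6, 0.0, 0.4]]
    vocab = ["a", "b", "c"]
    top, bottom = top_and_bottom(logit_effects(direction, W_U), vocab, k=1)
    print(top, bottom)
```

Sorting the full vocabulary is simple but costs O(V log V); for a real vocabulary of tens of thousands of tokens a partial selection (for example `heapq.nlargest`) would do the same job more cheaply.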
Activation Density: 1.621%
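Activation density is usually read as the percentage of token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above a threshold of zero (the threshold and token sample behind the 1.621% figure are not stated here); `activation_density` is a hypothetical helper, not a dashboard API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy sample: 16 of 1000 positions fire, i.e. a density of 1.6%.
    sample = [0.0] * 984 + [0.5] * 16
    print(activation_density(sample))
```

A low density such as the one reported above means the feature is silent on the vast majority of tokens, which is typical of a fairly specific feature.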