Explanations
words related to systemic issues or conditions
references to systemic issues or problems
Negative Logits
kers     -0.77
quer     -0.76
/+       -0.71
spell    -0.67
girl     -0.67
erb      -0.67
nery     -0.67
rug      -0.67
erer     -0.67
liest    -0.66
Positive Logits
systemic      1.01
racism        0.85
ultural       0.83
eleph         0.82
overhaul      0.80
earthqu       0.76
violations    0.75
onduct        0.74
ized          0.74
undermin      0.73
Activations Density: 0.013%