INDEX
Explanations
mentions of minority groups or minority-related topics
references to minority populations and issues related to them
Negative Logits
sis             -0.95
DCS             -0.79
hran            -0.77
Closure         -0.75
rence           -0.73
lov             -0.72
earchers        -0.72
raltar          -0.72
atche           -0.70
============    -0.70
Positive Logits
outreach        0.86
populations     0.84
minority        0.84
viewpoints      0.84
voices          0.82
groups          0.81
sects           0.81
communities     0.80
opinion         0.79
faiths          0.78
Activations Density: 0.028%