INDEX
Explanations
references to social or economic marginalization
references to marginalization and specific individuals related to political contexts
Negative Logits
aird       -0.82
sterdam    -0.80
enegger    -0.79
edom       -0.71
perture    -0.69
ily        -0.69
creen      -0.67
abba       -0.67
yne        -0.67
atcher     -0.64
Positive Logits
osaurs     0.86
iously     0.83
vill       0.77
bed        0.76
itic       0.75
ius        0.73
culosis    0.73
rics       0.73
Marginal   0.71
ocl        0.71
Activations Density: 0.075%
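
For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number is computed: the fraction of token positions on which the feature's activation is above zero. This definition, the function name `activation_density`, the `threshold` parameter, and the toy data are all assumptions made for illustration; the page does not state how its 0.075% was measured.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition for illustration only; the source page does not
    specify how its density value was obtained.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 3 of 4000 token positions.
toy_activations = [0.0] * 3997 + [1.2, 0.4, 2.7]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.075% would mean the feature is active on roughly 3 out of every 4000 tokens sampled.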