INDEX
Explanations
references to political and social critiques related to power dynamics and inequality
Negative Logits
Ministers  -0.15
           -0.15
orde       -0.14
ronic      -0.14
šet        -0.14
OLS        -0.14
lef        -0.14
uk         -0.13
wide       -0.13
abee       -0.13
Positive Logits
United     0.21
scales     0.18
country    0.18
Koch       0.17
Bible      0.17
House      0.17
White      0.16
Found      0.16
current    0.16
fram       0.16
Activations Density 0.590%