INDEX
Explanations
socio-political terms related to power dynamics and institutions
references to the concept of dominance or dominant forces in various contexts
Negative Logits
sterdam   -0.77
hire      -0.77
dump      -0.72
neys      -0.72
adra      -0.71
abre      -0.70
adr       -0.68
chery     -0.68
ander     -0.68
roma      -0.68
Positive Logits
theme         0.81
themes        0.72
dominant      0.72
tendency      0.72
impression    0.71
role          0.70
personality   0.70
bidder        0.68
majority      0.67
male          0.67
Activations Density 0.035%