Explanations
terms related to power dynamics and political authority
Negative Logits
Token       Logit
rades       -0.16
思          -0.15  (Chinese, "think")
finity      -0.15
Tavern      -0.14
uien        -0.14
_QUAL       -0.14
liers       -0.14
iqueta      -0.13
vitam       -0.13
职业        -0.13  (Chinese, "occupation")
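Chinese tokens like the two above often appear on dashboards as runs of stand-in characters (for example `èģĮä¸ļ` for 职业, `æĢĿ` for 思). This happens because byte-level BPE vocabularies store every UTF-8 byte as a printable character. The page does not name the tokenizer, so the sketch below assumes a GPT-2-style byte-to-unicode table. `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """GPT-2-style table mapping each byte 0..255 to a printable character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Bytes without a printable form are shifted into the range starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Invert the table: stand-in character -> original byte.
_BYTE_DECODER = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into readable UTF-8 text."""
    raw = bytes(_BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("èģĮä¸ļ"))  # 职业
print(decode_token("æĢĿ"))       # 思
```

Under the same table, the stand-in `Ġ` decodes to a leading space. That is why visually identical entries such as `power` can appear twice in a logit list: they may be separate tokens that differ only in a leading space.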
Positive Logits
Token       Logit
/power      0.33
power       0.31
power       0.27
control     0.27
(power      0.26
/control    0.25
Power       0.24
authority   0.23
Power       0.23
.Power      0.23
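Feature dashboards typically produce these positive and negative lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The page does not say how its lists were computed, so the following is a minimal sketch under that assumption. It uses plain Python lists. `top_logits`, `decoder_row` (one row of a hypothetical `W_dec`), `unembed` (a hypothetical `W_U`) and the toy numbers are all illustrative.

```python
def top_logits(decoder_row, unembed, vocab, k=10):
    """Score every token by the dot product of the feature direction with its unembedding column.

    decoder_row: d_model floats (hypothetical W_dec[feature]).
    unembed: d_model rows, each with len(vocab) floats (hypothetical W_U).
    Returns (top_positive, top_negative) as lists of (token, logit) pairs.
    """
    d_model = len(decoder_row)
    logits = [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    top_negative = ranked[:k]          # most negative first
    top_positive = ranked[::-1][:k]    # most positive first
    return top_positive, top_negative


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
decoder_row = [1.0, -0.5]
unembed = [[0.3, 0.1, -0.2],
           [0.0, 0.4, 0.2]]
vocab = ["power", "control", "Tavern"]
pos, neg = top_logits(decoder_row, unembed, vocab, k=1)
print(pos, neg)
```

A real model would use a matrix library for the projection. The nested loops here only keep the sketch dependency-free.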
Activation Density 0.118%
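Activation density is usually read as the share of sampled token positions on which the feature's activation is above zero. Under that reading, 0.118% means the feature fires on roughly 1 token in 850, so it is quite sparse. Below is a minimal sketch of the calculation. `activation_density` and its `threshold` parameter are hypothetical names, and the zero threshold is an assumption.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# One of four positions fires, giving a density of 25%.
print(activation_density([0.0, 0.0, 2.3, 0.0]))  # 25.0
```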