Explanations
mentions of power dynamics and political conflicts
Negative Logits
aroo        -0.16
emplates    -0.15
Ì£          -0.15
cctor       -0.15
Arbeit      -0.15
inspace     -0.14
ndern       -0.14
dete        -0.14
amework     -0.14
EMPLARY     -0.14
Positive Logits
power       0.23
political   0.22
purge       0.20
trait       0.20
Political   0.19
politically 0.18
powerful    0.18
loy         0.17
faction     0.17
palace      0.17
Activations Density: 0.094%