Using Reinforcement Learning to Design a Better Rocket Engine

NASA [Public domain]

In this blog post, I’ll discuss how I worked with domain experts from several disciplines, using reinforcement learning to develop innovative solutions in rocket engine development. In doing so, I’ll demonstrate the application of ML techniques to the manufacturing industry and the role of the Machine Learning Product Manager.

Machine learning beyond the tech industry

Machine learning (ML) has had an incredible impact across industries with numerous applications such as personalized TV recommendations and dynamic price models in your rideshare app. Because it is such a core component to the success of companies in the tech industry, advances in ML research and applications are developing at an astonishing rate. For industries outside of tech, ML can be utilized to personalize a user’s experience, automate laborious tasks and optimize subjective decision making. However, staying current on ML advancements and knowing how to best leverage available technologies is difficult, even for those fully immersed in the tech industry, let alone those working to optimize in other fields, such as manufacturing.

Multi-disciplinary product managers, however, are well-equipped to tackle this very challenge. By having knowledge of the industry, processes, and business values, as well as an understanding of the breadth of machine learning approaches, a product manager can identify areas ripe for innovation. Individuals with experience in product management, software engineering, and data science provide a unique perspective allowing them to bridge the gap in applications of advanced technologies in industries where ML isn’t typically applied. By working collaboratively with domain experts across disciplines, product managers can reshape manufacturing processes, making widespread improvements in efficiency, safety, and reliability.

As a Data Product Management Fellow at Insight, I worked with ML Engineering Fellow Nina Lopatina, Simulation Engineer Saeed Jahangirian, and Propulsion Engineer Jordan Noone to improve efficiency in manufacturing rocket engines. The biggest cost categories for hardware designers and manufacturers are testing, verification, and calibration of their control systems. To address the extensive time and resources that verification and calibration require, we built a proof of concept that uses reinforcement learning to automate the tuning of a sub-component in a rocket engine. Our solution could save thousands of dollars and cut up to three months of manual testing on expensive testing equipment. The traditional procedures are also hazardous: small mistakes can cause significant harm to expensive hardware and, more importantly, to the technicians conducting the tests.

Testing, verification, and calibration are the most expensive and time-consuming tasks in hardware development

The process of developing control software in manufacturing is very tedious

At my last job, I was a software and control engineer developing a control loop for a large metal 3D printer. The control loop is another name for software that controls a piece of machinery. The software that controls cruise control in a car is a rather simple example. It monitors the speed of the vehicle and throttles the gas until the target speed is reached. In the case of a 3D metal printer, the control algorithm is a bit more complicated. The printer we were developing was a welder attached to a robotic arm. The robot traces the part, layer by layer, while the welder welds a new layer to previous layers and builds the part up.
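
To make the cruise control example concrete, here is a minimal sketch of such a loop in Python. It assumes a proportional-only controller and a made-up vehicle model; the function names, the gain, and the dynamics constants are all hypothetical and chosen only for illustration.

```python
def cruise_control_step(speed, target_speed, gain=0.5):
    """Return a throttle command in [0, 1] proportional to the speed error."""
    error = target_speed - speed
    # Clamp the command: the throttle cannot go below 0 or above 1.
    return max(0.0, min(1.0, gain * error))


def simulate(target_speed=30.0, steps=200, dt=0.1):
    """Run the control loop against a toy vehicle model and return the final speed."""
    speed = 0.0
    for _ in range(steps):
        throttle = cruise_control_step(speed, target_speed)
        # Hypothetical dynamics: throttle accelerates the car, drag slows it down.
        acceleration = 5.0 * throttle - 0.05 * speed
        speed += acceleration * dt
    return speed


final_speed = simulate()
```

With these assumed numbers the car settles slightly below the target speed, which is typical of a proportional-only controller. Closing that kind of gap is exactly the tuning work a control engineer does by hand.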

While tracing the part layer by layer, the heat source melts the wire and fuses a new layer to the previous layer — Image courtesy of Sciaky Inc.

The control software manipulates the heat input, traversal speed, speed of wire feed, and a few other knobs to make sure the part is built to specification. The specification includes both quality of the final part, such as the count and size of defects, and dimensions of the part, such as the width and height of each layer. If everything works well, the result will match the specification. But if the control algorithm doesn’t do its job correctly, the part deforms, breaks and tears, and has many cracks and pores.

Developing control algorithms consists of three phases:

  1. Control law design: In this phase, the goal is to understand the physics that govern the process. A simulation of the process is developed and used to create control software without running expensive and lengthy physical trials.
  2. Software development: In this phase, we use different modeling techniques to define relationships between inputs and outputs of the process. This step requires an engineer to break down the problem into smaller pieces and develop a model for each. These models are then used to create software that can control the process toward a desired outcome.
  3. Control calibration: Once the software successfully passes tests in simulation, engineers spend months tweaking the software on a physical system to account for the differences between simulation and physical reality.

The problem is that each phase can take from weeks to years and, for the most part, consists of trials to fine-tune the models of the process or the control software until the desired output is achieved. It is an optimization problem solved by repeated trials. A smarter way is to break the system down into subcomponents that are easier to model and then use direct and iterative methods to find the best way to control each subcomponent. This approach still requires the ingenuity of the engineer to break down the problem, and it ultimately needs trials to optimize the interactions between subcomponents. It is far from an automated process.

Looking for automated ways to solve this problem, we turned to reinforcement learning as an end-to-end solution for developing control loops for complex pieces of machinery.

Reinforcement Learning

Reinforcement learning (RL) is learning what to do to maximize a reward function. In a sense, RL is the automated process of learning a control algorithm for an agent in an environment.

  1. An agent operates in an environment and can manipulate the environment with its actuators which we call actions.
  2. The environment then responds to actions the agent takes, and that puts the agent and environment in a new state.
  3. A reward function is then defined on the state of the agent and the environment.
  4. The goal of RL is to learn the best policy for taking actions, so that the sum of future rewards is maximized.
Components of reinforcement learning
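
The four components above can be sketched as a small interaction loop. This is a minimal illustration, not a real RL library: the `LineWorld` environment, its `reset` and `step` methods, and the two policies are hypothetical names invented for this example.

```python
import random


class LineWorld:
    """Hypothetical environment: the agent walks along positions 0..4, goal at 4."""

    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # The action is -1 (left) or +1 (right); the position is clipped to [0, 4].
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        reward = 1.0 if done else 0.0  # reward is defined on the new state
        return self.position, reward, done


def run_episode(env, policy, max_steps=50):
    """Let the policy act in the environment and return the sum of rewards."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent picks an action
        state, reward, done = env.step(action)  # environment moves to a new state
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    return random.choice([-1, 1])


def always_right(state):
    return 1
```

Calling `run_episode(LineWorld(), always_right)` collects the full reward, while `random_policy` may or may not reach the goal within the step limit. Learning a policy means finding a function like `always_right` automatically.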

For example, a video game like Tetris can be considered an environment, and a player of the game an agent.

  1. Actions are the moves the player can make, like rotating the shapes.
  2. These actions change the state of the game, which can be defined as all the pixels on the monitor at each point in time.
  3. We can define the reward function as +1 for every row the player clears and -100 for losing the game.
  4. The goal of RL will be to come up with a function that maps states to actions so that the total reward is maximized.
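
The Tetris reward described above can be written down directly. This is a minimal sketch; the function name and the way an episode is represented as a list of `(rows_cleared, game_over)` pairs are assumptions made for illustration.

```python
def tetris_reward(rows_cleared, game_over):
    """+1 for every row cleared on this step, -100 if the game was lost."""
    reward = rows_cleared
    if game_over:
        reward -= 100
    return reward


def total_reward(episode):
    """Sum the per-step rewards over a list of (rows_cleared, game_over) pairs."""
    return sum(tetris_reward(rows, over) for rows, over in episode)


# Hypothetical episode: nothing cleared, then 1 row, then 4 rows, then a loss.
episode = [(0, False), (1, False), (4, False), (0, True)]
score = total_reward(episode)
```

An agent that maximizes this total has to clear rows while postponing the loss for as long as it can.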

Similarly, the development of a control loop for a metal 3D printer can be formulated as an RL problem.

  1. The actions taken are changing the intensity of heat input, traversal speed, the speed of the wire feed, etc.
  2. These actions change the geometry of the print and its quality, which we call the state of the print.
  3. The reward function can be defined such that it shows how close the print is to its specification at any moment.
  4. The goal is to come up with a function that tells the printer how to control its actuators, given the current state of the print, for the best resulting print.
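
One possible way to express "how close the print is to its specification" is a negative distance to the target dimensions. The sketch below is an assumption for illustration only: the dictionary keys, the units, and the choice of absolute deviation are hypothetical, and a real reward would likely also account for defects.

```python
def print_reward(measured, spec):
    """Negative sum of absolute deviations from the specification; 0 is a perfect match."""
    return -sum(abs(measured[key] - spec[key]) for key in spec)


# Hypothetical target and measured layer dimensions, in millimeters.
spec = {"layer_width_mm": 10.0, "layer_height_mm": 2.0}
measured = {"layer_width_mm": 10.5, "layer_height_mm": 1.75}

reward = print_reward(measured, spec)
```

The reward rises toward zero as the layer approaches its target width and height, so maximizing it pushes the print toward the specification.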

All control problems can be described as RL problems. The goal is to estimate a function called the policy, which maps states to actions so that the reward function is maximized. If the function's domain is finite and small, you can explore and store every input and output mapping. If the function has complicated dynamics and a large or infinite domain, that’s when machine learning comes into play.
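
For the finite case, the stored mapping can literally be a table. The sketch below uses tabular Q-learning, a standard textbook method picked here only for illustration (it is not the algorithm used in our project), on a hypothetical five-position line where the goal is the rightmost position. All names and hyperparameters are assumptions.

```python
import random

N_STATES = 5        # positions 0..4, goal at position 4
ACTIONS = (-1, 1)   # move left or move right


def step(state, action):
    """Hypothetical environment dynamics: reward 1.0 on reaching the goal."""
    next_state = max(0, min(N_STATES - 1, state + action))
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a value for every (state, action) pair with tabular Q-learning."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit the table, breaking ties randomly.
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Read the policy off the table: the best-valued action in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


policy = greedy_policy(train_q_table())
```

The resulting `policy` is a plain dictionary from state to action. A table like this stops being practical once the state is something like the full geometry of a print, which is where function approximators such as neural networks take over.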

Recently, RL researchers have been focusing on solving tough problems. Following in the footsteps of deep learning, which gained traction and attention by solving a task that classical algorithms couldn’t solve (image classification), RL researchers set out to solve very hard problems that are currently nearly impossible for classical algorithms to solve. DeepMind at Google focuses on beating humans in very complex games like Go, and OpenAI focuses on developing general artificial intelligence.

Alternatively, instead of focusing on hard problems, RL can be used to automate solving simpler problems that are currently manual and take a lot of time and effort to solve, such as developing a control loop for a 3D printer or other complex pieces of machinery. The impact is not as newsworthy as developing general intelligence, but it can save time and effort for many control engineers in many manufacturing organizations.

Tuning a rocket engine with RL

We created a simplified version of a fluid dynamics problem, encountered in rocket engines or gas turbines. Developing control algorithms for a system like this can take up to 3 months of design, testing, and verification. This is a nonlinear control problem that requires an engineer’s ingenuity and time to solve, and could show the feasibility of RL in freeing up engineers’ time.

A nonlinear control problem in fluid dynamics to show the feasibility of developing control algorithms with RL

It was a multidisciplinary project that required collaboration between propulsion engineers to define the problem, simulation engineers to build an accurate simulation of the system, and a machine learning engineer to train an agent. Ultimately, the result shows that RL algorithms can produce a control policy as good as one developed by control engineers, and save many months of trials.

Snapshots of simulation state during policy optimization for our fluid dynamics control problem.
An agent trained using PPO responds to a new goal. The agent autonomously changes input parameters to match the desired output in 4 steps, on average. The result is as good as a control algorithm developed and tuned by an engineer.


Rather than using machine learning techniques to solve a previously near-impossible task, we applied reinforcement learning (RL) to create an impactful solution for the manufacturing of rocket engines. Addressing simpler problems with machine learning is an approach that’s applicable in many industries, such as manufacturing, automotive, and aerospace. The difficulties faced by these industries are often not visible to RL researchers, and likewise, these industries are often not familiar with advancements in RL. This disconnect highlights the role that a Machine Learning Product Manager plays in connecting the dots between the capabilities of machine learning and the needs of their product, bridging the gap between disciplines.

The Insight Data Product Management Fellowship provides a collaborative learning environment to bridge this gap. Product Managers, Engineers, and Data Scientists work together to build multidisciplinary products by leveraging their expertise in each of these domains. Insight draws Fellows from diverse backgrounds in engineering, science, and product and, when they come together, we see novel applications of machine learning to successfully attack problems across industries.

A special thank you to Nina Lopatina, Saeed Jahangirian, and Jordan Noone, who were partners on this project.
