Kinetics-Informed Neural Networks
Neural network bases for forward and inverse problems in chemical kinetics, with self-adapting uncertainty through error propagation.
Welcome! This page has an interactive toy that lets you watch a computer learn science. You do not need to know any math. Just press buttons, move sliders, and see what happens. The explanation below tells you what is going on, and then you get to try it yourself.
What Is This?
Imagine you are a detective in a science lab. Someone mixed three chemicals together overnight, and by morning, the mixture looks completely different. Your job is to figure out what happened: how fast did the chemicals react? What was the recipe? The only clues you have are some measurements that were taken every few minutes, but they are not perfect because the measuring tool was a little shaky. Can you crack the case? That is exactly what this tool helps you do.
In science, when substances combine and change, we call it a reaction. Reactions follow rules: certain ingredients mix at certain speeds. Scientists write those rules as equations that describe how things change over time. The trouble is, we often do not know the exact rules. All we have are measurements, and those measurements are never perfect. There is always a little bit of error, like trying to read a ruler while your hand is shaking.
Scientists use the word concentration to describe how much of a substance is in a mixture. Think of it like the strength of a juice: a strong juice has a high concentration of flavor, and a weak juice has a low concentration. In the tool below, the chart shows how the concentration of each substance changes over time.
What Is a Neural Network?
A neural network is like a robot brain that learns by practice. You show it examples, it makes a guess, checks how wrong it was, and adjusts itself to do better next time. After thousands of tries, it gets really good at predicting patterns. In this tool, the neural network is learning to draw smooth curves through scattered, noisy data points.
But here is the clever part: we do not let the robot brain guess freely. We teach it the rules of chemistry at the same time. Imagine telling someone: "Draw a curve through these dots, but the curve must always go downhill here and uphill there, and the total amount of stuff must stay the same." Those rules keep the robot from making silly mistakes.
Forward vs. Inverse
There are two ways to use this tool. The forward problem is like following a recipe: you know exactly how fast the ingredients react, and you predict what happens over time. The inverse problem is the opposite: you taste the cake and try to figure out the recipe. You have measurements of what happened, and you work backwards to discover the hidden rules.
How Sure Are We?
When the robot brain makes a guess, we also want to know: how confident is it? If the measurements are very noisy, the robot should be less sure about its answer. The "Auto" mode in the tool figures this out automatically, adjusting how much to trust the data versus the rules.
The interactive tool below lets you explore this yourself. Press buttons, change settings, and watch the robot brain learn in real time.
What You Will See Below
Below is the playground. You will see a chart: the bottom axis is time (how many seconds have passed), and the side axis is how much of each substance is left. The colored dots are the measurements, and the colored lines are the computer's predictions. At the bottom of the screen, you will see buttons. Start by pressing the big Train button and watch what happens. The lines will slowly move toward the dots!
This page explores how neural networks can solve problems in chemical kinetics, figuring out how fast reactions happen from experimental measurements. The interactive tool below lets you run the whole process in your browser. The text walks through the key ideas first, then you experiment yourself.
Reactions, Rates, and the Problem of Unknowns
Every chemical reaction has a speed limit, and figuring out what it is from messy lab data is one of the oldest puzzles in chemistry. When substance A reacts with substance B to form C, the speed depends on a rate constant, a dial controlling how fast the transformation happens. A high rate constant means the reaction is over in seconds; a low one means it takes hours. But here is the problem: you cannot directly see the rate constant. All you can do is measure concentrations over time and try to work backwards. And those measurements always contain noise.
The mathematical description of how concentrations change over time is called a differential equation. Instead of telling you what the concentration is at each moment, it tells you how fast the concentration is changing, like the difference between knowing your position on a road and knowing your speedometer reading. If you know the rate constants and the starting concentrations, you can use the differential equation to trace out the full concentration curve over time.
If you know the rate constants, you can solve the differential equation to predict the concentration curves. This is the forward problem: known rules, predicted outcomes.
But what if you do not know the rate constants? What if all you have is a set of concentration measurements at different times, with experimental error mixed in? Recovering the hidden rate constants from noisy data is the inverse problem, and it is one of the central challenges in experimental science.
Neural Networks as Universal Curve Fitters
A neural network is a universal function approximator: given enough neurons, it can learn to draw any smooth curve through data. During training, the network adjusts its internal parameters to minimize a loss function, which is essentially a score that says "how wrong is the prediction?" The lower the loss, the better the fit.
The problem with unconstrained curve fitting is that the network can overfit. It might produce a curve that passes through every noisy data point perfectly but behaves wildly between them, even predicting negative concentrations, which makes no physical sense.
Adding Physics to the Mix
Physics-informed neural networks solve this by adding a second term to the loss function: the physics loss. This term penalizes the network whenever its predictions violate the governing differential equations. The network must simultaneously fit the data and obey the laws of chemistry. Conservation laws, like "the total amount of matter is constant," are built directly into the architecture.
This dual constraint is powerful. It prevents overfitting, allows the network to make sensible predictions even outside the range of the training data, and in inverse mode, lets us recover the unknown rate constants alongside the concentration curves.
Automatic Balancing with MLE
One practical challenge is deciding how much to weight the data versus the physics. Too much weight on data and the network overfits to noise. Too much weight on physics and it ignores the measurements. The maximum-likelihood estimation (MLE) formulation solves this automatically. Think of it like a self-adjusting scale: if the data is noisy, the scale tips toward trusting the physics more. If the data is clean, it trusts the measurements more. The scale also tells you how uncertain each estimated rate constant is — noisier data means wider error bars.
The interactive tool below lets you explore all of this yourself. Try switching between forward and inverse modes, adjusting the noise, and comparing manual versus automatic weighting.
This page presents an interactive implementation of physics-informed neural networks for chemical kinetics — embedding stoichiometric constraints and conservation laws directly into the network architecture for both forward simulation and inverse parameter estimation from noisy data. The mathematical framework is described below, followed by a fully client-side interactive tool.
The Setup: ODEs and Parameter Estimation
Consider a system of coupled ordinary differential equations describing the time evolution of species concentrations in a chemical reaction network:
$$\frac{d\mathbf{c}}{dt} = \mathbf{M} \cdot \mathbf{r}(\mathbf{c}, \mathbf{k})$$
Here $\mathbf{c}(t) \in \mathbb{R}^n$ is the concentration vector, $\mathbf{M}$ is the stoichiometric matrix encoding how reactions map to species changes, $\mathbf{r}$ is the rate vector (typically power-law or mass-action kinetics), and $\mathbf{k}$ contains the unknown rate constants. For example, the reversible reaction $A + B \rightleftharpoons C$ has the rate expression $r = k_1[A][B] - k_{-1}[C]$.
The forward problem is standard initial value problem (IVP) integration: given $\mathbf{k}$ and $\mathbf{c}(0)$, compute $\mathbf{c}(t)$. The inverse problem is nonlinear parameter estimation: given noisy observations $\tilde{\mathbf{c}}(t_i)$, recover $\mathbf{k}$.
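For concreteness, the forward IVP for $A + B \rightleftharpoons C$ can be integrated with a standard ODE solver. The sketch below uses NumPy/SciPy rather than the JAX implementation of the papers, and the rate constants and initial condition are illustrative assumptions; the KINN replaces this numerical integrator with a neural surrogate.

```python
# Forward problem for A + B <=> C: dc/dt = M @ r(c, k).
# k1, k_1, and c0 are illustrative assumptions, not values from the papers.
import numpy as np
from scipy.integrate import solve_ivp

M = np.array([[-1.0], [-1.0], [1.0]])    # stoichiometric matrix (species x reactions)
k1, k_1 = 2.0, 0.5                       # forward / reverse rate constants

def rhs(t, c):
    A, B, C = c
    r = np.array([k1 * A * B - k_1 * C])  # mass-action rate vector
    return M @ r                          # dc/dt = M @ r

c0 = np.array([1.0, 0.8, 0.0])
sol = solve_ivp(rhs, (0.0, 5.0), c0, t_eval=np.linspace(0, 5, 50), rtol=1e-8)

# The totals A + C and B + C are conserved along the whole trajectory.
print(sol.y[0, -1] + sol.y[2, -1])  # ≈ 1.0
```

Note that the conserved totals fall out of the structure of $\mathbf{M}$: any direction of change is a multiple of the reaction vector $(-1, -1, +1)$.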
Neural Network Basis Functions
Instead of numerically integrating the ODE at each optimization step, we represent the solution as a neural network $\hat{\mathbf{c}}(t; \boldsymbol{\theta})$ parameterized by weights $\boldsymbol{\theta}$. The key advantage: automatic differentiation gives us $d\hat{\mathbf{c}}/dt$ analytically, so we can evaluate the ODE residual without a numerical integrator. The network serves as a differentiable basis function for the solution manifold.
Why use a neural network instead of a standard numerical ODE solver inside an optimization loop? Because each evaluation of a traditional solver (e.g., Runge-Kutta) is expensive, especially for stiff systems, and must be repeated at every optimization step. The neural network surrogate provides the solution and its derivatives analytically via automatic differentiation, eliminating the need to repeatedly solve the IVP. The entire optimization — network weights, kinetic parameters, and covariance matrices — proceeds through a single, unified gradient computation.
The training objective combines two terms:
$$\mathcal{L} = \underbrace{\frac{1}{N_c}\sum_{j=1}^{N_c} \left\|\frac{d\hat{\mathbf{c}}}{dt}\bigg|_{t_j} - \mathbf{M}\cdot\mathbf{r}(\hat{\mathbf{c}}(t_j), \mathbf{k})\right\|^2}_{j_m \text{ (physics residual)}} + \alpha \cdot \underbrace{\frac{1}{N_d}\sum_{i=1}^{N_d} \left\|\hat{\mathbf{c}}(t_i) - \tilde{\mathbf{c}}_i\right\|^2}_{j_d \text{ (data fidelity)}}$$
The physics term $j_m$ is evaluated at collocation points spanning the entire time domain. The data term $j_d$ is evaluated at observation times.
Collocation points are a grid of time values — typically evenly spaced or sampled from the full time domain — at which we evaluate the ODE residual. Unlike the data points, which are fixed by the experiment, collocation points are chosen by us and can be placed anywhere, including in regions where no data exists. The more collocation points we use, the more tightly the physics is enforced, but at greater computational cost.
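The two-term objective can be sketched numerically. Here a fixed analytic trial curve stands in for the neural network, and central finite differences stand in for automatic differentiation; all numerical values (rate constants, noise level, $\alpha$) are illustrative assumptions.

```python
# Sketch of L = j_m + alpha * j_d for A + B <=> C.
# Finite differences replace autodiff; the trial curve replaces the network.
import numpy as np

M = np.array([[-1.0], [-1.0], [1.0]])
k1, k_1 = 2.0, 0.5
alpha = 1.0

def rate(c):
    A, B, C = c
    return np.array([k1 * A * B - k_1 * C])

def c_hat(t):
    # Smooth trial solution (placeholder for the neural network output).
    t = np.atleast_1d(t)
    C = 0.5 * (1.0 - np.exp(-2.0 * t))
    return np.stack([1.0 - C, 0.8 - C, C])

# Physics residual j_m on a collocation grid chosen by us.
t_col = np.linspace(0.0, 5.0, 101)
h = 1e-5
dcdt = (c_hat(t_col + h) - c_hat(t_col - h)) / (2.0 * h)
resid = dcdt - M @ rate(c_hat(t_col))
j_m = np.mean(np.sum(resid ** 2, axis=0))

# Data fidelity j_d at the (synthetic, noisy) observation times.
t_obs = np.linspace(0.0, 5.0, 10)
rng = np.random.default_rng(0)
c_obs = c_hat(t_obs) + 0.02 * rng.standard_normal((3, t_obs.size))
j_d = np.mean(np.sum((c_hat(t_obs) - c_obs) ** 2, axis=0))

loss = j_m + alpha * j_d
```

Training would adjust the parameters of `c_hat` to drive both terms down simultaneously; here the trial curve is fixed, so the snippet only evaluates the objective once.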
Stoichiometric constraints and conservation laws ($\mathbf{M}^{\perp} \cdot \mathbf{c} = \text{const}$, where the rows of $\mathbf{M}^{\perp}$ span the left null space of $\mathbf{M}$, i.e., the null space of $\mathbf{M}^T$) are enforced structurally through the network architecture, guaranteeing that mass balance holds for any $\boldsymbol{\theta}$.
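The conserved directions are those orthogonal to every column of $\mathbf{M}$, and they can be extracted numerically with an SVD. A minimal NumPy sketch for $A + B \rightleftharpoons C$:

```python
# Conserved quantities from the stoichiometric matrix: rows of M_perp span
# the left null space of M, so M_perp @ c is constant along dc/dt = M r.
import numpy as np

M = np.array([[-1.0], [-1.0], [1.0]])    # 3 species, 1 reaction

U, s, Vt = np.linalg.svd(M)
rank = int(np.sum(s > 1e-10))
M_perp = U[:, rank:].T                   # left null space: M_perp @ M = 0

# Any change M @ r is invisible to M_perp, so the invariants are exact:
c0 = np.array([1.0, 0.8, 0.0])
xi = 0.3                                 # reaction progress (illustrative)
c1 = c0 + M @ np.array([xi])
print(np.abs(M_perp @ c1 - M_perp @ c0).max())  # ≈ 0
```

For this reaction the invariant subspace is two-dimensional, corresponding to the conserved totals $[A]+[C]$ and $[B]+[C]$ (in a rotated basis).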
The Inverse Problem and Parameter Recovery
In inverse mode, $\mathbf{k}$ joins $\boldsymbol{\theta}$ as trainable parameters. The rate constants are parameterized as $k_i = \exp(p_i)$ to enforce positivity and to place them on a log-scale commensurate with the network weights. The optimizer simultaneously learns the concentration profiles and recovers the kinetic parameters from data.
The hyperparameter $\alpha$ controls the data-physics trade-off and is notoriously difficult to set. Too small, and the physics dominates at the expense of data fidelity. Too large, and the network overfits to noise.
MLE Formulation: Automatic Weighting and Uncertainty
The MLE reformulation replaces the scalar $\alpha$ with learned covariance matrices. Assuming Gaussian residuals, the negative log-likelihood becomes:
$$\mathcal{L}_{MLE} = \sum_{i} \left[\boldsymbol{\varepsilon}_{x,i}^T \boldsymbol{\Sigma}_x^{-1} \boldsymbol{\varepsilon}_{x,i} + \boldsymbol{\varepsilon}_{\dot{x},i}^T \boldsymbol{\Sigma}_{\dot{x},i}^{-1} \boldsymbol{\varepsilon}_{\dot{x},i}\right] + N\ln|\boldsymbol{\Sigma}_x| + \sum_i \ln|\boldsymbol{\Sigma}_{\dot{x},i}|$$
The physics-residual covariance $\boldsymbol{\Sigma}_{\dot{x},i}$ is not independent. It propagates from the measurement uncertainty through the model Jacobians via error propagation:
$$\boldsymbol{\Sigma}_{\dot{x},i} = \mathbf{J}_c^{(i)} \boldsymbol{\Sigma}_x {\mathbf{J}_c^{(i)}}^T + \mathbf{J}_k^{(i)} \boldsymbol{\Sigma}_k {\mathbf{J}_k^{(i)}}^T$$
where $\mathbf{J}_c^{(i)} = \partial \mathbf{f}/\partial \mathbf{c}|_{t_i}$ and $\mathbf{J}_k^{(i)} = \partial \mathbf{f}/\partial \mathbf{k}|_{t_i}$ are the Jacobians of the rate law with respect to concentrations and parameters, respectively. This couples the data and physics covariances through the model structure, and the optimal weighting emerges from the likelihood maximization itself.
In plain terms: a small uncertainty in measured concentrations propagates through the nonlinear rate expression to produce an uncertainty in the expected rate of change. The Jacobian with respect to concentrations determines how much a concentration error amplifies into a rate error — for a second-order reaction like $r = k[\text{A}][\text{B}]$, the sensitivity to $[\text{A}]$ scales with $[\text{B}]$, so the propagated uncertainty depends on the local state. This coupling means the optimal data-physics weighting is not a fixed number but varies across time and across species, emerging naturally from the structure of the kinetic model.
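The propagation formula is straightforward to evaluate at a single state. The sketch below works through it for $\mathbf{f}(\mathbf{c}) = \mathbf{M}\,r(\mathbf{c})$ with $r = k_1[A][B] - k_{-1}[C]$; the covariance values and local state are illustrative assumptions.

```python
# Error propagation Sigma_xdot = Jc Sigma_x Jc^T + Jk Sigma_k Jk^T
# for f(c) = M r(c), r = k1*A*B - k_1*C. All numbers are illustrative.
import numpy as np

M = np.array([[-1.0], [-1.0], [1.0]])
k1, k_1 = 2.0, 0.5
A, B, C = 0.7, 0.5, 0.3                      # local state at some time t_i

# Jacobians of the rate law, lifted through M to Jacobians of f.
dr_dc = np.array([[k1 * B, k1 * A, -k_1]])   # 1 x 3
dr_dk = np.array([[A * B, -C]])              # 1 x 2, with k = (k1, k_1)
Jc = M @ dr_dc                               # 3 x 3
Jk = M @ dr_dk                               # 3 x 2

Sigma_x = 0.01 * np.eye(3)                   # measurement covariance (assumed)
Sigma_k = np.diag([0.05, 0.02])              # parameter covariance (assumed)

Sigma_xdot = Jc @ Sigma_x @ Jc.T + Jk @ Sigma_k @ Jk.T
```

Because `dr_dc` depends on the local concentrations, `Sigma_xdot` changes from one time point to the next, which is exactly why the effective data-physics weighting varies across the trajectory.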
Once training converges, the Fisher information matrix $\mathcal{I}(\mathbf{k}) = -\mathbb{E}[\nabla^2 \ln L]$ quantifies the curvature of the likelihood surface at the optimum. Its inverse provides the Cramér-Rao lower bound on parameter variance, yielding confidence intervals on each recovered rate constant.
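For a one-parameter caricature, the Cramér-Rao bound is a one-liner. The example below assumes first-order decay observed with known Gaussian noise; all numerical values are illustrative and the model is a stand-in for the full kinetic likelihood.

```python
# Cramer-Rao bound for a single rate constant k in y_i = c0*exp(-k t_i) + noise.
# All numbers are illustrative assumptions.
import numpy as np

c0, k, sigma = 1.0, 1.5, 0.05
t = np.linspace(0.0, 3.0, 20)

dmu_dk = -c0 * t * np.exp(-k * t)            # sensitivity of the model mean to k
fisher = np.sum(dmu_dk ** 2) / sigma ** 2    # scalar Fisher information
ci_half_width = 1.96 / np.sqrt(fisher)       # 95% asymptotic CI half-width

# Doubling sigma doubles the interval; adding observations shrinks it.
```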
The interactive tool below implements this full pipeline. You can compare the fixed-$\alpha$ formulation against the MLE approach, observe covariance evolution during training, and see how parameter uncertainty depends on noise level and data density.
How do you figure out the rules governing a system when all you have are noisy measurements? This is the inverse problem, and it arises everywhere: a chemist measuring reaction rates to infer activation energies, a biologist fitting population dynamics to recover growth constants, an engineer calibrating a turbulence model from wind-tunnel data. In each case, the underlying physics is described by differential equations whose parameters are unknown, and the goal is to recover those parameters from observations alone.
Neural networks can approximate any continuous function, which makes them powerful interpolators. But without constraints, they overfit to noise and produce predictions that violate basic physical laws: conservation of mass, non-negative concentrations, and thermodynamic consistency. Physics-informed neural networks (PINNs) [3] address this by encoding the governing differential equations directly into the training objective. The network must simultaneously fit the data and satisfy the physics. This dual constraint acts as a powerful regularizer: it prevents overfitting, enables extrapolation beyond the training window, and, crucially, allows the unknown parameters of the differential equations to be recovered alongside the solution itself.
We developed Kinetics-Informed Neural Networks (KINNs) [1] as a specialized PINN framework for chemical kinetics, embedding stoichiometric constraints, conservation laws, and mass-action rate expressions into the neural network architecture. KINNs operate in two complementary modes: the forward problem, where known rate constants yield smooth concentration profiles satisfying the ODE system; and the inverse problem, where the rate constants themselves become trainable parameters recovered from noisy data. We then reformulated the inverse problem using maximum-likelihood estimation (MLE) [2], eliminating the need for manual hyperparameter tuning and enabling automatic uncertainty quantification on every recovered parameter.
The interactive tool below lets you experience all of this in your browser. Configure the network architecture, toggle between forward and inverse modes, add noise, switch between the original and MLE-based formulations, and watch training unfold in real time. Every computation runs client-side in JavaScript, implementing the same mathematical framework described in the original publications [1], [2].
What Just Happened?
When you pressed Train, the robot brain started learning. The colored lines on the chart show its predictions. At first they were way off, but with each step the robot adjusted itself and the lines got closer to the dots. The dots are the measurements, and the lines are the robot's best guess at the pattern.
Following the Recipe (Forward)
In Forward mode, the robot already knows the speed of the reaction. Its job is just to draw smooth curves that follow the rules of chemistry. Think of it like coloring inside the lines: the rules tell it where the lines are, and it fills in the picture.
Figuring Out the Recipe (Inverse)
In Inverse mode, the robot does not know the speeds. It has to figure them out! Watch the Parameter Convergence chart: the solid lines are the robot's guesses, and the dashed lines are the real answers. As training goes on, the guesses get closer to the truth.
Why Rules Matter
Without the chemistry rules, the robot might draw curves that make no sense, like having a negative amount of something, which is impossible. The rules act like guardrails, keeping the robot on track even when the data is messy.
Beyond This Tool
Scientists use tools like this to study all sorts of things: how medicines move through your body, how pollution spreads in the air, and how batteries charge and discharge. Anytime something changes over time and follows rules, this kind of approach can help figure out those rules from real-world data.
The Forward Problem: Predicting From Known Rules
In forward mode, the rate constants are known. The neural network's job is to learn smooth concentration curves that satisfy both the data points and the underlying differential equation. The physics loss measures how well the network's predicted rate of change matches what the differential equation says it should be. The data loss measures how close the predictions are to the actual measurements.
Together, these two terms guide the network toward curves that are both data-consistent and physically meaningful. Even with few data points, the physics prevents the network from producing nonsensical curves.
The Inverse Problem: Discovering Hidden Parameters
Switch to Inverse mode and watch what happens. Now the rate constants are unknown, and the network must figure them out from the data alone. The optimizer adjusts both the network weights (which control the shape of the curves) and the rate constants simultaneously.
Watch the Parameter Convergence chart. The estimated rate constants start at random values and gradually converge toward the true values (dashed lines). With more data and less noise, convergence is faster and more accurate.
Automatic Weighting: The MLE Approach
Choosing how much to trust the data versus the physics is tricky. The Auto alpha mode uses a statistical method called maximum-likelihood estimation to figure out the optimal balance automatically. It also estimates how uncertain each rate constant is: noisier data leads to wider confidence intervals, reflecting less certainty about the recovered parameters.
Why Physics Constraints Enable Extrapolation
Reduce the training region using the sliders so the network only sees early-time data. A pure curve fit would go haywire outside that region, but with physics constraints, the network continues to produce sensible predictions because the differential equation governs the behavior everywhere, not just where data exists.
Forward Mode: The Neural ODE Surrogate
In forward mode, the rate constants $\mathbf{k}$ are fixed. The network $\hat{\mathbf{c}}(t;\boldsymbol{\theta})$ is trained to minimize the combined objective $\mathcal{L} = j_m + \alpha \cdot j_d$. The physics residual $j_m$ is evaluated at collocation points distributed across the time domain, while $j_d$ is evaluated at observation times. Boundary conditions are enforced structurally: $\hat{\mathbf{c}}(0;\boldsymbol{\theta}) = \mathbf{c}_0$ for all $\boldsymbol{\theta}$ through an affine transformation of the network output.
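One common way to build such a structural boundary-condition operator is to multiply the raw network output by a function that vanishes at $t=0$ and add the initial condition. This is a sketch under that assumption; the exact transform used in the papers may differ.

```python
# Structural enforcement of c_hat(0) = c0: scale the raw output by g(t)
# with g(0) = 0. The "network" here is a placeholder function.
import numpy as np

c0 = np.array([1.0, 0.8, 0.0])

def raw_net(t):
    # Placeholder for the neural network output (3 species).
    t = np.atleast_1d(t)
    return np.stack([np.sin(t), np.cos(t) - 1.0, 0.5 * t])

def c_hat(t):
    t = np.atleast_1d(t)
    g = 1.0 - np.exp(-t)          # g(0) = 0, so c_hat(0) = c0 for any weights
    return c0[:, None] + g * raw_net(t)

print(c_hat(0.0)[:, 0])           # exactly c0, regardless of the weights
```

Because the constraint holds by construction, the optimizer never spends gradient steps fighting the initial condition.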
The stoichiometric null-space constraint $\mathbf{M}^{\perp}\hat{\mathbf{c}}(t) = \mathbf{M}^{\perp}\mathbf{c}_0$ guarantees conservation of mass at every point. This is achieved through a singular value decomposition (SVD) of $\mathbf{M}$, projecting the network output onto the range space and reconstructing the full concentration vector from the conserved quantities fixed by the initial conditions.
Inverse Mode: Simultaneous Recovery
In inverse mode, $\mathbf{k}$ becomes a trainable parameter vector. The exponential parameterization $k_i = \exp(p_i)$ ensures positivity and places the parameters on a log-scale. The gradient $\partial\mathcal{L}/\partial p_i$ flows through both the rate expressions and the physics residual, enabling simultaneous optimization of network weights and kinetic parameters.
Observe the Parameter Convergence chart. The convergence trajectory reveals the identifiability landscape: well-identified parameters converge quickly and monotonically, while poorly identified ones oscillate or converge slowly. This behavior is directly related to the eigenspectrum of the Fisher information matrix.
MLE and Uncertainty Quantification
The MLE formulation replaces the fixed $\alpha$ with covariance matrices that adapt during training. The key relationship is the error propagation formula linking $\boldsymbol{\Sigma}_x$ to $\boldsymbol{\Sigma}_{\dot{x}}$ through the Jacobians of the rate law. This means the relative weighting between data and physics is not arbitrary but emerges from the structure of the model and the quality of the data.
After convergence, the Hessian of the negative log-likelihood evaluated at the MLE provides the observed Fisher information. Its inverse gives asymptotic confidence intervals: $k_i \pm z_{\alpha/2}\sqrt{[\mathcal{I}^{-1}]_{ii}}$. These intervals widen with increasing noise and narrow with increasing data density, which you can verify by adjusting the noise and data point sliders.
Extrapolation and the Role of the Physics Prior
The physics loss evaluated at collocation points beyond the training region acts as an inductive bias: the network's predictions must satisfy the ODE everywhere, including where no data exists. This is equivalent to imposing a prior that the solution lies on the ODE manifold, and it is what enables extrapolation that a standard neural network regression cannot achieve.
The framework generalizes to any system of coupled nonlinear ODEs with linear constraint structure. The stoichiometric decomposition, conservation enforcement, and MLE uncertainty propagation are independent of the specific reaction mechanism. The same approach applies to compartmental pharmacokinetic models, metabolic network dynamics, and any system where the state evolves on a manifold defined by conservation laws with unknown rate parameters.
The Forward Problem
The forward problem consists of solving the kinetic ODE system given known rate constants ($k_1$, $k_{-1}$, $\ldots$) and initial conditions. The neural network $\hat{\mathbf{x}}(t; \boldsymbol{\theta})$ serves as a surrogate approximator for the concentration profiles over time. Under the mean-field approximation, the governing ODE relates species concentrations to reaction rates through the stoichiometric matrix:
$$\frac{d\mathbf{c}}{dt} = \mathbf{M} \cdot \mathbf{r}(\mathbf{c}, \mathbf{k})$$
For the simple reversible reaction $A + B \rightleftharpoons C$, the rate expression follows from power-law kinetics:
$$A + B \rightleftharpoons C, \quad r = k_1[A][B] - k_{-1}[C]$$
The training objective is formulated as the minimization of two complementary terms. The data loss ($j_d$) measures the mean squared error between the network output and observed concentrations at sampled time points. The physics loss ($j_m$) penalizes the residual of the ODE system at collocation points throughout the time domain, enforcing the constraint $\tfrac{d\hat{\mathbf{x}}}{dt} = \mathbf{f}(\hat{\mathbf{x}}, \mathbf{k})$, where $\mathbf{f}$ encodes the mass-action kinetics.
The physics loss acts as a regularizer: even with very few data points, the network cannot learn concentration profiles that violate conservation laws or kinetic constraints. This is fundamentally different from a purely data-driven fit, which may overfit to noise or produce physically impossible trajectories. Structural boundary condition operators further guarantee that the initial conditions are satisfied for any set of network weights, as described in the original KINNs formulation [1].
The Inverse Problem
Toggle the mode to Inverse in the tool above. In this configuration, the rate constants are unknown and become additional trainable parameters alongside the network weights. The optimizer must simultaneously learn the concentration profiles and recover the rate constants from data alone. Formally, we combine the residual between the surrogate approximator and observed data ($j_d$) with the physics regularization term ($j_m$):
$$\mathcal{L} = \frac{1}{N_c}\sum_{j=1}^{N_c} \|\dot{\mathbf{x}}_{NN}(t_j) - \mathbf{f}(\mathbf{x}_{NN}(t_j), \mathbf{k})\|^2 + \alpha \cdot \frac{1}{N_d}\sum_{i=1}^{N_d} \|\mathbf{x}_{NN}(t_i) - \tilde{\mathbf{x}}_i\|^2$$
This formulation addresses the inverse problem at the heart of experimental kinetics: given time-series measurements of species concentrations, determine the underlying rate law parameters. Traditional approaches require solving a stiff ODE system at every optimization step. KINNs circumvent this by having the neural network itself serve as the ODE solution, with automatic differentiation providing the time derivatives directly. The kinetic parameters are trained through an exponential mapping ($k = \exp(p)$) to ensure non-negativity and to place them on a similar scale as the network weights, enabling simultaneous optimization [1].
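A deliberately simplified illustration of why joint recovery is possible: once the trajectory's derivative is available (here from finite differences on a simulated trajectory, standing in for autodiff on the network surrogate), mass-action kinetics is linear in the rate constants, and ordinary least squares recovers them. The full method instead learns the curves and $\mathbf{k}$ jointly from noisy data; all numerical values below are illustrative assumptions.

```python
# Simplified inverse sketch for A + B <=> C: with dC/dt known,
# dC/dt = k1*(A*B) + k_1*(-C) is linear in (k1, k_1), so lstsq recovers them.
import numpy as np
from scipy.integrate import solve_ivp

M = np.array([[-1.0], [-1.0], [1.0]])
k_true = np.array([2.0, 0.5])

def rhs(t, c):
    A, B, C = c
    return M @ np.array([k_true[0] * A * B - k_true[1] * C])

t = np.linspace(0.0, 5.0, 200)
sol = solve_ivp(rhs, (0, 5), [1.0, 0.8, 0.0], t_eval=t, rtol=1e-9)
A, B, C = sol.y

dCdt = np.gradient(C, t)                 # surrogate for the autodiff derivative
X = np.stack([A * B, -C], axis=1)        # design matrix, columns for (k1, k_1)
k_est, *_ = np.linalg.lstsq(X, dCdt, rcond=None)

print(k_est)                             # close to (2.0, 0.5)
```

With noisy data the finite-difference step amplifies error badly, which is one motivation for the KINN construction: the smooth network surrogate supplies well-behaved derivatives even when the measurements themselves are noisy.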
Observe the Parameter Convergence chart as training progresses. The estimated rate constants converge toward the true values (shown as dashed lines). The speed and accuracy of convergence depend on the noise level, the amount of available data, and the relative weighting between data fidelity and physics constraints, as controlled by the parameter $\alpha$.
Self-Adapting Uncertainty Through Error Propagation
In the original KINNs formulation, the total loss is a weighted sum: $\mathcal{L} = j_m + \alpha \cdot j_d$. The regularization hyperparameter $\alpha$ conveys the ratio between the variances of the data residuals (observed states) and the physics residuals (state derivatives) [1]. Selecting an appropriate value of $\alpha$ is notoriously difficult; if $\alpha$ is too small the physics constraints dominate and the data is ignored, while if $\alpha$ is too large the network overfits to noisy observations at the expense of physical consistency.
To address this hyperparameterization issue, we reformulated the inverse problem in terms of maximum-likelihood estimators (MLE), giving rise to robust-KINNs (rKINNs) [2]. Instead of a fixed scalar $\alpha$, rKINNs learn full covariance matrices $\boldsymbol{\Sigma}_x$ (data) and $\boldsymbol{\Sigma}_f$ (physics) that are continuously updated based on the incumbent mismatch between observations and network interpolations. The loss function becomes the negative log-likelihood:
$$\mathcal{L}_{MLE} = \frac{1}{d}\sum_{i=1}^{d} \left[\boldsymbol{\varepsilon}_{\dot{x},i}^T \boldsymbol{\Omega}_{\dot{x},i} \boldsymbol{\varepsilon}_{\dot{x},i} + \boldsymbol{\varepsilon}_{x,i}^T \boldsymbol{\Omega}_x \boldsymbol{\varepsilon}_{x,i} - \ln|\boldsymbol{\Omega}_{\dot{x},i}| - \ln|\boldsymbol{\Omega}_x|\right]$$

Here $\boldsymbol{\Omega}_x = \boldsymbol{\Sigma}_x^{-1}$ and $\boldsymbol{\Omega}_{\dot{x},i} = \boldsymbol{\Sigma}_{\dot{x},i}^{-1}$ are precision matrices; the log-determinant terms of the negative log-likelihood prevent the trivial minimum in which the precisions collapse to zero.
A key insight of the rKINNs formulation is that the physics-residual covariance is not independent of the measurement uncertainty. It propagates from the interpolation residuals through the Jacobians of the kinetic model at each time point:
$$\boldsymbol{\Sigma}_{\dot{x},i} = \mathbf{J}_x^{(i)} \boldsymbol{\Sigma}_x \mathbf{J}_x^{(i)T} + \mathbf{J}_p^{(i)} \boldsymbol{\Sigma}_p \mathbf{J}_p^{(i)T}$$
The relative weighting between data and physics thus emerges naturally from the data itself, and the hyperparameter $\alpha$ is no longer needed. Furthermore, the MLE construction provides a statistical foundation for the interplay between data interpolation and physical-model agreement, enabling parameter uncertainty estimation through second-order analysis (Hessian-based uncertainty quantification) [2].
To observe this in practice, switch the algorithm to Auto α in the tool and note the α auto badge. The alpha slider is disabled because the covariance matrices handle the weighting automatically. The Covariance Evolution chart shows how $\text{tr}(\boldsymbol{\Sigma}_x)$ and $\text{tr}(\boldsymbol{\Sigma}_f)$ evolve during training.
Extrapolation Through Physical Constraints
One of the most compelling advantages of embedding physical constraints into the neural network is the ability to extrapolate beyond the training domain. A purely data-driven model will diverge rapidly outside the distribution of observed data, whereas a KINNs model is constrained by the ODE system at every collocation point, including regions where no training data is available.
To observe this, reduce the training region to $t \in [0, 2.5]$ using the Training Region sliders in the configuration panel. Then examine the predictions in the $t \in [2.5, 5.0]$ range. With physics constraints active, the network continues to produce physically consistent extrapolations that respect the governing differential equations. Without them (set $\alpha = 0$), the predictions quickly become nonsensical.
The mechanism behind this extrapolation is worth emphasizing. The data loss $j_d$ is evaluated only at observation times within the training region, but the physics loss $j_m$ is evaluated at collocation points spanning the entire time domain, including regions where no measurements exist. The neural network basis must satisfy $\tfrac{d\hat{\mathbf{x}}}{dt} = \mathbf{M} \cdot \mathbf{r}(\hat{\mathbf{x}}, \mathbf{k})$ everywhere, not just where data is available. This is what allows the network to produce physically consistent predictions beyond the training window: the ODE constraint propagates information from the data-rich region into the data-free region through the governing equations.
This capability is particularly relevant for experimental kinetics, where measurements are expensive and time-limited. KINNs can leverage sparse early-time data to predict late-time behavior, which is precisely the scenario where traditional regression methods struggle most. More broadly, it illustrates the benefit of physics-informed approaches: the inductive bias provided by the kinetic model enables generalization that would be unattainable with flexible but unconstrained function approximators.
Generalization to Coupled Nonlinear Systems
While we demonstrate the framework on chemical reaction networks, the underlying mathematical structure is far more general. The essential ingredients are: a system of coupled nonlinear ODEs, a fixed linear map (the stoichiometric matrix $\mathbf{M}$) relating state derivatives to a nonlinear rate vector, partial observability of the state, and unknown parameters to be recovered from noisy measurements.
This structure appears across many domains. In systems biology, gene regulatory networks and metabolic pathways obey analogous stoichiometric constraints with unknown kinetic parameters. In pharmacokinetics, compartmental models describe drug absorption and elimination through coupled ODEs with rate constants inferred from blood concentration data. In atmospheric chemistry, hundreds of coupled radical reactions govern ozone depletion and pollutant formation, with rate constants that must be extracted from spectroscopic measurements. In reactor engineering, microkinetic models with dozens of elementary steps and surface intermediates require simultaneous estimation of activation energies from transient experiments.
The key generalization is this: any high-dimensional process governed by coupled nonlinear ODEs, where the state evolves on a manifold defined by a linear constraint structure (conservation laws, stoichiometric balance, site balance), and where parameters must be recovered from partial, noisy observations, falls within the scope of this framework. The neural network provides a universal basis for the solution manifold. The stoichiometric SVD decomposes the problem into range and null space components, separating the dynamics from the conserved quantities. The MLE formulation propagates measurement uncertainty through the model Jacobians, automatically weighting the data-physics trade-off at every point in time and in every direction of the stoichiometric subspace.
The interactive tool above implements this complete pipeline for simple reaction systems. The same mathematical machinery, implemented in JAX with automatic differentiation and XLA compilation [1], [2], [4], scales to the high-dimensional microkinetic models encountered in heterogeneous catalysis, where the parameter space spans hundreds of rate constants and the state includes both gas-phase and surface species.
References
- [1] Gusmão, G. S., Medford, A. J. (2021). Kinetics-Informed Neural Networks. Catalysis Today, 375, 474-483. doi:10.1016/j.cattod.2020.06.067
- [2] Gusmão, G. S., Medford, A. J. (2024). Maximum-likelihood Estimators in Physics-Informed Neural Networks for High-dimensional Inverse Problems. Computers & Chemical Engineering, 182, 108547. doi:10.1016/j.compchemeng.2023.108547
- [3] Raissi, M., Perdikaris, P., Karniadakis, G. E. (2019). Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. Journal of Computational Physics, 378, 686-707. doi:10.1016/j.jcp.2018.10.045
- [4] Gusmão, G. S. (2023). Neural-Network Representations of Chemical Kinetics. Ph.D. Dissertation, Georgia Institute of Technology. Georgia Tech Repository
- Source code: github.com/gusmaogabriels/kinn (branch: rkinns)