## Abstract

A data-driven methodology is introduced for the computationally efficient integration of systems of stiff rate equations in chemical kinetics using neural ordinary differential equations (NODE). A systematic algorithm is developed for training data generation and sampling. Subsequently, a novel transformation technique for the sampled training data is designed to regularize the neural network parameters, leading to a stable training process. Finally, the NODE network is iteratively trained to learn an accurate neural network representation of the chemical kinetics source terms by minimizing the mean absolute error between the true and predicted solutions. The computational efficiency and accuracy of the NODE network are evaluated by simulating the evolution of the thermochemical state of a constant pressure homogeneous hydrogen-air reactor. The combustion of hydrogen in air is described by a finite-rate mechanism including 9 chemical species and 21 reaction steps. The NODE network shows excellent multi-step prediction accuracy for a wide range of initial temperatures and equivalence ratios, spanning the composition space of real flames. The NODE network also exhibits a significant reduction in the numerical stiffness of the system, enabling the utilization of explicit solvers for integration. The present simulation results using NODE demonstrate up to 70% speed-up in computation time compared to direct integration of the chemical mechanism, with at most 3.16% relative error in ignition delay time.

## 1 Introduction

Reliable simulation of turbulent reacting flows in combustion systems is integral to the efficient design of engineering devices such as internal combustion engines, gas turbines, industrial furnaces, and propulsion systems. These simulations require solving fully coupled transport equations for chemical species, which is often computationally prohibitive due to the numerical stiffness introduced by chemical kinetics. Operator splitting is an approach that can facilitate the solution procedure by separating the mixing process (due to convection and diffusion) from the chemical reaction effects and solving the corresponding equations in two separate steps. The latter step presents a system of stiff ordinary differential equations (ODEs) [1–3], whose integration is a primary bottleneck in these simulations because of the high nonlinearity and stiffness resulting from the chemical reaction source term. This issue is further exacerbated by the large number of chemical species (typically of order $10$–$10^{4}$) involved in practical combustion applications, which increases the dimensionality of the species composition space [4]. Owing to these challenges, direct integration of chemical kinetics remains infeasible for many practical applications. This can be alleviated using efficient data-driven methods, which offer a new way to realize a practical implementation of combustion chemistry.

On the traditional modeling front, mechanism reduction is one approach to mitigating the computational cost associated with chemistry [5]. Mechanism reduction involves abating the cost of chemistry integration by reducing the size and stiffness of the detailed kinetics mechanism (DKM). Here, the size of the chemical mechanism is decreased through a combination of skeletal and time-scale based reduction methods [5]. Skeletal reduction entails elimination of unimportant species and reactions from DKM through methods such as sensitivity analysis, directed relation graph [6], principal component analysis [7], and computational singular perturbation [8]. Time-scale based reduction consists of reduction in independent variables through quasi-steady-state approximations (QSSA) [9], partial-equilibrium approximations [10], rate-controlled constrained-equilibrium [11–16], and intrinsic low dimensional manifold [17]. Although these approaches aid in mitigating the computational demand, the resulting reduced mechanisms often retain some degree of stiffness, requiring the use of stiff ODE solvers to obtain the solution. Subsequently, in some research studies, skeletal/reduced mechanisms are accompanied by tabulation methods to further alleviate the computational cost in large-scale simulations. Look-up table [18] is one common tabulation strategy where integrals of reaction rates corresponding to the possible composition space are computed in advance and stored in a table. During the simulation, species rates are accessed through straightforward interpolation on table records. While this approach can reduce CPU time, it is only suitable for cases with a small number of species as the memory requirement to store the records grows exponentially with the dimensionality of the composition space. The pivotal approach towards circumventing this problem is in-situ adaptive tabulation (ISAT), proposed by Pope [4]. 
ISAT employs on-the-fly tabulation of composition space visited during the simulation and retrieval of the states via a binary search tree. Despite the reduction of CPU time and memory requirement asymptotically, ISAT necessitates performing direct integration of DKM at the initial stages of the simulation, which can make the tabulation prohibitively demanding for practical applications. Furthermore, as more regions of the composition space are accessed, the storage requirement and average retrieval times to traverse the search tree can increase considerably.

In recent years, data-driven approaches to scientific computing have gained popularity as a method of overcoming computational bottlenecks in traditional modeling approaches. Amongst them, artificial neural networks (ANNs) are particularly attractive because of their capability to approximate highly nonlinear functions. ANNs are apt for a wide range of tasks, including nonlinear regression, clustering, pattern recognition, and time series prediction. Numerous researchers have devoted their attention to ANN-based regression approaches for chemistry integration to tackle the high computational cost of direct integration of DKM and the elevated storage requirements of tabulation methods [19–28]. Christo et al. [24] used ANN to incorporate chemistry in turbulent simulations using a reduced three-step H_{2}/CO_{2}/O_{2} mechanism. The methodology was further refined by Blasco et al. [19,20] to model more complex methane chemistry with 13 species. A slightly different approach was taken by Chen et al. [21], which involves fitting the results obtained from ISAT with ANN to reduce the storage requirement associated with ISAT. These studies reported good agreement with direct integration calculations in addition to considerable savings in computational time and memory requirements. They, however, identified a major difficulty in generating representative datasets, which challenges the generalization of ANNs to actual conditions encountered in real flames. Thereafter, Chatzopoulos and Rigopoulos [26] proposed a methodology to generate training data with an abstract problem (such as laminar flamelets) spanning the expected composition space and to apply the trained ANNs to simulate real turbulent flames. This concept was further applied to the simulation of the more complex Sydney flame L for methane combustion, which features significant levels of local extinction and re-ignition [25].
Notably, all prior research predominantly focused on employing feed-forward neural networks as single-step chemistry integrators, with only slight differences in the network architectures. The model inputs comprise the species concentrations and enthalpy at a given time, and the outputs are the species concentrations/chemical source terms at the next time-step. Such ANN models, trained by minimizing single-step prediction errors, have limited applicability in multi-step predictions. By recursively feeding the outputs from the last time-step back to the ANN as new inputs, the model can accumulate large errors over longer-time predictions. This challenge was addressed in Ref. [29], where the neural network architecture of the residual network model (ResNet) is employed to minimize the multi-step prediction error.

The primary objective of the present study is to address the aforementioned challenges by introducing a novel data-driven framework that can further reduce the computational cost associated with the integration of chemical kinetics. This framework is based on a recent class of machine learning algorithms called neural ordinary differential equations (NODE) to build a multi-step chemistry integrator. The NODE algorithm, which is a continuous counterpart of ResNet [30], is a hybrid of a neural network and a differential equation solver with several major properties: First, embedding a differential equation solver within the network immensely simplifies its architecture, consequently reducing its memory requirements. Second, the NODE network directly minimizes the difference between the predicted and true solutions at all intermediate points along the solution trajectory, providing better accuracy than ResNet with a similar architecture [30]. Finally, the NODE network can be designed to reduce the stiffness of the original ODE by incorporating non-stiff ODE solvers during the training process. These properties make the NODE network particularly suitable for chemical kinetics integration. The work of Owoyele and Pal [31] is among the first studies to demonstrate the use of the NODE network for combustion chemistry. This study indicated that attempting to learn the source terms for all species concurrently was too unstable because of the nonlinearity associated with the dynamics of various species. Hence, separate networks were trained for each major species and temperature, while minor species were neglected. Considering the strongly coupled nature of chemical kinetics, this may lead to erroneous results when such a model is used in computational fluid dynamics calculations.
The novelty of the present study lies in devising a systematic approach to pre-process and regularize the data to stabilize the training of NODE and to learn the source terms of all the species with a single network. Besides NODE, Ji et al. [32] demonstrated a different approach based on physics-informed neural networks (PINN) for chemical kinetics integration. Their study elucidated the limitations of using PINN for stiff chemistry. To address the issue of stiffness, the study first employed QSSA to relieve the stiffness of the original ODE systems and then applied PINN to the resulting non-stiff or mildly stiff systems. In the present work, we illustrate an approach to reducing the stiffness of the underlying chemical kinetics ODE by embedding an explicit ODE solver in the NODE framework.

The paper is organized as follows: Sec. 2 outlines the basics of NODE and chemically reactive systems, followed by research methodology to develop an NODE-based chemistry integrator framework. Section 3 discusses the performance and numerical efficiency of the NODE network for hydrogen-air auto-ignition in a canonical zero-dimensional constant pressure homogeneous reactor. Finally, Sec. 4 provides the concluding remarks along with an overview of the capabilities of the proposed methodology.

## 2 Research Methodology

In the present study, we develop a novel machine learning-assisted chemistry integrator for chemically reactive systems by employing the NODE framework. The primary aim is to speed up the integration of chemical kinetics, particularly for turbulent combustion simulations. This is achieved by developing a systematic algorithm for data generation and pre-processing, and by designing a stable NODE network to reduce stiffness and control multi-step prediction errors. To account for a species composition space representative of typical flames, a constant pressure homogeneous reactor containing a gas mixture spanning a range of initial conditions is adopted for training the model. The basics of the NODE formulation, the chemical kinetics specifications of the DKM, and the NODE network training are explained in the following subsections.

### 2.1 Basics of NODE.

In classical data modeling, for a given set of *N* pairs of data points $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, the objective is to find a function that best describes the data. Neural networks consist of a large number of parameters, which makes them capable of accurately learning complex patterns found in data. Modeling the dataset *D* with machine learning can be broadly categorized into two approaches:

- **Regression:** In this approach, the neural network approximates the direct functional relationship between the input domain *x* and the output domain *y*. This can be achieved by designing a neural network and optimizing its parameters to minimize a loss function *L*. This function provides a measure of the difference between the true and predicted *y* values and can be defined as the mean absolute error (MAE), mean squared error (MSE), cross-entropy loss, etc., depending on the problem under consideration.
- **NODE:** In this approach, the neural network approximates the rate of change of *y* with *x* (i.e., $dy/dx$) as opposed to the direct relationship between *x* and *y*. Similar to the regression approach, the objective in NODE is to optimize the network parameters to minimize *L*. The intermediate (*x*, *y*) values are then evaluated by numerically integrating the neural network for a given initial condition using an ODE solver.
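To make the distinction concrete, the following minimal sketch (plain Python, with a hand-coded derivative standing in for a trained network) recovers intermediate *y* values by integrating an approximated $dy/dx$ from an initial condition, which is the NODE forward pass; `f_hat`, `node_predict`, and the fixed-step RK4 integrator are illustrative assumptions, not part of the original formulation.

```python
import math

def f_hat(x, y):
    # Stand-in for a trained network approximating dy/dx;
    # here the exact derivative of y = exp(-x) is used for illustration.
    return -y

def node_predict(f, x0, y0, x_end, n_steps=100):
    """Recover y(x_end) by integrating the approximated derivative with a
    fixed-step RK4 solver from the initial condition (the NODE forward pass)."""
    h = (x_end - x0) / n_steps
    x, y = x0, y0
    for _ in range(n_steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

print(node_predict(f_hat, 0.0, 1.0, 1.0))  # close to exp(-1)
```

In a real NODE, `f_hat` would be a neural network whose parameters are tuned so that the integrated trajectory matches the data.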

Consider a general ODE system

$$\frac{d\mathbf{z}(t)}{dt}=\mathbf{f}(\mathbf{z}(t),t)$$

where **z**(*t*) and **f**(**z**(*t*), *t*) are the dependent variable and the exact source term arrays, respectively. The objective of the NODE network is to model **f**(**z**(*t*), *t*) with an approximate parametric function $\hat{\mathbf{f}}(\mathbf{z}(t),t,\theta)$, where *θ* denotes the network parameters. Consider, for example, a section of the **z**(*t*) trajectory between *t* = *t*_{i} and *t* = *t*_{i+1}. The approximate **z**(*t*) between these two points is found by numerically integrating $\hat{\mathbf{f}}(\mathbf{z}(t),t,\theta)$ with the initial condition **z**(*t*_{i}):

$$\mathbf{z}(t_{i+1})=\mathbf{z}(t_{i})+\int_{t_{i}}^{t_{i+1}}\hat{\mathbf{f}}(\mathbf{z}(t),t,\theta)\,dt \quad (3)$$

The scalar loss function *L* measures the difference between the approximated **z**(*t*_{i+1}) and the true **z**_{i+1} values, and the optimal network parameters *θ* are determined by minimizing the loss function value. This optimization requires the gradients of the loss function with respect to its parameters (*t*, **z**(*t*), and *θ*). In contrast to regression neural networks, these gradients are not readily available since the dependency of the loss function on its parameters is implicit and dynamic through the integration operation in Eq. (3). The main challenge in computing these gradients is performing reverse-mode differentiation or backpropagation through the ODE solver. One approach is to differentiate through the forward pass operations, which is straightforward but incurs high memory demands and introduces additional numerical errors [31]. A more efficient approach is to utilize the adjoint sensitivity method [31]. In this method, the gradients are computed by solving a second augmented ODE system backward in *t* (i.e., from *t*_{N} to *t*_{0}). We define the adjoint $\mathbf{a}(t)=\partial L/\partial\mathbf{z}(t)$ as the gradient of *L* with respect to the intermediate (hidden) states **z**(*t*) reached during the integration. As detailed in Ref. [31], the adjoint is governed by the following adjoint differential expression:

$$\frac{d\mathbf{a}(t)}{dt}=-\mathbf{a}(t)^{T}\frac{\partial\hat{\mathbf{f}}(\mathbf{z}(t),t,\theta)}{\partial\mathbf{z}}$$

This expression is integrated backward in *t*, starting from ∂*L*/∂**z**(*t*_{N}), to compute all hidden state gradients ∂*L*/∂**z**(*t*) back to ∂*L*/∂**z**(*t*_{0}). Besides the adjoint, optimizing *L* requires the calculation of the loss function gradient with respect to *θ*. This quantity may be written in terms of ∂*L*/∂**z**(*t*) as

$$\frac{dL}{d\theta}=-\int_{t_{N}}^{t_{0}}\mathbf{a}(t)^{T}\frac{\partial\hat{\mathbf{f}}(\mathbf{z}(t),t,\theta)}{\partial\theta}\,dt$$

Evaluating this integral is not straightforward because the **z**(*t*) states are obtained numerically at the integration sub-steps and their dependency on *θ* is not known in advance. Chen et al. [31] resolve this problem by treating the network parameters *θ* as a part of the system dynamics; i.e., *θ* is state-dependent and determined alongside **z**(*t*) and *t* as a part of the numerical integration. Accordingly, **z**(*t*), *θ*(*t*), and *t*(*t*) are obtained concurrently by defining an augmented system of ODEs

$$\frac{d}{dt}\begin{bmatrix}\mathbf{z}\\ \theta\\ t\end{bmatrix}=\begin{bmatrix}\hat{\mathbf{f}}(\mathbf{z}(t),t,\theta)\\ \mathbf{0}\\ 1\end{bmatrix}$$

whose corresponding adjoint system

$$\frac{d}{dt}\begin{bmatrix}\mathbf{a}(t)\\ \mathbf{a}_{\theta}(t)\\ \mathbf{a}_{t}(t)\end{bmatrix}=-\begin{bmatrix}\mathbf{a}(t)^{T}\,\partial\hat{\mathbf{f}}/\partial\mathbf{z}\\ \mathbf{a}(t)^{T}\,\partial\hat{\mathbf{f}}/\partial\theta\\ \mathbf{a}(t)^{T}\,\partial\hat{\mathbf{f}}/\partial t\end{bmatrix}$$

is integrated backward in *t* to obtain all derivatives of *L*: **a**(*t*) = ∂*L*/∂**z**(*t*), **a**_{θ}(*t*) = ∂*L*/∂*θ*(*t*), and **a**_{t}(*t*) = ∂*L*/∂*t*(*t*). For example, for the part of the trajectory between *t*_{i} and *t*_{i+1}, assuming known initial conditions at *t*_{i+1} as follows:

$$\mathbf{a}(t_{i+1})=\frac{\partial L}{\partial\mathbf{z}(t_{i+1})},\quad \mathbf{a}_{\theta}(t_{i+1})=\mathbf{0},\quad \mathbf{a}_{t}(t_{i+1})=\mathbf{a}(t_{i+1})^{T}\,\hat{\mathbf{f}}(\mathbf{z}(t_{i+1}),t_{i+1},\theta) \quad (12)$$

the derivatives at *t* = *t*_{i} are obtained as

$$\left[\mathbf{z}(t_{i}),\,\mathbf{a}(t_{i}),\,\mathbf{a}_{\theta}(t_{i}),\,\mathbf{a}_{t}(t_{i})\right]=\mathrm{ODESolve}\!\left(\left[\mathbf{z}(t_{i+1}),\,\mathbf{a}(t_{i+1}),\,\mathbf{a}_{\theta}(t_{i+1}),\,\mathbf{a}_{t}(t_{i+1})\right],\,t_{i+1},\,t_{i}\right) \quad (13)$$

i.e., the augmented adjoint system is integrated backward in *t* to find the derivatives at *t*_{i}, equivalent to differentiation using backpropagation in conventional neural networks. For the entire **z**(*t*) trajectory, following this procedure with the initial conditions at *t*_{N}, similar to those in Eq. (12), the derivatives at all data points *t*_{N−1}, …, *t*_{0} can be obtained. The backward pass, along with the forward pass, provides the **z**(*t*) states at all intermediate *t* values.
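The adjoint machinery can be verified numerically on a problem with a known gradient. The sketch below is an assumed toy case (not from the paper): for $dz/dt=\theta z$ with loss $L=z(T)$, it solves the augmented system $[z,\,\mathbf{a},\,\mathbf{a}_\theta]$ backward from $T$ to $0$ with scipy's `solve_ivp` and compares $dL/d\theta$ against the analytic value $z_0 T e^{\theta T}$.

```python
import numpy as np
from scipy.integrate import solve_ivp

theta, z0, T = 0.7, 1.5, 2.0

def f(t, z):
    return theta * z  # dz/dt = f(z, theta): df/dz = theta, df/dtheta = z

# Forward pass: obtain z(T)
zT = solve_ivp(f, (0.0, T), [z0], rtol=1e-10, atol=1e-12).y[0, -1]

# Backward pass: augmented state [z, a, a_theta] from t = T down to t = 0,
# with a(T) = dL/dz(T) = 1 for the loss L = z(T).
def aug(t, s):
    z, a, a_theta = s
    return [theta * z,   # re-solve z backward alongside the adjoint
            -a * theta,  # da/dt       = -a * (df/dz)
            -a * z]      # da_theta/dt = -a * (df/dtheta)

sol = solve_ivp(aug, (T, 0.0), [zT, 1.0, 0.0], rtol=1e-10, atol=1e-12)
dL_dtheta = sol.y[2, -1]
exact = z0 * T * np.exp(theta * T)  # analytic gradient of z(T) w.r.t. theta
print(dL_dtheta, exact)
```

The two printed values agree to solver tolerance, illustrating why the backward augmented solve can replace storing the forward computational graph.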

### 2.2 Chemical Kinetics Specifications and Training Data Generation.

The temporal (*t*) variation of the thermochemical state of a constant pressure homogeneous reactor can be expressed as

$$\frac{d\Phi}{dt}=\mathbf{S}(\Phi) \quad (14)$$

where $\Phi$ is the array of the mass fractions of the $N_{s}$ species $[y_1,y_2,\dots,y_{N_s}]$ and temperature (*T*), i.e., $\Phi=[\Phi_1,\Phi_2,\dots,\Phi_{N_s},\Phi_{N_s+1}]\equiv[y_1,y_2,\dots,y_{N_s},T]$. The source terms **S** are due to chemical reaction (for $\Phi_1,\Phi_2,\dots,\Phi_{N_s}$) and heat release (for $\Phi_{N_s+1}$). The chemical kinetics mechanism in this study consists of nine species: H_{2}, H, O_{2}, OH, O, H_{2}O, H_{2}O_{2}, HO_{2}, and N_{2}, along with 21 reaction steps [34,35].

During the training step, the source term array **S** described by DKM is integrated to generate $\Phi(t)$ as the training data for NODE. To generate the training data, 25 constant pressure homogeneous hydrogen-air reactors, whose temporal thermochemical states are governed by Eq. (14), are considered with a wide range of initial conditions. Equation (14) is integrated in time to generate a representative composition space for these reactors. Here, we consider initial temperatures ranging from 1000 K to 1200 K and equivalence ratios in the range 0.5–1.5 at atmospheric pressure (101325 Pa). The solution for each of the 25 cases is advanced from *t* = 0 to 10^{−2} s with a fixed *dt* = 10^{−6} s, generating 10^{4} uniformly spaced points along the temporal composition trajectory for each simulation. As a result, a training dataset of 25 × 10^{4} samples is obtained. In this study, we utilize the ODE solver LSODA via the scipy toolkit [36] for direct integration of Eq. (14) with the DKM description of chemistry. LSODA is a hybrid solver, switching automatically between the non-stiff Adams method and the stiff backward differentiation formula (BDF) method. The open-source software Cantera [37] is used for thermodynamics and chemical kinetics calculations. Using the complete dataset (25 × 10^{4} data points) for training is redundant, as most composition states reach equilibrium after ignition. Hence, we strategically sample a subset of the complete solution and pre-process it to accelerate the training process. For each simulation, 50 unevenly log-spaced points are sampled out of the 10^{4} time-steps. Half of the sampled points are placed before the ignition delay time and the rest after the ignition. The ignition delay time is defined as the instant when the temperature reaches *T*_{0} + 400 K. An example of the sampling is shown in Fig. 1 for three cases at initial temperatures of *T*_{0} = 1000 K, 1100 K, 1200 K and stoichiometric equivalence ratio.
In this study, 50 samples are chosen as they capture the dynamics of a single ignition curve well at a reasonable computation time for training the NODE network. Depending on the desired accuracy and the computational resources available, a different number of samples can be chosen for training the network.
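The sampling step can be sketched as follows; the temperature history here is a synthetic sigmoid standing in for the Cantera/LSODA solution, and `sample_trajectory`, the `geomspace` clustering, and the index bookkeeping are illustrative choices rather than the authors' exact implementation.

```python
import numpy as np

def sample_trajectory(t, T, T0, n_samples=50):
    """Pick about n_samples unevenly (log-)spaced indices from a uniform-dt
    solution: half before the ignition delay time (T = T0 + 400 K),
    half after, clustered toward the ignition event."""
    i_ign = int(np.argmax(T >= T0 + 400.0))  # first index past ignition
    half = n_samples // 2
    pre = i_ign - np.unique(np.geomspace(1, max(i_ign, 2), half).astype(int))
    post = i_ign + np.unique(
        np.geomspace(1, len(t) - i_ign - 1, n_samples - half).astype(int))
    idx = np.unique(np.concatenate(([0], pre, post)))
    return idx[(idx >= 0) & (idx < len(t))]

# Synthetic ignition-like temperature history (10^4 uniform steps over 10 ms)
t = np.linspace(0.0, 1e-2, 10_000)
T = 1000.0 + 1500.0 / (1.0 + np.exp(-(t - 4e-3) / 2e-4))
idx = sample_trajectory(t, T, T0=1000.0)
print(len(idx), idx[:5])
```

Because the log-spaced indices are deduplicated, the returned set can be slightly smaller than the nominal 50 points.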

The mass fractions of the major species are of order 10^{−3}–10^{0}, while those of the minor species are of order 10^{−3}–10^{−8}. Therefore, for the stability of the network, all the composition variables are normalized based on the respective minimum $\Phi_{\min}$ and maximum $\Phi_{\max}$ values of each scalar obtained from the complete training dataset:

$$\bar{\Phi}=\frac{\Phi-\Phi_{\min}}{\Phi_{\max}-\Phi_{\min}} \quad (15)$$
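A minimal sketch of this min-max normalization (and its inverse, needed to recover physical values after integration); the array layout and helper names are assumptions for illustration.

```python
import numpy as np

def normalize(Phi, Phi_min, Phi_max):
    """Min-max normalization of each thermochemical scalar."""
    return (Phi - Phi_min) / (Phi_max - Phi_min)

def denormalize(Phi_bar, Phi_min, Phi_max):
    """Inverse map back to physical mass fractions / temperature."""
    return Phi_bar * (Phi_max - Phi_min) + Phi_min

# Columns: a minor-species mass fraction and temperature; the extrema are
# taken per scalar over the complete training dataset.
data = np.array([[1e-8, 1000.0],
                 [5e-3, 1800.0],
                 [1e-2, 2500.0]])
lo, hi = data.min(axis=0), data.max(axis=0)
norm = normalize(data, lo, hi)
print(norm.min(), norm.max())  # each column now spans [0, 1]
```

Normalizing per scalar keeps minor species (orders of magnitude smaller than the majors) from being ignored by the loss function.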

### 2.3 Architecture and Training of NODE Network.

The objective of the NODE chemistry integrator is to evolve the thermochemical state in *t* with the desired accuracy at a reduced computational cost compared to direct integration of DKM according to Eq. (14). Figure 2 shows a graphical representation of the NODE chemistry integrator architecture for a single training sample, which consists of randomly chosen initial conditions and integration time, i.e., $(t_0,\bar{\Phi}_0)$ and *t*_{N}, as input, and the corresponding integrated values, i.e., $\bar{\Phi}_N$, as output. The integrator has two parts: a neural network $Net(\bar{\Phi}(t),\theta,t)$ and an embedded ODE solver. The neural network consists of three hidden layers of 150 neurons each. Each hidden layer is followed by a nonlinear exponential linear unit (ELU) activation function. ELU is chosen to mitigate potential issues with vanishing gradients of the loss function caused by certain activation functions (e.g., the sigmoid function), which can make the network hard to train. The explicit Runge–Kutta method of order 5(4) (dopri5) [33], with absolute and relative error tolerances of 10^{−9} and 10^{−7}, respectively, is chosen for integration during the training. The embedding of an explicit solver ensures that the network learns a non-stiff representation of the chemical reaction source term through training. These error tolerances are selected to match the accuracy level of the ODE solver used during the training data generation. Nevertheless, different tolerance values can be chosen depending on the accuracy needed when utilizing the network after the training.
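The forward pass of the integrator can be sketched in NumPy: a three-hidden-layer, 150-neuron ELU network evaluated inside an explicit fixed-step RK4 integrator (a simple stand-in for the adaptive dopri5 solver); the random weights stand in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state = 10  # 9 species + temperature, normalized

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# Randomly initialized parameters stand in for trained ones.
sizes = [n_state, 150, 150, 150, n_state]
params = [(rng.standard_normal((m, n)) * 0.05, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def net(phi, t):
    """Net(Phi_bar, theta, t): approximate normalized source terms."""
    h = phi
    for W, b in params[:-1]:
        h = elu(h @ W + b)
    W, b = params[-1]
    return h @ W + b  # linear output layer

def integrate(phi0, t0, t1, n_steps=20):
    """Embedded explicit solver (fixed-step RK4 as a stand-in for dopri5)
    advancing the normalized state from t0 to t1."""
    h = (t1 - t0) / n_steps
    phi, t = phi0, t0
    for _ in range(n_steps):
        k1 = net(phi, t)
        k2 = net(phi + h * k1 / 2, t + h / 2)
        k3 = net(phi + h * k2 / 2, t + h / 2)
        k4 = net(phi + h * k3, t + h)
        phi = phi + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return phi

phi_N = integrate(rng.random(n_state), 0.0, 1.0)
print(phi_N.shape)  # (10,)
```

Training would wrap this forward pass in a loss on $\bar{\Phi}_N$ and backpropagate through the solver, as described in Sec. 2.1.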

The network is trained by feeding data in batches for better learning performance. A training batch consists of *N* = 50 filtered time instances on the composition space evolution trajectory for each of the *I* = 25 initial conditions. Thus, in every epoch, the network learns from all training points at once. Each epoch consists of a forward pass to calculate the MAE of a training batch, followed by backpropagation to obtain the derivatives of the loss function and to update the network parameters. The NODE network training algorithm is outlined in pseudo-code in Algorithm 1.

In the forward pass, the neural network, $Net(\bar{\Phi}(t),\theta,t)$, first approximates the chemical and heat release source terms corresponding to the normalized species mass fractions and temperature. The neural network is then integrated with the embedded ODE solver to yield the predicted integration values, followed by the loss function evaluation. The loss function is defined as the MAE between the values predicted by the neural network and those obtained by direct integration for all data points in a training batch. After the completion of a forward pass, during the backward propagation, the derivatives of the loss function are backpropagated through the ODE solver and the neural network to the input values as described by Eq. (13). Subsequently, the neural network weights are adjusted to minimize the loss function using the adaptive moment estimation (Adam) optimization technique for gradient descent. The number of epochs is set to 500. The learning rate is automatically reduced from 10^{−2} to 10^{−4} by an annealing method that tracks the error and slows down the learning rate when the error reaches a plateau. At the end of the training, the MAE converged to less than $3\%$. Figure 2 demonstrates the training workflow for a single training example consisting of a single initial condition $\bar{\Phi}_0$ evolving from *t*_{0} to time *t*_{N} along a composition evolution trajectory.

Once trained, the NODE network can advance the thermochemical state between any two time instances *t*_{0} and *t*_{N} > *t*_{0}. For example, with the normalized initial values $\bar{\Phi}_0$ at time *t*_{0}, integrating forward to time *t*_{N} using an ODE solver gives

$$\bar{\Phi}_N=\bar{\Phi}_0+\int_{t_0}^{t_N}Net(\bar{\Phi}(t),\theta,t)\,dt$$

#### Algorithm 1: NODE-based chemistry integrator training

**Require:** $Z=\{X_1,X_2,\dots,X_I\}$ such that $X_i=\{(t_0,\bar{\Phi}_0),(t_1,\bar{\Phi}_1),\dots,(t_N,\bar{\Phi}_N)\}$, $\forall\ i=1,2,\dots,I$; $Z\in\mathbb{R}^{(N_s+2)\times N\times I}$ ⇐ samples of the true solution at $I$ initial conditions and the corresponding $N$ points (time instances) on each evolution trajectory
**Require:** $\zeta$ ⇐ variable learning rate
**Require:** $L_{threshold}$ ⇐ loss change threshold
**Require:** numerical ODE solver

1: $\theta_0$ ⇐ initialize network parameters
2: **for** $k=1$ **to** $N_{epochs}$ ⇐ loop over epochs **do**
3: ​ ​ **for** $j=1$ **to** $I$ ⇐ loop over initial conditions **do**
4: ​ ​ ​ ​ **for** $i=1$ **to** $(N-1)$ ⇐ loop over time instances **do**
5: ​ ​ ​ ​ ​ ​ $\hat{\Phi}_{i+1}=\mathrm{ODESolve}(Net(\bar{\Phi},\theta,t),\bar{\Phi}_i,t_i,t_{i+1})$ ⇐ get predicted solution from ODE integration
6: ​ ​ ​ ​ ​ ​ $\epsilon_i=|\hat{\Phi}_{i+1}-\bar{\Phi}_{i+1}|$ ⇐ get MAE loss for the current time instance
7: ​ ​ ​ ​ **end for**
8: ​ ​ ​ ​ $L_j=\frac{1}{N-1}\sum_{i=1}^{N-1}\epsilon_i$ ⇐ get mean loss for the trajectory
9: ​ ​ ​ ​ $\partial L$ ⇐ get loss derivatives from backpropagation
10: ​ ​ ​ ​ $\theta=\mathrm{Adam}(\theta,\zeta,\partial L)$ ⇐ update network weights with the optimization algorithm
11: ​ ​ **end for**
12: ​ ​ $\bar{L}_k=\frac{1}{I}\sum_{j=1}^{I}L_j$ ⇐ get mean loss over all initial conditions
13: ​ ​ **if** $(\bar{L}_k-\bar{L}_{k-1})<L_{threshold}$ **then**
14: ​ ​ ​ ​ $\zeta$ ⇐ get new learning rate
15: ​ ​ **end if**
16: **end for**
17: $\theta$ ⇐ optimized NODE network parameters

## 3 Results and Discussion

### 3.1 Constant Pressure Homogeneous Hydrogen-Air Reactor.

A proof-of-concept of the NODE chemistry integration framework is demonstrated by simulating hydrogen-air auto-ignition in a constant pressure homogeneous reactor for a range of initial conditions and comparing the NODE network results with those obtained by direct integration of DKM. For each initial condition, corresponding to an equivalence ratio (*φ*_{0}) and temperature (*T*_{0}), the solution is obtained by integrating from *t*_{0} = 0 to *t*_{N} = 10^{−2} s. Table 1 summarizes the initial *φ*_{0} (rows) and *T*_{0} (columns) for the 75 cases considered for analysis in this section. The identification (ID) number for these cases is assigned as ID = (column − 1) × 15 + row; for example, case ID = 18 corresponds to the third row (*φ*_{0} = 0.64) and the second column (*T*_{0} = 1050 K). To examine the accuracy of the solution at all time-steps, the integration is performed in a step-wise fashion; at each integration time-step the solution is obtained and its accuracy is evaluated before feeding it to the next time-step as the initial condition. Direct integration of DKM is performed with the implicit multi-step variable-order (1–5) BDF solver with an absolute error tolerance of *η*_{a} = 10^{−8} for species mass fractions and 10^{−5} for temperature, along with a relative error tolerance of *η*_{r} = 10^{−6}. The larger *η*_{a} for temperature is used because temperature values are 3–4 orders of magnitude larger than species mass fractions. The NODE network is integrated with an explicit ODE solver based on the Runge–Kutta method of order 5(4) (RK45) with an absolute error tolerance of *η*_{a} = 10^{−5} and a relative error tolerance of *η*_{r} = 10^{−6}. The purpose of selecting the RK45 solver is two-fold: first, to demonstrate the ability of the NODE network to operate with non-stiff ODE solvers, and second, to highlight the flexibility of the NODE network to work with ODE solvers different from those utilized for its training.
The NODE network represents the source terms for the normalized species and temperature according to Eq. (15); hence, the *η*_{a} value for the NODE network integration is scaled accordingly to incorporate the effect of normalization. Figure 3 shows comparisons of the ignition curves obtained from integration of DKM and the NODE network for gas mixtures at *T*_{0} = 1000 K, 1100 K, 1200 K and *φ*_{0} = 0.5, 1.0, 1.5. Each subplot in Fig. 3 depicts the evolution of the temperature and the mass fractions of several species: fuel (H_{2}), oxidizer (O_{2}), product (H_{2}O), and the OH radical; OH is a highly reactive species and can be considered a marker of ignition in combustion applications. The NODE network satisfactorily captures the monotonic increase in temperature and H_{2}O as well as the non-monotonic behavior of OH. The complete ignition process, including the depletion of oxidizer and fuel, is well captured by the NODE network. The minute discrepancy in the equilibrium OH values for a few cases is attributed to the normalization of species values (Eq. (15)): the difference (Φ_{OH,max} − Φ_{OH,min}) is of order 10^{−2}, which amplifies the modest difference in the absolute values of the OH mass fraction to larger normalized values. Similar performance is obtained for all 75 simulations with different initial conditions (Table 1), demonstrating the generalizability of the NODE network for a wide range of equivalence ratio and temperature values. The MAE in **Φ** for all cases was observed to be less than $5\%$, signifying that the testing error is similar to the training error and that the NODE network is not overfitting.

To further investigate the characteristics of the NODE network in capturing the dynamics of chemical kinetics, Fig. 4 shows the ignition delay times as a function of φ_{0} at different initial *T*_{0} values. The ignition delay times under various conditions are well predicted by the NODE network. For the quantification of errors, the mean relative error ($\epsilon r$) of a quantity *ϕ* is defined as

$$\epsilon_r=\frac{1}{N}\sum_{i=1}^{N}\frac{|\phi_{i,DKM}-\phi_{i,NODE}|}{\phi_{i,DKM}}\times 100 \quad (18)$$

where $\phi_{i,DKM}$ and $\phi_{i,NODE}$ denote the value of *ϕ* obtained by integrating the DKM and the NODE network, respectively, for case ID *i*, and *N* = 75 is the number of cases studied (listed in Table 1). Thus, the mean relative errors in ignition delay time and equilibrium temperature are represented by $\epsilon_{r,\tau_{ign}}$ and $\epsilon_{r,T_{eq}}$, respectively. For all simulations in Fig. 4, $\epsilon_{r,\tau_{ign}}$ is observed to be $3.4\%$. A similar performance was established while assessing the accuracy of the NODE network for predicting equilibrium temperatures, with $\epsilon_{r,T_{eq}}=1.0\%$. Overall, the NODE network provides good accuracy in predicting combustion characteristics over a range of operating conditions compared to DKM.
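A sketch of this error metric, reporting the mean relative error in percent with the DKM values as the reference; the sample ignition delay values are hypothetical.

```python
import numpy as np

def mean_relative_error(phi_dkm, phi_node):
    """Mean relative error (percent) of a quantity over all case IDs,
    with the DKM values taken as the reference."""
    phi_dkm = np.asarray(phi_dkm, dtype=float)
    phi_node = np.asarray(phi_node, dtype=float)
    return 100.0 * np.mean(np.abs(phi_dkm - phi_node) / np.abs(phi_dkm))

# Hypothetical ignition delay times (s) for three cases
tau_dkm = [2.1e-4, 5.0e-4, 1.3e-3]
tau_node = [2.2e-4, 4.9e-4, 1.3e-3]
print(mean_relative_error(tau_dkm, tau_node))  # a few percent
```

The same helper would apply unchanged to equilibrium temperatures or any other scalar compared across the 75 cases.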

To assess the computational performance of the NODE network, the CPU time and the number of function evaluations (nfeval) required by the ODE solver in each simulation in Table 1 are recorded in Figs. 5(a) and 5(b). The CPU times are normalized with respect to the maximum CPU time in the plot. The nfeval indicates the number of times the source term is evaluated by the ODE solver. This number depends on the ODE solution method as well as the integration sub-interval size, which is an indication of the numerical stiffness of the system. It is evident that the NODE network (using the RK45 solver) consistently outperforms DKM (using the BDF solver) in terms of both CPU time and nfeval. The network reduces the function evaluation requirement of the solver by $64\%$, and a total speed-up of $48\%$ in CPU time is achieved with the NODE network compared to DKM. It can be inferred from these observations that a single function evaluation in NODE is computationally slightly more expensive than in DKM. However, this expense is outweighed by the NODE network's reduced nfeval and ODE solver overhead costs. As a result, the overall computational cost of the NODE network is lower than that of DKM. This experiment highlights the NODE network's ability to operate with non-stiff ODE solvers due to its reduced stiffness. The reduced stiffness decreases the overall computational demand by effectively reducing the nfeval and ODE solver overhead costs, even though the NODE network is slightly more costly per function evaluation. The computational performance of the NODE network and DKM is further compared with the same solvers in Sec. 3.2.
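The penalty the NODE network avoids can be reproduced with scipy on a classic stiff test case: on the Robertson system (a stand-in for stiff DKM chemistry, not the hydrogen mechanism itself), the explicit RK45 solver needs far more function evaluations than the implicit BDF solver at the same tolerances, which is why de-stiffening the learned dynamics makes explicit integration pay off.

```python
from scipy.integrate import solve_ivp

# Robertson stiff kinetics problem, standing in for stiff DKM chemistry
def robertson(t, y):
    y1, y2, y3 = y
    return [-0.04 * y1 + 1e4 * y2 * y3,
            0.04 * y1 - 1e4 * y2 * y3 - 3e7 * y2 ** 2,
            3e7 * y2 ** 2]

y0, t_span = [1.0, 0.0, 0.0], (0.0, 10.0)
stiff = solve_ivp(robertson, t_span, y0, method="BDF", rtol=1e-6, atol=1e-8)
explicit = solve_ivp(robertson, t_span, y0, method="RK45", rtol=1e-6, atol=1e-8)
print(stiff.nfev, explicit.nfev)  # RK45 needs far more evaluations here
```

For a learned non-stiff source term the comparison reverses: RK45's step size is no longer stability-limited, so its cheap steps dominate.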

### 3.2 Performance Assessment of NODE Network for Varied Error Tolerances and Solvers.

The performance of adaptive step size ODE solvers, such as BDF and RK45, is influenced by the relative (*η*_{r}) and absolute (*η*_{a}) error tolerances. The former controls the error in the solution relative to the dependent variable values. The latter provides a threshold below which the error is disregarded; this threshold determines the accuracy when the solution approaches zero. To study the effects of these solver tolerances on the accuracy and efficiency of the NODE network, the total CPU time and the mean relative errors $\epsilon_{r,\tau_{ign}}$ and $\epsilon_{r,T_{eq}}$ are recorded for various tolerances, as shown in Fig. 6. The total CPU times are normalized with respect to the maximum CPU time in each subplot. We choose direct integration of DKM with *η*_{a} = 10^{−8} for species mass fractions and *η*_{a} = 10^{−5} for temperature, along with *η*_{r} = 10^{−6}, as the base case for mean error calculation using Eq. (18). Integration of the NODE network is performed with RK45 for various *η*_{a} and *η*_{r} values, as shown in Fig. 6. Each subplot in the top row depicts the performance of the NODE network for a fixed *η*_{a} and varying *η*_{r}. It is evident that increasing *η*_{r} reduces the CPU time. This is attributed to the fact that reducing *η*_{r} increases the number of integration sub-intervals and hence the nfeval required by the ODE solver. The CPU time reduction with *η*_{r} is more significant at lower *η*_{a} values, where *η*_{r} is more influential in controlling the overall accuracy of the solution. For all three cases, *η*_{a} = 10^{−6}, 10^{−5}, 10^{−4}, the variation in $\epsilon_{r,T_{eq}}$ is minimal, but $\epsilon_{r,\tau_{ign}}$ increases with *η*_{r}. For each *η*_{a}, $\epsilon_{r,\tau_{ign}}$ increases slightly from *η*_{r} = 10^{−7} to 10^{−5} and jumps abruptly at *η*_{r} = 10^{−4}, indicating that *η*_{r} = 10^{−5} is an optimal choice to reduce the CPU time without sacrificing much accuracy.
Similarly, each subplot in the bottom row depicts the performance of the NODE network for fixed *η*_{r} and varying *η*_{a}. For all three *η*_{r} values, increasing *η*_{a} also reduces the CPU time, as it decreases the nfeval needed by the ODE solver to predict near-zero solutions with lower accuracy requirements. The CPU time reduction with *η*_{a} is more pronounced at low *η*_{r} values, at which the absolute tolerance is a more determining factor in controlling the integration step size. The speedup attained with varying *η*_{a} is steeper than that when *η*_{r} is changed, suggesting that predicting near-zero components of the solution with higher precision contributes more to the overall computational demand of the calculation. Regarding accuracy, for all three cases *η*_{r} = 10^{−6}, 10^{−5}, 10^{−4}, the variation in $\epsilon_{r,T_{eq}}$ is minimal, but $\epsilon_{r,\tau_{ign}}$ changes significantly. For each *η*_{r}, $\epsilon_{r,\tau_{ign}}$ remains largely steady until *η*_{a} = 10^{−5} and increases thereafter with *η*_{a}. Based on these observations, *η*_{a} = 10^{−5} appears to be an optimal choice for reducing CPU time without significantly sacrificing accuracy.
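The tolerance sweep described above can be sketched with SciPy's `solve_ivp`, whose `rtol` and `atol` arguments correspond to *η*_{r} and *η*_{a}. The three-species Robertson-type system below is a standard stiff toy problem standing in for the chemical source terms, not the hydrogen-air DKM of the paper; timings and counts are illustrative only.

```python
# Hypothetical sketch: sweeping relative tolerance at fixed absolute
# tolerance and recording CPU time and function evaluations (nfev),
# mirroring the eta_r / eta_a study in the text.
import time
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    # Robertson stiff kinetics toy problem (stand-in for the DKM source term)
    return np.array([
        -0.04 * y[0] + 1e4 * y[1] * y[2],
         0.04 * y[0] - 1e4 * y[1] * y[2] - 3e7 * y[1] ** 2,
         3e7 * y[1] ** 2,
    ])

y0 = np.array([1.0, 0.0, 0.0])
for rtol in (1e-7, 1e-5, 1e-4):          # eta_r sweep at fixed eta_a
    t0 = time.perf_counter()
    sol = solve_ivp(rhs, (0.0, 1e2), y0, method="BDF",
                    rtol=rtol, atol=1e-8)
    cpu = time.perf_counter() - t0
    print(f"rtol={rtol:.0e}  nfev={sol.nfev}  cpu={cpu:.4f}s")
```

Loosening `rtol` generally permits larger integration sub-intervals, so nfev and CPU time drop, consistent with the trend reported for the NODE network.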

We next study the performance of the NODE network when integrated with different ODE solvers using the optimal tolerances *η*_{a} = 10^{−5} and *η*_{r} = 10^{−5}. To assess the computational performance, the CPU time and the nfeval required by each solver for each simulation in Table 1 are shown in Figs. 7(a) and 7(b) and compared with the base case. The CPU times are normalized with respect to the maximum CPU time of the plot. As expected, for the specified error tolerances, the explicit RK45 solver requires the fewest nfeval, followed by LSODA, which switches between the implicit BDF and explicit Adams methods during run-time and thus reduces the number of function and Jacobian evaluations compared to the fully implicit BDF solver. For all cases, the NODE network consistently outperforms direct integration of the DKM in terms of nfeval with any given ODE solver. This consistent reduction in the number of function evaluations, regardless of the ODE solver choice, indicates that the NODE network effectively reduces the stiffness of the ODEs. The trend in CPU times is, however, slightly different, especially for the stiff solver, due to the competition between the cost of function evaluations and the number of evaluations required. As discussed in the previous section, the NODE network function evaluations are more costly than those of the DKM. When using the BDF solver with the NODE network, the cost of function evaluations outweighs the reduced nfeval and, as a result, the CPU time increases compared to the base case. On the other hand, when using the LSODA and RK45 solvers with the NODE network, the nfeval is significantly lower and the CPU time always remains below the base case.
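The solver comparison can be reproduced in miniature with `solve_ivp`, which reports the function-evaluation count as `sol.nfev` and the Jacobian-evaluation count as `sol.njev`. The one-dimensional mildly stiff test problem below is a hypothetical surrogate, chosen only so that all three methods run quickly; it is not the NODE system from the paper.

```python
# Sketch (assumed toy problem): comparing solve_ivp methods by nfev/njev
# at the optimal tolerances eta_a = eta_r = 1e-5 discussed in the text.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    # mildly stiff linear test equation relaxing toward cos(t)
    return np.array([-50.0 * (y[0] - np.cos(t))])

y0 = np.array([0.0])
for method in ("RK45", "LSODA", "BDF"):
    sol = solve_ivp(rhs, (0.0, 5.0), y0, method=method,
                    rtol=1e-5, atol=1e-5)
    print(f"{method:6s} nfev={sol.nfev} njev={getattr(sol, 'njev', 0)}")
```

For a de-stiffened right-hand side, as the text reports for the NODE network, the explicit RK45 method avoids Jacobian evaluations and linear solves entirely, which is where its nfeval and overhead advantage originates.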

To summarize, the total CPU time comparison for all 75 simulations listed in Table 1 for different ODE solvers is displayed in Fig. 8. A summary of the cases with their respective tolerances and $\epsilon_{r,\tau_{ign}}$ is presented in Table 2. As shown, Case 2 and Case 3 provide 47% and 51% reductions in CPU time compared to the base case. The performance is further improved by relaxing *η*_{a} and *η*_{r} to 10^{−4} in Case 4, which shows a 64% decrease in total CPU time with less than 1% reduction in accuracy. The solver performance is also recorded for the lower-order explicit Runge–Kutta method of order 3(2) (RK23). The RK23 solver with similar error tolerances accelerates the solution process by 70% relative to the base case without further loss of accuracy. These studies demonstrate the NODE network's efficacy and robustness in operating with different ODE solvers and error tolerances for the complex problem of integrating stiff chemical kinetics.

Furthermore, additional speedup can be achieved by decreasing the complexity of the neural network. The neural network architecture can be optimized to reduce CPU time without compromising accuracy by coupling NODE with optimization algorithms, such as Bayesian optimization, which remains a topic of further study. The computational cost of direct integration of the DKM is a cubic function of the number of species [5], rendering a nonlinear increase in the chemical kinetics integration cost for real fuels. In contrast, the NODE network is a data-driven technique whose computational cost increases linearly with the number of species. Therefore, the NODE network, which is easily extendable to other commonly used complex fuels and descriptions of chemical kinetics mechanisms (detailed, skeletal, or reduced), is expected to yield an even more significant reduction in computational cost for larger mechanisms.
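The scaling argument above can be made concrete with simple cost models. The constants and the hidden-layer width below are hypothetical; the point is only the asymptotic behavior: implicit stiff integration of the DKM requires factorizing an n × n Jacobian (O(n³) per Newton step), whereas a fixed-width NODE evaluation is dominated by the input/output layers, which grow linearly in the number of species n.

```python
# Illustrative cost models (hypothetical constants), contrasting the cubic
# scaling of direct DKM integration with the linear scaling of a NODE
# evaluation at fixed hidden-layer width.
def direct_cost(n_species, c=1.0):
    # dense LU factorization of the chemical Jacobian: O(n^3)
    return c * n_species ** 3

def node_cost(n_species, hidden=64, c=1.0):
    # input and output layers scale with n; hidden layers are fixed-size
    return c * (2 * n_species * hidden + hidden * hidden)

for n in (9, 50, 500):
    print(f"n={n:4d}  direct ~ {direct_cost(n):.3e}  node ~ {node_cost(n):.3e}")
```

Growing the mechanism tenfold multiplies the modeled direct cost by a factor of 1000 but the modeled NODE cost by roughly 10, which is why the expected savings increase for larger fuels.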

In light of the findings presented in this investigation, it is observed that NODE can effectively reduce the computational cost associated with chemical kinetics in combustion simulations. An important consideration in this regard is the coupling of the chemical kinetics NODE network with turbulent mixing to enable the prediction of more realistic cases. To this end, we have evaluated the performance of the chemical kinetics NODE with the inclusion of mixing via a zero-dimensional pairwise mixing stirred reactor (PMSR), as detailed in Ref. [38]. Assessments based on the PMSR have demonstrated that, in the presence of mixing, NODE can achieve an even higher computational speedup with accuracy comparable to direct integration of detailed kinetics, highlighting its potential for accelerating computations in turbulent combustion applications.

## 4 Conclusion

In the present study, a chemical kinetics integration methodology based on neural ordinary differential equations (NODE) is introduced. A systematic approach is presented for training data generation, pre-processing, and training of the NODE network. A proof-of-concept is demonstrated with a constant pressure homogeneous hydrogen-air reactor. The hydrogen-air chemistry is described by a finite-rate mechanism involving 9 chemical species and 21 reaction steps. Results show that the NODE simulations accurately capture the evolution of the reacting system as well as the ignition delay times for equivalence ratios ranging from 0.5 to 1.5 and initial temperatures from 1000 K to 1200 K. The sensitivity of the NODE with respect to the ordinary differential equation (ODE) solver error tolerances is studied, and the optimal tolerance values yielding the minimum CPU time with the desired accuracy are determined. With the optimal error tolerances and ODE solver, the robust NODE network in this study offers up to a 70% reduction in CPU time compared to direct integration of the detailed kinetics mechanism. Chemical kinetics typically presents highly stiff ODEs requiring stiff ODE solvers to obtain the solution. An important benefit offered by the NODE network is its ability to effectively reduce the numerical stiffness, thereby enabling the use of less sophisticated ODE solvers, including explicit ones, to solve the ODE system. This results in fewer function evaluations and lower ODE solver overhead costs, leading to reduced CPU times for integrating the system. Therefore, the NODE network can provide a more affordable and accurate way to handle chemical kinetics in turbulent flame simulations, which is a subject of future studies.

## Acknowledgment

This study is supported in part by the Office of the Vice President for Research at the University of Connecticut through the Research Excellence Program.

## Conflicts of Interest

There are no conflicts of interest. This article does not include research in which human participants were involved. Informed consent not applicable. This article does not include any research in which animal participants were involved.

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

## References
