This research proposes an iterative dynamic programming (IDP) algorithm that generates an optimal supervisory control policy for hybrid electric vehicles (HEVs) while accounting for transient powertrain dynamics. The proposed algorithm addresses the "curse of dimensionality" and the "curse of modeling" of conventional dynamic programming (DP). The IDP algorithm iteratively updates the DP formulation using a machine learning-based powertrain model, which is retrained at each iteration on the outputs of a driving cycle simulation run with a high-fidelity model. Once the reduced model converges to the accuracy of the high-fidelity model, the resulting control policy yields a 9.1% fuel economy (FE) improvement over the baseline non-predictive rule-based control on the Urban Dynamometer Driving Schedule (UDDS). A conventional DP strategy based on a quasi-static powertrain model with perfect preview of future power demand yields a 14.2% FE improvement, but this improvement drops to 5.7% when the policy is validated with the high-fidelity model. These results show that capturing transient powertrain dynamics is critical both for realistic fuel economy prediction and for a relevant powertrain control policy. The IDP strategy also uses targeted state-space exploration, concentrating the state samples around the improving state trajectory from previous iterations. Compared with conventional fixed state-space sampling, this reduces the discretization error of the DP policy and significantly lowers the computational load imposed by the relatively large number of states in the transient powertrain model.
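The loop described above can be illustrated with a small sketch. This is not the authors' implementation; it is a toy, charge-sustaining problem under stated assumptions: battery state of charge (SOC) is the only state, a made-up cubic fuel map `hifi_fuel` stands in for the high-fidelity model, and a quadratic least-squares fit stands in for the machine learning model. All names, the demand profile `P`, and every parameter are hypothetical. Each iteration refits the surrogate on samples from the latest trajectory, solves a state-to-state DP on SOC grids centred on the previous trajectory, and halves the band width so the grid becomes finer where the optimum lies. Because the toy fuel map is quasi-static, the sketch shows the iteration structure and the targeted state-space exploration, not the transient dynamics themselves.

```python
"""Toy iterative DP (IDP) loop: surrogate refit plus targeted SOC grid.

Everything here is hypothetical: the demand profile, the "high-fidelity"
fuel map, the quadratic surrogate, and all parameters are illustrative.
"""

DT = 1.0                  # time step [s]
Q = 200.0                 # usable battery energy [kJ]
UMAX = 10.0               # battery power limit [kW]
SOC_MIN, SOC_MAX, S0 = 0.4, 0.8, 0.6
P = [5.0, 20.0, 35.0, 15.0, 0.0, 10.0, 30.0, 25.0, 5.0, 0.0]  # demand [kW]
N = len(P)
G = 21                    # grid points per stage (odd, so the centre is a point)
W0 = 0.2                  # initial half-width of the SOC band
PENALTY = 1000.0          # terminal penalty [g per unit SOC deficit]
TOL, MAX_IT = 0.02, 6     # surrogate-vs-hifi tolerance, iteration cap


def hifi_fuel(pe):
    """Stand-in for the high-fidelity model: fuel rate [g/s] at engine power pe [kW]."""
    return 0.1 * pe + 0.002 * pe**2 + 0.00005 * pe**3 if pe > 0 else 0.0


def fit_surrogate(data):
    """Least-squares fit of fuel ~ a*pe + b*pe**2 via the 2x2 normal equations."""
    sxx = sum(x**2 for x, _ in data)
    sx3 = sum(x**3 for x, _ in data)
    sx4 = sum(x**4 for x, _ in data)
    sxy = sum(x * y for x, y in data)
    sx2y = sum(x**2 * y for x, y in data)
    det = sxx * sx4 - sx3**2
    return ((sxy * sx4 - sx3 * sx2y) / det, (sxx * sx2y - sx3 * sxy) / det)


def surrogate(coef, pe):
    a, b = coef
    return a * pe + b * pe**2


def engine_power(traj):
    """Engine power implied by an SOC trajectory (battery power u = -dSOC*Q/DT)."""
    return [max(P[k] - (traj[k] - traj[k + 1]) * Q / DT, 0.0) for k in range(N)]


def dp(coef, center, w):
    """Backward DP on per-stage SOC grids centred on the previous trajectory."""
    h = 2 * w / (G - 1)
    grids = [[c + (i - G // 2) * h for i in range(G)] for c in center]
    inf = float("inf")
    V = [PENALTY * max(0.0, S0 - s) if SOC_MIN <= s <= SOC_MAX else inf
         for s in grids[N]]
    choice = [[0] * G for _ in range(N)]
    for k in range(N - 1, -1, -1):
        new_V = [inf] * G
        for i, s in enumerate(grids[k]):
            if not SOC_MIN <= s <= SOC_MAX:
                continue
            for j, s_next in enumerate(grids[k + 1]):
                u = (s - s_next) * Q / DT      # battery power for this grid-to-grid move
                pe = P[k] - u                  # engine supplies the remainder
                if abs(u) > UMAX + 1e-9 or pe < -1e-9 or V[j] == inf:
                    continue
                cost = surrogate(coef, max(pe, 0.0)) * DT + V[j]
                if cost < new_V[i]:
                    new_V[i], choice[k][i] = cost, j
        V = new_V
    i = G // 2                                 # stage-0 centre is the initial SOC
    traj = [grids[0][i]]
    for k in range(N):
        i = choice[k][i]
        traj.append(grids[k + 1][i])
    return traj


def iterative_dp():
    data = [(pe, hifi_fuel(pe)) for pe in (5.0, 15.0, 25.0, 35.0)]  # seed samples
    traj, w = [S0] * (N + 1), W0               # start from a charge-neutral guess
    for it in range(1, MAX_IT + 1):
        coef = fit_surrogate(data)             # retrain the reduced model
        traj = dp(coef, traj, w)               # solve DP with the current model
        pes = engine_power(traj)
        fuel_true = sum(hifi_fuel(pe) for pe in pes) * DT    # check against "hifi"
        fuel_model = sum(surrogate(coef, pe) for pe in pes) * DT
        data += [(pe, hifi_fuel(pe)) for pe in pes]         # new training samples
        if abs(fuel_model - fuel_true) <= TOL * fuel_true:
            break
        w /= 2                                 # narrow and refine the SOC band
    return traj, fuel_true, it


if __name__ == "__main__":
    traj, fuel, iters = iterative_dp()
    baseline = sum(hifi_fuel(p) for p in P) * DT   # engine-only, no battery use
    print(f"{iters} iterations, fuel {fuel:.2f} g vs. {baseline:.2f} g baseline")
```

Since the previous trajectory always sits on the centre line of the new grid, each DP solve can at worst reproduce it under the current surrogate; halving the band trades global coverage for local resolution, which is the bet targeted state-space exploration makes.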