## Abstract

External and internal convertible (EIC) form-based motion control is one of the effective designs for simultaneous trajectory tracking and balance of underactuated balance robots. Under certain conditions, however, the EIC-based control design is shown to lead to uncontrolled robot motion. To overcome this issue, we present a Gaussian process (GP)-based data-driven learning control for underactuated balance robots with the EIC modeling structure. Two GP-based learning controllers are presented by using the EIC property. The partial EIC (PEIC)-based control design partitions the robotic dynamics into a fully actuated subsystem and a reduced-order underactuated subsystem. The null-space EIC (NEIC)-based control compensates for the uncontrolled motion in a subspace, while the other closed-loop dynamics are not affected. Under the PEIC- and NEIC-based controls, the tracking and balance tasks are guaranteed, and a guaranteed convergence rate and bounded errors are achieved without the uncontrolled motion caused by the original EIC-based control. We validate the results and demonstrate the GP-based learning control design using two inverted pendulum platforms.

## 1 Introduction

An underactuated balance robot possesses fewer control inputs than degrees-of-freedom (DOFs) [1,2]. Motion control of underactuated balance robots requires both trajectory tracking of the actuated subsystem and balance control of the unactuated, unstable subsystem [3–5]. Inverting the nonminimum phase unactuated nonlinear dynamics brings additional challenges to causal feedback control design. Several modeling and control methods have been proposed for these robots and their applications [4–10]. The orbital stabilization method has been used for balancing underactuated robots [1,11–13], with applications to bipedal robots [14] and the cart-inverted pendulum [1]. Energy shaping-based control has also been designed for underactuated balance robots [15,16]. One feature of these methods is that the achieved balance-enforced trajectory is not unique and cannot be prescribed explicitly [1,11–13]. In Refs. [5] and [17], simultaneous trajectory tracking and balance control of underactuated balance robots was proposed using the external and internal convertible (EIC) form of the robot dynamics. The EIC-based control has been demonstrated to be an effective approach for achieving fast convergence with guaranteed performance.

The above-mentioned control designs require an accurate model of the robot dynamics, and the control performance deteriorates under model uncertainties or external disturbances. Machine learning-based methods provide an efficient tool for robot modeling and control [18,19]. In particular, Gaussian process (GP) regression is an effective learning approach that yields a nearly analytical model structure and bounded prediction errors [7,19–21]. GP-based, performance-guaranteed control for underactuated balance robots has been reported in Refs. [4], [20], and [22]. In Ref. [4], the control design was conducted in two steps: a GP-based inverse dynamics controller was designed for the unactuated subsystem to achieve balance, and model predictive control (MPC) was used to simultaneously track the given reference trajectory and estimate the balance equilibrium manifold (BEM). The GP prediction uncertainties were incorporated into the control design to enhance robustness. The work in Ref. [5] followed the sequential control design of the EIC-based framework, and the controller was adaptive to the prediction uncertainties. The training data were selected to reduce the computational complexity.

This work takes advantage of the structured GP modeling approach in Refs. [5] and [7] and integrates the EIC-based control with GP models. We first identify the conditions under which the original EIC-based control produces uncontrolled motion in underactuated balance robots, and we then design a stable GP-based learning control with a properly selected nominal robot dynamic model. Two controllers, called the partial EIC (PEIC)- and null-space EIC (NEIC)-based controls, are presented to improve the closed-loop performance. The PEIC-based control constructs a virtual inertia matrix to reshape the dynamics coupling between the actuated and unactuated subsystems. The EIC-induced uncontrolled motion is eliminated, and the robotic system behaves as a fully actuated subsystem combined with a reduced-order unactuated subsystem. Alternatively, the NEIC-based control applies its compensation to the uncontrolled coordinates in the null space, while the rest of the stable system motion stays unchanged. The PEIC- and NEIC-based controls achieve guaranteed robust performance with fast convergence of the closed-loop tracking errors.

The control tasks considered in this work include both trajectory tracking for the actuated subsystem and platform balance for the unstable subsystem. The two subsystems are interconnected through an implicit dynamic relationship that needs to be estimated in real time. This control problem differs from those in the literature. Most existing approaches, such as orbital stabilization and energy shaping, focus on stabilization only; that is, the trajectory of the actuated subsystem is not prescribed, and the main control task is to stabilize the unstable subsystem. The main contribution of this work is a new GP-based learning control of underactuated balance robots that uses the EIC structural properties. Compared with the approaches in Refs. [5] and [17], this work reveals underlying design properties and limitations of the original EIC-based control for underactuated balance robots. Compared with the work in Refs. [4] and [23], the proposed method takes advantage of the attractive EIC modeling properties for control design and does not use MPC, which has high computational demands. Compared with other learning control methods such as reinforcement learning, the proposed control integrates the robot's dynamics property (i.e., the EIC structure) with GP-based model learning. By integrating physics knowledge into model learning, we identify the conditions for nominal model selection, and the proposed control is designed with guaranteed performance. This paper extends the previous conference submission [24] with new design, analysis, and experiments. In particular, the NEIC-based control design and experiments were not presented in Ref. [24].

The rest of the paper is outlined as follows. We introduce the EIC-based control and present the problem statement in Sec. 2. Section 3 presents the GP-based robot dynamics. The PEIC- and NEIC-based controls are presented in Sec. 4. The stability analysis is discussed in Sec. 5. The experimental results are presented in Sec. 6, and finally Sec. 7 summarizes the concluding remarks.

## 2 External and Internal Convertible-Based Robot Control and Problem Statement

### 2.1 Robot Dynamics and External and Internal Convertible-Based Control.

$B$ denotes the input matrix, and $u\in\mathbb{R}^n$ is the control input. The coordinates are partitioned as $q=[q_a^T\ q_u^T]^T$, with actuated coordinate $q_a\in\mathbb{R}^n$ and unactuated coordinate $q_u\in\mathbb{R}^m$. We focus on the case $n\ge m$ and, without loss of generality, assume that $B=[I_n\ 0]^T$, where $I_n\in\mathbb{R}^{n\times n}$ is the identity matrix of dimension $n$. The robot dynamic model in Eq. (1) is rewritten as

$$S_a:\ D_{aa}\ddot q_a+D_{au}\ddot q_u+C_a\dot q+G_a=u,\qquad S_u:\ D_{ua}\ddot q_a+D_{uu}\ddot q_u+C_u\dot q+G_u=0 \tag{2}$$

for the actuated ($S_a$) and unactuated ($S_u$) subsystems, respectively. Subscripts “$aa$” (“$uu$”) and “$ua$” (“$au$”) indicate the variables related to the actuated (unactuated) coordinates and the coupling effects, respectively. For presentation convenience, we introduce $H=C\dot q+G$, $H_a=C_a\dot q+G_a$, and $H_u=C_u\dot q+G_u$, and the dependence of $D$, $C$, and $G$ on $q$ and $\dot q$ is dropped. Subsystems $S_a$ and $S_u$ are also referred to as the external and internal subsystems, respectively [4,17].

The control goal is to steer the actuated coordinate $q_a$ to follow a given desired trajectory $q_a^d$ for $S_a$, while the unactuated, unstable subsystem $S_u$ is balanced at an unknown equilibrium $q_u^e$. Therefore, we need to estimate $q_u^e$ in real time to simultaneously achieve trajectory tracking (for $S_a$) and platform balance (for $S_u$). Note that not every trajectory can be followed, given the underactuated dynamics and the balance requirement. This property has been explicitly discussed for the autonomous bikebot example in Ref. [25]. In this work, we assume that the given trajectory $q_a^d$ is well planned and that the control exists. Designing and planning a feasible trajectory $q_a^d$ is out of the scope of this work.

where $\Gamma(q_u;v^{ext})=D_{uu}\ddot q_u+D_{ua}v^{ext}+H_u$. The equilibrium $q_u^e$ is obtained by inverting $\Gamma_0=\Gamma(q_u;v^{ext})|_{\dot q_u=\ddot q_u=0}=0$. Obtaining $q_u^e$ requires accurate system dynamics and the inversion of the nonminimum phase dynamics $S_u$, which is challenging for noncausal control design.

where $v^{int}$ is used as the virtual control input in $S_u$, that is, under $\ddot q_a=v^{int}$, $\ddot q_u=v_u^{int}$.

Figure 1(a) illustrates the above sequential EIC-based control design. It has been shown in Ref. [17] that the control $u^{int}$ guarantees that both $e_a$ and $e_u$ converge exponentially to a neighborhood of the origin if the higher order approximation terms of the closed-loop systems are affine in the error $\boldsymbol{e}$. Therefore, the EIC-based control simultaneously achieves trajectory tracking for $S_a$ and the balancing task for $S_u$.

### 2.2 Motion Property Under External and Internal Convertible-Based Control.

Control design (5) uses a mapping from a low-dimensional ($m$) space to a high-dimensional ($n$) space (i.e., $n\ge m$). Under control (6) with properly selected control gains, it has been shown in Ref. [17] that, for a small number $\epsilon>0$, there exists a finite time $T>0$ such that $\|q_u(t)-q_u^e(t)\|<\epsilon$ for $t>T$. Therefore, given the negligible error, we obtain $D_{ua}(q_a,q_u)\approx D_{ua}(q_a,q_u^e)$.

For a given $q$, applying singular value decomposition (SVD) to $D_{ua}$ and its pseudoinverse $D_{ua}^+$, we obtain

$$D_{ua}=U\Lambda V^T,\qquad D_{ua}^+=V\Lambda^+U^T,$$

where $U=[u_1,\ldots,u_m]\in\mathbb{R}^{m\times m}$ and $V\in\mathbb{R}^{n\times n}$ are orthogonal matrices, $\Lambda=[\Lambda_m\ 0]\in\mathbb{R}^{m\times n}$, $\Lambda^+=[\Lambda_m^{-1}\ 0]^T\in\mathbb{R}^{n\times m}$, and $\Lambda_m=\operatorname{diag}(\sigma_1,\ldots,\sigma_m)$ with singular values $\sigma_i>0$, $i=1,\ldots,m$. We partition $V$ into the block matrix $V=[V_m\ V_n]$, with $V_m\in\mathbb{R}^{n\times m}$ and $V_n\in\mathbb{R}^{n\times(n-m)}$. Since $\operatorname{rank}(D_{au})=m$, the null space of $D_{ua}$ is $\ker(D_{ua})=\operatorname{span}(V_n)$.

The columns of $V$ serve as a complete basis of $\mathbb{R}^n$, and we introduce a coordinate transformation $\Upsilon:x\mapsto V^Tx$ for $x\in\mathbb{R}^n$. Clearly, $\Upsilon$ is a linear, time-varying, smooth map. Applying $\Upsilon$ to $q_a$ and $v^{ext}$, we have

$$p_a=\Upsilon(q_a)=V^Tq_a,\qquad \nu^{ext}=\Upsilon(v^{ext})=V^Tv^{ext},$$

where $p_a=[p_{am}^T\ p_{an}^T]^T$, $\nu^{ext}=[(\nu_m^{ext})^T\ (\nu_n^{ext})^T]^T$, $p_{am},\nu_m^{ext}\in\mathbb{R}^m$, and $p_{an},\nu_n^{ext}\in\mathbb{R}^{n-m}$. Note that $[p_a^T\ q_u^T]^T$ still serves as a complete set of generalized coordinates for $S$. Using the new coordinate $p_a$, we have the following motion property under the original EIC-based control for $S$; the proof is given in Appendix A1.
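The decomposition can be checked numerically. The following NumPy sketch, a minimal illustration rather than the authors' implementation, computes the SVD of a single coupling matrix, assembles $D_{ua}^+=V\Lambda^+U^T$, splits $V=[V_m\ V_n]$, and applies $\Upsilon$ to $q_a$. The function name `eic_svd_split` and the values of `D_ua` and `q_a` are hypothetical; because $D_{ua}$ depends on $q$, $V$ would have to be recomputed at every control step.

```python
import numpy as np

def eic_svd_split(D_ua, tol=1e-12):
    """SVD-based split of R^n into span(V_m) and ker(D_ua) = span(V_n)."""
    m, n = D_ua.shape
    U, s, Vt = np.linalg.svd(D_ua)          # U: m x m, s: (m,), Vt: n x n
    if np.count_nonzero(s > tol) != m:
        raise ValueError("D_ua must have full row rank m")
    V = Vt.T
    V_m, V_n = V[:, :m], V[:, m:]
    # Lambda^+ = [Lambda_m^{-1} 0]^T, so D_ua^+ = V Lambda^+ U^T
    Lambda_plus = np.vstack([np.diag(1.0 / s), np.zeros((n - m, m))])
    D_ua_pinv = V @ Lambda_plus @ U.T
    return U, s, V, V_m, V_n, D_ua_pinv

# Hypothetical coupling for n = 2 actuated and m = 1 unactuated coordinates
D_ua = np.array([[0.8, 0.3]])
U, s, V, V_m, V_n, D_ua_pinv = eic_svd_split(D_ua)

q_a = np.array([0.2, -0.1])
p_a = V.T @ q_a                      # p_a = Upsilon(q_a) = V^T q_a
m = D_ua.shape[0]
p_am, p_an = p_a[:m], p_a[m:]        # controlled part and null-space part
```

For $n=m$ (the rotary pendulum case), `V_n` has zero columns, so $\ker(D_{ua})$ is trivial and no null-space motion exists.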

Lemma 1. *For $S$ in Eq. (2), if $\operatorname{rank}(D_{au})=m$ holds for $q$ and all $n$ control inputs appear in the $S_u$ dynamics (through $\ddot q_a$), then under the EIC-based control (6), the BEM in Eq. (4) is associated with only $\nu_m^{ext}$, and the robot dynamics can be written as*

No control input appears for the coordinates in $\ker(D_{ua})$, as shown in Eq. (9*b*), and only the $m$ actuated coordinates in $\operatorname{span}(V_m)$ are under active control, as shown in Eq. (9*a*). The results in Lemma 1 reveal the motion property of $S$ under the original EIC-based control design: uncontrolled motion occurs for a special class of underactuated balance robots that satisfy the conditions in Lemma 1. If the unactuated motion depends on only $m$ (out of $n$) control inputs, the motion (9*b*) vanishes, and the EIC-based control works well. In Ref. [5], the EIC-based control worked properly for the rotary inverted pendulum with $n=m=1$. In Refs. [4] and [25], the EIC-based control also worked well for the bikebot with $n=2$ (planar motion) and $m=1$ (roll motion); there, the roll motion depends on the steering control only (not on the velocity control), so the bikebot does not satisfy the conditions of Lemma 1. In Sec. 6, we show with a three-link inverted pendulum platform that the original EIC-based control produces uncontrolled motion.

With the above-discussed motion property under the EIC-based control, we consider the following problem.

*Problem Statement*: The goal is to design an enhanced EIC-based learning control that drives the actuated coordinate $q_a$ to follow a given profile $q_a^d$ and simultaneously stabilizes the unactuated coordinate $q_u$ on the estimated $q_u^e$. The uncontrolled motion presented in Lemma 1 should be avoided for robot dynamics (2).

## 3 Gaussian Process-Based Robot Dynamics Model

We build a GP-based robot dynamics model that will be used for control design in Sec. 4.

### 3.1 Gaussian Process-Based Robot Dynamics Model.

$n_x$ is the dimension of $x$. Denote the training data as $\mathcal{D}=\{X,Y\}=\{x_i,y_i\}_{i=1}^N$, where $X=\{x_i\}_{i=1}^N$, $Y=\{y_i\}_{i=1}^N$, and $N\in\mathbb{N}$ is the number of data points. The GP model is trained by maximizing the posterior probability $p(Y;X,\Theta)$ over the hyperparameters $\Theta$, that is, $\Theta$ is obtained by solving

where $K=(K_{ij})$, $K_{ij}=k(x_i,x_j)=\sigma_f^2\exp\!\left(-\tfrac{1}{2}(x_i-x_j)^TW(x_i-x_j)\right)+\vartheta^2\delta_{ij}$, $W=\operatorname{diag}\{W_1,\ldots,W_{n_x}\}>0$, $\delta_{ij}=1$ for $i=j$ and $\delta_{ij}=0$ otherwise, and $\Theta=\{W,\sigma_f,\vartheta\}$ are the hyperparameters.
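As a sketch, assuming the standard zero-mean GP marginal likelihood for one output channel, the kernel above and the negative log marginal likelihood (whose minimizer maximizes $p(Y;X,\Theta)$) can be written in NumPy as follows. Here `W` holds the diagonal of $W$ and `noise` plays the role of $\vartheta$; the synthetic data and the coarse grid search are illustrative choices, not the authors' training procedure (a gradient-based optimizer is the usual choice in practice).

```python
import numpy as np

def se_ard_kernel(XA, XB, sigma_f, W):
    """k(x, x') = sigma_f^2 exp(-(1/2)(x - x')^T diag(W) (x - x'))."""
    diff = XA[:, None, :] - XB[None, :, :]            # shape (NA, NB, n_x)
    sq = np.einsum("abi,i,abi->ab", diff, W, diff)    # weighted squared distances
    return sigma_f**2 * np.exp(-0.5 * sq)

def neg_log_marginal_likelihood(X, y, sigma_f, W, noise):
    """-log p(y; X, Theta) of a zero-mean GP with K = k(X, X) + noise^2 I."""
    N = X.shape[0]
    K = se_ard_kernel(X, X, sigma_f, W) + noise**2 * np.eye(N)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # alpha = K^{-1} y
    log_det_K = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * y @ alpha + 0.5 * log_det_K + 0.5 * N * np.log(2.0 * np.pi)

# Synthetic 2-D data (illustrative only, not from the experiments)
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(40, 2))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(40)

# Coarse grid over a shared ARD weight, with sigma_f and noise held fixed
candidates = [0.01, 0.1, 1.0, 10.0]
nlls = [neg_log_marginal_likelihood(X, y, 1.0, np.array([w, w]), 0.05) for w in candidates]
best_w = candidates[int(np.argmin(nlls))]
```

The Cholesky factorization gives both $K^{-1}Y$ and $\log\det K$ without forming an explicit inverse, which is the numerically preferred route.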

We build GP models to estimate $H^e=[(H_a^e)^T\ (H_u^e)^T]^T$, where $H_a^e$ and $H_u^e$ are for $S_a$ and $S_u$, respectively. The training data $\mathcal{D}=\{X,Y\}$ are sampled from $S$ with $X=\{q,\dot q,\ddot q\}$ and $Y=\{H^e\}$.
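The variances $\Sigma_a$ and $\Sigma_u$ used later for gain scheduling are GP posterior variances. A minimal sketch of the standard posterior mean and variance for a single output channel follows, assuming a zero-mean prior and the kernel defined in Sec. 3.1. The input $x=(q,\dot q,\ddot q)$ matches the training-data description, while the function name `gp_predict`, the residual function, and the hyperparameter values are hypothetical.

```python
import numpy as np

def se_ard_kernel(XA, XB, sigma_f, W):
    diff = XA[:, None, :] - XB[None, :, :]
    return sigma_f**2 * np.exp(-0.5 * np.einsum("abi,i,abi->ab", diff, W, diff))

def gp_predict(X, y, x_star, sigma_f, W, noise):
    """Posterior mean and variance of a zero-mean GP at a single query x_star."""
    K = se_ard_kernel(X, X, sigma_f, W) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    k_star = se_ard_kernel(X, x_star[None, :], sigma_f, W)[:, 0]   # k(X, x*)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = k_star @ alpha                       # k*^T K^{-1} y
    v = np.linalg.solve(L, k_star)
    var = sigma_f**2 - v @ v                    # k(x*, x*) - k*^T K^{-1} k*
    return mean, var

# Hypothetical 1-DOF residual channel: x = (q, q_dot, q_ddot), y = samples of H^e
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(60, 3))
y = 0.3 * np.sin(X[:, 0]) + 0.1 * X[:, 1] + 0.01 * rng.standard_normal(60)

W = np.array([1.0, 1.0, 1.0])
mean_near, var_near = gp_predict(X, y, np.array([0.1, 0.0, 0.0]), 1.0, W, 0.01)
mean_far, var_far = gp_predict(X, y, np.array([10.0, 10.0, 10.0]), 1.0, W, 0.01)
```

Far from the training data the mean reverts to the prior (zero) and the variance approaches $\sigma_f^2$, which is exactly the situation in which variance-scheduled gains become larger.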

To quantify the GP prediction error, the following property for $\Delta $ is obtained directly from Theorem 6 in Ref. [26].

Lemma 2. *Given the training dataset $\mathcal{D}$, if the kernel function $k(x_i,x_j)$ is chosen such that $H_a^e$ for $S_a$ has a finite reproducing kernel Hilbert space norm $\|H_a^e\|_k<\infty$, then for given $0<\eta_a<1$,*

*where $\Pr\{\cdot\}$ denotes the probability of an event, $\kappa_a\in\mathbb{R}^n$, and its $i$th entry is $\kappa_{ai}=2\|H_{a,i}^e\|_k^2+300\varsigma_i\ln^3\!\left((N+1)/(1-\eta_a^{1/n})\right)$, with $\varsigma_i=\max_{x,x'\in X}\tfrac{1}{2}\ln|1+\vartheta_i^{-2}k_i(x,x')|$. A similar conclusion holds for $\Delta_u$ with $0<\eta_u<1$.*
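To get a feel for the magnitude of the bound, the sketch below evaluates $\kappa_{ai}$ from the expression in Lemma 2. The RKHS norm $\|H^e_{a,i}\|_k$ cannot be computed from data, so it is passed in as an assumed bound; the function name `kappa_entry` and the numbers ($\sigma_f=1$, $\vartheta_i=0.1$, $N=500$, $n=2$) are illustrative. For the squared-exponential kernel, $k_i(x,x')\le\sigma_f^2$ with equality at $x=x'$, so $\varsigma_i=\tfrac12\ln(1+\sigma_f^2/\vartheta_i^2)$ and a single diagonal kernel value suffices.

```python
import numpy as np

def kappa_entry(rkhs_norm_bound, K_channel, noise, N, eta_a, n):
    """kappa_ai = 2||H_{a,i}^e||_k^2 + 300 varsigma_i ln^3((N+1)/(1 - eta_a^(1/n))).

    rkhs_norm_bound : assumed upper bound on ||H_{a,i}^e||_k (not known from data)
    K_channel       : noise-free kernel values k_i(x, x') over training pairs
    noise           : noise hyperparameter vartheta_i of channel i
    """
    varsigma = np.max(0.5 * np.log(np.abs(1.0 + K_channel / noise**2)))
    log_term = np.log((N + 1) / (1.0 - eta_a ** (1.0 / n)))
    return 2.0 * rkhs_norm_bound**2 + 300.0 * varsigma * log_term**3

# SE kernel: the maximum over pairs is attained at x = x', where k = sigma_f^2 = 1
K_diag_only = np.array([[1.0]])
kappa_95 = kappa_entry(1.0, K_diag_only, noise=0.1, N=500, eta_a=0.95, n=2)
kappa_99 = kappa_entry(1.0, K_diag_only, noise=0.1, N=500, eta_a=0.99, n=2)
```

$\kappa_{ai}$ grows as the confidence level $\eta_a$ increases and only polylogarithmically with $N$; the resulting bounds are typically conservative.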

### 3.2 Nominal Model Selection.

The nominal model plays an important role in the EIC control. We consider the following conditions for choosing the nominal model $Sn$ to overcome the uncontrolled motion under the learning control.

$C_1$: $\bar D=\bar D^T$ is positive definite, and $\|\bar D\|\le d$, $\|\bar H\|\le h$, where $0<d,h<\infty$ are constants;

$C_2$: $\operatorname{rank}(\bar D_{aa})=n$ and $\operatorname{rank}(\bar D_{uu})=\operatorname{rank}(\bar D_{ua})=m$; and

$C_3$: the kernel of $\bar D_{ua}$ is nonconstant.

With $C_1$ and $C_2$, the generalized inverses of $\bar D_{aa}$, $\bar D_{uu}$, and $\bar D_{au}$ exist; they are used to compute the auxiliary controls. We can select $\bar D=\bar D^T$ to ensure $\bar D_{au}=\bar D_{ua}^T$. To see why $C_3$ is required, we rewrite $q_a=\sum_{i=1}^n p_{ai}v_i$, where $v_i$ is the $i$th column of $V$. By Eq. (9), under the updated control $v^{int}$, $\ddot q_a=\sum_{i=1}^m\ddot p_{ai}v_i+\sum_{i=m+1}^n\ddot p_{ai}v_i$. Note that the part $\sum_{i=m+1}^n\ddot p_{ai}v_i$ of the $S_a$ dynamics is free of control if $V$ is constant. Although $q_u$ is stabilized on $q_u^e$, $q_a$ then converges to $q_a^d$ only in an $m$-dimensional subspace, and the other $(n-m)$-dimensional motion is uncontrolled. For the system to be stable, the uncontrolled directions cannot stay fixed in the configuration space throughout the entire control process. Therefore, a nonconstant kernel of $\bar D_{ua}$ is needed.

Conditions $C_1$–$C_3$ provide sufficient criteria for nominal model selection. The commonly used nominal model in Refs. [5] and [7] is $\bar D\ddot q=Bu$ with $\bar H=0$. A constant nominal model is used in Ref. [7] because the system there is fully actuated. It is not difficult to satisfy the nominal model conditions in practice. First, the nonlinear term is canceled by feedback linearization, so $\bar H=0$ can be used. Second, the matrix $\bar D$ captures the robot's inertia property, and the masses and link lengths of a robot are usually available or can be measured. Moreover, the dynamics coupling for revolute joints shows up in the inertia matrix as trigonometric functions of the relative joint angles. Therefore, the diagonal elements can be filled with mass or inertia estimates, and the off-diagonal entries can be constructed from trigonometric functions multiplied by inertia constants.
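The recipe above can be turned into a quick numerical check. The sketch below builds a hypothetical 3-DOF nominal inertia matrix ($n=2$, $m=1$) with inertia constants on the diagonal and cosine couplings off the diagonal, takes $\bar H=0$, and tests $C_1$–$C_3$ at a few sampled configurations. The constants `A1`…`B23`, the coupling pattern, and the helper names are illustrative assumptions, not the model of either experimental platform. Sampling can only reveal violations: passing at a finite set of configurations does not prove that the conditions hold everywhere. The norm bounds in $C_1$ hold trivially here because every entry is bounded.

```python
import numpy as np

# Hypothetical inertia constants for a 3-DOF chain (n = 2 actuated, m = 1 unactuated)
A1, A2, A3 = 1.0, 0.5, 0.3        # diagonal inertia estimates
B12, B13, B23 = 0.2, 0.1, 0.1     # coupling constants for the trigonometric terms
n, m = 2, 1

def nominal_inertia(q):
    """Symmetric nominal D_bar(q) with cosine couplings of the joint angles."""
    _, t2, t3 = q
    c2, c3, c32 = np.cos(t2), np.cos(t3), np.cos(t3 - t2)
    return np.array([[A1, B12 * c2, B13 * c3],
                     [B12 * c2, A2, B23 * c32],
                     [B13 * c3, B23 * c32, A3]])

def check_conditions(inertia_fn, qs):
    """Return (C1, C2, C3) flags evaluated over sampled configurations qs (H_bar = 0)."""
    c1 = c2 = True
    kernels = []
    for q in qs:
        D = inertia_fn(q)
        D_aa, D_ua, D_uu = D[:n, :n], D[n:, :n], D[n:, n:]
        c1 &= bool(np.allclose(D, D.T) and np.all(np.linalg.eigvalsh(D) > 0))
        c2 &= bool(np.linalg.matrix_rank(D_aa) == n
                   and np.linalg.matrix_rank(D_uu) == m
                   and np.linalg.matrix_rank(D_ua) == m)
        kernels.append(np.linalg.svd(D_ua)[2][m:].T)   # basis of ker(D_ua), n x (n - m)
    # C3 (sampled check): the unit kernel direction changes between configurations
    ref = kernels[0][:, 0]
    c3 = any(abs(ref @ k[:, 0]) < 1.0 - 1e-6 for k in kernels[1:])
    return c1, c2, c3

qs = [np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.5, 0.1]), np.array([0.3, -0.4, -0.1])]
flags_varying = check_conditions(nominal_inertia, qs)
flags_constant = check_conditions(lambda q: nominal_inertia(np.zeros(3)), qs)
```

The constant model (the nominal inertia frozen at $q=0$) passes $C_1$ and $C_2$ but fails $C_3$, which is the situation in which the $(n-m)$-dimensional motion is left uncontrolled.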

## 4 Gaussian Process-Enhanced External and Internal Convertible-Based Control

In this section, we propose two enhanced controllers using the GP model $S^{gp}$, i.e., the PEIC- and NEIC-based controls. The PEIC-based control aims to eliminate the uncontrolled motion under the original EIC-based control by reassigning the dynamics coupling, while the NEIC-based control directly manages the uncontrolled motion in a transformed space; see Figs. 1(b) and 1(c).

### 4.1 Robust Auxiliary Control.

where $\hat k_{p1}=k_{p1}+k_{n1}\Sigma_a$ and $\hat k_{d1}=k_{d1}+k_{n2}\Sigma_a$ are control gains with parameters $k_{n1},k_{n2}\ge 0$. The GP prediction variance $\Sigma_a$ captures the uncertainty in the robot dynamics and is updated online with sensor measurements.

where $\hat e_u=q_u-\hat q_u^e$ is the unactuated subsystem tracking error relative to the estimated BEM. Similar to $\hat k_{p1}$ and $\hat k_{d1}$, the gains $\hat k_{p2}=k_{p2}+k_{n3}\Sigma_u$ and $\hat k_{d2}=k_{d2}+k_{n4}\Sigma_u$ depend on $\Sigma_u$ through the parameters $k_{n3},k_{n4}\ge 0$.
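As a sketch, assuming the auxiliary inputs take the common PD-plus-feedforward form $\hat v^{ext}=\ddot q_a^d-\hat k_{p1}e_a-\hat k_{d1}\dot e_a$ and $\hat v_u^{int}=\ddot{\hat q}{}_u^e-\hat k_{p2}\hat e_u-\hat k_{d2}\dot{\hat e}_u$ (scalar case, $n=m=1$), the variance-scheduled gains work as follows. The gain constants are those reported for the 2DOF rotary pendulum in Sec. 6.1; the function names, the variance value, and the error values are hypothetical.

```python
def scheduled_gains(k_p, k_d, k_np, k_nd, sigma):
    """Variance-scheduled gains: k_hat = k + k_n * Sigma (Sigma = GP predictive variance)."""
    return k_p + k_np * sigma, k_d + k_nd * sigma

def aux_control(acc_ref, e, e_dot, k_p_hat, k_d_hat):
    """Assumed PD-plus-feedforward auxiliary input: acc_ref - k_p_hat * e - k_d_hat * e_dot."""
    return acc_ref - k_p_hat * e - k_d_hat * e_dot

# Gain constants from the 2DOF rotary pendulum experiment, with an assumed variance of 0.1
kp1_hat, kd1_hat = scheduled_gains(10.0, 3.0, 50.0, 10.0, sigma=0.1)
kp2_hat, kd2_hat = scheduled_gains(1000.0, 100.0, 500.0, 200.0, sigma=0.1)

# Hypothetical errors: e_a = q_a - q_a^d and e_hat_u = q_u - q_hat_u^e
v_ext = aux_control(acc_ref=0.3, e=0.05, e_dot=-0.2, k_p_hat=kp1_hat, k_d_hat=kd1_hat)
v_u_int = aux_control(acc_ref=0.0, e=0.01, e_dot=0.0, k_p_hat=kp2_hat, k_d_hat=kd2_hat)
```

A larger predictive variance raises both stiffness and damping, so the controller becomes more aggressive exactly where the learned model is least trustworthy.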

$\sigma_f$ and $\vartheta$ are the hyperparameters in each channel. Furthermore, we require the control gains to satisfy the following bounds:

for constants $k_{pj},k_{dj}>0$, $j=1,\ldots,4$, where $\lambda(\cdot)$ denotes the eigenvalue operator.

$m$ control inputs. To see this, solving $\ddot q_a$ from $S_a^{gp}$ and substituting it into $S_u^{gp}$ yields

Note that $\bar D_{ua}\in\mathbb{R}^{m\times n}$, $\bar D_{aa}^{-1}\in\mathbb{R}^{n\times n}$, and $q_u$ is overactuated, given $n=\dim(u)\ge m=\dim(q_u)$. For $q_u$ to depend on only $m$ control inputs (the same number as its dimension), $(n-m)$ column vectors of $\bar D_{ua}\bar D_{aa}^{-1}$ should be zero. The EIC-based control is then applied between the same number of actuated and unactuated coordinates, and the uncontrolled motion is avoided.
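The column condition can be checked directly. In the sketch below, the nonzero columns of $\bar D_{ua}\bar D_{aa}^{-1}$ identify which inputs $u_j$ enter the $q_u$ dynamics after $\ddot q_a$ is eliminated. Both matrix pairs and the helper name `inputs_acting_on_qu` are hypothetical examples with $n=2$, $m=1$; the $\bar H$ terms are omitted because they do not affect which inputs appear.

```python
import numpy as np

def inputs_acting_on_qu(D_aa, D_ua, tol=1e-12):
    """Indices j of the inputs u_j that enter the q_u dynamics via D_ua D_aa^{-1} u."""
    coupling = D_ua @ np.linalg.inv(D_aa)      # m x n input-to-q_u coupling matrix
    return [j for j in range(coupling.shape[1]) if np.linalg.norm(coupling[:, j]) > tol]

# Hypothetical case 1: coupled inertia, every input reaches q_u (uncontrolled motion possible)
coupled = inputs_acting_on_qu(np.array([[1.0, 0.2], [0.2, 0.5]]), np.array([[0.1, 0.1]]))

# Hypothetical case 2: (n - m) = 1 zero column, so q_u depends on a single input
decoupled = inputs_acting_on_qu(np.diag([1.0, 0.5]), np.array([[0.0, 0.1]]))
```

In the first case both inputs act on $q_u$, which is the setting of Lemma 1; in the second, only $u_2$ does, so the EIC pairing is between one input and one unactuated coordinate.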

### 4.2 Partial External and Internal Convertible-Based Control Design.

$m$ control inputs for the unactuated subsystem; see Fig. 1(b). To achieve this goal, we partition the actuated coordinates as $q_a=[q_{aa}^T\ q_{au}^T]^T$, with $q_{au}\in\mathbb{R}^m$ and $q_{aa}\in\mathbb{R}^{n-m}$, and the input as $u=[u_a^T\ u_u^T]^T$. The $S^{gp}$ dynamics in Eq. (13) is rewritten as

where $H_a^{na}=\bar D_{aa}^{au}\ddot q_{au}+\bar D_{au}^{a}\ddot q_u+H_{aa}^{gp}$, $H_a^{nu}=\bar D_{aa}^{ua}\ddot q_{aa}+\bar D_{au}^{u}\ddot q_u+H_{au}^{gp}$, and $H_u^n=\bar D_{ua}^{a}\ddot q_{aa}+H_u^{gp}$. Evidently, $S_u^{gp}$ is virtually independent of $S_{aa}^{gp}$, and the dynamics coupling exists only between $S_u^{gp}$ and $S_{au}^{gp}$.

$m$ inputs. The task of driving $q_u$ to $q_u^e$ is assigned to the $q_{au}$ coordinates only. With this observation, the PEIC-based control takes the form $\hat u^{int}=[\hat u_a^T\ \hat u_u^T]^T$ with

where $\hat v^{int}=-(\bar D_{ua}^u)^{-1}(H_u^n+\bar D_{uu}\hat v_u^{int})$. Clearly, the unactuated subsystem depends only on $\hat u_u$ (or $q_{au}$) under the PEIC design, as illustrated in Fig. 1(b). The following lemma presents a qualitative assessment of the PEIC-based control; the proof is given in Appendix A2.
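A small numerical sketch of the PEIC internal input follows, assuming that $S_u^{gp}$ has the form $\bar D_{ua}\ddot q_a+\bar D_{uu}\ddot q_u+H_u^{gp}=0$ (consistent with the definition of $H_u^n$) and that $q_a$ is ordered as $[q_{aa}^T\ q_{au}^T]^T$. The function name and all numerical values are hypothetical. Only the $m\times m$ block $\bar D_{ua}^u$ is inverted, so the balance task uses only $q_{au}$.

```python
import numpy as np

def peic_internal_input(D_ua, D_uu, H_u_gp, qdd_aa, v_u_int):
    """PEIC internal input: v_hat_int = -(D_ua^u)^{-1} (H_u^n + D_uu v_u_int)."""
    m, n = D_ua.shape
    D_ua_a = D_ua[:, : n - m]          # coupling with q_aa, shape m x (n - m)
    D_ua_u = D_ua[:, n - m :]          # coupling with q_au, shape m x m (must be invertible)
    H_u_n = D_ua_a @ qdd_aa + H_u_gp   # H_u^n = D_ua^a qdd_aa + H_u^gp
    return np.linalg.solve(D_ua_u, -(H_u_n + D_uu @ v_u_int))

# Hypothetical values for n = 2, m = 1 (q_aa = theta_1, q_au = theta_2)
D_ua = np.array([[0.1, 0.1]])
D_uu = np.array([[0.3]])
H_u_gp = np.array([-0.5])
qdd_aa = np.array([0.4])
v_u_int = np.array([2.0])
v_hat_int = peic_internal_input(D_ua, D_uu, H_u_gp, qdd_aa, v_u_int)

# With qdd_au = v_hat_int, D_ua qdd_a + D_uu v_u_int + H_u_gp vanishes,
# i.e., the unactuated subsystem is driven to qdd_u = v_u_int.
qdd_a = np.concatenate([qdd_aa, v_hat_int])
residual = D_ua @ qdd_a + D_uu @ v_u_int + H_u_gp
```

Here $\ddot q_{aa}$ is treated as a known quantity (for example, from its own tracking loop or from measurement); it enters only through $H_u^n$.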

Lemma 3. *If conditions $C_1$ to $C_3$ are satisfied and $S^{gp}$ is stable under the EIC-based control design, then $S^{gp}$ is stable under the PEIC-based control $\hat u^{int}$.*

### 4.3 Null-Space External and Internal Convertible-Based Control Design.

where $\tilde v_a^{int}=\tilde v^{int}+\tilde v_a^n$, $\tilde v_a^n=V_n\nu_n$, $\tilde v^{int}=-\bar D_{ua}^+(H_u^{gp}+\bar D_{uu}\hat v_u^{int})$, $\nu_n$ is the control design that drives $p_{ai}$ to $p_{ai}^d$, $i=m+1,\ldots,n$, and $p_a^d=\Upsilon(q_a^d)$ is the transformed reference trajectory. The design of $\nu_n$ drives $e_a$ to the origin in $\ker(\bar D_{ua})$. A straightforward yet effective design is $\nu_n=\alpha\hat\nu_n^{ext}$, where $\alpha>0$. Compared with the PEIC-based control, $p_{an}$ plays a role similar to that of the $q_{aa}$ coordinates. In the new coordinates, $q_u$ is associated with $p_{am}$ only.
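The NEIC input can be sketched the same way. The code below assumes $S_u^{gp}$ has the form $\bar D_{ua}\ddot q_a+\bar D_{uu}\ddot q_u+H_u^{gp}=0$, takes $\hat\nu_n^{ext}=V_n^T\hat v^{ext}$ from the coordinate transformation $\Upsilon$, and uses the design $\nu_n=\alpha\hat\nu_n^{ext}$. The function name `neic_input` and the numerical values are hypothetical. It checks numerically that changing $\alpha$ alters $\tilde v_a^{int}$ only within $\ker(\bar D_{ua})$, so the unactuated dynamics are unaffected, in line with Lemma 4.

```python
import numpy as np

def neic_input(D_ua, D_uu, H_u_gp, v_u_int, v_ext, alpha):
    """NEIC input: -D_ua^+ (H_u^gp + D_uu v_u_int) + V_n nu_n, with nu_n = alpha V_n^T v_ext."""
    m = D_ua.shape[0]
    V_n = np.linalg.svd(D_ua)[2][m:].T                         # basis of ker(D_ua)
    v_int = -np.linalg.pinv(D_ua) @ (H_u_gp + D_uu @ v_u_int)  # original EIC part
    nu_n = alpha * (V_n.T @ v_ext)                             # scaled null-space part of v_ext
    return v_int + V_n @ nu_n

# Hypothetical values for n = 2, m = 1
D_ua = np.array([[0.1, 0.1]])
D_uu = np.array([[0.3]])
H_u_gp = np.array([-0.5])
v_u_int = np.array([2.0])
v_ext = np.array([0.4, -0.2])       # assumed external (tracking) input for q_a

inputs = {a: neic_input(D_ua, D_uu, H_u_gp, v_u_int, v_ext, a) for a in (0.5, 1.0, 1.5)}
# The S_u^gp residual is zero for every alpha: the null-space term leaves S_u unchanged
residuals = {a: D_ua @ v + D_uu @ v_u_int + H_u_gp for a, v in inputs.items()}
```

Because $\bar D_{ua}V_n=0$, the gain $\alpha$ can be tuned for tracking in $\ker(\bar D_{ua})$ without retuning the balance loop.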

The following result gives the property of the NEIC-based control, and the proof is given in Appendix A3.

Lemma 4. *For $S$, if $S^{gp}$ satisfies conditions $C_1$ to $C_3$ and $S^{gp}$ is stable under the original EIC-based control, then $S^{gp}$ under the NEIC-based control $\tilde v_a^{int}$ is also stable. Meanwhile, $S_u^{gp}$ is unchanged compared with that under the EIC-based control.*

The proofs of Lemmas 3 and 4 show that the inputs $\hat u_a^{int}$ and $\tilde u_a^{int}$ follow the control design guidelines. Both the PEIC- and NEIC-based controllers preserve the structured form of the EIC design. Figures 1(b) and 1(c) illustrate the overall flowcharts of the PEIC- and NEIC-based control designs, respectively. To take advantage of the EIC-based structure, we follow the design guideline that the motion of the unactuated coordinates depends on only $m$ inputs, either in the configuration space (PEIC-based control) or in the transformed space (NEIC-based control). Under the NEIC-based control, the input $\nu_n^{ext}$ is reused to control the otherwise uncontrolled motion, whereas the PEIC-based control assigns the balance task to a subset of the actuated coordinates.

## 5 Control Stability Analysis

### 5.1 Closed-Loop Dynamics.

Obtaining the BEM with Eq. (17) under $(\ddot q_{aa},\hat v_u^{ext})$ is equivalent to inverting Eq. (21*c*). Thus, $\hat v_u^{ext}=-(\bar D_{ua}^u)^{-1}H_u^n|_{q_u=\hat q_u^e,\,\dot q_u=\ddot q_u=0}$. Substituting this equation into the $q_{au}$ dynamics yields $\ddot q_{au}=\hat v_u^{ext}+O_{au}$, where $O_{au}=-(\bar D_{ua}^u)^{-1}\bar D_{uu}\hat v_u^{int}-(\bar D_{aa}^u)^{-1}\Delta_{au}+o_1$ and $o_1$ denotes the higher order terms.

with $O_{tot}=[O_a^T\ O_u^T]^T$, $O_a=[O_{aa}^T\ O_{au}^T]^T$, $O_{aa}=-(\bar D_{aa}^a)^{-1}\Delta_{aa}$, $O_u=-\bar D_{uu}^{-1}\big(\Delta_u-\bar D_{ua}^u(\bar D_{aa}^u)^{-1}\Delta_{au}\big)-\Delta v_u^{int}$, $\hat k_p=\operatorname{diag}(\hat k_{p1},\hat k_{p2})$, and $\hat k_d=\operatorname{diag}(\hat k_{d1},\hat k_{d2})$.

$\boldsymbol{e}$, i.e.,

where $d_1=c_2+(1+d_{u2}/\sigma_1)c_4$, $d_2=c_1+(d_{u2}/\sigma_1)c_3$, $l_{a1}=\dfrac{\sigma_{a,\max}(d_{u1}+\sigma_m)}{d_{u1}d_{a1}}$, and $l_{u1}=\sigma_{u,\max}/d_{u1}$.

*a*) into

where $o_2$ is the residual that contains higher order terms, and $O_{am}=o_2-\Lambda_m^{-1}U^T\bar D_{uu}\hat v_u^{int}-\Lambda_m^{-1}U^T\Delta_u-V_m^T\bar D_{aa}^{-1}\Delta_a$ denotes the total perturbation.

where $l_{u2}=\sigma_{u,\max}\dfrac{\sigma_1+d_{u1}}{\sigma_1d_{u1}}$ and $l_{a2}=\sigma_{a,\max}\dfrac{\sigma_m+d_{u1}}{d_{a1}d_{u1}}$.

### 5.2 Stability Results.

for a given positive definite matrix $Q=Q^T$, where $A_0$ is the constant part of $\boldsymbol{A}$ in Eq. (24) and does not depend on the variances $\Sigma_a$ or $\Sigma_u$, $k_p=\operatorname{diag}(k_{p1},k_{p2})$, and $k_d=\operatorname{diag}(k_{d1},k_{d2})$.

We denote the corresponding Lyapunov function candidates for the NEIC- and PEIC-based controls as $V_1$ and $V_2$, respectively. The stability results are summarized as follows, with the proof given in Appendix A4.

and the error $\boldsymbol{e}$ converges to a small ball around the origin, where $\gamma_i$ is the convergence rate, $\rho_i$ and $\varpi_i$ are the perturbation terms, and $0<\eta=\eta_a\eta_u<1$.

## 6 Experimental Results

Two inverted pendulum platforms are used to conduct experiments to validate the control design. The results from each platform demonstrate different aspects of the control design.^{2}

### 6.1 Two Degree-of-Freedom Rotary Inverted Pendulum.

Figure 2(a) shows a 2DOF rotary inverted pendulum fabricated by Quanser Inc., Markham, ON, Canada. The base joint ($\theta_1$) is actuated by a DC motor, and the inverted pendulum joint ($\theta_2$) is unactuated, i.e., $n=m=1$. We use this platform to illustrate the original EIC-based control and to compare the performance under different nominal models and controllers. The robot dynamic model is given in Ref. [27] and also in Appendix B1.

where $c_i=\cos\theta_i$ and $s_i=\sin\theta_i$ for angle $\theta_i$, $i=1,2$. The training data were obtained by applying the control input $u=k^T[\theta_1-\theta_1^t\ \ \theta_2\ \ \dot\theta_1-\dot\theta_1^t\ \ \dot\theta_2]^T$, where $k\in\mathbb{R}^{4\times 1}$ and $\theta_1^t$ was a combination of sinusoidal waves with different amplitudes and frequencies. We chose this input to excite the system, and the gain $k$ was selected without the need to balance the platform. It is difficult to guarantee that the system is fully excited; however, we varied the frequencies of the sinusoidal waves and obtained motion data around the target trajectory.

We trained the GP regression models using a total of 500 data points randomly selected from a large dataset. We designed the control gains as $\hat k_{p1}=10+50\Sigma_a$, $\hat k_{d1}=3+10\Sigma_a$, $\hat k_{p2}=1000+500\Sigma_u$, and $\hat k_{d2}=100+200\Sigma_u$. The variances $\Sigma_a$ and $\Sigma_u$ were updated online with new measurements in real time. The reference trajectory was $\theta_1^d=0.5\sin t+0.3\sin 1.5t$ rad. The control was implemented at 400 Hz in a MATLAB/Simulink real-time system. Both the velocity and the acceleration are needed for control design and for GP training and prediction. To reduce the influence of measurement noise on the control design, BEM estimation, and GP agent training, a sliding window was used to filter the velocity measurement online, and the acceleration was obtained through real-time differentiation. The same technique was also used for the three-link inverted pendulum in Sec. 6.2.
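The text specifies the velocity filter only as a sliding window and the acceleration only as real-time differentiation. A minimal causal sketch under those assumptions is given below: velocities are backward differences averaged over a fixed window, and accelerations are backward differences of the filtered velocity. The class name, the window length, and the choice of a plain moving average are illustrative assumptions, not the authors' implementation.

```python
from collections import deque

class SlidingWindowDifferentiator:
    """Causal velocity/acceleration estimates from sampled joint angles (illustrative).

    Velocity: backward differences of the angle averaged over the last `window` samples.
    Acceleration: backward difference of the filtered velocity.
    """

    def __init__(self, dt, window):
        self.dt = dt
        self.raw_vel = deque(maxlen=window)
        self.prev_angle = None
        self.prev_vel = None

    def update(self, angle):
        """Consume one angle sample and return (velocity, acceleration) estimates."""
        if self.prev_angle is None:          # first sample: no difference available yet
            self.prev_angle = angle
            return 0.0, 0.0
        self.raw_vel.append((angle - self.prev_angle) / self.dt)
        self.prev_angle = angle
        vel = sum(self.raw_vel) / len(self.raw_vel)
        acc = 0.0 if self.prev_vel is None else (vel - self.prev_vel) / self.dt
        self.prev_vel = vel
        return vel, acc

# 400 Hz sampling as in the 2DOF experiment; the window length of 8 samples is hypothetical
estimator = SlidingWindowDifferentiator(dt=1.0 / 400.0, window=8)
for k in range(50):
    vel, acc = estimator.update(2.0 * k / 400.0)   # ramp with a constant slope of 2 rad/s
```

A moving average of $w$ backward differences delays the velocity estimate by roughly $w/2$ samples (about 10 ms for $w=8$ at 400 Hz), so the window length trades noise suppression against the phase lag seen by the feedback loop.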

Figures 3(a) and 3(b) show the tracking of $\theta_1$ and the balance of $\theta_2$ under the EIC-based control. With either $S_{n1}$ or $S_{n2}$, the base link joint $\theta_1$ closely followed the reference trajectory $\theta_1^d$, and the pendulum link joint $\theta_2$ was stabilized around its equilibrium $\theta_2^e$. Under $S_{n1}$, the tracking error was further reduced, and the pendulum closely followed the small variations. With $S_{n2}$, the tracking errors became large when the base link changed its rotation direction; see Fig. 3(c) at $t=10$, 17, and 22 s. Both the time-varying and the constant nominal models worked for the EIC-based learning control.

Table 1 further lists the tracking errors (mean and one standard deviation) under both GP models. For comparison, we also conducted additional experiments with the original EIC-based control and the GP-based MPC design in Ref. [4]. The tracking and balance errors under the EIC-based learning control with model $S_{n1}$ are the smallest. In particular, with the time-varying model $S_{n1}$, the mean values of tracking errors $e_1$ and $e_2$ were reduced by 75% and 65%, respectively, in comparison with those under the original EIC-based control. Compared with the MPC method in Ref. [4], the tracking errors with nominal model $S_{n2}$ are at the same level.

Figure 3(d) shows the control performance with nominal model $S_{n1}$ under disturbance. At $t=17$ s, an impact disturbance (manually pushing the pendulum link) was applied, and the joint angles changed rapidly with $\Delta\theta_1=0.7$ rad and $\Delta\theta_2=0.3$ rad. The control gains increased ($\hat k_{p2}=1215$, $\hat k_{d2}=143$) in response to the disturbance. As a result, the pendulum motion tracked the BEM closely and maintained balance after the impact. Figure 3(e) shows the calculated Lyapunov function candidate $V(t)$ and its envelope (i.e., $V(0)e^{-\gamma t}$ with $\gamma=0.1898$) during the experiment. Figure 3(f) shows the error trajectory in the $\|e_q\|$–$\|\dot e_q\|$ plane; the solid/dashed line shows the error trajectory before/after the impact disturbance. The tracking error converged quickly into the error bound. After the disturbance was applied at $t=17$ s, both the Lyapunov function and the errors grew dramatically. As the control gains increased, the errors quickly converged back into the estimated bound.

### 6.2 Three Degree-of-Freedom Rotary Inverted Pendulum.

$\theta_1$ and $\theta_2$) and one unactuated joint ($\theta_3$), namely, $n=2$ and $m=1$. The physical model of the robot dynamics was obtained using the Lagrangian method and is given in Appendix B2. All controllers were implemented at an update frequency of 200 Hz through the Robot Operating System. The time-varying nominal model was selected as

where $c_{i\pm j}=\cos(\theta_i\pm\theta_j)$. The control gains were $\hat k_{p1}=15I_2+20\Sigma_a$, $\hat k_{d1}=3I_2+10\Sigma_a$, $\hat k_{p2}=25+20\Sigma_u$, and $\hat k_{d2}=5.5+10\Sigma_u$, where the GP variances $\Sigma_a$ and $\Sigma_u$ were updated online in real time. The reference trajectories were chosen as $\theta_1^d=0.5\sin 1.5t$ and $\theta_2^d=0.4\sin 3t$ rad.

For the PEIC-based control, we chose $q_{aa}=\theta_1$ and $q_{au}=\theta_2$; for the NEIC-based control, we used $\nu_n=\hat\nu_n^{ext}$. Figure 4 shows the experimental results under the PEIC- and NEIC-based controls. Under both controllers, the actuated joints ($\theta_1$ and $\theta_2$) closely followed the given reference trajectories ($\theta_1^d$ and $\theta_2^d$), and the unactuated joint ($\theta_3$) was balanced around the BEM ($\theta_3^e$), as shown in Figs. 4(a) and 4(b). The pendulum link motion displayed a similar pattern under both controllers. However, the tracking error $e_1$ under the PEIC-based control (from −0.05 to 0.05 rad) was much smaller than that under the NEIC-based control (from −0.15 to 0.15 rad); see Figs. 4(c) and 4(d). Under the PEIC-based control, the balance task was assigned to joint $\theta_2$, and joint $\theta_1$ was virtually independent of $\theta_2$ and $\theta_3$; joint $\theta_1$ therefore achieved almost-perfect tracking regardless of the errors in $\theta_2$ and $\theta_3$. Under the NEIC-based control, by contrast, the null-space compensation acted across the entire configuration space, so any motion error in the unactuated joint affected the motion of all actuated joints. Similar to the previous example, Fig. 4(e) shows the error trajectory profile in the $\|e_q\|$–$\|\dot e_q\|$ plane. Figure 4(f) shows the Lyapunov function profiles under the PEIC- and NEIC-based controls.

Figure 5 shows the motion of the actuated coordinates in the transformed coordinates $p_a$ under the various controllers. Under the PEIC- and NEIC-based controls, the $p_a$ variables followed the reference profile $p_a^d$, as shown in Figs. 5(a) and 5(b). Figure 5(c) shows the motion profile under the original EIC-based control. In the first 2 s, joint $\theta_3$ followed the BEM, and the $p_{a1}$ coordinate displayed a similar motion pattern. However, the $p_{a2}$ coordinate diverged and eventually led to complete failure. Therefore, as analyzed previously, the system became unstable under the EIC-based control even though conditions $C_1$ to $C_3$ were satisfied.

In the NEIC-based control, $\nu_n$ drives the uncontrolled motion variable to its reference trajectory. To further reduce the tracking error, we can increase $\alpha$. Figure 6 shows the experimental $p_a$ error profiles for $\alpha$ values ranging from 0.5 to 1.5. With a larger $\alpha$, the tracking error of the actuated coordinates was reduced. Table 2 further lists the steady-state errors (in joint angles) under the NEIC-based control with various $\alpha$ values, the PEIC-based control, and the physical model-based control designs. Under the NEIC-based control with $\alpha=0.5$, the system was stabilized; when $\alpha$ was increased to 1 and 1.5, the mean tracking errors for $\theta_1$ were reduced by approximately 50% and 70%, respectively, and those for $\theta_2$ by roughly one third. Since the control input $\nu_n$ did not affect the balance task of the unactuated subsystem, the tracking errors for $\theta_3$ remained at the same level. It is of interest that the control effort (last column of Table 2) changes only slightly with $\alpha$.

| Controller | $\lvert e_1\rvert$ (rad) | $\lvert e_2\rvert$ (rad) | $\lvert e_3\rvert$ (rad) | $\lVert e\rVert$ | $\int u^Tu\,dt$ |
|---|---|---|---|---|---|
| PEIC (GP) | 0.0302 ± 0.0178 | 0.0566 ± 0.0685 | 0.1182 ± 0.0160 | 0.1343 ± 0.0166 | 5.7659 |
| NEIC (GP, $\alpha=0.5$) | 0.1395 ± 0.0946 | 0.1166 ± 0.0512 | 0.0303 ± 0.0209 | 0.2001 ± 0.0770 | 5.9022 |
| NEIC (GP, $\alpha=1.0$) | 0.0651 ± 0.0416 | 0.0756 ± 0.0481 | 0.0195 ± 0.0152 | 0.1101 ± 0.0499 | 5.7089 |
| NEIC (GP, $\alpha=1.5$) | 0.0376 ± 0.0302 | 0.0792 ± 0.0482 | 0.0207 ± 0.0169 | 0.0972 ± 0.0470 | 5.7305 |
| PEIC (model) | 0.2168 ± 0.1165 | 0.2398 ± 0.1649 | 0.0179 ± 0.0140 | 0.3587 ± 0.1307 | 5.7978 |
| NEIC (model, $\alpha=1.0$) | 0.1374 ± 0.0922 | 0.1237 ± 0.0597 | 0.0455 ± 0.0385 | 0.2095 ± 0.0769 | 5.8452 |


### 6.3 Discussion.

For the rotary pendulum example, we have $n=m$, and the null space $\ker(D_{au})$ vanishes. The compensation effect of the NEIC-based control is therefore no longer needed, i.e., $\tilde{v}_a^{\mathrm{int}}=\tilde{v}^{\mathrm{int}}$ and $\tilde{u}^{\mathrm{int}}=\bar{D}_{aa}\tilde{v}_a^{\mathrm{int}}+\bar{D}_{au}\ddot{q}_u+H_a^{gp}=u^{\mathrm{int}}$. In this case, both the PEIC- and NEIC-based controls degenerate to the EIC-based control. For the 3DOF inverted pendulum, the control inputs $u_1$ and $u_2$ act on joint $\theta_3$ through $\ddot{\theta}_1$ and $\ddot{\theta}_2$. Therefore, as shown in Lemma 1, uncontrolled motion exists since all controls show up in the $\mathcal{S}_u$ dynamics. This observation explains why the original EIC-based control failed to balance the three-link inverted pendulum. If the $\mathcal{S}_u$ dynamics are related to only $m$ control inputs (through $\ddot{q}_a$) for $n>m$, as in the bikebot dynamics in Refs. [4] and [25], only $m$ external controls are updated, and the EIC-based control works well without any uncontrolled motion.

For the PEIC-based control, the robot dynamics were partitioned into $\mathcal{S}^{gp}=\{\mathcal{S}_{aa}^{gp},\{\mathcal{S}_{au}^{gp},\mathcal{S}_u^{gp}\}\}$, which contains a fully actuated subsystem $\mathcal{S}_{aa}^{gp}$ and a reduced-order underactuated subsystem $\{\mathcal{S}_{au}^{gp},\mathcal{S}_u^{gp}\}$. The EIC-based control is applied to $\mathcal{S}_{au}^{gp}$ and $\mathcal{S}_u^{gp}$ only. In general, the dynamics of $q_u$ do not depend on any specific $m$ actuated coordinates, since the mapping $\Upsilon$ is time-varying across control cycles. In the NEIC-based control design, $p_{am}$ and $q_u$ form an underactuated subsystem, and $p_{an}$ is fully actuated.

In practice, no specific rule is defined for selecting $q_a^u$ from the $q_a$ coordinates, and therefore, there are a total of $C_n^m=n!/(m!(n-m)!)$ options for selecting the coordinates. We take advantage of this property to optimize the tracking performance of selected coordinates. In the 3DOF pendulum case, we assigned the balance task of $\theta_3$ to the $\theta_2$ motion. The length of link 1 was only 0.09 m, much shorter than that of link 2 (0.23 m), and the coupling effect between $\theta_2$ and $\theta_3$ was much stronger than that between $\theta_1$ and $\theta_3$; see $D_{13}$ and $D_{23}$ in Appendix B2. Thus, it was efficient to use the motion of $\theta_2$ as a virtual control input to balance $\theta_3$. When the PEIC-based controller was implemented with $q_a^u=\theta_1$, the system could not achieve the desired performance and became unstable. We also implemented the proposed controllers with the physical model; the resulting errors are listed in Table 2. Compared with the learning-based controllers, the model-based controls resulted in larger overall errors (e.g., $\lVert e\rVert=0.3587$ versus 0.1343 for the PEIC-based control, and 0.2095 versus 0.1101 for the NEIC-based control with $\alpha=1.0$). Since mechanical friction and other unstructured effects were not considered, the physical model might not accurately capture the robot dynamics. These results confirm the advantages of the proposed learning-based control approaches.
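The selection count $C_n^m$ can be made concrete by enumerating the candidate splits. The Python sketch below lists every way to assign $m$ of the $n$ actuated coordinates to $q_a^u$ (the balance task) and the remaining $n-m$ to $q_a^a$. It only enumerates the options; it does not rank them, which in the text was decided from the coupling terms and the experiments. The function name `balance_assignments` and the joint labels are illustrative.

```python
from itertools import combinations
from math import comb


def balance_assignments(actuated, m):
    """Yield every (q_aa, q_au) split of the actuated coordinates.

    q_au holds the m coordinates assigned to the balance task, and q_aa holds
    the remaining n - m fully actuated coordinates (illustrative naming).
    """
    for q_au in combinations(actuated, m):
        q_aa = tuple(q for q in actuated if q not in q_au)
        yield q_aa, q_au


if __name__ == "__main__":
    actuated = ("theta1", "theta2")  # 3DOF pendulum: n = 2 actuated joints, m = 1
    options = list(balance_assignments(actuated, 1))
    print(len(options) == comb(len(actuated), 1), options)
```

For the 3DOF pendulum, the two options are $q_a^u=\theta_1$ and $q_a^u=\theta_2$; the experiments above used the latter.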

Compared with other learning-based control approaches [18,22], the unique feature of the proposed control lies in its integration of the robot's inherent dynamics property (the EIC structure) with GP-based model learning. By integrating physics knowledge into model learning, we identified the conditions for nominal model selection. The overall model learning and control design framework forms a white-box-like control that incorporates physics knowledge, which differs from reinforcement learning-based policy search approaches [18]. The solution also has the potential to incorporate the bounded GP prediction error into a robust control design [4].

## 7 Conclusion

This paper presented a new learning-based modeling and control framework for underactuated balance robots. The proposed design extended and improved the EIC-based control with GP-enabled robot dynamics. The proposed robot controllers preserved the structural design of the original EIC-based control and achieved both tracking and balance tasks. The PEIC-based control reshaped the coupling between the actuated and unactuated coordinates, partitioning the robot dynamics into a fully actuated subsystem and a reduced-order underactuated balance subsystem. The NEIC-based control compensated for the uncontrolled motion in a subspace. We validated and demonstrated the new control designs on two experimental platforms and confirmed that stability and balance were guaranteed. Comparisons with the physical model-based EIC control and the MPC design confirmed superior performance in terms of the error bound. Extending the GP-based learning control design to highly underactuated balance robots is one of the ongoing research directions.

## Funding Data

U.S. National Science Foundation (NSF) (Award No. CNS-1932370; Funder ID: 10.13039/100000001).

## Data Availability Statement

No data, models, or code were generated or used for this paper.

## Nomenclature

- $e_a, e_u, e$ = tracking, balance, and overall errors
- $p_a, \nu^{\mathrm{ext}}$ = transformed $q_a$ and $v^{\mathrm{ext}}$ in $p$ coordinates
- $p_{am}, p_{an}$ = controlled and uncontrolled coordinates
- $q_a, q_u$ = coordinates for actuated and unactuated subsystems
- $q_a^a, q_a^u$ = partitioned actuated coordinates in $(n-m)$- and $m$-dimensions
- $q_u^e, \hat{q}_u^e$ = actual and estimated BEMs
- $\mathcal{S}$ = robot dynamics
- $\mathcal{S}^n, \mathcal{S}^{gp}$ = nominal and GP-based robot dynamics
- $u^{\mathrm{int}}, \hat{u}^{\mathrm{int}}, \tilde{u}^{\mathrm{int}}$ = EIC-, PEIC-, and NEIC-based control inputs
- $v^{\mathrm{ext}}, v^{\mathrm{int}}$ = trajectory tracking and balanced-embedded control inputs
- $v_u^{\mathrm{int}}$ = BEM stabilization control input
- $\hat{v}_a^{\mathrm{ext}}, \hat{v}_u^{\mathrm{ext}}$ = trajectory tracking control inputs for $q_a^a$ and $q_a^u$
- $\tilde{v}_a^{\mathrm{int}}$ = control input for $p_{am}$
- $\gamma, r$ = convergence rate and error bound
- $\Delta_a, \Delta_u$ = estimation errors of actuated and unactuated dynamics

### Appendix A: Proofs

##### A1 Proof of Lemma 1.

…, the SVD in Eq. (7) exists and all $m$ singular values are greater than zero, i.e., $\sigma_i>0$. Thus, $\ker(D_{au})=V_n$ contains $n-m$ column vectors. Plugging Eq. (7) into Eq. (A1) and considering the coordinate transformation, we obtain

where $U\Lambda V^Tv^{\mathrm{ext}}=U\Lambda_m\nu_m^{\mathrm{ext}}$ is used based on the fact that $\Lambda\in\mathbb{R}^{m\times n}$ is a rectangular diagonal matrix.

The BEM $\mathcal{E}$ depends only on $\nu_m^{\mathrm{ext}}$; that is, the control effect in $\ker(D_{ua})$ is not used when obtaining the BEM.
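To make the coordinate split in this proof concrete, the NumPy sketch below factors a generic $m\times n$ matrix `D` (standing in for the matrix decomposed in Eq. (7), assumed to have full row rank $m$, consistent with $\Lambda\in\mathbb{R}^{m\times n}$) and checks numerically that $U\Lambda V^Tv^{\mathrm{ext}}=U\Lambda_m\nu_m^{\mathrm{ext}}$, i.e., the components of $\nu^{\mathrm{ext}}$ along $V_n$ have no effect. The function name `svd_coordinates` and the random test matrices are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def svd_coordinates(D: np.ndarray, v_ext: np.ndarray):
    """Split v_ext using the full SVD D = U @ Lam @ V.T, where D is m x n (m <= n).

    Returns U, the singular values s, nu_m (first m entries of V.T @ v_ext),
    nu_n (remaining n - m entries), and V_n, whose columns span ker(D).
    """
    m, n = D.shape
    U, s, Vt = np.linalg.svd(D, full_matrices=True)  # U: m x m, s: (m,), Vt: n x n
    nu = Vt @ v_ext                  # transformed input, nu = V^T v_ext
    nu_m, nu_n = nu[:m], nu[m:]
    V_n = Vt[m:].T                   # n x (n - m) basis of the null space
    return U, s, nu_m, nu_n, V_n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((1, 2))  # hypothetical coupling matrix with m = 1, n = 2
    v_ext = rng.standard_normal(2)
    U, s, nu_m, nu_n, V_n = svd_coordinates(D, v_ext)
    # Lam @ nu keeps only the first m entries, so D @ v_ext == U @ (s * nu_m)
    print(np.allclose(D @ v_ext, U @ (s * nu_m)), np.allclose(D @ V_n, 0.0))
```

With a square `D` ($n=m$), `V_n` is empty, which mirrors the rotary pendulum case discussed in Sec. 6.3.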

##### A2 Proof of Lemma 3.

Similarly, we obtain

where $\hat{v}_a^{\mathrm{int}}=[(\hat{v}_a^{\mathrm{ext}})^T\ (\hat{v}_u^{\mathrm{int}})^T]^T$. Since $\hat{v}_a^{\mathrm{int}}$ is not obtained in the same way as in Eq. (5), i.e., $\hat{v}_a^{\mathrm{int}}\notin\ker(\bar{D}_{ua})$, we have $v_{m+j}^T\hat{v}_a^{\mathrm{int}}\neq0$, and $p_{an}$ is under active control. Meanwhile, $v_{m+j}^T\hat{v}_a^{\mathrm{int}}$ drives $q_a\to q_a^d$ in $\ker(\bar{D}_{au})$, given that $\hat{v}_a^{\mathrm{ext}}$ and $\hat{v}_u^{\mathrm{int}}$ are designed to drive $q_a\to q_a^d$. Therefore, if the unperturbed system is stable under the original EIC-based control, it is also stable under the PEIC-based control.

##### A3 Proof of Lemma 4.

Clearly, the $\mathcal{S}_u^{gp}$ dynamics are unchanged compared with Eq. (9).

##### A4 Proof of Theorem 1.

We present the stability proof for the PEIC- and NEIC-based controls using the Lyapunov method.

*PEIC-Based Control*: Plugging Eq. (24) into $V_1=V$ and considering Eq. (32), we obtain $\dot{V}_1=e^T(A^TP+PA)e+2e^TPO_1=-e^TQe+e^TQ_\Sigma e+2e^TPO_1$, where $Q_\Sigma=(A-A_0)^TP+P(A-A_0)$. The bounded variance leads to bounded eigenvalues of matrix $Q_\Sigma$. Since $Q_\Sigma=Q_\Sigma^T$, the eigenvalues of $Q_\Sigma$ are real.

Here, $\rho_1=2d_1\lambda_{\max}(P)\|e\|$ and $\varpi_1=2\omega_1\lambda_{\max}(P)\|e\|$. With the bounded perturbations $\rho_1$ and $\omega_1$, the closed-loop system dynamics can be shown to be stable in probability, i.e., $\Pr\{\dot{V}_1\le-\gamma_1V_1+\rho_1+\varpi_1\}>\eta$. Further analysis yields a nominal estimate of the error convergence, $\Pr\{V_1\le V_1(0)e^{-\gamma_1t}\}>\eta$, and the error bound estimate $\Pr\{\|e\|\le r_1\}>\eta$ with $r_1=2d_1\lambda_{\max}(P)/(\lambda_{\min}(Q)-\lambda_{\max}(Q_\Sigma)-2d_2\lambda_{\max}(P))$.
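The quantities in this bound can be computed numerically once $A_0$, $A$, $Q$, $d_1$, and $d_2$ are fixed. The NumPy sketch below assumes that Eq. (32) is the Lyapunov equation $A_0^TP+PA_0=-Q$ (which is what the expansion of $\dot{V}_1$ above implies), solves it by vectorization, forms $Q_\Sigma$, and evaluates the margin $\lambda_{\min}(Q)-\lambda_{\max}(Q_\Sigma)-2d_2\lambda_{\max}(P)$ together with $r_1$. The matrices and the values of $d_1$ and $d_2$ are hypothetical placeholders, not the experimental gains.

```python
import numpy as np


def solve_lyapunov(A0: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Solve A0^T P + P A0 = -Q for symmetric P using column-major vectorization."""
    n = A0.shape[0]
    I = np.eye(n)
    # vec(A0^T P) = (I kron A0^T) vec(P) and vec(P A0) = (A0^T kron I) vec(P)
    M = np.kron(I, A0.T) + np.kron(A0.T, I)
    p = np.linalg.solve(M, -Q.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # remove round-off asymmetry


def error_bound(A, A0, Q, d1, d2):
    """Return P, Q_sigma, the stability margin, and the error bound r1."""
    P = solve_lyapunov(A0, Q)
    dA = A - A0
    Q_sigma = dA.T @ P + P @ dA          # symmetric because P is symmetric
    lam_P = np.linalg.eigvalsh(P).max()
    margin = (np.linalg.eigvalsh(Q).min()
              - np.linalg.eigvalsh(Q_sigma).max()
              - 2.0 * d2 * lam_P)
    r1 = 2.0 * d1 * lam_P / margin if margin > 0 else np.inf
    return P, Q_sigma, margin, r1


if __name__ == "__main__":
    A0 = np.array([[0.0, 1.0], [-4.0, -4.0]])      # hypothetical nominal error dynamics
    A = A0 + np.array([[0.0, 0.0], [-0.2, -0.1]])  # hypothetical perturbed dynamics
    P, Q_sigma, margin, r1 = error_bound(A, A0, np.eye(2), d1=0.01, d2=0.05)
    print(P, margin, r1)
```

For this example, $P=\begin{bmatrix}1.125&0.125\\0.125&0.15625\end{bmatrix}$, the margin is positive, and $r_1\approx0.026$. The vectorized solve keeps the sketch dependent only on NumPy; `scipy.linalg.solve_continuous_lyapunov` is the usual choice for larger systems.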

*NEIC-Based Control*: Without loss of generality, we select $\nu_n=V_n^T\hat{v}^{\mathrm{ext}}$. We take $V_2=V$ as the Lyapunov function candidate for $\mathcal{S}_{e,\mathrm{NEIC}}$. If the control gains are the same as those in the PEIC-based control and $\alpha=1$ for the compensation effect, then $\gamma_2=\gamma_1$. We choose the control gains properly such that $\gamma_2>0$. The system can be shown to be stable as $\Pr\{\dot{V}_2\le-\gamma_2V_2+\rho_2+\varpi_2\}>\eta$, where $\rho_2=2d_1\lambda_{\max}(P)\|e\|$, $\varpi_2=2\omega_2\lambda_{\max}(P)\|e\|$, and $\omega_2=l_{u2}\|\kappa_u\|+l_{a2}\|\kappa_a\|$ is defined in the same way as $\omega_1$ and contains the GP prediction uncertainties. A nominal estimate of the error convergence and the final error bound can also be obtained.

To show $\gamma_i>0$, $i=1,2$, the control gains should be properly selected. With a small predefined error limit as the stopping criterion in BEM estimation, the $c_i$ values can be shown to satisfy $c_i\ll1$. Given the explicit forms, the $d_i$ are estimated; for given $A_0$ and $Q$, $P$ is obtained by solving Eq. (32). The matrix $Q_\Sigma$ depends on the control gains associated with the reduction variance. Since the variance is bounded, we design $k_{ni}$ such that $\lambda_{\max}(Q_\Sigma)$ satisfies the inequality $\lambda_{\min}(Q)-\lambda_{\max}(Q_\Sigma)-2d_2\lambda_{\max}(P)>0$, and then $\gamma_1>0$. Thus, stability is obtained.

### Appendix B: Dynamics Model of Underactuated Balance Robots

##### B1 Rotary Inverted Pendulum.

where $l_r$, $J_r$, and $d_r$ are the length, mass inertia, and viscous damping coefficient of the base link; $l_p$, $J_p$, and $d_p$ are the corresponding parameters of the pendulum; $m_p$ is the pendulum mass; $g$ is the gravitational constant; and $k_t$, $k_m$, $K_G$, $R_m$, and $C$ are robot constants. The values of these parameters can be found in Ref. [27]. The control input is the motor voltage, i.e., $u=V_m$.

##### B2 Three-Link Inverted Pendulum.

where $m_i$, $l_i$, and $J_i$ are the mass, length, and mass inertia of each link, and $s_{i+j}=\sin(\theta_i+\theta_j)$. Matrix $C$ is obtained as $C_{ij}=\sum_{k=1}^{3}c_{ijk}\dot{\theta}_k$, where the Christoffel symbols are $c_{ijk}=\frac{1}{2}\left(\frac{\partial D_{ij}}{\partial\theta_k}+\frac{\partial D_{ik}}{\partial\theta_j}-\frac{\partial D_{jk}}{\partial\theta_i}\right)$. The physical parameters are $m_1=0.7$ kg, $m_2=1.3$ kg, $m_3=0.3$ kg, $l_1=0.065$ m, $l_2=0.23$ m, $l_3=0.25$ m, $J_1=0.0008$ kg m$^2$, $J_2=0.005$ kg m$^2$, and $J_3=0.003$ kg m$^2$.
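The Christoffel-symbol formula for $C$ can be evaluated numerically for any inertia matrix. Because the full $D(\theta)$ of the three-link pendulum is not reproduced here, the sketch below uses a hypothetical two-link inertia matrix with made-up constants and approximates $\partial D/\partial\theta_k$ by central finite differences. The names `two_link_inertia` and `coriolis_matrix` are illustrative. The result can be checked against the closed-form Coriolis matrix of that two-link example and against the standard property that $\dot{D}-2C$ is skew-symmetric under this definition of $c_{ijk}$.

```python
import numpy as np


def two_link_inertia(theta, a=0.05, b=0.01, c=0.02):
    """Hypothetical 2-link inertia matrix D(theta); a, b, c are illustrative constants."""
    c2 = np.cos(theta[1])
    return np.array([[a + 2.0 * b * c2, c + b * c2],
                     [c + b * c2, c]])


def coriolis_matrix(inertia, theta, theta_dot, h=1e-6):
    """C_ij = sum_k c_ijk * theta_dot_k, with
    c_ijk = 0.5 * (dD_ij/dtheta_k + dD_ik/dtheta_j - dD_jk/dtheta_i).

    Partial derivatives are approximated by central finite differences.
    Returns C and dD, where dD[k] approximates dD/dtheta_k.
    """
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    dD = np.empty((n, n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = h
        dD[k] = (inertia(theta + step) - inertia(theta - step)) / (2.0 * h)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c_ijk = 0.5 * (dD[k][i, j] + dD[j][i, k] - dD[i][j, k])
                C[i, j] += c_ijk * theta_dot[k]
    return C, dD


if __name__ == "__main__":
    theta, theta_dot = np.array([0.3, -0.7]), np.array([1.2, -0.5])
    C, dD = coriolis_matrix(two_link_inertia, theta, theta_dot)
    D_dot = np.tensordot(theta_dot, dD, axes=1)  # sum_k theta_dot_k * dD/dtheta_k
    N = D_dot - 2.0 * C
    print(np.allclose(N, -N.T, atol=1e-8))       # skew-symmetry check
```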

## Footnotes

The video of the experiment is available at https://www.youtube.com/watch?v=ZOYb0UW3KS8