# Papers

Academic writings, organized by category in reverse chronological order.

# Preprints

## 2024

- arXiv
We present EGN, a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix whose size is that of the mini-batch. This is particularly advantageous for large-scale machine learning problems, where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges to an ϵ-stationary point at a linear rate. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at worst matches, the generalization performance of well-tuned SGD, Adam, and SGN optimizers across various supervised and reinforcement learning tasks.

- arXiv
The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but also for its close relation with neural tangent kernel regression, which is central to recent studies that aim to understand the optimization and generalization properties of neural networks. This work studies a GGN method for optimizing a two-layer neural network with explicit regularization. In particular, we consider a class of generalized self-concordant (GSC) functions that provide smooth approximations to commonly used penalty terms in the objective function of the optimization problem. This approach provides an adaptive learning-rate selection technique that requires little to no tuning for optimal performance. We study the convergence of the resulting GGN method when optimizing the two-layer network, considered to be overparameterized, for a given scaling of the network parameters. Our numerical experiments highlight specific aspects of GSC regularization that help improve the generalization of the optimized neural network. The code to reproduce the experimental results is available at https://github.com/adeyemiadeoye/ggn-score-nn.

## 2023

- arXiv
*Adeyemi D Adeoye* and Alberto Bemporad
We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other possibly nonsmooth. The key highlight of our approach lies in a natural property of the resulting problem's structure, which provides us with a variable-metric selection method and a step-length selection rule particularly suitable for proximal Newton-type algorithms. In addition, we efficiently handle specific structures promoted by the nonsmooth function, such as l1-regularization and group-lasso penalties. We prove the convergence of the two resulting algorithms: Prox-N-SCORE, a proximal Newton algorithm, and Prox-GGN-SCORE, a proximal generalized Gauss-Newton algorithm. The Prox-GGN-SCORE algorithm highlights an important approximation procedure that significantly reduces most of the computational overhead associated with the inverse Hessian. This approximation is especially useful for overparameterized machine learning models and in mini-batch settings. Numerical examples on both synthetic and real datasets demonstrate the efficiency of our approach and its superiority over existing approaches. A Julia package implementing the proposed algorithms is available at https://github.com/adeyemiadeoye/SelfConcordantSmoothOptimization.jl.
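For readers unfamiliar with the proximal machinery these algorithms build on, here is a minimal sketch of plain proximal gradient descent with l1 soft-thresholding (illustration only; Prox-N-SCORE and Prox-GGN-SCORE add Newton-type variable metrics and self-concordant smoothing on top of this basic scheme):

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# Proximal-gradient loop for  min_x  0.5*||Ax - b||^2 + mu*||x||_1
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0                          # sparse ground truth
b = A @ x_true
mu = 0.1
step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1/L, L = Lipschitz constant of the smooth gradient

x = np.zeros(100)
for _ in range(500):
    grad = A.T @ (A @ x - b)              # gradient of the smooth part
    x = soft_threshold(x - step * grad, step * mu)  # prox step on the nonsmooth part

objective = 0.5 * np.linalg.norm(A @ x - b) ** 2 + mu * np.linalg.norm(x, 1)
```

A proximal Newton-type method replaces the scalar step `step` with a variable metric built from (approximate) second-order information, which is where the structure exploited by the paper's algorithms enters.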

# Journal articles

## 2024

- IEEE-TNNLS
*Adeyemi D. Adeoye* and Alberto Bemporad, *IEEE Transactions on Neural Networks and Learning Systems*, 2024
This article considers the two-stage approach to solving a partially observable Markov decision process (POMDP): the identification stage and the (optimal) control stage. We present an inexact sequential quadratic programming framework for recurrent neural network learning (iSQPRL) for solving the identification stage of the POMDP, in which the true system is approximated by a recurrent neural network (RNN) with dynamically consistent overshooting (DCRNN). We formulate the learning problem as a constrained optimization problem and study the quadratic programming (QP) subproblem, with a convergence analysis under a restarted Krylov-subspace iterative scheme that implicitly exploits the structure of the associated Karush–Kuhn–Tucker (KKT) subsystem. In the control stage, where a feedforward neural network (FNN) controller is designed on top of the RNN model, we adapt a generalized Gauss–Newton (GGN) algorithm that exploits useful approximations to the curvature terms of the training data and selects its mini-batch step size using a known property of some regularization function. Simulation results are provided to demonstrate the effectiveness of our approach.

## 2023

- COAP
*Adeyemi D Adeoye* and Alberto Bemporad, *Computational Optimization and Applications*, 2023
Optimization problems that include regularization functions in their objectives are regularly solved in many applications. When one seeks second-order methods for such problems, it may be desirable to exploit specific properties of some of these regularization functions when accounting for curvature information in the solution steps to speed up convergence. In this paper, we propose the SCORE (self-concordant regularization) framework for unconstrained minimization problems, which incorporates second-order information in the Newton-decrement framework for convex optimization. We propose the generalized Gauss–Newton with Self-Concordant Regularization (GGN-SCORE) algorithm that updates the minimization variables each time it receives a new input batch. The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing computational overhead. GGN-SCORE demonstrates how to speed up convergence while also improving model generalization for problems that involve regularized minimization under the proposed SCORE framework. Numerical experiments show the efficiency of our method and its fast convergence, which compare favorably against baseline first-order and quasi-Newton methods. Additional experiments involving non-convex (overparameterized) neural network training problems show that the proposed method is promising for non-convex optimization.

## 2020

- JTUSCI
J. U. Abubakar and *A. D. Adeoye*, *Journal of Taibah University for Science*, 2020
A porous, tapered, inclined stenosed artery under the influence of a magnetic field with radiation was considered. The momentum and energy equations with radiation governing the blood flow in the inclined artery were obtained, taking the flow to be Newtonian. These equations were simplified under the assumption of mild stenosis, non-dimensionalized, and solved using the Differential Transform Method (DTM). The DTM was coded in Mathematica to obtain expressions for the velocity, temperature, and volumetric flow rate of the blood. The results, presented graphically, show that the velocity of the blood flow and the blood temperature decrease as the radiation parameter (N) increases.

# Misc., Technical reports & theses

- arXiv
*Adeyemi D Adeoye* and Alberto Bemporad, 2021
In this paper we propose the SC-Reg (self-concordant regularization) framework for learning overparameterized feedforward neural networks by incorporating second-order information in the Newton decrement framework for convex problems. We propose the generalized Gauss-Newton with Self-Concordant Regularization (SCoRe-GGN) algorithm that updates the network parameters each time it receives a new input batch. The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing the training computational overhead. Although our current analysis considers only the convex case, numerical experiments show the efficiency of our method and its fast convergence under both convex and non-convex settings, which compare favorably against baseline first-order methods and a quasi-Newton method.

- MSc
*Adeyemi Damilare Adeoye* and Philipp Petersen, 2021
We present, via the solution of nonlinear parabolic partial differential equations (PDEs), a continuous-time formulation for stochastic optimization algorithms used for training deep neural networks. Using the continuous-time formulation of stochastic differential equations (SDEs), relaxation approaches like the stochastic gradient descent (SGD) method are interpreted as solutions of nonlinear PDEs that arise from modeling physical problems. We reinterpret, through homogenization of SDEs, the modified SGD algorithm as the solution of the viscous Burgers' equation that models highway traffic flow.
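For context, a commonly used continuous-time model of this kind (a generic textbook form, not necessarily the exact scaling used in the thesis) treats SGD with learning rate η as the SDE

```latex
d\theta_t = -\nabla f(\theta_t)\,dt + \sqrt{\eta}\,\Sigma(\theta_t)^{1/2}\,dW_t,
```

where f is the expected loss, Σ is the covariance of the mini-batch gradient noise, and W_t is a standard Brownian motion; PDE interpretations such as the one above then follow from the Fokker–Planck/Kolmogorov equations associated with this diffusion.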

- MSc
*Adeyemi Damilare Adeoye*, 2018
A tapered, inclined porous artery with stenosis was considered under the influence of a magnetic field and heat transfer. The momentum and energy equations for the blood flow, considered to be Newtonian, were obtained. The energy equation, which included an extra heat-source term, and the nonlinear momentum equation were simplified under the assumption of mild stenosis. These equations were non-dimensionalized and solved using the Differential Transform Method (DTM) to obtain expressions for the velocity, temperature, and volumetric flow rate. The graphs of these expressions were plotted against the radius of the artery to simulate the effects of the magnetic field, heat transfer, and other fluid parameters on the velocity, temperature, and volumetric flow rate of the blood. It was observed that as the magnetic field parameter (M) increases, the velocity, temperature, and volumetric flow rate of the blood increase, but the wall shear stress decreases at the stenosis throat. It was further observed that the effects of heat transfer and magnetic field resulted in greater variation of the volumetric flow of an inclined artery in the converging region than in the diverging region.

- BSc
*Adeyemi Damilare Adeoye*, 2016