Publications
Preprints
- M. Korbit, A. D. Adeoye, A. Bemporad and M. Zanon. (2024). “Exact Gauss-Newton Optimization for Training Deep Neural Networks.” arXiv preprint arXiv:2405.14402.
We present EGN, a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix that has the size of the mini-batch. This is particularly advantageous for large-scale machine learning problems where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges to an ϵ-stationary point at a linear rate. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at worst matches, the generalization performance of well-tuned SGD, Adam, and SGN optimizers across various supervised and reinforcement learning tasks.
@misc{korbit2024exact,
  title         = {Exact Gauss-Newton Optimization for Training Deep Neural Networks},
  author        = {Korbit, Mikalai and Adeoye, Adeyemi D. and Bemporad, Alberto and Zanon, Mario},
  year          = {2024},
  eprint        = {2405.14402},
  archiveprefix = {arXiv},
  primaryclass  = {cs.LG},
  journal       = {arXiv preprint arXiv:2405.14402},
}
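For intuition, the batch-sized update described in the abstract above can be illustrated in the plain damped least-squares case (a simplified setting assumed here for exposition; the paper works with the general GGN Hessian via the Duncan-Guttman identity). With a mini-batch Jacobian J of size m × d, residual vector r of length m, and damping λ > 0,
\[
d_k \;=\; -\bigl(J^\top J + \lambda I_d\bigr)^{-1} J^\top r
      \;=\; -J^\top \bigl(J J^\top + \lambda I_m\bigr)^{-1} r,
\]
so only an m × m matrix needs to be factorized, where the batch size m is typically several orders of magnitude smaller than the parameter dimension d.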
- A. D. Adeoye, P. C. Petersen and A. Bemporad. (2024). “Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks.” arXiv preprint arXiv:2404.14875.
The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps, and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but also for its close relation with neural tangent kernel regression, which is central to recent studies that aim to understand the optimization and generalization properties of neural networks. This work studies a GGN method for optimizing a two-layer neural network with explicit regularization. In particular, we consider a class of generalized self-concordant (GSC) functions that provide smooth approximations to commonly-used penalty terms in the objective function of the optimization problem. This approach provides an adaptive learning rate selection technique that requires little to no tuning for optimal performance. We study the convergence of the two-layer neural network, considered to be overparameterized, in the optimization loop of the resulting GGN method for a given scaling of the network parameters. Our numerical experiments highlight specific aspects of GSC regularization that help to improve generalization of the optimized neural network. The code to reproduce the experimental results is available at https://github.com/adeyemiadeoye/ggn-score-nn.
@misc{adeoye2024regularized,
  title         = {Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks},
  author        = {Adeoye, Adeyemi D and Petersen, Philipp Christian and Bemporad, Alberto},
  journal       = {arXiv preprint arXiv:2404.14875},
  year          = {2024},
  eprint        = {2404.14875},
  archiveprefix = {arXiv},
  primaryclass  = {cs.LG},
}
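As a rough sketch of the kind of step the abstract above refers to (the paper's precise network scaling and adaptive step-size rule are not reproduced here), a GGN update for a training loss with an explicit generalized self-concordant regularizer g(θ) reads
\[
\theta_{k+1} \;=\; \theta_k \;-\; \alpha_k \bigl(J_k^\top H_k J_k + \nabla^2 g(\theta_k)\bigr)^{-1}
\bigl(J_k^\top \nabla_{\hat y} \ell_k + \nabla g(\theta_k)\bigr),
\]
where J_k is the Jacobian of the network outputs with respect to θ, H_k is the Hessian of the loss with respect to the outputs, and the step size α_k is chosen adaptively from self-concordance-related quantities of g rather than tuned by hand.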
- A. D. Adeoye and A. Bemporad. (2023). “Self-concordant Smoothing for Large-Scale Convex Composite Optimization.” arXiv preprint arXiv:2309.01781.
We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other may be nonsmooth. The key highlight of our approach lies in a natural property of the resulting problem’s structure which provides us with a variable-metric selection method and a step-length selection rule particularly suitable for proximal Newton-type algorithms. In addition, we efficiently handle specific structures promoted by the nonsmooth function, such as l1-regularization and group-lasso penalties. We prove the convergence of two resulting algorithms: Prox-N-SCORE, a proximal Newton algorithm, and Prox-GGN-SCORE, a proximal generalized Gauss-Newton algorithm. The Prox-GGN-SCORE algorithm highlights an important approximation procedure which helps to significantly reduce most of the computational overhead associated with the inverse Hessian. This approximation is especially useful for overparameterized machine learning models and in mini-batch settings. Numerical examples on both synthetic and real datasets demonstrate the efficiency of our approach and its superiority over existing approaches. A Julia package implementing the proposed algorithms is available at https://github.com/adeyemiadeoye/SelfConcordantSmoothOptimization.jl.
@misc{adeoye2023self,
  title         = {Self-concordant Smoothing for Large-Scale Convex Composite Optimization},
  author        = {Adeoye, Adeyemi D and Bemporad, Alberto},
  journal       = {arXiv preprint arXiv:2309.01781},
  year          = {2023},
  eprint        = {2309.01781},
  archiveprefix = {arXiv},
  primaryclass  = {math.OC},
}
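As one concrete example of the smoothing idea (given here as an illustration; whether this exact surrogate is among those treated in the paper is an assumption), the l1 penalty g(x) in a composite problem min f(x) + g(x) can be replaced by the pseudo-Huber-type smooth surrogate
\[
g_\mu(x) \;=\; \mu \sum_{i} \Bigl( \sqrt{1 + x_i^2/\mu^2} - 1 \Bigr),
\qquad g_\mu(x) \to \|x\|_1 \ \text{as } \mu \downarrow 0,
\]
whose curvature then supplies the variable metric and the step-length rule used by the proximal Newton-type iterations.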
Journal articles
- A. D. Adeoye and A. Bemporad. (2025). “An Inexact Sequential Quadratic Programming Method for Learning and Control of Recurrent Neural Networks.” IEEE Transactions on Neural Networks and Learning Systems 36: 2762–2776.
This article considers the two-stage approach to solving a partially observable Markov decision process (POMDP): the identification stage and the (optimal) control stage. We present an inexact sequential quadratic programming framework for recurrent neural network learning (iSQPRL) for solving the identification stage of the POMDP, in which the true system is approximated by a recurrent neural network (RNN) with dynamically consistent overshooting (DCRNN). We formulate the learning problem as a constrained optimization problem and study the quadratic programming (QP) subproblem with a convergence analysis under a restarted Krylov-subspace iterative scheme that implicitly exploits the structure of the associated Karush–Kuhn–Tucker (KKT) subsystem. In the control stage, where a feedforward neural network (FNN) controller is designed on top of the RNN model, we adapt a generalized Gauss–Newton (GGN) algorithm that exploits useful approximations to the curvature terms of the training data and selects its mini-batch step size using a known property of some regularization function. Simulation results are provided to demonstrate the effectiveness of our approach.
@article{adeoye2024isqprnn,
  author   = {Adeoye, Adeyemi D. and Bemporad, Alberto},
  journal  = {IEEE Transactions on Neural Networks and Learning Systems},
  title    = {{An Inexact Sequential Quadratic Programming Method for Learning and Control of Recurrent Neural Networks}},
  year     = {2025},
  volume   = {36},
  number   = {2},
  pages    = {2762--2776},
  keywords = {Training;Recurrent neural networks;Optimization;Neural networks;Quadratic programming;Process control;Prediction algorithms;Gauss–Newton methods;markov decision processes;numerical optimization;recurrent neural networks (RNNs);reinforcement learning (RL);sequential quadratic programming (SQP)},
  doi      = {10.1109/TNNLS.2024.3354855},
}
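For orientation, the generic equality-constrained SQP subproblem (a textbook form, not necessarily the exact RNN-structured formulation used in the article) is
\[
\min_{d}\ \tfrac{1}{2}\, d^\top H_k d + \nabla f(x_k)^\top d
\quad \text{s.t.} \quad c(x_k) + \nabla c(x_k)^\top d = 0,
\]
whose optimality conditions give the saddle-point KKT system
\[
\begin{bmatrix} H_k & \nabla c(x_k) \\ \nabla c(x_k)^\top & 0 \end{bmatrix}
\begin{bmatrix} d \\ \lambda^{+} \end{bmatrix}
= -\begin{bmatrix} \nabla f(x_k) \\ c(x_k) \end{bmatrix};
\]
this is the structured linear system that a restarted Krylov-subspace scheme can exploit when the QP step is computed only inexactly.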
- A. D. Adeoye and A. Bemporad. (2023). “SCORE: approximating curvature information under self-concordant regularization.” Computational Optimization and Applications 86: 599–626. Springer.
Optimization problems that include regularization functions in their objectives are regularly solved in many applications. When one seeks second-order methods for such problems, it may be desirable to exploit specific properties of some of these regularization functions when accounting for curvature information in the solution steps to speed up convergence. In this paper, we propose the SCORE (self-concordant regularization) framework for unconstrained minimization problems which incorporates second-order information in the Newton-decrement framework for convex optimization. We propose the generalized Gauss–Newton with Self-Concordant Regularization (GGN-SCORE) algorithm that updates the minimization variables each time it receives a new input batch. The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing computational overhead. GGN-SCORE demonstrates how to speed up convergence while also improving model generalization for problems that involve regularized minimization under the proposed SCORE framework. Numerical experiments show the efficiency of our method and its fast convergence, which compare favorably against baseline first-order and quasi-Newton methods. Additional experiments involving non-convex (overparameterized) neural network training problems show that the proposed method is promising for non-convex optimization.
@article{adeoye2023score,
  year      = {2023},
  title     = {{SCORE: approximating curvature information under self-concordant regularization}},
  author    = {Adeoye, Adeyemi D and Bemporad, Alberto},
  journal   = {Computational Optimization and Applications},
  issn      = {0926-6003},
  doi       = {10.1007/s10589-023-00502-2},
  pages     = {599--626},
  number    = {2},
  volume    = {86},
  publisher = {Springer},
}
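The Newton-decrement framework referred to in the abstract is, in its textbook form for a self-concordant function f (GGN-SCORE replaces the exact Hessian with a GGN approximation and incorporates the self-concordant regularizer),
\[
\lambda(x) \;=\; \bigl(\nabla f(x)^\top \nabla^2 f(x)^{-1} \nabla f(x)\bigr)^{1/2},
\qquad
x^{+} \;=\; x \;-\; \frac{1}{1 + \lambda(x)}\, \nabla^2 f(x)^{-1} \nabla f(x),
\]
where the damping factor 1/(1 + λ(x)) guarantees a decrease of the objective without a line search.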
- J. U. Abubakar and A. D. Adeoye. (2020). “Effects of radiative heat and magnetic field on blood flow in an inclined tapered stenosed porous artery.” Journal of Taibah University for Science 14: 77–86. Taylor & Francis.
A porous tapered inclined stenosed artery under the influence of a magnetic field with radiation was considered. The momentum and energy equations with thin radiation governing the blood flow in the inclined artery were obtained, taking the flow to be Newtonian. These equations were simplified under assumptions of mild stenosis, non-dimensionalized, and solved using the Differential Transform Method (DTM). The DTM was coded in Mathematica to obtain expressions for the velocity, temperature, and volumetric flow rate of the blood. The results, presented graphically, show that the velocity of the blood flow and the blood temperature decrease as the radiation parameter (N) increases.
@article{abubakar2020effects,
  author    = {Abubakar, J. U. and Adeoye, A. D.},
  title     = {Effects of radiative heat and magnetic field on blood flow in an inclined tapered stenosed porous artery},
  journal   = {Journal of Taibah University for Science},
  volume    = {14},
  number    = {1},
  pages     = {77--86},
  year      = {2020},
  publisher = {Taylor \& Francis},
  doi       = {10.1080/16583655.2019.1701397},
}
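The Differential Transform Method mentioned above is built on the one-dimensional transform pair (standard form, included here only for orientation)
\[
F(k) \;=\; \frac{1}{k!} \left[ \frac{d^k f(x)}{dx^k} \right]_{x = x_0},
\qquad
f(x) \;=\; \sum_{k=0}^{\infty} F(k)\,(x - x_0)^k,
\]
which turns the governing differential equations into algebraic recurrences for the coefficients F(k); in computations the series is truncated at a finite order.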
Technical reports & theses
- A. D. Adeoye and A. Bemporad. (2021). “SC-Reg: Training Overparameterized Neural Networks under Self-Concordant Regularization.” IMT School for Advanced Studies Lucca.
In this paper we propose the SC-Reg (self-concordant regularization) framework for learning overparameterized feedforward neural networks by incorporating second-order information in the Newton decrement framework for convex problems. We propose the generalized Gauss-Newton with Self-Concordant Regularization (SCoRe-GGN) algorithm that updates the network parameters each time it receives a new input batch. The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing the training computational overhead. Although our current analysis considers only the convex case, numerical experiments show the efficiency of our method and its fast convergence under both convex and non-convex settings, which compare favorably against baseline first-order methods and a quasi-Newton method.
@techreport{adeoye2021sc,
  title       = {SC-Reg: Training Overparameterized Neural Networks under Self-Concordant Regularization},
  author      = {Adeoye, Adeyemi D and Bemporad, Alberto},
  institution = {IMT School for Advanced Studies Lucca},
  year        = {2021},
}
- A. D. Adeoye and P. Petersen. (2021). “A Deep Neural Network Optimization Method Via A Traffic Flow Model.” African Institute for Mathematical Sciences, Rwanda.
We present, via the solution of nonlinear parabolic partial differential equations (PDEs), a continuous-time formulation for stochastic optimization algorithms used for training deep neural networks. Using a continuous-time formulation of stochastic differential equations (SDEs), relaxation approaches like the stochastic gradient descent (SGD) method are interpreted as the solution of nonlinear PDEs that arise from modeling physical problems. We reinterpret, through homogenization of SDEs, the modified SGD algorithm as the solution of the viscous Burgers’ equation that models highway traffic flow.
@techreport{adeoye2021dnn,
  title       = {A Deep Neural Network Optimization Method Via A Traffic Flow Model},
  author      = {Adeoye, Adeyemi Damilare and Petersen, Philipp},
  institution = {African Institute for Mathematical Sciences, Rwanda},
  year        = {2021},
}
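For reference, the viscous Burgers’ equation referred to in the abstract has the standard form
\[
u_t + u\, u_x \;=\; \nu\, u_{xx},
\]
with ν > 0 the viscosity parameter; this is the PDE into which the homogenization argument maps the modified SGD dynamics.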
- A. D. Adeoye. (2018). “Blood Flow in an Inclined Tapered Stenosed Porous Artery under the Influence of Magnetic Field and Heat Transfer.” African Institute for Mathematical Sciences, Cameroon.
A tapered inclined porous artery with stenosis was considered under the influence of a magnetic field and heat transfer. The mathematical formulation for the momentum and energy equations of the blood flow, considered to be Newtonian, was obtained. The energy equation, which was obtained by taking an extra factor of heat source, and the nonlinear momentum equation were simplified under the assumption of mild stenosis. These equations were non-dimensionalized and solved using the Differential Transform Method (DTM) to obtain expressions for the velocity, temperature, and volumetric flow rate. The graphs of these expressions were plotted against the radius of the artery to simulate the effects of the magnetic field, heat transfer, and other fluid parameters on the velocity, temperature, and volumetric flow rate of the blood. It was observed that as the magnetic field parameter (M) increases, the velocity, temperature, and volumetric flow rate of the blood increase but the wall shear stress decreases at the stenosis throat. It was further observed that the effects of heat transfer and the magnetic field resulted in a greater variation in the volumetric flow rate of an inclined artery in the converging region than in the diverging region.
@mastersthesis{adeoye2018thesis,
  title  = {Blood Flow in an Inclined Tapered Stenosed Porous Artery under the Influence of Magnetic Field and Heat Transfer},
  author = {Adeoye, Adeyemi Damilare},
  school = {African Institute for Mathematical Sciences, Cameroon},
  year   = {2018},
}
- A. D. Adeoye. (2016). “On Some Finite Difference Methods for Solving Partial Differential Equations.” University of Ilorin, Ilorin, Nigeria.