SelfConcordantSmoothOptimization.jl is a Julia package that implements the self-concordant regularization (SCORE) technique for nonsmooth convex optimization introduced in the associated paper (see Citing below). In particular, SelfConcordantSmoothOptimization.jl solves problems of the form

$$\min_{x \in \mathbb{R}^n} f(x) + g(x)$$

where $$f\colon \mathbb{R}^n \to \mathbb{R}$$ is smooth and convex, and $$g\colon \mathbb{R}^n \to \mathbb{R}$$, which may be nonsmooth, is proper, closed and convex. The smooth part $$f$$ typically quantifies a data-misfit, while the nonsmooth part $$g$$ imposes certain properties, such as sparsity, on the decision variable $$x$$. Please see Implementation details and recommendations below for the functions that are currently supported by each implemented algorithm.
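For example, in the sparse logistic regression problem used in the usage example below (data matrix $$A$$ with rows $$a_i^\top$$, labels $$y_i \in \{-1, 1\}$$, and regularization parameter $$\lambda > 0$$), the two parts are

$$f(x) = \frac{1}{m}\sum_{i=1}^{m}\log\left(1 + \exp(-y_i a_i^\top x)\right), \qquad g(x) = \lambda \|x\|_1,$$

so $$f$$ quantifies the classification misfit while $$g$$ promotes a sparse $$x$$.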

## Installation

Install the package via Julia’s REPL with

julia> ]
pkg> add SelfConcordantSmoothOptimization

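Equivalently (assuming the package is registered under the name above), it can be installed from a script or notebook using the standard Pkg API:

```julia
import Pkg
Pkg.add("SelfConcordantSmoothOptimization")
```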

## Usage example

#### A simple sparse logistic regression problem

# Load the package
using SelfConcordantSmoothOptimization

using Random, Distributions, SparseArrays

# Generate random data
Random.seed!(1234)
n, m = 100, 50;
A = Matrix(sprandn(n, m, 0.01));
y_prob = 1 ./ (1 .+ exp.(-A * zeros(m)));
y = rand.(Bernoulli.(y_prob));
unique_y = unique(y);
y = map(x -> x==unique_y[1] ? -1 : 1, y);
x0 = randn(m);

# Define the objective function
f(x) = 1/m*sum(log.(1 .+ exp.(-y .* (A*x))));

# choose problem parameters
reg_name = "l1";
λ = 0.4;
μ = 1.0;
hμ = PHuberSmootherL1L2(μ);

# set problem
model = Problem(A, y, x0, f, λ);

# Choose method and run the solver
method = ProxNSCORE();
solution = iterate!(method, model, reg_name, hμ; max_iter=100, x_tol=1e-6, f_tol=1e-6);
# get the solution x
solution.x
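The returned solution object exposes the final iterate through its x field, as shown above. For a sparsity-inducing regularizer like the $$\ell_1$$ penalty used here, one quick sanity check is to count the entries that are numerically zero (a minimal sketch using only that field; the tolerance 1e-6 is an arbitrary choice):

```julia
# Count entries of the recovered solution that are numerically zero
count(xi -> abs(xi) < 1e-6, solution.x)
```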


To use the ProxGGNSCORE algorithm, a model output function $$\mathcal{M}(A,x)$$ is required. Note that it is essential to define the function f in two ways to use ProxGGNSCORE; thanks to Julia's multiple dispatch feature, the package decides which one to use at any instance:

# f as defined above
f(x) = 1/m*sum(log.(1 .+ exp.(-y .* (A*x))));
# f as a function of y and yhat
f(y, yhat) = -1/m*sum(y .* log.(yhat) .+ (1 .- y) .* log.(1 .- yhat))
# where yhat = Mfunc(A, x) is defined by the model output function
Mfunc(A, x) = 1 ./ (1 .+ exp.(-A*x))
# set problem
model = Problem(A, y, x0, f, λ; out_fn=Mfunc);

# Choose method and run the solver
method = ProxGGNSCORE();
solution = iterate!(method, model, reg_name, hμ; max_iter=100, x_tol=1e-6, f_tol=1e-6);
# get the solution x
solution.x


By default, this package computes derivatives using ForwardDiff.jl. However, users can supply their own functions that compute the derivatives involved in the algorithms. This has at least two benefits:

1. Faster computations
2. No need to define $$f$$ twice when using ProxGGNSCORE

In the example above, these functions can be defined as follows:

using LinearAlgebra  # for Diagonal used below

f(x) = 1/m*sum(log.(1 .+ exp.(-y .* (A*x))));

S(x) = exp.(-y .* (A*x));
# gradient of f wrt x:
function grad_fx(x)
    Sx = S(x)
    return -A' * (y .* (Sx ./ (1 .+ Sx))) / m
end;
# Hessian of f wrt x:
function hess_fx(x)
    Sx = 1 ./ (1 .+ S(x))
    W = Diagonal(Sx .* (1 .- Sx))
    hess = A' * W * A
    return hess / m
end;

# The following are used by ProxGGNSCORE
# Jacobian of yhat wrt x:
jac_yx(yhat) = vec(yhat .* (1 .- yhat)) .* A;
# gradient of \ell wrt yhat:
grad_fy(yhat) = (-y ./ yhat .+ (1 .- y) ./ (1 .- yhat))/m;
# Hessian of \ell wrt yhat:
hess_fy(yhat) = Diagonal((y ./ yhat.^2 + (1 .- y) ./ (1 .- yhat).^2)/m);

# Now (for ProxNSCORE), pass the user-defined derivatives to Problem via keyword arguments:
model_n = Problem(A, y, x0, f, λ; grad_fx=grad_fx, hess_fx=hess_fx);
method_n = ProxNSCORE();
sol_n = iterate!(method_n, model_n, reg_name, hμ; max_iter=100, x_tol=1e-6, f_tol=1e-6);
sol_n.x

# And for ProxGGNSCORE (does not require hess_fx):
model_ggn = Problem(A, y, x0, f, λ; out_fn=Mfunc, grad_fx=grad_fx, jac_yx=jac_yx, grad_fy=grad_fy, hess_fy=hess_fy);
method_ggn = ProxGGNSCORE();
sol_ggn = iterate!(method_ggn, model_ggn, reg_name, hμ; max_iter=100, x_tol=1e-6, f_tol=1e-6);
sol_ggn.x


TODO: Add a simple sparse group lasso example… (for now, see example from paper in /examples/paper).

## Implementation details and recommendations

Below is a summary of functions $$f$$ supported by the algorithms implemented in the package:

| Algorithm | Supported $$f$$ |
| --- | --- |
| ProxNSCORE | Any twice continuously differentiable function. |
| ProxGGNSCORE | Any twice continuously differentiable function that can be expressed in the form $$f(x) = \sum\limits_{i=1}^{m}\ell(y_i,\hat{y}_i)$$, where $$\ell$$ is a loss function that measures a data-misfit. Requires a model $$\mathcal{M}(A,x)$$ that computes the predictions $$\hat{y}_i$$. |
| ProxQNSCORE | Any twice continuously differentiable function. Currently not recommended. |
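To illustrate the form required by ProxGGNSCORE, a least-squares objective $$f(x) = \tfrac{1}{2}\|Ax - y\|^2$$ also fits this pattern, with $$\ell(y_i,\hat{y}_i) = \tfrac{1}{2}(y_i - \hat{y}_i)^2$$ and model output $$\mathcal{M}(A,x) = Ax$$. A minimal sketch, illustrative only, following the same two-way definition of f used in the usage example above:

```julia
# Illustrative only: a least-squares objective written both ways for ProxGGNSCORE
f(x) = 0.5 * sum(abs2, A*x - y);          # f as a function of x
f(y, yhat) = 0.5 * sum(abs2, y .- yhat);  # f as a function of y and yhat
Mfunc(A, x) = A*x;                        # model output: yhat = Mfunc(A, x)
model_ls = Problem(A, y, x0, f, λ; out_fn=Mfunc);  # set up the problem as before
```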

As the package name and description imply, the implemented algorithms use a generalized self-concordant smooth approximation $$g_s$$ of $$g$$ in their procedures. They do this for specific regularization functions, which are selected via the reg_name argument (a string value) passed to the iterate! function. The currently implemented regularization functions, and the corresponding smoothing functions $$g_s$$, are summarized below.

| reg_name value | Implemented $$g_s$$ function(s) | Remark(s) |
| --- | --- | --- |
| "l1" | PHuberSmootherL1L2(μ), OsBaSmootherL1L2(μ) | $$\mu>0$$ |
| "l2" | PHuberSmootherL1L2(μ), OsBaSmootherL1L2(μ) | $$\mu>0$$ |
| "gl" | PHuberSmootherGL(μ, model) | For the sparse group lasso regularizer; $$\mu>0$$ |
| "indbox" | PHuberSmootherIndBox(lb,ub,μ) | lb: lower bound in the box constraints; ub: upper bound in the box constraints; $$\mu>0$$ |

- PHuberSmootherL1L2, PHuberSmootherGL and PHuberSmootherIndBox are highly recommended.
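For instance, to use the box-constraint ("indbox") regularizer instead of the $$\ell_1$$ penalty from the usage example, only the smoother and the reg_name value change. A minimal sketch; the bounds lb and ub below are illustrative values, and model is the Problem defined earlier:

```julia
# Illustrative sketch: switching to the box-constraint ("indbox") regularizer
lb, ub = -1.0, 1.0;                          # example box bounds
hμ_box = PHuberSmootherIndBox(lb, ub, 1.0);  # smoother for the box constraints
sol_box = iterate!(ProxNSCORE(), model, "indbox", hμ_box; max_iter=100, x_tol=1e-6, f_tol=1e-6);
sol_box.x
```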

For more details and insights on the approach implemented in this package, please see the associated paper referenced under Citing below.

## Citing

If you use SelfConcordantSmoothOptimization.jl in your work, particularly the algorithms listed above, we kindly request that you cite the following paper:

@article{adeoye2023self,