
Commit 8dfe442

sliced prox allows operators
1 parent 693ccfc commit 8dfe442

7 files changed: +131 -97 lines changed

Diff for: docs/src/tutorial.md

+39-20
@@ -12,13 +12,13 @@ where $f$ is a smooth function while $g$ is possibly nonsmooth.
 
 ## Unconstraint optimization
 
-The LASSO problem is popular example of this class of problems:
+The *least absolute shrinkage and selection operator* (LASSO) belongs to this class of problems:
 
 ```math
-\underset{ \mathbf{x} }{\text{minimize}} \ \tfrac{1}{2} \| \mathbf{A} \mathbf{x} - \mathbf{y} \|^2+ \| \mathbf{x} \|_1.
+\underset{ \mathbf{x} }{\text{minimize}} \ \tfrac{1}{2} \| \mathbf{A} \mathbf{x} - \mathbf{y} \|^2+ \lambda \| \mathbf{x} \|_1.
 ```
 
-Here the squared norm $\tfrac{1}{2} \| \mathbf{A} \mathbf{x} - \mathbf{y} \|^2$ is a _smooth_ function while the $l_1$-norm is a _nonsmooth_ function.
+Here the squared norm $\tfrac{1}{2} \| \mathbf{A} \mathbf{x} - \mathbf{y} \|^2$ is a *smooth* function while the $l_1$-norm is a *nonsmooth* function.
 
 This can be solved using `StructuredOptimization.jl` using only few lines of code:

@@ -64,13 +64,13 @@ for a nonempty set $\mathcal{S}$ the constraint of
 can be converted into an indicator function
 
 ```math
-g(\mathbf{x}) = \begin{cases}
+g(\mathbf{x}) = \delta_{\mathcal{S}} (\mathbf{x}) = \begin{cases}
 0 & \text{if} \ \mathbf{x} \in \mathcal{S},\\
 +\infty & \text{otherwise},
 \end{cases}
 ```
 
-to obtain the standard form. Constraints are treated as _nonsmooth functions_.
+to obtain the standard form. Constraints are treated as *nonsmooth functions*.
 
 This conversion is automatically performed by `StructuredOptimization.jl`.

@@ -133,35 +133,54 @@ julia> @minimize ls(X1*X2-Y) st X1 >= 0., X2 >= 0.
 
 ## Limitations
 
-**TODO simplify this**
+Currently `StructuredOptimization.jl` supports only *Proximal Gradient (aka Forward Backward) algorithms*, which require specific properties of the nonsmooth functions and constraints to be applicable.
 
-Currently `StructuredOptimization.jl` supports only Proximal Gradient (aka Forward Backward) algorithms, which require certain properties of the nonsmooth functions and costraint.
-
-In the general case a nonsmooth function of $M$ variables composed by $G$ terms can be written as:
+If we express the nonsmooth function $g$ as the composition of
+a function $\tilde{g}$ with a linear operator $A$:
 ```math
-g(\mathbf{x}_1,\dots,\mathbf{x}_M) =
-\sum_{i = 0}^G g_i \left(\sum_{j = 1}^{M}
-A_{i,j} \mathbf{x}_j \right).
+g(\mathbf{x}) =
+\tilde{g}(A \mathbf{x})
 ```
-where the functions $g_i$ are nonsmooth functions (or indicator functions resulting from constraints) and $A_{i,j}$ linear operators.
-
-The problem can be solved when $g$ satisfies the following conditions:
+then the problem can be solved when $g$ satisfies the following properties:
 
-1. for all $i\in \{1,\ldots,G \}$ and $j\in\{1,\ldots,M \}$, mapping $A_{i,j}$ satisfies $A_{i,j}^* A_{i,j} = \mu_{i,j} I$, where $\mu_{i,j} \geq 0$, $A^*$ is the adjoint of $A$ and $\mathcal{I}$ is the identity operator.
+1. the mapping $A$ must be a *tight frame*, namely it must satisfy $A A^* = \mu Id$, where $\mu \geq 0$, $A^*$ is the adjoint of $A$ and $Id$ is the identity operator.
 
-2. for all $j \in \{1,\dots,M \}$, the cardinality of $\{i | A_{i,j} \neq 0 \} = 1$.
+2. if $A$ is not a tight frame, then it must be possible to write $g$ as a *separable* sum $g(\mathbf{x}) = \sum_j h_j (B_j \mathbf{x}_j)$, with $\mathbf{x}_j$ being non-overlapping slices of $\mathbf{x}$ and $B_j$ being tight frames.
 
 Let us analyze these rules with a series of examples.
 
-The previous example was satisfing the rules:
+The LASSO example above satisfies the first rule:
 ```julia
-@minimize ls(X1*X2-Y) st X1 >= 0., X2 >= 0.
+julia> @minimize ls( A*x - y ) + λ*norm(x, 1)
 ```
-Here there are two constraints each one containing only one variable and
+since the non-smooth function $\lambda \| \cdot \|_1$ is not composed with any operator (or, equivalently, is composed with $Id$, which is a tight frame).
 
+Also the following problem would be accepted:
+```julia
+julia> @minimize ls( A*x - y ) + λ*norm(dct(x), 1)
+```
+since the discrete cosine transform (DCT) is orthogonal and is therefore a tight frame.
 
+On the other hand, the following problem
+```julia
+julia> @minimize ls( A*x - y ) + λ*norm(x, 1) st x >= 1.0
+```
+cannot be solved through proximal gradient algorithms, since the second rule would be violated.
+Here the constraint would be converted into an indicator function and the nonsmooth function $g$ can be written as the sum
+```math
+g(\mathbf{x}) = \lambda \| \mathbf{x} \|_1 + \delta_{\mathcal{S}} (\mathbf{x})
+```
+which is not separable.
 
+By contrast, this problem would be accepted:
+```julia
+julia> @minimize ls( A*x - y ) + λ*norm(x[1:div(n,2)], 1) st x[div(n,2)+1:n] >= 1.0
+```
+as now the optimization variable $\mathbf{x}$ is partitioned into non-overlapping groups.
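The tutorial changes above summarize what this commit enables: a nonsmooth term may now act on a slice of a variable while also being composed with a tight-frame operator such as the DCT. Below is a minimal, hypothetical sketch of such a problem, written only with the API already shown in the tutorial; the problem sizes and the weight `1e-1` are made up for illustration.

```julia
using StructuredOptimization

n = 10
A = randn(5, n)
y = randn(5)
x = Variable(n)

# l1-penalize the DCT coefficients of the first half of x (a slice composed with a tight frame),
# and bound the second half of x (a slice converted internally into an indicator function):
@minimize ls( A*x - y ) + 1e-1*norm(dct(x[1:div(n,2)]), 1) st x[div(n,2)+1:n] >= 1.0
```

Before this commit, multiple nonsmooth terms on the same variable were accepted only when their operators were plain `GetIndex` slices; composing a slice with an operator such as `dct` is what the source changes below allow.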

Diff for: src/solvers/terms_extract.jl

+21-11
@@ -70,14 +70,23 @@ end
 
 # extract function and merge operator
 function extract_merge_functions(t::Term)
-    if is_eye(operator(t))
+    if is_sliced(t)
+        if typeof(operator(t)) <: Compose
+            op = operator(t).A[2]
+        else
+            op = Eye(size(operator(t),1)...)
+        end
+    else
+        op = operator(t)
+    end
+    if is_eye(op)
         f = displacement(t) == 0 ? t.f : PrecomposeDiagonal(t.f, 1.0, displacement(t))
-    elseif is_diagonal(operator(t))
-        f = PrecomposeDiagonal(t.f, diag(operator(t)), displacement(t))
-    elseif is_AAc_diagonal(operator(t))
-        f = Precompose(t.f, operator(t), diag_AAc(operator(t)), displacement(t))
+    elseif is_diagonal(op)
+        f = PrecomposeDiagonal(t.f, diag(op), displacement(t))
+    elseif is_AAc_diagonal(op)
+        f = Precompose(t.f, op, diag_AAc(op), displacement(t))
     end
-    f = t.lambda == 1. ? f : Postcompose(f, t.lambda) #for now I keep this
+    f = t.lambda == 1. ? f : Postcompose(f, t.lambda) #for now I keep this
     #TODO change this
     return f
 end
@@ -95,13 +104,14 @@ function extract_proximable(xAll::NTuple{N,Variable}, t::NTuple{M,Term}) where {
         fx = IndFree()
     elseif length(tx) == 1 #only one term per variable
         fx = extract_proximable(x,tx[1])
-    else #multiple terms per variable
-        #currently this happens only with GetIndex
-
+    else
+        #multiple terms per variable
+        #currently this happens only with GetIndex
         fxi,idxs = (),()
         for ti in tx
-            fxi = (fxi..., extract_functions(ti))
-            idxs = (idxs...,operator(ti).idx )
+            fxi = (fxi..., extract_merge_functions(ti))
+            idx = typeof(operator(ti)) <: Compose ? operator(ti).A[1].idx : operator(ti).idx
+            idxs = (idxs..., idx )
         end
         fx = SlicedSeparableSum(fxi,idxs)
     end
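In the multi-term branch above, the per-term functions are merged into a `SlicedSeparableSum` from `ProximalOperators.jl`, with each term's index range recovered either directly from its `GetIndex` operator or, after this commit, from the first element of a `Compose`. As a rough, hypothetical sketch of what the assembled object amounts to for a cost like `10*norm(x[1:2]-b,1) + norm(dct(x[3:5]),2)` (ignoring the translation by `b` and the DCT precomposition, which the real code handles via `PrecomposeDiagonal`/`Precompose`):

```julia
using ProximalOperators

# Two proximable pieces acting on disjoint slices of a length-5 vector:
f1 = NormL1(10.0)                                   # weighted l1 norm for x[1:2]
f2 = NormL2()                                       # l2 norm for x[3:5]
g  = SlicedSeparableSum((f1, f2), ((1:2,), (3:5,)))

x = randn(5)
y, gy = prox(g, x, 1.0)                             # the prox is evaluated slice by slice
```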

Diff for: src/solvers/terms_properties.jl

+1-7
@@ -14,18 +14,12 @@ function is_proximable(terms::Tuple)
     for v in vars
         tv = [t for t in terms if v in variables(t)]
         if length(tv) != 1
-            #TODO make this more general
-            if all( (<:).(typeof.(operator.(tv)), GetIndex) )
+            if all( is_sliced.(tv) ) && all( is_proximable.(tv) )
                 return true
             else
                 return false
             end
         end
     end
-    # NOTE: I see why GetIndex requires a special case. However, it is a more
-    # general case than just GetIndex, and I would postpone its implementation,
-    # unless we have a very concrete and important example where this is
-    # strictly required...
-    # I agree... but we have this in the Audio Declipping demo!
     return true
 end
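With this change, several nonsmooth terms on the same variable are considered proximable as long as every term is sliced (and itself proximable), rather than requiring each operator to be exactly a `GetIndex`. A small sketch using the internal predicate the tests below rely on; the expected values follow from the new rule, assuming disjoint slices:

```julia
using StructuredOptimization

x = Variable(randn(5))

# accepted: both terms act on non-overlapping slices, one of them composed with the DCT
cf_ok = norm(x[1:2], 1) + norm(dct(x[3:5]), 2)
StructuredOptimization.is_proximable(cf_ok)   # expected: true

# rejected: both terms act on the whole variable, so no sliced separable sum exists
cf_bad = norm(x, 1) + norm(x, 2)
StructuredOptimization.is_proximable(cf_bad)  # expected: false
```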

Diff for: src/syntax/terms/term.jl

+3-1
@@ -66,7 +66,9 @@ is_f = [:is_linear,
         :is_orthogonal,
         :is_invertible,
         :is_full_row_rank,
-        :is_full_column_rank]
+        :is_full_column_rank,
+        :is_sliced
+        ]
 
 for f in is_f
     @eval begin
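The hunk above only shows the head of the `@eval` loop; its body is cut off by the diff. As a purely hypothetical, self-contained illustration of the metaprogramming pattern involved (not the package's actual code), adding a symbol such as `:is_sliced` to the list makes the loop generate one forwarding method per listed predicate:

```julia
# Standalone sketch of the pattern: predicates listed by name are expanded
# into methods that delegate to a wrapped object.
is_linear(A::Matrix) = true
is_sliced(A::Matrix) = false

struct Wrapper
    A::Matrix{Float64}
end

for f in [:is_linear, :is_sliced]
    @eval $f(w::Wrapper) = $f(w.A)   # one delegating method per listed predicate
end

w = Wrapper(randn(2, 2))
is_linear(w)   # true
is_sliced(w)   # false
```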

Diff for: test/runtests.jl

+1
@@ -22,6 +22,7 @@ end
 
 @testset "Problem construction" begin
     include("test_problem.jl")
+    include("test_build_minimize.jl")
 end
 
 @testset "Integration tests" begin

Diff for: test/test_build_minimize.jl

+50
@@ -0,0 +1,50 @@
+@printf("\n Testing solver build \n")
+
+x = Variable(10)
+A = randn(5, 10)
+y = Variable(7)
+B = randn(5, 7)
+b = randn(5)
+
+prob = problem(ls(A*x + b), norm(x, 2) <= 1.0)
+built_slv = build(prob, StructuredOptimization.PG())
+solve!(built_slv)
+
+~x .= 0.
+prob = problem(ls(A*x - B*y + b) + norm(y, 1), norm(x, 2) <= 1.0)
+built_slv = build(prob, FPG())
+solve!(built_slv)
+
+@printf("\n Testing @minimize \n")
+~x .= 0.
+~y .= 0.
+slv, = @minimize ls(A*x - B*y + b) st norm(x, 2) <= 1e4, norm(y, 1) <= 1.0 with PG()
+~x .= 0.
+slv, = @minimize ls(A*x - b) st norm(x, 1) <= 1.0 with PG()
+~x .= 0.
+slv, = @minimize ls(A*x - b) st norm(x, 1) <= 1.0
+~x .= 0.
+slv, = @minimize ls(A*x - b) + norm(x, 1) with PG()
+~x .= 0.
+slv, = @minimize ls(A*x - b) + norm(x, 1)
+~x .= 0.
+slv, = @minimize ls(A*x - b)
+
+#TODO many many more tests
+x = Variable(5)
+A = randn(10, 5)
+b = randn(10)
+
+@printf("\n Testing @minimize nonlinear \n")
+slv, = @minimize ls(sigmoid(A*x,10) - b)+norm(x,1) with PG()
+xpg = copy(~x)
+~x .= 0.
+slv, = @minimize ls(sigmoid(A*x,10) - b)+norm(x,1) with ZeroFPR()
+xz = copy(~x)
+~x .= 0.
+slv, = @minimize ls(sigmoid(A*x,10) - b)+norm(x,1) with PANOC()
+xp = copy(~x)
+~x .= 0.
+
+@test norm(xz-xpg) <1e-4
+@test norm(xp-xpg) <1e-4

Diff for: test/test_problem.jl

+16-58
@@ -188,13 +188,22 @@ f = StructuredOptimization.extract_proximable(xAll,cf)
 #@test norm(f(~x) - 10*norm(fft(~x)-b,1)) < 1e-12
 
 # single variable, multiple terms with GetIndex
-x = Variable(randn(5))
-b = randn(2)
-cf = 10*norm(x[1:2]-b,1)+norm(x[3:5],2)
-xAll = StructuredOptimization.extract_variables(cf)
-@test StructuredOptimization.is_proximable(cf) == true
-f = StructuredOptimization.extract_proximable(xAll,cf)
-@test norm(f(~x) - sum([10*norm((~x)[1:2]-b,1);norm((~x)[3:5],2)])) < 1e-12
+x = Variable(randn(5))
+b = randn(2)
+cf = 10*norm(x[1:2]-b,1)+norm(x[3:5],2)
+xAll = StructuredOptimization.extract_variables(cf)
+@test StructuredOptimization.is_proximable(cf) == true
+f = StructuredOptimization.extract_proximable(xAll,cf)
+@test norm(f(~x) - sum([10*norm((~x)[1:2]-b,1);norm((~x)[3:5],2)])) < 1e-12
+
+# single variable, multiple terms with GetIndex composed with dct
+x = Variable(randn(5))
+b = randn(2)
+cf = 10*norm(x[1:2]-b,1)+norm(dct(x[3:5]),2)
+xAll = StructuredOptimization.extract_variables(cf)
+@test StructuredOptimization.is_proximable(cf) == true
+f = StructuredOptimization.extract_proximable(xAll,cf)
+@test norm(f(~x) - sum([10*norm((~x)[1:2]-b,1);norm(dct((~x)[3:5]),2)])) < 1e-12
 
 # multiple variables, multiple terms
 x1 = Variable(randn(5))
@@ -246,54 +255,3 @@ xAll = (x1,x2)
 f = StructuredOptimization.extract_proximable(xAll,cf)
 @test norm(f.fs[1](~x1)-norm((~x1)[1:2]+b1[1:2],2)-norm((~x1)[3:5]+b1[3:5],1) ) < 1e-12
 @test norm(f.fs[2](~x2)-10*norm(~x2-b2,1) ) < 1e-12
-
-@printf("\n Testing solver build \n")
-
-x = Variable(10)
-A = randn(5, 10)
-y = Variable(7)
-B = randn(5, 7)
-b = randn(5)
-
-prob = problem(ls(A*x + b), norm(x, 2) <= 1.0)
-built_slv = build(prob, StructuredOptimization.PG())
-solve!(built_slv)
-
-~x .= 0.
-prob = problem(ls(A*x - B*y + b) + norm(y, 1), norm(x, 2) <= 1.0)
-built_slv = build(prob, FPG())
-solve!(built_slv)
-
-@printf("\n Testing @minimize \n")
-~x .= 0.
-~y .= 0.
-slv, = @minimize ls(A*x - B*y + b) st norm(x, 2) <= 1e4, norm(y, 1) <= 1.0 with PG()
-~x .= 0.
-slv, = @minimize ls(A*x - b) st norm(x, 1) <= 1.0 with PG()
-~x .= 0.
-slv, = @minimize ls(A*x - b) st norm(x, 1) <= 1.0
-~x .= 0.
-slv, = @minimize ls(A*x - b) + norm(x, 1) with PG()
-~x .= 0.
-slv, = @minimize ls(A*x - b) + norm(x, 1)
-~x .= 0.
-slv, = @minimize ls(A*x - b)
-
-#TODO many many more tests
-x = Variable(5)
-A = randn(10, 5)
-b = randn(10)
-
-@printf("\n Testing @minimize nonlinear \n")
-slv, = @minimize ls(sigmoid(A*x,10) - b)+norm(x,1) with PG()
-xpg = copy(~x)
-~x .= 0.
-slv, = @minimize ls(sigmoid(A*x,10) - b)+norm(x,1) with ZeroFPR()
-xz = copy(~x)
-~x .= 0.
-slv, = @minimize ls(sigmoid(A*x,10) - b)+norm(x,1) with PANOC()
-xp = copy(~x)
-~x .= 0.
-
-@test norm(xz-xpg) <1e-4
-@test norm(xp-xpg) <1e-4
