Let $X$ be a Hilbert space, $x \mapsto \|x\| := \sqrt{x^Tx}$ the Euclidean norm on $X$, and $f:X \rightarrow (-\infty,+\infty]$ an extended real-valued function. Define the convex conjugate of $f$, denoted $f^*$, by
$$f^*(y) := \sup_{x \in X}\{x^Ty - f(x)\}, \; \forall y \in X.$$
Note that $f^*$ is always convex and lower semicontinuous (being the supremum of affine functions), without any assumptions whatsoever on $f$.
Question: When do we have $f^* = f$? Check out the answer here.
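As a quick numerical experiment, one can approximate $f^*$ by a max over a grid; the sketch below (a toy 1-D discretization of mine, not a general-purpose implementation) checks that $f(x) = x^2/2$ satisfies $f^* = f$:

```python
import numpy as np

# Conjugate of f(x) = x^2/2, approximated on a fine 1-D grid.
xs = np.linspace(-10.0, 10.0, 20001)
fx = 0.5 * xs**2

def conjugate(y):
    """f*(y) = sup_x (x*y - f(x)), approximated by a max over the grid."""
    return np.max(xs * y - fx)

ys = np.linspace(-2.0, 2.0, 9)
fstar = np.array([conjugate(y) for y in ys])
# fstar matches y^2/2 up to the grid resolution
```

The approximation error is one-sided (a max over a grid underestimates the sup) and of order the squared grid spacing here.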
Monday, December 28, 2015
Monday, December 21, 2015
Paper accepted at ICASSP 2016!
Our paper, entitled "Local Q-linear convergence and finite-time active set identification of ADMM on a class of penalized regression problems", has been accepted at ICASSP 2016, the largest signal-processing conference in the world. The conference will be held at the SHICC (Shanghai International Convention Center).
Author manuscript available upon request.
Exact minimization of the linearly perturbed distance of a point to a closed convex set
Let $a, b \in \mathbb R^n$, and $C$ be a nonempty closed convex subset of $\mathbb{R}^n$ with $a \not \in C$. Consider the problem
$$\text{minimize }\|x-a\| + b^Tx\text{ subject to } x \in C.$$
The case $b = 0$ corresponds to the well-known problem of projecting the point $a$ onto $C$. This problem can be easy or difficult, depending on the geometry of $C$ (for example, it is rather easy if $C$ is a diamond-shaped polyhedron, or an oval). However, the perturbative case $b \ne 0$ is completely non-trivial and must be treated rather carefully. That notwithstanding, we show here that the case $\|b\| < 1$ is essentially equivalent to the non-perturbative case $b = 0$.
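The problem can also be probed numerically. The sketch below is a minimal projected-subgradient solver of my own (not the method alluded to above), with $C$ taken to be the unit Euclidean ball, for which the projection is available in closed form:

```python
import numpy as np

def proj_unit_ball(x):
    """Projection onto the closed unit Euclidean ball."""
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def minimize_perturbed_distance(a, b, steps=2000):
    """Projected subgradient method for min ||x - a|| + b^T x over the
    unit ball; keeps the best iterate seen (subgradient methods are not
    monotone)."""
    x = np.zeros_like(a)
    best_x, best_f = x, np.linalg.norm(x - a) + b @ x
    for k in range(1, steps + 1):
        d = x - a
        nd = np.linalg.norm(d)
        g = b + (d / nd if nd > 0 else 0.0)   # a subgradient of the objective
        x = proj_unit_ball(x - g / np.sqrt(k))  # diminishing step size
        f = np.linalg.norm(x - a) + b @ x
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f

a = np.array([3.0, 0.0])
x_star, f_star = minimize_perturbed_distance(a, np.zeros(2))
# b = 0 reduces to projecting a onto the ball: x_star = (1, 0), f_star = 2
x_b, f_b = minimize_perturbed_distance(a, np.array([0.0, 0.3]))  # ||b|| < 1
```

This is only a sanity-check baseline; the point of the post is that for $\|b\| < 1$ one can do much better than generic subgradient descent.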
Saturday, November 21, 2015
Graph-Net vs. TV-L1: structure in regression coefficients

See the conference paper for more theory on these multivariate decoding / recovery models in brain science. An implementation will appear in the next release of nilearn.
Monday, October 5, 2015
On the unreasonable effectiveness of Friedrichs angles: rates of convergence of powers of some weird operators
Statement of the problem
Let $d$ and $c$ be positive integers and $q = dc$. Let $G$ be a $q$-by-$q$ positive semi-definite real matrix with eigenvalues all $\le 1$, and define the $q$-by-$2q$ matrix $A = [G\hspace{1em}\mathrm{I}_q - G]$, where $\mathrm{I}_q$ is the $q$-by-$q$ identity matrix. Now, given a scalar $\kappa > 0$ and $d$ vectors $X_1, X_2, \ldots, X_d \in \mathbb{R}^c$ with $\|X_j\| \ne \kappa \; \forall j$, consider the $q$-by-$q$ block-diagonal matrix $D$ whose $j$th block, a $c$-by-$c$ matrix, is given by \begin{equation} D_j = \begin{cases}\mathrm{I}_c - \frac{\kappa}{\|X_j\|}\mathrm{proj}_{\langle X_j \rangle^\perp}, &\mbox{ if } \|X_j\| > \kappa,\\ 0, &\mbox{ otherwise,}\end{cases} \end{equation} where $\mathrm{proj}_{\langle X_j \rangle^\perp} = \mathrm{I}_c - \mathrm{proj}_{\langle X_j\rangle} = \mathrm{I}_c - \frac{1}{\|X_j\|^2}X_jX_j^T$ is the projection onto the orthogonal complement of the $1$-dimensional subspace of $\mathbb{R}^c$ spanned by the vector $X_j$. Finally, define the $2q$-by-$q$ matrix $B = [D\hspace{1em}\mathrm{I}_q-D]^T$. For example, if $c=1$, then each $D_j \in \{0,1\}$ is a bit indicating whether $|X_j| > \kappa$, and $D$ is a diagonal matrix of $0$s and $1$s accordingly.

Now define the $2q$-by-$2q$ matrix $F := BA$. (One may equivalently take $F := AB$: since $(BA)^{n+1} = B(AB)^nA$ for all $n \in \mathbb{N}$, studying the convergence of $((BA)^n)_{n \in \mathbb{N}}$ reduces to studying that of $((AB)^n)_{n \in \mathbb{N}}$; in fact, if $(AB)^n \rightarrow C$, then $(BA)^n \rightarrow BCA$.) Ultimately, I'm interested in the rate of convergence (in the sense of Definition 2.1 of this paper) of the sequence of matrix powers $(F^n)_{n\in\mathbb{N}}$. I can easily bound the spectral radius $r(F) \le \|F\| \le \|B\|\|A\| \le 1$, but this turns out to be not very useful in my situation (e.g. $\|A\|$ can be $1$).
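For concreteness, here is a small numpy sketch of the construction (the sizes $c$, $d$, the data $X_j$, and $G$ are toy random choices of mine), which also checks the bound $r(F) \le \|F\| \le 1$ numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
c, d, kappa = 2, 3, 1.0                      # toy sizes, my choice
q = d * c
Iq, Ic = np.eye(q), np.eye(c)

# G: q-by-q PSD with eigenvalues in [0, 1].
M = rng.standard_normal((q, q))
G = M @ M.T
G /= np.linalg.eigvalsh(G).max()

A = np.hstack([G, Iq - G])                   # q-by-2q

# Block-diagonal D built from random vectors X_1, ..., X_d in R^c.
X = rng.standard_normal((d, c))
D = np.zeros((q, q))
for j, Xj in enumerate(X):
    nj = np.linalg.norm(Xj)
    if nj > kappa:
        P_perp = Ic - np.outer(Xj, Xj) / nj**2   # proj onto <X_j>^perp
        D[j*c:(j+1)*c, j*c:(j+1)*c] = Ic - (kappa / nj) * P_perp
    # else: the block stays 0

B = np.hstack([D, Iq - D]).T                 # 2q-by-q
F = B @ A                                    # 2q-by-2q

rho = np.abs(np.linalg.eigvals(F)).max()     # spectral radius r(F)
```

Since $0 \preceq G \preceq \mathrm{I}_q$ and $0 \preceq D \preceq \mathrm{I}_q$, both $\|A\|$ and $\|B\|$ are at most $1$, which is what the numerical check confirms.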
If I can compute the eigenvalues of $F$ (e.g. in terms of the eigenvalues of $G$), then there is a fair chance I can apply the results of this paper to get rates of convergence for the aforementioned sequence.
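As a teaser for the projection case treated below, here is a numpy sanity check (on a hypothetical pair of lines in $\mathbb{R}^3$, a toy choice of mine) that the Friedrichs angle governs the rate of convergence of the powers of $\mathrm{proj}_U\mathrm{proj}_V + \mathrm{proj}_{U^\perp}\mathrm{proj}_{V^\perp}$:

```python
import numpy as np

theta = np.pi / 6

# Toy subspaces of R^3: U = span(e1), V = span(cos(theta) e1 + sin(theta) e2).
u = np.array([1.0, 0.0, 0.0])
v = np.array([np.cos(theta), np.sin(theta), 0.0])
PU, PV = np.outer(u, u), np.outer(v, v)
I3 = np.eye(3)

# Friedrichs cosine via principal angles: the largest singular value of
# QU^T QV that is strictly below 1 (directions in U ∩ V are excluded).
s = np.linalg.svd(u.reshape(1, 3) @ v.reshape(3, 1), compute_uv=False)
cF = float(s[s < 1 - 1e-10].max())           # here simply cos(theta)

# F = proj_U proj_V + proj_{U^perp} proj_{V^perp}; its powers converge to
# the projection onto (U ∩ V) + (U^perp ∩ V^perp) = span(e3).
F = PU @ PV + (I3 - PU) @ (I3 - PV)
P_limit = np.diag([0.0, 0.0, 1.0])
err10 = np.linalg.norm(np.linalg.matrix_power(F, 10) - P_limit, 2)
# err10 equals cF**10: on the e1-e2 plane, F acts as cF times a rotation
```

In this two-line example the rate $\cos\theta_F$ is attained exactly at every power, not just asymptotically.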
Question
Can one relate (via a formula, for example) the eigenvalues of $F$ to the eigenvalues of $G$? How fast does iterating $AB$ or $BA$ converge?
Partial solution for the limiting case when $G$ is the projection onto a subspace
In my initial problem, $G$ is a function of a scalar parameter, and after careful inspection it turns out that, in the limit as this parameter goes to infinity, $G$ becomes the projection operator onto a subspace of $\mathbb{R}^q$, i.e. $G = \mathrm{proj}_{\text{Im }K} = K(K^TK)^{-1}K^T$ for some full column-rank $q$-by-$p$ matrix $K$ (with $p \le q$). If in addition $c=1$, then $D$ too is a projection (precisely, it is a diagonal matrix with only $0$s and $1$s, as already explained above), and one computes \begin{eqnarray} \begin{split} F &= AB = [G\hspace{1em}\mathrm{I}_q - G][D\hspace{1em}\mathrm{I}_q - D]^T = GD + (\mathrm{I}_q - G)(\mathrm{I}_q - D)\\ &= \mathrm{proj}_U\mathrm{proj}_V + \mathrm{proj}_{U^\perp}\mathrm{proj}_{V^\perp}, \end{split} \end{eqnarray} where $U := \text{Im }K$ and $V := \text{Im }D$ (note that $U^\perp = \text{ker }K^T$ and $V^\perp = \text{ker }D^T$). It then suffices to invoke Theorem 3.10 of this paper, with $\mu = 1 \in [0, 2)$, to obtain that the sequence of matrix powers $(F^n)_{n \in \mathbb{N}}$ has the optimal rate of convergence $\gamma = \cos{\theta_F}(U, V) \in [0, 1)$, the cosine of the Friedrichs angle $\theta_F(U, V) \in [0,\pi/2]$ between $U$ and $V$, defined by \begin{equation} \cos{\theta_F}(U, V) := \sup\{\langle u, v \rangle \mid u \in U \cap (U \cap V)^\perp, \|u\| \le 1, v \in V \cap (U \cap V)^\perp, \|v\| \le 1 \}. \end{equation}
Final words
As a pathological example, if $K$ (resp. $D$) is invertible, then $F = \mathrm{proj}_V$ (resp. $F = \mathrm{proj}_U$), and so $\cos{\theta_F}(U, V) = 0$. Consequently, $(F^n)_{n \in \mathbb{N}}$ converges exponentially fast (in fact, in $1$ iteration!) to $F$.
Tuesday, September 15, 2015
I conjecture that there are infinitely many linear relations $p_n + p_{n + 3} = 2 p_{n + 2}$ in the sequence of primes!
Purely for the amusement of the trade, without worrying about real-life applications, in the spirit of G. H. Hardy, who invites us to do math as a fine art, I'm interested in constructing "random walks" on the prime numbers with certain "ergodic" properties. I've reduced my troubles to the following conjecture, which is simple to understand (not to solve!). Viz:
Let $p_1 < p_2 < p_3 < \ldots < p_n < \ldots$ be the sequence of primes (with $p_1 := 2$ as usual).
Conjecture: There are infinitely many positive integers $n$ for which
\begin{eqnarray}
p_n + p_{n + 3} - 2 p_{n + 2} = 0.
\end{eqnarray}
For example, the above relation holds for $n = 2$ since $p_2 + p_5 - 2p_4 = 3 + 11 - 2 \times 7 = 0$. Follow the technical thread here.
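A quick empirical sketch (a simple sieve of mine, nothing optimized) for listing such $n$ among the small primes:

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n, in increasing order."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(range(i*i, n + 1, i)))
    return [i for i, is_p in enumerate(sieve) if is_p]

p = primes_up_to(10000)
# 1-indexed n with p_n + p_{n+3} = 2 p_{n+2}
hits = [n for n in range(1, len(p) - 2) if p[n-1] + p[n+2] == 2 * p[n+1]]
# The first two hits are n = 2 (3 + 11 = 2*7) and n = 7 (17 + 29 = 2*23)
```

Of course, counting hits below any fixed bound says nothing about the conjecture itself; it only shows the relation is not rare among small primes.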
Tuesday, August 25, 2015
What is a convex conjugate?
In case you've missed the convex-optimization revolution (which started with Moreau, Rockafellar, etc., but has a pre-history dating back to at least Kuhn, Karush, Tucker, Nash, etc.), the hottest things around are subdifferentials, convex conjugates (aka Fenchel-Legendre transforms), prox operators, and infimal convolutions. Check out this beautiful one-page manuscript on the subject, by H.H. Bauschke. There is a lot of cool stuff associated with this duality result; here are a few examples:
- All primal solutions are obtainable from any dual solution! Paramount here is the Fenchel duality theorem (see the previous reference), a monster result that maps an arbitrary dual solution to the entire set of primal solutions. You can see it in action here.
- Duality between strong convexity and strong smoothness. Another paper proves a non-trivial and very useful result, Theorem 6 (it's quite possible the result was known earlier, but that's another debate...): a closed convex function $R$ is $\beta$-strongly convex w.r.t. a norm $\|.\|$ iff its Fenchel-Legendre transform (aka convex conjugate) $R^*$ is $\frac{1}{\beta}$-strongly smooth w.r.t. the dual norm $\|.\|_*$. This result can be invoked to establish the strong convexity of some otherwise rather complicated matrix functions.
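Of these objects, prox operators and conjugates are tied together by the classical Moreau decomposition $x = \mathrm{prox}_f(x) + \mathrm{prox}_{f^*}(x)$; the toy numpy check below (my own illustration) instantiates it with $f = \lambda|\cdot|$, whose prox is soft-thresholding and whose conjugate's prox is clipping:

```python
import numpy as np

lam = 0.5
x = np.linspace(-2.0, 2.0, 41)

# prox of f = lam*|.| is soft-thresholding; f* is the indicator of
# [-lam, lam], whose prox is the projection onto that interval (clipping).
soft = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
clip = np.clip(x, -lam, lam)
# Moreau decomposition: soft + clip reproduces x exactly
```

The identity holds pointwise and exactly here, with no numerical tolerance needed beyond floating point.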