Convexity and Jensen Inequality

2024年12月4日 15:37

1. Definition of convexity

Definition 1.1. We say that a function $${f:[a,b]\to \mathbb{R}}$$ is convex if for any $${\alpha\in (0,1)}$$ the following inequality holds:

$$
f(\alpha x_1+(1-\alpha)x_2) \leq \alpha f(x_1)+(1-\alpha)f(x_2)
$$

for any $${x_1,x_2 \in [a,b]}$$.

Remark 1.1. The point $${\alpha x_1+(1-\alpha)x_2}$$ is the internal section of the points $${x_1,x_2}$$. The convexity of a function is equivalent to saying that the segment connecting any two points on its graph is always above the graph itself.

Remark 1.2. We also define concavity of a function by saying that a function $${f}$$ is concave if $${-f}$$ is convex.

Example 1.1. The functions $${f(x)=x, |x|, x^2, e^x}$$ are examples of convex functions on $${\mathbb{R}}$$.

We can prove easily that $${f(x)=x}$$ is convex since for $${\alpha\in (0,1)}$$,

$$
f(\alpha x_1+(1-\alpha)x_2)=\alpha x_1+(1-\alpha)x_2
= \alpha f(x_1)+(1-\alpha)f(x_2).
$$

where $${\leq}$$ is just $${=}$$.

$${f(x)=|x|}$$ is convex since for $${\alpha\in (0,1)}$$,

$$
\begin{align*}
f(\alpha x_1+(1-\alpha)x_2)&=|\alpha x_1+(1-\alpha)x_2|\\
&\leq \alpha |x_1|+(1-\alpha)|x_2|\\
&= \alpha f(x_1)+(1-\alpha)f(x_2).
\end{align*}
$$

$${f(x)=x^2}$$ is convex since for $${\alpha\in (0,1)}$$,

$$
\begin{align*}
&\alpha f(x_1)+(1-\alpha)f(x_2)-f(\alpha x_1+(1-\alpha)x_2)\\
&=\alpha x_1^2+(1-\alpha)x_2^2-(\alpha x_1+(1-\alpha)x_2)^2\\
&=\alpha x_1^2+(1-\alpha)x_2^2-\alpha^2 x_1^2-(1-\alpha)^2x_2^2-2\alpha(1-\alpha)x_1x_2\\
&=\alpha(1-\alpha)(x_1-x_2)^2\\
&\geq 0
\end{align*}
$$

To prove that $${f(x)=e^x}$$ is convex is quite harder, but the following theorem helps us solve the problem.

2. Differentiability and convexity

Theorem 2.1. Let $${f:(a,b)\to \mathbb{R}}$$ be a twice differentiable function on $${(a,b)}$$. Then $${f}$$ is convex on $${(a,b)}$$ if and only if $${f''(x)\geq 0}$$ for all $${x\in (a,b)}$$.

With this theorem, we immediately see that $${f(x)=e^x}$$ is convex since $${f''(x)=e^x>0}$$ for all $${x\in \mathbb{R}}$$.

Proof.

1) First we will show that $${f}$$ being convex implies $${f''(x)\geq 0}$$.

To do so, we will first prove that

$$
f''(x)=\lim_{h\to 0}\frac{f(x+h)+f(x-h)-2f(x)}{h^2} \quad \dots (*)
$$

This holds by using Taylor expansion,

$$
\begin{align*}
f(x+h)&=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+o(h^2), \quad \dots (1)\\
f(x-h)&=f(x)-hf'(x)+\frac{h^2}{2}f''(x)+o(h^2), \quad \dots (2)\\
\end{align*}
$$

where $${\lim_{h\to 0}\frac{o(h^2)}{h^2}=0}$$, and adding $${(1), (2)}$$ to get

$$
\begin{align*}
f(x+h)+f(x-h)&=2f(x)+h^2f''(x)+o(h^2)
\end{align*}
$$

Thus we get our claim $${(*)}$$.

Since by convexity,

$$
\begin{align*}
f(x)=f\Big(\frac{x+h}{2}+\frac{x-h}{2}\Big)\leq \frac{1}{2}f(x+h)+\frac{1}{2}f(x-h)
\end{align*},
$$

$$
\begin{align*}
f(x+h)+f(x-h)-2f(x)\end{align*}\geq 0.
$$

Hence

$$
f''(x)=\lim_{h\to 0}\frac{f(x+h)+f(x-h)-2f(x)}{h^2}\geq 0.
$$

2) Next we prove the opposite; $${f''(x)\geq 0}$$ implies convexity of $${f}$$ convex.

Let $${t=\alpha x_1+(1-\alpha)x_2}$$. Then by Taylor theorem, there exist $${c_1, c_2}$$ such that

$$
\begin{align*}
f(x_1)-f(t)&=(x_1-t)f'(t)+\frac{1}{2}(x_1-t)^2f''(c_1), \quad \dots (4)\\
f(x_2)-f(t)&=(x_2-t)f'(t)+\frac{1}{2}(x_2-t)^2f''(c_2), \quad \dots (5)\\
\end{align*}
$$

Computing $${\alpha\times (4)+(1-\alpha)\times (5)}$$, we get

$$
\begin{align*}
&\alpha f(x_1)-\alpha f(t)+(1-\alpha)f(x_2)-(1-\alpha) f(t)\\
&=\alpha(x_1-t)f'(t)+\frac{1}{2}\alpha(x_1-t)^2f''(c_1)+(1-\alpha)(x_2-t)f'(t)+\frac{1}{2}(1-\alpha)(x_2-t)^2f''(c_2)
\end{align*}
$$

The LHS is

$$
\begin{align*}
&\alpha f(x_1)+(1-\alpha)f(x_2)- f(t)\\
\end{align*}
$$

and the RHS is

$$
\begin{align*}
&\alpha(x_1-t)f'(t)+\frac{1}{2}\alpha(x_1-t)^2f''(c_1)+(1-\alpha)(x_2-t)f'(t)+\frac{1}{2}(1-\alpha)(x_2-t)^2f''(c_2)\\
&=\underbrace{[\alpha x_1+(1-\alpha) x_2-t]}_{=0}f'(t)+\frac{1}{2}\alpha(x_1-t)^2f''(c_1)+\frac{1}{2}(1-\alpha)(x_2-t)^2f''(c_2)\\
&=\frac{1}{2}\alpha(x_1-t)^2\underbrace{f''(c_1)}_{\geq 0}+\frac{1}{2}(1-\alpha)(x_2-t)^2\underbrace{f''(c_2)}_{\geq 0}\\
&\geq 0
\end{align*}
$$

Hence we obtain

$$
\begin{align*}
&\alpha f(x_1)+(1-\alpha)f(x_2)- f(t)\geq 0
\end{align*}
$$

or equivalently,

$$
f(\alpha x_1+(1-\alpha)x_2) \leq \alpha f(x_1)+(1-\alpha)f(x_2)
$$

as desired.

□

Remark 2.1. If $${f}$$ is a differentiable convex function, then

$$
\begin{align*}
f(y)\geq f(x)&+(y-x)f'(x)
\end{align*}
$$

for any $${x,y}$$.

This holds from Taylor theorem

$$
\begin{align*}
f(y)&=f(x)+(y-x)f'(x)+\frac{1}{2}(y-x)^2f''(^{\exist}c)
\end{align*}
$$

and $${f''(c)\geq 0}$$ by the convexity.

Alternately, one can show directly from the definition of convexity. For $${t\in (0,1)}$$,

$$
\begin{align*}
f\Big(ty+(1-t)x\Big)\leq tf(y)+(1-t)f(x)
\end{align*}
$$

so we

$$
\begin{align*}
\frac{f\Big(x+t(y-x)\Big)-f(x)}{t}\leq f(y)-f(x)
\end{align*}
$$

Take limit as $${t\to 0}$$. Then the LHS becomes

$$
\begin{align*}
\lim_{t\to 0}\frac{f\Big(x+t(y-x)\Big)-f(x)}{t}=(y-x)f'(x)
\end{align*}
$$

Hence we obtain

$$
\begin{align*}
(y-x)f'(x)\leq f(y)-f(x)
\end{align*}
$$

which is equivalent to what we want to prove.

3. Jensen inequality

Theorem 3.1. Let $${f:[a,b]\to \mathbb{R}}$$ be a convex function. Then for any $${\alpha_1,\alpha_2,\cdots,\alpha_n \in (0,1)}$$ such that $${\sum_{i=1}^n\alpha_i=1}$$ and any $${x_1,x_2,\cdots ,x_n\in [a,b]}$$, the following inequality holds.

$$
\begin{align*}
f\Big(\sum_{i=1}^n\alpha_i x_i\Big) \leq \sum_{i=1}^n\alpha_i f(x_i)\quad \dots (J)
\end{align*}
$$

Proof. We will prove by induction.
First $${(J)}$$ holds $${n=2}$$ by the definition of convexity; i.e., for $${\alpha_1+\alpha_2=1}$$,

$$
\begin{align*}
f(\alpha_1 x_1+\alpha_2 x_2) \leq \alpha_1 f(x_1)+\alpha_2 (x_2)
\end{align*}
$$

Suppose that $${(J)}$$ holds up to $${n=k}$$. We will prove that it also holds for $${n=k+1}$$.

Take $${\alpha_1,\alpha_2,\cdots,\alpha_{k+1} \in (0,1)}$$ such that $${\sum_{i=1}^{k+1}\alpha_i=1}$$.

Let $${A:=\sum_{i=1}^{k}\alpha_i}$$. Then

$$
\begin{align*}
f\Big(\sum_{i=1}^{k+1}\alpha_i x_i\Big)
&= f\Big(\sum_{i=1}^{k}\alpha_i x_i+\alpha_{k+1} x_{k+1}\Big)\\
&= f\Big(A\sum_{i=1}^{k}\frac{\alpha_i}{A} x_i+\alpha_{k+1} x_{k+1}\Big)\\
&\leq Af\Big(\sum_{i=1}^{k}\frac{\alpha_i}{A} x_i\Big)+\alpha_{k+1} f(x_{k+1})\quad \dots \text{use }(J) \text{ for } n=2\\
&\leq A\cdot \frac{\alpha_i}{A}f\Big(\sum_{i=1}^{k} x_i\Big)+\alpha_{k+1} f(x_{k+1})\quad \dots \text{use }(J) \text{ for } n=k \text{ by induction hypothesis}\\
&= \sum_{i=1}^{k+1}\alpha_i f(x_i)
\end{align*}
$$

Hence $${(J)}$$ is proved.

□

Special case. Take $${\alpha_1,\alpha_2,\cdots,\alpha_n =\frac{1}{n}}$$, which satisfy $${\sum_{i=1}^n\alpha_i=1}$$. Then for any convex function $${f:[a,b]\to \mathbb{R}}$$ , the following holds.

$$
\begin{align*}
f\Big(\frac{1}{n}\sum_{i=1}^nx_i\Big) \leq \frac{1}{n}\ \sum_{i=1}^n\ f(x_i)
\end{align*}
$$

for any $${x_1,x_2,\cdots ,x_n\in [a,b]}$$.

4. Applications

4.1. Revisiting AM-GM inequality

Theorem 4.1. For any $${a_1,a_2,\dots,a_n \geq 0}$$, the following inequality holds

$$
\frac{1}{n}\sum_{i=1}^n a_i\geq \sqrt[n]{\prod_{i=1}^n a_i}\quad \dots (\text{AM-GM})
$$

Proof. WLOG, assume that $${a_1,a_2,\dots,a_n > 0}$$, since $${(\text{AM-GM})}$$ immediately holds if any of $${a_i}$$'s is 0.

Consider the function $${f:(0,\infty)\to \mathbb{R}, f(x)=-\ln x}$$, which is convex since $${f'(x)=-\frac{1}{x},f''(x)=\frac{1}{x^2} >0}$$. Hence by the Jensen inequality,

$$
\begin{align*}
f\Big(\frac{1}{n}\sum_{i=1}^nx_i\Big)
&\leq \frac{1}{n}\ \sum_{i=1}^n\ f(x_i)\\
\Leftrightarrow
-\ln\Big(\frac{1}{n}\sum_{i=1}^nx_i\Big)
&\leq -\frac{1}{n}\ \sum_{i=1}^n\ln(x_i)=-\ln\sqrt[n]{\prod_{i=1}^n a_i}\\
\Leftrightarrow
\frac{1}{n}\sum_{i=1}^n a_i&\geq \sqrt[n]{\prod_{i=1}^n a_i}
\end{align*}
$$

□

4.2. Weighted AM-GM inequality

Theorem 4.2. For any $${a_1,a_2,\dots,a_n \geq 0}$$ and $${\lambda_1,\lambda_2,\cdots,\lambda_n \in (0,1)}$$ such that $${\sum_{i=1}^n\lambda_i=1}$$, the following inequality holds

$$
\sum_{i=1}^n \lambda_i a_i\geq \prod_{i=1}^n a_i^{\lambda_i}
$$

Proof. The proof is the same as in AM-GM inequality. Consider the function $${f:(0,\infty)\to \mathbb{R}, f(x)=-\ln x}$$, which is convex. Hence by the Jensen inequality,

$$
\begin{align*}
f\Big(\sum_{i=1}^n\lambda_ix_i\Big)
&\leq \sum_{i=1}^n\lambda_i f(x_i)\\
\Leftrightarrow
-\ln\Big(\sum_{i=1}^n\lambda_i x_i\Big)
&\leq -\sum_{i=1}^n\lambda_i \ln(x_i)=-\ln\prod_{i=1}^n a_i^{\lambda_i }\\
\Leftrightarrow
\sum_{i=1}^n \lambda_i a_i&\geq \prod_{i=1}^n a_i^{\lambda_i}
\end{align*}
$$

□

4.3. Young's inequality

Theorem 4.3. If $${a,b\ge 0}$$, and $${p,q>1}$$ such that $${\frac{1}{p}+\frac{1}{q}=1}$$, then

$$
\frac{a^p}{p}+\frac{b^q}{q}\geq ab
$$

Equality holds if and only if $${a^p=b^q}$$

Proof. Use the weighted AM-GM with $${\lambda_1=\frac{1}{p}, \lambda_2=\frac{1}{q}, a_1=a^p, a_2=b^q}$$ as follows.

$$
\begin{align*}
\lambda_1a_1+\lambda_2a_2 &\geq a_1^{\lambda_1}a_2^{\lambda_2} \\
\Leftrightarrow
\frac{a^p}{p}+\frac{b^q}{q}&\geq (a^p)^{\frac{1}{p}}(b^q)^{\frac{1}{q}}\\
\Leftrightarrow
\frac{a^p}{p}+\frac{b^q}{q}&\geq ab
\end{align*}
$$

□

4.4. Hölder's inequality

Theorem 4.4. For any $${a_1,a_2,\dots,a_n \geq 0,b_1,b_2,\dots,b_n \geq 0}$$ and $${p,q>1}$$ such that $${\frac{1}{p}+\frac{1}{q}=1}$$, the following inequality holds

$$
\sum_{i=1}^n a_ib_i\leq \Big(\sum_{i=1}^na_i^p\Big)^{1/p}\Big(\sum_{i=1}^nb_i^q\Big)^{1/q}
$$

Equality holds if and only if

$$
\frac{a_1^p}{b_1^q}=\frac{a_2^p}{b_2^q}=\cdots=\frac{a_n^p}{b_n^q}.
$$

Proof. Consider the function $${f:(0,\infty)\to \mathbb{R}, f(x)=x^p, (p>0)}$$, which is convex since $${f''(x)=p(p-1)x^{p-2} > 0}$$. Hence by the Jensen inequality, for $${x_1,x_2,\cdots, x_n>0, m_1,m_2,\cdots, m_n>0}$$ we have

$$
\begin{align*}
f\Big(\frac{\sum_{i=1}^nm_ix_i}{\sum_{i=1}^nm_i}\Big)
&\leq \frac{\sum_{i=1}^nm_if(x_i)}{\sum_{i=1}^nm_i}\\
\Leftrightarrow
\Big(\frac{\sum_{i=1}^nm_ix_i}{\sum_{i=1}^nm_i}\Big)^p
&\leq \frac{\sum_{i=1}^nm_ix_i^p}{\sum_{i=1}^nm_i}\\
\Leftrightarrow
\Big(\sum_{i=1}^nm_ix_i\Big)^p
&\leq \Big(\sum_{i=1}^nm_i\Big)^{p-1}\sum_{i=1}^nm_ix_i^p\\
\Leftrightarrow
\sum_{i=1}^nm_ix_i
&\leq \Big(\sum_{i=1}^nm_i\Big)^{\frac{p-1}{p}}\Big(\sum_{i=1}^nm_ix_i^p\Big)^{1/p}\quad \dots (*)\\
\end{align*}
$$

Taking $${m_i:=b_i^q, x_i=a_ib_i^{1-q}}$$, then $${m_ix_i=a_ib_i,m_ix_i^p=a_i^p}$$, and as $${\frac{p-1}{p}=1-\frac{1}{p}=\frac{1}{q}}$$, $${(*)}$$ becomes

$$
\sum_{i=1}^n a_ib_i\leq \Big(\sum_{i=1}^nb_i^q\Big)^{1/q}\Big(\sum_{i=1}^na_i^p\Big)^{1/p}
$$

as desired.

□

5. Applications to problem solving

Problem 5.1. Let $${A, B, C}$$ be the internal angles of a triangle; i.e., $${A, B, C>0}$$ and $${A+B+C=\pi}$$. Show that

$${\sin A+\sin B+\sin C \leq \frac{3\sqrt{3}}{2}}$$
$${\cos \frac{A}{2}+\cos \frac{B}{2}+\cos \frac{C}{2} \leq \frac{3\sqrt{3}}{2}}$$
$${\tan \frac{A}{2}+\tan \frac{B}{2}+\tan\frac{C}{2} \geq \sqrt{3}}$$

Solution.
1. Since $${0< A, B, C<\pi}$$, consider the function $${f(x)=-\sin x, x\in (0,\pi)}$$ which is convex, since $${f'(x)=-\cos x, f''(x)=\sin x>0}$$. Thus by the Jensen inequality, we have

$$
\begin{align*}
f\Big(\frac{A+B+C}{3}\Big)&\leq \frac{f(A)+f(B)+f(C)}{3}\\
\Leftrightarrow
-\sin\Big(\frac{A+B+C}{3}\Big)&\leq -\frac{\sin A+\sin B+\sin C}{3}\\
\Leftrightarrow
\sin A+\sin B+\sin C&\leq 3\sin\Big(\frac{A+B+C}{3}\Big)=3\sin\frac{\pi}{3}=\frac{3\sqrt{3}}{2}\\
\end{align*}
$$

2. Since $${0< A, B, C<\pi}$$, then $${0 < A/2, B/2, C/2<\pi/2}$$. We consider the function $${f(x)=-\cos x, x\in (0,\pi/2)}$$ which is convex, since $${f'(x)=\sin x, f''(x)=\cos x>0}$$. Thus by the Jensen inequality, we have

$$
\begin{align*}
f\Big(\frac{A/2+B/2+C/2}{3}\Big)&\leq \frac{f(A/2)+f(B/2)+f(C/2)}{3}\\
\Leftrightarrow
-\cos\Big(\frac{A+B+C}{6}\Big)&\leq -\frac{\cos(A/2)+\cos (B/2)+\cos (C/2)}{3}\\
\Leftrightarrow
\cos \frac{A}{2}+\cos \frac{B}{2}+\cos \frac{C}{2}&\leq 3\cos\Big(\frac{A+B+C}{6}\Big)=3\cos\frac{\pi}{6}=\frac{3\sqrt{3}}{2}\\
\end{align*}
$$

3. Similarly, consider the function $${f(x)=\tan x, x\in (0,\pi/2)}$$ which is convex, since $${f'(x)=1+\tan^2 x, f''(x)=2(1+\tan^2)\tan x>0}$$. Thus by the Jensen inequality, we have

$$
\begin{align*}
f\Big(\frac{A/2+B/2+C/2}{3}\Big)&\leq \frac{f(A/2)+f(B/2)+f(C/2)}{3}\\
\Leftrightarrow
\tan\Big(\frac{A+B+C}{6}\Big)&\leq \frac{\tan(A/2)+\tan (B/2)+\tan (C/2)}{3}\\
\Leftrightarrow
\tan \frac{A}{2}+\tan \frac{B}{2}+\tan \frac{C}{2}&\geq 3\cos\Big(\frac{A+B+C}{6}\Big)=3\tan\frac{\pi}{6}=\sqrt{3}\\
\end{align*}
$$

Remark. The reason the angles are halved in problem 2 & 3 is because the function $${f(x)=\cos x, \tan x}$$ change sign in $${(0,\pi)}$$, and we do not have convexity in the whole interval $${(0,\pi)}$$, but only in $${(0,\pi/2)}$$.

Problem 5.2. Show that the triangle of maximum area that can be inscribed in any given circle is an equilateral triangle.

Solution. Write $${A, B, C}$$ be the internal angles of a triangle $${ABC}$$. Then its area is given by

$$
S_{\Delta ABC}=\frac{1}{2}AB \times AC\sin A
$$

By the sine law, we know that

$$
\frac{BC}{\sin A}=\frac{AC}{\sin B}=\frac{AB}{\sin C}=2R
$$

where $${R}$$ is the radius of the circumscribed circle. So

$$
AB=2R\sin C, AC=2R\sin B.
$$

Thus

$$
S_{\Delta ABC}=2R^2\sin A\sin B\sin C.
$$

By AM-GM inequality, we have that

$$
\begin{align*}
\sin A\sin B\sin C\stackrel{(1)}{\leq} \Big( \frac{\sin A+\sin B+\sin C}{3}\Big)^3
\end{align*}
$$

and from Problem 5.1. we have proved that $${\sin A+\sin B+\sin C \stackrel{(2)}{\leq} \frac{3\sqrt{3}}{2}}$$, so together we have

$$
S_{\Delta ABC}\leq 2R^2\Big(\frac{3\sqrt{3}/2}{3}\Big)^3=\frac{3\sqrt{3}}{4}R^2.
$$

$${S_{\Delta ABC}}$$ reaches its maximum $${\frac{3\sqrt{3}}{4}R^2}$$ when $${(1),(2)}$$ become equality. This happens if and only if $${A=B=C=\pi/3}$$; i.e., the triangle $${ABC}$$ is equilateral.