Commit

typos, exercises Ch 1,2.

santoshv committed Jan 24, 2023
1 parent 09f1e9c commit 62d559c
Showing 5 changed files with 147 additions and 48 deletions.
95 changes: 82 additions & 13 deletions convexity.lyx
@@ -941,7 +941,7 @@ K=\bigcap_{\theta\in\Rn}\left\{ x:\ \left\langle \theta,x\right\rangle \leq\max_

\end_inset

-In other words, any closed convex set is a limit of a sequence of polyhedra.
+In other words, any closed convex set is the limit of a sequence of polyhedra.
\end_layout

\begin_layout Proof
@@ -1027,6 +1027,24 @@ where the set
.
\end_layout

\begin_layout Exercise
Let
\begin_inset Formula $A,B\subset\R^{n}$
\end_inset

be nonempty disjoint closed convex sets, at least one of which is bounded.
Then there exists a vector
\begin_inset Formula $v\in\R^{n}$
\end_inset

such that
\begin_inset Formula $\sup_{x\in A}v^{\top}x<\inf_{x\in B}v^{\top}x$
\end_inset

.

\end_layout
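A numerical illustration of the standard proof idea, in which the separating direction is the difference of a closest pair of points (a hedged sketch, not part of the text; it assumes the cvxpy package and uses the unit ball and a half-space as example sets):

```python
import cvxpy as cp

# Example sets: A = unit ball, B = half-space {x : x_1 >= 2}.
a = cp.Variable(2)  # a point ranging over A
b = cp.Variable(2)  # a point ranging over B
prob = cp.Problem(cp.Minimize(cp.norm(a - b, 2)),
                  [cp.norm(a, 2) <= 1, b[0] >= 2])
prob.solve()

# Since A is bounded, the minimum distance is attained and positive,
# so v = b* - a* separates strictly: sup_A <v,x> < inf_B <v,x>.
v = b.value - a.value
print(v)  # -> approximately [1.0, 0.0]
```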

\begin_layout Standard
We have a separation theorem for convex functions, similar to Theorem

@@ -1037,15 +1055,15 @@ reference "thm:sep"
\end_inset

for convex sets.
-This shows that one can use binary search to find minimum of convex functions.
-(See Chapter
+In Chapter
\begin_inset CommandInset ref
LatexCommand ref
reference "chap:Elimination"

\end_inset

-.)
+, we will see that this allows us to use binary search to minimize convex
+functions.
\end_layout
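A one-dimensional caricature of that idea (a hedged sketch, not the chapter's algorithm: the sign of the derivative plays the role of a separating hyperplane for the sublevel set, so bisection homes in on the minimizer):

```python
def bisect_min(df, lo, hi, tol=1e-8):
    """Minimize a 1D convex f on [lo, hi] by bisection on the sign of f'."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if df(mid) > 0:
            hi = mid    # minimizer lies to the left of mid
        else:
            lo = mid    # minimizer lies to the right of mid
    return (lo + hi) / 2

x_min = bisect_min(lambda x: 2 * x - 3, 0.0, 10.0)  # f(x) = x^2 - 3x
print(x_min)  # -> 1.5
```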

\begin_layout Theorem
@@ -1275,19 +1293,19 @@ Minimum Cost Flow Problem (Computer Science)
\begin_layout Standard
The min-cost flow problem has many applications, such as route planning,
airline scheduling, image segmentation, and recommendation systems.
-In this problem, we are given a graph
+In this problem, there is a graph
\begin_inset Formula $G=(V,E)$
\end_inset

with
-\begin_inset Formula $m\defeq|E|$
+\begin_inset Formula $m=|E|$
\end_inset

-edges and
-\begin_inset Formula $n\defeq|V|$
+and
+\begin_inset Formula $n=|V|$
\end_inset

-vertices.
+.
Each edge
\begin_inset Formula $e\in E$
\end_inset
@@ -1314,7 +1332,56 @@
\end_inset

.
-Formally, the problem can be written as an optimization problem
To make this concrete, imagine we want to match every person
to the best flight for them.
Then we can have a source node
\begin_inset Formula $s$
\end_inset

connected to a node for each person with
\begin_inset Formula $u_{e}=1$
\end_inset

for all such
\begin_inset Formula $e$
\end_inset

.
Further, we take
\begin_inset Formula $t$
\end_inset

to be connected to a node for each flight, with
\begin_inset Formula $u_{e}$
\end_inset

being the number of people that can fit on that flight.
Then, all the remaining edges will be from people nodes to flight nodes.
For any such
\begin_inset Formula $e,u_{e}=1$
\end_inset

and
\begin_inset Formula $c_{e}$
\end_inset

is proportional to how good that flight is for that person (does it get
them where they need to go at the time they need to go?) with
\begin_inset Formula $0$
\end_inset

representing a perfect flight and
\begin_inset Formula $\infty$
\end_inset

representing a flight they would not take even if given the option for
free.
Then we can calculate the min-cost flow to find the best allocation of
people to flights.
\end_layout
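As a hedged illustration (not from the text): the assignment instance above can be solved with an off-the-shelf min-cost flow routine. A minimal sketch assuming the networkx package, with made-up people, flights, and integer costs:

```python
import networkx as nx

G = nx.DiGraph()
# Source s must push 2 units (2 people); sink t absorbs them.
G.add_node("s", demand=-2)
G.add_node("t", demand=2)

# s -> person edges, capacity 1 each.
for p in ["alice", "bob"]:
    G.add_edge("s", p, capacity=1, weight=0)

# flight -> t edges, capacity = seats on that flight.
G.add_edge("f1", "t", capacity=1, weight=0)
G.add_edge("f2", "t", capacity=2, weight=0)

# person -> flight edges, capacity 1, weight = "badness" of the match;
# an unacceptable (infinite-cost) pairing is simply omitted.
G.add_edge("alice", "f1", capacity=1, weight=0)  # a perfect flight
G.add_edge("alice", "f2", capacity=1, weight=5)
G.add_edge("bob", "f2", capacity=1, weight=1)    # bob cannot take f1

flow = nx.min_cost_flow(G)  # dict of dicts: flow[u][v] = units on (u, v)
print(flow["alice"], flow["bob"])
```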

\begin_layout Standard
Formally, the problem can be written as an optimization problem
\begin_inset Formula $\min_{f\in\R^{|E|}}\sum_{e\in E}c_{e}\cdot f_{e}$
\end_inset

@@ -1983,7 +2050,7 @@ Epigraph of
; quasiconvex function
\begin_inset CommandInset label
LatexCommand label
name "fig:rel-1-1"
name "fig:quasiconvex"

\end_inset

@@ -2040,7 +2107,7 @@ quasiconvex

\begin_inset CommandInset ref
LatexCommand ref
reference "fig:rel-1-1"
reference "fig:quasiconvex"
plural "false"
caps "false"
noprefix "false"
@@ -2700,7 +2767,7 @@ The standard definition of a convex function in terms of gradients requires
\end_layout

\begin_layout Section
-Logconcave Functions
+Logconcave functions
\begin_inset CommandInset label
LatexCommand label
name "sec:Logconcave-functions"
@@ -2747,6 +2814,8 @@ The indicator function of a convex set
\end_inset

is logconcave.
The Gaussian density function is logconcave.
The Gaussian density restricted to any convex set is logconcave.
\end_layout
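To verify the Gaussian claims (a short check, not in the text): the log of the standard Gaussian density
\[
\log p(x) = -\frac{n}{2}\log(2\pi) - \frac{\left\Vert x\right\Vert _{2}^{2}}{2}
\]
is a concave quadratic, so $p$ is logconcave; and restricting $p$ to a convex set $K$ multiplies it by the indicator of $K$, and a product of logconcave functions is logconcave (the log of a product is a sum of concave functions).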

\begin_layout Lemma
2 changes: 1 addition & 1 deletion equivalence.lyx
@@ -741,7 +741,7 @@ The relationships among the four oracles for convex sets.

\begin_inset CommandInset label
LatexCommand label
name "fig:rel-1-1"
name "fig:oracles"

\end_inset

96 changes: 63 additions & 33 deletions gradient_descent.lyx
@@ -1199,24 +1199,9 @@ Let
\end_layout

\begin_layout Standard
-The proof idea involves showing the function value
-\begin_inset Formula $f(x)$
-\end_inset
-
-decreases by at least
-\begin_inset Formula $\frac{\epsilon^{2}}{2L}$
-\end_inset
-
-when
-\begin_inset Formula $\|\nabla f(x)\|_{2}\geq\epsilon$
-\end_inset
-
-.
-Since the function value can only decrease by at most
-\begin_inset Formula $f(x^{(0)})-f(x^{*})$
-\end_inset
-
-, this bounds the number of iterations.
+The next lemma shows that the function value must decrease along the GD
+path for a sufficiently small step size, and the magnitude of the decrease
+depends on the norm of the current gradient.
\end_layout
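A hedged sketch of the procedure these statements analyze, assuming step size 1/L and a made-up smooth test function (not from the text):

```python
import numpy as np

def gradient_descent(grad_f, x0, L, eps, max_iters=100000):
    """Run GD with step size 1/L until the gradient norm drops below eps."""
    x = x0
    for k in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            return x, k       # eps-critical point found after k steps
        x = x - g / L         # this step decreases f by >= eps^2 / (2L)
    return x, max_iters

# Example: f(x) = sum(log(cosh(x_i))), convex and L-smooth with L = 1.
grad = lambda x: np.tanh(x)
x_star, iters = gradient_descent(grad, x0=3.0 * np.ones(5), L=1.0, eps=1e-3)
```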

\begin_layout Lemma
@@ -1279,10 +1264,8 @@ for some
\end_layout

\begin_layout Standard
-\begin_inset Separator plain
-\end_inset
+We can now prove the theorem.

\end_layout

\begin_layout Proof
@@ -1302,7 +1285,39 @@

\end_inset

-Since each step of gradient descent decreases
We observe that either
\begin_inset Formula $\|\nabla f(x)\|_{2}\le\epsilon$
\end_inset

, or
\begin_inset Formula $\|\nabla f(x)\|_{2}\geq\epsilon$
\end_inset

and so by Lemma
\begin_inset CommandInset ref
LatexCommand ref
reference "lem:gradient_progress"
plural "false"
caps "false"
noprefix "false"

\end_inset

, the function value
\begin_inset Formula $f(x)$
\end_inset

decreases by at least
\begin_inset Formula $\frac{\epsilon^{2}}{2L}$
\end_inset

.
Since the function value can decrease by at most
\begin_inset Formula $f(x^{(0)})-f(x^{*})$
\end_inset

, this bounds the number of iterations: each step of gradient descent decreases

\begin_inset Formula $f$
\end_inset

@@ -1323,8 +1338,7 @@

\begin_layout Standard
Despite the simplicity of the algorithm and the proof, it is known that
-this is the best one can do via any algorithm for this general setting
+this is the best one can do via any algorithm in this general setting
\begin_inset CommandInset citation
LatexCommand cite
key "carmon2017lower"
@@ -2300,8 +2314,8 @@ Generalizing Gradient Descent*
\end_layout

\begin_layout Standard
-Now, we study what properties gradient descent are bein used for the strongly
-convex case.
+Now, we study what properties of gradient descent are being used for the
+strongly convex case.
There are many ways to generalize it.
One way is to view gradient descent as approximating the function
\begin_inset Formula $f$
@@ -2355,7 +2369,7 @@ We say
\end_inset

and
-\begin_inset Formula $h((1-\alpha)x+\lambda\widehat{x})\leq\lambda^{2}h(\widehat{x})$
+\begin_inset Formula $h((1-\alpha)x+\alpha\widehat{x})\leq\alpha^{2}h(\widehat{x})$
\end_inset

for all
@@ -2555,7 +2569,7 @@ status open
\end_layout

\begin_layout Standard
-Find a
+Find an
\begin_inset Formula $\alpha$
\end_inset

@@ -2664,7 +2678,7 @@ Using the fact that
\begin_inset Formula $g^{(k)}+h^{(k)}$
\end_inset

-is an upper bound of
+is an upper bound on
\begin_inset Formula $f$
\end_inset

@@ -2684,7 +2698,7 @@

\end_inset

-To bound the best possible progress, we consider
+To bound the best possible progress, i.e., the RHS above, we consider
\begin_inset Formula $\widehat{x}=\arg\min_{y}g^{(k)}(y)+\alpha h^{(k)}(y)$
\end_inset

@@ -2723,11 +2737,11 @@ where we used
\end_layout

\begin_layout Proof
-Combining both and using
+Combining both and using the fact that
\begin_inset Formula $g^{(k)}+\alpha h^{(k)}$
\end_inset

-is a lower bound of
+is a lower bound on
\begin_inset Formula $f$
\end_inset

@@ -2754,6 +2768,22 @@ reference "thm:gd_general_apx"
Here we list some of them.
\end_layout

\begin_layout Exercise
Show that the second condition in the definition of
\begin_inset Formula $\alpha$
\end_inset

-approximation can be replaced by
\begin_inset Formula
\[
g(y)+\alpha h(y/\alpha)\leq f(y)\leq g(y)+h(y)
\]

\end_inset

while maintaining the guarantee for the convergence of generalized GD.
\end_layout

\begin_layout Subsubsection*
Projected Gradient Descent / Proximal Gradient Descent
\end_layout
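A hedged sketch of the projected variant (assuming the usual update x ← Proj_K(x − ∇f(x)/L), with a box constraint as the example set K; not the text's derivation):

```python
import numpy as np

def projected_gd(grad_f, project, x0, L, iters=1000):
    """Projected GD: take a gradient step, then project back onto K."""
    x = x0
    for _ in range(iters):
        x = project(x - grad_f(x) / L)
    return x

# Example: minimize f(x) = ||x - c||^2 / 2 over the box K = [0, 1]^n.
c = np.array([2.0, -0.5, 0.3])
grad = lambda x: x - c                      # L-smooth with L = 1
proj_box = lambda x: np.clip(x, 0.0, 1.0)   # projection onto [0, 1]^n
x_min = projected_gd(grad, proj_box, x0=np.zeros(3), L=1.0)
print(x_min)  # -> approximately [1.0, 0.0, 0.3]
```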
Binary file modified main.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion preliminaries.lyx
@@ -402,7 +402,7 @@ A real symmetric matrix

\end_layout

-\begin_layout Definition*
+\begin_layout Definition
For any matrix
\begin_inset Formula $A$
\end_inset
