Section 3.3 treats max-min theory for functions of several variables. It mostly considers functions of two variables. In this supplement we discuss how to handle functions of any number of variables. We first review some terminology.
Consider a function
and a point
,
where U is an open subset of
Rn.
We say that f has a relative maximum at
a if there is
a number
such that
for all
x such
that
.
It has a strict relative maximum
at
a if there is a number
such that
f(x)< f(a)
for all
x such that
.
Replacing
by
and
f(x)<(a) by
f(x)> f(a) in these
statements gives the definitions of relative minimum
and strict relative minimum. The word ``local'' is often used in
place of the term ``relative.'' The word ``extremum'' is used to mean
``maximum or minimum''.
We say that
a is a critical point of f if
.
If f has a local extremum at
a, then
a must be a critical point of f.
It is not true if
a is a critical point of f, then f must have a local
extremum at
a. A critical point of f at which f does not have a local
extremum is called a saddle point of f.
Recall that the second order Taylor polynomial of f at a is
,
so that the matrix
product
ht S h is just the dot product of
h and the vector
Sh. Note that if f is a ``nice'' function (the only kind we will be considering),
then the mixed second partials are equal, and so S is a symmetric matrix.
Taylor's formula says that
f(x)=P2(x,a)+R2(x,a), where
.
At a critical point
,
and so the middle term of
P2(x,a)
goes away. If, in addition,
,
then
a is called a non-degenerate
critical point. It can be shown in this case that f and P2 have the
same behavior at
a as far as relative extrema and saddle points are concerned.
If one of them has a relative maximum, relative minimum, or saddle point, then the other
also has, respectively, a relative maximum, relative minimum, or saddle point.
Moreover, if there is a relative extremum at
a, it will be a strict relative
extremum.
So we now have to analyze the extrema of
.
Note that since
h=x-a,
f(a) is a constant, and
is a positive
constant,
P2(x,a) will have a strict local max, strict local min, or saddle point
at
a exactly when the function
q(h)=htSh has a strict local max, strict local min,
or saddle point, respectively, at
.
The function
q(h)=htSh is called a quadratic form. In general, working out
the matrix multiplications we see that
,
then
Note that for a quadratic form q we always have
q()=0.
A quadratic form q is called positive definite if
q(h)>0 for all
.
Note that this implies that q has
a local minimum at
. We say that q is negative definite
if
q(h)<0 whenever
.
Note that this implies that q has
a local maximum at
. If there are vectors
v and
w
such that
q(v)>0 and
q(w)<0, then q is said to be indefinite.
Note that this implies that q has a saddle point at
.
As we will see, any quadratic form
q(h)=htSh with
falls into one of these three categories, and there is a procedure for telling which
one it falls into. From now on we will be assuming that
.
The key observation is that if we make a change of
variables by setting
h=Qk, where Q is an
matrix, then we get
a new quadratic form
p(k)=(Qk)tSQk=ktQtSQk=kt Vk. We show that if
,
then this new form has the same kind of behavior at
as the old form.
First note that since
,
we have that Q is invertible. This implies that
k=Q-1h,
and so
h= if and only if
k=. Note also that
q(h)=p(k). Now suppose
that q is positive definite and that
.
Then
,
and so
p(k)=q(h)>0.
Thus p is positive definite. A similar argument shows that if p is positive definite,
then so is q. We leave it as an exercise to show that q is negative definite if and only
if p is negative definite, and that q is indefinite if and only if p is indefinite.
The advantage to this observation is that if we choose Q wisely
the new form may be simpler and easier to classify. For example, let
.
Then
Note that
p(k) is obviously positive
whenever
and so is positive definite. Since
,
our original quadratic form will also be positive definite.
If the matrix of a quadratic form is diagonal, as with the new form we
obtained above, then it is easy to classify
the form. Let
be the diagonal entries. Then the form
can be expressed as
.
It will be positive definite
if and only if all the
are positive. It will be negative definite if
and only if all the
are negative. It will be indefinite if and only if
some of the
are positive and some of the
are negative.
So, if we can always find a matrix Q with
such that QtSQ
is a diagonal matrix D, then we can classify the quadratic form q and
thus we can classify the critical point
a. To describe how to find Q we
need to review row and column operations and elementary matrices. There are three
elementary row operations:
Replace the ith row by the sum of the ith row
and c times the jth row.
Replace the ith row by c times the ith row, where
.
Interchange the ith and jth rows.
If you do one of these elementary operations on the identity matrix I you
get a matrix E which is called an elementary matrix. The basic fact about elementary
matrices is that if you do this row operation on a matrix A and get the matrix B
as a result, then B=EA. Note that E multiplies A on the left. Note that
elementary matrices always have non-zero determinants.
There are, similarly, three column operations
,
for
,
and
.
If you do one of these column operations on I you obtain
a matrix F. The basic fact here is that if you do the column operation on a matrix M and
get the result N, then N=MF. Note that F multiplies M on the right.
It turns out that F is an elementary matrix, and in fact it is the transpose of the elementary
matrix associated with the corresponding row operation.
Look back at our example, where we started with
.
The row operation
transforms S into
.
It also transforms
into
.
You can check that ES=T.
If we now do the corresponding column operation
on T
we get the matrix
.
Doing this column operation on
gives the matrix
.
You can check that TF=D. Note that E and F are transposes of each other, so we
have FtSF=D.
In general we will need several row and column operations to get a diagonal matrix. Here is
the general pattern. Start in the upper left hand corner of the given symmetric matrix S.
Suppose that entry is non-zero. Call it our pivot. Do a sequence of row operations of the
form
to turn all the entries below the pivot into zeros. Then do the corresponding
column operations
to turn all the entries to the right of the pivot into zeros.
This creates the first row and column of our diagonal matrix D. Next move on to
the matrix obtained by deleting the first row and column and continue in this fashion
until you get D.
Suppose the upper left entry of our original matrix had been zero.
If some entry on the main diagonal, say si,i, is
non-zero, then do
followed by
.
This will
interchange our zero entry with the non-zero entry si,i. Then proceed as above.
If there are no non-zero entries on the main diagonal, then look for a non-zero entry in the first
column, say si,1. Do
followed by
.
This will replace our zero entry by 2si,1. Then proceed as above. If there are no
non-zero entries in the first column, then the first row and column are already those of
a diagonal matrix and we move on to the next iteration. (Note that if
,
then
this scenario cannot happen.) Now note that our row operations give us elementary matrices
and our column operations give us elementary matrices
,
where Ej and Fj are transposes of each other. Thus we have
Let's try this on the matrix
.
First we do
to get
.
Then
to get
.
Then
and
to get
.
Then
and
to get
.
Next we do
to get
.
Finally we do
to get
.
Thus our new form is
,
which is clearly indefinite.
If we want to find
Q=F1F2F3F4, then we can avoid doing matrix multiplications or
column operations as follows.
First do the sequence of row operations above on I. (Do not do
any of the column operations.) This will give the product
E4E3E2E1.
We start with
.
First we do
to get
.
Then
and
to get
.
Finally we do
to get
.
This is Qt. To get Q we just take the transpose to get
.
Note that if all you want to do is classify the quadratic form, then you do not have to find Q.
What would have happened if we had done a different sequence of row operations?
You can check that if we had done
,
,
,
and the corresponding
column operations, we would have gotten
and
.
Thus D and Q are not unique. However, it can be shown that in D we always have the same number of positive entries, the same number of negative entries, and the same number of zero entries.
There is another method for classifying quadratic forms which for small values of n may
sometimes be easier than diagonalizing the form (for large values of n it is usually
not easier). Given a symmetric matrix S, let Sk be the
submatrix obtained by deleting the last n-k rows and columns of S; thus S is the
submatrix in the upper left corner of S. Let
.
Note that d1 is just the entry in the upper left hand corner of S, while since
Sn=S we have that
.
We consider the signs of the dk.
First consider the case of a diagonal matrix D with diagonal entries
.
Then each Dk is a diagonal matrix with diagonal entries
.
Now the determinant
of a diagonal matrix is just the product of the entries on the diagonal. Thus in this case
,
,
,
,
.
If all of the
are positive, then clearly all of the di are positive. A little
thought shows that the converse is also true: if all the di are positive, then all the
must be positive. The equation
shows that
.
Then the equation
shows that since d2>0 and
,
we must have
,
etc. Thus our form is positive definite if and only if all the di are positive.
Now suppose that all of the
are negative. Then
,
,
,
etc. So the di alternate in sign, beginning with the negative
sign. Again, a little thought shows that if we have d1<0, d2>0, d3<0, etc. then
all of the
are negative. Thus our form is negative definite if and only if we
have this pattern for the dk.
We now consider the case of an arbitrary symmetric matrix S.
Suppose we diagonalize it, so we have D=QtSQ. Then
.
Since
we have
,
and
so
and
have the same sign.
Now suppose that the form given by S is positive definite. Then D is positive definite,
so all its diagonal entries
are positive, hence
,
hence
.
We now observe that the matrix Sk gives a quadratic form qk on
Rk; if we take
a vector
in
Rk, then the value of qk on this vector is
the same as that of q on the vector
in
Rn,
where we have padded out the original vector with n-k zeroes. It follows that qk is also
positive definite. By the argument given above we must have that the determinant dk of
Sk is positive. So we have shown that all the dk are positive.
Conversely, suppose that all the dk are positive. Think about how we do row and column
operations to obtain D. Since
,
we will be doing operations of the form
and
to produce zeroes below and to
the right of s1,1. Note that the first diagonal entry of D is
.
Row and column operations of this type do not change the determinants of any of
the Sk, so all of our dk will be the same as before. Now our new S2 has the form
,
so we have d2=d1 s. Since
we must have
.
This means that when we go to work on our next round of row and
column operations they will be of the form
and
.
The second diagonal entry of D is
.
Since d1 and d2 are positive
and d2=d1 s, we must have that
is positive. Continuing in this fashion we get
that all of the
are positive, and so the form is positive definite.
Thus we have shown that the form is positive definite if and only if all of the di are
positive. We now show that the form is negative definite if and only if we have d1<0,
d2>0, d3<0, etc. The basic observation here is that the form q is negative definite
if and only if the form -q is positive definite. If q has matrix S, then -q has
matrix -S. The kth submatrix of -S is -Sk, where Sk is the kth submatrix
of S. Let
.
Well,
.
(You pull a -1 out of each
of the k rows of -Sk.) Thus
ek=(-1)k dk; we have e1=-d1, e2=d2, e3=-d3,
etc. So, if q is negative definite, then -q is positive definite, so all the ek
are positive, so
d1=-e1<0, d2=e2>0,
d3=-e3<0, etc. This line of reasoning can
be reversed, so if the di alternate in sign with d1 negative, then q is negative
definite.
So, we now know exactly when q is positive definite or negative definite. So if the di follow any other pattern, then q cannot be positive definite or negative definite, and so must be indefinite.
Let's look back at the example
.
We have d1=0,
,
.
This does not fit the pattern +++ or -+-, so the form is
indefinite.
Now let's go back to the problem of classifying the critical points of a
function
.
Suppose
.
Then
,
where S is the matrix whose i-jth entry is
fxixj(a). Recall that the
critical point
a is non-degenerate if and only if
.
We have seen that
classifying a non-degenerate critical point is equivalent to classifying the quadratic
form
q(h)=htSh. Note that since
we can work either with q and
its matrix S or with the form
and its matrix
.
Which one
we choose depends on which way the arithmetic is easier.
In this section we illustrate both the diagonalization and determinant methods for classifying
the quadratic forms and thus classifying non-degenerate critical points.
fx=2x-4-2y, fy=4y-2x+10+6z, and fz=22z+22+6y. Setting each of these equal to zero and rearranging a bit gives the system of equations
This system has augmented matrix
Doing the row operations
and then
puts it
into the row echelon form
We now classify this critical point. fxx=2, fyy=4, fzz=22,
fxy=fyx=-2,
fxz=fzx=0, and
fyz=fzy=6. Note that these all happen to be constants. If they
had not been constants, we would have had to plug in the values x=2, y=0, and z=-1 to
obtain the values of these derivatives at
a before going on. We next have
Here is the diagonalization method: We do
to get
Here is the determinant method:
,
,
and
.
Since we have the pattern +++ we know that the form is positive definite, and hence
that f has a local minimum at (2,0,-1).
fx=2y+2z-2x, fy=2x-4y, and fz=2x-6z. Setting each of these equal to zero and rearranging a bit gives the system of equations
This system has augmented matrix
Doing the row operations
and
,
and
then
puts it into the row echelon form
We now classify this critical point. fxx=-2, fyy=-4, fzz=-6,
fxy=fyx=2,
fxz=fzx=2, and
fyz=fzy=0. Note that these all happen to be constants. If they
had not been constants, we would have had to plug in the values x=0, y=0, and z=0 to
obtain the values of these derivatives at
a before going on. We next have
Here is the diagonalization method: We do
and
to get
Here is the determinant method:
,
,
and
.
Since we have the pattern -+- we know that the form is negative definite, and hence
that f has a local maximum at (0,0,0).
fx=2x+2z, fy=2y+4z, and fz=6z+2x+4y. Setting each of these equal to zero and rearranging a bit gives the system of equations
This system has augmented matrix
Doing the row operations
and then
puts it
into the row echelon form
We now classify this critical point. fxx=2, fyy=2, fzz=6,
fxy=fyx=0,
fxz=fzx=2, and
fyz=fzy=4. Note that these all happen to be constants. If they
had not been constants, we would have had to plug in the values x=0, y=0, and z=0 to
obtain the values of these derivatives at
a before going on. We next have
Here is the diagonalization method: We do
to get
Here is the determinant method:
,
,
and
.
Since we do not have the pattern +++ or -+- we know that the form is indefinite, and hence
that f has a saddle point at (0,0,0).
All this may seem vaguely reminiscent of using eigenvalues and eigenvectors to
diagonalize an
matrix A. We briefly review the main points; see your
linear algebra book for more details. Recall that the eigenvalues are the solutions of the
characteristic equation
.
For each eigenvalue
you find a basis
for the eigenspace
.
Put all these basis vectors together.
You will have at most n of them.
If you have n vectors, then A is diagonalizable. You let P be the
matrix
whose columns are these n vectors. Then
P-1AP=D, a diagonal matrix whose diagonal
entries are the eigenvalues corresponding to the eigenvectors. If you have fewer than n
vectors, then A is not diagonalizable.
Notice that
P-1AP=D looks sort of like QtSQ=D. Note the differences: S is symmetric.
In general A need not be symmetric. In general the inverse of a matrix is not equal to its
transpose. A matrix R is called orthogonal if
R-1=R t. This is equivalent to
saying that the columns of R form an orthonormal set (the dot product of two different
vectors is 0 while the dot product of a vector with itself is 1); it is also equivalent to
saying that the rows of R form an orthonormal set. If there is an orthogonal matrix R
such that
R-1AR=D, a diagonal matrix, then we say that A is
orthogonally diagonalizable. Note that in this case
A=RDR-1=RDR t and Dt=D, so
At=(RDR t)t=(R t)tDtR t=RDR t=A. Thus if A is orthogonally diagonalizable, it must be
symmetric. The converse of this statement is also true; if A is symmetric, then it is
orthogonally diagonalizable. The theory guarantees that you will always have n vectors and
that vectors in different
's will have dot product equal to zero. You apply the
Gram-Schmidt process to each
to get a new basis which is an orthonormal set.
You then let R be the matrix whose columns are these n vectors. Then
RtAR=R-1AR=D.
In this case the diagonal entries
of D are the eigenvalues
of A.
Let's look at the example
.
The characteristic equation is
.
The eigenvalues are
and
.
The basis vectors
for the two eigenspaces are, respectively,
and
.
Applying Gram-Schmidt to each of these gives
and
.
So our matrix
,
while
.
Note that this is different from (and more complicated than) the matrices
and
that we obtained using row and column operations.
Starting from scratch, orthogonal diagonalization is probably the last method you would want to use to classify the form. But it does show that if you happen to know the eigenvalues of S, then you can classify the form. If they are all positive, the form is positive definite. If they are all negative, the form is negative definite. If some are positive and some are negative, then it is indefinite.