title: Advanced Mathematics
subtitle: DLMDSAM01
author: Dr. Robert Graf
publisher: IU International University of Applied Sciences
year: 2023

Unit 5: Matrices and Vector Spaces

Quick overview of our learning goals:

- perform basic matrix algebra
- compute the determinant, trace, transpose, complex conjugate, and Hermitian conjugate of a matrix
- diagonalize a matrix using its eigenvalues and eigenvectors
- understand tensors as a generalization of scalars, vectors, and matrices

Introduction

Many problems can be restructured with matrices and solved in a systematic way. Tensors formalize and extend the concepts of scalars, vectors, and matrices. The unit also recommends additional readings.

5.1 Basic Matrix Algebra

Vectors are quantities with both magnitude and direction. They can be decomposed into basic components and represented as such. Matrices are rectangular arrays of numbers, described by their row and column counts. An $n \times m$ matrix has $n$ rows (the $y$ direction) and $m$ columns (the $x$ direction).

Matrices are a convenient way to think about and perform operations on vectors. So, matrices supply the operations, and vectors are what we apply them to.

Calculating with Matrices

Notation - Matrices:

Rules of Calculating with Matrices:

Example of matrix multiplication:

$$\begin{gather*} A_{n\times m} B_{m \times p} = C_{n \times p}\\ AB=C \iff c_{ij} = \sum_k a_{ik}b_{kj} \end{gather*}$$

Matrix multiplication is distributive ($A(B+C) = AB+AC$), but it is not commutative ($AB \neq BA$ in general).

Definition - Commutator: A commutator is the quantity $[A,B] \equiv AB-BA$. It’s an important concept in fields like quantum mechanics.

EXAMPLE

We are going to multiply two matrices and compare the products front to back, i.e. $AB$ versus $BA$. For both products to be defined and directly comparable, the matrices must be square.

$$\begin{gather*} A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 3 \\ 3 & 2 & 3 \end{bmatrix},\ B = \begin{bmatrix} 4 & 5 & 6 \\ 5 & 5 & 1 \\ 6 & 1 & 6 \end{bmatrix}\\ AB = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 3 \\ 3 & 2 & 3 \end{bmatrix} \begin{bmatrix} 4 & 5 & 6 \\ 5 & 5 & 1 \\ 6 & 1 & 6 \end{bmatrix}\\ = \begin{bmatrix} (1\cdot4 + 2\cdot5 + 3\cdot6) & (1\cdot5 + 2\cdot5 + 3\cdot1) & (1\cdot6 + 2\cdot1 + 3\cdot6) \\ (2\cdot4 + 2\cdot5 + 3\cdot6) & (2\cdot5 + 2\cdot5 + 3\cdot1) & (2\cdot6 + 2\cdot1 + 3\cdot6) \\ (3\cdot4 + 2\cdot5 + 3\cdot6) & (3\cdot5 + 2\cdot5 + 3\cdot1) & (3\cdot6 + 2\cdot1 + 3\cdot6) \end{bmatrix} = \begin{bmatrix} 32 & 18 & 26 \\ 36 & 23 & 32 \\ 40 & 28 & 38 \end{bmatrix} \end{gather*}$$

Each entry pairs a row of $A$ with a column of $B$: I wrote out the row of $A$, then multiplied term by term down the corresponding column of $B$ and summed.

For the sake of time I’ll leave $BA$ as an exercise to the reader, but compute it and note that the result differs from $AB$.
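If you want to check this by machine, here’s a minimal NumPy sketch (assuming NumPy is available; the course itself doesn’t use code) that computes $AB$, $BA$, and the commutator:

```python
import numpy as np

# Sanity-check of the worked example: AB, BA, and the commutator [A, B].
A = np.array([[1, 2, 3],
              [2, 2, 3],
              [3, 2, 3]])
B = np.array([[4, 5, 6],
              [5, 5, 1],
              [6, 1, 6]])

AB = A @ B
BA = B @ A
print(AB)        # [[32 18 26] [36 23 32] [40 28 38]]
print(BA)        # a different matrix entirely
print(AB - BA)   # the commutator [A, B] = AB - BA; nonzero, so AB != BA
```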

5.2 Determinant, Trace, Transpose, Complex Conjugate, and Hermitian Conjugate

p. 109

Matrix Transpose

The transpose of a matrix basically swaps the row and column position of each element. So if $a_{ij} \in A$, then $a_{ji} \in A^T$. It’s quite an important concept, so give it a go.

EXAMPLE

Find the transpose of

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$

$A_{n\times m}$ will become $A_{m\times n}$ as each $a_{ij} \to a_{ji}$.

$$A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$

Hopefully you got the same. $\Box$

We also have a fun property:

$$(AB)^T = B^T A^T$$

Consider that matrix multiplication requires the neighbouring (inner) dimensions to match. This is why the reverse order is necessary: transposing swaps each factor’s dimensions, and reversing the order keeps the inner ones compatible.
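A quick sketch of that property (again assuming NumPy), using rectangular matrices so the dimension bookkeeping is visible:

```python
import numpy as np

# (AB)^T should equal B^T A^T; note how the shapes reverse too.
rng = np.random.default_rng(0)
A = rng.integers(0, 9, size=(2, 3))   # 2x3
B = rng.integers(0, 9, size=(3, 4))   # 3x4: inner dimensions match
assert np.array_equal((A @ B).T, B.T @ A.T)
print((A @ B).shape, (A @ B).T.shape)  # (2, 4) then (4, 2)
```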

Complex Conjugate

Definition - Complex Conjugate: For a matrix with complex number entries, $(a \pm bi)$, the complex conjugate $A^*$ of a matrix can be found by taking the complex conjugate of each entry of $A$.

$$(a^*)_{ij} = (a_{ij})^*$$

Additionally, the complex conjugate $(a \pm bi)^*$ is just $(a \mp bi)$.

That’s great, but what does it actually mean? You flip the sign of the imaginary part.

EXAMPLE

Find the complex conjugate of the following matrix

$$A = \begin{bmatrix} 1 & 2i \\ 3 + i & 4 \end{bmatrix}$$

Remember, just flip the complex signs…

$$A^* = \begin{bmatrix} 1 & -2i \\ 3 - i & 4 \end{bmatrix}$$

Notice that in the form $(a+bi)$, the $a$ remains unchanged, and the $bi$ is negated. $\Box$

Hermitian Conjugate

Definition - Hermitian Conjugate: The Hermitian conjugate of a matrix $A$ is the transpose of the complex conjugate, denoted $A^\dagger$. Funny enough, if a matrix has no complex parts, it is just the transpose.

EXAMPLE

Find the Hermitian conjugate of

$$A = \begin{bmatrix} 1 & 1-2i & 3i \\ 4 & 5+i & 6 \end{bmatrix}$$

It is both a transpose and a complex conjugate:

$$A^{\dagger} = \begin{bmatrix} 1 & 4 \\ 1+2i & 5-i \\ -3i & 6 \end{bmatrix}$$

Hopefully you got the same. $\Box$

At this point, we might wonder why any of this is important. We usually represent vectors as column matrices

$$\vec{a} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix},\ \text{and}\ \vec{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$

Because the neighbouring dimensions of two column matrices are not the same, you cannot matrix-multiply them; you can only add them or form a dot product. However, if you take the Hermitian conjugate of $\vec{a}$, resulting in a row matrix, you can perform matrix multiplication with $\vec{b}$.

$$\begin{pmatrix} a_1^* & a_2^* & \cdots & a_n^* \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = \sum_{i=1}^n a_i^* b_i = \vec{a}^{\dagger}\vec{b}$$

This is also the inner product $\langle \vec{a} \mid \vec{b} \rangle$. Again, if we are dealing with real numbers, the Hermitian conjugate becomes a transpose and the inner product becomes the dot product.
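Here’s a short NumPy sketch of the Hermitian conjugate and the inner product (my assumption, not from the course); note that `np.vdot` conjugates its first argument, which is exactly the $\vec{a}^\dagger \vec{b}$ behaviour:

```python
import numpy as np

# Hermitian conjugate: complex conjugate, then transpose.
A = np.array([[1, 1 - 2j, 3j],
              [4, 5 + 1j, 6]])
print(A.conj().T)   # matches the worked example above

# Inner product <a|b> = sum of a_i^* b_i.
a = np.array([1 + 1j, 2j, 3 + 0j])
b = np.array([4 + 0j, 5, 6 - 1j])
print(np.vdot(a, b))   # conjugates the first argument
print(a.conj() @ b)    # same value, written out as a-dagger times b
```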

Trace of a Matrix

Definition - Trace of a Matrix: The trace is a property of square matrices. It is the sum of the matrix’s diagonal elements:

$$Tr(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^n a_{ii}$$

The trace maps a matrix to a scalar, so it has extra properties:

$$Tr(A \pm B) = Tr(A) \pm Tr(B)$$

and

$$Tr(AB) = Tr(BA)$$

EXAMPLE

Might seem trivial, but find the trace of the following

$$A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{bmatrix}$$

Ok, so simply put

$$Tr(A) = 1 + 5 + 9 = 15$$

Determinant of a Matrix


Definition - Permutation: A permutation is one of the ways a number of items can be arranged; different arrangements of the same items are different permutations. That is, order matters.

Permutations are probably better defined and explained in notes on sets and counting.

Definition - Determinant of a Matrix: The determinant is a single number (scalar) representation of a square matrix. Its notation looks like we are taking the absolute value of a matrix, but it is the determinant

$$\text{det}(A) = |A| = \begin{vmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn} \end{vmatrix}$$

A weird formula for the determinant is

$$\text{det}(A) = \sum_{P[\alpha \beta \dots \omega]} \epsilon_{\alpha \beta \dots \omega}\, a_{1\alpha} a_{2\beta} \dots a_{n\omega}$$

The book “Math for Machine Learning”, section 4.1 really dives into determinants.

Anyway, what is all of this? Starting with $\epsilon$: that is called the anti-symmetric tensor, and it is basically either $\pm 1$ (or zero).

$$\epsilon_{\alpha \beta \dots \omega} = \begin{cases} +1 & \text{for even permutations of } 1,\dots,n\\ -1 & \text{for odd permutations of } 1,\dots,n\\ 0 & \text{if two indices are the same} \end{cases}$$

Start small with an example of a $2 \times 2$ matrix.

EXAMPLE

Finding the determinant of an arbitrary $2 \times 2$ matrix:

$$\begin{align*} |A| &= \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \\ &= \epsilon_{12}a_{11}a_{22} + \epsilon_{21}a_{12}a_{21}\\ &= (+1)a_{11}a_{22} + (-1)a_{12}a_{21}\\ &= a_{11}a_{22} - a_{12}a_{21} \end{align*}$$

For a 2×22 \times 2 matrix, the determinant becomes the product of the diagonal elements less the product of the off-diagonal elements.

Now, we move to a $3 \times 3$ matrix.

EXAMPLE

Find the determinant of an arbitrary $3 \times 3$ matrix.

$$\begin{align*} |A| &= \begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \\ &= a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32}\\ &\quad + a_{12}a_{23}a_{31} - a_{12}a_{21}a_{33}\\ &\quad + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} \end{align*}$$

We already cut out the $\epsilon$ reference because it’s not really the easiest way to think about it. So, consider the Laplace expansion, aka cofactor expansion:

$$C_{ij} = (-1)^{i+j} M_{ij}$$

Check out the following for even more information!

Taboga, Marco (2021). “Determinant of a matrix”, Lectures on matrix algebra. https://www.statlect.com/matrix-algebra/determinant-of-a-matrix.

Taboga, Marco (2021). “The Laplace expansion, minors, cofactors and adjoints”, Lectures on matrix algebra. https://www.statlect.com/matrix-algebra/Laplace-expansion-minors-cofactors-adjoints.

We let $M_{ij}$ be the minor: the determinant of the $(n-1) \times (n-1)$ matrix you get by removing all the elements in the $i^{th}$ row and $j^{th}$ column. And yes, that means row $\times$ column, which is still a little confusing to me because matrix indexing is backwards from Cartesian coordinates, and then this is frontwards again…

Per statlect.com

Proposition: Let $A$ be a $K \times K$ matrix, $K \ge 2$. We let $C_{ij}$ be the cofactor, and $a_{ij}$ be the entry at that position. For any row $i$, the following row expansion holds:

$$\text{det}(A) = \sum_{j=1}^K a_{ij} C_{ij}$$

Or, if you feel frisky, it holds for any column $j$ as well, cycling through the $i$’s instead:

$$\text{det}(A) = \sum_{i=1}^K a_{ij} C_{ij}$$

How can it be so? The statlect.com resource has a proof as well.

Let’s give it a go on the arbitrary $3 \times 3$ matrix; let $i=1$.

$$\begin{align*} C_{11} &= (-1)^{1+1}(a_{22}a_{33}-a_{23}a_{32}) = a_{22}a_{33}-a_{23}a_{32}\\ C_{12} &= (-1)^{1+2}(a_{21}a_{33}-a_{23}a_{31}) = a_{23}a_{31}-a_{21}a_{33}\\ C_{13} &= (-1)^{1+3}(a_{21}a_{32}-a_{22}a_{31}) = a_{21}a_{32}-a_{22}a_{31} \end{align*}$$

Now, we multiply by respective entries

$$\begin{align*} \text{det}(A) &= a_{11}(a_{22}a_{33}-a_{23}a_{32})\\ &\quad + a_{12}(a_{23}a_{31}-a_{21}a_{33})\\ &\quad + a_{13}(a_{21}a_{32}-a_{22}a_{31}) \end{align*}$$

Which, when multiplied through, is the same as before. Leave it to the course textbook to only explain half of the Laplace expansion. $\Box$
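To convince myself the expansion is right, here is a small recursive sketch of the Laplace expansion along the first row (assuming NumPy; the example matrix is arbitrary, and `np.linalg.det` is the practical tool, this is just illustration):

```python
import numpy as np

def det_laplace(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # (-1)**j plays the role of (-1)^(i+j) for the first row
        # in 0-based indexing.
        total += (-1) ** j * A[0, j] * det_laplace(minor)
    return total

A = np.array([[2.0, -1, 3],
              [0, 4, 1],
              [5, 2, -2]])
print(det_laplace(A))     # -85.0
print(np.linalg.det(A))   # -85.0 (up to floating point)
```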

Useful properties of determinants

$$\begin{align*} |A^T| &= |A|\\ |A^{\dagger}| &= |(A^*)^T| = |A^*| = |A|^*\\ |AB| &= |A||B| = |BA|\\ |\lambda A| &= \lambda^n |A| \end{align*}$$

Inverse of a Matrix

p. 114

Should probably be defined sooner?

Definition - Square Matrix: A square matrix is an $n \times m$ matrix where $n = m$. You can also say it’s a $K \times K$ matrix at that point. The number of rows equals the number of columns.

Definition - Symmetric Matrix: A symmetric matrix is a square matrix where, for each element, $a_{ij} = a_{ji}$. Hopefully, it is apparent why it must be square.

Definition - Diagonal Matrix: A diagonal matrix is a square matrix where all off-diagonal elements are zero; only the diagonal elements may be nonzero. Again, the matrix should be square, else the diagonal becomes arbitrary.

Definition - Identity Matrix: The identity matrix is a diagonal (hence square) matrix whose diagonal elements are $a_{ii} = 1$. All other elements are $a_{ij} = 0\ \forall\ i \ne j$.

Taboga, Marco (2021). “Inverse of a matrix”, Lectures on matrix algebra. https://www.statlect.com/matrix-algebra/inverse-matrix.

Definition - Inverse Matrix: If $A$ and $B$ are both matrices, then $B$ is the inverse of $A$: $B = A^{-1} \iff AB = I$, where $I$ is the identity matrix. If such a $B$ exists, then we say that $A$ is invertible.

Some fun facts:

To calculate an inverse matrix, each entry is calculated as

$$(A^{-1})_{ij} = \frac{C_{ji}}{|A|}$$

I don’t like using the big $A$ notation to represent elements of $A^{-1}$; however, I also don’t want it to appear that we are taking reciprocals of element values. Note that the denominator is the determinant and the numerator is a cofactor, but with indices SWAPPED!

Also note, if $\text{det}(A) = 0$, the matrix is called singular and cannot be inverted.

EXAMPLE

Suppose $A$ is an invertible $2 \times 2$ matrix. Can you write out the formula to calculate the inverse?

If

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

Then

$$A^{-1} = \frac{C^T}{\text{det}(A)} = \frac{1}{a_{11}a_{22}-a_{12}a_{21}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}$$

The coefficient is a little confusing, but if you think it through with the definition it makes sense. $\Box$

EXAMPLE

Find the inverse of

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 1 & 0 & 6 \end{bmatrix}$$

First… the determinant. Since there are some zeros, let’s expand along a row that contains them.

$$C_{ij} = (-1)^{i+j} M_{ij}$$

And…

$$\begin{align*} \text{det}(A) &= \left. \sum_{j=1}^K a_{ij}C_{ij}\ \right|_{i=3}\\ &= a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33}\\ &= (1)\big((+1)(2\cdot5-3\cdot4)\big) + (0)\big((-1)(1\cdot5-3\cdot0)\big) + (6)\big((+1)(1\cdot4-2\cdot0)\big)\\ &= (-2) + 0 + 24\\ &= 22 \end{align*}$$

Now, we go crazy solving for cofactors; we will transpose the cofactor matrix after building it.

$$\begin{align*} C_{11} &= +\begin{vmatrix} 4 & 5 \\ 0 & 6 \end{vmatrix} = 24\\ C_{12} &= -\begin{vmatrix} 0 & 5 \\ 1 & 6 \end{vmatrix} = 5\\ C_{13} &= +\begin{vmatrix} 0 & 4 \\ 1 & 0 \end{vmatrix} = -4\\ &\cdots \end{align*}$$

So, you build up the full cofactor matrix this way: for each $C_{ij}$, ignore the $i^{th}$ row and $j^{th}$ column of $A$, take the determinant of what remains, apply the sign $(-1)^{i+j}$, and put the result in the same position $ij$. Then transpose the whole matrix; the transpose is what produces the index swap in $(A^{-1})_{ij} = C_{ji}/|A|$.

Then, finish with a scalar-matrix multiplication by $1/\text{det}(A)$, as in the sketch below.
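Here’s how I’d finish the example numerically (assuming NumPy): build the cofactor matrix, transpose it to get the adjugate, and divide by the determinant.

```python
import numpy as np

A = np.array([[1.0, 2, 3],
              [0, 4, 5],
              [1, 0, 6]])

n = A.shape[0]
C = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
        C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)  # cofactor C_ij

A_inv = C.T / np.linalg.det(A)   # (A^{-1})_ij = C_ji / |A|
print(np.round(C.T))             # adjugate: [[24,-12,-2],[5,3,-5],[-4,2,4]]
print(np.round(A @ A_inv, 10))   # the identity matrix, as hoped
```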

Eigenvalues and Eigenvectors of a Matrix

p. 115

Because column vectors are $n \times 1$ matrices, we can perform certain matrix operations on them. One idea is to apply a matrix to a vector and only change its magnitude:

$$A\vec{x} = \lambda \vec{x}$$

Eigenvalue problem

Taboga, Marco (2021). “Eigenvalues and eigenvectors”, Lectures on matrix algebra. https://www.statlect.com/matrix-algebra/eigenvalues-and-eigenvectors.

The determinant “tells us by how much the linear transformation associated with the matrix $A$ scales up or down the area of shapes.” That is, $\alpha_A = \text{det}(A) \cdot \alpha$, where $\alpha$ is the area of a shape before the transformation and $\alpha_A$ the area after.

Now, eigenvalues and eigenvectors provide additional information, telling us “by how much the linear transformation scales up or down the sides of certain parallelograms.” If one pair of parallel sides is scaled by $\lambda_1$ and the other pair by $\lambda_2$, the area of the parallelogram is scaled by a factor of $\lambda_1 \cdot \lambda_2$. As a consequence of the above discussion, we find that $\text{det}(A) = |A| = \lambda_1 \cdot \lambda_2$.

The determinant of a matrix is equal to the product of its eigenvalues.

Definition - Eigenvalue and Eigenvector: Let $A$ be a $K \times K$ matrix. If there exists a $K \times 1$ vector $x \ne 0$ and a scalar $\lambda$ such that

$$Ax = \lambda x$$

then $\lambda$ is called an eigenvalue of $A$, and $x$ is called the eigenvector corresponding to $\lambda$.

You may also write that

$$(A - \lambda I)\vec{x} = \vec{0}$$

I believe, because of this, the equation can be viewed as a homogeneous system of linear equations of the form $B\vec{x} = \vec{0}$. If $|B| = 0$, this characteristic equation has a nontrivial (nonzero) solution, and we can determine the eigenvalues of $A$.

EXAMPLE

Can we find eigenvalues and associated eigenvectors of

$$A = \begin{bmatrix} 10 & -3 \\ -3 & 2 \end{bmatrix}$$

The equations would be

$$\begin{align*} A\vec{x} &= \lambda \vec{x}\\ \begin{bmatrix} 10 & -3 \\ -3 & 2 \end{bmatrix} \vec{x} &= \lambda \vec{x} \end{align*}$$

Then we multiply $\lambda$ by an identity matrix, and move everything to one side:

$$\begin{align*} \begin{bmatrix} 10 & -3 \\ -3 & 2 \end{bmatrix} \vec{x} &= \lambda \vec{x} = \lambda I\vec{x}\\ \begin{bmatrix} 10 & -3 \\ -3 & 2 \end{bmatrix} \vec{x} - \begin{bmatrix} \lambda & 0\\ 0 & \lambda \end{bmatrix} \vec{x} &= 0 \end{align*}$$

Next, we rearrange the equation (switching sides purely for visual effect), factor out $\vec{x}$, and combine the matrices:

$$\begin{align*} 0 &= \begin{bmatrix} 10 & -3 \\ -3 & 2 \end{bmatrix} \vec{x} - \begin{bmatrix} \lambda & 0\\ 0 & \lambda \end{bmatrix} \vec{x}\\ &= \left( \begin{bmatrix} 10 & -3 \\ -3 & 2 \end{bmatrix} - \begin{bmatrix} \lambda & 0\\ 0 & \lambda \end{bmatrix} \right)\vec{x}\\ &= \begin{bmatrix} 10-\lambda & -3 \\ -3 & 2-\lambda \end{bmatrix}\vec{x} \end{align*}$$

Now, we take the determinant and set it equal to 0 (find the roots) to get the nontrivial answer(s).

$$\begin{align*} \text{det}(A - \lambda I) &= \begin{vmatrix} 10-\lambda & -3\\ -3 & 2-\lambda \end{vmatrix}\\ &= (10-\lambda)(2-\lambda) - (-3)(-3)\\ &= 20 - 10\lambda - 2\lambda + \lambda^2 - 9\\ &= \lambda^2 - 12\lambda + 11 \end{align*}$$

Well, isn’t that interesting: we have a polynomial in $\lambda$. Set it equal to 0 and solve for $\lambda$. You could use the quadratic formula if you want:

$$\begin{align*} \lambda &= \frac{-b \pm \sqrt{b^2-4ac}}{2a} = \frac{12 \pm \sqrt{(-12)^2-4(1)(11)}}{2(1)}\\ &= \frac{12 \pm \sqrt{144-44}}{2} = \frac{12 \pm \sqrt{100}}{2} = \frac{12 \pm 10}{2}\\ &= 6 \pm 5\\ \lambda &\in \{1, 11\} \end{align*}$$

Yes, we now have our eigenvalues! What about those pesky eigenvectors?

Basically, we sub in our eigenvalues and solve for the vectors.

$$\begin{align*} \text{for } \lambda &= 1:\\ \vec{0} &= (A-(1)I)\vec{x}\\ &= \begin{bmatrix} 10-1 & -3\\ -3 & 2-1 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix}\\ &= \begin{bmatrix} 9 & -3\\ -3 & 1 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} \end{align*}$$

Quickly write out the equations

$$\begin{array}{c} 0 = 9x_1 - 3x_2\\ 0 = -3x_1 + x_2 \end{array}$$

We then have a small system of linear equations. However, one equation is just $-3$ times the other, so neither adds information the other does not. In essence, we take the latter equation and write it as $3x_1 = x_2$; then, with a “free variable” $t$, we have $x_1 = t$ and $x_2 = 3t$.

$$\begin{align*} \text{for } \lambda &= 11:\\ \vec{0} &= (A-(11)I)\vec{x}\\ &= \begin{bmatrix} 10-11 & -3\\ -3 & 2-11 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix}\\ &= \begin{bmatrix} -1 & -3\\ -3 & -9 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} \end{align*}$$

Quickly write out the equations

$$\begin{array}{c} 0 = -x_1 - 3x_2\\ 0 = -3x_1 - 9x_2 \end{array}$$

Again, they are just scalar multiples of each other.

We can describe the infinite number of solutions as those satisfying $x_1 = t$ and $x_2 = -t/3$. To remove fractions, take $x_1 = 3t$ and $x_2 = -t$.

Then we write them in terms of unit eigenvectors, normalizing by the magnitude $\sqrt{a^2+b^2}$ (good old $a^2+b^2=c^2$):

$$\begin{array}{ccc} \vec{x}_1 = \frac{1}{\sqrt{10}} \begin{pmatrix} 1 \\ 3 \end{pmatrix} & \text{and} & \vec{x}_2 = \frac{1}{\sqrt{10}} \begin{pmatrix} 3 \\ -1 \end{pmatrix} \end{array}$$

$\Box$
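Cross-checking the whole example against `np.linalg.eig` (assuming NumPy):

```python
import numpy as np

A = np.array([[10.0, -3],
              [-3, 2]])
vals, vecs = np.linalg.eig(A)
print(vals)   # 11 and 1 (the order is not guaranteed)
print(vecs)   # columns are unit eigenvectors, e.g. (1,3)/sqrt(10) up to sign

# Each (eigenvalue, eigenvector) pair satisfies A x = lambda x.
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, lam * v)
```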

5.3 - Diagonalization

p. 117

Change of Basis

Definition - Linearly Independent: A set of vectors is linearly independent if no vector in the set can be expressed as a linear combination of the others.

I suppose that would indicate the vectors have different directions.

A basis $\{\vec{e}_i : i = 1,2,\cdots,N\}$ is a minimal spanning set of linearly independent vectors. An example is the Cartesian coordinate system’s unit vectors, which form a basis of $\mathbb{R}^3$. We would have

$$\begin{array}{l|r} \vec{i} = \vec{e}_1 & x\text{-axis}\\ \vec{j} = \vec{e}_2 & y\text{-axis}\\ \vec{k} = \vec{e}_3 & z\text{-axis} \end{array}$$

Now, consider an $n$-dimensional vector space with basis $\vec{e}_1,\cdots,\vec{e}_n$. Every vector $\vec{x}$ in the vector space can be expressed as a linear combination of the basis vectors:

$$\vec{x} = x_1\vec{e}_1 + x_2\vec{e}_2 + \cdots + x_n\vec{e}_n$$

You can also write it as

$$\vec{x} = \begin{bmatrix} x_1, x_2, \cdots, x_n \end{bmatrix}^T$$

Ok, why all of this? Well, suppose we want to write spherical coordinates instead of Cartesian. We create new basis vectors $\vec{e}_j'$ as

$$\vec{e}_j' = \sum_{i=1}^N S_{ij}\vec{e}_i$$

where we say that $S_{ij}$ is a matrix that transforms $\vec{e}$ into $\vec{e}'$, Cartesian to spherical. We are merely changing representation, not any properties. We can express our logic mathematically as follows:

$$\vec{x} = \sum_{i=1}^N x_i \vec{e}_i = \sum_{i=1}^N x_i' \vec{e}_i' = \sum_{j=1}^N x_j' \sum_{i=1}^N S_{ij} \vec{e}_i$$

So, the vector is broken into components, translated into the other coordinates, and then expressed as that translation using the matrix. We have the following:

$$x_i = \sum_{j=1}^N S_{ij} x_j'$$

Or in vector notation as

$$\vec{x} = S\vec{x}' \iff \vec{x}' = S^{-1}\vec{x}$$

I’m not yet sure what this will accomplish, but we can express both representations as

$$\vec{y} = A\vec{x},\ \vec{y}' = A'\vec{x}'$$

Then, we can incorporate the transformation matrix, as expressed above:

$$\begin{align*} S\vec{y}' &= A S \vec{x}'\\ \vec{y}' &= S^{-1} A S \vec{x}'\\ \implies A' &= S^{-1} A S \end{align*}$$

I can see how this comes to fruition; an example of why it is useful would be nice. Additionally, it’s important to note that the order of matrix multiplication matters, so where the matrices appear in the equation is relevant.

Taboga, Marco (2021). “Change of basis”, Lectures on matrix algebra. https://www.statlect.com/matrix-algebra/change-of-basis.

It is more like we use a change-of-basis matrix $S_{B \to C}$ to change the coordinates of each vector $s \in S$. The proposition is:

Proposition: Let $S$ be a vector space. Let $B = \{b_1, \cdots, b_K\}$ and $C = \{c_1, \cdots, c_K\}$ be two bases for $S$ (two different descriptions of the same space). Then, there exists a $K \times K$ matrix, denoted $S_{B \to C}$ and called the change-of-basis matrix, such that, for any $s \in S$,

$$[s]_C = S_{B \to C}[s]_B$$

where $[s]_B$ and $[s]_C$ denote the coordinate vectors of $s$ with respect to $B$ and $C$ respectively.

Matrix Diagonalization

p. 119

Consider a matrix $S$ such that each column of the matrix corresponds to an eigenvector of some matrix $A$:

$$S = \begin{bmatrix} \uparrow & \uparrow & & \uparrow\\ \vec{x}^1 & \vec{x}^2 & \cdots & \vec{x}^N \\ \downarrow & \downarrow & & \downarrow \end{bmatrix}$$

The superscript does not denote an exponent; it labels the eigenvector. They fulfill the eigenvector equation

$$A\vec{x}^j = \lambda_j \vec{x}^j$$

We can express $A$ in a new basis as $A'$, consisting of the eigenvectors, using the change-of-basis result from the previous section:

$$\begin{align*} (S^{-1}AS)_{ij} &= \sum_k \sum_l (S^{-1})_{ik} A_{kl} S_{lj}\\ &= \sum_k \sum_l (S^{-1})_{ik} A_{kl} (x^j)_l\\ &= \sum_k (S^{-1})_{ik} \lambda_j (x^j)_k\\ &= \lambda_j \sum_k (S^{-1})_{ik} S_{kj} \end{align*}$$

I think we should look step by step

  1. The base definition
  2. The eigenvectors in matrix SS
  3. We apply the eigenvalues of matrix AA
  4. And recombine to the eigenvectors in matrix SS

Multiplying the $S$ matrix by its inverse will produce the identity matrix, thus resulting in a diagonal matrix with the eigenvalues along the diagonal:

$$A' = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \ddots & \vdots\\ \vdots & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_N \end{bmatrix}$$

Let’s talk about what we used during the derivations:

Taboga, Marco (2021). “Matrix diagonalization”, Lectures on matrix algebra. https://www.statlect.com/matrix-algebra/matrix-diagonalization.

The above link states the most important application of diagonalization is the computation of matrix powers.
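A small sketch of that claim (assuming NumPy; the example matrix is my own): once $A = S D S^{-1}$, powers become $A^n = S D^n S^{-1}$, and $D^n$ is just each diagonal entry raised to the $n$.

```python
import numpy as np

A = np.array([[10.0, -3],
              [-3, 2]])
vals, S = np.linalg.eig(A)   # columns of S are eigenvectors

n = 5
A_n = S @ np.diag(vals ** n) @ np.linalg.inv(S)   # S D^n S^{-1}
assert np.allclose(A_n, np.linalg.matrix_power(A, n))
print(np.round(A_n))
```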

EXAMPLE

Diagonalize the following matrix (for no reason?).

$$A = \begin{bmatrix} 2 & 0 & 0\\ 1 & 2 & 1\\ -1 & 0 & 1 \end{bmatrix}$$

First, find the eigenvalues of $A$. Remember, we can use $\text{det}(A - \lambda I) = 0$:

$$\begin{align*} |A-\lambda I| &= \begin{vmatrix} 2-\lambda & 0 & 0\\ 1 & 2-\lambda & 1\\ -1 & 0 & 1-\lambda \end{vmatrix} \qquad (\text{expand along } i=1)\\ &= (2-\lambda)\left((-1)^{1+1}\big((2-\lambda)(1-\lambda) - (1)(0)\big)\right) + 0(\cdots) + 0(\cdots)\\ 0 &= (2-\lambda)^2(1-\lambda) \end{align*}$$

By expanding along the top row to find the determinant, we take advantage of the zeros, which make the multiplication easier.

Now, we solve $(A - \lambda I)\vec{x} = 0$. I think from the factored equation we can see the solution set as $\lambda \in \{1, 2\}$, with $\lambda = 2$ a repeated root. We start by substituting $\lambda = 1$:

$$(A - 1I)\vec{x} = \begin{bmatrix} 2-1 & 0 & 0\\ 1 & 2-1 & 1\\ -1 & 0 & 1-1 \end{bmatrix}\vec{x} = \begin{bmatrix} 1 & 0 & 0\\ 1 & 1 & 1\\ -1 & 0 & 0 \end{bmatrix}\vec{x}$$

This means we can solve the following set of equations:

$$\begin{align*} x_1 &= 0\\ x_1 + x_2 + x_3 &= 0\\ -x_1 &= 0 \end{align*}$$

We have $x_1 = 0$ and $x_2 = -x_3$; say $x_2 = -t$ and $x_3 = t$. The eigenvector is therefore given by

$$\vec{v}_1 = a_1 \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}$$

And $a_1$ is just an arbitrary constant. Remember, we usually choose $a_1$ so that the eigenvector becomes a unit vector. But because we want to change basis, we can choose $a_1 = 1$ to make the resulting matrix of eigenvectors as simple as possible.

Now, let $\lambda = 2$:

$$(A - 2I)\vec{x} = \begin{bmatrix} 2-2 & 0 & 0\\ 1 & 2-2 & 1\\ -1 & 0 & 1-2 \end{bmatrix}\vec{x} = \begin{bmatrix} 0 & 0 & 0\\ 1 & 0 & 1\\ -1 & 0 & -1 \end{bmatrix}\vec{x}$$

Let’s look at this like it’s a real problem to solve. We need two more vectors $\vec{x}$ that will zero this out.

$$\begin{align*} 0(-1) + 0(0) + 0(1) &= 0\\ 1(-1) + 0(0) + 1(1) &= 0\\ -1(-1) + 0(0) + (-1)(1) &= 0 \end{align*}$$

It’s a bit like fill in the blank…

$$\vec{0} = \begin{bmatrix} 0 & 0 & 0\\ 1 & 0 & 1\\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$$

And, we also have the following…

$$\begin{align*} 0(0) + 0(1) + 0(0) &= 0\\ 1(0) + 0(1) + 1(0) &= 0\\ -1(0) + 0(1) + (-1)(0) &= 0 \end{align*}$$

It’s a bit like fill in the blank…

$$\vec{0} = \begin{bmatrix} 0 & 0 & 0\\ 1 & 0 & 1\\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

That’s a little cheap, but gets the job done. Our other two eigenvectors are therefore

$$\begin{array}{lcr} \vec{v}_2 = a_2 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} & \text{and} & \vec{v}_3 = a_3 \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \end{array}$$

Ok, wow. Now, going back to slotting them into a matrix side-by-side

$$S = \begin{bmatrix} 0 & 0 & -1 \\ -1 & 1 & 0\\ 1 & 0 & 1 \end{bmatrix}$$

That is the matrix of eigenvectors, and below is the corresponding diagonal matrix of eigenvalues!

$$D = \begin{bmatrix} 1 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 2 \end{bmatrix}$$

If we really want to go the extra mile, we can verify the answer using the expression $D = S^{-1}AS$, as in the sketch below.

$\Box$
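Here’s that extra mile as a NumPy sketch (assuming NumPy), checking that $S^{-1}AS$ really is the diagonal matrix of eigenvalues:

```python
import numpy as np

A = np.array([[2.0, 0, 0],
              [1, 2, 1],
              [-1, 0, 1]])
S = np.array([[0.0, 0, -1],
              [-1, 1, 0],
              [1, 0, 1]])

D = np.linalg.inv(S) @ A @ S
print(np.round(D))   # diag(1, 2, 2), matching the worked answer
```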

As we are dealing with determinants, note that not all matrices are diagonalizable. The link to Statlect.com is very good because there is more to this than the course book covers.

5.4 - Tensors

p. 122

We are going to look at scalars, vectors, and matrices in a more unified way. Typically, numbers are useless without more information about the context of what they represent, whether it be slices of cake or kilograms of mass. Scalars always produce scalars when we operate on them with other scalars.

We revisit $\vec{v} = a\vec{i} + b\vec{j} + c\vec{k}$ to talk about vectors as unit vectors multiplied by scalars for magnitude.

We have also operated on vectors with matrices to rotate vectors, or express a change of basis or coordinate system.

Not everything can be represented with scalars and vectors. For example, the moment of inertia requires the nine elements of a $3 \times 3$ matrix to describe behaviour. Similarly, describing behaviour in an external magnetic field requires a $3 \times 3$ matrix as well.

Informally, we have been dealing with tensors the entire time, but let’s see what that means:

We can understand the rank of a tensor intuitively as the number of indices we need to express the tensor. Our vector has one index, hence its components can be represented as $a_i\ \forall\ a_i \in \vec{a}$.

A tensor of rank 2 behaves like a square matrix, but tensors also generalize the concepts we have covered so far.

If $\vec{a}$ and $\vec{b}$ are vectors and $T$ is a tensor of rank 2, we can express the linear equation $\vec{b} = T\vec{a}$ as the following system of linear equations:

$$\begin{align*} b_1 &= t_{11}a_1 + t_{12}a_2 + t_{13}a_3\\ b_2 &= t_{21}a_1 + t_{22}a_2 + t_{23}a_3\\ b_3 &= t_{31}a_1 + t_{32}a_2 + t_{33}a_3 \end{align*}$$

Looks a bit like matrix multiplication. Apparently, we now adopt the following summation convention, where we don’t explicitly write the summation sign, because why would we?

$$b_i = \sum_{j=1}^3 t_{ij}a_j = t_{ij}a_j$$

When a concept appears easy, we make it more difficult by changing notation, that way only we can understand it. It’s job security.

Basically, it might look like regular multiplication, but it is implied that we sum over the repeated index $j$. It becomes a useful shorthand when many indices are used.
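In code, the summation convention maps directly onto `np.einsum` (assuming NumPy), which sums over any repeated index for you:

```python
import numpy as np

# b_i = t_ij a_j: the repeated index j is summed automatically.
t = np.arange(9.0).reshape(3, 3)
a = np.array([1.0, 2, 3])

b = np.einsum('ij,j->i', t, a)
assert np.allclose(b, t @ a)   # identical to the matrix-vector product
print(b)
```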

Definition - Symmetric Tensors: A tensor is symmetric if it is invariant under a permutation of its indices. Basically, elements do not flip from positive to negative, or vice versa, if the indices are swapped (i.e., $t_{ijk\cdots z} = t_{jik\cdots z}$ for $i$ and $j$ swapping).

If the sign flips, the tensor is anti-symmetric.

Calculating with Tensors

p. 124

Taboga, Marco (2021). “Kronecker product”, Lectures on matrix algebra. https://www.statlect.com/matrix-algebra/Kronecker-product.

The dyad product is apparently also called the Kronecker product. The page above also links to many properties of the Kronecker product, which are currently beyond the scope of this unit.

EXAMPLE

While I have a fresh understanding, suppose we have AA and BB such that

$$\begin{array}{ccc} A = \begin{bmatrix} 1 & 3\\ 5 & 7 \end{bmatrix} & \text{and} & B = \begin{bmatrix} 2 & 4\\ 6 & 8 \end{bmatrix} \end{array}$$

Find the Kronecker product!

$$\begin{align*} A \otimes B &= \begin{bmatrix} 1B & 3B\\ 5B & 7B \end{bmatrix} = \begin{bmatrix} 1\begin{bmatrix} 2 & 4\\ 6 & 8 \end{bmatrix} & 3\begin{bmatrix} 2 & 4\\ 6 & 8 \end{bmatrix}\\ 5\begin{bmatrix} 2 & 4\\ 6 & 8 \end{bmatrix} & 7\begin{bmatrix} 2 & 4\\ 6 & 8 \end{bmatrix} \end{bmatrix}\\ &= \begin{bmatrix} \begin{bmatrix} 2 & 4\\ 6 & 8 \end{bmatrix} & \begin{bmatrix} 6 & 12\\ 18 & 24 \end{bmatrix}\\ \begin{bmatrix} 10 & 20\\ 30 & 40 \end{bmatrix} & \begin{bmatrix} 14 & 28\\ 42 & 56 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6 & 12\\ 6 & 8 & 18 & 24\\ 10 & 20 & 14 & 28\\ 30 & 40 & 42 & 56 \end{bmatrix} \end{align*}$$
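NumPy has this built in as `np.kron` (assuming NumPy), so we can check the arithmetic:

```python
import numpy as np

A = np.array([[1, 3],
              [5, 7]])
B = np.array([[2, 4],
              [6, 8]])

print(np.kron(A, B))
# [[ 2  4  6 12]
#  [ 6  8 18 24]
#  [10 20 14 28]
#  [30 40 42 56]]
```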

Taboga, Marco (2021). “Trace of a matrix”, Lectures on matrix algebra. https://www.statlect.com/matrix-algebra/trace-of-a-matrix.

I think we have discussed the trace before but…

Definition - Trace: The trace of a square matrix is the sum of its diagonal entries. Let $A$ be a $K \times K$ matrix:

$$\text{tr}(A) = \sum_{k=1}^{K} A_{kk}$$

I don’t think an example is necessary.

It has properties like: the trace of the sum of two matrices is equal to the sum of the traces.

Covariant and Contravariant Tensors

p. 125

Regardless of the coordinate system we use, we must obtain the same physical relationships.

Definition - Symmetry: If a system is symmetric under some operation, we mean that it looks the same whether we apply the operation or not.

Some things are just harder to write in certain coordinate systems. We may want to write equations in better suited systems, and therefore require a change to the coordinate system, or base. As we do this, some quantities will change and some will remain invariant.

An example of an invariant quantity could be the mass mm of a particle. Irrespective of the coordinate system used to describe its path, the mass remains the same.

Definition - Contravariant: A property of a vector such that it is independent of the change of base or coordinate system. For this to be the case, the matrix that transforms the vector components has to be the inverse of the matrix expressing the change in base or coordinate system. Typically denoted with a superscript (e.g. $r^i$).

The book gives an example of changing units of distance from metres to millimetres. The base is divided by 1000, but the vector components are multiplied by 1000.

Definition - Covariant: A covector transforms the same way as the change of base or coordinate system does. Its components co-vary, hence the property is called covariant. Typically denoted with a subscript (e.g. $r_i$).

The example is the strength of an electric field in $V/m$ when shifting the length unit to $mm$. The coordinate system and the covector need to be multiplied by 1000 to stay invariant.

Tensors can have both covariant and contravariant properties.

Contravariant Tensor — from Wolfram MathWorld

According to Wolfram MathWorld, these are topics of differential geometry.


Knowledge Check

QUESTION 1

If $D_{3 \times 4} = AB + C^T$, how would you describe the dimensions of $A$, $B$, and $C$?

Since matrix addition requires the same dimensions, $C$ must be $4 \times 3$ (because of the transpose). And because inner neighbours must be the same, and the product takes the dimensions of the outer neighbours, we get $A_{3 \times K}$ and $B_{K \times 4}$, where $K$ can be basically any positive integer.

QUESTION 2

Given that $A$, $B$, and $C$ are square matrices of the same size, can you rewrite $Tr(AB) - Tr(AC)$ in a different format?

$$A = \begin{bmatrix} a_1 & a_2\\ a_3 & a_4 \end{bmatrix},\ B = \begin{bmatrix} b_1 & b_2\\ b_3 & b_4 \end{bmatrix},\ C = \begin{bmatrix} c_1 & c_2\\ c_3 & c_4 \end{bmatrix}$$

Because the course book doesn’t describe properties very well at all, I’d recommend

Taboga, Marco (2021). “Trace of a matrix”, Lectures on matrix algebra. https://www.statlect.com/matrix-algebra/trace-of-a-matrix.

Taboga, Marco (2021). “Matrix multiplication”, Lectures on matrix algebra. https://www.statlect.com/matrix-algebra/matrix-multiplication.

The matrix product is distributive, which is really cool, but you must remember that ORDER MATTERS!

$$\begin{align*} Tr(AB) - Tr(AC) &= Tr(AB - AC)\\ &= Tr(A(B-C))\\ &= Tr\left(\left(A(B-C)\right)^T\right) \end{align*}$$

Because the sum runs down the diagonal, and transposing leaves the diagonal unchanged, the transpose makes no difference.
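A numerical spot-check of the rewrite (assuming NumPy; the matrices are arbitrary same-size squares):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.integers(-5, 5, size=(3, 3)) for _ in range(3))

lhs = np.trace(A @ B) - np.trace(A @ C)
rhs = np.trace(A @ (B - C))
assert lhs == rhs   # exact for integer matrices
print(lhs, rhs)
```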

QUESTION 3

Is the following invertible?

$$A = \begin{bmatrix} 3 & 6 \\ 2 & 4 \end{bmatrix}$$

If the determinant is 0, then the matrix is not invertible, or singular.

$$|A| = \begin{vmatrix} 3 & 6 \\ 2 & 4 \end{vmatrix} = (3 \cdot 4) - (6 \cdot 2) = 12 - 12 = 0$$

Unfortunately, it is singular.

QUESTION 4

Find the eigenvalues of

$$A = \begin{bmatrix} -1 & 4\\ 0 & 5 \end{bmatrix}$$

so

$$\begin{align*} (A - \lambda I)\vec{x} &= \begin{bmatrix} -1-\lambda & 4 \\ 0 & 5-\lambda \end{bmatrix}\vec{x} = \vec{0}\\ \text{det}(A - \lambda I) &= \begin{vmatrix} -1-\lambda & 4 \\ 0 & 5-\lambda \end{vmatrix} = (-1-\lambda)(5-\lambda) - 0 \end{align*}$$

So, we can see that $\lambda \in \{-1, 5\}$.

QUESTION 5

Suppose we contract two tensors of rank 1. What is the resultant’s rank?

Contracting two tensors of rank 1 gives a tensor of rank 0, i.e. a scalar; this is exactly the inner product $a_i b_i$ from the summation convention.