title: Advanced Mathematics
subtitle: DLMDSAM01
author: Dr. Robert Graf
publisher: IU International University of Applied Sciences
year: 2023
Unit 5 Matrices and Vector Spaces
Quick overview of our learning goals:
What matrices and special matrices are
How to perform calculations with matrices
How to compute the following
determinant
trace
transpose
complex conjugate
Hermitian conjugate
How to determine eigenvalues and eigenvectors of matrices
How to diagonalize matrices
How to change bases of matrices
What tensors are
How to perform basic calculations with tensors
Introduction
Many problems can be restructured with matrices and solved in a systematic way. Tensors formalize and extend the concepts of scalars, vectors, and matrices. The unit also recommends additional readings.
5.1 Basic Matrix Algebra
Vectors are quantities with both magnitude and direction. They can be decomposed into basic components and represented as such. Matrices are rectangular arrays of numbers, described by their row and column counts: an $n \times m$ matrix has $n$ rows and $m$ columns.
Matrices are a convenient way to represent vectors and to perform operations on them; a vector can be treated as a matrix with a single column.
Calculating with Matrices
Notation - Matrices:
capital letters, such as A, denote an entire matrix
small letters with subscripts, such as $a_{ij}$, denote entries of a matrix
greek letters, like λ, typically denote a scalar
Rules of Calculating with Matrices:
$A = B \iff \forall i,j: a_{ij} = b_{ij}$ -> That is, two matrices are equal if, and only if, they have the same dimensions and identical entries.
$A + B = C \iff \forall i,j: c_{ij} = a_{ij} + b_{ij}$ -> Matrix addition and subtraction are performed entry-wise. Both operations are commutative ($A+B=B+A$) and associative ($A+(B+C)=(A+B)+C$).
$B = \alpha A \iff \forall i,j: b_{ij} = \alpha a_{ij}$ -> Multiplying a matrix by a scalar multiplies each entry by that scalar.
Matrix multiplication is only defined when the number of columns of the left matrix equals the number of rows of the right matrix. The product has the same number of rows as the left matrix and the same number of columns as the right matrix.
The neighbouring dimensions of the matrices must be equal.
The product takes the dimensions that are not neighbouring (the outer dimensions).
Example of matrix multiplication:
$$A_{n\times m} B_{m\times p} = C_{n\times p}, \qquad AB = C \iff c_{ij} = \sum_k a_{ik} b_{kj}$$
Matrix multiplication is distributive ($A(B+C) = AB + AC$) but it is not commutative ($AB \neq BA$ in general).
Definition - Commutator: A commutator is the quantity $[A,B] \equiv AB - BA$. It's an important concept in fields like quantum mechanics.
EXAMPLE
We are going to multiply two matrices and compare the products in both orders. For both $AB$ and $BA$ to be defined with matching dimensions, the matrices must be square.
Each entry of the product is built by multiplying a row of A against a column of B and summing the terms.
Computing BA is left as an exercise for the reader; note that it should differ from AB.
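As a quick sanity check, here is a small NumPy sketch (the matrices are assumptions, not the course book's example values) showing that AB and BA differ and computing the commutator:

```python
import numpy as np

# Hypothetical square matrices (not the course book's example values)
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

AB = A @ B                      # c_ij = sum_k a_ik * b_kj
BA = B @ A

print(np.array_equal(AB, BA))   # False: matrix multiplication is not commutative
print(AB - BA)                  # the commutator [A, B] = AB - BA
```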
5.2 Determinant, Trace, Transpose, Complex Conjugate, and Hermitian Conjugate
p. 109
Matrix Transpose
The transpose of a matrix swaps the row and column position of each element: if $a_{ij} \in A$, then $a_{ji} \in A^T$. It's quite an important concept, so give it a go.
EXAMPLE
find the transpose of
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$
$A_{n\times m}$ becomes $A_{m\times n}$ as each $a_{ij} \to a_{ji}$.
$$A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$
Hopefully you got the same □.
We also have a fun property:
The transpose of the product of two matrices is the product of the transposed matrices, but multiplied in reverse.
$$(AB)^T = B^T A^T$$
Consider that matrix multiplication requires neighbouring dimensions to be the same. This is why the reverse order is necessary.
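A small NumPy sketch (with assumed matrices) verifying the reverse-order rule for the transpose of a product:

```python
import numpy as np

# Hypothetical matrices with compatible neighbouring dimensions: (2x3)(3x2)
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])

lhs = (A @ B).T        # transpose of the product
rhs = B.T @ A.T        # product of the transposes, in reverse order
print(np.array_equal(lhs, rhs))   # True

# A.T @ B.T is also defined here (3x2)(2x3), but it is a 3x3 matrix,
# so it cannot equal (AB).T, which is 2x2.
```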
Complex Conjugate
Definition - Complex Conjugate: For a matrix with complex-number entries $(a \pm bi)$, the complex conjugate $A^*$ is found by taking the complex conjugate of each entry of $A$.
$$(A^*)_{ij} = (a_{ij})^*$$
Additionally, the complex conjugate $(a \pm bi)^*$ is just $(a \mp bi)$.
That’s great but what does it actually mean? You flip the sign of the complex part.
EXAMPLE
Find the complex conjugate of the following matrix
$$A = \begin{bmatrix} 1 & 2i \\ 3+i & 4 \end{bmatrix}$$
Remember, just flip the complex signs…
$$A^* = \begin{bmatrix} 1 & -2i \\ 3-i & 4 \end{bmatrix}$$
Notice that in the form (a+bi), the a remains unchanged, and the bi is negated □
Hermitian Conjugate
Definition - Hermitian Conjugate: The Hermitian conjugate of a matrix $A$ is the transpose of its complex conjugate, denoted $A^\dagger$. Funnily enough, it is just the transpose if the matrix has no complex parts.
EXAMPLE
find the Hermitian conjugate of
$$A = \begin{bmatrix} 1 & 1-2i & 3i \\ 4 & 5+i & 6 \end{bmatrix}$$
It is both a transpose and complex conjugate
$A_{n\times m}$ becomes $A_{m\times n}$ as each $a_{ij} \to a_{ji}$.
$(a \pm bi)^*$ is just $(a \mp bi)$.
$$A^\dagger = \begin{bmatrix} 1 & 4 \\ 1+2i & 5-i \\ -3i & 6 \end{bmatrix}$$
Hopefully you got the same □
At this point, we might wonder why any of this is important. We usually represent vectors as column matrices
$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$
Because the neighbouring dimensions of two $n\times 1$ column matrices do not match, you cannot multiply them directly; you can only add them or take a dot product. However, if you take the Hermitian conjugate of $\mathbf{a}$, resulting in a $1\times n$ row matrix, you can perform matrix multiplication with $\mathbf{b}$.
This is the inner product $\langle a \mid b \rangle = \mathbf{a}^\dagger \mathbf{b}$. Again, if we are dealing only with real numbers, the Hermitian conjugate becomes a plain transpose and the inner product becomes the dot product.
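A short sketch, with assumed complex column vectors, of forming the Hermitian conjugate and the inner product in NumPy:

```python
import numpy as np

# Hypothetical complex column vectors (n x 1 matrices)
a = np.array([[1 + 1j], [2 - 1j], [3j]])
b = np.array([[2 + 0j], [1j], [1 - 1j]])

a_dagger = a.conj().T      # Hermitian conjugate: complex conjugate, then transpose
inner = a_dagger @ b       # <a|b> as a 1x1 matrix
print(inner[0, 0])

# np.vdot conjugates its first argument, so it gives the same inner product
print(np.vdot(a, b))
```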
Trace of a Matrix
Definition - Trace of a Matrix: The trace is a property of square matrices. It is the sum of the matrix's diagonal elements
$$\operatorname{Tr}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^n a_{ii}$$
The trace maps a matrix to a scalar, which gives it some extra properties:
$$\operatorname{Tr}(A \pm B) = \operatorname{Tr}(A) \pm \operatorname{Tr}(B)$$
and
$$\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$$
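A quick numerical check of these trace identities, as a sketch with assumed matrices:

```python
import numpy as np

# Hypothetical square matrices to check the trace identities numerically
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(np.trace(A + B) == np.trace(A) + np.trace(B))   # Tr(A+B) = Tr(A) + Tr(B)
print(np.trace(A @ B) == np.trace(B @ A))             # Tr(AB) = Tr(BA), even though AB != BA
```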
Example
Might seem trivial, but find the trace of the following
Definition - Permutation: A permutation is one of the several ways a set of items can be arranged; different arrangements of the same items are different permutations. That is, order matters.
Permutations are probably better defined and explained in notes on sets and counting.
Definition - Determinant of a Matrix: The determinant is a single number (scalar) representing a square matrix. Its notation looks like we are taking the absolute value of the matrix, but it denotes the determinant
$$\det(A) = |A| = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}$$
A weird formula for the determinant is
$$\det(A) = \sum_{P[\alpha\beta\ldots\omega]} \epsilon_{\alpha\beta\ldots\omega}\, a_{1\alpha} a_{2\beta} \cdots a_{n\omega}$$
The book "Mathematics for Machine Learning", section 4.1, really dives into determinants.
Anyway, what is all of this? Starting with $\epsilon$: that is called the anti-symmetric tensor, and it takes the values $\pm 1$ or $0$.
$$\epsilon_{\alpha\beta\ldots\omega} = \begin{cases} +1 & \text{for even permutations of } 1,\ldots,n \\ -1 & \text{for odd permutations of } 1,\ldots,n \\ 0 & \text{if two indices are the same} \end{cases}$$
We can set the $\epsilon$ formula aside because it's not really the easiest way to think about it. Instead, consider the Laplace expansion, aka cofactor expansion
$$C_{ij} = (-1)^{i+j} M_{ij}$$
Check out the following for even more information!
We let $M_{ij}$ be the minor: the determinant of the $(n-1)\times(n-1)$ matrix you get by removing all the elements in the $i$th row and $j$th column. And yes, that is row × column order, which is still a little confusing to me because matrix indices run backwards compared to Cartesian coordinates, and then this is frontwards again…
Per statlect.com
Proposition: Let $A$ be a $K\times K$ matrix, $K \geq 2$. Let $C_{ij}$ be the cofactor and $a_{ij}$ the entry at that position. For any row $i$, the following row expansion holds
$$\det(A) = \sum_{j=1}^K a_{ij} C_{ij}$$
Or, if you feel frisky, it holds for any column j as well, cycling through the i’s instead
$$\det(A) = \sum_{i=1}^K a_{ij} C_{ij}$$
How can it be so? The statlect.com resource has a proof as well.
If 2 rows or columns of a matrix are interchanged, the determinant changes its sign but not its absolute value.
If 2 rows or columns of a matrix are identical, the determinant is zero.
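To make the cofactor expansion concrete, here is a sketch of a recursive Laplace expansion along the first row; the example matrix is an assumption, and np.linalg.det is used only as a cross-check:

```python
import numpy as np

def det_laplace(A: np.ndarray) -> float:
    """Determinant by cofactor expansion along the first row (i = 0)."""
    n = A.shape[0]
    if n == 1:
        return float(A[0, 0])
    total = 0.0
    for j in range(n):
        # Minor M_0j: delete row 0 and column j
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        cofactor = (-1) ** j * det_laplace(minor)   # (-1)^(i+j) with i = 0
        total += A[0, j] * cofactor
    return total

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 2.0],
              [0.0, 1.0, 1.0]])

print(det_laplace(A))      # cofactor expansion
print(np.linalg.det(A))    # NumPy's determinant, as a cross-check
```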
Inverse of a Matrix
p. 114
Should probably be defined sooner?
Definition - Square Matrix: A square matrix is an n×m matrix where n=m. You can also say it’s a K×K matrix at that point. Number of rows is equal to the number of columns.
Definition - Symmetric Matrix: A symmetric matrix is a square matrix where for each element, aij=aji. Hopefully, it is apparent why it must be square.
Definition - Diagonal Matrix: A diagonal matrix is a square matrix where only the diagonal elements are nonzero. That is, anything off of the diagonal is 0. Again, the matrix should be square else the diagonal becomes arbitrary.
Definition - Identity Matrix: The identity matrix is a diagonal (hence square) matrix whose diagonal elements are $a_{ii} = 1$. All other elements $a_{ij} = 0$ for all $i \neq j$.
Definition - Inverse Matrix: If $A$ and $B$ are both matrices, then $B$ is the inverse of $A$, written $B = A^{-1} \iff AB = I$, where $I$ is the identity matrix. If such a $B$ exists, we say that $A$ is invertible.
Some fun facts:
$AB = I = BA$ is only satisfied when $A$ is a square matrix
It is also possible that $BA = I$ but $AB \neq I$.
To calculate an inverse matrix, each entry is calculated as
$$(A^{-1})_{ij} = \frac{C_{ji}}{|A|}$$
I don't like using the capital-$A$ notation to represent elements of $A^{-1}$; however, I also don't want it to appear that we are taking reciprocals of element values. Note that the denominator is the determinant and the numerator is the cofactor, but with indices SWAPPED!
Also note: if $\det(A) = 0$, the matrix is called singular and cannot be inverted.
EXAMPLE
Suppose $A$ is an invertible $2\times 2$ matrix. Can you write out the formula to calculate the inverse?
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\Longrightarrow\quad A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
You build up the matrix of cofactors, where each $C_{ij}$ ignores the $i$th row and $j$th column, place every cofactor in the corresponding position, and then transpose it (that is why the entries appear as $C_{ji}$). Which rows and columns to ignore is easiest to follow from the $a_{ij}$ positions.
Then, finish with the scalar-matrix multiplication by $1/|A|$.
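As a sketch (with an assumed matrix), here is the cofactor-based inverse for the 2×2 case, compared against np.linalg.inv:

```python
import numpy as np

# Hypothetical invertible 2x2 matrix
A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # |A| = ad - bc
assert det != 0, "singular matrix: no inverse exists"

# (A^-1)_ij = C_ji / |A|: for 2x2 this is the familiar swap-and-negate pattern
A_inv = np.array([[ A[1, 1], -A[0, 1]],
                  [-A[1, 0],  A[0, 0]]]) / det

print(A_inv)
print(np.linalg.inv(A))    # should match
print(A @ A_inv)           # approximately the identity matrix
```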
Eigenvalues and Eigenvectors of a Matrix
p. 115
Because column vectors are $n\times 1$ matrices, we can perform certain matrix operations on them. One idea is to apply a matrix to a vector so that only its magnitude changes.
The determinant "tells us by how much the linear transformation associated with the matrix A scales up or down the area of shapes." That is, if a shape has area $\alpha$, its image under the transformation has area $\det(A)\cdot\alpha$.
Now, eigenvalues and eigenvectors provide additional information, telling us "by how much the linear transformation scales up or down the sides of certain parallelograms." If one pair of parallel sides is scaled by $\lambda_1$ and the other pair by $\lambda_2$, the area of the parallelogram is scaled by a factor of $\lambda_1 \cdot \lambda_2$. As a consequence of the discussion above, we find that $\det(A) = |A| = \lambda_1 \cdot \lambda_2$.
The determinant of a matrix is equal to the product of its eigenvalues.
Definition - Eigenvalue and Eigenvector: Let $A$ be a $K\times K$ matrix. If there exists a $K\times 1$ vector $x \neq 0$ and a scalar $\lambda$ such that
$$Ax = \lambda x$$
then λ is called the eigenvalue of A, and x is called the eigenvector corresponding to λ.
You may also write that
$$(A - \lambda I)x = 0$$
I believe, because of this, the equation can be viewed as a homogeneous system of linear equations of the form $Bx = 0$. If $|B| = 0$, this system has a nontrivial (nonzero) solution, and from this characteristic equation we can determine the eigenvalues of $A$.
EXAMPLE
Can we find eigenvalues and associated eigenvectors of
$$A = \begin{bmatrix} 10 & -3 \\ -3 & 2 \end{bmatrix}$$
The equations would be
$$Ax = \lambda x \quad\Longrightarrow\quad \begin{bmatrix} 10 & -3 \\ -3 & 2 \end{bmatrix} x = \lambda x$$
Then we multiply λ by an identity matrix, and we will then move everything to one side
$$\begin{bmatrix} 10 & -3 \\ -3 & 2 \end{bmatrix} x = \lambda I x \quad\Longrightarrow\quad \begin{bmatrix} 10 & -3 \\ -3 & 2 \end{bmatrix} x - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} x = 0$$
Rearranging (switching sides purely for visual effect), we factor out the $x$ and combine the matrices into $(A - \lambda I)x = 0$. For a nontrivial solution, the determinant must vanish:
$$\det(A - \lambda I) = (10-\lambda)(2-\lambda) - 9 = \lambda^2 - 12\lambda + 11 = (\lambda - 1)(\lambda - 11) = 0$$
Yes, we now have our eigenvalues, $\lambda = 1$ and $\lambda = 11$! What about those pesky eigenvectors?
Basically, we sub in our eigenvalues and solve for the vectors.
For $\lambda_1 = 1$:
$$0 = (A - (1)I)x = \begin{bmatrix} 10-1 & -3 \\ -3 & 2-1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 9 & -3 \\ -3 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
Quickly write out the equations
$$\begin{aligned} 0 &= 9x_1 - 3x_2 \\ 0 &= -3x_1 + x_2 \end{aligned}$$
We then have a small system of linear equations. However, the two equations are scalar multiples (by a factor of $-3$) of each other, so one adds no information beyond the other. That means $x_1 = t$ and $x_2 = 3t$, where $t$ is a "free variable". In essence, we take the latter equation, write it as $3x_1 = x_2$, and substitute $t$ for $x_1$.
For $\lambda_2 = 11$:
$$0 = (A - (11)I)x = \begin{bmatrix} 10-11 & -3 \\ -3 & 2-11 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -1 & -3 \\ -3 & -9 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
Quickly write out the equations
$$\begin{aligned} 0 &= -x_1 - 3x_2 \\ 0 &= -3x_1 - 9x_2 \end{aligned}$$
Again, they are just scalar multiples of each other.
We can describe the infinitely many solutions as those satisfying $x_1 = t$ and $x_2 = -t/3$. To remove fractions, we can instead take $x_1 = 3t$ and $x_2 = -t$.
Then we write them as unit eigenvectors, dividing by the magnitude $\sqrt{a^2 + b^2}$
$$\mathbf{x}_1 = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 \\ 3 \end{pmatrix} \quad\text{and}\quad \mathbf{x}_2 = \frac{1}{\sqrt{10}}\begin{pmatrix} 3 \\ -1 \end{pmatrix}$$
□
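The worked example can be double-checked with np.linalg.eig; a minimal sketch:

```python
import numpy as np

A = np.array([[10.0, -3.0],
              [-3.0,  2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns are unit eigenvectors
print(eigenvalues)                             # 11 and 1 (order may vary)
print(eigenvectors)

# Check A x = lambda x for each eigenpair
for lam, x in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ x, lam * x))         # True
```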
5.3 - Diagonalization
p. 117
Change of Basis
Definition - Linearly Independent: Vectors are linearly independent if they cannot be expressed as a linear combination of each other.
I suppose that would indicate the vectors have different directions.
A basis $\{\mathbf{e}_i : i = 1, 2, \cdots, N\}$ is a minimal spanning set of linearly independent vectors. An example is the Cartesian coordinate system, whose unit vectors form a basis of $\mathbb{R}^3$. We would have
$$\mathbf{i} = \mathbf{e}_1 \ (x\text{-axis}), \quad \mathbf{j} = \mathbf{e}_2 \ (y\text{-axis}), \quad \mathbf{k} = \mathbf{e}_3 \ (z\text{-axis})$$
Now, consider an $n$-dimensional vector space with basis $\mathbf{e}_1, \cdots, \mathbf{e}_n$. Every vector $\mathbf{x}$ in the vector space can be expressed as a linear combination of the basis vectors:
$$\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n$$
You can also write it as
$$\mathbf{x} = [x_1, x_2, \cdots, x_n]^T$$
Ok, why all of this? Well, suppose we want to use spherical coordinates instead of Cartesian. We create new basis vectors $\mathbf{e}_j'$ as
$$\mathbf{e}_j' = \sum_{i=1}^N S_{ij}\, \mathbf{e}_i$$
where $S_{ij}$ is a matrix that transforms $\mathbf{e}$ into $\mathbf{e}'$, Cartesian to spherical. We are merely changing the representation, not any properties. We can express this logic mathematically as follows.
The vector is broken into components, translated into the other coordinates, and then expressed in terms of that translation using the matrix. We have the following
$$x_i = \sum_{j=1}^N S_{ij}\, x_j'$$
Or in vector notation as
$$\mathbf{x} = S\mathbf{x}' \iff \mathbf{x}' = S^{-1}\mathbf{x}$$
I’m not yet sure what this will accomplish, but we can express both representations as
$$\mathbf{y} = A\mathbf{x}, \qquad \mathbf{y}' = A'\mathbf{x}'$$
Then, we can incorporate the transformation matrix, as expressed above:
$$S\mathbf{y}' = AS\mathbf{x}' \;\Longrightarrow\; \mathbf{y}' = S^{-1}AS\,\mathbf{x}' \;\Longrightarrow\; A' = S^{-1}AS$$
I can see how this comes together; an example of why it is useful would be nice. Additionally, it's important to note that the order of matrix multiplication matters, so where the matrices appear in the equation is relevant.
It is more like we use a change-of-basis matrix $S_{B\to C}$ to change the coordinate representation of each element of the space. The proposition is
Proposition: Let $S$ be a vector space. Let $B = \{b_1, \cdots, b_K\}$ and $C = \{c_1, \cdots, c_K\}$ be two bases for $S$ (two different ways of representing the same space). Then there exists a $K\times K$ matrix, denoted $S_{B\to C}$ and called the change-of-basis matrix, such that, for any $s \in S$
$$[s]_C = S_{B\to C}\,[s]_B$$
where $[s]_B$ and $[s]_C$ denote the coordinate vectors of $s$ with respect to $B$ and $C$, respectively.
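A minimal sketch of the change-of-basis relation, with an assumed (and easily invertible) basis matrix S:

```python
import numpy as np

# Columns of S are the new basis vectors e'_j written in the old basis
# (values are assumed, chosen so S is easy to invert)
S = np.array([[1.0, 1.0],
              [0.0, 2.0]])

x = np.array([3.0, 4.0])          # coordinates in the old basis

x_prime = np.linalg.inv(S) @ x    # x' = S^-1 x: coordinates in the new basis
print(x_prime)
print(S @ x_prime)                # S x' recovers the original coordinates x
```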
Matrix Diagonalization
p. 119
Consider a matrix $S$ such that each column of the matrix corresponds to an eigenvector of some matrix $A$
$$S = \begin{bmatrix} \uparrow & \uparrow & & \uparrow \\ \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_N \\ \downarrow & \downarrow & & \downarrow \end{bmatrix}$$
The index labels the eigenvector; it does not denote an exponent. They fulfill the eigenvector equation
$$A\mathbf{x}_j = \lambda_j \mathbf{x}_j$$
We can express $A$ in a new basis $A'$, consisting of the eigenvectors, using the change of basis from the previous section. Consider the example matrix
$$A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 1 \\ -1 & 0 & 1 \end{bmatrix}$$
By expanding along the top row to find the determinant, we take advantage of the zeros making the multiplication easier:
$$\det(A - \lambda I) = (2-\lambda)(2-\lambda)(1-\lambda) = 0$$
Now, we solve $(A - \lambda I)x = 0$. From the characteristic equation we can see the solution set is $\lambda = 1$ and $\lambda = 2$ (the latter with multiplicity two). We start by substituting $\lambda = 1$
$$(A - 1I)\mathbf{x} = \begin{bmatrix} 2-1 & 0 & 0 \\ 1 & 2-1 & 1 \\ -1 & 0 & 1-1 \end{bmatrix}\mathbf{x} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ -1 & 0 & 0 \end{bmatrix}\mathbf{x} = \mathbf{0}$$
This means we can solve the following set of equations:
$$\begin{aligned} x_1 &= 0 \\ x_1 + x_2 + x_3 &= 0 \\ -x_1 &= 0 \end{aligned}$$
We have $x_1 = 0$ and $x_2 = -x_3$ (say $x_2 = -t$ and $x_3 = t$), and therefore the eigenvector is given by
$$\mathbf{v}_1 = a_1\begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}$$
And a1 is just an arbitrary constant. Remember, we usually choose a1 so that the eigenvector becomes a unit vector. But, because we want to change basis, we can choose a1=1 to make the resulting matrix of eigenvectors as simple as possible.
Now, let λ=2
$$(A - 2I)\mathbf{x} = \begin{bmatrix} 2-2 & 0 & 0 \\ 1 & 2-2 & 1 \\ -1 & 0 & 1-2 \end{bmatrix}\mathbf{x} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 1 \\ -1 & 0 & -1 \end{bmatrix}\mathbf{x} = \mathbf{0}$$
Let's look at this like it's a real problem to solve. The only constraint is $x_1 + x_3 = 0$ (the second and third rows are multiples of each other, and the first row is all zeros), while $x_2$ is completely free. We need two more linearly independent vectors that zero this out: one with only $x_2$ nonzero, and one with $x_1 = -x_3$.
That's a little cheap, but it gets the job done. Our other two eigenvectors are therefore
$$\mathbf{v}_2 = a_2\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \mathbf{v}_3 = a_3\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$$
Ok, wow. Now, going back to slotting them into a matrix side-by-side
$$S = \begin{bmatrix} 0 & 0 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$
That is the matrix of eigenvectors, and below is the corresponding diagonal matrix of eigenvalues!
It is literally just what it sounds like…
$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$
If we really want to go the extra mile, we can verify the answer using the expression $D = S^{-1}AS$.
□
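For the extra mile, here is that verification as a NumPy sketch, using the matrix A and eigenvector matrix S from the example above:

```python
import numpy as np

A = np.array([[ 2.0, 0.0, 0.0],
              [ 1.0, 2.0, 1.0],
              [-1.0, 0.0, 1.0]])

# Matrix of eigenvectors (as columns) from the example above
S = np.array([[ 0.0, 0.0, -1.0],
              [-1.0, 1.0,  0.0],
              [ 1.0, 0.0,  1.0]])

D = np.linalg.inv(S) @ A @ S
print(np.round(D, 10))   # diagonal matrix with the eigenvalues 1, 2, 2
```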
As we are dealing with determinants, note that not all matrices are diagonalizable. The link to Statlect.com is very good because it covers details the course book leaves out.
5.4 - Tensors
p. 122
We are going to look at scalars, vectors, and matrices in a more unified way. Typically, numbers are useless without more information: the context of what they represent, whether slices of cake or kilograms of mass. Operations on scalars always produce scalars.
We revisit $\mathbf{v} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$ to talk about vectors as unit vectors multiplied by scalars for magnitude.
Sum of 2 vectors is another vector
Inner product of vectors produces a scalar
Cross product of vectors produces an orthogonal vector.
Multiplication of a vector by a scalar is a vector.
We have also operated on vectors with matrices to rotate vectors, or express a change of basis or coordinate system.
Not everything can be represented with scalars and vectors. For example, the inertia of a rigid body requires the nine elements of a 3×3 matrix to describe its behaviour. Similarly, describing the response to an external magnetic field requires a 3×3 matrix as well.
Informally, we have been dealing with Tensors the entire time, but let’s see what that means:
Scalar = Tensor of rank 0 ($3^0 = 1$ component)
Vector = Tensor of rank 1 ($3^1 = 3$ components)
Dyad = Tensor of rank 2 ($3^2 = 9$ components)
Triad = Tensor of rank 3 ($3^3 = 27$ components)
We can understand the rank of a tensor intuitively as the number of indices we need to express the tensor. A vector has one index, hence its components can be written as $a_i$.
A Tensor of rank 2 behaves like a square matrix, but tensors also generalize the concepts we have covered so far.
If $\mathbf{a}$ and $\mathbf{b}$ are vectors and $T$ is a tensor of rank 2, we can express the linear equation $\mathbf{b} = T\mathbf{a}$ as the following system of linear equations.
It looks a bit like matrix multiplication. Apparently, we now adopt the summation convention, where we don't explicitly write the summation sign, because why would we?
$$b_i = \sum_{j=1}^3 t_{ij} a_j = t_{ij} a_j$$
When a concept appears easy, we make it more difficult by changing notation, that way only we can understand it. It’s job security.
Basically, it might look like regular multiplication, but it is implied we sum over the elements of j. It becomes a useful shorthand when there’s many indices used.
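The summation convention maps directly onto np.einsum; a short sketch with assumed values for t and a:

```python
import numpy as np

# Hypothetical rank-2 tensor t_ij and vector a_j
t = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])
a = np.array([1.0, 2.0, 3.0])

b = np.einsum('ij,j->i', t, a)   # b_i = t_ij a_j, summing over the repeated index j
print(b)
print(t @ a)                     # same result as ordinary matrix-vector multiplication
```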
Definition - Symmetric Tensors: A tensor is symmetric if it is invariant under a permutation of its indices. Basically, elements do not flip from positive to negative, or vice versa, if the indices are swapped (i.e. $t_{ijk\cdots z} = t_{jik\cdots z}$ for $i$ and $j$ swapping).
If the signs would flip, the Tensor is anti-symmetric.
Calculating with Tensors
p. 124
Addition and subtraction
Only defined for tensors of the same rank.
addition is commutative, aij+bij=bij+aij
Dyad product
Multiply each component term-by-term
Always leads to a tensor of higher rank
Generally not commutative, $a_{ik} \otimes b_{lm} \neq b_{lm} \otimes a_{ik}$
For example, two tensors of rank 1 multiply to produce a square matrix (a rank-2 tensor).
Example below.
Contraction
Used when we sum over the indices of a tensor, if the index occurs twice.
Consider dot product…
$$\mathbf{a}\cdot\mathbf{b} = \sum_i a_i b_i = a_i b_i$$
The sum is rank 0. In essence, we contracted the index i of the vectors
Trace
A special case of contraction, calculated in the same way as we have seen for square matrices.
The dyad product is apparently also called the Kronecker product. The linked page also lists many properties of the Kronecker product, which are currently beyond the scope of this unit.
EXAMPLE
While I have a fresh understanding, suppose we have A and B such that
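The course book's values for this example are not reproduced here, so as a sketch with assumed rank-1 tensors, the dyad (outer) product looks like this in NumPy:

```python
import numpy as np

# The course book's values for this example are not reproduced here;
# these rank-1 tensors (vectors) are assumed purely for illustration.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

dyad = np.outer(a, b)       # (a ⊗ b)_ij = a_i * b_j, a rank-2 tensor (3x3 matrix)
print(dyad)

print(np.array_equal(np.outer(a, b), np.outer(b, a)))   # False: generally not commutative

# Contracting the two indices (the trace) recovers the dot product a·b
print(np.trace(dyad))
print(np.dot(a, b))
```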
Definition - Trace: The trace of a square matrix is the sum of its diagonal entries. Let A be a K×K matrix.
$$\operatorname{tr}(A) = \sum_{k=1}^K A_{kk}$$
I don’t think an example is necessary.
It has properties like the trace of the sum of 2 matrices is equal to the sum of the traces.
Covariant and Contravariant Tensors
p. 125
Regardless of which coordinate system we use
Cartesian: x,y,z
Spherical: r,θ,ϕ
Cylindrical: r,θ,z
We must obtain the same physical relationships.
Definition - Symmetry: If a system is symmetric under some operation, we mean that it looks the same whether we apply the operation or not.
Some things are just harder to write in certain coordinate systems. We may want to write equations in better-suited systems, and therefore require a change of coordinate system, or basis. As we do this, some quantities will change and some will remain invariant.
An example of an invariant quantity could be the mass m of a particle. Irrespective of the coordinate system used to describe its path, the mass remains the same.
Definition - Contravariant: A property of a vector such that it is independent of a change of basis or coordinate system. For this to be the case, the matrix that transforms the vector components has to be the inverse of the matrix expressing the change of basis or coordinate system. Typically denoted with a superscript (e.g. $r^i$).
The book gives an example of changing units of distance from meters to millimetres: the basis is divided by 1000, but the vector components are multiplied by 1000.
Definition - Covariant: A covector transforms the same way as the change of basis or coordinate system does. Its components co-vary, hence the property is called covariant. Typically denoted with a subscript (e.g. $r_i$).
The example is the strength of an electric field in V/m: shifting the length unit to mm, the covector components scale by the same factor as the basis, which keeps the physical field invariant.
Tensors can have both covariant and contravariant properties.
According to Wolfram MathWorld, these are topics of differential geometry.
Knowledge Check
QUESTION 1
If $D_{3\times4} = AB + C^T$, how would you describe the dimensions of $A$, $B$, and $C$?
Since matrices being added must have the same dimensions, $C$ must be $C_{4\times3}$ (because of the transpose). And because the inner, neighbouring dimensions must be equal and the product takes the outer dimensions, $A_{3\times K}$ and $B_{K\times 4}$, where $K$ can be any positive integer.
QUESTION 2
Given that $A$, $B$, and $C$ are square matrices of the same size, can you rewrite $\operatorname{Tr}(AB) - \operatorname{Tr}(AC)$ in a different format?