COMP1021 MCS: Linear algebra
- In the practical we found that matrix multiplication is $O(n^k)$
- From Wikipedia we find $2.37188 \leq k \leq 3$
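A rough sketch of how one might estimate $k$ empirically, not part of the practical: time numpy's multiply at two sizes and compare the growth. The sizes are arbitrary choices and timings are noisy, so treat the result as indicative only.

```python
import time
import numpy as np

def time_matmul(n):
    # Time one n x n matrix multiplication on random data.
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    t0 = time.perf_counter()
    a @ b
    return time.perf_counter() - t0

t1, t2 = time_matmul(500), time_matmul(1000)
# If cost ~ c * n^k, doubling n multiplies the time by 2^k.
k = np.log2(t2 / t1)
print(f"estimated k ~ {k:.2f}")
```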
- Search engines using linear algebra
- Sparse matrices
- Santa using linear algebra (Markov chains)
- Diagonal matrices and eigenvectors
- Invented by Larry Page and Sergey Brin (who founded Google)
- (One of) the main non-content features of Google until 2016
- Finds "importance" of a page by the number of pages that link to it
- But this can be gamed with link farms
- So we want to weight each incoming link by the "importance" of the page it comes from
- Find all the links between web pages
- Construct the directed graph where nodes are pages and edges are links
- Make an initial assignment of PageRank values to each web page (node)
- Transfer the PageRank of each page to the pages it links to (follow edges)
- Weight the transferred rank according to the number of outgoing links
- Repeat
- In symbols, each transfer step computes
$$r(p) = \sum_{q \in I(p)} \frac{r(q)}{l(q)}$$
where
- $r$ is the rank function
- $p$ and $q$ are pages
- $I(p)$ is the set of pages that link to $p$
- $l(q)$ is the number of links leaving $q$
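A minimal Python sketch of this update on a made-up three-page link graph (the pages and links are invented for illustration):

```python
# One rank-transfer step on a toy link graph.
# `links` maps each page q to the pages it links to (invented data).
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = {p: 1 / len(links) for p in links}    # initial assignment of PageRank

def transfer(rank, links):
    new = {p: 0.0 for p in rank}
    for q, outgoing in links.items():
        share = rank[q] / len(outgoing)      # weight by number of out-links l(q)
        for p in outgoing:
            new[p] += share                  # accumulate r(p) = sum of r(q)/l(q)
    return new

for _ in range(50):                          # repeat until the values settle
    rank = transfer(rank, links)
print(rank)
```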
- Give all the pages a number $i = 1 \ldots n$
- Find the adjacency matrix $A$
- Make $M$ the transposed, modified adjacency matrix: scale so each column sums to 1
- Write the vector $\mathbf{r}$ as $(r_1, \ldots, r_n)$
- Then each transfer step is $\mathbf{r} \leftarrow M\mathbf{r}$ (see the sketch below)
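The same invented graph in matrix form, as a sketch. It assumes no dangling pages (a page with no out-links would give a zero column):

```python
import numpy as np

# A[i, j] = 1 if page i links to page j (same invented graph as above).
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)

# Transpose, then scale so each column sums to 1:
# column j of M spreads page j's rank over its out-links.
M = A.T / A.sum(axis=1)

r = np.full(3, 1 / 3)      # initial PageRank vector
for _ in range(50):
    r = M @ r              # one transfer step
print(r)
```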
- More complex versions include damping: with some probability the surfer jumps to a uniformly random page instead of following a link
- Currently more than a billion web pages
- So we need to solve with a $10^9 \times 10^9$ matrix
- Good news: successive approximation gets close
- Bad news: just to store the matrix would take $8 \times 10^{18}$ bytes
- Good news: a lot of the entries are zeros (sparse)
- Bad news: multiplication in numpy is about $O(n^{2.85})$
- Good news: there are quicker ways of multiplying sparse matrices (see the sketch below)
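A sketch of the sparse representation using scipy; the size and density here are made up, and the real web graph is far larger and even sparser:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# CSR format stores only the nonzero entries, so storage and
# matrix-vector multiplication scale with the number of nonzeros.
n = 100_000
S = sparse_random(n, n, density=1e-5, format="csr")  # ~10^5 nonzeros
v = np.ones(n)
w = S @ v  # cost proportional to the nonzeros, not n^2
print(S.nnz, "stored entries instead of", n * n)
```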
- Can think of the PageRank surfer as a Markov process
- Jumps randomly from state to state
- Probability of jumping depends only on current state
- PageRank is the steady state probability of being on a page
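One way to see this is to simulate the surfer and count visits; over many jumps the visit frequencies approximate the steady-state probabilities. A sketch on the same invented graph:

```python
import random

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
page = "a"
visits = {p: 0 for p in links}
for _ in range(100_000):
    page = random.choice(links[page])  # next state depends only on current state
    visits[page] += 1
print({p: v / 100_000 for p, v in visits.items()})  # ~ PageRank of each page
```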
- Which reindeer should lead the sleigh?
- What is the weather in Lapland?
- A decent way to forecast the weather is "tomorrow will be the same as today"
- Could simplify to cloudy/snowy/clear
- Look at the history of weather in the place
- Work out transition probabilities between states
- Write transitions as a matrix $T$
- Write weather today as a (sparse) vector $\mathbf{w}_0$
- Then $\mathbf{w}_1 = T\mathbf{w}_0$
- Then weather in $n$ days' time is probably $\mathbf{w}_n = T^n \mathbf{w}_0$
- Long range forecast is $\mathbf{w}_\infty$
- $T\mathbf{w}_\infty = \mathbf{w}_\infty$
- $\mathbf{w}_\infty$ is an eigenvector with eigenvalue 1
- In general if $A\mathbf{x} = \lambda \mathbf{x}$ then $\mathbf{x}$ is an eigenvector of $A$ with eigenvalue $\lambda$
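A sketch with invented transition probabilities over (cloudy, snowy, clear); column $j$ of $T$ holds the probabilities of tomorrow's weather given state $j$ today, so each column sums to 1:

```python
import numpy as np

T = np.array([[0.6, 0.3, 0.2],   # cloudy tomorrow
              [0.3, 0.5, 0.1],   # snowy tomorrow
              [0.1, 0.2, 0.7]])  # clear tomorrow

w0 = np.array([0.0, 1.0, 0.0])           # snowy today
wn = np.linalg.matrix_power(T, 30) @ w0  # forecast 30 days ahead

# Long range forecast: the eigenvector of T with eigenvalue 1.
vals, vecs = np.linalg.eig(T)
w_inf = vecs[:, np.argmax(vals.real)].real
w_inf /= w_inf.sum()                     # normalise to a probability vector
print(wn, w_inf)                         # the two should (roughly) agree
```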
If we can find a basis of eigenvectors then
- Matrix representation for that basis is diagonal
- Change of basis matrix $P$ is invertible
- Write $A = PDP^{-1}$ where $D$ is diagonal
- Diagonal matrices are easy to store (sparse)
- Diagonal matrices are easy to multiply ($O(n)$)
- $A^n = (PDP^{-1})^n = PD^nP^{-1}$
- Much quicker to find powers of matrices (see the sketch after this list)
- An example of matrix factorisation
- More in second term
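A sketch of the factorisation with numpy; the matrix is an arbitrary symmetric example, chosen because symmetric matrices are guaranteed to be diagonalisable:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, P = np.linalg.eig(A)  # columns of P are eigenvectors of A
n = 10

# A^n = P D^n P^{-1}, and D^n just raises each diagonal entry
# to the n-th power.
An = P @ np.diag(eigvals**n) @ np.linalg.inv(P)
print(np.allclose(An, np.linalg.matrix_power(A, n)))  # True
```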
- Can be used to generate musical chord sequences
- Or complete songs
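A minimal sketch of a Markov chord generator; the chords and transition probabilities are invented here, where real ones would be estimated from a corpus of songs:

```python
import random

transitions = {
    "C":  [("F", 0.4), ("G", 0.4), ("Am", 0.2)],
    "F":  [("C", 0.5), ("G", 0.5)],
    "G":  [("C", 0.7), ("Am", 0.3)],
    "Am": [("F", 0.6), ("G", 0.4)],
}

chord = "C"
sequence = [chord]
for _ in range(7):
    nexts, probs = zip(*transitions[chord])
    chord = random.choices(nexts, weights=probs)[0]  # jump depends only on current chord
    sequence.append(chord)
print(" - ".join(sequence))
```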
- But I can tell when the whole thing is generated by AI
- Adding some human input can make for a better song
- Have a Very Markov Christmas
- And may your matrices all be diagonal