I am trying to reproduce the projection of the original data on the 1st principal component taken from here http://zoonek2.free.fr/UNIX/48_R/09.html after scrolling down.
Here is the supposed code
x <- rnorm(100) y <- x + rnorm(100) plot(y~x) r <- lm(y~x) abline(r, col='red') r <- lm(x ~ y) a <- r$coefficients # Intercept b <- r$coefficients # slope abline(-a/b , 1/b, col="blue") r <- princomp(cbind(x,y)) b <- r$loadings[2,1] / r$loadings[1,1] a <- r$center - b * r$center abline(a,b) title(main='Comparing three "regressions"') plot(y~x, xlim=c(-4,4), ylim=c(-4,4) ) abline(a,b) # Change-of-base matrix u <- r$loadings # Projection onto the first axis p <- matrix( c(1,0,0,0), nrow=2 ) X <- rbind(x,y) X <- r$center + solve(u, p %*% u %*% (X - r$center)) segments( x, y, X[1,], X[2,] ) title(main="PCA: distances measured perpendicularly to the line") but that code produces this: Can you provide any of the mathematical details and/or references for this line of calculation here that is supposed to provide the projection of the original data onto the 1st principal component in ORIGINAL space. Here it appears incorrect but can you provide details or a reference for the correct calculation? Thank you. X <- r$center + solve(u, p %*% u %*% (X - r$center)) ================= ================= =================