I will define the variables first. Let $t = d(x) + \delta$, where $x$ is an $N$-dimensional vector, $d(x)$ is a deterministic function, and $\delta$ is Gaussian noise with mean $0$ and variance $\varsigma^2$.

Now, I have a linear function approximation with a parameter vector $w$: $y = \sum_{n=0}^{N} w_n x_n = w^T \cdot x$. I want to find the optimal $w$ such that the sum of squared errors, $\text{SSE} = \sum_{n=1}^{N}(t^n - y^n)^2$, is minimized. Here $t^n$ is the actual (target) value for the $n$-th input, and $y^n$ is the output we have predicted for it. Hence, we are trying to get $y^n$ as close as possible to $t^n$.

So from an intuitive standpoint, I understand that finding an optimal set of weights would make $y = t$, which would minimize the SSE. However, I am not sure where to start when deriving this mathematically. My initial approach would be to take the derivative of $y$ with respect to $w$ and set it equal to $0$. But if I did that, it would just give me $x = 0$.
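A minimal numerical sketch of the standard least-squares derivation, assuming synthetic data. The key step is to differentiate the SSE, not $y$, with respect to $w$: $\partial\,\text{SSE}/\partial w = -2\sum_n (t^n - w^T x^n)\,x^n$. Stacking the inputs $x^n$ as rows of a matrix $X$ and the targets into a vector $t$, this is $-2X^T(t - Xw)$, and setting it to zero gives the normal equations $X^T X w = X^T t$. The linear choice of $d(x)$, the weights `w_true`, the noise scale, and the helper names `fit_least_squares` and `sse` are all hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_least_squares(X, t):
    """Solve the normal equations X^T X w = X^T t for w.

    X: (P, N) array with one input vector x^n per row.
    t: (P,) array of targets t^n.
    """
    # Gradient of SSE(w) = sum_n (t^n - w^T x^n)^2 is -2 X^T (t - X w);
    # setting it to zero yields X^T X w = X^T t.
    return np.linalg.solve(X.T @ X, X.T @ t)

def sse(X, t, w):
    """Sum of squared errors between targets t and predictions X @ w."""
    residual = t - X @ w
    return float(residual @ residual)

rng = np.random.default_rng(0)
P, N = 200, 3                                  # P examples, N-dimensional inputs
X = rng.normal(size=(P, N))
w_true = np.array([1.5, -2.0, 0.5])            # hypothetical "true" weights
d = X @ w_true                                 # assumes d(x) is linear, for illustration
t = d + rng.normal(scale=0.1, size=P)          # Gaussian noise (scale is an assumption)

w_hat = fit_least_squares(X, t)
print("estimated w:", w_hat)
print("SSE at w_hat:", sse(X, t, w_hat))
```

A bias term can be handled by prepending a constant feature $x_0 = 1$ to each input, which is one common reading of the $n = 0$ index in the sum. In practice `np.linalg.lstsq` is numerically preferable to forming $X^T X$ explicitly.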

=================

What is $t^n$? Also, the sentence “Now, I want to (..)” is incomplete.

– LinAlg


Thanks for pointing that out. I have edited the question.

– Christian


=================