Solving the least squares problem
The minimum of the sum of squares is found by setting the gradient to zero.
Since the model contains m parameters, there are m gradient equations:

\[ \frac{\partial S}{\partial \beta_j} = 2 \sum_i r_i \frac{\partial r_i}{\partial \beta_j} = 0, \qquad j = 1, \ldots, m, \]

and since \( r_i = y_i - f(x_i, \boldsymbol\beta) \), the gradient equations become

\[ -2 \sum_i r_i \frac{\partial f(x_i, \boldsymbol\beta)}{\partial \beta_j} = 0, \qquad j = 1, \ldots, m. \]
The gradient equations apply to all least squares problems.
Each particular problem requires particular expressions for the model and
its partial derivatives.
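As a concrete illustration, the following sketch (assuming NumPy, a straight-line model \( f(x, \boldsymbol\beta) = \beta_0 + \beta_1 x \), and made-up data) checks numerically that the gradient equations are satisfied at the fitted parameters:

# Numerical sketch of the gradient conditions for a straight-line model
# f(x, b) = b0 + b1*x, using NumPy and illustrative (made-up) data.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

def gradient(beta):
    # r_i = y_i - f(x_i, beta); dS/db_j = -2 * sum_i r_i * df/db_j
    r = y - (beta[0] + beta[1] * x)
    return np.array([-2 * np.sum(r * 1.0),   # df/db0 = 1
                     -2 * np.sum(r * x)])    # df/db1 = x

# At the least squares solution the gradient should be (numerically) zero.
beta_hat = np.polyfit(x, y, 1)[::-1]   # polyfit returns [b1, b0]; reverse to [b0, b1]
print(gradient(beta_hat))              # approximately [0, 0]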
Linear least squares
A regression model is a linear one when the model comprises a linear combination
of the parameters, i.e.,

\[ f(x, \boldsymbol\beta) = \sum_{j=1}^{m} \beta_j \varphi_j(x), \]

where the function \( \varphi_j \) is a function of \( x \).

Letting \( X_{ij} = \frac{\partial f(x_i, \boldsymbol\beta)}{\partial \beta_j} = \varphi_j(x_i) \) and collecting the observations \( y_i \) in the vector \( \mathbf{y} \),
we can then see that in that case the least squares estimate (or estimator,
in the context of a random sample) is given by

\[ \hat{\boldsymbol\beta} = \left( \mathbf{X}^\mathsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\mathsf{T} \mathbf{y}. \]
For a derivation of this estimate see Linear least squares (mathematics).
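As a rough illustration of this formula, the sketch below (assuming NumPy, the basis functions \( \varphi_1(x) = 1 \) and \( \varphi_2(x) = x \), and illustrative data) computes \( \hat{\boldsymbol\beta} \) from the normal equations and cross-checks it against NumPy's built-in least squares solver:

# Sketch of the closed-form linear least squares estimate
# beta_hat = (X^T X)^{-1} X^T y for a straight-line model; the data are illustrative.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Design matrix X with X_ij = phi_j(x_i).
X = np.column_stack([np.ones_like(x), x])

# Normal-equation solution; np.linalg.solve is preferred to forming an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's built-in least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat, beta_lstsq)   # the two estimates agree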
Non-linear least squares
There is, in some cases, a closed-form solution to a non-linear least squares problem –
but in general there is not. In the case of no closed-form solution, numerical algorithms
are used to find the value of the parameters that minimizes the objective.
Most algorithms involve choosing initial values for the parameters. Then,
the parameters are refined iteratively, that is, the values are obtained by successive
approximation:

\[ \beta_j^{\,k+1} = \beta_j^{\,k} + \Delta\beta_j, \]
where a superscript k is an iteration number, and the vector of increments,
\( \Delta\boldsymbol\beta \), is called the shift vector. In some commonly used algorithms, at each iteration the
model may be linearized by approximation to a first-order Taylor series expansion
about \( \boldsymbol\beta^{k} \):

\[ f(x_i, \boldsymbol\beta) \approx f^{k}(x_i, \boldsymbol\beta) + \sum_j \frac{\partial f(x_i, \boldsymbol\beta)}{\partial \beta_j} \left( \beta_j - \beta_j^{\,k} \right) = f^{k}(x_i, \boldsymbol\beta) + \sum_j J_{ij}\, \Delta\beta_j, \]

where \( J_{ij} = \partial f(x_i, \boldsymbol\beta) / \partial \beta_j \).
The Jacobian J is a function of constants, the independent variable and the parameters,
so it changes from one iteration to the next. The residuals are given by

\[ r_i = y_i - f^{k}(x_i, \boldsymbol\beta) - \sum_{j=1}^{m} J_{ij}\, \Delta\beta_j = \Delta y_i - \sum_{j=1}^{m} J_{ij}\, \Delta\beta_j, \]

where \( \Delta y_i = y_i - f^{k}(x_i, \boldsymbol\beta) \).
To minimize the sum of squares of \( r_i \), the gradient equation is set to zero and solved for
\( \Delta\beta_j \):

\[ -2 \sum_{i=1}^{n} J_{ij} \left( \Delta y_i - \sum_{k=1}^{m} J_{ik}\, \Delta\beta_k \right) = 0, \]
which, on rearrangement, become m simultaneous linear equations,
the normal equations:

\[ \sum_{i=1}^{n} \sum_{k=1}^{m} J_{ij} J_{ik}\, \Delta\beta_k = \sum_{i=1}^{n} J_{ij}\, \Delta y_i \qquad (j = 1, \ldots, m). \]

The normal equations are written in matrix notation as

\[ \left( \mathbf{J}^\mathsf{T} \mathbf{J} \right) \Delta\boldsymbol\beta = \mathbf{J}^\mathsf{T} \Delta\mathbf{y}. \]
These are the defining equations of the Gauss–Newton algorithm.
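The sketch below shows one possible Gauss–Newton iteration for an assumed exponential model \( f(x, \boldsymbol\beta) = \beta_0 e^{\beta_1 x} \) with illustrative data; the model choice, starting values, and stopping rule are assumptions made for the example, not part of the algorithm's definition:

# Minimal Gauss-Newton sketch for the model f(x, b) = b0 * exp(b1 * x),
# solving (J^T J) delta = J^T (y - f) at each iteration. Data, starting point,
# and stopping rule are illustrative assumptions.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.2, 0.8, 0.5, 0.3])

def model(beta):
    return beta[0] * np.exp(beta[1] * x)

def jacobian(beta):
    # Columns are the partial derivatives of f with respect to b0 and b1.
    e = np.exp(beta[1] * x)
    return np.column_stack([e, beta[0] * x * e])

beta = np.array([1.0, -1.0])          # initial values for the parameters
for _ in range(20):                   # fixed iteration cap for simplicity
    J = jacobian(beta)
    dy = y - model(beta)              # current residuals
    delta = np.linalg.solve(J.T @ J, J.T @ dy)   # normal equations
    beta = beta + delta               # apply the shift vector
    if np.linalg.norm(delta) < 1e-10:
        break

print(beta)   # converged parameter estimates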