This note is for the Stanford University online course “Machine Learning” taught by Andrew Ng on Coursera.org, 2016 March session.

## Environment Setup

Octave or MATLAB is the recommended environment for this course.

## Multivariate Linear Regression

### Multiple Features

How can a hypothesis with multiple features be written as a linear equation?

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^T x$, where $x_0 = 1$.
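The hypothesis can be evaluated for every training example at once by stacking the examples as rows of a matrix. A minimal Octave sketch, assuming a hypothetical toy dataset and arbitrarily chosen parameter values (`X_raw`, `theta`, and `h` are illustrative names, not from the course material):

```octave
% Hypothetical toy data: m = 3 examples, n = 2 features.
X_raw = [2104 3; 1600 3; 2400 4];
m = size(X_raw, 1);

% Prepend the column x_0 = 1 so that theta_0 acts as the intercept.
X = [ones(m, 1), X_raw];

% theta is an (n+1)-dimensional column vector (values chosen arbitrarily).
theta = [1; 0.5; 2];

% Each row of X is one example, so X * theta computes theta' * x per example.
h = X * theta;
```

Storing examples as rows means one matrix-vector product replaces a loop over examples.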

### Cost Function for Multiple Variables

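$\theta$ is an $(n+1)$-dimensional vector, and the cost is the squared error averaged over the $m$ training examples:

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$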
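A minimal Octave sketch of the cost computation, assuming the squared-error cost used in the course (half the mean of the squared residuals) and a hypothetical toy dataset. The name `computeCost` is chosen for illustration only:

```octave
% Hypothetical data lying exactly on the line y = x.
X = [1 1; 1 2; 1 3];   % first column is x_0 = 1
y = [1; 2; 3];
m = length(y);

% Vectorized squared-error cost: residuals for all examples in one product.
computeCost = @(theta) sum((X * theta - y) .^ 2) / (2 * m);

J_perfect = computeCost([0; 1]);   % theta that fits the data exactly
J_zero = computeCost([0; 0]);      % all-zero theta, so residuals are -y
```

With a perfect fit every residual is zero, so the cost is zero; any other $\theta$ gives a positive cost.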

### Gradient Descent for Multiple Variables

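Repeat {

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

(simultaneously update $\theta_j$ for $j = 0, \dots, n$; note that $x_0^{(i)} = 1$)

}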
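The loop can be sketched in vectorized Octave, assuming the usual batch update for the squared-error cost (each $\theta_j$ moves against the partial derivative of $J$). The data, `alpha`, and `num_iters` below are hypothetical values picked for illustration:

```octave
% Hypothetical data generated from y = 1 + 2*x, so the exact fit is theta = [1; 2].
X = [1 0; 1 1; 1 2; 1 3];   % first column is x_0 = 1
y = [1; 3; 5; 7];
m = length(y);

alpha = 0.1;        % learning rate (assumed value)
num_iters = 5000;   % number of iterations (assumed value)
theta = zeros(2, 1);

for iter = 1:num_iters
  % X' * (X * theta - y) holds the sum over examples for every j at once,
  % so all theta_j are updated simultaneously from the same old theta.
  theta = theta - (alpha / m) * (X' * (X * theta - y));
end
```

Computing the whole gradient before assigning `theta` is what makes the update simultaneous; updating one component at a time inside a loop over `j` would mix old and new values.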

### Feature Scaling

• Idea: get every feature into approximately the range $-1 \leq x_i \leq 1$

• Mean normalization: replace $x_i$ with $x_i - \mu_i$ so that features have approximately zero mean (do not apply this to $x_0 = 1$)
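A minimal Octave sketch of both ideas together. Dividing by the range (max minus min) is one assumed choice of scale here; the standard deviation is another common option. The data and variable names are hypothetical:

```octave
% Hypothetical raw features with very different ranges (no x_0 column yet).
X_raw = [2104 3; 1600 3; 2400 4; 1416 2];

mu = mean(X_raw);              % 1 x n row of feature means
s = max(X_raw) - min(X_raw);   % 1 x n row of feature ranges (assumed scale)

% Subtract the mean and divide by the range, column by column (broadcasting).
X_norm = (X_raw - mu) ./ s;

% Add x_0 = 1 only after scaling, so the intercept column is left untouched.
X = [ones(size(X_raw, 1), 1), X_norm];
```

The same `mu` and `s` would need to be reused to scale any new input before making a prediction.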

### Learning Rate $\alpha$

• If $\alpha$ is too small: slow convergence.
• If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration and may not converge (slow convergence is also possible).
• To choose $\alpha$, try: …, 0.001, 0.01, 0.1, 1, …
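One way to compare candidate values is to run a fixed number of iterations for each and record the cost after every step. The sketch below assumes batch gradient descent on the squared-error cost and uses a hypothetical toy dataset; `J_final` and `decreasing` are illustrative names:

```octave
% Hypothetical data generated from y = 1 + 2*x.
X = [1 0; 1 1; 1 2; 1 3];   % first column is x_0 = 1
y = [1; 3; 5; 7];
m = length(y);

alphas = [0.001 0.01 0.1 1];   % candidate learning rates
num_iters = 50;                % iterations per candidate (assumed value)
J_final = zeros(size(alphas));
decreasing = false(size(alphas));

for k = 1:numel(alphas)
  theta = zeros(2, 1);
  J_hist = zeros(num_iters, 1);
  for iter = 1:num_iters
    theta = theta - (alphas(k) / m) * (X' * (X * theta - y));
    J_hist(iter) = sum((X * theta - y) .^ 2) / (2 * m);
  end
  J_final(k) = J_hist(end);
  % True only if the cost dropped on every single iteration.
  decreasing(k) = all(diff(J_hist) < 0);
end
```

On this data the three smaller rates lower the cost on every iteration, with larger rates getting further in the same number of steps, while the largest rate makes the cost grow. Plotting `J_hist` against the iteration number shows the same thing visually.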

### Features and Polynomial Regression

• Polynomial regression example: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$; let $x_1 = x, x_2 = x^2, x_3 = x^3$

• Other possibilities: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 \sqrt{x}$; let $x_1 = x, x_2 = \sqrt{x}$
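Because the new features are just extra columns, the model stays linear in $\theta$. A minimal Octave sketch of building both design matrices from a single hypothetical input feature `x` (the names `X_poly` and `X_sqrt` are illustrative):

```octave
% Hypothetical single input feature.
x = [1; 2; 3; 4];

% Cubic model: columns are x_0 = 1, x_1 = x, x_2 = x^2, x_3 = x^3.
X_poly = [ones(size(x)), x, x .^ 2, x .^ 3];

% Square-root model: columns are x_0 = 1, x_1 = x, x_2 = sqrt(x).
X_sqrt = [ones(size(x)), x, sqrt(x)];
```

The columns of `X_poly` span very different ranges (here up to 4, 16, and 64), so scaling them before running gradient descent is likely to help.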

## Computing Parameters Analytically

Suppose there are $m$ training examples and $n$ features.

$\theta = (X^T X)^{-1} X^T y$, where $(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.

If $m < n$, $X^T X$ is non-invertible (singular), because its rank is at most $m$.

In that case, use `pinv` (the pseudoinverse) in Octave, which is defined even when $X^T X$ is singular:
`pinv(X'*X)*X'*y`
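A minimal Octave sketch of the expression above on two hypothetical datasets: one with more examples than features, and one with $m < n$ where $X^T X$ is singular. All data and variable names are made up for illustration:

```octave
% Case 1: m = 4 examples, n = 1 feature, generated from y = 1 + 2*x.
X = [1 0; 1 1; 1 2; 1 3];   % first column is x_0 = 1
y = [1; 3; 5; 7];
theta = pinv(X' * X) * X' * y;

% Case 2: m = 2 examples, n = 3 features, so X' * X (4 x 4) is singular.
X_wide = [1 1 2 3; 1 2 1 5];
y_wide = [1; 2];
rank_wide = rank(X_wide' * X_wide);   % less than 4

% pinv still returns a theta that reproduces the training targets.
theta_wide = pinv(X_wide' * X_wide) * X_wide' * y_wide;
residual = norm(X_wide * theta_wide - y_wide);
```

In the second case many values of $\theta$ fit the two examples exactly; the pseudoinverse picks one of them rather than failing the way a plain inverse would on a singular matrix.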