[Machine Learning] Regularization
- 2023.07.13
- Machine Learning Regularization
Regularization lets you keep all of your features,
but it prevents any single feature from having an overly large effect,
which is a common cause of overfitting.
Cost Function with Regularization
$$ J'(\vec{\theta}) = J (\vec{\theta}) + \frac{\lambda}{2m}\sum^n_{k = 1}\theta^2_k $$
\begin{align}
& m \text{: number of training examples} \\
& n \text{: number of features}
\end{align}
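As a concrete illustration, here is a minimal NumPy sketch of $J'$, assuming $J$ is the squared-error cost of linear regression (the document does not specify $J$, so this is an assumption). Following the sum starting at $k = 1$, the bias $\theta_0$ is left out of the penalty. The names `regularized_cost`, `X`, `y`, and `lam` are illustrative.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J'(theta) = J(theta) + lambda/(2m) * sum_{k=1..n} theta_k^2."""
    m = len(y)
    # Plain squared-error cost J (an assumption; the post leaves J generic).
    J = np.sum((X @ theta - y) ** 2) / (2 * m)
    # Regularization term; theta[0] (the bias) is excluded, matching k = 1..n.
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return J + penalty
```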
Then, minimizing J' instead of J penalizes large parameter values,
which effectively suppresses both overly influential features
and overly high-order polynomial terms.
λ balances the two goals: fitting the training data and keeping the parameters small. With λ = 0 the penalty disappears and the model may overfit; with a very large λ every θ_k is pushed toward zero and the model underfits.
Gradient Descent
\begin{align}
& \theta_k := \theta_k - \alpha \frac{\partial J'}{\partial \theta_k} = \theta_k - \alpha\left(\frac{\partial J}{\partial \theta_k} + \frac{\lambda}{m}\theta_k\right) \\
\Rightarrow\ & \theta_k := \left(1 - \frac{\alpha\lambda}{m}\right)\theta_k - \alpha\frac{\partial J}{\partial \theta_k}
\end{align}
\begin{align}
& -\alpha\frac{\partial J}{\partial \theta_k} \text{ is the usual update.} \\
& \text{The new term } -\frac{\alpha\lambda}{m}\theta_k \\
& \text{shrinks the value of } \theta_k \text{ a little on every iteration.}
\end{align}
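A minimal sketch of one such update step, again assuming the squared-error cost so that $\frac{\partial J}{\partial \theta} = \frac{1}{m}X^\top(X\theta - y)$; the function name and parameters are illustrative:

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One update: theta <- (1 - alpha*lam/m) * theta - alpha * dJ/dtheta."""
    m = len(y)
    # dJ/dtheta for the squared-error cost (an assumption, as above).
    grad = X.T @ (X @ theta - y) / m
    # Shrink factor (1 - alpha*lam/m) from the rewritten update rule.
    shrink = np.full_like(theta, 1 - alpha * lam / m, dtype=float)
    shrink[0] = 1.0  # conventionally, the bias theta_0 is not shrunk
    return shrink * theta - alpha * grad
```

Since 0 < 1 - αλ/m < 1 for reasonable α and λ, every step multiplies each θ_k by a factor slightly below one, which is exactly the shrinking effect described above.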