Andrew Ng’s Machine Learning Simplified — Part 5 | Multivariate Linear Regression
Congratulations on completing Week 1🎉! Let’s hop right into Week 2 with Multivariate Linear Regression.
Link to Part 4: [Andrew Ng’s Machine Learning Simplified — Part 4 | Linear Algebra](https://medium.com/@aakriti.sharma18/andrew-ngs-machine-learning-simplified-part-4-linear-algebra-bbde2852d62e)
Linear regression with multiple variables is also known as “multivariate linear regression”.
Now, for every input x we have multiple features x₁, x₂, x₃, … to describe it.
For each feature we have to estimate a parameter that, when multiplied by the feature’s value, contributes to the closest possible guess of the output. This estimation is carried out by Gradient Descent.
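In the course’s notation, the hypothesis for n features is a weighted sum, with one parameter θⱼ per feature (θ₀ is the intercept):

```latex
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n
```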
Gradient Descent for Multivariate Regression
Part 3 walks us through the procedure of Gradient Descent for one variable. With many variables, we just repeat the same update step to calculate the value of each parameter.
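Written out, the update rule (applied simultaneously to every parameter, and repeated until convergence) is:

```latex
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad \text{for } j = 0, 1, \dots, n
```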
This complex mathematical equation just means that we plot the cost (the difference between the actual and predicted values) for all possible parameter values and find where it is minimum, which is the whole point: finding the prediction closest to the actual value, i.e., the least loss.
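As a rough illustration (not the course’s own code), here is a minimal NumPy sketch of these updates. It assumes X is an m × (n+1) matrix whose first column is all ones, so that θ₀ gets the same treatment as every other parameter:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Multivariate linear regression via batch gradient descent.

    X: (m, n+1) feature matrix, first column all ones for the intercept.
    y: (m,) vector of target values.
    alpha: learning rate.
    """
    m = len(y)
    theta = np.zeros(X.shape[1])       # start every parameter at zero
    for _ in range(num_iters):
        predictions = X @ theta        # h_theta(x) for every example
        errors = predictions - y       # predicted minus actual value
        gradient = (X.T @ errors) / m  # partial derivative for each theta_j
        theta -= alpha * gradient      # simultaneous update of all parameters
    return theta
```

Computing all the partial derivatives with one matrix product is exactly the “repeat the process for each parameter” idea, just vectorized.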
The size of the steps it takes is the learning rate. But now that we have multiple features, their ranges (the difference between the highest and lowest values) may vary. We can speed up gradient descent by bringing each of our input values into roughly the same range. This is because θ descends quickly on small ranges and slowly on large ranges, and so oscillates inefficiently down to the optimum when the variables are very uneven.
To solve this problem, we scale all the variables to roughly the same range. Two techniques that help with this process are Feature Scaling and Mean Normalization.
Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1.
Mean normalization involves subtracting the average value for an input variable from the values for that input variable.
To implement both of these techniques, adjust your input values as shown in this formula:
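```latex
x_i := \frac{x_i - \mu_i}{s_i}
```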
where μi is the average of all the values for feature i and si is the range of values (max − min).
For example, if xi represents housing prices with a range of 100 to 2000 (range = 2000 − 100 = 1900) and a mean value of 1000, then the new value of xi is:
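```latex
x_i := \frac{\text{price} - 1000}{1900}
```

To make this concrete, here is a small NumPy sketch (the function name and sample values are just for illustration) that applies both techniques to every column of a feature matrix at once:

```python
import numpy as np

def feature_normalize(X):
    """Mean-normalize and feature-scale each column of X.

    Subtracts each column's mean (mean normalization) and divides by
    each column's range, max - min (feature scaling).
    """
    mu = X.mean(axis=0)                # average of each feature
    s = X.max(axis=0) - X.min(axis=0)  # range of each feature
    return (X - mu) / s, mu, s

# Housing prices matching the example above: mean 1000, range 1900
prices = np.array([[100.0], [900.0], [2000.0]])
scaled, mu, s = feature_normalize(prices)
print(scaled.ravel())  # ~[-0.47, -0.05, 0.53] -- all within a range of 1
```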
Choosing the right Learning Rate
Learning rate is the step size gradient descent takes toward the optimal value of the parameters. If it is too small, the process will be very slow. If it is too large, it might just skip over the minimum value and cause the cost to increase after an iteration. An ideal learning rate leads to a significant but decreasing change in the value of the cost function after EVERY iteration, till it reaches the optimal value.
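A practical way to check this is to run gradient descent with a few candidate learning rates and watch the cost after each iteration. A minimal sketch, assuming the same X-with-a-column-of-ones layout as before (the sample data is made up):

```python
import numpy as np

def cost_history(X, y, alpha, num_iters=100):
    """Run gradient descent, recording the squared-error cost
    J(theta) = (1/2m) * sum((X @ theta - y)^2) after every iteration."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        errors = X @ theta - y
        theta -= alpha * (X.T @ errors) / m          # one descent step
        errors = X @ theta - y
        history.append((errors @ errors) / (2 * m))  # cost after the step
    return history

# Toy data: intercept column of ones plus one scaled feature
X = np.array([[1.0, -0.47], [1.0, -0.05], [1.0, 0.53]])
y = np.array([1.0, 5.0, 11.0])

# The cost should fall after EVERY iteration; if it ever rises,
# the learning rate is too large.
for alpha in (0.01, 0.1, 2.5):
    J = cost_history(X, y, alpha)
    print(f"alpha={alpha}: J starts at {J[0]:.3f}, ends at {J[-1]:.3g}")
```

With these toy values, α = 0.01 makes slow progress, α = 0.1 does much better, and α = 2.5 overshoots so badly that the cost blows up instead of shrinking.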
That’s it for today! In the next part we’ll be doing some hands-on Linear Regression practice. Stay tuned✨
Tweet version of the series: