
Overfitting using Higher-order Linear Regression

Hindavi Churi

In this project, we explore the concept of overfitting by training linear regression models of various polynomial degrees. The method used for training and testing is gradient descent. We will give an overview of the overfitting concept and its solutions.


Data

For this project, we generated 20 data points (X, Y). The X values are drawn uniformly. The noise N is drawn from a normal distribution with mean 0 and standard deviation 1. Finally, the Y values are generated using the function defined as,

Y = sin ( 2 * pi * X ) + N * 0.1


The data is then split into train and test data sets.
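
A minimal sketch of this data generation, assuming a uniform range of [0, 1] and a 50/50 train/test split (the exact range and split ratio are not stated in the project):

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 X values drawn uniformly (the [0, 1] range is an assumption)
X = rng.uniform(0, 1, 20)
# noise N from a normal distribution with mean 0 and std 1
N = rng.normal(0, 1, 20)
# Y = sin(2 * pi * X) + N * 0.1
Y = np.sin(2 * np.pi * X) + 0.1 * N

# split into train and test sets (a 50/50 split is an assumption)
idx = rng.permutation(20)
X_train, Y_train = X[idx[:10]], Y[idx[:10]]
X_test, Y_test = X[idx[10:]], Y[idx[10:]]
```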


Overfitting

Training always produces an error value, which measures the difference between the predicted and actual values. If we run training repeatedly, or increase the number of epochs, the training error eventually gets close to zero.

At first, this may seem like a good indication that the model is well trained. In reality, it is not a good sign. As we keep training to drive the error down, the model starts to fit the curve exactly through the training points.

Even this may seem good, but consider testing new data against this model. The model is tuned to the training data, so when new data arrives, it will not give proper predictions for that test data.


Solutions to overfitting can be as follows:

  • Increase the amount of training data

  • Regularization


Underfitting

Often we do not train the model enough, which causes underfitting: the model does not learn the structure of the data, which leads to poor predictions.

Hence, when we test data against such a trained model, the predictions are inaccurate because the model does not have enough information to place the new data correctly.


A solution to underfitting can be as follows:

  • Train the model for more epochs


Hence, whether the problem is overfitting or underfitting, a proper balance of training must be found. The model should be trained neither too little nor too much, but somewhere in between.


Gradient Descent

Gradient descent is used to find the minimum of a given function, typically a cost (loss) function.

The process starts by selecting a random point. The algorithm then moves in the direction in which the gradient reduces the loss. Repeating this process, it reaches a point called a minimum, on either side of which the cost is higher. A disadvantage is that the algorithm may settle in a local minimum instead of the global minimum.

This application is used in linear regression. The loss function tells the algorithm in which direction to move from the current point. The weights are then updated accordingly, which gives the next point in the predicted direction.
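
In symbols, each weight is updated at every step using the standard gradient descent rule,

w_new = w_old - lr * dLoss/dw

where lr is the learning rate and dLoss/dw is the gradient of the loss with respect to that weight.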



In this project, we used gradient descent to train the weights and predict the y values for given x values. Several degrees are trained and tested, each giving different results.
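
A minimal sketch of such a polynomial fit via gradient descent, assuming a mean-squared-error loss; the learning rate and epoch count here are illustrative choices, not the project's exact settings:

```python
import numpy as np

def fit_polynomial(X, Y, degree, lr=0.05, epochs=10000):
    """Fit a degree-`degree` polynomial to (X, Y) with gradient descent."""
    # design matrix: columns are X**0, X**1, ..., X**degree
    A = np.vander(X, degree + 1, increasing=True)
    w = np.zeros(degree + 1)                  # one weight per power of X
    for _ in range(epochs):
        pred = A @ w                          # current predictions
        grad = 2 * A.T @ (pred - Y) / len(X)  # gradient of the MSE loss
        w -= lr * grad                        # gradient descent update
    return w

def predict(X, w):
    return np.vander(X, len(w), increasing=True) @ w
```

High degrees may need a smaller learning rate for the updates to stay stable.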


Effect of Degrees

For the same set of training data, as the degree increases, we can see that the fit of the curve against the test data varies.

See the figures below for the results,


degree=0



degree=1



degree=3



degree=9



From this, we can see that as the degree increases, the curve fits the training data more and more closely. The training error decreases, but past a point this closeness means the model is overfitting the data.


Train vs Test error

As we train and test data for each degree, there are variations in error for each degree.

See below for the results,


(blue dots = test_error, red dots = train_error)


We plotted the train and test error for each degree. As the degree increases, the train error decreases while the test error increases, indicating overfitting.
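
Reusing the fit_polynomial and predict sketches from above, the comparison might be computed like this (the degree range and mean-squared-error metric are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

degrees = range(10)
train_err, test_err = [], []
for d in degrees:
    w = fit_polynomial(X_train, Y_train, d)
    train_err.append(np.mean((predict(X_train, w) - Y_train) ** 2))
    test_err.append(np.mean((predict(X_test, w) - Y_test) ** 2))

plt.plot(degrees, train_err, "ro", label="train_error")  # red dots
plt.plot(degrees, test_err, "bo", label="test_error")    # blue dots
plt.xlabel("degree")
plt.ylabel("mean squared error")
plt.legend()
plt.show()
```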


Generating more data

One of the solutions to overfitting is training on more data. As we increase the size of the training set, the fitted model tends neither to overfit nor to underfit: it captures the underlying trend without passing exactly through every point. Hence, more accurate and better predictions are made on the test data.
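
For example, regenerating the dataset with 100 points instead of 20 (the count is an illustrative assumption) and refitting, reusing the sketches above:

```python
X_big = rng.uniform(0, 1, 100)
Y_big = np.sin(2 * np.pi * X_big) + 0.1 * rng.normal(0, 1, 100)
w9 = fit_polynomial(X_big, Y_big, degree=9)  # far less overfitting with more data
```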

See the figure below for the results,




Regularization

Another solution to overfitting is regularization, one of the most popular methods used to overcome the overfitting problem.

Without regularization, as training increases, the error decreases but the weights can grow tremendously. These large weights are what cause the overfitting of the data.

In the regularization method, we add a penalty term on the weights to the loss function. The penalty falls most heavily on weights that are large in value, so the loss is optimized with smaller, more balanced weight values, which keeps the model from fitting the noise.

The formula for the regularization, assuming the common L2 (ridge) penalty, is shown below,
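
Loss = sum( ( y_pred - y )^2 ) + lambda * sum( w^2 )

where lambda controls how strongly large weights are penalized. A hedged sketch of adding this penalty to the gradient descent update from earlier (the lambda value is an illustrative choice):

```python
def fit_polynomial_l2(X, Y, degree, lam=0.001, lr=0.05, epochs=10000):
    """Polynomial fit with an L2 weight penalty (ridge regularization)."""
    A = np.vander(X, degree + 1, increasing=True)
    w = np.zeros(degree + 1)
    for _ in range(epochs):
        pred = A @ w
        # gradient of the MSE loss plus the gradient of lam * sum(w**2)
        grad = 2 * A.T @ (pred - Y) / len(X) + 2 * lam * w
        w -= lr * grad
    return w
```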



Contribution

My contribution in this project is the dynamic handling of the weights: I used a weight list that keeps track of all the weights instead of defining each weight as a separate variable, avoiding wasted space, as in the sketch below.
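
A minimal illustration of the idea (variable names are assumptions; the fit_polynomial sketch above uses the same approach):

```python
# instead of defining w0, w1, w2, ... as separate variables:
weights = np.zeros(degree + 1)  # one entry per power of X, for any degree
```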

I also tried increasing the epochs (10,000) for a better understanding of the data and to see different results for each degree. I can conclude that with more epochs the curve tends to overfit the data.


Experiments

I have experimented with degree 15.

The results for the same can be seen below,


We can see that the weights have increased immensely. Hence, we can conclude that as the degree increases, the large weights make the fitted curve swing wildly between the training points.




Challenges

The challenges were vast initially but faded as I started understanding each factor that contributes towards the training of the data.

  1. Understanding the loading of data - it is crucial to understand how the data is generated using the uniform and normal functions, and how the different values passed to them affect the fitting of the data. After many trials, this became easy.

  2. Fitting of data - understanding the factors that contribute to the overfitting of the data: why and how do different degrees vary, and how does that work? By changing factors like the epoch count and experimenting with various matplotlib functions, the behavior became easier to understand, both conceptually and visually.


Implementation




References

  1. gradient image - https://vitalflux.com/stochastic-gradient-descent-python-example/

  2. Regularization - https://towardsai.net/p/machine-learning/improving-artificial-neural-network-with-regularization-and-optimization

  3. https://www.youtube.com/watch?v=VqKq78PVO9g

  4. https://towardsdatascience.com/regularization-in-machine-learning-76441ddcf99a

  5. https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py

  6. https://numpy.org/doc/stable/reference/random/generated/numpy.random.uniform.html

  7. https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html

  8. https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html

  9. https://www.geeksforgeeks.org/how-to-implement-a-gradient-descent-in-python-to-find-a-local-minimum/

  10. https://builtin.com/data-science/gradient-descent


GitHub:

https://github.com/Lolale3/Overfitting-using-higher-order-linear-regression/tree/main
