Overfitting on purpose




Would it make sense to overfit a model on purpose?



Say I have a use case where I know the data will not vary much with respect to the training data.



I'm thinking here about traffic prediction, where the traffic status follows a fixed set of patterns.



These patterns won't change much unless there is a sudden increase in the number of car users or major changes in the road infrastructure. In this case I would like the model to be as biased as possible towards the patterns it learned from the current data, assuming that in the future the patterns and the data will be very similar.





This seems very fishy: if you had a guarantee that the data would never change, you could overfit the model and gain better performance, but this still sounds like a bad idea.
– user2974951
Sep 13 '18 at 9:15





It is. But on the other hand, I feel it wouldn't be a bad idea if I know the data won't change much. I'm writing my thoughts here so someone can prove that I'm mistaken.
– Brandon
Sep 13 '18 at 9:39





If the out-of-sample predictive performance of your model (i.e. on future data) is not worse than the performance on your current data, then I would say that technically you are not really overfitting. You are overfitting when you are fitting the noise in your current data, and that should always lead to worse predictions on new data. You should be able to set your model to the correct level of complexity by using cross-validation.
– matteo
Sep 13 '18 at 11:19





In a study, a subject was asked to overfit some data, and then they won the lottery. The study concluded that overfitting data is always a good thing.
– Nat
Sep 13 '18 at 13:51





3 Answers



In general, it does not make sense to overfit your data on purpose. The problem is that it is difficult to make sure that the patterns you learned also hold in the data you have not yet observed. You first have to establish that there really are stable patterns in the data; one way of doing so is through the concept of stationarity.



What you describe reminds me of stationarity and ergodicity. From a contextual/business side, you assume that your time series follows certain patterns; formally, these assumptions are captured by stationarity and ergodicity.



Definition of stationarity:



A stationary process is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Consequently, parameters such as the mean and variance also do not change over time.



Definition of ergodicity:



An ergodic process is one with the property that, given sufficient time, it visits essentially all points of its state space, so its statistical properties can be estimated from a single, reasonably large sample of the process.



Now you want to make sure that the series really follows these patterns. You can check this with, e.g., a unit root test (like the augmented Dickey-Fuller test) or a stationarity test (like the KPSS test); a code sketch follows the definitions below.



Definition of a unit root test:



$H_0:$ There is a unit root.



$H_1:$ There is no unit root. In most cases this implies stationarity.



Definition of a stationarity test:



$H_0:$ There is stationarity.



$H_1:$ There is no stationarity.
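
A minimal sketch of how both tests can be run in Python with statsmodels (the traffic series here is synthetic stand-in data, not from the question):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

# Synthetic stand-in for an hourly traffic series:
# stationary noise around a fixed mean.
rng = np.random.default_rng(0)
traffic = 100 + rng.normal(0, 10, 1000)

# ADF unit root test: H0 = unit root; a small p-value rejects H0.
adf_stat, adf_p, *_ = adfuller(traffic)

# KPSS stationarity test: H0 = stationarity; a small p-value rejects H0.
kpss_stat, kpss_p, *_ = kpss(traffic, regression="c", nlags="auto")

print(f"ADF p-value:  {adf_p:.4f}  (small => evidence against a unit root)")
print(f"KPSS p-value: {kpss_p:.4f}  (large => consistent with stationarity)")
```

Note that the two tests have opposite null hypotheses, so it is common to run both and check that they agree.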



Further reading:



What is the difference between a stationary test and a unit root test?



If the time series really follows these patterns, forecasting and prediction will be "easier from a statistical point of view", and you can apply econometric forecasting models such as ARIMA or TBATS. My answer relates to univariate and multivariate time series; if you have cross-sectional data, stationarity and unit roots are not common concepts.
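
For example, a minimal ARIMA forecast with statsmodels (synthetic stand-in data; the order=(1, 0, 1) is an illustrative choice, not a recommendation):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for a stationary daily traffic series.
rng = np.random.default_rng(1)
y = pd.Series(1000 + rng.normal(0, 25, 200))

# ARIMA(p, d, q) with d=0 assumes the series is already stationary,
# which is exactly what the tests above are meant to establish.
result = ARIMA(y, order=(1, 0, 1)).fit()

# Forecast the next 7 observations.
print(result.forecast(steps=7))
```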



No, it does not make sense to overfit your data.



The term overfitting actually refers to a comparison between models: if model_a performs better on the given training data but worse out-of-sample than model_b, then model_a is overfitting. In other words: "there exists a better alternative".
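
A minimal sketch of that comparison using scikit-learn (the model names and polynomial degrees are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy samples from a smooth underlying pattern.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (80, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, 80)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# model_a: very flexible; model_b: modest complexity.
model_a = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X_train, y_train)
model_b = make_pipeline(PolynomialFeatures(3), LinearRegression()).fit(X_train, y_train)

# model_a typically wins on the training data but loses out-of-sample:
# by the definition above, model_a is overfitting.
print("train R^2:", model_a.score(X_train, y_train), "vs", model_b.score(X_train, y_train))
print("test  R^2:", model_a.score(X_test, y_test), "vs", model_b.score(X_test, y_test))
```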



If the traffic status "will not vary at all with respect to the training data", then you will achieve the best possible results by simply memorizing the training data (again, that's not "overfitting").



But "data will not vary much with respect to the training data" simply equates to having a reasonable representation of the underlying pattern. This is where machine learning works best (stationary environment as Ferdi explained).





Okay. So maybe it's better to say that we increase the bias on purpose. I said this because I was reading about the bias-variance tradeoff, and it made sense to me to accept a higher bias for the traffic use case.
– Brandon
Sep 13 '18 at 11:04





The right algorithm plus tuning will give you the best possible results (i.e., optimize the trade-off).
– lnathan
Sep 13 '18 at 11:25





+1 but I don't think "there exists a better alternative" follows from overfitting.
– kbrose
Sep 21 '18 at 15:32



I would say that there is a sense in which overfitting your data is useful, but only for research purposes. (Don't use an overfitted model in production!)



In cases where the data are complex and the task is non-trivial, trying to overfit a model can be an important step!



If you can overfit a model, it means that the model is capable of describing the data.



If you cannot even overfit, that in itself gives you a clue for investigation (for example, a bug in the training pipeline or insufficient model capacity).
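
A minimal sketch of this sanity check, assuming scikit-learn (the data here is synthetic): an unconstrained model should be able to memorize a small training set, and failure to do so points to a problem upstream of the model.

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Small synthetic dataset standing in for the real data.
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# An unconstrained decision tree can grow one leaf per sample,
# so it should fit the training set almost perfectly.
model = DecisionTreeRegressor(random_state=0).fit(X, y)
train_r2 = model.score(X, y)
print(f"training R^2: {train_r2:.3f}")

# If even a deliberately overfit model scores poorly in-sample,
# suspect the pipeline or the data rather than the model class.
assert train_r2 > 0.99, "cannot overfit: investigate data/pipeline"
```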


