No. of Recommendations: 13
ML evaluates a model differently. You first define an algorithm that takes any data and predicts the future
(like a neural network or an ensemble of trees). Then you break the historical data into shorter subsets. For
each subset you train the predictive model on (typically) 60 to 70% of the subset and test it on the remainder.
The model is judged on the combined results of all the train-test sequences, where no train-test sequence uses any
future data. A model that does well did so while looking only at the data it had available at that point in time.
This is how all data mining is done, when it's done correctly. Same with "classic" MI screens.
(terminology clarification: data mining is a good thing, overmining is a problem)
The problem is this: the process you have described, including seeing which models worked on the withheld out-of-sample validation subset and killing those that didn't, is itself another layer of data mining, another step in a single larger and more complex model-building process. That "greater" process has no out-of-sample validation. Once you have culled your set of models by looking at their effectiveness on the validation data set you held back, that validation data set is contaminated and is now in sample, not out of sample.
This might be seen as "OK" if you only did it once (depending on your strictness), but nobody does this just once. There just isn't enough financial history. In effect all history gets used, and it's all in sample. The only out-of-sample data is the stuff that actually happened after you stopped modelling, and (to be strict) after you stopped culling your set of models.
Combine that with a machine learning model that has a lot of parameters, and the ability of your final model to have memorized the training set AND the validation set is nigh unbounded. That's not to say it's impossible to find a new and useful insight this way, but it's a very thorny patch in which to be hunting.
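A toy illustration of why the culling step is the killer, with completely made-up numbers: generate a few hundred "strategies" that are pure noise, keep the one that did best on the held-back half of history, and it looks like a winner right up until the data it was culled on runs out.

```python
# Selection on the "held back" data: 500 candidate models with zero true skill.
# The one that wins the cull shows an impressive edge on the validation window,
# because that window has now been mined; afterwards it reverts to nothing.
import numpy as np

rng = np.random.default_rng(1)
n_models, n_periods = 500, 120                       # 500 candidates, 10 years of months
returns = rng.normal(0.0, 0.04, size=(n_models, n_periods))  # no real edge anywhere

validation = returns[:, :60]            # the data you "held back", then culled on
truly_out_of_sample = returns[:, 60:]   # what happens after you stop culling

best = validation.mean(axis=1).argmax() # keep the best performer on the validation set
print("winner's mean monthly return during the culling window:",
      round(validation[best].mean(), 4))             # looks like a genuine edge
print("same model after the culling stopped:",
      round(truly_out_of_sample[best].mean(), 4))    # back to roughly zero
```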
Jim