Subject: Re: ML for MI
To unquarked's "It would access a large library of backtests, and be capable of specifying more, as well as situationally updating an ever-elaborating library of relevant history."

TL;DR version: discouragingly (sorry), expectations should be very low due to the scope. Plebes like us could learn the ML tools to varying degrees and test out a few handfuls of promising correlation ideas by trial & error on the limited raw data we have access to. It's all "when this scenario of a, b, c, d and/or e happens, the forward returns on index / stock have been good / bad". If ML tools can identify for us, faster, some promising *causes* of positive forward returns, that's where it will add value.
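For anyone who hasn't done one of these "when the scenario happens, what were the forward returns" tests by hand, here is a rough sketch of the idea in plain Python (no pandas, to keep it dependency-free). The price list, the "two down days in a row" scenario, and the function names are all made up for illustration; this is not GTR1's method or anyone's actual screen.

```python
from statistics import mean

def forward_returns(prices, horizon):
    """Return from day i to day i+horizon, for every day where that is defined."""
    return [prices[i + horizon] / prices[i] - 1.0
            for i in range(len(prices) - horizon)]

def scenario_test(prices, signal, horizon):
    """Compare average forward returns on scenario days vs. all other days.

    prices  : closing prices, oldest first
    signal  : bools, same length as prices, True when the scenario fired
    horizon : holding period in days
    """
    fwd = forward_returns(prices, horizon)
    # zip stops at the end of fwd, so days without a full horizon are dropped
    hit = [r for r, s in zip(fwd, signal) if s]
    miss = [r for r, s in zip(fwd, signal) if not s]
    return {
        "n_hit": len(hit),
        "avg_hit": mean(hit) if hit else None,
        "n_miss": len(miss),
        "avg_miss": mean(miss) if miss else None,
    }

# Hypothetical toy series, NOT real market data.
prices = [100, 98, 95, 97, 101, 104, 103, 99, 96, 100, 105, 107]

# Hypothetical scenario: price fell on each of the last two days.
signal = [False, False] + [
    prices[i] < prices[i - 1] < prices[i - 2] for i in range(2, len(prices))
]

result = scenario_test(prices, signal, horizon=3)
print(result)
```

With a dozen data points this proves nothing, of course. It only shows the shape of the test: it reports a correlation between the scenario and what followed, and says nothing about a cause.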

*****
Unfortunately, the data needed to do this is only available to us publicans in GTR1 (or to those who pay for Pinnacle), and - although GTR1 is by far the cleanest and most expansive set of stock and index price and fundamental data publicly available for all history - it's limited to the data fields from its sources and the formula functions that one phenomenal developer (Robbie) has coded it to handle.

Now, sites like Barchart, Stockcharts, et al have vast arrays of technical indicator formulas to use to generate current stock picks from screening criteria (below low ATR, Chaikin Flow is a specific value, RSI in a specific range, etc etc etc ad crazium) - but they're not easy or cheap to backtest, if they can be backtested at all. And they would not / do not allow scraping-type access. I'm not sure what their APIs do.

So, the institutions / hedge funds / database vendors (like FactSet, especially) have built all this data for use by themselves and for very expensive sale to others; and are paying thousands of highly trained data developers to make & backtest ideas to get trading edges (or just cheating and frontrunning trades).

To really build some predictive models, both kinds of data are needed (the price and fundamental history, and the technical indicator formulas), and a nearly infinite set of variable combinations and formula definitions is possible to test.
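To put a rough number on "nearly infinite", here is a back-of-envelope count in Python. The inputs are assumptions picked purely for illustration (50 indicators, up to 5 combined per screen, 10 possible threshold settings each), and `screen_count` is a hypothetical helper, not anything from GTR1 or a charting site.

```python
from math import comb

def screen_count(n_indicators, max_terms, settings_per_indicator):
    """Count distinct screens that use 1..max_terms indicators,
    with each chosen indicator set to one of several threshold settings."""
    return sum(
        comb(n_indicators, k) * settings_per_indicator ** k
        for k in range(1, max_terms + 1)
    )

# Hypothetical numbers: 50 indicators, up to 5 per screen, 10 settings each.
total = screen_count(50, 5, 10)
print(f"{total:,} candidate screens")
```

Even with those modest assumptions the count comes out at over 200 billion candidate screens, before adding holding periods, rebalance rules or universe filters. And with that many candidates, plenty will look good in a backtest purely by chance.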

Wrapping up - discouragingly (sorry), the only thing plebes like us might be able to get done is learning the ML tools somewhat and testing out a handful of promising correlation ideas by trial & error on the limited raw data we have access to.

More than that is expecting to boil the ocean.

P.S. Long ago now, Bill2m's legendary weekly trade results tracker of every screen was the only "What's working lately" forward tracking database out there - and unfortunately nobody could take over that process when he stopped (which was partly because it was showing The Problem).

How did Zee come up with his (reasonably reliable) indicators? With (1) paid access to historical data sources, (2) a narrow, fairly contrarian concept area to focus on: handling extremes in data to identify high-probability opportunities, and (3) A LOT - YEARS - of time and effort in testing.

XGBoost, pandas, numpy etc. may have accelerated those discoveries. (Who knows, maybe he used them.)