Shrewd'm.com 
A merry & shrewd investing community


Investment Strategies / Mechanical Investing
Author: mechinv
Number: of 3958 
Subject: Using AI to generate backtesting programs
Date: 01/06/2025 2:29 AM
No. of Recommendations: 30
I'm startled by how good the new AI tools are at generating the code I need to do backtests. I asked claude.ai to write a Python program to test a market timing strategy that I had first come across in 2015. The strategy, called "The Buy and Sell Portfolio," was one of 3 timing strategies described in the following blog post:

How To Create Portfolios That Adapt To Market Changes
https://thetaoofwealth.wordpress.com/2015/02/09/ho...

The Buy and Sell strategy has the following rules.
1. Buy the S&P 500 whenever it closes at a 6 month high.
2. Sell the S&P 500 whenever it closes at a 1 year low, and put the proceeds into 5 year treasuries.
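For reference, the two rules translate almost line-for-line into pandas. This is only a sketch: the window sizes of roughly 126 and 252 trading days for "6 months" and "1 year", and the function name, are my assumptions, not anything specified in the blog post.

```python
import pandas as pd

def buy_sell_signals(close: pd.Series) -> pd.Series:
    """Daily position ('STOCKS' or 'BONDS') under the Buy and Sell rules.

    Assumes ~126 trading days approximates 6 months and ~252 approximates
    1 year; the blog post does not pin down exact window sizes.
    """
    high_6m = close.rolling(126).max()  # rule 1: 6-month closing high
    low_1y = close.rolling(252).min()   # rule 2: 1-year closing low
    position, out = "STOCKS", []
    for price, hi, lo in zip(close, high_6m, low_1y):
        if position == "STOCKS" and price == lo:
            position = "BONDS"   # sell at a 1-year low, move to Treasuries
        elif position == "BONDS" and price == hi:
            position = "STOCKS"  # buy back at a 6-month high
        out.append(position)
    return pd.Series(out, index=close.index)
```

Feeding it a daily closing-price series yields a day-by-day position; the rolling windows are NaN until enough history accumulates, and comparisons against NaN are simply false, so the sketch stays in its starting position until then.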

The author (Dick Stoken) claims the following results for this strategy.

In sample backtest of Buy and Sell market timing strategy
Backtest period: 1972 - 2010 (39 years)
CAGR: 13.8%
Max drawdown: -6.1%
Max drawdown for buy and hold S&P 500: -37.6%

I had bookmarked this blog post 9 years ago, because I thought the above max drawdown figure looked really good (almost too good to be true).

So I decided to do a post-discovery (out of sample) backtest for the period 2010 - 2024. To do this, I needed to write a program that could fetch daily prices for SPY and implement the buy and sell rules. I couldn't find an ETF that held only 5-year Treasury notes during this period, so I settled on using IEI (which holds 3- to 7-year Treasury notes) as a proxy.

To write the backtest program, I went to https://claude.ai and entered the following instructions in plain English:

"I want to back test the following investment strategy. Buy the SPY ETF on 12/31/2009. Sell SPY whenever the price hits a 52-week low and purchase the IEI ETF with the cash from the sale. Sell IEI and repurchase SPY whenever the SPY price hits a new 26-week high. Please provide the Python code to test this strategy from 12/31/2009 to the present using an API to a data source that provides daily closing prices for the SPY ETF."

In a few seconds, Claude generated the Python program I needed. It even offered to calculate the Sharpe ratio and max drawdown stats for me. After running the program, I asked Claude to also calculate the CAGR and compare it to the CAGR of buying and holding SPY.

I noticed that the program used the yfinance Python package to pull prices from Yahoo Finance. I had to install that package on my MacBook using pip install. Here are the results I got after I ran the program:

Post-discovery (out of sample) backtest of Buy and Sell market timing strategy
Backtest Results (2009-12-31 to 2025-01-01):

Trading Strategy Performance:
Final Value: $39,012.64
Total Return: 290.13%
CAGR: 9.50%
Sharpe Ratio: 0.58
Maximum Drawdown: -19.78%
Max Drawdown Period: 2020-02-19 to 2020-03-18

Buy and Hold SPY Performance:
Final Value: $69,283.30
Total Return: 592.83%
CAGR: 13.77%
Sharpe Ratio: 0.73
Maximum Drawdown: -33.72%
Max Drawdown Period: 2020-02-19 to 2020-03-23

Strategy Trade Summary:
Number of Trades: 11

Trade History:
2009-12-31: Switch to SPY
2011-10-03: Switch to IEI
2012-01-25: Switch to SPY
2016-02-11: Switch to IEI
2016-04-18: Switch to SPY
2018-12-19: Switch to IEI
2019-04-05: Switch to SPY
2020-03-12: Switch to IEI
2020-08-10: Switch to SPY
2022-05-09: Switch to IEI
2023-04-28: Switch to SPY


We see disappointing post-discovery results compared to the in-sample test period. This is a common outcome for market timing strategies that were discovered around 2010, right after the Great Recession ended.

The CAGR is only 9.5% compared to SPY's 13.8%, and investors in the market timing strategy were still exposed to a -20% drawdown in 2020 - nowhere near as good as the in-sample results.
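As a sanity check, both CAGR figures can be reproduced from the reported final values alone. The $10,000 starting stake matches the program's default, and 15.0 years is approximately the span from 2009-12-31 to 2025-01-01:

```python
def cagr(final_value: float, initial_value: float, years: float) -> float:
    """Compound annual growth rate implied by start/end portfolio values."""
    return (final_value / initial_value) ** (1 / years) - 1

# Reported final values over the ~15.0-year backtest window
strategy = cagr(39_012.64, 10_000, 15.0)   # ≈ 0.0950, matching the 9.50% above
buy_hold = cagr(69_283.30, 10_000, 15.0)   # ≈ 0.1377, matching the 13.77% above
```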

I hope this post inspires others here to try out claude.ai (and other chatbots) to generate their own backtesting programs.







Author: tedthedog 🐝
Number: of 3958 
Subject: Re: Using AI to generate backtesting programs
Date: 01/06/2025 10:02 AM
No. of Recommendations: 17
I assume you know Python and can double check the generated Python code.

I use R and have been using perplexity.ai a lot recently to generate code from plain English prompts.

FWIW, my take-aways:
1) It's often very good at generating logically correct, executable code from plain English prompts for small, well-defined tasks.
I know this because I always double-check it, and it's usually very good.
But 'usually' isn't perfect; it needs double-checking.

2) It quickly becomes marginal at generating logically correct, executable code from plain English prompts for somewhat larger tasks. Even if the code executes, it can contain subtle logical errors (the worst kind!).
It always needs double-checking.

3) It is very good at reading my old code and cleaning it up.
Again, I double-check it, but it does an excellent job at cleaning up existing code.

SUMMARY:
For small, well-defined tasks it usually produces logically and syntactically correct code. It has definitely increased my productivity: it's much easier to check small code chunks than to produce them ab initio myself. It's also nice to have something come in and clean up my code. Checking the resulting code is essential; it's pretty good, but "pretty good" isn't good enough.

CONCERN:
AIs are getting integrated into IDEs now, and if humans aren't carefully checking the generated code then
(1) Critical code can be generated that produces wrong results (perhaps due to a subtle logical error, or a side case that was never checked). It may look "sort of right", but "sort of right" isn't good enough.
(2) Generated code is getting deposited into GitHub, which can pollute the code base used for training future AIs.

ANALYZING COMPANIES:
I've been doing a lot of the following recently with perplexity.ai

"Examine the financial health of NVIDIA over the past few years, its present health, and its prospective health, as well as estimated returns for the next few years, using analysis of the balance sheet, cash flow, and income statement, including any other information that you may have."

I don't take what perplexity.ai outputs as true, but the output provides a nice entry to get into the analysis yourself. You can also ask followup questions e.g.
"What do you see as potential major problems?"
Always double check important conclusions.

CONCLUSION:
I agree that it's startling how good they are, yet sobering that they aren't good enough.

A lot has been said about the dangers of AIs becoming smarter than us, somehow taking over or whatever.
Certainly they will have a place on the battlefield, where 'move fast and break things' is unfortunately the whole point.
But I think that there's now a very present danger with "AM" i.e. "Automated Mediocrity".
They're just not all that good yet, but they're everywhere.







Author: FlyingCircus
Number: of 3958 
Subject: Re: Using AI to generate backtesting programs
Date: 01/06/2025 11:07 PM
No. of Recommendations: 12
What a phenomenal specification for how to do the next phase of mechanical investing!

In a related vein, I used Google's Gemini - in Sheets - to "analyze my data" - which was my curated logging sheet of the last six years of weekly market timing signal tracking and SPY index values. This was after using the excellent and venerable software tool Orange on the same dataset. I had spent probably 8-12 hours curating/refining the source data and adding custom calculated fields such as forward 1 month return.

I learned the basics of Orange in a few hours and was able to generate (and learn about) stats on the predictive value of week to week signal changes for forward returns in a few more hours.

The Gemini plugin returned similar statistical analysis and results with better - excellent - layman's explanations in 15 seconds.

They both told me there's effectively zero predictability in forward returns from a sum of weekly changes in the signal totals. (There may be, at extremes). But this tool can very quickly help me analyze which of the indicators I track may be more associated/correlated with positive future return periods - especially at switch times. (Like the BCs, breadth measures, etc.).
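That kind of indicator screen is easy to sketch in pandas. Everything here is hypothetical: the column names stand in for whatever indicators and forward-return field the curated sheet actually contains, and the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the curated sheet: weekly indicator
# readings plus a calculated forward 1-month return column.
df = pd.DataFrame({
    'breadth':    [0.2, 0.5, -0.1, 0.7, 0.3],
    'bc_signal':  [1, 1, 0, 1, 0],
    'fwd_1m_ret': [0.01, 0.03, -0.02, 0.02, 0.00],
})

# Pearson correlation of each indicator with forward 1-month returns
correlations = df.drop(columns='fwd_1m_ret').corrwith(df['fwd_1m_ret'])
print(correlations.sort_values(ascending=False))
```

On real data you'd want far more than a handful of rows, and correlation is only a first pass; it won't capture the "at extremes" behavior mentioned above.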

Good hunting,
FC



Author: mechinv
Number: of 3958 
Subject: Re: Using AI to generate backtesting programs
Date: 01/07/2025 12:36 AM
No. of Recommendations: 14
I assume you know Python and can double check the generated Python code.

Yes, I have a data science background, and have used Python extensively to analyze datasets with millions of rows. To double check the code that claude.ai generated, I first looked at a price chart of SPY on the dates that the program said to switch to SPY or IEI, and found that they did line up with 6-month highs and 52-week lows.

I also changed the start date to Jan 2007 to include the 2008-09 bear market, and the program correctly calculated SPY's max drawdown as 55%. The strategy got out of SPY on 2008-Jan-17 and got back in on 2009-May-09 with a max drawdown of only 15%. So it worked well to avoid the worst of the Great Recession. But this type of market timing hasn't worked satisfactorily since 2010, as my post-discovery backtest shows.

Here's the Python program that claude.ai generated to backtest the strategy. Claude also told me which Python packages to install. You will need to run pip install yfinance pandas numpy to install all the packages.

------------------

import yfinance as yf
import pandas as pd
import numpy as np
from datetime import datetime


def get_historical_data(ticker, start_date, end_date):
    """Fetch historical closing prices for a given ticker"""
    stock = yf.Ticker(ticker)
    df = stock.history(start=start_date, end=end_date)
    return df['Close']


def calculate_signals(spy_prices):
    """Calculate trading signals based on 52-week lows and 26-week highs"""
    # Calculate rolling windows
    rolling_52_week_low = spy_prices.rolling(window=252).min()   # 252 trading days ≈ 52 weeks
    rolling_26_week_high = spy_prices.rolling(window=126).max()  # 126 trading days ≈ 26 weeks

    # Initialize signals DataFrame
    signals = pd.DataFrame(index=spy_prices.index)
    signals['spy_price'] = spy_prices
    signals['52_week_low'] = rolling_52_week_low
    signals['26_week_high'] = rolling_26_week_high

    # Walk forward day by day, starting in SPY
    current_position = 'SPY'
    positions = []

    for i in range(len(signals)):
        if current_position == 'SPY' and signals['spy_price'].iloc[i] == signals['52_week_low'].iloc[i]:
            current_position = 'IEI'
        elif current_position == 'IEI' and signals['spy_price'].iloc[i] == signals['26_week_high'].iloc[i]:
            current_position = 'SPY'
        positions.append(current_position)

    signals['position'] = positions
    return signals


def calculate_returns(signals, iei_prices):
    """Calculate strategy returns"""
    portfolio = pd.DataFrame(index=signals.index)
    portfolio['position'] = signals['position']

    # Daily returns of each asset
    spy_returns = signals['spy_price'].pct_change()
    iei_returns = iei_prices.pct_change()

    # Each day's strategy return is the return of the asset held that day
    # (built as a plain array to avoid pandas chained-assignment problems)
    daily_returns = np.zeros(len(portfolio))
    for i in range(1, len(portfolio)):
        if portfolio['position'].iloc[i] == 'SPY':
            daily_returns[i] = spy_returns.iloc[i]
        else:
            daily_returns[i] = iei_returns.iloc[i]
    portfolio['daily_return'] = daily_returns

    # Calculate cumulative returns
    portfolio['cumulative_return'] = (1 + portfolio['daily_return']).cumprod()

    return portfolio


def calculate_buy_and_hold_returns(spy_prices):
    """Calculate returns for buy-and-hold SPY strategy"""
    buy_hold = pd.DataFrame(index=spy_prices.index)
    buy_hold['daily_return'] = spy_prices.pct_change()
    buy_hold['cumulative_return'] = (1 + buy_hold['daily_return']).cumprod()
    return buy_hold


def calculate_risk_metrics(portfolio):
    """Calculate risk metrics including Sharpe ratio and maximum drawdown"""
    # Calculate Sharpe Ratio
    risk_free_rate = 0.02  # Assuming 2% annual risk-free rate
    daily_rf_rate = (1 + risk_free_rate) ** (1 / 252) - 1
    excess_returns = portfolio['daily_return'] - daily_rf_rate
    sharpe_ratio = np.sqrt(252) * (excess_returns.mean() / excess_returns.std())

    # Calculate Maximum Drawdown
    cum_returns = portfolio['cumulative_return']
    rolling_max = cum_returns.expanding().max()
    drawdowns = cum_returns / rolling_max - 1
    max_drawdown = drawdowns.min()

    # Calculate when max drawdown occurred
    max_drawdown_idx = drawdowns.idxmin()
    peak_idx = rolling_max.loc[:max_drawdown_idx].idxmax()

    return {
        'sharpe_ratio': sharpe_ratio,
        'max_drawdown': max_drawdown,
        'max_drawdown_start': peak_idx,
        'max_drawdown_end': max_drawdown_idx
    }


def calculate_cagr(portfolio, start_date, end_date):
    """Calculate Compound Annual Growth Rate (CAGR)"""
    total_return = portfolio['cumulative_return'].iloc[-1]

    # Calculate number of years
    start_date = pd.to_datetime(start_date)
    end_date = pd.to_datetime(end_date)
    years = (end_date - start_date).days / 365.25

    # Calculate CAGR
    cagr = (total_return ** (1 / years)) - 1

    return cagr


def main():
    # Set date range
    start_date = '2009-12-31'
    end_date = datetime.now().strftime('%Y-%m-%d')

    # Fetch historical data
    spy_prices = get_historical_data('SPY', start_date, end_date)
    iei_prices = get_historical_data('IEI', start_date, end_date)

    # Calculate signals and returns for strategy
    signals = calculate_signals(spy_prices)
    portfolio = calculate_returns(signals, iei_prices)

    # Calculate buy-and-hold returns
    buy_hold = calculate_buy_and_hold_returns(spy_prices)

    # Calculate metrics for both strategies
    strategy_metrics = calculate_risk_metrics(portfolio)
    buy_hold_metrics = calculate_risk_metrics(buy_hold)

    strategy_cagr = calculate_cagr(portfolio, start_date, end_date)
    buy_hold_cagr = calculate_cagr(buy_hold, start_date, end_date)

    # Print results
    initial_investment = 10000  # Example initial investment

    print(f"\nBacktest Results ({start_date} to {end_date}):")
    print("\nTrading Strategy Performance:")
    strategy_final = initial_investment * portfolio['cumulative_return'].iloc[-1]
    strategy_return = (strategy_final / initial_investment - 1) * 100
    print(f"Final Value: ${strategy_final:,.2f}")
    print(f"Total Return: {strategy_return:.2f}%")
    print(f"CAGR: {strategy_cagr*100:.2f}%")
    print(f"Sharpe Ratio: {strategy_metrics['sharpe_ratio']:.2f}")
    print(f"Maximum Drawdown: {strategy_metrics['max_drawdown']*100:.2f}%")
    print(f"Max Drawdown Period: {strategy_metrics['max_drawdown_start'].date()} to {strategy_metrics['max_drawdown_end'].date()}")

    print("\nBuy and Hold SPY Performance:")
    buy_hold_final = initial_investment * buy_hold['cumulative_return'].iloc[-1]
    buy_hold_return = (buy_hold_final / initial_investment - 1) * 100
    print(f"Final Value: ${buy_hold_final:,.2f}")
    print(f"Total Return: {buy_hold_return:.2f}%")
    print(f"CAGR: {buy_hold_cagr*100:.2f}%")
    print(f"Sharpe Ratio: {buy_hold_metrics['sharpe_ratio']:.2f}")
    print(f"Maximum Drawdown: {buy_hold_metrics['max_drawdown']*100:.2f}%")
    print(f"Max Drawdown Period: {buy_hold_metrics['max_drawdown_start'].date()} to {buy_hold_metrics['max_drawdown_end'].date()}")

    # Print trade summary
    position_changes = portfolio[portfolio['position'] != portfolio['position'].shift(1)]
    print(f"\nStrategy Trade Summary:")
    print(f"Number of Trades: {len(position_changes)}")
    print("\nTrade History:")
    for date, row in position_changes.iterrows():
        print(f"{date.date()}: Switch to {row['position']}")


if __name__ == "__main__":
    main()


Author: tedthedog 🐝
Number: of 3958 
Subject: Re: Using AI to generate backtesting programs
Date: 01/07/2025 11:39 AM
No. of Recommendations: 2
Great, I assumed you checked it. I check on data where I know the answer, too. I also read over AI-produced code to see if it looks robust, etc., as I'm sure you do. Lots of checks!

But a lot of folks might not be used to that protocol, and will blindly use the code output of these AIs.
In addition to possibly being led astray, it can become a self-reinforcing loop where bad results get published on a blog or reddit or wherever, and bad or just badly styled code gets deposited in GitHub. Then future AIs train on this.

I use AI generated code a lot and it has definitely helped my productivity. But its limitations have also become clear.

Not sure what can be done about it.




Author: zeelotes
Number: of 3958 
Subject: Re: Using AI to generate backtesting programs
Date: 01/08/2025 1:33 PM
No. of Recommendations: 14
mechinv wrote: The Buy and Sell strategy has the following rules.
1. Buy the S&P 500 whenever it closes at a 6 month high.
2. Sell the S&P 500 whenever it closes at a 1 year low, and put the proceeds into 5 year treasuries.


This is not exactly what he wrote in the link you provided.

9. The Buy and Sell Portfolio:

Buy when DJIA/S&P 500 closes at a 6 month high. Sell DJIA/S&P 500 closes at a 1 year low. Put the proceeds into 5 year treasuries.
CAGR 1926-2010 = 12.77% (worst drawdown = -20.30%; worst drawdown for buy and hold = -64.21%)
CAGR 1972-2010 = 13.84% (worst drawdown = 6.09%; worst drawdown for buy and hold = -37.61%)


This immediately did not pass the sniff test. A worst drawdown of only 6.09%? Very doubtful. So I wondered what it would take to plug in the data and run the test myself; by my stopwatch, the total time was 5 minutes and 55 seconds. I'm not convinced that AI can beat a simple setup in Excel. Not yet, anyway!

So the results? First off, you'll note that he writes "DJIA/S&P 500 closes." I'm not sure what that means, so I tested the S&P 500 by itself, the Dow by itself, and taking the earliest signal of the two. It may be that he means averaging the two indexes to produce the signal. Not sure.

First, we test 1972 to 2010.

S&P 500 Signals

* CAGR 10.80%
* Max DD -32.47%

Dow30 Signals

* CAGR 8.57%
* Max DD -42.63%

Combined Signals

* CAGR 9.19%
* Max DD -39.60%

Second, we test 1972 to present.

S&P 500 Signals

* CAGR 10.74%
* Max DD -32.47%

Dow30 Signals

* CAGR 8.96%
* Max DD -42.63%

Combined Signals

* CAGR 9.77%
* Max DD -39.60%

It is very simple to see that the Max DD he claims is impossible. Why? The 1987 drop! The sell was on 10/19/1987 in all three cases. The buy back into the market was on 6/14/1988. The drawdown period is 8/25/1987 to 8/22/1988.
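A synthetic example (made-up prices, not actual 1987 data) shows why a ~6% Max DD can't survive that sequence: a sell rule that only fires at a 1-year low rides the whole crash down before it triggers, so the loss is already on the books.

```python
import pandas as pd

# Made-up price path: a peak, a ~30% crash, then a sideways drift.
# A "sell at the 1-year low" rule only fires after the fall is complete.
prices = pd.Series([100.0, 98.0, 95.0, 80.0, 72.0, 70.0, 71.0, 73.0])

cumulative = prices / prices.iloc[0]
running_peak = cumulative.expanding().max()
drawdowns = cumulative / running_peak - 1
print(f"Drawdown ridden before any 1-year-low exit: {drawdowns.min():.1%}")  # → -30.0%
```

Whatever happens after the exit, the strategy's max drawdown can't be smaller than the fall from the peak to the sell point.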

I wish he had posted his actual signals and then it would be easier to see where he failed in his setup.


Author: mechinv
Number: of 3958 
Subject: Re: Using AI to generate backtesting programs
Date: 01/08/2025 6:31 PM
No. of Recommendations: 4
It is very simple to realize the Max DD he claims is impossible.

Yes, I suspected that Stoken's Max DD figure, as reported in the blog post, was too good to be true. Thank you for confirming.

As far as Excel vs. Python goes, they both have their strengths. What I was trying to show is that, with AI, you no longer need a development background to create powerful custom backtesting programs that report exactly the results you want.



Author: zeelotes
Number: of 3958 
Subject: Re: Using AI to generate backtesting programs
Date: 01/08/2025 7:43 PM
No. of Recommendations: 6
mechinv wrote: Yes, I suspected that Stoken's Max DD figure as reported in the blog post looked too good to be true.

I found the CAGR for every year from 1928 to 2024. Interesting: the vast majority of the time it was worse during the bull signal than during the bear signal. It makes sense. The strategy literally waits a very long time after a bottom before issuing a buy signal, and a very long time after a peak before selling.

Peak/Trough   Signal       Days   Gain/Loss
5/20/2015     8/20/2015      92     -4.64%
2/11/2016     6/2/2016      112     16.31%
9/21/2018     11/23/2018     63    -10.53%
12/26/2018    4/12/2019     107     24.03%


It is truly remarkable to me that anyone could think this system would be market-beating with such a low MaxDD. Prior to the sell signal in August 2015, the S&P 500 had already dropped -4.64%, and after the bottom, prior to the signal to re-enter, it had already risen 16.31%. Clearly, you were better off doing nothing and just holding. The price on 8/20/15 was $2,035.73, and the price on 6/2/2016 was $2,105.26 - a difference of 3.42%. The cost of missing the drop was not too much in this case, but I can assure you there are many other times when the cost is much, much dearer!
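The 3.42% figure checks out directly from the two closes quoted above:

```python
exit_close = 2035.73     # S&P 500 close on 8/20/2015, the sell signal
reentry_close = 2105.26  # S&P 500 close on 6/2/2016, the buy-back signal

# Cost of sitting out: the strategy buys back in above where it sold
cost_of_missing = reentry_close / exit_close - 1
print(f"Re-entered {cost_of_missing:.2%} above the exit price")  # → 3.42%
```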




