Backtesting

You think your strategy works. You have been profitable for three months. But three months is nothing — it could be luck, a favorable market regime, or survivorship bias. Backtesting lets you test your strategy against years of historical data to see if the edge is real.

Why Backtesting Matters

Consider this: a credit spread strategy that wins 70% of the time sounds great. But over a 10-year backtest, you discover that in 2018, 2020, and 2022, the strategy had drawdowns of 25%, 40%, and 20% respectively. Could you survive a 40% drawdown? Would you keep trading after that?

Backtesting reveals:

The true win rate (not the three-month sample)
Maximum drawdown (the worst it gets)
How the strategy performs in different market regimes (bull, bear, high vol, low vol)
Whether your management rules actually improve results
The optimal parameters (DTE, delta, width, take-profit level)

Manual Backtesting

If you do not have backtesting software, you can do it manually with historical option chains and a spreadsheet.

Step 1: Pick a strategy. Example: sell SPY $5-wide bull put spreads at 16 delta, 45 DTE, and close at 50% of maximum profit or at a loss of 2x the credit received.

Step 2: Go to a historical options database (CBOE DataShop, OptionNet Explorer, or free resources like Option Alpha's backtest tool) and find the option chain for SPY on the first trading day of each month going back 3 to 5 years.

Step 3: For each month, identify the 16-delta put and sell the spread. Record the credit.

Step 4: Track the spread's value daily (or weekly) until it hits 50% profit or 2x loss, or until expiration if neither level is reached. Record the outcome.

Step 5: After 36 to 60 months of data, calculate win rate, average win, average loss, max drawdown, and total return.
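Steps 4 and 5 can be sketched in a few lines of Python. This is a minimal sketch, not a full backtester. The function name simulate_exit and its parameters are hypothetical, prices are per share, and "2x loss" is read here as a loss equal to twice the credit received (so the spread costs 3x the credit to buy back). If you define your stop differently, adjust that line.

```python
def simulate_exit(credit, daily_values, take_profit=0.50, stop_multiple=2.0):
    """Apply the take-profit / stop-loss rule to one short credit spread.

    credit        premium received at entry, per share
    daily_values  cost to buy the spread back on each later day, per share;
                  the last entry is treated as the value at expiration
    Returns (index of exit day, P&L per share).
    """
    if not daily_values:
        raise ValueError("daily_values must contain at least one price")

    profit_target = credit * take_profit   # e.g. keep 50% of the credit
    loss_limit = credit * stop_multiple    # assumption: loss of 2x the credit

    for day, value in enumerate(daily_values):
        pnl = credit - value               # short spread: profit when value falls
        if pnl >= profit_target or pnl <= -loss_limit:
            return day, round(pnl, 2)

    # Neither rule triggered: settle at the last observed value
    return len(daily_values) - 1, round(credit - daily_values[-1], 2)


# Made-up prices for illustration only
print(simulate_exit(1.00, [0.90, 0.70, 0.50, 0.30]))  # exits at the profit target
print(simulate_exit(1.00, [1.50, 2.40, 3.10]))        # exits at the stop
```

Run this once per monthly entry and collect the P&L values in a list. That list is the raw material for every metric in Step 5.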

This is tedious but incredibly valuable. Doing 60 trades by hand takes about 4 to 6 hours and gives you deep intuition for how the strategy behaves.

Automated Backtesting Tools

For more rigorous analysis, use dedicated platforms:

OptionNet Explorer: The gold standard for options backtesting. $28/month. Full historical option chains, Greeks, and strategy builders. You can test any strategy with custom management rules.

Option Alpha: Offers free backtesting for common strategies on SPY and other liquid underlyings. Preset strategies with adjustable parameters.

OptionStrat: Visual backtesting with P&L charts. Good for understanding how specific trades would have played out.

Python + OptionMetrics/CBOE data: If you code, you can build custom backtests with exact historical option pricing. This is the most flexible but requires programming skills and data subscriptions ($500 to $2,000+/year for institutional-quality data).

What to Backtest

Parameters to Optimize

DTE at entry: Test 21, 30, 45, and 60 DTE. Most studies show 45 DTE outperforms for credit strategies because of the theta decay curve.

Delta of short strike: Test 10, 16, 20, 25, and 30 delta. Lower delta = higher win rate but smaller credits. Higher delta = more credit but more losers.

Spread width: $5, $10, $20. Wider spreads collect more credit but risk more per trade.

Take-profit level: 25%, 50%, 75%, or hold to expiration. The 50% take-profit has been shown to improve risk-adjusted returns across most studies.

Stop-loss level: 1x, 1.5x, 2x, or 3x the credit. Tighter stops reduce max loss but increase the number of losers (some trades recover after temporarily being at 2x).
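To see how quickly these choices multiply, here is a minimal sketch that enumerates the values listed above. The dictionary keys and the baseline (taken from the Step 1 example strategy) are illustrative names, and None stands in for "hold to expiration". It builds both the full grid and a smaller one-parameter-at-a-time set.

```python
from itertools import product

# Parameter values taken from the lists above
grid = {
    "dte": [21, 30, 45, 60],
    "delta": [10, 16, 20, 25, 30],
    "width": [5, 10, 20],
    "take_profit": [0.25, 0.50, 0.75, None],  # None = hold to expiration
    "stop_multiple": [1.0, 1.5, 2.0, 3.0],
}

# Every combination of every value
full_grid = [dict(zip(grid, values)) for values in product(*grid.values())]

# Baseline plus variations that change exactly one parameter
baseline = {"dte": 45, "delta": 16, "width": 5,
            "take_profit": 0.50, "stop_multiple": 2.0}
one_at_a_time = [baseline] + [
    {**baseline, name: value}
    for name, values in grid.items()
    for value in values
    if value != baseline[name]
]

print(len(full_grid))      # 960
print(len(one_at_a_time))  # 16
```

The full grid has 960 combinations, and the best of 960 will look good on past data almost by construction. The 16-run one-at-a-time set is far less prone to that problem and shows which parameters the results are actually sensitive to.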

Metrics to Track

Win rate: Percentage of trades that are profitable
Average P&L per trade: Total P&L / number of trades
Max drawdown: The largest peak-to-trough decline in your equity curve
Sharpe ratio: Average return in excess of the risk-free rate, divided by the standard deviation of returns (usually annualized). Above 1.0 is good, above 1.5 is excellent
Profit factor: Gross wins / gross losses. Above 1.5 is solid
Max consecutive losers: How many losses in a row? Can you psychologically handle 6 losers in a row?
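The metrics above can be computed from a plain list of per-trade P&L values. The sketch below makes several simplifying assumptions: summarize is a hypothetical helper, drawdown is measured in dollars rather than as a percentage of account equity, breakeven trades count as losers, and the Sharpe ratio uses per-trade dollar P&L with a zero risk-free rate, annualized by the square root of trades per year.

```python
import math
from statistics import mean, stdev


def summarize(pnls, trades_per_year=12):
    """Summarize a chronological list of per-trade P&L values (dollars)."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p <= 0]

    # Walk the equity curve to find the largest peak-to-trough decline
    equity = peak = max_drawdown = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, peak - equity)

    # Longest run of consecutive losing trades
    streak = longest_streak = 0
    for p in pnls:
        streak = streak + 1 if p <= 0 else 0
        longest_streak = max(longest_streak, streak)

    gross_loss = -sum(losses)
    sd = stdev(pnls) if len(pnls) > 1 else 0.0
    return {
        "win_rate": len(wins) / len(pnls),
        "avg_pnl": mean(pnls),
        "max_drawdown": max_drawdown,
        "profit_factor": sum(wins) / gross_loss if gross_loss else math.inf,
        "max_consecutive_losers": longest_streak,
        "sharpe": mean(pnls) / sd * math.sqrt(trades_per_year) if sd else math.nan,
    }


# Made-up trades: $50 winners (50% of a $1.00 credit), $200 losers (2x credit)
example = [50, 50, -200, 50, 50, -200, -200, 50, 50, 50]
for name, value in summarize(example).items():
    print(name, value)
```

In this made-up sample the win rate is 70% and the average trade still loses $25, which is why win rate should never be read on its own.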

Common Backtesting Findings

Research from multiple sources (TastyTrade, CBOE, academic papers) consistently shows:

Selling premium on SPY/SPX with 45 DTE and closing at 50% profit tends to produce the best risk-adjusted returns. In these studies, the data favors this approach over shorter DTE and hold-to-expiration alternatives, although results depend on the period tested.

Wider spreads ($10 to $20 wide) outperform narrow spreads ($5 wide) on an absolute return basis. But narrow spreads have lower max loss per trade, making them easier to size appropriately.
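The sizing point is simple arithmetic. The sketch below assumes a standard 100-share contract multiplier, a hypothetical $25,000 account risking 2% per trade, and made-up credits for each width. The function names are illustrative.

```python
import math


def max_loss_per_spread(width, credit, multiplier=100):
    """Worst-case loss on one credit spread: width minus credit, in dollars."""
    return round((width - credit) * multiplier, 2)


def contracts_for_risk(account, risk_fraction, width, credit):
    """Largest whole number of spreads whose max loss fits the risk budget."""
    budget = account * risk_fraction
    return math.floor(budget / max_loss_per_spread(width, credit))


# Hypothetical account and credits, for illustration only
account, risk_fraction = 25_000, 0.02
for width, credit in [(5, 0.60), (10, 1.10), (20, 2.00)]:
    print(width, max_loss_per_spread(width, credit),
          contracts_for_risk(account, risk_fraction, width, credit))
```

With a $500 risk budget, only the $5-wide spread fits. The wider spreads cannot be traded at all without exceeding the budget, which is what "easier to size" means in practice for a small account.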

16 to 25 delta short strikes hit the sweet spot. Below 16 delta, the credits are too small to overcome commissions and the occasional big loser. Above 25 delta, the win rate drops enough to hurt.

Selling strangles outperforms iron condors before commissions and risk adjustments. But iron condors are easier to manage and have defined risk. Choose based on your risk tolerance and account size.

Backtesting Pitfalls

Survivorship bias. If you only backtest on stocks that exist today, you miss the ones that went bankrupt. AAPL in 2024 is easy to backtest. The stocks that collapsed in 2008 and delisted are not in your dataset.

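Look-ahead bias. Your backtest uses information that was not available when the trade was placed. You know AAPL recovered from its 2020 lows, so your backtest "stays in the trade" because you subconsciously know the outcome. The same applies to data: decide each entry and exit using only prices known on that day. Use systematic rules, not discretion, in backtests.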

Over-optimization. You test 200 parameter combinations and pick the one that performed best. Congratulations — you have curve-fit the past. That exact combination might not work in the future. Use a simple set of parameters and verify they work across different time periods.
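One common guard against curve-fitting is to choose parameters on an earlier period and then check them on a later period that played no part in the choice. The sketch below is one simple way to do that, not the only one. The function names, parameter labels, and all numbers are made up for illustration.

```python
from datetime import date


def split_by_date(trades, cutoff):
    """Split (date, pnl) pairs into in-sample (before cutoff) and out-of-sample."""
    in_sample = [(d, pnl) for d, pnl in trades if d < cutoff]
    out_of_sample = [(d, pnl) for d, pnl in trades if d >= cutoff]
    return in_sample, out_of_sample


def total_pnl(trades):
    return sum(pnl for _, pnl in trades)


def pick_then_validate(results_by_params, cutoff):
    """Pick the best parameter set in-sample, then report its out-of-sample P&L."""
    def in_sample_total(label):
        return total_pnl(split_by_date(results_by_params[label], cutoff)[0])

    best = max(results_by_params, key=in_sample_total)
    _, out_of_sample = split_by_date(results_by_params[best], cutoff)
    return best, total_pnl(out_of_sample)


# Made-up results for two hypothetical parameter sets
results = {
    "30 DTE / 20 delta": [(date(2019, 1, 2), 120), (date(2020, 1, 2), 90),
                          (date(2022, 1, 3), -150)],
    "45 DTE / 16 delta": [(date(2019, 1, 2), 60), (date(2020, 1, 2), 70),
                          (date(2022, 1, 3), 40)],
}
best, out_of_sample_pnl = pick_then_validate(results, cutoff=date(2021, 1, 1))
print(best, out_of_sample_pnl)
```

In this toy data the in-sample winner loses money out of sample. If your chosen parameters fall apart on the held-out period, treat the in-sample result as curve-fit.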

Ignoring slippage and commissions. A strategy that makes $0.50 per trade before costs keeps only $0.20 after $0.30 in commissions and slippage, and a strategy that makes $0.25 before costs loses money. Always include realistic transaction costs in your backtest.
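Costs also raise the win rate you need just to break even. The sketch below assumes a two-leg spread that is opened and closed (four contract fills), a hypothetical $0.65 commission per contract, and $0.02 per share of slippage on each fill. The $50 average win and $200 average loss correspond to a $1.00 credit with a 50% take-profit and a 2x-credit stop. All of these numbers are assumptions; substitute your broker's actual costs.

```python
def round_trip_cost(legs=2, commission_per_contract=0.65,
                    slippage_per_share=0.02, multiplier=100):
    """Dollar cost to open and close one spread (all inputs are assumptions)."""
    fills = legs * 2  # each leg is traded once to open and once to close
    return fills * (commission_per_contract + slippage_per_share * multiplier)


def breakeven_win_rate(avg_win, avg_loss, cost=0.0):
    """Win rate at which expected P&L per trade is zero.

    Solves w * avg_win - (1 - w) * avg_loss - cost = 0 for w.
    """
    return (avg_loss + cost) / (avg_win + avg_loss)


def expected_pnl(win_rate, avg_win, avg_loss, cost=0.0):
    return win_rate * avg_win - (1 - win_rate) * avg_loss - cost


cost = round_trip_cost()
print(round(cost, 2))
print(round(breakeven_win_rate(50, 200), 4))
print(round(breakeven_win_rate(50, 200, cost), 4))
print(round(expected_pnl(0.85, 50, 200), 2))
print(round(expected_pnl(0.85, 50, 200, cost), 2))
```

Under these assumptions the breakeven win rate rises from 80% to a little over 84%, and an 85% win-rate strategy keeps only a small fraction of its pre-cost expectancy.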

Small sample size. Testing over 12 months (12 trades for a monthly strategy) is not enough. As a rough rule of thumb, you need at least 30 to 50 trades before the win rate and average P&L start to mean much, and more is better. Ideally, backtest over 5 to 10 years to capture different market regimes.
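To see why 12 trades is too few, you can put a confidence interval around an observed win rate. The sketch below uses the Wilson score interval at roughly 95% confidence. It assumes trades are independent, which is only approximately true for trades that share the same market regime. The function name is illustrative.

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a win rate (z=1.96 is about 95% confidence)."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Same 75% observed win rate, two sample sizes
for wins, n in [(9, 12), (45, 60)]:
    low, high = wilson_interval(wins, n)
    print(f"{wins}/{n}: {low:.1%} to {high:.1%}")
```

With 9 wins in 12 trades, the plausible range for the true win rate runs from about 47% to 91%. With 45 wins in 60 trades it narrows to about 63% to 84%. That is still wide, but it is enough to start separating a coin flip from an edge.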

The Backtesting Workflow

Define your strategy with exact, mechanical rules (no discretion)
Test over at least 5 years of data
Record all metrics
Test sensitivity: change one parameter at a time and see how results change
If results are robust across multiple parameter settings, the strategy is more likely to have a real edge
If results only work with one specific parameter combination, it is likely curve-fit
Paper trade for 3 months to validate
Go live with small size

Backtesting is not glamorous, but it is the difference between trading a strategy because you "feel" it works and trading a strategy because you have evidence it works. Next: the psychological challenges that no backtest can prepare you for.