54f689159bd540e6e4f7e76daf6e1aa4d3c9f3a7
Python/How to Build a Backtesting Engine in Python Using Pandas.md
| ... | ... | @@ -0,0 +1,344 @@ |
| 1 | +https://medium.com/@Jachowskii/how-to-build-a-backtesting-engine-in-python-using-pandas-bc8e532a9e95 |
|
| 2 | + |
|
| 3 | +# How to Build a Backtesting Engine in Python Using Pandas |
|
| 4 | + |
|
| 5 | + |
|
| 6 | +[Jachowski](https://medium.com/@Jachowskii) |
|
| 7 | + |
|
| 8 | +A simple way to build an easily scalable backtesting engine to test your Trading Systems in Python using only Pandas and Numpy |
|
| 9 | + |
|
| 10 | +[**Backtesting**](https://www.investopedia.com/terms/b/backtesting.asp) is a crucial step in designing your **Trading Systems**. I would say that it is _the_ crucial step, given that it assesses the viability of your strategies. |
|
| 11 | + |
|
| 12 | +Just imagine: Earth, 2050. The first flying car ever is released on the market, but it’s never been tested. Would you buy it? I think (hope) not. |
|
| 13 | + |
|
| 14 | +This simple analogy is meant to highlight the importance of backtesting: before investing through any **algorithmic model**, test it, again and again, even if your favourite financial guru on YouTube says that a certain strategy will provide a 100% return in less than a year. |
|
| 15 | + |
|
| 16 | +Believe in what you see, not in what they tell you to see. |
|
| 17 | +---------------------------------------------------------- |
|
| 18 | + |
|
| 19 | +In this sense, using a pre-built backtesting engine from a library such as **Backtrader** is not the best idea, for two main reasons: you can neither properly see what is going on inside it nor modify it as much as you want. |
|
| 20 | + |
|
| 21 | +Remember, the second principle of the [Zen of Python](https://peps.python.org/pep-0020/) states that **“Explicit is better than implicit”**. If you can build explicit functions on your own instead of using black-box pre-built ones, go for it. |
|
| 22 | + |
|
| 23 | + |
|
| 24 | + |
|
| 25 | +Oh, and the third principle says that “**Simple is better than complex**”. Let’s see how easily you can backtest your strategies with Pandas. |
|
| 26 | + |
|
| 27 | +The Idea |
|
| 28 | +-------- |
|
| 29 | + |
|
| 30 | +This is what we’re going to do: |
|
| 31 | + |
|
| 32 | +1. Import the libraries |
|
| 33 | +2. Import stock data |
|
| 34 | +3. Define a trading strategy |
|
| 35 | +4. Define a market position function |
|
| 36 | +5. Define a backtesting function |
|
| 37 | + |
|
| 38 | +Let’s get into code stuff! |
|
| 39 | + |
|
| 40 | +1\. Import the Libraries |
|
| 41 | +------------------------ |
|
| 42 | + |
|
| 43 | +Let’s import the three libraries we need. Said and done: |
|
| 44 | + |
|
| 45 | +``` |
|
| 46 | +import numpy as np |
|
| 47 | +import pandas as pd |
|
| 48 | +import yfinance as yf |
|
| 49 | +``` |
|
| 50 | + |
|
| 51 | + |
|
| 52 | +2\. Import Stock Data |
|
| 53 | +--------------------- |
|
| 54 | + |
|
| 55 | +Let’s download 20 years of Amazon (ticker AMZN) stock data. |
|
| 56 | + |
|
| 57 | +``` |
|
| 58 | +amzn = yf.download('AMZN', start='2000-01-01', end='2020-01-01') |
|
| 59 | +``` |
|
| 60 | + |
|
| 61 | + |
|
| 62 | +3\. Define a Trading Strategy |
|
| 63 | +----------------------------- |
|
| 64 | + |
|
| 65 | +In this case, we’re going to test one of the most popular strategies: the [**Double Moving Averages Crossover**](https://www.investopedia.com/articles/active-trading/052014/how-use-moving-average-buy-stocks.asp). |
|
| 66 | + |
|
| 67 | +First of all, we have to define two Simple Moving Averages. Here’s how: |
|
| 68 | + |
|
| 69 | +``` |
|
| 70 | +def SMA(array, period): |
|
| 71 | +    return array.rolling(period).mean() |
|
| 72 | +``` |
|
| 73 | + |
|
| 74 | + |
|
| 75 | +That is, this function takes two arguments: |
|
| 76 | + |
|
| 77 | +* _array_ is the series we will apply the function to (Close prices) and |
|
| 78 | +* _period_ is the length of our moving averages (e.g. 14 and 200 days). |
|
| 79 | + |
|
| 80 | + |
|
| 81 | +The function takes a [sliding window](https://www.geeksforgeeks.org/python-pandas-dataframe-rolling/#:~:text=rolling\(\)%20function%20provides%20the,desired%20mathematical%20operation%20on%20it.) (`.rolling()`) of the desired length (`period`) over a series (`array`) and computes the arithmetic mean (`.mean()`) within each window. |
|
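To see what the rolling mean does on a small scale, here is a minimal sketch on made-up prices. The values and the names `prices` and `sma3` are illustrative assumptions, not AMZN data:

```python
import pandas as pd

# Toy closing prices (made-up values, not real market data)
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Same call chain as SMA(array, period), here with period=3
sma3 = prices.rolling(3).mean()

# The first two entries are NaN because a 3-bar window is not yet full;
# the remaining entries are the means of bars 0-2, 1-3 and 2-4.
print(sma3.tolist())
```

The leading NaN values are worth keeping in mind: with a 200-day average, the first 199 rows carry no signal at all.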
| 82 | + |
|
| 83 | +Let’s define the two moving averages we will use. The first is the shorter-term (14 days), while the second is the longer-term (200 days): |
|
| 84 | + |
|
| 85 | +``` |
|
| 86 | +sma14 = SMA(amzn['Close'], 14) |
|
| 87 | +sma200 = SMA(amzn['Close'], 200) |
|
| 88 | +``` |
|
| 89 | + |
|
| 90 | + |
|
| 91 | +This is what we get: |
|
| 92 | + |
|
| 93 | + |
|
| 94 | + |
|
| 95 | + |
|
| 96 | + |
|
| 97 | + |
|
| 98 | + |
|
| 99 | + |
|
| 100 | + |
|
| 101 | +Now, we need to define the **entry rules** and **exit rules** of our strategy, which are the _crossover_ and the _crossunder_, respectively. |
|
| 102 | + |
|
| 103 | +In other words, we get an: |
|
| 104 | + |
|
| 105 | +* **entry** (buy) **signal** when the shorter-term moving average (14 days) crosses _above_ the longer-term moving average (200 days) |
|
| 106 | +* **exit** (sell) **signal** when the shorter-term moving average (14 days) crosses _below_ the longer-term (200 days). |
|
| 107 | + |
|
| 108 | +``` |
|
| 109 | +def crossover(array1, array2): |
|
| 110 | +    return array1 > array2 |
|
| 111 | +def crossunder(array1, array2): |
|
| 112 | +    return array1 < array2 |
|
| 113 | +``` |
|
| 114 | + |
|
| 115 | + |
|
| 116 | +And after that we assign _crossover_ to the enter rules and _crossunder_ to the exit rules: |
|
| 117 | + |
|
| 118 | +``` |
|
| 119 | +enter_rules = crossover(sma14, sma200) |
|
|  | +exit_rules = crossunder(sma14, sma200) |
|
| 120 | +``` |
|
| 121 | + |
|
| 122 | + |
|
| 123 | +Basically, we obtain two boolean series (True or False): |
|
| 124 | + |
|
| 125 | +* `enter_rules` is True whenever sma14 > sma200 while |
|
| 126 | +* `exit_rules` is True whenever sma14 < sma200. |
|
| 127 | + |
|
| 128 | +Hence, looking at the images above of the series sma14 and sma200, we expect to find False on the `enter_rules` on the 13th of October, 2000, since 33.5714 < 51.9385, i.e. sma14 < sma200. |
|
| 129 | + |
|
| 130 | +Let’s check for it: |
|
| 131 | + |
|
| 132 | +``` |
|
| 133 | +check = enter_rules[enter_rules.index == '2000-10-13'] |
|
| 134 | +print(check) |
|
| 135 | +``` |
|
| 136 | + |
|
| 137 | + |
|
| 138 | +This is the starting point. |
|
| 139 | + |
|
| 140 | +**Now we fly**. But not with that never tested flying car. |
|
| 141 | + |
|
| 142 | +4\. Define a Market Position Function |
|
| 143 | +------------------------------------- |
|
| 144 | + |
|
| 145 | +Here, we’re going to create a function that defines the ongoing trades: to achieve this, we will create a **switch** that: |
|
| 146 | + |
|
| 147 | +* **turns on** if `enter_rules` is True _and_ `exit_rules` is False and |
|
| 148 | +* **turns off** if `exit_rules` is True. |
|
| 149 | + |
|
| 150 | +Here is the function: |
|
| 151 | + |
|
| 152 | +``` |
|
| 153 | +def marketposition_generator(dataset, enter_rules, exit_rules): |
|
| 154 | +    dataset['enter_rules'] = enter_rules |
|
| 155 | +    dataset['exit_rules'] = exit_rules |
|
| 156 | +    status = 0 |
|
| 157 | +    mp = [] |
|
| 158 | +    for (i, j) in zip(enter_rules, exit_rules): |
|
| 159 | +        if status == 0: |
|
| 160 | +            if i == 1 and j != -1: |
|
| 161 | +                status = 1 |
|
| 162 | +        else: |
|
| 163 | +            if j == -1: |
|
| 164 | +                status = 0 |
|
| 165 | +        mp.append(status) |
|
| 166 | +    dataset['mp'] = mp |
|
| 167 | +    dataset['mp'] = dataset['mp'].shift(1) |
|
| 168 | +    dataset['mp'] = dataset['mp'].fillna(0)  # the shift leaves a NaN in the first row |
|
| 169 | +    return dataset['mp'] |
|
| 170 | +``` |
|
| 171 | + |
|
| 172 | + |
|
| 173 | +It takes three arguments: |
|
| 174 | + |
|
| 175 | +* _dataset_ is the dataframe that contains the stock data we previously imported (AMZN stock data), |
|
| 176 | +* _enter\_rules_ is the boolean series containing the entry signals and |
|
| 177 | +* _exit\_rules_ is the boolean series containing the exit signals. |
|
| 178 | + |
|
| 179 | +In the first two lines we copy the entry and exit rules into our dataset. `status` is the **switch** and `mp` is an empty list that will be populated with the successive values of `status`. |
|
| 180 | + |
|
| 181 | +At this point, we create a for loop with [`zip`](https://realpython.com/python-zip-function/#:~:text=Python's%20zip\(\)%20function%20is,%2C%20sets%2C%20and%20so%20on.) that works like a… yes, a zipper: it lets us iterate over `enter_rules` and `exit_rules` in parallel, one pair of values at a time. At each step the current `status` is appended to `mp`, so that: |
|
| 182 | + |
|
| 183 | +* `mp`\= 1 (**on**) whenever `enter_rules` is True _and_ `exit_rules` is False and |
|
| 184 | +* `mp`\= 0 (**off**) whenever `exit_rules` is True. |
|
| 185 | + |
|
| 186 | +**Note:** in Python, True corresponds to 1, but here, in the `if j == -1` statement related to `exit_rules`, True is encoded as -1. The reason will become clear later on. |
|
| 187 | + |
|
| 188 | +In the final lines, we add `mp` to our dataset, shift its values forward by one period so that the trade starts the day after we receive the signal, and replace the NaN value that the shift leaves in the first row with 0. The function returns the `mp` series. |
|
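The switch logic can be traced by hand on a few bars. The sketch below re-implements it on plain lists, without touching any dataframe. The function name `position_switch` and the sample flags are assumptions made for illustration, using the same 1 / -1 encoding as the function above:

```python
def position_switch(enter_flags, exit_flags):
    """Toy version of the on/off switch, working on plain lists."""
    status = 0
    positions = []
    for enter, exit_ in zip(enter_flags, exit_flags):
        if status == 0:
            # Flat: switch on only if entry is set and exit is not
            if enter == 1 and exit_ != -1:
                status = 1
        elif exit_ == -1:
            # In a trade: switch off on an exit flag
            status = 0
        positions.append(status)
    # Shift forward by one bar: the position starts the bar after the signal
    return [0] + positions[:-1]

# Made-up signals: entry is 1/0, exit is -1/0
enter_flags = [0, 1, 1, 1, 0, 0]
exit_flags = [-1, 0, 0, 0, -1, -1]

print(position_switch(enter_flags, exit_flags))  # [0, 0, 1, 1, 1, 0]
```

The entry flag first appears on the second bar, so the position is on from the third bar; the exit flag appears on the fifth bar, so the position is off from the sixth.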
| 189 | + |
|
| 190 | +5\. Define a Backtesting Function |
|
| 191 | +--------------------------------- |
|
| 192 | + |
|
| 193 | +Last step. We’re close to the end, hang on! |
|
| 194 | + |
|
| 195 | +First of all, we have to define some **parameters** such as: |
|
| 196 | + |
|
| 197 | +* **COSTS**: fixed costs per trade (i.e. transaction fees) |
|
| 198 | +* **INSTRUMENT**: type of instrument (1 for stocks, 2 for futures, etc.) |
|
| 199 | +* **OPERATION\_MONEY**: initial investment |
|
| 200 | +* **DIRECTION**: long _or_ short |
|
| 201 | +* **ORDER\_TYPE**: type of order (market, limit, stop, etc.) |
|
| 202 | +* **ENTER\_LEVEL**: entry price |
|
| 203 | + |
|
| 204 | +``` |
|
| 205 | +COSTS = 0.50 |
|
| 206 | +INSTRUMENT = 1 |
|
| 207 | +OPERATION_MONEY = 10000 |
|
| 208 | +DIRECTION = "long" |
|
| 209 | +ORDER_TYPE = "market" |
|
| 210 | +ENTER_LEVEL = amzn['Open'] |
|
| 211 | +``` |
|
| 212 | + |
|
| 213 | + |
|
| 214 | +We’re assuming that: |
|
| 215 | + |
|
| 216 | +* **COSTS**: every order will cost us _50 cents_, so a round turn (one buy plus one sell) costs 1 dollar, which is the `2 * COSTS` term in the function below |
|
| 217 | +* **INSTRUMENT**: the system will be tested on a _stock_ (AMZN) |
|
| 218 | +* **OPERATION\_MONEY**: the initial capital is _10k dollars_ |
|
| 219 | +* **DIRECTION**: the strategy will be tested for _long trades_ |
|
| 220 | +* **ORDER\_TYPE**: the strategy will process _market orders_ |
|
| 221 | +* **ENTER\_LEVEL**: the entry price corresponds to the _open price_ |
|
| 222 | + |
|
| 223 | +And here is the best part: |
|
| 224 | + |
|
| 225 | +```python |
|
| 226 | +def apply_trading_system(dataset, direction, order_type, enter_level, enter_rules, exit_rules): |
|
| 227 | + |
|
| 228 | +    dataset['enter_rules'] = enter_rules.apply(lambda x: 1 if x == True else 0) |
|
| 229 | +    dataset['exit_rules'] = exit_rules.apply(lambda x: -1 if x == True else 0) |
|
| 230 | +    dataset['mp'] = marketposition_generator(dataset, dataset['enter_rules'], dataset['exit_rules']) |
|
| 231 | + |
|
| 232 | +    if order_type == "market": |
|
| 233 | +        dataset['entry_price'] = np.where((dataset.mp.shift(1) == 0) & |
|
| 234 | +                                          (dataset.mp == 1), dataset.Open, np.nan)  # open of the bar on which mp turns to 1 |
|
| 235 | +        if INSTRUMENT == 1: |
|
| 236 | +            dataset['number_of_stocks'] = np.where((dataset.mp.shift(1) == 0) & |
|
| 237 | +                                                   (dataset.mp == 1), OPERATION_MONEY / dataset.Open, np.nan) |
|
| 238 | + |
|
| 239 | +    dataset['entry_price'] = dataset['entry_price'].ffill() |
|
| 240 | + |
|
| 241 | +    if INSTRUMENT == 1: |
|
| 242 | +        dataset['number_of_stocks'] = dataset['number_of_stocks']\ |
|
| 243 | +            .round(0).ffill() |
|
| 244 | + |
|
| 245 | +    dataset['events_in'] = np.where((dataset.mp == 1) & (dataset.mp.shift(1) == 0), 'entry', '') |
|
| 246 | + |
|
| 247 | +    if direction == 'long': |
|
| 248 | +        if INSTRUMENT == 1: |
|
| 249 | +            dataset['open_operations'] = (dataset.Close - dataset.entry_price) * dataset.number_of_stocks |
|
| 250 | +            dataset['open_operations'] = np.where((dataset.mp == 1) & (dataset.mp.shift(-1) == 0), |
|
| 251 | +                (dataset.Open.shift(-1) - dataset.entry_price) * dataset.number_of_stocks - 2 * COSTS, |
|
| 252 | +                dataset.open_operations) |
|
| 253 | +    else: |
|
| 254 | +        if INSTRUMENT == 1: |
|
| 255 | +            dataset['open_operations'] = (dataset.entry_price - dataset.Close) * dataset.number_of_stocks |
|
| 256 | +            dataset['open_operations'] = np.where((dataset.mp == 1) & (dataset.mp.shift(-1) == 0), |
|
| 257 | +                (dataset.entry_price - dataset.Open.shift(-1)) * dataset.number_of_stocks - 2 * COSTS, |
|
| 258 | +                dataset.open_operations) |
|
| 259 | + |
|
| 260 | +    dataset['open_operations'] = np.where(dataset.mp == 1, dataset.open_operations, 0) |
|
| 261 | +    dataset['events_out'] = np.where((dataset.mp == 1) & (dataset.exit_rules == -1), 'exit', '') |
|
| 262 | +    dataset['operations'] = np.where((dataset.exit_rules == -1) & |
|
| 263 | +                                     (dataset.mp == 1), dataset.open_operations, np.nan) |
|
| 264 | +    dataset['closed_equity'] = dataset.operations.fillna(0).cumsum() |
|
| 265 | +    dataset['open_equity'] = dataset.closed_equity + dataset.open_operations - dataset.operations.fillna(0) |
|
| 266 | + |
|
| 267 | +    dataset.to_csv('trading_system_export.csv') |
|
| 268 | + |
|
| 269 | +    return dataset |
|
| 270 | +``` |
|
| 271 | + |
|
| 272 | +Let’s analyze the function line by line. |
|
| 273 | + |
|
| 274 | +From line 3 to line 5 we add the two boolean series and the market position function to the dataset. |
|
| 275 | + |
|
| 276 | +**Note:** In the previous note, I promised that everything would become clear: in the lambda function applied to `exit_rules`, all True values are mapped to -1 while False values are mapped to 0. Thanks to that, `marketposition_generator` runs wonderfully. |
|
| 277 | + |
|
| 278 | +From line 7 to line 12 we define [market orders](https://www.investopedia.com/terms/m/marketorder.asp) for stocks: |
|
| 279 | + |
|
| 280 | +* In lines 7–9 we define the `entry_price`: if the previous value of `mp` was zero and the present value is one, i.e. we received a signal, we open a trade at the open price of the next day; |
|
| 281 | +* In lines 10–12 we define `number_of_stocks`, that is the amount of shares we buy, as the ratio between the initial capital (10k) and the `entry_price`; |
|
| 282 | + |
|
| 283 | +In line 14 we forward propagate the value of the `entry_price`; |
|
| 284 | + |
|
| 285 | +In lines 16–17 we round `number_of_stocks` to the nearest integer and forward propagate its value as well; |
|
| 286 | + |
|
| 287 | +In line 20 we associate the label `'entry'` to `'events_in'` every time `mp` moves from 0 to 1; |
|
| 288 | + |
|
| 289 | +From line 22 to line 27 we define the long trades: |
|
| 290 | + |
|
| 291 | +* In line 24 we compute `open_operations`, i.e. the profit; |
|
| 292 | +* In line 25 we adjust the previous computation of `open_operations` whenever we exit the trade: whenever we receive an exit signal, the trade is closed the day after at the open price. Here, [round turn costs](https://www.investopedia.com/terms/r/rttc.asp) are included; |
|
| 293 | + |
|
| 294 | +From line 28 to line 33 we replicate for short trades what was said for long trades: to test _short trades_ you just have to set `DIRECTION = 'short'`; |
|
| 295 | + |
|
| 296 | +In line 35 we set `open_operations` to 0 whenever there is no trade in progress; |
|
| 297 | + |
|
| 298 | +In line 36 we associate the label `'exit'` to `'events_out'` every time `mp` moves from 1 to 0, i.e. we receive an exit signal; |
|
| 299 | + |
|
| 300 | +In lines 37–38 we copy the value of `open_operations` into `operations` only when we’re exiting a trade, and set it to `nan` otherwise: by doing so, it will be very easy to aggregate data; |
|
| 301 | + |
|
| 302 | +In line 39 we define `closed_equity`, the equity line for closed operations, and in line 40 we define `open_equity`, the equity line that also includes open operations; |
|
| 303 | + |
|
| 304 | +In line 42 we save the resulting dataset in a csv file. |
|
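The profit arithmetic for a single closed trade can be checked by hand. The sketch below reproduces the formulas for `number_of_stocks` and for the exit-bar value of `open_operations` on made-up prices. The helper name `trade_profit` and the prices 40 and 50 are assumptions for illustration only:

```python
COSTS = 0.50             # fee per order, as in the parameters above
OPERATION_MONEY = 10000  # capital committed to each trade

def trade_profit(entry_price, exit_price, direction="long"):
    """Profit of one closed trade, net of round-turn costs."""
    # Number of shares bought with the committed capital, rounded
    shares = round(OPERATION_MONEY / entry_price, 0)
    if direction == "long":
        gross = (exit_price - entry_price) * shares
    else:
        gross = (entry_price - exit_price) * shares
    # One fee to open the trade and one to close it
    return gross - 2 * COSTS

# Made-up prices: buy at 40, sell at 50
print(trade_profit(40.0, 50.0))           # 2499.0
print(trade_profit(40.0, 50.0, "short"))  # -2501.0
```

With 10,000 dollars and an entry at 40 we hold 250 shares, so a 10 dollar move is worth 2,500 dollars before the 1 dollar of round-turn costs.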
| 305 | + |
|
| 306 | +Let’s call the function and inspect the results. |
|
| 307 | + |
|
| 308 | +``` |
|
| 309 | +COSTS = 0.50 |
|
| 310 | +INSTRUMENT = 1 |
|
| 311 | +OPERATION_MONEY = 10000 |
|
| 312 | +DIRECTION = "long" |
|
| 313 | +ORDER_TYPE = "market" |
|
| 314 | +ENTER_LEVEL = amzn['Open'] |
|
|  | +trading_system = apply_trading_system(amzn, DIRECTION, ORDER_TYPE, ENTER_LEVEL, enter_rules, exit_rules) |
|
| 315 | +``` |
|
| 316 | + |
|
| 317 | + |
|
| 318 | +These are two _long trades_ registered by the Trading System: |
|
| 319 | + |
|
| 320 | + |
|
| 321 | + |
|
| 322 | +To check whether the Trading Strategy (the Double Moving Averages Crossover) produced profitable _long trades_ in the time period considered for that stock, you can just type: |
|
| 323 | + |
|
| 324 | +``` |
|
| 325 | +net_profit = trading_system['closed_equity'].iloc[-1] - OPERATION_MONEY |
|
| 326 | +print(round(net_profit, 2)) |
|
| 327 | +``` |
|
| 328 | + |
|
| 329 | + |
|
| 330 | + |
|
| 331 | +A return of almost 500% in 20 years. Not surprising, considering that Amazon stock increased by 2400% in those 20 years and we used a _trend-following_ strategy. |
|
| 332 | + |
|
| 333 | +That’s all for this article. I hope you’ll find it helpful. |
|
| 334 | + |
|
| 335 | +Let me know if you could be interested in seeing extensions of this backtesting engine, for example how to implement _limit orders_. |
|
| 336 | + |
|
| 337 | +In case you need clarification or you have any advice, feel free to contact me on Telegram: |
|
| 338 | + |
|
| 339 | +Cheers 🍻 |
|
| 340 | + |
|
| 341 | +Reference: |
|
| 342 | +---------- |
|
| 343 | + |
|
| 344 | +Trombetta, Giovanni, _Strategie di Trading con Python,_ Hoepli, 2020 |
|
| ... | ... | \ No newline at end of file |