Python/How to Build a Backtesting Engine in Python Using Pandas.md
... ...
@@ -0,0 +1,344 @@
1
+https://medium.com/@Jachowskii/how-to-build-a-backtesting-engine-in-python-using-pandas-bc8e532a9e95
2
+
3
+# How to Build a Backtesting Engine in Python Using Pandas
4
+
5
+![Jachowski](https://miro.medium.com/v2/resize:fill:88:88/1*ACwKgFpfwKbQJHyR3am92w@2x.jpeg)
6
+[Jachowski](https://medium.com/@Jachowskii)
7
+
8
+A simple way to build an easily scalable backtesting engine to test your Trading Systems in Python using only Pandas and NumPy.
9
+
10
+[**Backtesting**](https://www.investopedia.com/terms/b/backtesting.asp) is a crucial step in designing your **Trading Systems**. I would say that it is _the_ crucial step, given that it assesses the viability of your strategies.
11
+
12
+Just imagine: Earth, 2050. The first flying car ever is released on the market, but it has never been tested. Would you buy it? I think (hope) not.
13
+
14
+This simple analogy is meant to highlight the importance of backtesting: before investing through any **algorithmic model**, test it, again and again, even if your favourite financial guru on YouTube says that a certain strategy will provide a 100% return in less than a year.
15
+
16
+Believe in what you see, not in what they tell you to see.
17
+----------------------------------------------------------
18
+
19
+In this sense, it’s not the best idea to backtest with a pre-built engine from libraries such as **Backtrader**, for many reasons: you can neither properly see what is going on inside nor modify it as much as you want.
20
+
21
+Remember, the second principle of the [Zen of Python](https://peps.python.org/pep-0020/) states that **“Explicit is better than implicit”**. If you can build explicit functions on your own instead of using pre-built black boxes, go for it.
22
+
23
+![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*0yEmMn468-AP-0XEhbHpqQ.png)
24
+
25
+Oh, and the third principle says that “**Simple is better than complex**”. Let’s see how easily you can backtest your strategies with Pandas.
26
+
27
+The Idea
28
+--------
29
+
30
+This is what we’re going to do:
31
+
32
+1. Import the libraries
33
+2. Import stock data
34
+3. Define a trading strategy
35
+4. Define a market position function
36
+5. Define a backtesting function
37
+
38
+Let’s get into code stuff!
39
+
40
+1\. Import the Libraries
41
+------------------------
42
+
43
+Let’s import the three libraries we need. No sooner said than done:
44
+
45
+```
46
+import numpy as np
47
+import pandas as pd
48
+import yfinance as yf
49
+```
50
+
51
+
52
+2\. Import Stock Data
53
+---------------------
54
+
55
+Let’s download 20 years of Amazon (ticker AMZN) stock data.
56
+
57
+```
58
+amzn = yf.download('AMZN', start='2000-01-01', end='2020-01-01')
59
+```
60
+
61
+
62
+3\. Define a Trading Strategy
63
+-----------------------------
64
+
65
+In this case, we’re going to test one of the most popular strategies: the [**Double Moving Averages Crossover**](https://www.investopedia.com/articles/active-trading/052014/how-use-moving-average-buy-stocks.asp).
66
+
67
+First of all, we have to define two Simple Moving Averages. Here’s how:
68
+
69
+```
70
+def SMA(array, period):
71
+    return array.rolling(period).mean()
72
+```
73
+
74
+
75
+That is, this function takes two arguments:
+
+* _array_ is the series we will apply the function on (Close prices) and
+* _period_ is the length of our moving averages (e.g. 14 and 200 days).
80
+
81
+The function returns a [sliding window](https://www.geeksforgeeks.org/python-pandas-dataframe-rolling/#:~:text=rolling\(\)%20function%20provides%20the,desired%20mathematical%20operation%20on%20it.) (`.rolling()`) of a desired length (`(period)`) over an array (`array`), on which the arithmetic mean (`.mean()`) is computed.
82
+
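To get a feel for what `SMA` returns, here is a minimal sketch on a made-up five-value series (the prices are invented for illustration, they are not AMZN data). Note that the first `period - 1` values are NaN, because the window is not full yet:

```python
import pandas as pd

def SMA(array, period):
    # rolling mean over the last `period` values
    return array.rolling(period).mean()

# toy prices, invented for this example
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
sma3 = SMA(prices, 3)

print(sma3)
```

On the real data this means that a 200-day average is NaN for the first 199 trading days, so no signal can fire before then.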
83
+Let’s define the two moving averages we will use. The first is the shorter-term (14 days), while the second is the longer-term (200 days):
84
+
85
+```
86
+sma14 = SMA(amzn['Close'], 14)
87
+sma200 = SMA(amzn['Close'], 200)
88
+```
89
+
90
+
91
+This is what we get:
92
+
93
+![](https://miro.medium.com/v2/resize:fit:640/format:webp/1*K21fwxkz3RCRBzJCOzsbag.png)
94
+
95
+![](https://miro.medium.com/v2/resize:fit:640/format:webp/1*UW96PbQ8kgEQqtiFYLxGRQ.png)
96
+
97
+![](https://miro.medium.com/v2/resize:fit:460/format:webp/1*tub_l5o94crIjDAp5lVZFg.png)
98
+
99
+![](https://miro.medium.com/v2/resize:fit:466/format:webp/1*hE-emYMSU0LQSoNTLFC2eQ.png)
100
+
101
+Now, we need to define the **entry rules** and **exit rules** of our strategy, which are the _crossover_ and the _crossunder_, respectively.
102
+
103
+In other words, we get an:
104
+
105
+* **entry** (buy) **signal** when the shorter-term moving average (14 days) crosses _above_ the longer-term moving average (200 days)
106
+* **exit** (sell) **signal** when the shorter-term moving average (14 days) crosses _below_ the longer-term (200 days).
107
+
108
+```
109
+def crossover(array1, array2):
+    # True on every bar where array1 is above array2
+    return array1 > array2
+
+def crossunder(array1, array2):
+    # True on every bar where array1 is below array2
+    return array1 < array2
113
+```
114
+
115
+
116
+After that, we assign _crossover_ to the entry rules and _crossunder_ to the exit rules:
117
+
118
+```
119
+enter_rules = crossover(sma14, sma200)
+exit_rules = crossunder(sma14, sma200)
120
+```
121
+
122
+
123
+Basically, we obtain two boolean series (True or False):
124
+
125
+* `enter_rules` is True whenever sma14 > sma200 while
126
+* `exit_rules` is True whenever sma14 < sma200.
127
+
128
+Hence, looking at the images above of the series sma14 and sma200, we expect to find False on the `enter_rules` on the 13th of October, 2000, since 33.5714 < 51.9385, i.e. sma14 < sma200.
129
+
130
+Let’s check for it:
131
+
132
+```
133
+check = enter_rules[enter_rules.index == '2000-10-13']
134
+print(check)
135
+```
136
+![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*YocrdhIZpF2A_T1BoGLRLw.png)
137
+
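Keep in mind that `crossover` and `crossunder`, as defined here, describe a state (one average is above or below the other on every bar), not only the bar where the lines actually cross. The market position switch of the next section copes with that. If you ever want the crossing bars themselves, one possible sketch is the following. The toy series and the names `cross_up` and `cross_down` are assumptions made for illustration, not part of the original code:

```python
import pandas as pd

# toy "fast" and "slow" averages, invented for this example
fast = pd.Series([1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0])
slow = pd.Series([2.5] * 7)

enter_rules = fast > slow   # state: fast is above slow
exit_rules = fast < slow    # state: fast is below slow

# a crossing is the bar where the state flips from False to True;
# diff() is NaN on the first bar, so the first bar never counts as a crossing
cross_up = enter_rules.astype(int).diff() == 1
cross_down = exit_rules.astype(int).diff() == 1

print(enter_rules.tolist())
print(cross_up.tolist())
print(cross_down.tolist())
```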
138
+This is the starting point.
139
+
140
+**Now we fly**. But not with that never-tested flying car.
141
+
142
+4\. Define a Market Position Function
143
+-------------------------------------
144
+
145
+Here, we’re going to create a function that defines the ongoing trades: to achieve this, we will create a **switch** that:
146
+
147
+* **turns on** if `enter_rules` is True _and_ `exit_rules` is False and
148
+* **turns off** if `exit_rules` is True.
149
+
150
+Here is the function:
151
+
152
+```
+def marketposition_generator(dataset, enter_rules, exit_rules):
+    dataset['enter_rules'] = enter_rules
+    dataset['exit_rules'] = exit_rules
+    status = 0  # the switch: 0 = flat, 1 = in a trade
+    mp = []
+    for (i, j) in zip(enter_rules, exit_rules):
+        if status == 0:
+            if i == 1 and j != -1:
+                status = 1
+        else:
+            if j == -1:
+                status = 0
+        mp.append(status)
+    dataset['mp'] = mp
+    dataset['mp'] = dataset['mp'].shift(1)
+    dataset['mp'] = dataset['mp'].fillna(0)  # the shift leaves NaN in the first row: set it to 0 (flat)
+    return dataset['mp']
+```
171
+
172
+
173
+It takes three arguments:
174
+
175
+* _dataset_ is the dataframe that contains the stock data we previously imported (AMZN stock data),
176
+* _enter\_rules_ is the boolean series containing the entry signals and
177
+* _exit\_rules_ is the boolean series containing the exit signals.
178
+
179
+In the first two lines we copy the entry and exit rules into our dataset. `status` is the **switch** and `mp` is an empty list that will be populated with the resulting values of `status`.
180
+
181
+At this point, we create a for loop with [`zip`](https://realpython.com/python-zip-function/#:~:text=Python's%20zip\(\)%20function%20is,%2C%20sets%2C%20and%20so%20on.), which works like a… yes, a zipper: it lets us iterate over `enter_rules` and `exit_rules` in parallel. On each step the current value of `status` is appended to `mp`, so that:
182
+
183
+* `mp` = 1 (**on**) from the moment `enter_rules` is True _and_ `exit_rules` is False and
+* `mp` = 0 (**off**) as soon as `exit_rules` is True.
185
+
186
+**Note:** in Python, True corresponds to 1, but here the `if j == -1` check expects a True value of `exit_rules` to be encoded as -1. The reason will become clear later on.
187
+
188
+In the last three lines before the `return`, we add `mp` to our dataset, we shift its values forward by one period so that the trade starts the day after we received the signal, and we replace the NaN value left in the first row by the shift with 0. The function returns the `mp` series.
189
+
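Here is a small, self-contained run of the same switch logic on hand-made signals, to see the one-bar delay in action. It is a sketch under assumptions: the helper `market_position` is a hypothetical standalone variant that returns a new series instead of writing into the dataset, and the signals already use the 1/0 and -1/0 encoding mentioned in the note:

```python
import pandas as pd

def market_position(enter_rules, exit_rules):
    # hypothetical standalone variant of the switch described above
    status = 0
    mp = []
    for i, j in zip(enter_rules, exit_rules):
        if status == 0:
            if i == 1 and j != -1:
                status = 1      # switch on
        elif j == -1:
            status = 0          # switch off
        mp.append(status)
    # shift by one bar: the position starts the bar after the signal
    shifted = pd.Series(mp, index=enter_rules.index).shift(1)
    return shifted.fillna(0).astype(int)

# hand-made signals: entry state on bars 1-3 and 6, exit state on bars 0, 4, 5
enter = pd.Series([0, 1, 1, 1, 0, 0, 1])
exit_ = pd.Series([-1, 0, 0, 0, -1, -1, 0])

mp = market_position(enter, exit_)
print(mp.tolist())  # [0, 0, 1, 1, 1, 0, 0]
```

The entry signal appears on bar 1 but `mp` turns on at bar 2. The exit signal appears on bar 4 and `mp` turns off at bar 5.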
190
+5\. Define a Backtesting Function
191
+---------------------------------
192
+
193
+Last step. We’re close to the end, hang on!
194
+
195
+First of all, we have to define some **parameters** such as:
196
+
197
+* **COSTS**: fixed cost per transaction (i.e. the fee paid on each buy or sell)
198
+* **INSTRUMENT**: type of instrument (1 for stocks, 2 for futures, etc.)
199
+* **OPERATION\_MONEY**: initial investment
200
+* **DIRECTION**: long _or_ short
201
+* **ORDER\_TYPE**: type of order (market, limit, stop, etc.)
202
+* **ENTER\_LEVEL**: entry price
203
+
204
+```
205
+COSTS = 0.50
206
+INSTRUMENT = 1
207
+OPERATION_MONEY = 10000
208
+DIRECTION = "long"
209
+ORDER_TYPE = "market"
210
+ENTER_LEVEL = amzn['Open']
211
+```
212
+
213
+
214
+We’re assuming that:
215
+
216
+* **COSTS**: every transaction will cost us _50 cents_, so a round turn (buy plus sell) costs 1 dollar, which is why the backtesting function subtracts `2 * COSTS` per trade
217
+* **INSTRUMENT**: the system will be tested on a _stock_ (AMZN)
218
+* **OPERATION\_MONEY**: the initial capital is _10k dollars_
219
+* **DIRECTION**: the strategy will be tested for _long trades_
220
+* **ORDER\_TYPE**: the strategy will process _market orders_
221
+* **ENTER\_LEVEL**: the entry price corresponds to the _open price_
222
+
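To make the role of these parameters concrete, here is the arithmetic of a single long trade, computed the same way as in the backtesting function of this section. The entry and exit prices are made-up numbers, not real AMZN quotes:

```python
COSTS = 0.50
OPERATION_MONEY = 10000

entry_price = 40.0   # hypothetical open price on the entry day
exit_price = 50.0    # hypothetical open price on the exit day

# shares bought with the whole capital, rounded to an integer amount
number_of_stocks = round(OPERATION_MONEY / entry_price, 0)

# profit of the closed trade, net of the fee paid on the buy and on the sell
profit = (exit_price - entry_price) * number_of_stocks - 2 * COSTS

print(number_of_stocks)  # 250.0
print(profit)            # 2499.0
```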
223
+And here is the best part:
224
+
225
+```python
+def apply_trading_system(dataset, direction, order_type, enter_level, enter_rules, exit_rules):
+
+    dataset['enter_rules'] = enter_rules.apply(lambda x: 1 if x == True else 0)
+    dataset['exit_rules'] = exit_rules.apply(lambda x: -1 if x == True else 0)
+    dataset['mp'] = marketposition_generator(dataset, dataset['enter_rules'], dataset['exit_rules'])
+
+    if order_type == "market":
+        dataset['entry_price'] = np.where((dataset.mp.shift(1) == 0) &
+                                          (dataset.mp == 1), dataset.Open, np.nan)  # mp is already shifted: this bar's open is the fill
+        if INSTRUMENT == 1:
+            dataset['number_of_stocks'] = np.where((dataset.mp.shift(1) == 0) &
+                                                   (dataset.mp == 1), OPERATION_MONEY / dataset.Open, np.nan)
+
+    dataset['entry_price'] = dataset['entry_price'].ffill()
+
+    if INSTRUMENT == 1:
+        dataset['number_of_stocks'] = dataset['number_of_stocks']\
+            .apply(lambda x: round(x, 0)).ffill()
+
+    dataset['events_in'] = np.where((dataset.mp == 1) & (dataset.mp.shift(1) == 0), 'entry', '')
+
+    if direction == 'long':
+        if INSTRUMENT == 1:
+            dataset['open_operations'] = (dataset.Close - dataset.entry_price) * dataset.number_of_stocks
+            dataset['open_operations'] = np.where((dataset.mp == 1) & (dataset.mp.shift(-1) == 0),
+                (dataset.Open.shift(-1) - dataset.entry_price) * dataset.number_of_stocks - 2 * COSTS,
+                dataset.open_operations)
+    else:
+        if INSTRUMENT == 1:
+            dataset['open_operations'] = (dataset.entry_price - dataset.Close) * dataset.number_of_stocks
+            dataset['open_operations'] = np.where((dataset.mp == 1) & (dataset.mp.shift(-1) == 0),
+                (dataset.entry_price - dataset.Open.shift(-1)) * dataset.number_of_stocks - 2 * COSTS,
+                dataset.open_operations)
+
+    dataset['open_operations'] = np.where(dataset.mp == 1, dataset.open_operations, 0)
+    dataset['events_out'] = np.where((dataset.mp == 1) & (dataset.exit_rules == -1), 'exit', '')
+    dataset['operations'] = np.where((dataset.exit_rules == -1) &
+                                     (dataset.mp == 1), dataset.open_operations, np.nan)
+    dataset['closed_equity'] = dataset.operations.fillna(0).cumsum()
+    dataset['open_equity'] = dataset.closed_equity + dataset.open_operations - dataset.operations.fillna(0)
+
+    dataset.to_csv('trading_system_export.csv')
+
+    return dataset
+```
271
+
272
+Let’s analyze the function line by line.
273
+
274
+From line 3 to line 5 we add the two signal series and the market position series to the dataset.
275
+
276
+**Note:** In the previous note, I told you that everything would become clear: in the lambda function of `exit_rules`, all True values are mapped to -1 while False values are mapped to 0. Thanks to that, `marketposition_generator` runs wonderfully.
277
+
278
+From line 7 to line 12 we define [market orders](https://www.investopedia.com/terms/m/marketorder.asp) for stocks:
279
+
280
+* In lines 7–9 we define the `entry_price`: if the previous value of `mp` was zero and the present value is one, i.e. a trade starts on this bar, we record the open price at which the trade is opened (the open of the day after the signal);
281
+* In lines 10–12 we define `number_of_stocks`, that is the number of shares we buy, as the ratio between the initial capital (10k) and the open price at entry;
282
+
283
+In line 14 we forward-fill the value of the `entry_price`;
284
+
285
+In lines 16–18 we round `number_of_stocks` to the nearest integer and forward-fill its value as well;
286
+
287
+In line 20 we assign the label `'entry'` to `events_in` every time `mp` moves from 0 to 1;
288
+
289
+From line 22 to line 27 we define the long trades:
290
+
291
+* In line 24 we compute `open_operations`, i.e. the profit;
292
+* In line 25 we adjust the previous computation of `open_operations` whenever we exit the trade: whenever we receive an exit signal, the trade is closed the day after at the open price. Here, [round turn costs](https://www.investopedia.com/terms/r/rttc.asp) are included;
293
+
294
+From line 28 to line 33 we replicate for short trades what was said for long trades: to test _short trades_ you just have to set `DIRECTION = 'short'`;
295
+
296
+In line 35 we set `open_operations` to 0 whenever there is no trade in progress;
297
+
298
+In line 36 we assign the label `'exit'` to `events_out` on the bar where we are in a trade and receive an exit signal, i.e. right before `mp` moves from 1 to 0;
299
+
300
+In lines 37–38 we associate the value of `open_operations` to `operations` only when we’re exiting a trade, otherwise `nan`: by doing so, it will be very easy to aggregate data;
301
+
302
+In line 39 we define the equity line for closed operations (`closed_equity`) and in line 40 the equity line that also includes the trade still open (`open_equity`);
303
+
304
+In line 42 we save the resulting dataset in a csv file.
305
+
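The last steps of the walkthrough (`operations` and the two equity lines) are easier to follow on a tiny example. This is only a sketch: the seven-row frame below is invented, with `mp`, `open_operations` and `exit_rules` filled in by hand instead of being computed from prices:

```python
import numpy as np
import pandas as pd

# hand-made columns: two trades, the first closed at +99, the second at -41
df = pd.DataFrame({
    'mp':              [0, 1, 1, 1, 0, 1, 1],
    'open_operations': [0.0, 50.0, 120.0, 99.0, 0.0, -30.0, -41.0],
    'exit_rules':      [0, 0, 0, -1, -1, 0, -1],
})

# profit is booked only on the bar where a running trade gets its exit signal
df['operations'] = np.where((df.exit_rules == -1) & (df.mp == 1),
                            df.open_operations, np.nan)

# closed equity: cumulative profit of the trades already closed
df['closed_equity'] = df.operations.fillna(0).cumsum()

# open equity: closed equity plus the running profit of the trade in progress
df['open_equity'] = df.closed_equity + df.open_operations - df.operations.fillna(0)

print(df[['operations', 'closed_equity', 'open_equity']])
```

`closed_equity` moves in steps (0, then 99, then 58), while `open_equity` also follows the trade day by day (0, 50, 120, 99, 99, 69, 58).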
306
+Let’s call the function and inspect the results.
307
+
308
+```
309
+COSTS = 0.50
310
+INSTRUMENT = 1
311
+OPERATION_MONEY = 10000
312
+DIRECTION = "long"
313
+ORDER_TYPE = "market"
314
+ENTER_LEVEL = amzn['Open']
+
+trading_system = apply_trading_system(amzn, DIRECTION, ORDER_TYPE, ENTER_LEVEL, enter_rules, exit_rules)
315
+```
316
+
317
+
318
+These are two _long trades_ registered by the Trading System:
319
+
320
+![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*_4Z0QH6EO3beWChy5zlgxA.png)
321
+
322
+To check if the Trading Strategy, the Double Moving Averages Crossover, produced profitable _long trades_ in the time period considered for that stock, you can just type:
323
+
324
+```
325
+# closed_equity is the cumulative profit of the closed trades (it starts at 0)
+net_profit = trading_system['closed_equity'].iloc[-1]
326
+print(round(net_profit, 2))
327
+```
328
+
329
+![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*4ua0TkU9kIxF2GGFVYLvLw.png)
330
+
331
+A return of almost 500% in 20 years. Not surprising, considering that Amazon stock increased by 2400% in those 20 years and we used a _trend-following_ strategy.
332
+
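Since `operations` holds one value per closed trade and NaN everywhere else, a few more statistics are one-liners. A hedged sketch on invented trade results (the numbers and the names `return_pct` and `win_rate` are assumptions for illustration, not output of the AMZN backtest):

```python
import numpy as np
import pandas as pd

OPERATION_MONEY = 10000

# invented 'operations' column: three closed trades, NaN on all other bars
operations = pd.Series([np.nan, 1500.0, np.nan, -400.0, np.nan, 2900.0])

closed_trades = operations.dropna()
net_profit = closed_trades.sum()
return_pct = net_profit * 100 / OPERATION_MONEY   # return on the initial capital
win_rate = (closed_trades > 0).mean() * 100       # share of profitable trades

print(len(closed_trades), net_profit, return_pct, round(win_rate, 2))
```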
333
+That’s all for this article. I hope you’ll find it helpful.
334
+
335
+Let me know if you could be interested in seeing extensions of this backtesting engine, for example how to implement _limit orders_.
336
+
337
+In case you need clarification or you have any advice, feel free to contact me on Telegram:
338
+
339
+Cheers 🍻
340
+
341
+Reference:
342
+----------
343
+
344
+Trombetta, Giovanni, _Strategie di Trading con Python,_ Hoepli, 2020
... ...
\ No newline at end of file