Foreword

For low-frequency quantitative trading, backtesting is an essential process, although backtesting profitable strategies may not necessarily make money, but backtesting losing strategies must lose money. So, backtesting can’t prove the truth, but it can check the falsification.

This article is prepared to be divided into two parts, one part of the backtesting process encountered a few pits, I hope that the later can be encountered do not jump in. Another talk about the backtesting engine written by those gods over the years, I hope it will help you.

Both the Tao and the art, depending on the needs of the gentleman, I hope you have gained something.

As I said above, because of the rain, I am willing to hold an umbrella for those who come after me. But I also want to say, in the process of moving forward, met all the gods, all are sold, so I am also willing to share again, I wish to be able to leave a hand.

Thank you for the articles and guidance from Mr. Criminal, Mr. Qian Qian, Mr. Jornason, Ms. Lentil, Mr. Puppy, and I am very grateful for the articles and information from the gods that have helped me.

The pitfalls of backtesting in general

This bit does not involve financial data and other factors, purely from the technical aspects of the discussion of this pit.

As long as more than the stock of friends, must be in the stock software above to see some magical technical indicators, and these claimed how many thousands of how many thousands of indicators have accidentally been published by a kind person, you look, I wipe, so accurate! With this indicator, you can get rich and marry a rich girl in no time. I’m so lucky!

When you really follow this magical indicator, you will realize, “What’s going on?” Suddenly, it’s not working. What happened to the money printing machine has turned into a money crushing machine?

Generally speaking, these fraudulent indicators use the following techniques:

The most elementary pit of the newbie: the signal occurs immediately turnover

Here specifically refers to the low-frequency use of k-line trading strategies, such as:

IF Crossover(MA5, MA20) THEN Buy(Close, …) # Buy(Close,…) when the current MA5 SMA crosses the MA20 SMA.

This one is actually quite intuitive, but in reality, your actual buy happens above the opening price of the next K-line!

If you backtest this way, there are at least two consequences:

  1. the current K-line did not finish, the highest price triggered the golden cross signal, according to the highest price on the buy, there may be a not insignificant slippage between the closing price and the highest price.
  2. the current K line maximum price can trigger the golden cross signal, but according to the closing price can not trigger the golden cross signal, which results in the real market can be traded, back to the test can not be traded in the non-corresponding situation.

Have you thought it through?

So, this needs to be changed to

“the current K-line out of the signal, the next K-line opening price / limit price transaction”, for example, the above example can be changed to “IF Crossover (MA5[1], MA20[1]) THEN Buy(Open, …)”) (Buy at the open price when the MA5 SMA calculated from the previous K-line crosses the MA20 SMA).

The Newbie Sub-Beginner’s Pit: Future Functions

In quantitative backtesting, the preparation of the strategy will often use some of the need to use the future data function, such as shift (), iloc [], these functions are essentially in the use of future data to determine the current trading behavior. As an example.

1
2
3
4
5
import pandas as pd
import numpy as np

data = pd.DataFrame(np.random.randn(10))
data['signal'] = np.where(data.shift(1) > 0, 1, 0)

Here the shift(1) takes the value of a future period to determine the signal for the current period, but this is not possible in real trading, because when the current moment, it is not known what will happen in the future.

This strategy of applying the future function to generate trading signals can achieve excellent results in backtesting, but once used in real trading, there will be a huge discrepancy between the actual results of the strategy and the backtesting results.

The solution is to try to use only to the current moment and historical data to generate signals, do not utilize any future function, to ensure that the back test and the real market consistent.

If you have to use, you can consider setting a lag period, so that the signal also lags the corresponding period, to simulate the actual situation. This can try to avoid the future function in the backtest and real disk deviation, so that the backtest results closer to the real trading results. This is a pit that many newcomers to quantitative trading need to be careful to avoid!

Beginner’s Pit: Lucky Man Bias

This is in the stock market is backtesting using all existing stocks backtesting, forgetting those delisted stocks, resulting in backtesting returns significantly higher, in the currency market is to use existing coins, those disappeared in the history of the bewildering number of coins have been ignored. Then, the light to see the good results, the bad results can not be seen, the natural return will be much higher, this is more obvious will appear in the fixed investment strategy, and the cycle rotation type of strategy inside

Newbie Pit: Overfitting

We often see certain theological theories, such as a certain variety of a certain day is bound to fall, down how many points can give you a whole clear. In order to increase their authority, often there are charts showing how accurate his prediction history is. This is not to send you a life jacket, simply to send you a submarine.

You might even be thinking. Damn, if I follow this prediction, to the low point of the backhand long will not get rich in a flash?

You do this, you will find that the magical prediction mage why in this case failed?

Because the blogger who made the prediction used machine-learning historical trading data, and the kind that doesn’t work very well.

Beginner’s Pit: Slippage and Trading Costs Not Taken Into Account

Your newbie 10,000x earnings strategy basically doesn’t take into account slippage and fees. So during backtesting, the more trades you make, the more money you make, and the reality is that you make more trades than you realize that you don’t make enough to cover the commission. The profits are taken away by slippage and trading fees.

Of course, it’s now common to use machine learning and such for factor mining, and by the way, you can talk about the pitfalls in this area as well

The pit of the masters: the data set is not ideal

If you use the data set, is a one-sided market, is the general direction of the rise does not turn back, or simply shock, there is no trend market, then you in the data set above the excellent performance of the strategy once the market conversion, then the return can be imagined.

Master pit: backtesting data time series is disrupted

This problem is very easy to understand, you think that if you disrupt the time sequence, the future events to mention the front, *** model has a detached from the actual “predictive ability”, similar to the future function *** error.

Master Pit: Wrong Model Selection

These eigenvalues may be distributed in high-dimensional space, which can be well delineated by a hyperplane cut through, but if you use a two-dimensional planar model to fit, such as linear regression, this is what will happen.

Backtesting code:

The rest of the code I’ve collected is shared, I hope it can be helpful to you

The newbie backtesting code from Mr. Criminal No Way:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
# from mpl_finance import candlestick_ochl
#import matplotlib.finance as mpf
from matplotlib.pylab import date2num
from Functions import transfer_to_period_data
from Signals import signal_moving_average, signal_bolling
from Evaluate import equity_curve_with_long_and_short
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
pd.set_option('expand_frame_repr', False) # 当列太多时不换行
pd.set_option('display.max_rows', 1000)


# =====寻找最优参数
# 导入数据
all_data = pd.read_csv('E:\work\demo1\program\SelfToYouWrite\回测数据处理\calculator_profit\data\\binance_ETH_USDT_15T.csv',
parse_dates=['candle_begin_time'])
# print(all_data.tail(3))

# 转换数据周期
rule_type = '15T'
all_data = transfer_to_period_data(all_data, rule_type)
# 选取时间段
all_data = all_data[all_data['candle_begin_time'] >= pd.to_datetime('2019-01-01')]
all_data.reset_index(inplace=True, drop=True)

# 构建参数候选组合
n_list = range(1, 400, 10)
# m_list = range(1, 5, 1)
m_list = np.arange(0.1, 5, 0.5)

# 遍历所有参数组合
rtn = pd.DataFrame()
for m in m_list:
for n in n_list:
para = [n, m]
# 计算交易信号
df = signal_bolling(all_data.copy(), para)
# 计算资金曲线
df = equity_curve_with_long_and_short(df, leverage_rate=3, c_rate=2.0 / 1000)
print(para, '策略最终收益:', df.iloc[-1]['equity_curve'])

# 存储数据
rtn.loc[str(para), '收益'] = df.iloc[-1]['equity_curve']

print(rtn.sort_values(by='收益', ascending=False))

Newbie oriented backtesting code from the gods of Chiken:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
import pandas as pd
import numpy as np
pd.set_option('expand_frame_repr', False)
pd.set_option('display.max_rows', 10000)
def s_dc(df):
signal = []
if df.iloc[-2]['close'] >= df.iloc[-2]['DC_upp']:
signal.append('kd')
if df.iloc[-2]['close'] <= df.iloc[-2]['DC_mid'] and df.iloc[-3]['close'] > df.iloc[-3]['DC_mid']:
signal.append('pd')
if df.iloc[-2]['close'] <= df.iloc[-2]['DC_low']:
signal.append('kk')
if df.iloc[-2]['close'] >= df.iloc[-2]['DC_mid'] and df.iloc[-3]['close'] < df.iloc[-3]['DC_mid']:
signal.append('pk')
return signal, 's_dc'

def s_boll(df):
signal = []
if df.iloc[-2]['close'] >= df.iloc[-2]['boll_upper']:
signal.append('kd')
if df.iloc[-2]['close'] <= df.iloc[-2]['boll_middle'] and df.iloc[-3]['close'] > df.iloc[-3]['boll_middle']:
signal.append('pd')
if df.iloc[-2]['close'] <= df.iloc[-2]['boll_lower']:
signal.append('kk')
if df.iloc[-2]['close'] >= df.iloc[-2]['boll_middle'] and df.iloc[-3]['close'] < df.iloc[-3]['boll_middle']:
signal.append('pk')
return signal, 's_boll'

def backtest(df, para, name, symbol, rule_type):
'''

'''
df_test = df.copy()
m = para[0]
n = para[1]
df_test.loc[:,'DC_upp'] = df_test.loc[:, 'close'].rolling(m, min_periods=1).max()
df_test.loc[:,'DC_low'] = df_test.loc[:, 'close'].rolling(m, min_periods=1).min()
df_test.loc[:,'DC_mid'] = (df_test.loc[:, 'DC_upp'] + df_test.loc[:, 'DC_low'])/2

df_test.loc[:, 'boll_middle'] = df_test.loc[:, 'close'].rolling(m, min_periods=1).mean()
df_test.loc[:, 'std'] = df_test.loc[:, 'close'].rolling(n, min_periods=1).std(ddof=0)
df_test.loc[:, 'boll_upper'] = df_test.loc[:, 'boll_middle'] + n * df_test.loc[:, 'std']
df_test.loc[:, 'boll_lower'] = df_test.loc[:, 'boll_middle'] - n * df_test.loc[:, 'std']

import matplotlib.pyplot as plt
plt.figure()
plt.plot(df_test.loc[:,'DC_upp'])
plt.plot(df_test.loc[:,'DC_low'])
plt.plot(df_test.loc[:,'DC_mid'])
plt.plot(df_test.loc[:,'boll_upper'])
plt.plot(df_test.loc[:,'boll_lower'])
plt.plot(df_test.loc[:,'boll_middle'])
plt.plot(df_test.loc[:,'close'])
plt.grid()
plt.show()


window = 20
nrows = df_test.shape[0]
position = 0
fee = 2 / 1000
coin = 0
cash = 10000
one_hand = 1
total = []
ntrade = 0
#
for start_index in range(nrows - window):

print(str(start_index * 100//(nrows - window))+'%'+ str(para))

df_real = df_test.iloc[start_index:start_index+window]

# if name =='boll':signal, s_name = s_boll(df_real)
if name == 'dc': signal, s_name = s_dc(df_real)
price = df_real.iloc[-1]['open']

if 'kd' in signal and position == 0:
coin += one_hand
cash -= one_hand * price * (1 + fee)
position = 1
ntrade += 1
print('kd' + str(price))

if 'pd' in signal and position == 1:
coin -= one_hand
cash += one_hand * price * (1 - fee)
position = 0
print('pd' + str(price) + '\n')

if 'kk' in signal and position == 0:
coin -= one_hand
cash += one_hand * price * (1 - fee)
position = -1
ntrade += 1
print('kk' + str(price))

if 'pk' in signal and position == -1:
coin += one_hand
cash -= one_hand * price * (1 + fee)
position = 0
print('pk' + str(price) + '\n')

total.append(cash + coin * price)

total = np.array(total)
plt.figure()
plt.subplot(211)
plt.plot(df_test.loc[:,'open'])
plt.grid()
plt.subplot(212)
plt.plot(total)
plt.grid()
plt.title('ntrade:'+str(ntrade)+' total:'+str(round(total[-1],1))+' std:'+str(round(total.std(),1)))
plt.show()
# plt.savefig('backtest_%s_%s_%s_%d.png'%(symbol, time ,s_name, m))
# plt.close()



if __name__ == "__main__":
symbol = 'btc'
# df = pd.read_hdf(f'{symbol}usdt.h5')
df = pd.read_csv('E:\work\exchange_data\\binance_EOS_USDT.csv')

df['candle_begin_time'] = pd.to_datetime(df['candle_begin_time'])
df = df[(df['candle_begin_time'] >= pd.to_datetime('2019-01-01 00:00:00'))
& (df['candle_begin_time'] <= pd.to_datetime('2019-12-31 23:30:00'))]

rule_type = '60T'
period_df = df.resample(rule = rule_type, on='candle_begin_time', label='left', closed='left').agg(
{'open': 'first',
'high': 'max',
'low': 'min',
'close': 'last',
'volume': 'sum',
})
period_df.dropna(subset=['open'], inplace=True)
period_df = period_df[period_df['volume'] > 0]
period_df.reset_index(inplace=True)
df = period_df[['candle_begin_time', 'open', 'high', 'low', 'close', 'volume']]
df.reset_index(inplace=True, drop=True)

for m in range(138, 300, 10):
for n in range(1, 5, 5):
para =[m,n]
backtest(
df,
para,
'dc',
symbol,
rule_type
)

Backtesting module for vnpy

1
https://gitee.com/Jornason/backtesting.git

backtesting code for masters:

With a sword in his heart and no sword in his hand, the masters certainly don’t need to backtest and go straight to the real deal!

backtest summary:

In fact, to do a good backtesting, you need to consider a lot of issues, such as commission, simulation of orders caused by the perturbation of the order book, the situation of other opponents to go to the order book, in the order book of the iceberg order, and the robot’s strategy, and other factors. If you want to do backtesting taking into account the above factors, you will realize that the results of backtesting, because of the participation of your orders, will become more and more distant from the real situation over time. This is far from what you can simulate with 0ms full orderbook data and 0ms full trade data.

Of course, knowing these pits, can be avoided as much as possible to avoid, don’t always back to test fierce as a tiger, real disk two hundred and fifty-five situation.

My point of view, backtest almost on the line. If the indicators under the back test is still relatively ideal, you can run into the real disk to try. Don’t expect the back test can get the same results as the real disk. This is impossible.

If the back test are losing, then there is no need to go on the real market.

After all, we carry a fishing rod is to fish, not to feed the fish.