Pros

Data quality is high, all data is packaged by the exchange, so you can trust the exchange for data quality. No need to worry about missing data

Disadvantages

You need to have a server to get the data, if you know some Linux commands, it would be great, if not, just copy the commands and change the parameters.

Steps

1 Enter the command line command.

1
2
3
awk 'BEGIN{for(i=2017; i<=2021; i++)for(j=1; j<=9; j++) print i"-0"j}' | xargs -I{} curl -s https://data.binance.vision/data/spot/monthly/klines/BTCUSDT/1h/BTCUSDT-1h-{}.zip -o BTCUSDT-1h-{}.zip {}

awk 'BEGIN{for(i=2017; i<=2021; i++)for(j=10; j<=12; j++) print i"-"j}' | xargs -I{} curl -s https://data.binance.vision/data/spot/monthly/klines/BTCUSDT/1h/BTCUSDT-1h-{}.zip -o BTCUSDT-1h-{}.zip {}

The first line means to get 2017-2021, from January to September data

The second line means get 2017-2021, from October to December You can modify it to get days, 15m and so on, other currencies are also possible, just modify the url

2023-08-15_223115.png

2 The download is a zip file, by the way, I’ve attached all the unzip commands

Run it in the downloaded directory to unzip all the files (see the bottom picture first, delete some empty packages, you can only get them up to August 17 at the moment)

1
2
3
find . -type f -size -1k | grep zip | xargs -I{} rm {}

ls *zip | xargs -I{} rm -rf {}

3. Data processing

1
2
3
4
df = pd.read_csv(path, header=None)
df.rename(columns={0: 'candle_begin_time', 1: 'open', 2: 'high',
3: 'low', 4: 'close', 5: 'volume'}, inplace=True) # 重命名
df = df[['candle_begin_time', 'open', 'high', 'low', 'close', 'volume']]

2023-08-15_223501.png

Need to pay attention to it, there may be time without, the file is only a few hundred bytes is empty file, delete it!