How to crawl the data and save it as a CSV file

Methods

Open the target web page, press F12, watch the Network panel, find the corresponding XHR request, work out its URL and request method, then simulate that request in code and process the returned data.
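
As a minimal sketch of the "simulate the request" part (the URL is the ranking endpoint used in Step one below; the User-Agent value is purely illustrative, and some sites need more of the headers shown in DevTools, e.g. via "Copy as cURL"):

import requests

# URL copied from the Network panel (the XHR request found via F12);
# it is the same ranking endpoint used in Step one below
url = ("https://dncapi.aigopocket.com/api/v2/exchange/web-exchange"
       "?token=&page=1&pagesize=100&sort_type=exrank&asc=1"
       "&isinnovation=1&type=all&area=&webp=1")

# A browser-like User-Agent; illustrative only -- some endpoints work without it,
# others require more of the headers copied from the DevTools request
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(url, headers=headers, timeout=10)
print(response.status_code)            # 200 means the simulated request worked
if response.ok:
    print(str(response.json())[:300])  # peek at the start of the returned JSON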

Step one

Get the centralized-exchange rankings, and some basic information about each exchange, from Feixiaohao (非小号).

Using the method just described, I find the request URL and request method (the page is viewable without logging in, so no API key or authentication is needed).

Request URL: https://dncapi.aigopocket.com/api/v2/exchange/web-exchange?token=&page=1&pagesize=100&sort_type=exrank&asc=1&isinnovation=1&type=all&area=&webp=1

Request method: GET

Code

Now that we know the request URL and request method, let’s write the code.

import requests


def fetch_data():
    """
    Fetch the exchange data from the target URL.

    Logic:
    1. Define the target URL.
    2. Send an HTTP GET request to that URL with the requests library.
    3. Check the returned HTTP status code:
       - 200 (request succeeded): parse and return the JSON data.
       - anything else (request failed): print the status code and return None.
    """
    url = "https://dncapi.aigopocket.com/api/v2/exchange/web-exchange?token=&page=1&pagesize=100&sort_type=exrank&asc=1&isinnovation=1&type=all&area=&webp=1"  # target URL
    response = requests.get(url)  # send an HTTP GET request to the target URL

    if response.status_code == 200:  # request succeeded
        data = response.json()  # parse the returned JSON data
        return data  # return the parsed data
    else:
        print("Request failed with status code:", response.status_code)  # print the failed status code
        return None  # return None when the request fails


data = fetch_data()
print(data)

Looking at the returned content (I usually paste the returned data into Word to view it rather than inspecting it in the browser), we can see that the data key contains the data we need: the information about each exchange.
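
Before turning this into a DataFrame, it helps to poke at the structure first. A small sketch, assuming (as the returned content above suggests) that the response is a dict whose data key holds a list of per-exchange records:

# Quick inspection of the returned JSON (assumes fetch_data() from above has run)
data = fetch_data()
if data is not None:
    print(list(data.keys()))              # top-level keys of the response
    exchanges = data.get('data', [])
    print(len(exchanges), "exchanges returned")
    if exchanges:
        # fields of one record (e.g. name, rank, volumn, assets_usd, ...)
        print(list(exchanges[0].keys()))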

So let's extract it and do some processing on it (it's all relatively simple, so I'll put it all together).

import requests
import pandas as pd


def fetch_data():
    """
    Fetch the exchange data from the target URL.

    Logic:
    1. Define the target URL.
    2. Send an HTTP GET request to that URL with the requests library.
    3. Check the returned HTTP status code:
       - 200 (request succeeded): parse and return the JSON data.
       - anything else (request failed): print the status code and return None.
    """
    url = "https://dncapi.aigopocket.com/api/v2/exchange/web-exchange?token=&page=1&pagesize=100&sort_type=exrank&asc=1&isinnovation=1&type=all&area=&webp=1"  # target URL
    response = requests.get(url)  # send an HTTP GET request to the target URL

    if response.status_code == 200:  # request succeeded
        data = response.json()  # parse the returned JSON data
        return data  # return the parsed data
    else:
        print("Request failed with status code:", response.status_code)  # print the failed status code
        return None  # return None when the request fails


def transform_to_df(data):
    """
    Turn the returned dictionary into a pandas DataFrame.

    Logic:
    1. Check whether the key 'data' exists in the input dictionary.
    2. If it does, build a DataFrame from its value and return it.
    3. Otherwise, print an error message and return None.
    """
    if 'data' in data:  # check whether the dictionary contains the key 'data'
        df = pd.DataFrame(data['data'])  # build a DataFrame from the value of the 'data' key
        return df  # return the created DataFrame
    else:
        print("Key 'data' not found")  # the key 'data' is missing from the dictionary
        return None  # None signals that the conversion failed


def rename_columns(df):
    # Map the raw API field names to the column names used later in the script
    # (fields that already read well keep their original names)
    columns_mapping = {
        'logo': 'logo',
        'rank': 'rank',
        'pairnum': 'pair_count',
        'volumn': 'volume',
        'volumn_cny': 'volume_cny',
        'change_volumn': 'volume_change',
        'assets_usd': 'assets_usd',
        'risk_level': 'risk_level'
    }
    df.rename(columns=columns_mapping, inplace=True)


def sort_by_assets_usd(df):
    df['assets_usd'] = pd.to_numeric(df['assets_usd'], errors='coerce')  # convert assets_usd to numeric, NaN where not possible
    df.sort_values(by='assets_usd', ascending=False, inplace=True)  # sort by assets_usd in descending order
    df.reset_index(drop=True, inplace=True)  # reset the index


def save_to_csv(df):
    print("Successfully fetched the exchange names from Feixiaohao")
    # Saving is optional
    # df.to_csv('exchange_data.csv', index=False, encoding="GBK")
    # print("Data has been saved as exchange_data.csv")


# Fetch and process the data
data = fetch_data()
if data is not None:
    df = transform_to_df(data)
    if df is not None:
        rename_columns(df)      # rename the DataFrame columns
        sort_by_assets_usd(df)  # sort the DataFrame by assets
        save_to_csv(df)         # optionally save the DataFrame as a CSV file



The final result: the DataFrame of exchanges with renamed columns, sorted by assets in descending order.

Note

The code above does not actually save a CSV, because I pass the DataFrame straight into the request function later on, so there is no point saving it here.
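
If you do want to keep this intermediate table, a single to_csv call is enough; GBK is the encoding used for the later files in this post, and utf-8-sig is a reasonable alternative if GBK causes trouble on your system:

# Optional: persist the exchange ranking table
df.to_csv('exchange_data.csv', index=False, encoding="GBK")
# or: df.to_csv('exchange_data.csv', index=False, encoding="utf-8-sig")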

Step two

Now that we have the names of the exchanges we want to query, we are ready to start making the requests.

First, we use the same method to locate the exchange's inflow and outflow data.

Web URL: Arkham (arkhamintelligence.com)

Search for Binance, open its page, and press F12; or open DevTools first and then refresh the page.


At this point we know the request URL and the request method, so we can start building the code.

Request URL: https://api.arkhamintelligence.com/flow/entity/binance
Request method: GET
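
Before writing the full loop, it is worth previewing the response for a single exchange. A sketch, assuming the structure the script below relies on (a dict keyed by network/chain, each value a list of records with time, inflow, and outflow fields); the endpoint is unofficial, so it may require extra headers or change at any time:

import requests

url = "https://api.arkhamintelligence.com/flow/entity/binance"
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)

if response.status_code == 200:
    flows = response.json()
    print(list(flows.keys()))          # the networks Arkham reports for binance
    first_network = next(iter(flows))
    print(flows[first_network][:1])    # first record of the first network
else:
    print("Request failed:", response.status_code)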

Code:

import os
import time
import requests
import pandas as pd
from pandas.tseries.offsets import DateOffset


def fetch_and_save_binance_data(df):
    """
    Fetch and save the exchange flow data from Arkham.

    Logic:
    1. Filter out the exchanges with assets greater than $50 million.
    2. Normalize the names of these exchanges.
    3. For each filtered exchange, send an API request to fetch its flow data.
    4. Parse the data returned by the API and save it as CSV files.
    5. Post-process the CSV files.

    Parameters:
        df: DataFrame containing the exchanges and their assets

    Returns:
        None
    """
    # Step 1: filter out exchanges with assets greater than 50 million dollars
    high_value_entities = df[df['assets_usd'] > 5e7]

    # Step 2: convert the names of these high-value exchanges to lowercase and into a list
    high_value_names = [name.lower() for name in high_value_entities['name'].values.tolist()]

    # Replace a few names so they match the slugs Arkham uses
    name_replacements = {'coinbase pro': 'coinbase', 'mexc global': 'mexc', 'gate.io': 'gate-io'}
    high_value_names = [name_replacements.get(name, name) for name in high_value_names]

    # Step 3: send an API request for each exchange
    for name in high_value_names:
        time.sleep(0.5)  # pause for 0.5 seconds so the requests are not sent too quickly
        print(f"Fetching {name} trade flow data")
        url = f"https://api.arkhamintelligence.com/flow/entity/{name}"
        response = requests.get(url)

        dfs = {}  # stores the data of the different networks
        count_no_time = 0  # counts the networks without a time field

        # Step 4: parse the data returned by the API and save it as CSV
        if response.status_code == 200:
            print(f"Request for {name} succeeded!")
            data = response.json()
            total_networks = len(data)

            for network, records in data.items():  # one entry per network (chain)
                dfs[network] = pd.DataFrame(records)

                # check whether a time field exists
                if 'time' in dfs[network].columns:
                    # time field handling
                    dfs[network]['time'] = pd.to_datetime(dfs[network]['time'])
                    dfs[network]['time'] = dfs[network]['time'] + DateOffset(hours=12)
                    dfs[network]['time'] = dfs[network]['time'].dt.tz_localize(None)

                    # create the folder where the data will be stored
                    folder_path = f"E:/Blockchain data acquisition/data/{name} fund flow history data/"
                    if not os.path.exists(folder_path):
                        os.makedirs(folder_path)

                    # save as CSV
                    csv_path = os.path.join(folder_path, f"{network}.csv")
                    dfs[network].to_csv(csv_path, index=False, encoding="GBK")

                    # Step 5: post-process the CSV file
                    rename_csv_columns(csv_path)
                    calculate_and_update_net_inflow(folder_path, network)
                else:
                    count_no_time += 1

            if count_no_time == total_networks:
                print(f"Arkham has no data for the {name} exchange")
        else:
            print(f"Request for {name} failed, status code: {response.status_code}")


# Rename the columns of a CSV file
def rename_csv_columns(csv_path):
    # read the CSV file into a DataFrame
    df = pd.read_csv(csv_path)

    # rename the columns; inplace=True modifies the DataFrame in place
    df.rename(columns={
        'inflow': 'inflows',
        'outflow': 'outflows',
        'cumulativeInflow': 'cumulative_inflows',
        'cumulativeOutflow': 'cumulative_outflows'
    }, inplace=True)

    # save the updated DataFrame back to the CSV file, encoded as GBK
    df.to_csv(csv_path, index=False, encoding="GBK")


# Calculate and update the net inflow
def calculate_and_update_net_inflow(folder_path, network):
    # build the full path of the CSV file
    csv_path = os.path.join(folder_path, f"{network}.csv")

    # read the CSV file into a DataFrame, encoded as GBK
    df = pd.read_csv(csv_path, encoding="GBK")

    # check that the 'inflows' and 'outflows' columns exist
    if 'inflows' in df.columns and 'outflows' in df.columns:
        # calculate the net inflow and store it in a new column 'flows'
        df['flows'] = df['inflows'] - df['outflows']

        # save the updated DataFrame back to the CSV file, encoded as GBK
        df.to_csv(csv_path, index=False, encoding="GBK")
Here is what the code does: it iterates over the exchange names we just collected, requests the corresponding URL for each one, and saves the returned data as CSV files.

The result: one CSV file per network, saved under each exchange's own folder.
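
Once the files exist, it is easy to read one back for a quick check. The path and the ethereum.csv filename below are illustrative; the actual network names depend on what Arkham returns, and the column names are the ones produced by the script above:

import pandas as pd

# Illustrative path: adjust to your own save location and an actual network name
csv_path = "E:/Blockchain data acquisition/data/binance fund flow history data/ethereum.csv"
df = pd.read_csv(csv_path, encoding="GBK")
print(df[['time', 'inflows', 'outflows', 'flows']].tail())  # latest rows with the computed net inflow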

Note

This data seems to be updated every day around 12 noon (it had not been updated at 8 am, but when I checked at 2 pm it had been).
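
If you want the crawl to pick up each day's refresh automatically, a crude sketch is a loop that sleeps until a fixed local time and then calls the pipeline (a cron job or Windows Task Scheduler entry is the more robust choice; 14:00 here is just a safe margin after the apparent noon update):

import time
import datetime

def run_daily(job, hour=14, minute=0):
    """Call job() once a day at roughly hour:minute local time (simple sketch)."""
    while True:
        now = datetime.datetime.now()
        target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if target <= now:                          # today's slot has already passed
            target += datetime.timedelta(days=1)
        time.sleep((target - now).total_seconds())
        job()

# e.g. run_daily(lambda: fetch_and_save_binance_data(df), hour=14)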

Summary

So far we know how to obtain the data and how to process it into CSV files, so we can build whatever we want on top of this.

I’ve added detailed comments to the code, so you can read the code carefully (it’s not that hard to write).

The crawling approach: use F12 to find the request URL and method behind the data you want, process the returned data accordingly, and finally save it as CSV.

The full script is below and runs successfully as-is; you only need to change the save path to your own.

import os
import json
import time
import requests
import datetime
import matplotlib
import pandas as pd
from pandas.tseries.offsets import DateOffset


def fetch_data():
    """
    Fetch the exchange data from the target URL.

    Logic:
    1. Define the target URL.
    2. Send an HTTP GET request to that URL with the requests library.
    3. Check the returned HTTP status code:
       - 200 (request succeeded): parse and return the JSON data.
       - anything else (request failed): print the status code and return None.
    """
    url = "https://dncapi.aigopocket.com/api/v2/exchange/web-exchange?token=&page=1&pagesize=100&sort_type=exrank&asc=1&isinnovation=1&type=all&area=&webp=1"  # target URL
    response = requests.get(url)  # send an HTTP GET request to the target URL

    if response.status_code == 200:  # request succeeded
        data = response.json()  # parse the returned JSON data
        return data  # return the parsed data
    else:
        print("Request failed with status code:", response.status_code)  # print the failed status code
        return None  # return None when the request fails


def transform_to_df(data):
    """
    Turn the returned dictionary into a pandas DataFrame.
    """
    if 'data' in data:  # check whether the dictionary contains the key 'data'
        df = pd.DataFrame(data['data'])  # build a DataFrame from the value of the 'data' key
        return df
    else:
        print("Key 'data' not found")
        return None


def rename_columns(df):
    # Map the raw API field names to the column names used later in the script
    # (fields that already read well keep their original names)
    columns_mapping = {
        'id': 'id',
        'name': 'name',
        'logo': 'logo',
        'rank': 'rank',
        'pairnum': 'pair_count',
        'volumn': 'volume',
        'volumn_cny': 'volume_cny',
        'change_volumn': 'volume_change',
        'assets_usd': 'assets_usd',
        'risk_level': 'risk_level'
    }
    df.rename(columns=columns_mapping, inplace=True)


def sort_by_assets_usd(df):
    df['assets_usd'] = pd.to_numeric(df['assets_usd'], errors='coerce')  # convert assets_usd to numeric, NaN where not possible
    df.sort_values(by='assets_usd', ascending=False, inplace=True)  # sort by assets_usd in descending order
    df.reset_index(drop=True, inplace=True)  # reset the index


def save_to_csv(df):
    print("Successfully fetched the exchange names from Feixiaohao")
    # Saving is optional
    # df.to_csv('exchange_data.csv', index=False, encoding="GBK")
    # print("Data has been saved as exchange_data.csv")


def fetch_and_save_binance_data(df):
    """
    Fetch and save the exchange flow data from Arkham.

    Logic:
    1. Filter out the exchanges with assets greater than $50 million.
    2. Normalize the names of these exchanges.
    3. For each filtered exchange, send an API request to fetch its flow data.
    4. Parse the data returned by the API and save it as CSV files.
    5. Post-process the CSV files.

    Parameters:
        df: DataFrame containing the exchanges and their assets

    Returns:
        None
    """
    # Step 1: filter out exchanges with assets greater than 50 million dollars
    high_value_entities = df[df['assets_usd'] > 5e7]

    # Step 2: convert the names of these high-value exchanges to lowercase and into a list
    high_value_names = [name.lower() for name in high_value_entities['name'].values.tolist()]

    # Replace a few names so they match the slugs Arkham uses
    name_replacements = {'coinbase pro': 'coinbase', 'mexc global': 'mexc', 'gate.io': 'gate-io'}
    high_value_names = [name_replacements.get(name, name) for name in high_value_names]

    # Step 3: send an API request for each exchange
    for name in high_value_names:
        time.sleep(0.5)  # pause for 0.5 seconds so the requests are not sent too quickly
        print(f"Fetching {name} trade flow data")
        url = f"https://api.arkhamintelligence.com/flow/entity/{name}"
        response = requests.get(url)

        dfs = {}  # stores the data of the different networks
        count_no_time = 0  # counts the networks without a time field

        # Step 4: parse the data returned by the API and save it as CSV
        if response.status_code == 200:
            print(f"Request for {name} succeeded!")
            data = response.json()
            total_networks = len(data)

            for network, records in data.items():  # one entry per network (chain)
                dfs[network] = pd.DataFrame(records)

                # check whether a time field exists
                if 'time' in dfs[network].columns:
                    # time field handling
                    dfs[network]['time'] = pd.to_datetime(dfs[network]['time'])
                    dfs[network]['time'] = dfs[network]['time'] + DateOffset(hours=12)
                    dfs[network]['time'] = dfs[network]['time'].dt.tz_localize(None)

                    # create the folder where the data will be stored
                    folder_path = f"E:/Blockchain data acquisition/data/{name} fund flow history data/"
                    if not os.path.exists(folder_path):
                        os.makedirs(folder_path)

                    # save as CSV
                    csv_path = os.path.join(folder_path, f"{network}.csv")
                    dfs[network].to_csv(csv_path, index=False, encoding="GBK")

                    # Step 5: post-process the CSV file
                    rename_csv_columns(csv_path)
                    calculate_and_update_net_inflow(folder_path, network)
                else:
                    count_no_time += 1

            if count_no_time == total_networks:
                print(f"Arkham has no data for the {name} exchange")
        else:
            print(f"Request for {name} failed, status code: {response.status_code}")


# Rename the columns of a CSV file
def rename_csv_columns(csv_path):
    # read the CSV file into a DataFrame
    df = pd.read_csv(csv_path)

    # rename the columns; inplace=True modifies the DataFrame in place
    df.rename(columns={
        'inflow': 'inflows',
        'outflow': 'outflows',
        'cumulativeInflow': 'cumulative_inflows',
        'cumulativeOutflow': 'cumulative_outflows'
    }, inplace=True)

    # save the updated DataFrame back to the CSV file, encoded as GBK
    df.to_csv(csv_path, index=False, encoding="GBK")


# Calculate and update the net inflow
def calculate_and_update_net_inflow(folder_path, network):
    # build the full path of the CSV file
    csv_path = os.path.join(folder_path, f"{network}.csv")

    # read the CSV file into a DataFrame, encoded as GBK
    df = pd.read_csv(csv_path, encoding="GBK")

    # check that the 'inflows' and 'outflows' columns exist
    if 'inflows' in df.columns and 'outflows' in df.columns:
        # calculate the net inflow and store it in a new column 'flows'
        df['flows'] = df['inflows'] - df['outflows']

        # save the updated DataFrame back to the CSV file, encoded as GBK
        df.to_csv(csv_path, index=False, encoding="GBK")


# Fetch and process the data
data = fetch_data()
if data is not None:
    df = transform_to_df(data)
    if df is not None:
        rename_columns(df)               # rename the DataFrame columns
        sort_by_assets_usd(df)           # sort the DataFrame by assets
        save_to_csv(df)                  # optionally save the DataFrame as a CSV file
        fetch_and_save_binance_data(df)  # fetch and save the flow data from Arkham