How to download multiple files simultaneously and trigger specific actions for each one done?
I need help with a feature I am trying to implement; unfortunately, I'm not very comfortable with multithreading.
My script downloads 4 different files from the internet, calls a dedicated function for each one, and then saves them all.
The problem is that I'm doing it step by step, so I have to wait for each download to finish before proceeding to the next one.
I can see what I should do to solve this, but I haven't managed to code it.
Current behaviour:
url_list = [Url1, Url2, Url3, Url4]
files_list = []
files_list.append(downloadFile(Url1))
handleFile(files_list[-1], type=0)
...
files_list.append(downloadFile(Url4))
handleFile(files_list[-1], type=3)
saveAll(files_list)
Needed Behaviour:
url_list = [Url1, Url2, Url3, Url4]
files_list = []
for index, url in enumerate(url_list):
    callThread(files_list.append(downloadFile(url)),  # function
               handleFile(files_list[index], type=index))  # trigger
    # use a thread for downloading
    # once a file is downloaded, it triggers its associated function
# wait for all files to be processed
saveAll(files_list)
Thanks for your help!
1 Answer
A typical approach is to put the I/O-heavy part (fetching data over the internet) and the data processing into the same function, then run that function in a thread pool:
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor

import requests


def fetch_and_process_file(url):
    # current_thread() replaces the deprecated currentThread() alias
    thread_name = threading.current_thread().name
    print(thread_name, "fetch", url)
    data = requests.get(url).text

    # "process" result
    time.sleep(random.random() / 4)  # simulate work
    print(thread_name, "process data from", url)
    result = len(data) ** 2
    return result


threads = 2
urls = ["https://google.com", "https://python.org", "https://pypi.org"]

executor = ThreadPoolExecutor(max_workers=threads)
with executor:
    # map() returns the results in the same order as urls
    results = executor.map(fetch_and_process_file, urls)

print()
print("results:", list(results))
outputs:
ThreadPoolExecutor-0_0 fetch https://google.com
ThreadPoolExecutor-0_1 fetch https://python.org
ThreadPoolExecutor-0_0 process data from https://google.com
ThreadPoolExecutor-0_0 fetch https://pypi.org
ThreadPoolExecutor-0_0 process data from https://pypi.org
ThreadPoolExecutor-0_1 process data from https://python.org
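If the per-file action should run as soon as each download finishes, rather than inside the worker function, `concurrent.futures.as_completed` is one way to do it. The following is only a minimal sketch: `download_file` and `handle_file` are hypothetical stand-ins for the question's `downloadFile` and `handleFile`, and the download is simulated with a short sleep so the example runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_file(url):
    # Hypothetical stand-in for the real download
    # (for example requests.get(url).content).
    time.sleep(0.05)
    return "contents of " + url


def handle_file(data, file_type):
    # Hypothetical per-file action; here it only tags the data with its type.
    return (file_type, data.upper())


def download_and_handle_all(urls, max_workers=4):
    results = [None] * len(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which URL index each future belongs to.
        future_to_index = {
            executor.submit(download_file, url): index
            for index, url in enumerate(urls)
        }
        # as_completed() yields each future as soon as its download is done,
        # so the handler runs while the other downloads are still in flight.
        for future in as_completed(future_to_index):
            index = future_to_index[future]
            results[index] = handle_file(future.result(), file_type=index)
    # Leaving the with-block waits for all workers to finish.
    return results


url_list = ["Url1", "Url2", "Url3", "Url4"]
files_list = download_and_handle_all(url_list)
```

In this sketch the handlers run in the main thread, one at a time, which avoids sharing state between workers. Results are stored by index, so `files_list` keeps the order of `url_list` even though the downloads may finish in any order.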
I can't find any release of concurrent.futures compatible with my Python version (2.7), and I can't use 3.x for this function. Do you have another solution, or has it been backported?
– Chris Prolls
Sep 7 '18 at 13:14
Backport found: pypi.org/project/futures
– Chris Prolls
Sep 7 '18 at 13:23
I successfully implemented it, but I have to warn about asynchronous behavior. For example, I initially wrote the files directly into a zipfile from the worker threads, but that is not thread-safe in Python 2.7. I now collect them in a list and compress them one by one afterwards.
– Chris Prolls
Sep 10 '18 at 7:13
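The workaround described in the comment above could look roughly like this: the worker threads only download, and a single thread writes the archive afterwards. This is a sketch under assumptions, not the commenter's actual code; `download_file`, `build_archive`, and the `file_N.txt` entry names are hypothetical, and the download is faked so the example runs offline.

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor


def download_file(url):
    # Hypothetical stand-in for the real download; returns bytes.
    return ("contents of " + url).encode("utf-8")


def build_archive(urls, max_workers=4):
    # Step 1: download concurrently. map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        payloads = list(executor.map(download_file, urls))

    # Step 2: write the archive from one thread only, so the ZipFile
    # object is never shared between workers.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, payload in enumerate(payloads):
            archive.writestr("file_{}.txt".format(index), payload)
    return buffer.getvalue()


archive_bytes = build_archive(["Url1", "Url2", "Url3"])
```

The archive is built in memory here only to keep the example self-contained; passing a file path to `zipfile.ZipFile` instead of the `BytesIO` buffer would write it to disk.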
Thanks, it seems to be what I need. I'll mark your post as the answer once I manage to implement it correctly.
– Chris Prolls
Sep 7 '18 at 12:17