How to download multiple files simultaneously and trigger specific actions for each one done?




I need help with a feature I'm trying to implement; unfortunately, I'm not very comfortable with multithreading.



My script downloads 4 different files from the internet and calls a dedicated function for each one, then saves them all.
The problem is that I'm doing it step by step, so I have to wait for each download to finish before proceeding to the next one.



I see what I should do to solve this, but I haven't managed to code it.



Actual Behaviour:


url_list = [Url1, Url2, Url3, Url4]
files_list = []

files_list.append(downloadFile(Url1))
handleFile(files_list[-1], type=0)
...
files_list.append(downloadFile(Url4))
handleFile(files_list[-1], type=3)
saveAll(files_list)



Needed Behaviour:


url_list = [Url1, Url2, Url3, Url4]
files_list = []

for url in url_list:
    callThread(files_list.append(downloadFile(url)),  # function
               handleFile(files_list[url.index], type=url.index))  # trigger
    # use a thread for downloading
    # once a file is downloaded, it triggers its associated function
# wait for all files to be treated
saveAll(files_list)



Thanks for your help!




1 Answer



A typical approach is to put the I/O-heavy part (fetching data over the internet) and the data processing into the same function:


import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor

import requests


def fetch_and_process_file(url):
    thread_name = threading.currentThread().name

    print(thread_name, "fetch", url)
    data = requests.get(url).text

    # "process" result
    time.sleep(random.random() / 4)  # simulate work
    print(thread_name, "process data from", url)

    result = len(data) ** 2
    return result


threads = 2
urls = ["https://google.com", "https://python.org", "https://pypi.org"]

executor = ThreadPoolExecutor(max_workers=threads)
with executor:
    results = executor.map(fetch_and_process_file, urls)

print()
print("results:", list(results))



outputs:


ThreadPoolExecutor-0_0 fetch https://google.com
ThreadPoolExecutor-0_1 fetch https://python.org
ThreadPoolExecutor-0_0 process data from https://google.com
ThreadPoolExecutor-0_0 fetch https://pypi.org
ThreadPoolExecutor-0_0 process data from https://pypi.org
ThreadPoolExecutor-0_1 process data from https://python.org
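Since the question asks to trigger a specific action for each file as its download completes, `executor.submit` plus `as_completed` may fit better than `map`. A minimal, self-contained sketch (the `download_file` / `handle_file` names are stand-ins for the question's `downloadFile` / `handleFile`, and the download is simulated with `time.sleep` instead of a real network call):


```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_file(url):
    # simulated download; in practice this would be requests.get(url).content
    time.sleep(0.1)
    return "data from %s" % url


def handle_file(data, file_type):
    # per-file action; file_type stands in for the question's type=0..3
    return (file_type, data.upper())


urls = ["Url1", "Url2", "Url3", "Url4"]

results = []
with ThreadPoolExecutor(max_workers=4) as executor:
    # submit one download per URL; remember which type index belongs to each Future
    future_to_type = {executor.submit(download_file, url): i
                      for i, url in enumerate(urls)}
    for future in as_completed(future_to_type):
        # runs as each download finishes, in completion order
        file_type = future_to_type[future]
        results.append(handle_file(future.result(), file_type))

# saveAll equivalent: every file has been downloaded and handled at this point
print(sorted(results))
```


Because `as_completed` yields futures in completion order rather than submission order, the type index is carried in the `future_to_type` mapping instead of relying on list position.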






Thanks, it seems to be what I need. I'll mark your post as the answer once I manage to implement it correctly.

– Chris Prolls
Sep 7 '18 at 12:17






I can't find any release of concurrent.futures compatible with my Python version (2.7), and I can't use 3.x for this feature. Do you have another solution, or has it been backported?

– Chris Prolls
Sep 7 '18 at 13:14







Backport found: pypi.org/project/futures

– Chris Prolls
Sep 7 '18 at 13:23






I successfully implemented it, but I have to warn about asynchronous behaviour. For example, I initially put those files directly into a zipfile, but that's not thread-safe in Python 2.7. So I put them in a list instead and compressed them one by one.

– Chris Prolls
Sep 10 '18 at 7:13
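The zipfile caveat above can be sketched as: download in parallel, then write the archive from a single thread, since `ZipFile` is not safe to write to concurrently. A minimal example with a simulated `download_file` (an in-memory archive is used here purely for illustration):


```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor


def download_file(name):
    # simulated download; a real version would fetch the bytes over the network
    return ("contents of %s" % name).encode("utf-8")


names = ["file1.txt", "file2.txt", "file3.txt"]

# download in parallel...
with ThreadPoolExecutor(max_workers=3) as executor:
    payloads = list(executor.map(download_file, names))

# ...then compress sequentially in the main thread
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name, payload in zip(names, payloads):
        archive.writestr(name, payload)

names_in_zip = zipfile.ZipFile(io.BytesIO(buffer.getvalue())).namelist()
print(names_in_zip)
```


`executor.map` preserves submission order, so each payload lines up with its name when the archive is written.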



