How can I download multiple files simultaneously and trigger a specific action when each one is done?

I need help with a feature I'm trying to implement; unfortunately, I'm not very comfortable with multithreading.

My script downloads 4 different files from the internet, calls a dedicated function for each one, and then saves them all.
The problem is that I'm doing it step by step, so I have to wait for each download to finish before I can proceed to the next one.

I see what I should do to solve this, but I can't manage to code it.

Actual Behaviour:

```python
url_list = [Url1, Url2, Url3, Url4]
files_list = []

files_list.append(downloadFile(Url1))
handleFile(files_list[-1], type=0)
...
files_list.append(downloadFile(Url4))
handleFile(files_list[-1], type=3)

saveAll(files_list)
```

Needed Behaviour:

```python
url_list = [Url1, Url2, Url3, Url4]
files_list = []

for url in url_list:
    callThread(files_list.append(downloadFile(url)),               # function
               handleFile(files_list[url.index], type=url.index))  # trigger
    # use a thread for downloading
    # once a file is downloaded, it triggers its associated function

# wait for all files to be processed
saveAll(files_list)
```

Thanks for your help!

1 Answer

A typical approach is to put the I/O-heavy part (fetching data over the internet) and the data processing into the same function, and then run that function for each URL in a thread pool:

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor

import requests


def fetch_and_process_file(url):
    thread_name = threading.current_thread().name
    print(thread_name, "fetch", url)
    data = requests.get(url).text

    # "process" result
    time.sleep(random.random() / 4)  # simulate work
    print(thread_name, "process data from", url)
    result = len(data) ** 2
    return result


threads = 2
urls = ["https://google.com", "https://python.org", "https://pypi.org"]

executor = ThreadPoolExecutor(max_workers=threads)
with executor:
    # map() runs fetch_and_process_file on every URL in the pool and
    # yields the results in the same order as `urls`
    results = executor.map(fetch_and_process_file, urls)

print()
print("results:", list(results))
```
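Applied to the question's pseudocode, a minimal sketch might look like the following. `download_file`, `handle_file` and `save_all` are hypothetical stand-ins for the asker's `downloadFile`, `handleFile` and `saveAll`, and the download is simulated with `time.sleep` so the sketch runs without network access. Each worker downloads one file and immediately calls its handler with the URL's index as `type`; leaving the `with` block waits for every worker, so `save_all` only runs once all files have been handled.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def download_file(url):
    # Hypothetical stand-in for downloadFile(); a real version might
    # return requests.get(url).content instead.
    time.sleep(0.05)  # simulate network latency
    return "contents of " + url


def handle_file(data, type):
    # Hypothetical stand-in for handleFile(); `type` selects the action.
    return "handled[%d]: %s" % (type, data)


def save_all(files):
    # Hypothetical stand-in for saveAll().
    return list(files)


def download_and_handle(index, url):
    # Runs in a worker thread: download, then trigger this file's handler.
    data = download_file(url)
    return handle_file(data, type=index)


url_list = ["https://example.com/a", "https://example.com/b",
            "https://example.com/c", "https://example.com/d"]

with ThreadPoolExecutor(max_workers=4) as executor:
    # map() takes one iterable per positional argument of the function:
    # the indices go to `index`, the URLs go to `url`.
    files_list = list(executor.map(download_and_handle,
                                   range(len(url_list)), url_list))

# The with-block has already waited for all workers, so files_list is
# complete and in the same order as url_list.
saved = save_all(files_list)
print(saved)
```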

Output:

```
ThreadPoolExecutor-0_0 fetch https://google.com
ThreadPoolExecutor-0_1 fetch https://python.org
ThreadPoolExecutor-0_0 process data from https://google.com
ThreadPoolExecutor-0_0 fetch https://pypi.org
ThreadPoolExecutor-0_0 process data from https://pypi.org
ThreadPoolExecutor-0_1 process data from https://python.org
```
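The interleaved lines show the two workers running concurrently; the exact order will vary from run to run, although `executor.map` still returns the results in the order of `urls`. If each follow-up action should run in the main thread as soon as its own download finishes, `concurrent.futures.as_completed` yields futures in completion order. A minimal sketch, with `slow_download` as a hypothetical simulated download (no network) and delays chosen only to make the completion order visible:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_download(url, delay):
    # Hypothetical simulated download: sleep instead of a real request.
    time.sleep(delay)
    return url


jobs = {"https://example.com/slow": 0.5, "https://example.com/fast": 0.05}
finished_order = []

with ThreadPoolExecutor(max_workers=2) as executor:
    # Map each future back to its URL so we know which file finished.
    future_to_url = {executor.submit(slow_download, url, delay): url
                     for url, delay in jobs.items()}
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        data = future.result()  # re-raises any exception from the worker
        # Trigger the per-file action here, in the main thread.
        finished_order.append(data)
        print("done:", url)

print(finished_order)
```

`Future.add_done_callback` is an alternative, but the callback typically runs in the worker thread, which matters when the follow-up action touches objects that are not thread-safe.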

Thanks, it seems to be what I need. I'll mark your post as the answer once I manage to implement it correctly.

– Chris Prolls
Sep 7 '18 at 12:17

I can't find any release of concurrent.futures compatible with my Python version (2.7), and I can't use 3.x for this feature. Do you have another solution, or has it been backported?

– Chris Prolls
Sep 7 '18 at 13:14

Backport found: pypi.org/project/futures

– Chris Prolls
Sep 7 '18 at 13:23

I implemented it successfully, but I should warn about concurrency pitfalls. For example, I initially wrote those files directly into a zipfile, but zipfile is not thread-safe in Python 2.7. Instead, I collected them in a list and compressed them one by one.
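A minimal sketch of that pattern, assuming each worker returns a file name and its bytes: the downloads run concurrently, but only the main thread touches the `ZipFile`, so no locking is needed. `fetch` is a hypothetical placeholder for a real download, and the archive is written to an in-memory `io.BytesIO` buffer only so the sketch has no side effects. Serializing every `writestr` call with a `threading.Lock` would be another way to share one archive between threads.

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    # Hypothetical placeholder for a real download; returns (name, bytes).
    name = url.rsplit("/", 1)[-1]
    return name, ("data from " + url).encode("utf-8")


urls = ["https://example.com/a.txt", "https://example.com/b.txt",
        "https://example.com/c.txt"]

# Step 1: download concurrently and collect the results in a list.
with ThreadPoolExecutor(max_workers=3) as executor:
    downloaded = list(executor.map(fetch, urls))

# Step 2: compress one by one; only this (main) thread uses the ZipFile.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    for name, payload in downloaded:
        archive.writestr(name, payload)

# Re-open the in-memory archive to check what was written.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    contents = archive.read("b.txt")

print(names)
```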

– Chris Prolls
Sep 10 '18 at 7:13
