Subprocess argument list too long



I have a third-party executable that I call using subprocess.check_output. Unfortunately my argument list is too long, and repeatedly calling the executable is much slower than calling it once with many arguments.



Slow, because the command is run once per file:


def call_third_party_slow(third_party_path, files):
    for file in files:
        output = subprocess.check_output([third_party_path, "-z", file])
        if "sought" in decode(output):
            return False
    return True



Fast but fails when there are many files:


def call_third_party_fast(third_party_path, files):
    command = [third_party_path, "-z"]
    command.extend(files)
    output = subprocess.check_output(command)
    if "sought" in decode(output):
        return False
    return True



Is there an easy way I can work around the command-length limit, or group the files so each call stays under the OS-dependent limit?





I have done this empirically (grouping the arguments and running more than once). The Windows maximum is 32,767 characters, but I'm not sure all applications support that.
– Jean-François Fabre
Aug 30 at 12:12






Does the third-party tool support a directory option?
– loganasherjones
Aug 30 at 12:17





On Windows, some applications support wildcards (*.txt), which aren't expanded by the shell. That can solve this issue too.
– Jean-François Fabre
Aug 30 at 12:20





related: stackoverflow.com/questions/29801975/…
– Jean-François Fabre
Aug 30 at 12:42




3 Answers



You could batch the files list like this:


def batch_args(args, arg_max):
    current_arg_length = 0
    current_list = []
    for arg in args:
        if current_arg_length + len(arg) + 1 > arg_max:
            yield current_list
            current_list = [arg]
            current_arg_length = len(arg)
        else:
            current_list.append(arg)
            current_arg_length += len(arg) + 1
    if current_list:
        yield current_list



So the method body would look like this:


os_limit = 10
for args in batch_args(files, os_limit):
    command = [third_party_path, "-z"]
    command.extend(args)
    output = subprocess.check_output(command)
    if "sought" in decode(output):
        return False
return True



Two things I'm not sure about:



Adjust arg_max to what is possible; there is probably a way of finding this out per OS. Here is some info about the maximum argument size on several OSes; that site also states there is a 32 KB limit for Windows.
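On POSIX systems the limit can in fact be queried at runtime via os.sysconf; a minimal sketch, falling back to a conservative default where os.sysconf isn't available (e.g. on Windows):

```python
import os

def get_arg_max(default=32000):
    # os.sysconf is POSIX-only; on Windows fall back to a conservative
    # default just below the documented 32,767-character limit.
    try:
        return os.sysconf('SC_ARG_MAX')
    except (AttributeError, ValueError, OSError):
        return default
```

Note that the environment block also counts against ARG_MAX on POSIX, so it is wise to leave some headroom rather than batching right up to the returned value.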



Maybe there is a better way to do it using the subprocess library, but I'm not sure.



Also, I'm not doing any exception handling (e.g. a single argument longer than the maximum size).





that's the method I'm using. The problem is that it batches using a fixed number of arguments, but that doesn't guarantee the final argument string length for each batch
– Jean-François Fabre
Aug 30 at 12:22





But the method included in my answer should take care of that. If the maximum is 10 and you have 13 args, then the second batch will be of size 3. Or do you mean something else?
– Frederik Petersen
Aug 30 at 12:24





yes, I mean measuring the sum of the length of all argument strings. The limit isn't on the number of args but the total size of the command line
– Jean-François Fabre
Aug 30 at 12:25





Ah I see, I misunderstood then. Let me check. That should be possible similarly.
– Frederik Petersen
Aug 30 at 12:26





You still have to add one to each argument because the C-style representation at the OS level needs a zero-byte terminator after each argument. You also have to take encoding into account: len('💩') is 1, but its UTF-8 encoding is 4 bytes.
– tripleee
Aug 30 at 18:52
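To illustrate that comment: the space an argument actually occupies at the OS level is its encoded byte length plus one NUL terminator, not its character count. A small sketch (encoded_arg_length is a hypothetical helper, not from the answers above):

```python
def encoded_arg_length(arg, encoding='utf-8'):
    # Bytes the argument occupies in the argv block:
    # its encoded length plus one NUL terminator byte.
    return len(arg.encode(encoding)) + 1

print(len('abc'), encoded_arg_length('abc'))  # 3 4
print(len('💩'), encoded_arg_length('💩'))    # 1 5
```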






I solved this by using a temporary file on Windows; on Linux the command could be executed as-is.



Method to build the full command on the different platforms:


import os
import platform
import tempfile

temporary_file = None

def make_full_command(base_command, files):
    command = list(base_command)

    if platform.system() == "Windows":
        global temporary_file
        temporary_file = tempfile.NamedTemporaryFile()
        posix_files = map(lambda f: f.replace(os.sep, '/'), files)
        temporary_file.write(str.encode(" ".join(posix_files)))
        temporary_file.flush()
        command.append("@" + temporary_file.name)
    else:
        command.extend(files)
    return command



Keeping the file in a global variable ensures it isn't garbage-collected (and thus deleted) before the command runs, and that it is cleaned up after the execution.



This way I didn't have to find the maximum command length for the different OSes.



If you don't want to reinvent an optimal solution, use a tool which already implements exactly this: xargs.




def call_third_party_slow(third_party_path, files):
    result = subprocess.run(['xargs', '-r', '-0', third_party_path, '-z'],
                            input='\0'.join(files) + '\0',
                            stdout=subprocess.PIPE,
                            check=True, universal_newlines=True)
    if "sought" in result.stdout:
        return False
    return True



You'll notice I also switched to subprocess.run(), which is available in Python 3.5+.



If you do want to reimplement xargs, you will need to find the value of the kernel constant ARG_MAX and build command lines whose size never exceeds this limit. Then you can check after each batch whether the output contains sought, and quit immediately if it does.
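A byte-length-aware sketch of that idea: batch_by_bytes (a hypothetical helper, not part of xargs) groups arguments by total encoded size, counting one NUL terminator per argument, and the caller exits on the first match. It checks the raw bytes for "sought" rather than using the question's decode helper:

```python
import subprocess

def batch_by_bytes(args, limit):
    # Group args so each batch's total encoded size (including one
    # NUL terminator per argument) stays under 'limit'.
    batch, size = [], 0
    for arg in args:
        needed = len(arg.encode('utf-8')) + 1
        if batch and size + needed > limit:
            yield batch
            batch, size = [], 0
        batch.append(arg)
        size += needed
    if batch:
        yield batch

def call_third_party_batched(third_party_path, files, limit):
    for batch in batch_by_bytes(files, limit):
        output = subprocess.check_output([third_party_path, "-z"] + batch)
        if b"sought" in output:
            return False  # quit immediately on the first match
    return True
```

In practice 'limit' should be ARG_MAX minus generous headroom for the base command and the environment, which count against the same limit.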







Is xargs cross-platform?
– Bomaz
Aug 30 at 18:55





If you mean does it exist on Windows, sorry, I have no idea.
– tripleee
Aug 30 at 18:58





The -0 option is a GNU extension, so you might need to adjust for other POSIX platforms, or go with the solution in the other answer, which however also needs some tweaks, some of which are probably system-dependent as well.
– tripleee
Aug 30 at 19:00




