Subprocess argument list too long
I have a third-party executable that I call using subprocess.check_output. Unfortunately my argument list is too long, and calling the executable repeatedly is much slower than calling it once with many arguments.
Slow due to making the command call many times:
import subprocess

def call_third_party_slow(third_party_path, files):
    for file in files:
        output = subprocess.check_output([third_party_path, "-z", file])
        if "sought" in output.decode():
            return False
    return True
Fast but fails when there are many files:
def call_third_party_fast(third_party_path, files):
    command = [third_party_path, "-z"]
    command.extend(files)
    output = subprocess.check_output(command)
    if "sought" in output.decode():
        return False
    return True
Is there an easy way to work around the command-length limit, or to group the files so that each call stays under the OS-dependent limit?
Does the third-party tool support a directory option?
– loganasherjones
Aug 30 at 12:17
On Windows, some applications accept wildcards (*.txt) themselves, since they aren't expanded by the shell there. That can solve this issue too.
– Jean-François Fabre
Aug 30 at 12:20
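A minimal sketch of that idea, with a made-up wrapper name; it only helps if the third-party tool expands the pattern itself:

import subprocess

def call_with_wildcard(third_party_path, pattern="*.txt"):
    # Assumption: the tool expands the pattern itself; on Windows the
    # pattern is passed through to the program unexpanded.
    output = subprocess.check_output([third_party_path, "-z", pattern])
    return "sought" not in output.decode()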
related: stackoverflow.com/questions/29801975/…
– Jean-François Fabre
Aug 30 at 12:42
3 Answers
You could batch the files list like this:
def batch_args(args, arg_max):
    current_arg_length = 0
    current_list = []
    for arg in args:
        if current_arg_length + len(arg) + 1 > arg_max:
            # This argument would push the batch over the limit:
            # emit the current batch and start a new one.
            yield current_list
            current_list = [arg]
            current_arg_length = len(arg) + 1
        else:
            current_list.append(arg)
            current_arg_length += len(arg) + 1
    if current_list:
        yield current_list
So the method body would look like this:
os_limit = 10  # placeholder; see the note on arg_max below
for args in batch_args(files, os_limit):
    command = [third_party_path, "-z"]
    command.extend(args)
    output = subprocess.check_output(command)
    if "sought" in output.decode():
        return False
return True
Two things I'm not sure about:
Adjust arg_max to what is possible; there is probably some way of finding this out per OS (see the sketch below). There is some info around about the maximum argument size on various OSs, and it is also stated that Windows has a 32k limit.
Maybe there is a better way to do it using the subprocess library, but I'm not sure.
Also, I'm not doing any exception handling (e.g. a single argument longer than the maximum size).
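One way to get a concrete value, as a minimal sketch: POSIX systems expose the kernel limit through os.sysconf, while Windows has no equivalent call, so the fallback below is an assumed value just under the documented 32767-character limit.

import os

def get_arg_max(fallback=32000):
    # POSIX exposes ARG_MAX via sysconf; Windows has no equivalent,
    # so fall back to an assumed conservative default.
    try:
        return os.sysconf('SC_ARG_MAX')
    except (AttributeError, ValueError, OSError):
        return fallback

In practice you would want to leave some headroom, since the environment and the fixed arguments also count against the limit on POSIX.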
That's the method I'm using. The problem is that it batches using a fixed number of arguments, which doesn't bound the total argument string length of each batch.
– Jean-François Fabre
Aug 30 at 12:22
But the method included in my answer should take care of that. If the maximum is 10 and you have 13 args, then the second batch will be of size 3. Or do you mean something else?
– Frederik Petersen
Aug 30 at 12:24
Yes, I mean measuring the sum of the lengths of all argument strings. The limit isn't on the number of args but on the total size of the command line.
– Jean-François Fabre
Aug 30 at 12:25
Ah I see, I misunderstood then. Let me check. That should be possible similarly.
– Frederik Petersen
Aug 30 at 12:26
You still have to add one to each argument because the C-like representation at the OS level needs a zero-byte terminator after each argument. You also have to take the encoding into account: len('💩') is 1, but the UTF-8 encoding is 4 bytes.
– tripleee
Aug 30 at 18:52
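A minimal sketch of the byte-accurate counting described in that comment (the helper name is made up here):

def arg_size(arg, encoding='utf-8'):
    # Count encoded bytes rather than characters, plus one byte for the
    # terminator each argument gets in the OS-level representation.
    return len(arg.encode(encoding)) + 1

batch_args above could then accumulate arg_size(arg) instead of len(arg) + 1.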
I solved this by using a temporary file on Windows. On Linux the command could be executed as is.
Method to build the full command for the different platforms:
import os
import platform
import tempfile

temporary_file = None

def make_full_command(base_command, files):
    command = list(base_command)
    if platform.system() == "Windows":
        global temporary_file
        temporary_file = tempfile.NamedTemporaryFile()
        posix_files = map(lambda f: f.replace(os.sep, '/'), files)
        temporary_file.write(str.encode(" ".join(posix_files)))
        temporary_file.flush()
        # The third-party tool reads the file list from the @<file> argument.
        command.append("@" + temporary_file.name)
    else:
        command.extend(files)
    return command
Holding the file in a global variable keeps it alive while the command runs and ensures it is cleaned up afterwards.
This way I didn't have to find the maximum command length for the different OSes.
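A hypothetical usage sketch, assuming the third-party tool understands the @response-file argument that make_full_command builds on Windows:

import subprocess

def call_third_party(third_party_path, files):
    command = make_full_command([third_party_path, "-z"], files)
    output = subprocess.check_output(command)
    return "sought" not in output.decode()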
If you don't want to reinvent an optimal solution, use a tool which already implements exactly this: xargs.
import subprocess

def call_third_party_slow(third_party_path, files):
    result = subprocess.run(['xargs', '-r', '-0', third_party_path, '-z'],
                            input='\0'.join(files) + '\0',
                            stdout=subprocess.PIPE,
                            check=True, universal_newlines=True)
    if "sought" in result.stdout:
        return False
    return True
You'll notice I also switched to subprocess.run(), which is available in Python 3.5+.
If you do want to reimplement xargs, you will need to find the value of the kernel constant ARG_MAX and build a command-line list whose size never exceeds this limit. Then you could check after each iteration whether the output contains sought, and quit immediately if it does.
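A rough sketch of that reimplementation, assuming POSIX and reusing batch_args from the earlier answer; halving ARG_MAX is an arbitrary safety margin, since the environment and the fixed arguments also count against the limit:

import os
import subprocess

def call_third_party_batched(third_party_path, files):
    # Assumption: POSIX; keep headroom below the kernel limit.
    arg_max = os.sysconf('SC_ARG_MAX') // 2
    for batch in batch_args(files, arg_max):
        output = subprocess.check_output([third_party_path, "-z"] + batch)
        if "sought" in output.decode():
            return False  # quit immediately once the string is found
    return True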
Is xargs cross-platform?
– Bomaz
Aug 30 at 18:55
If you mean does it exist on Windows, sorry, I have no idea.
– tripleee
Aug 30 at 18:58
The -0 option is a GNU extension, so you might need to adjust for other POSIX platforms, or go with the solution in the other answer, which however also needs some tweaks, some of which are probably system-dependent as well.
– tripleee
Aug 30 at 19:00
I have done this empirically (grouping the arguments and running more than once). The Windows maximum is 32767 characters, but I'm not sure that all applications support that.
– Jean-François Fabre
Aug 30 at 12:12