Generating all permutations of 1 digit, 2 equal letters and 2 different letters efficiently

I created a function that generates all the combinations of 1 digit, 2 equal letters and 2 different letters.

I reduced the numbers and letters to make it more easy to calculate:

letters = "bcdfghjklmnpqrstvwxz" digits = "2456789"

There are 1,436,400 possibilities because $$5choose17choose14choose220choose1·19·18 = 1,436,400$$ where

$5choose17choose1$ — Choosing place for one number and choosing the number

$4choose220choose1$ — Choosing 2 places for the equal letter and choosing the letter

$19·18$ — Permutations of the rest of the letters

Example for CORRECT combinations:

1abac 1aabd c8ldc

Example for INCORRECT combinations:

1aaaa -> incorrect because there are more than 2 equal letters 1aaab -> Same as the previous 1abcd -> No 2 equal letters

I wrote the following code

from itertools import permutations, combinations import time # Generate 1 digit, 2 equal letters and 2 different letters def generate_1_digit_2_equal_letters_2_different_letters(): alpha = "bcdfghjklmnpqrstvwxz" digits = "2456789" places = '01234' s = ['-', '-', '-', '-', '-'] combos = # Choosing a place for 1 digit for dig_indx in range(0, 5): # Choosing a digit for digit in digits: s[dig_indx] = digit # Creating equal letter indxes new_places = places.replace(str(dig_indx), '') # We are using 'combinations' because 'bb' on indexes 1,2 is the same as 2,1 equal_letter_indxs = combinations(new_places, 2) equal_letter_indxs2 = for x in equal_letter_indxs: equal_letter_indxs2.append(''.join(x)) # Choosing the equal letter for equal_letter in alpha: # Choosing a two places for the equal letter for i in equal_letter_indxs2: if int(i[0]) != dig_indx and int(i[1]) != dig_indx: s[int(i[0])] = equal_letter s[int(i[1])] = equal_letter # Creating the rest of the letters places two_places = places.replace(str(dig_indx), '') two_places = two_places.replace(i[0], '') two_places = two_places.replace(i[1], '') all_places_combo = permutations(two_places, 2) # Choosing the places for the two other letters for two_places in all_places_combo: letter1_history = # Choosing the first letter the different from all the other for letter1 in alpha: if letter1 != equal_letter: letter1_history[letter1] = letter1 # Choosing the second letter that different from all the other for letter2 in alpha: if letter2 != equal_letter and letter2 != letter1: found = False for k in letter1_history.keys(): if letter2 == k: found = True if not found: s[int(two_places[0])] = letter1 s[int(two_places[1])] = letter2 #print(''.join(s)) combos.append(''.join(s)) return combos # Should be 1,436,400 start_time = time.time() print(len(generate_1_digit_2_equal_letters_2_different_letters())) print("--- %s seconds ---" % (time.time() - start_time))

This is not efficient because when calculating it:

for dig_indx in range(0, 5) => 5 times for digit in digits => 7 times for x in equal_letter_indxs => 2 times for equal_letter in alpha => 20 times for i in equal_letter_indxs2 => 2 times for two_places in all_places_combo => 2 times for letter1 in alpha => 20 times for letter2 in alpha => 20 times for k in letter1_history.keys() => 20 times in worst case 5*7*(2+20*2*2*20*20*20 = ~640,002 iterations

It works, but I though maybe you have suggestions how to make it more efficient. Feel free to make changes in my code or create a new better code, I will be glad to see different approachs (maybe recursion ? )

Why are 0, 1, 3 not included as numbers? And why are some letters missing?

– maxb
Sep 7 '18 at 11:03

0, 1, 3

@maxb to make the calulcations with less combinations. Instead of 36 I just reduced it to 27.

– E235
Sep 7 '18 at 11:17

@TobySpeight The order is matter: 1aabc is not the same as 1aacb. I will update the description.

– E235
Sep 7 '18 at 11:18

1aabc

1aacb

Welcome to Code Review. To be able to help you as much as possible it would be good to know why you need this. What is the purpose of generating all combinations? Do you really need all of them? If so, why? The answers to these questions might help us give better suggestions. ("Just because I can and want to" is a perfectly valid answer)

– Simon Forsberg♦
Sep 7 '18 at 15:03

2 Answers
2

The main thing that I see when I look at your code is that it's too complicated. You have one big function, and I won't even count the amount of nested loops you have.

The first thing I'd do is to simplify this, and split it into separate functions. The second thing I'd to is to improve it.

To improve it, I used this question on StackOverflow to generate unique permutations. The rest uses what you wrote above, but in a clearer presentation:

import itertools alpha = "bcdfghjklmnpqrstvwxz" digits = "2456789" def get_letters_greater(letter, forbidden): for a in alpha: if a > letter and a not in forbidden: yield a def get_letters(forbidden = set()): for a in alpha: if a not in forbidden: yield a def get_numbers(): for d in digits: yield d order = [ 0,2,3,6,8,9,12,13,14,15,16,17,24,26,27,30,32,33,36,37,38, 39,40,41,48,50,51,54,56,57,60,61,62,63,64,65,72,73,74,75, 76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95 ] def get_permutations(s): all_perms = list(itertools.permutations(s)) unique_perms = [all_perms[i] for i in order] return unique_perms def get_all_combinations(): strings = for digit in get_numbers(): for double_letter in get_letters(): forbidden = set(double_letter) for letter_1 in get_letters(forbidden): two_forbidden = set(double_letter + letter_1) for letter_2 in get_letters_greater(letter_1, two_forbidden): s = digit + letter_1 + letter_2 + double_letter*2 strings.append(s) flat_list = [item for s in strings for item in get_permutations(s)] return flat_list

all_combinations = get_all_combinations()

First, I generate all unique combinations. Then I generate all unique permutations for each combination. Since I know that the combinations are unique, I don't have to check for duplicates. However, I can't just generate all $5!$ permutations, since there's a duplicate letter. That's why I used the code to handle that.

This code generates a list with exactly 1436400 elements, meaning that no elements are generated twice. I don't have to check for duplicates, since my code doesn't generate them.

EDIT:

From the comments, it seems as if this approach is slightly slower than the original solution. I profiled it quickly, and it seems as if the unique permutation code is the culprit. Since there is only one pair of letters, the overhead from creating unique permutations is worse than just filtering them out. Thus, I've changed the code to just use itertools. It runs in less than a second, and produces the same result.

itertools

I also used the fact that itertools always gives the results in the same order, rendering the use of set in get_permutations() obsolete. This version is more complicated and not as readable, but it performs better. I get an average execution time of around 0.45s on my machine. which makes it about 6-7 times faster than the original solution.

set

get_permutations()

To get the order list, I ran itertools.permutations('abcdd') and checked how the duplicate elements paired up. Since the ordering is the same every time, I could just save the indices of the unique elements that are returned, and use that to access the permutations from itertools. That way, I'm guaranteed to get all unique permutations, as long as the input is on the same form (with the duplicate letters in the end of the 5 letter string). Note that this solution breaks for any other input, and in that case you should use return list(set(itertools.permutations(s))).

order

itertools.permutations('abcdd')

itertools

return list(set(itertools.permutations(s)))

Last edit (I promise):

I realized that I had some overhead from first creating all permutations, and then flattening the list. I combined the two steps into one list comprehension, and cut another 25% off the runtime.

If you want to get the result as strings rather than tuples of characters, then use unique_perms = [''.join(all_perms[i]) for i in order] instead of the current definition. It makes the runtime 0.1s slower.

unique_perms = [''.join(all_perms[i]) for i in order]

yes you right regarding the function. I am usually doing it but I was interesting to know how to improve the main algorithm so I didn't do it. It looks much cleaner. I checked the times, it runs on ~5 seconds, mine runs on ~3 seconds and because this is not so much difference, a clean code is a better instead off a mass one.

– E235
Sep 7 '18 at 11:53

I liked the use of set(), it sounds more logically to prevent generating the "forbidden" combinations.

– E235
Sep 7 '18 at 11:56

set()

If time is a constraint, you can mark your question with the performance tag. I didn't optimize for speed, but using my code you could probably make it quite a bit faster. The join could be a culprit, as could the perm_unique. If I was to make it faster, I'd use arrays of characters rather than strings, and manipulate everything that way, or I'd choose a different language (or both).

– maxb
Sep 7 '18 at 11:57

performance

join

perm_unique

@E235 I updated the answer, now it runs in less than a second.

– maxb
Sep 7 '18 at 12:12

Oops, I forgot to include a proper explanation for that, I updated my answer.

– maxb
Sep 7 '18 at 13:06

The calculation in the post of the number of iterations is not correct. Since the returned list has 1,436,400 elements, there must have been exactly that many iterations that reached the combos.append line. Here's one mistake I spotted:

combos.append

for i in equal_letter_indxs2 => 2 times

The list equal_letter_indxs2 has $4choose2=6$ elements, not 2.

equal_letter_indxs2

Python provides the module timeit for timing function calls and short pieces of code:

timeit

>>> from timeit import timeit >>> timeit(generate_1_digit_2_equal_letters_2_different_letters, number=1) 3.0850040439981967

Since the lists you are building are short, it makes sense to construct them by inserting elements. (For long lists this could lead to poor performance, but 5 elements is too short for this to matter.) This approach avoids the need to maintain a set of places that gets adjusted as you go through the choices, and so simplifies the logic:

from itertools import combinations, permutations LETTERS = 'bcdfghjklmnpqrstvwxz' DIGITS = '2456789' def aabc1(letters=LETTERS, digits=DIGITS): """Generate the distinct 5-character strings consisting of four letters (exactly one of which is repeated) and one digit. """ for a, b, c in permutations(letters, 3): # Three letters (a repeated). for i, j in combinations(range(4), 2): # Positions for letter a. for d in digits: # One digit. for k in range(5): # Positions for the digit. result = [b, c] result[i:i] = a, result[j:j] = a, result[k:k] = d, yield ''.join(result)

Since the loops are now all independent of each other, we can turn them into a single loop using itertools.product:

itertools.product

from itertools import combinations, permutations, product LETTERS = 'bcdfghjklmnpqrstvwxz' DIGITS = '2456789' def aabc1(letters=LETTERS, digits=DIGITS): """Generate the distinct 5-character strings consisting of four letters (exactly one of which is repeated) and one digit. """ for (a, b, c), (i, j), d, k in product( permutations(letters, 3), # Three letters (a repeated). combinations(range(4), 2), # Positions for the repeated letter. digits, # One digit. range(5)): # Positions for the digit. result = [b, c] result[i:i] = a, result[j:j] = a, result[k:k] = d, yield ''.join(result)

This is about 3 times as fast as the code in the post:

>>> timeit(lambda:list(aabc1()), number=1) 1.0679944080184214

It is also very elegant, nice !

– E235
Sep 7 '18 at 13:04

I like that you just yield the results and don't actually generate a list, I think that is a big speed improvement.

– Simon Forsberg♦
Sep 7 '18 at 15:10

yield

@SimonForsberg: It depends on what happens to a string. If it's output asynchronously then that could be parallelized with generating the next string, saving some time. But if it's processed within Python, then generation merely reduces memory usage, not runtime.

– Gareth Rees
Sep 7 '18 at 15:22

Thanks for contributing an answer to Code Review Stack Exchange!

But avoid …

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

Required, but never shown

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Dfyjkt