Generating all permutations of 1 digit, 2 equal letters and 2 different letters efficiently

Generating all permutations of 1 digit, 2 equal letters and 2 different letters efficiently



I created a function that generates all the combinations of 1 digit, 2 equal letters and 2 different letters.



I reduced the numbers and letters to make it more easy to calculate:


letters = "bcdfghjklmnpqrstvwxz"
digits = "2456789"



There are 1,436,400 possibilities because $$5choose17choose14choose220choose1·19·18 = 1,436,400$$ where



$5choose17choose1$ — Choosing place for one number and choosing the number

$4choose220choose1$ — Choosing 2 places for the equal letter and choosing the letter

$19·18$ — Permutations of the rest of the letters



Example for CORRECT combinations:


1abac
1aabd
c8ldc



Example for INCORRECT combinations:


1aaaa -> incorrect because there are more than 2 equal letters
1aaab -> Same as the previous
1abcd -> No 2 equal letters



I wrote the following code


from itertools import permutations, combinations
import time

# Generate 1 digit, 2 equal letters and 2 different letters
def generate_1_digit_2_equal_letters_2_different_letters():
alpha = "bcdfghjklmnpqrstvwxz"
digits = "2456789"
places = '01234'
s = ['-', '-', '-', '-', '-']
combos =
# Choosing a place for 1 digit
for dig_indx in range(0, 5):
# Choosing a digit
for digit in digits:
s[dig_indx] = digit
# Creating equal letter indxes
new_places = places.replace(str(dig_indx), '')
# We are using 'combinations' because 'bb' on indexes 1,2 is the same as 2,1
equal_letter_indxs = combinations(new_places, 2)
equal_letter_indxs2 =
for x in equal_letter_indxs:
equal_letter_indxs2.append(''.join(x))
# Choosing the equal letter
for equal_letter in alpha:
# Choosing a two places for the equal letter
for i in equal_letter_indxs2:
if int(i[0]) != dig_indx and int(i[1]) != dig_indx:
s[int(i[0])] = equal_letter
s[int(i[1])] = equal_letter
# Creating the rest of the letters places
two_places = places.replace(str(dig_indx), '')
two_places = two_places.replace(i[0], '')
two_places = two_places.replace(i[1], '')
all_places_combo = permutations(two_places, 2)
# Choosing the places for the two other letters
for two_places in all_places_combo:
letter1_history =
# Choosing the first letter the different from all the other
for letter1 in alpha:
if letter1 != equal_letter:
letter1_history[letter1] = letter1
# Choosing the second letter that different from all the other
for letter2 in alpha:
if letter2 != equal_letter and letter2 != letter1:
found = False
for k in letter1_history.keys():
if letter2 == k:
found = True
if not found:
s[int(two_places[0])] = letter1
s[int(two_places[1])] = letter2
#print(''.join(s))
combos.append(''.join(s))
return combos

# Should be 1,436,400
start_time = time.time()
print(len(generate_1_digit_2_equal_letters_2_different_letters()))
print("--- %s seconds ---" % (time.time() - start_time))



This is not efficient because when calculating it:


for dig_indx in range(0, 5) => 5 times
for digit in digits => 7 times
for x in equal_letter_indxs => 2 times
for equal_letter in alpha => 20 times
for i in equal_letter_indxs2 => 2 times
for two_places in all_places_combo => 2 times
for letter1 in alpha => 20 times
for letter2 in alpha => 20 times
for k in letter1_history.keys() => 20 times in worst case


5*7*(2+20*2*2*20*20*20 = ~640,002 iterations



It works, but I though maybe you have suggestions how to make it more efficient. Feel free to make changes in my code or create a new better code, I will be glad to see different approachs (maybe recursion ? )





$begingroup$
Why are 0, 1, 3 not included as numbers? And why are some letters missing?
$endgroup$
– maxb
Sep 7 '18 at 11:03


0, 1, 3





$begingroup$
@maxb to make the calulcations with less combinations. Instead of 36 I just reduced it to 27.
$endgroup$
– E235
Sep 7 '18 at 11:17





$begingroup$
@TobySpeight The order is matter: 1aabc is not the same as 1aacb. I will update the description.
$endgroup$
– E235
Sep 7 '18 at 11:18


1aabc


1aacb





$begingroup$
Welcome to Code Review. To be able to help you as much as possible it would be good to know why you need this. What is the purpose of generating all combinations? Do you really need all of them? If so, why? The answers to these questions might help us give better suggestions. ("Just because I can and want to" is a perfectly valid answer)
$endgroup$
– Simon Forsberg
Sep 7 '18 at 15:03




2 Answers
2



The main thing that I see when I look at your code is that it's too complicated. You have one big function, and I won't even count the amount of nested loops you have.



The first thing I'd do is to simplify this, and split it into separate functions. The second thing I'd to is to improve it.



To improve it, I used this question on StackOverflow to generate unique permutations. The rest uses what you wrote above, but in a clearer presentation:


import itertools

alpha = "bcdfghjklmnpqrstvwxz"
digits = "2456789"
def get_letters_greater(letter, forbidden):
for a in alpha:
if a > letter and a not in forbidden:
yield a

def get_letters(forbidden = set()):
for a in alpha:
if a not in forbidden:
yield a

def get_numbers():
for d in digits:
yield d

order = [
0,2,3,6,8,9,12,13,14,15,16,17,24,26,27,30,32,33,36,37,38,
39,40,41,48,50,51,54,56,57,60,61,62,63,64,65,72,73,74,75,
76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95
]

def get_permutations(s):
all_perms = list(itertools.permutations(s))
unique_perms = [all_perms[i] for i in order]
return unique_perms

def get_all_combinations():
strings =
for digit in get_numbers():
for double_letter in get_letters():
forbidden = set(double_letter)
for letter_1 in get_letters(forbidden):
two_forbidden = set(double_letter + letter_1)
for letter_2 in get_letters_greater(letter_1, two_forbidden):
s = digit + letter_1 + letter_2 + double_letter*2
strings.append(s)

flat_list = [item for s in strings for item in get_permutations(s)]
return flat_list



all_combinations = get_all_combinations()



First, I generate all unique combinations. Then I generate all unique permutations for each combination. Since I know that the combinations are unique, I don't have to check for duplicates. However, I can't just generate all $5!$ permutations, since there's a duplicate letter. That's why I used the code to handle that.



This code generates a list with exactly 1436400 elements, meaning that no elements are generated twice. I don't have to check for duplicates, since my code doesn't generate them.



EDIT:



From the comments, it seems as if this approach is slightly slower than the original solution. I profiled it quickly, and it seems as if the unique permutation code is the culprit. Since there is only one pair of letters, the overhead from creating unique permutations is worse than just filtering them out. Thus, I've changed the code to just use itertools. It runs in less than a second, and produces the same result.


itertools



I also used the fact that itertools always gives the results in the same order, rendering the use of set in get_permutations() obsolete. This version is more complicated and not as readable, but it performs better. I get an average execution time of around 0.45s on my machine. which makes it about 6-7 times faster than the original solution.


set


get_permutations()



To get the order list, I ran itertools.permutations('abcdd') and checked how the duplicate elements paired up. Since the ordering is the same every time, I could just save the indices of the unique elements that are returned, and use that to access the permutations from itertools. That way, I'm guaranteed to get all unique permutations, as long as the input is on the same form (with the duplicate letters in the end of the 5 letter string). Note that this solution breaks for any other input, and in that case you should use return list(set(itertools.permutations(s))).


order


itertools.permutations('abcdd')


itertools


return list(set(itertools.permutations(s)))



Last edit (I promise):



I realized that I had some overhead from first creating all permutations, and then flattening the list. I combined the two steps into one list comprehension, and cut another 25% off the runtime.



If you want to get the result as strings rather than tuples of characters, then use unique_perms = [''.join(all_perms[i]) for i in order] instead of the current definition. It makes the runtime 0.1s slower.


unique_perms = [''.join(all_perms[i]) for i in order]





$begingroup$
yes you right regarding the function. I am usually doing it but I was interesting to know how to improve the main algorithm so I didn't do it. It looks much cleaner. I checked the times, it runs on ~5 seconds, mine runs on ~3 seconds and because this is not so much difference, a clean code is a better instead off a mass one.
$endgroup$
– E235
Sep 7 '18 at 11:53





$begingroup$
I liked the use of set(), it sounds more logically to prevent generating the "forbidden" combinations.
$endgroup$
– E235
Sep 7 '18 at 11:56


set()





$begingroup$
If time is a constraint, you can mark your question with the performance tag. I didn't optimize for speed, but using my code you could probably make it quite a bit faster. The join could be a culprit, as could the perm_unique. If I was to make it faster, I'd use arrays of characters rather than strings, and manipulate everything that way, or I'd choose a different language (or both).
$endgroup$
– maxb
Sep 7 '18 at 11:57


performance


join


perm_unique





$begingroup$
@E235 I updated the answer, now it runs in less than a second.
$endgroup$
– maxb
Sep 7 '18 at 12:12





$begingroup$
Oops, I forgot to include a proper explanation for that, I updated my answer.
$endgroup$
– maxb
Sep 7 '18 at 13:06



The calculation in the post of the number of iterations is not correct. Since the returned list has 1,436,400 elements, there must have been exactly that many iterations that reached the combos.append line. Here's one mistake I spotted:


combos.append


for i in equal_letter_indxs2 => 2 times



The list equal_letter_indxs2 has $4choose2=6$ elements, not 2.


equal_letter_indxs2



Python provides the module timeit for timing function calls and short pieces of code:


timeit


>>> from timeit import timeit
>>> timeit(generate_1_digit_2_equal_letters_2_different_letters, number=1)
3.0850040439981967



Since the lists you are building are short, it makes sense to construct them by inserting elements. (For long lists this could lead to poor performance, but 5 elements is too short for this to matter.) This approach avoids the need to maintain a set of places that gets adjusted as you go through the choices, and so simplifies the logic:


from itertools import combinations, permutations

LETTERS = 'bcdfghjklmnpqrstvwxz'
DIGITS = '2456789'

def aabc1(letters=LETTERS, digits=DIGITS):
"""Generate the distinct 5-character strings consisting of four
letters (exactly one of which is repeated) and one digit.

"""
for a, b, c in permutations(letters, 3): # Three letters (a repeated).
for i, j in combinations(range(4), 2): # Positions for letter a.
for d in digits: # One digit.
for k in range(5): # Positions for the digit.
result = [b, c]
result[i:i] = a,
result[j:j] = a,
result[k:k] = d,
yield ''.join(result)



Since the loops are now all independent of each other, we can turn them into a single loop using itertools.product:


itertools.product


from itertools import combinations, permutations, product

LETTERS = 'bcdfghjklmnpqrstvwxz'
DIGITS = '2456789'

def aabc1(letters=LETTERS, digits=DIGITS):
"""Generate the distinct 5-character strings consisting of four
letters (exactly one of which is repeated) and one digit.

"""
for (a, b, c), (i, j), d, k in product(
permutations(letters, 3), # Three letters (a repeated).
combinations(range(4), 2), # Positions for the repeated letter.
digits, # One digit.
range(5)): # Positions for the digit.
result = [b, c]
result[i:i] = a,
result[j:j] = a,
result[k:k] = d,
yield ''.join(result)



This is about 3 times as fast as the code in the post:


>>> timeit(lambda:list(aabc1()), number=1)
1.0679944080184214





$begingroup$
It is also very elegant, nice !
$endgroup$
– E235
Sep 7 '18 at 13:04





$begingroup$
I like that you just yield the results and don't actually generate a list, I think that is a big speed improvement.
$endgroup$
– Simon Forsberg
Sep 7 '18 at 15:10


yield





$begingroup$
@SimonForsberg: It depends on what happens to a string. If it's output asynchronously then that could be parallelized with generating the next string, saving some time. But if it's processed within Python, then generation merely reduces memory usage, not runtime.
$endgroup$
– Gareth Rees
Sep 7 '18 at 15:22




Thanks for contributing an answer to Code Review Stack Exchange!



But avoid



Use MathJax to format equations. MathJax reference.



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

How do I collapse sections of code in Visual Studio Code for Windows?

ャフサォクコ ケウ,コ,ワ メ,ロスョノ゙,クネ,フムカヤヲニ,エコ゚ツ ウイオン゙ケワサネォキモュキォウイノンコチ゚メヌナイゥフュ,カヒウネェ ネ,ホノケ,ムュキ ッボーミュハ,チ ツス ィ メウイマヤ,゙ウチ ヅ ロ,ォジヌェ ャヌット ェ,マャ,チナエヒネソキツテ トホヲヲミーァ