Filter list elements by symbols

Hi, I want to split a list into two lists: elements that contain - or . and elements that contain neither. My code doesn't do the job: it seems to run through everything twice and cannot replace both - and . at the same time. What did I do wrong?

Code:

    NO_SYMBOL = []
    WITH_SYMBOL = []
    SPACE = []
    for i in DATA:
        for ch in ['-', '.']:
            if ch in i:
                WITH_SYMBOL.append(i)
                SPACE.append(i.replace(ch, ' '))
            else:
                NO_SYMBOL.append(i)

Data

['volunteer-abroad', 'volunteer-abroad.com', 'volunteer-abroad.ie', 'volunteer-abroad.org', 'volunteerabroad']

My output:

    SPACE ['volunteer abroad', 'volunteer abroad.com', 'volunteer abroad.ie', 'volunteer abroad.org', 'volunteer-abroad com', 'volunteer-abroad ie', 'volunteer-abroad org']
    NO_SYMBOL ['volunteer-abroad', 'volunteerabroad', 'volunteerabroad']

I want to get output like these:

    SPACE ['volunteer abroad', 'volunteer abroad com', 'volunteer abroad ie', 'volunteer abroad org']
    NO_SYMBOL ['volunteerabroad']

Can you figure out why your list might be twice as long as you expected? Look at your loops. Well written question, by the way.
– timgeb
Aug 25 at 8:25

Well, when i is set to 'volunteer-abroad.com', you first take '-', create a new string with just - replaced with spaces, and add that to SPACE. You then take . and do the same. Perhaps you need to first take care of all replacements?
– Martijn Pieters♦
Aug 25 at 8:26

3 Answers

You are treating those two characters separately, by using a for loop:

    for ch in ['-', '.']:

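First ch is set to '-': if that character is in i, you replace only the - and append the result to SPACE and i to WITH_SYMBOL; otherwise i goes to NO_SYMBOL. Then the same test runs again with ch set to '.'.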

As a result, every i is handled twice: on each of the two passes it is appended either to SPACE and WITH_SYMBOL, or to NO_SYMBOL.
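To make the double handling visible, here is a minimal sketch that records which list the question's loop would append to for a single item. The helper name trace_original is made up for illustration; the loop body mirrors the code in the question.

```python
def trace_original(item, symbols=('-', '.')):
    """Return (list_name, value) pairs in the order the question's loop appends them.

    trace_original is a hypothetical helper, not part of the original code.
    """
    appended = []
    for ch in symbols:
        if ch in item:
            appended.append(('WITH_SYMBOL', item))
            appended.append(('SPACE', item.replace(ch, ' ')))
        else:
            appended.append(('NO_SYMBOL', item))
    return appended


for step in trace_original('volunteer-abroad.com'):
    print(step)
# ('WITH_SYMBOL', 'volunteer-abroad.com')
# ('SPACE', 'volunteer abroad.com')
# ('WITH_SYMBOL', 'volunteer-abroad.com')
# ('SPACE', 'volunteer-abroad com')
```

Each pass only knows about one character, which is why neither SPACE entry has both symbols replaced.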

You need to delay appending until the inner loop has tried every character in ['-', '.'], and only then decide which list to append to. You could use a flag variable for that:

    for i in data:
        altered = False
        cleaned = i
        for ch in ['-', '.']:
            if ch in cleaned:
                altered = True
                cleaned = cleaned.replace(ch, ' ')
        # decide only after every character has been tried
        if altered:
            SPACE.append(cleaned)
            WITH_SYMBOL.append(i)
        else:
            NO_SYMBOL.append(i)

You can also just test whether either character is present and call str.replace() for both. That is safe, because str.replace() simply returns the string unchanged when the character you are replacing is not present:

    if '-' in i or '.' in i:
        SPACE.append(i.replace('-', ' ').replace('.', ' '))
        WITH_SYMBOL.append(i)
    else:
        NO_SYMBOL.append(i)

Rather than use two .replace() calls you can also use a translation table passed to str.translate(); this is faster, and much more flexible if you have a variable number of characters to replace. For the latter case, you can use the any() function to test for a sequence of characters:

    symbols = ['-', '.']  # can be extended later
    translation_map = str.maketrans(dict.fromkeys(symbols, ' '))  # map any symbol to a space

    for entry in data:  # entry is a nicer name here than i
        # the following loops over symbols until one is found that matches, then
        # returns True. If no matching symbol is found, False is given instead.
        if any(ch in entry for ch in symbols):
            SPACE.append(entry.translate(translation_map))
            WITH_SYMBOL.append(entry)
        else:
            NO_SYMBOL.append(entry)
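For anyone who wants to run this end to end, here is a self-contained sketch of the same translate-based approach, applied to the data from the question. Wrapping it in a function called split_by_symbols is my own choice for illustration; the list names follow the question.

```python
def split_by_symbols(data, symbols=('-', '.')):
    """Split data into (WITH_SYMBOL, SPACE, NO_SYMBOL) lists.

    split_by_symbols is a hypothetical wrapper; each symbol must be a single character.
    """
    translation_map = str.maketrans(dict.fromkeys(symbols, ' '))
    with_symbol, space, no_symbol = [], [], []
    for entry in data:
        if any(ch in entry for ch in symbols):
            with_symbol.append(entry)
            space.append(entry.translate(translation_map))
        else:
            no_symbol.append(entry)
    return with_symbol, space, no_symbol


DATA = ['volunteer-abroad', 'volunteer-abroad.com', 'volunteer-abroad.ie',
        'volunteer-abroad.org', 'volunteerabroad']

WITH_SYMBOL, SPACE, NO_SYMBOL = split_by_symbols(DATA)
print(SPACE)
# ['volunteer abroad', 'volunteer abroad com', 'volunteer abroad ie', 'volunteer abroad org']
print(NO_SYMBOL)
# ['volunteerabroad']
```

Each entry is appended exactly once, so the lists match the output asked for in the question.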

You don't need a loop that handles the same item twice; instead, you can modify the if condition:

    NO_SYMBOL = []
    WITH_SYMBOL = []
    SPACE = []
    for i in DATA:
        if '-' in i or '.' in i:
            WITH_SYMBOL.append(i)
            SPACE.append(i.replace('.', ' ').replace('-', ' '))
        else:
            NO_SYMBOL.append(i)

I think the error in your code has been found by the other answers, but with a simple regex you may get better performance, and you can easily change the pattern when you have to add new symbols to match:

    import re

    pattern = re.compile("[-.]")  # character class: matches either - or .

    NO_SYMBOL = []
    WITH_SYMBOL = []
    SPACE = []
    for item in data:
        if pattern.search(item):
            WITH_SYMBOL.append(item)
            SPACE.append(pattern.sub(" ", item))
        else:
            NO_SYMBOL.append(item)
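If the set of symbols is likely to grow, one possible way to avoid editing the pattern by hand is to build the character class from a list. This is only a sketch: the helper names build_pattern and split_with_regex are made up here, and re.escape is used so that characters such as - or ] stay literal inside the class.

```python
import re


def build_pattern(symbols):
    """Compile a character class matching any one of the given single characters.

    build_pattern is a hypothetical helper, not part of the answer above.
    """
    return re.compile("[" + "".join(re.escape(s) for s in symbols) + "]")


def split_with_regex(data, symbols):
    """Return (WITH_SYMBOL, SPACE, NO_SYMBOL) lists using one compiled pattern."""
    pattern = build_pattern(symbols)
    with_symbol, space, no_symbol = [], [], []
    for item in data:
        if pattern.search(item):
            with_symbol.append(item)
            space.append(pattern.sub(" ", item))
        else:
            no_symbol.append(item)
    return with_symbol, space, no_symbol


print(split_with_regex(['a-b', 'a_b.c', 'abc'], ['-', '.', '_']))
# (['a-b', 'a_b.c'], ['a b', 'a b c'], ['abc'])
```

Adding a new symbol then only means extending the list passed in.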
