Filter list elements by symbols

Hi, I want to split the elements of a list into two lists: those that contain - or . and those that don't. My code doesn't do the job; it seems to run through everything twice and cannot replace both - and . at the same time. What did I do wrong?



Code:


NO_SYMBOL = []
WITH_SYMBOL = []
SPACE = []
for i in DATA:
    for ch in ['-', '.']:
        if ch in i:
            WITH_SYMBOL.append(i)
            SPACE.append(i.replace(ch, ' '))
        else:
            NO_SYMBOL.append(i)



Data


['volunteer-abroad',
'volunteer-abroad.com',
'volunteer-abroad.ie',
'volunteer-abroad.org',
'volunteerabroad']



My output:


SPACE
['volunteer abroad',
'volunteer abroad.com',
'volunteer abroad.ie',
'volunteer abroad.org',
'volunteer-abroad com',
'volunteer-abroad ie',
'volunteer-abroad org']

NO_SYMBOL
['volunteer-abroad', 'volunteerabroad', 'volunteerabroad']



I want to get output like these:


SPACE
['volunteer abroad',
'volunteer abroad com',
'volunteer abroad ie',
'volunteer abroad org']

NO_SYMBOL
['volunteerabroad']





Can you figure out why your list might be twice as long as you expected? Look at your loops. Well written question, by the way.
– timgeb
Aug 25 at 8:25






Well, when i is set to 'volunteer-abroad.com', you first take '-', create a new string with just - replaced with spaces, and add that to SPACE. You then take . and do the same. Perhaps you need to first take care of all replacements?
– Martijn Pieters
Aug 25 at 8:26





3 Answers



You are treating those two characters separately, by using a for loop:



for ch in ['-', '.']:


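On the first pass ch is '-': if i contains -, a copy with only that character replaced is appended to SPACE and i is appended to WITH_SYMBOL; otherwise i is appended to NO_SYMBOL. Then ch becomes '.' and the same test and append happen again.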


As a result, every i is appended twice, once per character, each time to either SPACE and WITH_SYMBOL or to NO_SYMBOL.
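
To make the double append visible, here is a small sketch that records, for each item, which list the original loop would pick on each pass. The `appends` dictionary and the shortened `DATA` list are only illustrative and are not part of the original code.

```python
# Shortened sample data, taken from the question.
DATA = ['volunteer-abroad', 'volunteer-abroad.com', 'volunteerabroad']

# Illustrative log: item -> list of (character tested, list chosen).
appends = {}
for i in DATA:
    for ch in ['-', '.']:
        # Same test as the original loop, but we only record the decision.
        target = 'WITH_SYMBOL' if ch in i else 'NO_SYMBOL'
        appends.setdefault(i, []).append((ch, target))

for item, log in appends.items():
    print(item, log)
```

Each item ends up with two entries, and 'volunteer-abroad' lands in both WITH_SYMBOL and NO_SYMBOL, which matches the unexpected output in the question.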




You need to delay appending until you have tested every character, and only decide where to append once the inner loop is done. You could use a flag variable for that:


for i in DATA:
    altered = False
    cleaned = i
    for ch in ['-', '.']:
        if ch in cleaned:
            altered = True
            cleaned = cleaned.replace(ch, ' ')
    # decide where to append only after every character has been tested
    if altered:
        SPACE.append(cleaned)
        WITH_SYMBOL.append(i)
    else:
        NO_SYMBOL.append(i)



You can also just test whether either character is present, and call str.replace() for both. This is safe, because str.replace() simply returns the string unchanged when the character you are replacing is not present:


if '-' in i or '.' in i:
    SPACE.append(i.replace('-', ' ').replace('.', ' '))
    WITH_SYMBOL.append(i)
else:
    NO_SYMBOL.append(i)
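
As a quick check of the claim that chained str.replace() calls are harmless when a character is absent, the following sketch applies both replacements to one string without symbols and one with both. The variable names `plain` and `both` are only illustrative.

```python
# No '-' or '.' present: both replace() calls return the text unchanged.
plain = 'volunteerabroad'.replace('-', ' ').replace('.', ' ')

# Both symbols present: each call replaces its own character.
both = 'volunteer-abroad.com'.replace('-', ' ').replace('.', ' ')

print(plain)
print(both)
```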



Rather than use two .replace() calls you can also use a translation table passed to str.translate(); this is faster, and much more flexible if you have a variable number of characters to replace. For the latter case, you can use the any() function to test for a sequence of characters:



symbols = ['-', '.']  # can be extended later
translation_map = str.maketrans(dict.fromkeys(symbols, ' '))  # map any symbol to a space
for entry in DATA:  # entry is a nicer name here than i
    # the following loops over symbols until one is found that matches, then
    # returns True. If no matching symbol is found, False is given instead.
    if any(ch in entry for ch in symbols):
        SPACE.append(entry.translate(translation_map))
        WITH_SYMBOL.append(entry)
    else:
        NO_SYMBOL.append(entry)
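
For readers who have not used translation tables before, here is a minimal sketch of what the table looks like and how extending the symbol list changes the result. The extra '_' symbol and the names `extended_map`, `cleaned` and `cleaned_extended` are assumptions made for illustration only.

```python
symbols = ['-', '.']

# str.maketrans() turns the single-character keys into code points,
# so the table maps ord('-') and ord('.') to a space.
translation_map = str.maketrans(dict.fromkeys(symbols, ' '))
print(translation_map)

cleaned = 'volunteer-abroad.com'.translate(translation_map)
print(cleaned)

# Adding a symbol only requires changing the list (hypothetical extra symbol).
extended_map = str.maketrans(dict.fromkeys(symbols + ['_'], ' '))
cleaned_extended = 'volunteer_abroad.org'.translate(extended_map)
print(cleaned_extended)
```

Characters that are not in the table pass through translate() untouched, so the same table can be applied to every entry.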



You don't need to loop over the same list twice; instead you can modify the if condition:


NO_SYMBOL = []
WITH_SYMBOL = []
SPACE = []
for i in DATA:
    if '-' in i or '.' in i:
        WITH_SYMBOL.append(i)
        SPACE.append(i.replace('.', ' ').replace('-', ' '))
    else:
        NO_SYMBOL.append(i)



I think the error in your code has already been identified in the other answers, but with a simple regular expression you may get better performance, and you can easily extend the character class when you need to match new symbols:


import re

pattern = re.compile("[-.]")  # character class: matches either - or .
NO_SYMBOL = []
WITH_SYMBOL = []
SPACE = []

for item in DATA:
    if pattern.search(item):
        WITH_SYMBOL.append(item)
        SPACE.append(pattern.sub(" ", item))
    else:
        NO_SYMBOL.append(item)
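
If the set of symbols is likely to change, one possible approach is to build the character class from a list instead of typing it by hand. This is only a sketch: the `symbols` list (including the extra '_') is a made-up example, and re.escape() is used so that characters with a special meaning inside a character class, such as -, are treated literally.

```python
import re

symbols = ['-', '.', '_']  # hypothetical list of symbols to match

# Escape each symbol, then wrap them all in a single character class.
pattern = re.compile('[' + ''.join(re.escape(s) for s in symbols) + ']')

cleaned = pattern.sub(' ', 'volunteer-abroad.com')
print(cleaned)

# search() returns None when no symbol is present.
print(pattern.search('volunteerabroad'))
```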








