Strip whitespace before and after a specific list of punctuation symbols




Although I found some references on Stack Overflow, I am unable to write the correct regular expression to achieve my goal. I want to remove whitespace before and after specific punctuation symbols in a string in Python.



I have a function as follows.


import re

def modify_answers(answers):
    hyp = []
    for ans in answers:
        # remove whitespace before - / ? . ! ;
        newhyp = re.sub(r'\s([-/?.!,;](?:\s|$))', r'\1', ans)
        # remove whitespace after - / $ _
        # newhyp = re.sub(r'', r'\1', newhyp)  # pattern still to be written
        hyp.append(newhyp)
    return hyp



Some examples of what I want to achieve:



"Tax pin number is 1 - 866 - 704 - 7388 ." ---> "Tax pin number is 1-866-704-7388."



"No , emu is not protected in Victoria ." ---> "No, emu is not protected in Victoria."



"Find is to lose as construct is to _ _ _ _ _ _ ." ---> "Find is to lose as construct is to ______."



"$ 1,0 is equal to $ 1,0 ." ---> "$1,0 is equal to $1,0."



Any help would be appreciated.




3 Answers



First, define a function that performs replacement:


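import re

def replace(x):
    # y: the symbol plus any surrounding whitespace; z: the symbol alone
    y, z = x.groups()
    if z in '-/?.!,;':   # strip whitespace before these symbols
        y = y.lstrip()
    if z in '-/$_':      # strip whitespace after these symbols
        y = y.rstrip()
    return y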



The function takes a match object and returns the replacement text accordingly.
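As a minimal sketch of the mechanism (the simplified pattern and the `show` helper here are illustrative, not part of the answer): when `re.sub` is given a function instead of a replacement string, it calls that function once per match with a `re.Match` object and substitutes whatever string the function returns.

```python
import re

# Simplified stand-in pattern (illustrative only): optional spaces, one of '-' or '.',
# optional spaces. Group 1 is the whole span, group 2 is the symbol alone.
pattern = re.compile(r'(\s*([-.])\s*)')

def show(match):
    # re.sub passes a re.Match object; groups() returns (group 1, group 2)
    whole, symbol = match.groups()
    print(repr(whole), '->', repr(symbol))
    return symbol  # replace the whole span with just the symbol

result = pattern.sub(show, "1 - 866 .")
print(result)
```

The callback sees `' - '` and `' .'` in turn, so `result` is `"1-866."`.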



Now, define your pattern. You can pre-compile for efficiency.


p = re.compile(r'(\s*([-/?.,!;$_])\s*)')
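To check what the two groups capture, here is a quick sketch using `findall`, which returns one tuple of groups per match when the pattern has more than one group. It assumes the pattern with `;` included in the character class:

```python
import re

# Outer group: symbol plus surrounding whitespace; inner group: the symbol alone
p = re.compile(r'(\s*([-/?.,!;$_])\s*)')

# findall returns a list of (outer, inner) tuples, one per match
matches = p.findall("$ 1,0 is equal to $ 1,0 .")
for whole, symbol in matches:
    print(repr(whole), repr(symbol))
```

The first element of each tuple is what `replace` trims; the second decides which side gets trimmed.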



Call the compiled regex's sub method on each string, passing the callback defined earlier:


cases = [
"Tax pin number is 1 - 866 - 704 - 7388 .",
"No , emu is not protected in Victoria .",
"Find is to lose as construct is to _ _ _ _ _ _ .",
"$ 1,0 is equal to $ 1,0 ."]

repl = [p.sub(replace, c) for c in cases]




print(repl)
['Tax pin number is 1-866-704-7388.', 'No, emu is not protected in Victoria.',
'Find is to lose as construct is to ______.', '$1,0 is equal to $1,0.']






';' should be part of the re.compile() statement, right?

– Wasi Ahmad
Sep 7 '18 at 21:31








@WasiAhmad yes, thanks for that.

– coldspeed
Sep 8 '18 at 0:03



You can do it like this:


import re

sentences = ["Tax pin number is 1 - 866 - 704 - 7388 .",
"No , emu is not protected in Victoria .",
"Find is to lose as construct is to _ _ _ _ _ _ .",
"$ 1,0 is equal to $ 1,0 ."]


def modify_answers(answers):
    hyp = []
    for ans in answers:
        # remove whitespace around / ? . ! ; _ - (the following space is dropped too)
        new_hyp = re.sub(r'\s([/?.!;_-])(\s|$)', r'\1', ans)
        # ' , ' -> ', ' (keep one space after the comma)
        new_hyp = re.sub(r'\s(,)(\s|$)', r'\1 ', new_hyp)
        # ' $ ' -> ' $' (keep one space before the dollar sign)
        new_hyp = re.sub(r'(^|\s)(\$)(\s|$)', r' \2', new_hyp)
        hyp.append(new_hyp.strip())
    return hyp

for sentence in modify_answers(sentences):
    print(sentence)



Output


Tax pin number is 1-866-704-7388.
No, emu is not protected in Victoria.
Find is to lose as construct is to______.
$1,0 is equal to $1,0.



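Notes

The characters /?.!;_- share the first pattern, with - placed last in the character class so it is read literally rather than as a range. The comma gets its own pattern so that a single space is kept after it, and $ gets its own pattern so that a single space is kept before it; the final strip() removes the leading space this adds when $ starts the string.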
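To see why this version prints `to______.` with no space before the blanks (unlike the expected output in the question), here is a sketch that applies the same three substitutions one at a time to the third sentence. The first pattern consumes whitespace on both sides of `_`, so the space between `to` and the first `_` is dropped as well.

```python
import re

s = "Find is to lose as construct is to _ _ _ _ _ _ ."

# Step 1: drop whitespace on both sides of / ? . ! ; _ -
step1 = re.sub(r'\s([/?.!;_-])(\s|$)', r'\1', s)
print(step1)

# Step 2: normalise ' , ' to ', ' (no commas here, so unchanged)
step2 = re.sub(r'\s(,)(\s|$)', r'\1 ', step1)
print(step2)

# Step 3: normalise ' $ ' to ' $' (no '$' here, so unchanged)
step3 = re.sub(r'(^|\s)(\$)(\s|$)', r' \2', step2)
print(step3.strip())
```

All of the change happens in step 1; steps 2 and 3 leave this sentence untouched.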



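Replace the pattern r' (?=[-/?.!])|(?<=[-/$_]) ' with an empty string using re.sub. The lookahead (?=[-/?.!]) matches a space that comes before one of - / ? . !, and the lookbehind (?<=[-/$_]) matches a space that comes after one of - / $ _.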


>>> import re
>>> from pprint import pprint
>>> lst = ["Tax pin number is 1 - 866 - 704 - 7388 .",
... "No , emu is not protected in Victoria .",
... "Find is to lose as construct is to _ _ _ _ _ _ .",
... "$ 1,0 is equal to $ 1,0 ."]
>>>
>>> def modify_answers(answers):
... ptrn = re.compile(r' (?=[-/?.!])|(?<=[-/$_]) ')
... return [ptrn.sub('', answer) for answer in answers]
...
>>>
>>> pprint(modify_answers(lst))
['Tax pin number is 1-866-704-7388.',
'No , emu is not protected in Victoria.',
'Find is to lose as construct is to ______.',
'$1,0 is equal to $1,0.']
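The output above leaves `No , emu` unchanged because `,` is not in the lookahead class. Here is a sketch of a variant, assuming the full "before" set from the question (`- / ? . ! , ;`) and the "after" set (`- / $ _`), and using `\s+` so runs of whitespace are removed too; `strip_around_punct` is a hypothetical name:

```python
import re

# A run of whitespace is removed when it is either
#   followed by one of - / ? . ! , ;  (lookahead: whitespace before the symbol), or
#   preceded by one of - / $ _        (lookbehind: whitespace after the symbol).
# Lookarounds are zero-width, so only the whitespace itself is replaced.
PATTERN = re.compile(r'\s+(?=[-/?.!,;])|(?<=[-/$_])\s+')

def strip_around_punct(answers):
    return [PATTERN.sub('', answer) for answer in answers]

cases = [
    "Tax pin number is 1 - 866 - 704 - 7388 .",
    "No , emu is not protected in Victoria .",
    "Find is to lose as construct is to _ _ _ _ _ _ .",
    "$ 1,0 is equal to $ 1,0 .",
]
result = strip_around_punct(cases)
for line in result:
    print(line)
```

With `,` in the lookahead class, the second sentence becomes `No, emu is not protected in Victoria.`, and the other three match the expected outputs from the question.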


