Strip whitespace before and after a specific list of punctuation symbols


Although I found some references on Stack Overflow, I am unable to write the correct regular expression to achieve my goal. I want to remove the whitespace before and after specific punctuation symbols in a string in Python.



I have a function as follows.


import re

def modify_answers(answers):
    hyp = []
    for ans in answers:
        # remove whitespace before - / ? . ! , ;
        newhyp = re.sub(r'\s([-/?.!,;](?:\s|$))', r'\1', ans)
        # remove whitespace after - / $ _
        # (pattern still empty: this is the part I cannot get right)
        newhyp = re.sub(r'', r'\1', newhyp)
        hyp.append(newhyp)
    return hyp



Some examples of what I want to achieve:



"Tax pin number is 1 - 866 - 704 - 7388 ." ---> "Tax pin number is 1-866-704-7388."



"No , emu is not protected in Victoria ." ---> "No, emu is not protected in Victoria."



"Find is to lose as construct is to _ _ _ _ _ _ ." ---> "Find is to lose as construct is to ______."



"$ 1,0 is equal to $ 1,0 ." ---> "$1,0 is equal to $1,0."



Any help would be appreciated.




3 Answers



First, define a function that performs replacement:


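import re

def replace(x):
    # y is the symbol plus its surrounding whitespace, z is the symbol itself
    y, z = x.groups()
    if z in '-/?.!,;':
        y = y.lstrip()  # drop the whitespace before the symbol
    if z in '-/$_':
        y = y.rstrip()  # drop the whitespace after the symbol
    return y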



The function takes a match object and returns the replacement text: the matched symbol with the whitespace stripped on the appropriate side.



Now, define your pattern. You can pre-compile for efficiency.


p = re.compile(r'(\s*([-/?.,!;$_])\s*)')
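To make the two capture groups concrete, here is a minimal sketch of what the callback receives for a single match. The name demo is only for illustration, and the pattern is assumed to include ';' as suggested in the comments below.

```python
import re

# Illustrative copy of the pattern: the outer group captures the symbol with
# its surrounding whitespace, the inner group captures the symbol alone.
demo = re.compile(r'(\s*([-/?.,!;$_])\s*)')

m = demo.search("1 - 866")
whole, symbol = m.groups()
print(repr(whole), repr(symbol))   # ' - ' '-'
```

The callback only has to decide, from the inner group, which side of the outer group to strip.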



Call the compiled pattern's sub method on each string, passing the callback defined earlier:


cases = [
"Tax pin number is 1 - 866 - 704 - 7388 .",
"No , emu is not protected in Victoria .",
"Find is to lose as construct is to _ _ _ _ _ _ .",
"$ 1,0 is equal to $ 1,0 ."]

repl = [p.sub(replace, c) for c in cases]




print(repl)
['Tax pin number is 1-866-704-7388.', 'No, emu is not protected in Victoria.',
'Find is to lose as construct is to ______.', '$1,0 is equal to $1,0.']
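For reference, the pieces of this answer could be combined under the modify_answers signature from the question. This is only a sketch: the names STRIP_BEFORE, STRIP_AFTER, _SYMBOLS, _PATTERN and _strip_around are made up for illustration, and the two symbol sets are assumed from the comments in the question.

```python
import re

# Assumed symbol sets, copied from the comments in the question.
STRIP_BEFORE = '-/?.!,;'   # whitespace in front of these is removed
STRIP_AFTER = '-/$_'       # whitespace behind these is removed

# One character class covering both sets; re.escape keeps '-' and '$' literal.
_SYMBOLS = ''.join(sorted(set(STRIP_BEFORE + STRIP_AFTER)))
_PATTERN = re.compile(r'(\s*([' + re.escape(_SYMBOLS) + r'])\s*)')


def _strip_around(match):
    """Return the matched symbol with whitespace removed on the configured sides."""
    text, symbol = match.groups()
    if symbol in STRIP_BEFORE:
        text = text.lstrip()
    if symbol in STRIP_AFTER:
        text = text.rstrip()
    return text


def modify_answers(answers):
    return [_PATTERN.sub(_strip_around, ans) for ans in answers]


print(modify_answers(["No , emu is not protected in Victoria .",
                      "$ 1,0 is equal to $ 1,0 ."]))
```

Keeping the two sets as named constants means a symbol can be moved from one rule to the other without touching the regular expression.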






';' should be part of the re.compile() pattern, right?

– Wasi Ahmad
Sep 7 '18 at 21:31








@WasiAhmad yes, thanks for that.

– coldspeed
Sep 8 '18 at 0:03



You can do it like this:


import re

sentences = ["Tax pin number is 1 - 866 - 704 - 7388 .",
"No , emu is not protected in Victoria .",
"Find is to lose as construct is to _ _ _ _ _ _ .",
"$ 1,0 is equal to $ 1,0 ."]


def modify_answers(answers):
    hyp = []
    for ans in answers:
        # remove whitespace on both sides of / ? . ! ; _ -
        new_hyp = re.sub(r'\s([/?.!;_-])(\s|$)', r'\1', ans)
        # remove whitespace before a comma, keep one space after it
        new_hyp = re.sub(r'\s(,)(\s|$)', r'\1 ', new_hyp)
        # remove whitespace after a dollar sign, keep one space before it
        new_hyp = re.sub(r'(^|\s)(\$)(\s|$)', r' \2', new_hyp)
        hyp.append(new_hyp.strip())
    return hyp

for sentence in modify_answers(sentences):
    print(sentence)



Output


Tax pin number is 1-866-704-7388.
No, emu is not protected in Victoria.
Find is to lose as construct is to______.
$1,0 is equal to $1,0.



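Notes

The first substitution removes the whitespace on both sides of any of the symbols /?.!;_- (the - is placed last in the character class so that it is read as a literal hyphen, not as a range).

The second substitution removes the whitespace before a , and leaves a single space after it.

The third substitution removes the whitespace after a $ and leaves a single space before it; the final strip() drops that space when the $ is at the start of the string.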
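To see how the three substitutions interact, the following sketch applies them one at a time to the last example and prints each intermediate string. The variable names (FIRST, s0 to s3, blanks) are only for illustration.

```python
import re

FIRST = r'\s([/?.!;_-])(\s|$)'

# The three substitutions from this answer, applied step by step.
s0 = "$ 1,0 is equal to $ 1,0 ."
s1 = re.sub(FIRST, r'\1', s0)                   # only the final " ." changes
s2 = re.sub(r'\s(,)(\s|$)', r'\1 ', s1)         # no comma has a space before it
s3 = re.sub(r'(^|\s)(\$)(\s|$)', r' \2', s2)    # leaves a leading space

for step in (s0, s1, s2, s3, s3.strip()):
    print(repr(step))

# The first substitution alone, applied to the underscore example.
blanks = re.sub(FIRST, r'\1', "Find is to lose as construct is to _ _ _ _ _ _ .")
print(repr(blanks))
```

The last line shows where the to______. in the output above comes from: the first substitution also consumes the space in front of the first underscore, so the result differs from the to ______. requested in the question.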



Replace every match of the pattern r' (?=[-/?.!])|(?<=[-/$_]) ' with an empty string using re.sub:


>>> import re
>>> from pprint import pprint
>>>
>>> lst = ["Tax pin number is 1 - 866 - 704 - 7388 .",
...        "No , emu is not protected in Victoria .",
...        "Find is to lose as construct is to _ _ _ _ _ _ .",
...        "$ 1,0 is equal to $ 1,0 ."]
>>>
>>> def modify_answers(answers):
...     ptrn = re.compile(r' (?=[-/?.!])|(?<=[-/$_]) ')
...     return [ptrn.sub('', answer) for answer in answers]
...
>>>
>>> pprint(modify_answers(lst))
['Tax pin number is 1-866-704-7388.',
 'No , emu is not protected in Victoria.',
 'Find is to lose as construct is to ______.',
 '$1,0 is equal to $1,0.']
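In the output above the space before the comma survives, because , is not in the lookahead set. A possible variant, sketched here under the assumption that , and ; should be treated like the other "strip before" symbols, adds them to the lookahead and widens the literal space to \s+. Neither change is part of the original answer, and the name PTRN is made up.

```python
import re

# Assumption: ',' and ';' join the lookahead set and ' ' becomes \s+;
# this is a variation on the answer's pattern, not the pattern itself.
PTRN = re.compile(r'\s+(?=[-/?.!,;])|(?<=[-/$_])\s+')


def modify_answers(answers):
    return [PTRN.sub('', answer) for answer in answers]


lst = ["Tax pin number is 1 - 866 - 704 - 7388 .",
       "No , emu is not protected in Victoria .",
       "Find is to lose as construct is to _ _ _ _ _ _ .",
       "$ 1,0 is equal to $ 1,0 ."]

for line in modify_answers(lst):
    print(line)
```

Because the lookahead and lookbehind consume no characters, only the whitespace itself is matched, so replacing with an empty string leaves the symbols untouched.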


