Strip whitespace before and after a specific list of punctuation symbols
Although I found some references on Stack Overflow, I have been unable to write the correct regular expression for my goal. I want to remove whitespace before and after specific punctuation symbols in a string in Python.
I have the following function:
    import re

    def modify_answers(answers):
        hyp = []
        for ans in answers:
            # remove whitespace before - / ? . ! , ;
            newhyp = re.sub(r'\s([-/?.!,;](?:\s|$))', r'\1', ans)
            # remove whitespace after - / $ _
            newhyp = re.sub(r'', r'\1', newhyp)
            hyp.append(newhyp)
        return hyp
Some examples of what I want to achieve:
"Tax pin number is 1 - 866 - 704 - 7388 ." ---> "Tax pin number is 1-866-704-7388."
"No , emu is not protected in Victoria ." ---> "No, emu is not protected in Victoria."
"Find is to lose as construct is to _ _ _ _ _ _ ." ---> "Find is to lose as construct is to ______."
"$ 1,0 is equal to $ 1,0 ." ---> "$1,0 is equal to $1,0."
Any help would be appreciated.
3 Answers
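First, define a function that performs the replacement:
    import re

    def replace(x):
        # y is the whole match (symbol plus surrounding whitespace); z is the symbol
        y, z = x.groups()
        if z in '-/?.!,;':
            y = y.lstrip()  # drop the whitespace before the symbol
        if z in '-/$_':
            y = y.rstrip()  # drop the whitespace after the symbol
        return y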
The function takes a match object and returns the replacement text for that match.
Now, define your pattern. You can pre-compile it for efficiency.
    p = re.compile(r'(\s*([-/?.,!;$_])\s*)')
Call sub on the compiled regex for each string, passing the callback defined earlier:
    cases = [
        "Tax pin number is 1 - 866 - 704 - 7388 .",
        "No , emu is not protected in Victoria .",
        "Find is to lose as construct is to _ _ _ _ _ _ .",
        "$ 1,0 is equal to $ 1,0 ."]
    repl = [p.sub(replace, c) for c in cases]
    print(repl)

    ['Tax pin number is 1-866-704-7388.', 'No, emu is not protected in Victoria.',
     'Find is to lose as construct is to ______.', '$1,0 is equal to $1,0.']
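To see what the callback receives, it can help to inspect the groups of each match directly. The snippet below is a minimal sketch: it writes the pattern out in full (with ';' included in the character class) so that it runs on its own, and the sample string is only an illustration.

```python
import re

# Outer group: the symbol plus any whitespace around it.
# Inner group: the symbol alone.
p = re.compile(r'(\s*([-/?.,!;$_])\s*)')

text = "No , emu"
matches = [m.groups() for m in p.finditer(text)]

for whole, symbol in matches:
    # repr() makes the surrounding whitespace visible
    print(repr(whole), repr(symbol))
```

Each tuple holds the full matched span first and the bare symbol second, which is exactly what the replacement function unpacks into y and z.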
@WasiAhmad yes, thanks for that. – coldspeed, Sep 8 '18 at 0:03
You can do it like this:
    import re

    sentences = ["Tax pin number is 1 - 866 - 704 - 7388 .",
                 "No , emu is not protected in Victoria .",
                 "Find is to lose as construct is to _ _ _ _ _ _ .",
                 "$ 1,0 is equal to $ 1,0 ."]

    def modify_answers(answers):
        hyp = []
        for ans in answers:
            # remove whitespace on both sides of / ? . ! ; _ -
            new_hyp = re.sub(r'\s([/?.!;_-])(\s|$)', r'\1', ans)
            # remove whitespace before a comma, keep one space after it
            new_hyp = re.sub(r'\s(,)(\s|$)', r'\1 ', new_hyp)
            # remove whitespace after $, keep one space before it
            new_hyp = re.sub(r'(^|\s)(\$)(\s|$)', r' \2', new_hyp)
            hyp.append(new_hyp.strip())
        return hyp

    for sentence in modify_answers(sentences):
        print(sentence)
Output
Tax pin number is 1-866-704-7388.
No, emu is not protected in Victoria.
Find is to lose as construct is to______.
$1,0 is equal to $1,0.
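Notes
The first substitution handles /?.!;_- by removing the whitespace on both sides of the symbol; - is placed last in the character class so that it is read as a literal -.
The second substitution handles , by removing the whitespace before it and keeping a single space after the ,.
The third substitution handles $ by removing the whitespace after it and keeping a single space before the $; the final strip() removes that space when $ starts the string.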
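The output above reads "to______." rather than the requested "to ______.", because the first substitution strips whitespace on both sides of _. The following is a minimal sketch of one possible adjustment, assuming _ should only lose the whitespace that follows it. The function name modify_answers_keep_gap is hypothetical and not part of the answer.

```python
import re

def modify_answers_keep_gap(answers):  # hypothetical name, for illustration only
    hyp = []
    for ans in answers:
        # strip whitespace on both sides of / ? . ! ; - (underscore handled below)
        new_hyp = re.sub(r'\s([/?.!;-])(\s|$)', r'\1', ans)
        # strip whitespace before a comma, keep one space after it
        new_hyp = re.sub(r'\s(,)(\s|$)', r'\1 ', new_hyp)
        # strip whitespace after $, keep one space before it
        new_hyp = re.sub(r'(^|\s)(\$)(\s|$)', r' \2', new_hyp)
        # strip whitespace only after an underscore, never before it
        new_hyp = re.sub(r'_\s+', '_', new_hyp)
        hyp.append(new_hyp.strip())
    return hyp

print(modify_answers_keep_gap(
    ["Find is to lose as construct is to _ _ _ _ _ _ ."]))
```

Handling _ in its own pass keeps the space between "to" and the first underscore while still collapsing the run of underscores.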
Replace the pattern r' (?=[-/?.!])|(?<=[-/$_]) ' with an empty string using re.sub:
>>> import re
>>> from pprint import pprint
>>>
>>> lst = ["Tax pin number is 1 - 866 - 704 - 7388 .",
...        "No , emu is not protected in Victoria .",
...        "Find is to lose as construct is to _ _ _ _ _ _ .",
...        "$ 1,0 is equal to $ 1,0 ."]
>>>
>>> def modify_answers(answers):
...     ptrn = re.compile(r' (?=[-/?.!])|(?<=[-/$_]) ')
...     return [ptrn.sub('', answer) for answer in answers]
...
>>>
>>> pprint(modify_answers(lst))
['Tax pin number is 1-866-704-7388.',
 'No , emu is not protected in Victoria.',
 'Find is to lose as construct is to ______.',
 '$1,0 is equal to $1,0.']
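In the output above the second sentence still reads "No , emu", because , is not in the lookahead character class. The following is a minimal sketch of a variant that adds , and ; to that class, on the assumption that both should lose the space before them, as the question's comments suggest. The names PTRN and strip_around_punctuation are hypothetical.

```python
import re

# A space followed by one of - / ? . ! , ;  OR  a space preceded by one of - / $ _
PTRN = re.compile(r' (?=[-/?.!,;])|(?<=[-/$_]) ')

def strip_around_punctuation(answers):  # hypothetical helper name
    # Lookarounds match only the space itself, so replacing with '' keeps the symbols.
    return [PTRN.sub('', answer) for answer in answers]

results = strip_around_punctuation([
    "Tax pin number is 1 - 866 - 704 - 7388 .",
    "No , emu is not protected in Victoria .",
    "Find is to lose as construct is to _ _ _ _ _ _ .",
    "$ 1,0 is equal to $ 1,0 .",
])
for line in results:
    print(line)
```

Because the pattern matches a single literal space, runs of several spaces or tabs would need ' +' or \s+ in place of the space; that is left out here to stay close to the pattern in the answer.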
';' should be a part of the re.compile() statement, right? – Wasi Ahmad, Sep 7 '18 at 21:31