ignoring newline character in regex match
I am trying to convert all matching occurrences to title case using the following script. When there is a newline character between the filter words (in this case 'ABC' and 'DEF'), that line doesn't get replaced as intended.
How can I ignore the newline character in this case?
Edit: I don't want to strip all newline characters entirely from the string, but only strip those between the filter words.
Edit2: I edited the text and script to better reflect the issue I am experiencing. If I include the flags=re.DOTALL argument, it gives me:
mmm = "Hello Hello Hello Hello Hello Hello
Hello Hello Hello Hello",
Bbb = "Bbb",
whereas the output I want is (notice that bbb is not capitalized):
mmm = "Hello Hello Hello Hello Hello Hello
Hello Hello Hello Hello",
bbb = "bbb",
The following is the script I am using.
import re
test_string = '''
mmm = "hello hello hello hello hello hello
hello hello hello hello",
bbb = "bbb",
'''
# start matching right after "mmm" and stop before a double quote
rex = r'(?<=mmm)(.*)(?=")'
def maketitle(match_obj):
    # title-case the whole matched text
    return match_obj.group(0).title()
formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)
print(formatted)
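The overshoot can be reproduced with re.search: with re.DOTALL the greedy .* may cross newlines, so it backtracks only as far as the last " in the whole string and swallows the bbb line. One possible alternative, sketched below, is a lazy .*? with a lookahead for ", as the end of the value; that lookahead is an assumption based on this sample, where every value ends in ",.

```python
import re

test_string = '''
mmm = "hello hello hello hello hello hello
hello hello hello hello",
bbb = "bbb",
'''

def maketitle(match_obj):
    # Title-case the whole matched text.
    return match_obj.group(0).title()

# Greedy '.*' with re.DOTALL crosses newlines and stops only at the LAST '"'
# in the string, so the match also covers the bbb line.
greedy = re.search(r'(?<=mmm)(.*)(?=")', test_string, flags=re.DOTALL)
print(repr(greedy.group(0)))

# Assumed alternative: lazy '.*?' stops at the FIRST '"' that is followed by
# ',' (the end of the mmm value in this sample), so bbb is left alone.
lazy_rex = r'(?<=mmm)(.*?)(?=",)'
formatted = re.sub(lazy_rex, maketitle, test_string, flags=re.DOTALL)
print(formatted)
```

The lazy version breaks as soon as a value contains ", itself, so it only fits input shaped like the sample.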
re.sub typically will replace everything if it can. So we need to see your code to understand what might be going wrong.
– Tim Biegeleisen
Sep 6 '18 at 6:01
I found out that the newline character is causing the problem, so I updated the question accordingly.
– Layray
Sep 6 '18 at 6:19
I thought if DOTALL is not set, then newlines are ignored by default?
– Martin Thoma
Sep 6 '18 at 6:25
2 Answers
The following code gives the result you expect:
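import re
test_string = '''
mmm = "hello hello hello hello hello hello
hello hello hello hello",
bbb = "bbb",
'''
# match ' = "' after "mmm", then everything up to (not including) the closing quote
rex = r'(?<=mmm)\s*=\s*"[^"]*'
def maketitle(match_obj):
    return match_obj.group(0).title()
# no re.DOTALL needed: the negated class [^"] already matches newlines
formatted = re.sub(rex, maketitle, test_string)
print(formatted)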
I'm assuming that the value you want to title-case is always between double quotes and that it cannot itself contain a double quote (escaped in some way). Handling escaped quotes would be possible with a slightly more complex regex, though.
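A sketch of that more complex regex, assuming quotes inside the value are escaped with a backslash (\"). The input text is hypothetical, and the lookbehind is written as (?<=mmm) because this sample has no space before mmm. The escaped-quote pattern is shown next to the simple [^"]* pattern for comparison.

```python
import re

def maketitle(match_obj):
    return match_obj.group(0).title()

# Hypothetical input: the mmm value contains escaped quotes (\") and a newline.
text = 'mmm = "say \\"hi\\" there\nand bye",\nbbb = "bbb",\n'

# Simple version: [^"]* stops at the first '"', even an escaped one.
simple = re.sub(r'(?<=mmm)\s*=\s*"[^"]*', maketitle, text)

# Escape-aware version: each step consumes either a character that is not
# '"' or '\', or a backslash plus the character it escapes. re.DOTALL lets
# '\\.' also consume a backslash followed by a newline.
escaped = re.sub(r'(?<=mmm)\s*=\s*"(?:[^"\\]|\\.)*', maketitle, text,
                 flags=re.DOTALL)

print(simple)
print(escaped)
```

Only the escape-aware version title-cases the words after the first \".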
Use the re.DOTALL flag:
formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)
print(formatted)
According to the docs:
re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
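A small illustration of the documented behaviour. re.DOTALL changes only what . matches; a negated class such as [^"] matches newlines with or without the flag, which is why the [^"]* answer above needs no flag at all.

```python
import re

text = "first line\nsecond line"

# Without re.DOTALL, '.' does not match '\n', so '.*' stops at the end of line 1.
without_flag = re.match(r'.*', text).group(0)
# With re.DOTALL, '.' also matches '\n', so '.*' runs to the end of the string.
with_flag = re.match(r'.*', text, flags=re.DOTALL).group(0)
# A negated character class matches '\n' regardless of the flag.
negated = re.match(r'[^"]*', 'a\nb"c').group(0)

print(repr(without_flag))  # 'first line'
print(repr(with_flag))     # 'first line\nsecond line'
print(repr(negated))       # 'a\nb'
```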
Thanks for your suggestion, but including the flag capitalizes the following words as well. Please refer to the updated example.
– Layray
Sep 6 '18 at 6:41
Please create a Minimal, Complete, and Verifiable example working on several examples.
– Yunnosch
Sep 6 '18 at 6:01