ignoring newline character in regex match



I am trying to convert all matching occurrences to title case using the following script. When there is a newline character between the filter words (in this case 'ABC' and 'DEF'), that line doesn't get replaced as intended.



How can I ignore the newline character in this case?



Edit: I don't want to strip all newline characters entirely from the string, but only strip those between the filter words.



Edit2: I edited the text and script to better reflect the issue I am experiencing. If I include the flags=re.DOTALL argument, the output is:


mmm = "Hello Hello Hello Hello Hello Hello
Hello Hello Hello Hello",
Bbb = "Bbb",



whereas the output I want is (notice that bbb is not capitalized):




mmm = "Hello Hello Hello Hello Hello Hello
Hello Hello Hello Hello",
bbb = "bbb",



The following is the script I am using.


import re

test_string = '''
mmm = "hello hello hello hello hello hello
hello hello hello hello",
bbb = "bbb",
'''

rex = r'(?<=mmm)(.*)(?=")'

def maketitle(match_obj):
    return match_obj.group(0).title()

formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)

print(formatted)






Please create a Minimal, Complete, and Verifiable example working on several examples.

– Yunnosch
Sep 6 '18 at 6:01






re.sub typically will replace everything if it can. So we need to see your code, to understand what might be going wrong.

– Tim Biegeleisen
Sep 6 '18 at 6:01








Found out that the newline character is causing the problem, so updated the question accordingly

– Layray
Sep 6 '18 at 6:19







I thought if DOTALL is not set, then newlines are ignored by default?

– Martin Thoma
Sep 6 '18 at 6:25




2 Answers



The following code gives the result you expect:


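import re

test_string = '''
mmm = "hello hello hello hello hello hello
hello hello hello hello",
bbb = "bbb",
'''

# Match from just after "mmm" through the opening quote and up to,
# but not including, the closing quote. [^"] also matches newlines.
rex = r'(?<=mmm)\s*=\s*"[^"]*'

def maketitle(match_obj):
    return match_obj.group(0).title()

formatted = re.sub(rex, maketitle, test_string)

print(formatted)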



I'm assuming that the value you want to "title-case" is always between double quotes, and that it can not contain a double-quote (escaped in some way). Handling escaping would be possible with a slightly more complex regex, though.
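As a rough illustration of the "slightly more complex regex" mentioned above, here is a minimal sketch that assumes quotes inside the value are escaped with a backslash. The escaping convention and the sample input are assumptions made for illustration; they are not part of the question.

```python
import re

# Hypothetical input: the quoted value contains backslash-escaped quotes.
test_string = r'''
mmm = "say \"hello\" to
the world",
bbb = "bbb",
'''

# After the opening quote, consume either a character that is neither a
# quote nor a backslash, or a backslash followed by any single character.
rex = r'(?<=mmm)\s*=\s*"(?:[^"\\]|\\.)*'

def maketitle(match_obj):
    return match_obj.group(0).title()

# DOTALL lets the "\\." branch also accept a backslash followed by a newline.
formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)

print(formatted)
```

Note that title() would also upper-case the letter in an escape such as \n, so this only suits values whose escapes are limited to quotes.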



Use the re.DOTALL flag:




formatted = re.sub(rex, maketitle, test_string, flags=re.DOTALL)
print(formatted)



According to the docs:



re.DOTALL

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
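With re.DOTALL, a greedy .* runs to the last double quote in the whole string, which is why later lines get title-cased too. One possible way around that, sketched here with a shortened version of the question's sample text, is to anchor on the opening quote and use a non-greedy .*? instead. The literal `mmm = "` in the lookbehind assumes exactly one space on each side of the equals sign.

```python
import re

test_string = '''
mmm = "hello hello
hello",
bbb = "bbb",
'''

def maketitle(match_obj):
    return match_obj.group(0).title()

# Greedy: with DOTALL, ".*" extends to the last '"' in the string,
# so the bbb line is title-cased as well.
greedy = re.sub(r'(?<=mmm)(.*)(?=")', maketitle, test_string, flags=re.DOTALL)

# Non-greedy: ".*?" stops at the first '"' after the opening quote,
# so only the value of mmm is changed, even across the newline.
lazy = re.sub(r'(?<=mmm = ")(.*?)(?=")', maketitle, test_string, flags=re.DOTALL)

print(greedy)
print(lazy)
```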








thanks for your comment, but including the arg will capitalize following words as well. Please refer to the updated example.

– Layray
Sep 6 '18 at 6:41


