RegEx - Searching for specific content in quotes
I know RegEx is NOT the most ideal tool for searching within HTML. However, it's what I'm given to work with. Note: I'm not looking for something that will be robust across websites. For example, I'm just considering quotation marks, and I'm not worried about apostrophe characters.
Suppose I have the following text:
The quick brown "fox.jpg" jumps "google.com" over the "lazy.png" dog.
I want to find only the image links, matching "fox.jpg" and "lazy.png" while ignoring "google.com". I could theoretically use a search pattern like
".*?"
that would find all quotes, from which I could simply parse each match to determine whether or not it's an image.
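As a rough illustration of that two-step idea, here is a minimal Python sketch. The choice of Python, the variable names, and the extension list are assumptions made for the example; the question does not say which regex engine is in use.

```python
import re

text = 'The quick brown "fox.jpg" jumps "google.com" over the "lazy.png" dog.'

# Step 1: grab every double-quoted run, capturing the content without the quotes.
quoted = re.findall(r'"(.*?)"', text)

# Step 2: keep only the entries that end in an image extension.
# The extension list is an assumption for illustration.
IMAGE_EXTENSIONS = ('.jpg', '.png')
images = [q for q in quoted if q.lower().endswith(IMAGE_EXTENSIONS)]

print(quoted)  # ['fox.jpg', 'google.com', 'lazy.png']
print(images)  # ['fox.jpg', 'lazy.png']
```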
But something like
".*?(jpg|png)"
doesn't work because it returns "fox.jpg" (good) and "google.com" over the "lazy.png" (bad).
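The behaviour described above can be reproduced with a short Python sketch (Python's `re` module is an assumption here; other backtracking engines should behave the same way for this pattern):

```python
import re

text = 'The quick brown "fox.jpg" jumps "google.com" over the "lazy.png" dog.'

# The lazy quantifier only minimises how far the match extends to the right.
# It does not move the starting quote, so the second match begins at the
# opening quote of "google.com" and runs until the first png" it can reach.
lazy_matches = [m.group(0) for m in re.finditer(r'".*?(jpg|png)"', text)]

for match in lazy_matches:
    print(match)
# "fox.jpg"
# "google.com" over the "lazy.png"
```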
So: is there an extra "greedy" setting that I'm missing? Something to tell RegEx that the first quotation mark of the match should be the quotation mark closest to the last quotation mark?
1 Answer
After the first ", try repeating anything but a ", via a negated character set, instead of . (which will, undesirably, match a "):
"[^"]*(jpg|png)"
https://regex101.com/r/PKZLp5/1
It no longer matters whether the repetition is lazy or greedy, though when the filename is longer than the file extension, greedy repetition will find a match slightly faster.
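For anyone who wants to check this outside regex101, here is a minimal Python sketch that applies the negated character set to the sample text from the question. Python is an assumption, since the thread does not name an engine, and the variable names are illustrative.

```python
import re

text = 'The quick brown "fox.jpg" jumps "google.com" over the "lazy.png" dog.'

# [^"]* can never cross a quotation mark, so a match cannot begin at one
# quoted string and end inside another.
greedy = re.compile(r'"[^"]*(jpg|png)"')
lazy = re.compile(r'"[^"]*?(jpg|png)"')

greedy_matches = [m.group(0) for m in greedy.finditer(text)]
lazy_matches = [m.group(0) for m in lazy.finditer(text)]

print(greedy_matches)  # ['"fox.jpg"', '"lazy.png"']
print(lazy_matches)    # ['"fox.jpg"', '"lazy.png"']
```

Both variants return the same two matches, so the choice between greedy and lazy here only affects how much work the engine does.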
This is perfect! I was playing with [^"] but I was still using .*?, which I think broke it. Thank you so much!!
– Matthew
Aug 24 at 1:16
.* is greedy, .*? is not. Otherwise your only match would be "fox.jpg" jumps "google.com" over the "lazy.png". "google.com" over the "lazy.png" does match the least number of characters. A regex engine always returns the leftmost match, even if a "better" match could be found later: regular-expressions.info/engine.html
– Lee Kowalkowski
Aug 24 at 1:16