Targeting URLs with parameters
I want to grab the URL with the highest pg value:
$html ='
<a href="http://example.com/?pg=1"></a>
<a href="http://example.com/?pg=2"></a>
<a href="http://example.com/?pg=3"></a>
';
I use this regex to locate the appropriate links:
preg_match_all('/<a.*href="\.\/\?pg=(\d+)".*>(?:.*)<\/a>/U', $html, $preg_matches);
Sometimes, the links include another parameter:
http://example.com/?pg=3&test=1
My question is, how do I adjust my regex so links with the added parameters are included as well?
@WiktorStribiżew No, this question is targeting URLs with multiple parameters.
– Henrik Petterson
Aug 27 at 14:12
\. matches a dot. You must match any chars other than " with [^"].
– Wiktor Stribiżew
Aug 27 at 14:12
@WiktorStribiżew It does not include URLs with multiple parameters. Try adding <a href="http://example.com/?pg=4&test=1">a</a> to the $html variable and you will see.
– Henrik Petterson
Aug 27 at 14:13
@WiktorStribiżew Can you please post an answer to demonstrate this? Thanks.
– Henrik Petterson
Aug 27 at 14:15
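The [^"] suggestion from the comments could look roughly like the sketch below. It is only one possible reading of that advice, not the commenter's own code: the ~ delimiter, the [?&] character class (so pg may come first or follow another parameter), and the variable names $pattern and $matches are all assumptions made for illustration.

```php
<?php
// Sample markup from the question, plus one link with an extra parameter.
$html = '
<a href="http://example.com/?pg=1"></a>
<a href="http://example.com/?pg=2"></a>
<a href="http://example.com/?pg=4&test=1">a</a>
';

// [^"]*  any run of characters that stays inside the quoted href value
// [?&]   pg may be the first parameter or follow another one
// (\d+)  captures the page number
// Using ~ as the delimiter means the slashes need no escaping.
$pattern = '~<a\b[^>]*\bhref="([^"]*[?&]pg=(\d+)[^"]*)"[^>]*>~';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the full URLs, $matches[2] the pg values as strings.
print_r($matches[1]);
print_r($matches[2]);
```

Because [^"]* cannot cross the closing quote, anything after pg=N inside the same href (such as &test=1) is still part of the match.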
2 Answers
Example:
$html = '
<a href="http://example.com/?pg=1"></a>
<a href="http://example.com/?pg=2"></a>
<a href="http://example.com/?pg=3"></a>
';
// Parse the markup with DOMDocument instead of a regex.
$dom = new DOMDocument;
$dom->loadHTML($html);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $anchor) {
    $url = $anchor->getAttribute('href');
    // Extract the query string and split it into an array of parameters.
    $query = parse_url($url, PHP_URL_QUERY);
    parse_str($query, $output);
    $pg = $output['pg'];
    // do something
}
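One way to fill in the "do something" step for the original goal (finding the URL with the highest pg) is sketched below. The helper name highestPgUrl() is hypothetical, and silencing libxml warnings with libxml_use_internal_errors() is an assumption about how you would want to handle HTML fragments; adapt both as needed.

```php
<?php
// highestPgUrl() is a hypothetical helper name used only for this sketch.
function highestPgUrl(string $html): ?string
{
    $dom = new DOMDocument;
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bestUrl = null;
    $bestPg  = -1;
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $url   = $anchor->getAttribute('href');
        $query = parse_url($url, PHP_URL_QUERY);
        if (!is_string($query)) {
            continue; // no query string, or a URL that could not be parsed
        }
        parse_str($query, $params);
        // Skip links that have no purely numeric pg parameter.
        if (!isset($params['pg']) || !is_string($params['pg']) || !ctype_digit($params['pg'])) {
            continue;
        }
        // Compare as integers so that 10 ranks above 9.
        if ((int) $params['pg'] > $bestPg) {
            $bestPg  = (int) $params['pg'];
            $bestUrl = $url;
        }
    }
    return $bestUrl; // null when no link carries a pg parameter
}

$html = '
<a href="http://example.com/?pg=1"></a>
<a href="http://example.com/?pg=2"></a>
<a href="http://example.com/?pg=3"></a>
';
echo highestPgUrl($html), PHP_EOL; // http://example.com/?pg=3
```

Since parse_str() splits the whole query string, additional parameters next to pg do not need any special handling here.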
Here's a helpful tutorial on parsing HTML with PHP: http://htmlparsing.com/php.html
Also see why you should not use regex for parsing HTML: https://stackoverflow.com/a/1732454/81785
Thank you for that example code! =)
– Henrik Petterson
Aug 27 at 14:32
$html ='
<a href="http://example.com/?pg=1"></a>
<a href="http://example.com/?pg=2"></a>
<a href="http://example.com/?pg=4&test=1"></a>
';
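// Capture every href value; the slash in the closing tag is escaped because / is the delimiter.
preg_match_all('/<a[^>]+href="(.*?)"[^>]*>(.*)?<\/a>/', $html, $out);
$result = [];
foreach ($out[1] as $link) {
    // Split the query string into parameters and keep the pg value for each link.
    parse_str(parse_url($link, PHP_URL_QUERY), $atr);
    $result[$link] = $atr['pg'];
}
print_r($result);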
// "http://example.com/?pg=1" => "1"
// "http://example.com/?pg=2" => "2"
// "http://example.com/?pg=4&test=1" => "4"
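To get from that array to the single URL with the highest pg, something along these lines should work. It is a sketch that hardcodes the $result array shown in the output above; the names $pages and $highestUrl are made up for illustration.

```php
<?php
// $result as produced by the loop above: URL => pg value (strings from parse_str).
$result = [
    'http://example.com/?pg=1'        => '1',
    'http://example.com/?pg=2'        => '2',
    'http://example.com/?pg=4&test=1' => '4',
];

// Cast the values to int so the comparison is numeric; with a single input
// array, array_map() keeps the URL keys.
$pages = array_map('intval', $result);

// max() finds the largest pg, array_search() returns the URL it belongs to.
$highestUrl = array_search(max($pages), $pages, true);

echo $highestUrl, PHP_EOL; // http://example.com/?pg=4&test=1
```

Note that max() needs a non-empty array, so check that $result has entries before running this on real input.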
You have already asked it here, isn't that the same question?
– Wiktor Stribiżew
Aug 27 at 14:11