Scrape a web page using jsoup
I need to scrape the postcode from the HTML below using jsoup. I only need the postcode, W2, which is part of the href attribute of the a tag:
<a href="/properties-for-sale/w2/chpk3848653" class="property_photo_holder" style="backgroundimage:url(https://assets.foxtons.co.uk/w/480/1523289105/chpk3848653-23.jpg)"></a>
This is the HTML code:
</div>
<div id="property_1062067" class="property_summary">
<h6><a href="/properties-for-sale/w2/chpk3848653">Lancaster Gate, <span class="property_address_location_name">Bayswater,</span> W2</a></h6>
Can anyone help?
Thank you.
java html parsing web-scraping jsoup
What do you mean by "I only need postcode which is W2"? Also, could you post what you have tried?
– Subhasish Bhattacharjee
Nov 11 '18 at 9:18
I was just trying to show exactly what data I want to scrape. Please see below:
– Hakan
Nov 11 '18 at 9:48
>Bayswater,</span> W2</a></h6>
– Hakan
Nov 11 '18 at 9:48
This is the code I tried:
– Hakan
Nov 11 '18 at 9:51
Elements postcodes = doc.select("span.property_address_location_name");
for (Element postcode : postcodes)
    System.out.println(postcode.text());
– Hakan
Nov 11 '18 at 9:51
edited Nov 11 '18 at 11:25
Dinko Pehar
asked Nov 11 '18 at 8:46
Hakan
1 Answer
You can use jsoup for that. You just need to retrieve the value of the href attribute, as follows:
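Document document = Jsoup.connect(URL).userAgent("Mozilla/5.0").get();
// Match only the property links, so that the first href is not an unrelated link
Elements elements = document.select("a[href^=/properties-for-sale/]");
// Elements.attr returns the value from the first matched element that has the attribute
String href = elements.attr("href");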
Now that you have the href value as a String, apply a regular expression to extract the part you want, in this case the postcode contained in "/properties-for-sale/w2/chpk3848653":
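// Capture the path segment that follows "/properties-for-sale/"
String regex = "/properties-for-sale/([a-zA-Z0-9]+)/";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(href);
// find() returns a boolean, so check it before reading the captured group
String postalCode = matcher.find() ? matcher.group(1) : null; // "w2" for the href above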
That's all. If you need anything else, feel free to ask. Hope this helps!
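As a self-contained illustration of the regex step, the sketch below wraps the extraction in a small helper that also copes with an href that has no postcode segment. The class name `PostcodeFromHref`, the method `extract`, and the assumption that the postcode is always the path segment right after `/properties-for-sale/` are illustrative; they are inferred from the single href shown in the question, not from any documentation of the site.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostcodeFromHref {

    // Captures the path segment that follows "/properties-for-sale/" (assumed URL layout).
    private static final Pattern POSTCODE =
            Pattern.compile("^/properties-for-sale/([A-Za-z0-9]+)/");

    // Returns the upper-cased postcode segment, or an empty Optional if the href does not match.
    public static Optional<String> extract(String href) {
        Matcher matcher = POSTCODE.matcher(href);
        // find() only reports whether a match exists; group(1) reads the captured segment.
        return matcher.find()
                ? Optional.of(matcher.group(1).toUpperCase(Locale.ROOT))
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("/properties-for-sale/w2/chpk3848653").orElse("no postcode"));
        System.out.println(extract("/about-us/").orElse("no postcode"));
    }
}
```

In practice the `href` argument would come from jsoup, for example from `element.attr("href")` on each property link.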
Something is wrong with this code. Thanks anyway.
– Hakan
Nov 13 '18 at 19:07
@Hakan No problem! Ask me if you need anything else; that was just sample code as a guide. +1 if you found it useful!
– alvarobartt
Nov 14 '18 at 8:35
This is how I scraped all the other attributes:
– Hakan
Nov 15 '18 at 10:52
//Get the location of property
Elements locations = items.get(i).getElementsByTag("h6");
//Get the post code of property
Elements postcodes = items.get(i).getElementsByTag("h6.a[href]");
//Get the longitude
Elements longitude = items.get(i).select("div");
– Hakan
Nov 15 '18 at 10:52
foxtons.co.uk/… This is the link to the page I am scraping.
– Hakan
Nov 15 '18 at 10:53
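One likely reason the `getElementsByTag("h6.a[href]")` call above finds nothing is that `getElementsByTag` expects a bare tag name, whereas CSS queries such as `h6 > a[href]` go through `select` or `selectFirst`. The following is a minimal sketch of a per-item lookup under that reading. It assumes each listing is wrapped in a `div.property_summary` as in the question's snippet; the class name `PropertyPostcodes` and the method `postcodes` are made up for illustration, and the HTML is a closed-off copy of the fragment from the question rather than the live page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PropertyPostcodes {

    // Collects the postcode segment of each property link, e.g. "W2".
    public static List<String> postcodes(Document doc) {
        List<String> result = new ArrayList<>();
        for (Element item : doc.select("div.property_summary")) {
            // selectFirst takes a CSS query and returns null when nothing matches.
            Element link = item.selectFirst("h6 > a[href]");
            if (link == null) {
                continue;
            }
            // "/properties-for-sale/w2/chpk3848653" splits into
            // ["", "properties-for-sale", "w2", "chpk3848653"].
            String[] parts = link.attr("href").split("/");
            if (parts.length >= 3 && parts[1].equals("properties-for-sale")) {
                result.add(parts[2].toUpperCase(Locale.ROOT));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fragment adapted from the question, with the outer div closed.
        String html = "<div id=\"property_1062067\" class=\"property_summary\">"
                + "<h6><a href=\"/properties-for-sale/w2/chpk3848653\">Lancaster Gate, "
                + "<span class=\"property_address_location_name\">Bayswater,</span> W2</a></h6>"
                + "</div>";
        System.out.println(postcodes(Jsoup.parse(html)));
    }
}
```

Splitting the path avoids a regular expression altogether, at the cost of relying on the postcode always being the second path segment.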
answered Nov 13 '18 at 13:06
alvarobartt