Is there a way to put a blank entry in place of a node value for the classes that don't contain that node, with Scrapy and Python?
I'm using Python and Scrapy to pull information from an online database of companies. Each company's information is completely contained in a parent node, but not every company has a child node containing its website; some only have the company name. This means that when I pull the data with XPath I get 20 company names but only 18 web addresses per page. When I zip the two lists and export them, I only get the first 18 companies, and the websites don't match the names. Is there a way to insert a blank entry into the website list for the companies that don't have the website node as one of their child nodes?
Thank you
<div class="company">
<p class="website">
www.company.co.uk</p>
...
</div>
<div class="company">
...
</div>
From the above, when I run
xpath('//div[@class="company"]/p/text()')
ideally I'd get ['www.company.co.uk','']
with a blank entry for the second company node, since it doesn't have a p node for the website. Instead, XPath returns a longer list of company names than websites, so the lists don't line up when I zip them together.
1 Answer
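Please attach some code so people can better understand your problem.
Overall, you should follow this pattern: select the company nodes first, then query each one with XPaths relative to that node, so every company yields exactly one title and one website value.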
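companies = response.xpath('//...some xpath here')
for company in companies:
    item = {}
    # Relative XPaths (leading "./") are evaluated against this company node only.
    item['title'] = company.xpath('./...some title xpath here relative to company node').extract_first()
    # extract_first() returns None when the node is missing; default='' gives a blank entry instead.
    item['website'] = company.xpath('./...some website xpath').extract_first(default='')
    yield item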
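To see the per-node pattern work without running a crawl, here is a minimal sketch that uses the standard library's xml.etree.ElementTree in place of Scrapy selectors. The sample markup is modelled on the question; the wrapping root element, the "name" class, and the extract_companies helper are assumptions made for illustration only. In a real spider the same loop would call company.xpath(...) on each selected node instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup based on the question. The <root> wrapper and
# the "name" class are assumptions so the snippet parses and has two fields.
SAMPLE = """
<root>
  <div class="company">
    <p class="name">Company A</p>
    <p class="website">
www.company.co.uk</p>
  </div>
  <div class="company">
    <p class="name">Company B</p>
  </div>
</root>
"""


def extract_companies(markup):
    """Return one dict per company, with '' for any missing child node."""
    root = ET.fromstring(markup)
    rows = []
    # Select each parent node first, then query its children relative to it.
    for company in root.findall(".//div[@class='company']"):
        # findtext() falls back to the default when no matching child exists.
        name = company.findtext("p[@class='name']", default="").strip()
        website = company.findtext("p[@class='website']", default="").strip()
        rows.append({"name": name, "website": website})
    return rows


rows = extract_companies(SAMPLE)
for row in rows:
    print(row)
```

Because each company produces exactly one row, the names and websites stay aligned and there is no need to zip two separately extracted lists.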
Code snippet please?
– Michael Savchenko
Aug 23 at 15:05