Python - Beautifulsoup | ValueError: Unsupported or invalid CSS selector: “<div”




I am trying to scrape an application from the fareham.gov.uk web page, but every time I run my script it raises an error instead of printing the reference number. Can somebody help me fix this? I am new to web scraping, and none of the fixes I found by searching have worked.



Error:


Traceback (most recent call last):
  File "C:\Users\DBaldwin\Desktop\sel.py", line 39, in <module>
    div = soup.select('<div Class="docGridRow"><div Class="detailsCells detailsFieldNames">Reference</div><div Class="detailsCells detailsValues">')
  File "C:\Users\DBaldwin\Anaconda3\lib\site-packages\bs4\element.py", line 1477, in select
    'Unsupported or invalid CSS selector: "%s"' % token)
ValueError: Unsupported or invalid CSS selector: "<div"



Code:


import time
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://www.fareham.gov.uk/casetrackerplanning/applicationsearch.aspx"

driver = webdriver.Chrome(executable_path=r"C:\Users\DBaldwin\Desktop\chromedriver.exe")
driver.get(url)

driver.find_element_by_id("lnkAllowCookies").click()

def rerun():
    driver.find_element_by_id("BodyPlaceHolder_uxLinkButtonShowAdvancedSearch").click()

    time.sleep(3)

    driver.find_element_by_id("uxStartDateDecisionTextBox").click()
    driver.find_element_by_id("uxStartDateDecisionTextBox").clear()
    driver.find_element_by_id("uxStartDateDecisionTextBox").send_keys("1/8/2018")

    driver.find_element_by_id("uxStopDateDecisionTextBox").click()
    driver.find_element_by_id("uxStopDateDecisionTextBox").clear()
    driver.find_element_by_id("uxStopDateDecisionTextBox").send_keys("30/8/2018")

    driver.find_element_by_id("BodyPlaceHolder_uxButtonSearch").click()

    time.sleep(3)

rerun()

elements = driver.find_elements_by_class_name("searchResultsCell")

for e in elements:
    e.click()
    newUrl = driver.current_url
    go = urllib.request.urlopen(newUrl)
    soup = BeautifulSoup(go.read(), "html.parser")
    div = soup.select('<div Class="docGridRow"><div Class="detailsCells detailsFieldNames">Reference</div><div Class="detailsCells detailsValues">')
    test = div[0].get_text()
    print(test)
    driver.back()
    rerun()
print("Worked???")






Have you looked at the bs4 documentation? select is used with CSS selector syntax, not HTML.

– Chillie
Sep 14 '18 at 9:12
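To illustrate the comment: select() parses its argument as a CSS selector, so a fragment of HTML is rejected before any matching happens. The sketch below assumes BeautifulSoup 4.4 or later (for select_one) and uses hypothetical sample markup modelled on the fragment quoted in the question; the real Fareham page may differ, and EXAMPLE-REF-123 is a placeholder value, not real data. It shows the HTML string failing and then plain CSS selectors (tag.class) picking out the value that sits next to the "Reference" label.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the fragment quoted in the question;
# the live page's structure and values are assumptions.
SAMPLE_HTML = """
<div class="docGridRow">
  <div class="detailsCells detailsFieldNames">Reference</div>
  <div class="detailsCells detailsValues">EXAMPLE-REF-123</div>
</div>
<div class="docGridRow">
  <div class="detailsCells detailsFieldNames">Location</div>
  <div class="detailsCells detailsValues">Example Street</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def html_as_selector_fails():
    """Passing raw HTML to select() is rejected by the CSS selector parser."""
    try:
        soup.select('<div class="docGridRow">')
    except Exception:  # ValueError in old bs4; soupsieve's SelectorSyntaxError in bs4 >= 4.7
        return True
    return False


def get_field(field_name):
    """Return the value-cell text of the row whose label equals field_name."""
    # "div.docGridRow" = every <div> whose class list contains docGridRow
    for row in soup.select("div.docGridRow"):
        label = row.select_one("div.detailsFieldNames")
        value = row.select_one("div.detailsValues")
        if label and value and label.get_text(strip=True) == field_name:
            return value.get_text(strip=True)
    return None


print(html_as_selector_fails())  # True
print(get_field("Reference"))    # EXAMPLE-REF-123
```

Note that a multi-class attribute such as class="detailsCells detailsValues" is matched by either class name alone, so div.detailsValues is enough.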









Then how can I fix it?

– Feitan Portor
Sep 14 '18 at 9:13






I can't really understand what you are trying to do here. What do you want to select?

– Daniel Roseman
Sep 14 '18 at 9:13






div = soup.select('div.docGridRow')?

– Andersson
Sep 14 '18 at 9:14








Daniel Roseman, I am trying to scrape the reference number from one of the applications on that website. If you go to the link in the variable "url", it will ask for a date range. Type in 1/8/2018 as the start and 30/8/2018 as the end, then click the first application: you will see a reference ID and other details. That reference ID is what I am trying to scrape.

– Feitan Portor
Sep 14 '18 at 9:15





1 Answer



Try the code below to get the required values:


elements = driver.find_elements_by_css_selector(".searchResultsCell a")
links = [link.get_attribute('href') for link in elements]

for link in links:
    driver.get(link)
    print(driver.find_element_by_css_selector('div.docGridRow').text)
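If you still want BeautifulSoup in the loop, one option is to parse driver.page_source (the HTML the browser is currently showing) rather than re-downloading driver.current_url with urllib, which makes a separate request without the browser's cookies or session and so may not return the same page. The sketch below assumes driver is a Selenium WebDriver already on the search results page and that the detail page uses the docGridRow / detailsFieldNames / detailsValues markup quoted in the question; reference_from_source and scrape_references are hypothetical helper names. The string "css selector" is the value of Selenium's By.CSS_SELECTOR, so the block does not need to import Selenium itself.

```python
from bs4 import BeautifulSoup


def reference_from_source(page_source):
    """Find the 'Reference' label cell and return the text of the next <div> sibling."""
    soup = BeautifulSoup(page_source, "html.parser")
    for label in soup.select("div.docGridRow div.detailsFieldNames"):
        if label.get_text(strip=True) == "Reference":
            value = label.find_next_sibling("div")
            return value.get_text(strip=True) if value else None
    return None


def scrape_references(driver):
    """Visit each result link in the same browser session and parse its HTML.

    Assumes `driver` is a Selenium WebDriver showing the search results.
    """
    cells = driver.find_elements("css selector", ".searchResultsCell a")
    # Collect the hrefs first: the elements go stale once the driver navigates away
    links = [a.get_attribute("href") for a in cells]
    references = []
    for link in links:
        driver.get(link)
        # Parse what the browser rendered instead of refetching it with urllib
        references.append(reference_from_source(driver.page_source))
    return references
```

Calling scrape_references(driver) after rerun() would return one reference string (or None, if the label is not found) per result link, without needing driver.back() and a repeated search for every result.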


