Python - BeautifulSoup | ValueError: Unsupported or invalid CSS selector: “<div”

I am trying to scrape an application from the fareham.gov.uk web page, but every time I try, it returns an error instead of the reference number. Can somebody help me fix this problem? I am new to web scraping, and none of the fixes I found by googling worked.

Error:

Traceback (most recent call last):
  File "C:\Users\DBaldwin\Desktop\sel.py", line 39, in <module>
    div = soup.select('<div Class="docGridRow"><div Class="detailsCells detailsFieldNames">Reference</div><div Class="detailsCells detailsValues">')
  File "C:\Users\DBaldwin\Anaconda3\lib\site-packages\bs4\element.py", line 1477, in select
    'Unsupported or invalid CSS selector: "%s"' % token)
ValueError: Unsupported or invalid CSS selector: "<div"

Code:

import time
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://www.fareham.gov.uk/casetrackerplanning/applicationsearch.aspx"
driver = webdriver.Chrome(executable_path=r"C:\Users\DBaldwin\Desktop\chromedriver.exe")
driver.get(url)
driver.find_element_by_id("lnkAllowCookies").click()

def rerun():
    driver.find_element_by_id("BodyPlaceHolder_uxLinkButtonShowAdvancedSearch").click()
    time.sleep(3)
    driver.find_element_by_id("uxStartDateDecisionTextBox").click()
    driver.find_element_by_id("uxStartDateDecisionTextBox").clear()
    driver.find_element_by_id("uxStartDateDecisionTextBox").send_keys("1/8/2018")
    driver.find_element_by_id("uxStopDateDecisionTextBox").click()
    driver.find_element_by_id("uxStopDateDecisionTextBox").clear()
    driver.find_element_by_id("uxStopDateDecisionTextBox").send_keys("308/2018")
    driver.find_element_by_id("BodyPlaceHolder_uxButtonSearch").click()
    time.sleep(3)

rerun()
elements = driver.find_elements_by_class_name("searchResultsCell")
for e in elements:
    e.click()
    newUrl = driver.current_url
    go = urllib.request.urlopen(newUrl)
    soup = BeautifulSoup(go.read(), "html.parser")
    div = soup.select('<div Class="docGridRow"><div Class="detailsCells detailsFieldNames">Reference</div><div Class="detailsCells detailsValues">')
    test = div[0].get_text()
    print(test)
    driver.back()
    rerun()
print("Worked???")

Have you looked at the bs4 documentation? select takes CSS selector syntax, not HTML.

– Chillie
Sep 14 '18 at 9:12

Then how can I fix it?

– Feitan Portor
Sep 14 '18 at 9:13

I can't really understand what you are trying to do here. What do you want to select?

– Daniel Roseman
Sep 14 '18 at 9:13

div = soup.select('div.docGridRow')?

– Andersson
Sep 14 '18 at 9:14

Daniel Roseman, I am trying to scrape the reference number of one of the applications on that website. If you go to the link in the variable "url", it will ask for a date. Type in 1/8/2018 as the start and 30/8/2018 as the end, click the first application, and you will see a reference ID and other details. That is what I am trying to scrape.

– Feitan Portor
Sep 14 '18 at 9:15

1 Answer

Try the code below to get the required values:

elements = driver.find_elements_by_css_selector(".searchResultsCell a")
links = [link.get_attribute('href') for link in elements]
for link in links:
    driver.get(link)
    print(driver.find_element_by_css_selector('div.docGridRow').text)
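
If you would rather keep BeautifulSoup, the same idea applies: pass select a CSS selector instead of an HTML fragment. The sketch below is only an illustration, assuming the details page is laid out like the markup quoted in the question. The inline html string, the get_detail helper, and the values P/18/0001/FP and Decided are made up for the example and are not taken from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mirroring the markup quoted in the question;
# the real page may differ, and the values below are invented.
html = """
<div class="docGridRow">
  <div class="detailsCells detailsFieldNames">Reference</div>
  <div class="detailsCells detailsValues">P/18/0001/FP</div>
</div>
<div class="docGridRow">
  <div class="detailsCells detailsFieldNames">Status</div>
  <div class="detailsCells detailsValues">Decided</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def get_detail(soup, field_name):
    """Return the value shown next to field_name, or None if it is absent."""
    # "div.docGridRow" is a CSS selector: a div element with class docGridRow.
    for row in soup.select("div.docGridRow"):
        name = row.select_one("div.detailsFieldNames")
        value = row.select_one("div.detailsValues")
        if name and value and name.get_text(strip=True) == field_name:
            return value.get_text(strip=True)
    return None


reference = get_detail(soup, "Reference")
print(reference)
```

Inside the Selenium loop you could probably build the soup from driver.page_source rather than fetching the URL again with urllib, so that BeautifulSoup parses the same HTML the browser rendered.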
