Python web scraping query



I have written my first ever Python code to scrape a dividend history table from the web, but the soup.select statement doesn't seem to select anything and raises an IndexError.



Any advice on how to resolve this, please?


from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

driver = webdriver.Chrome(executable_path='F:PythonAppsChromeDriverChromeDriver.exe')
driver.get("https://www.dividendchannel.com/history/?symbol=ibm")
soup = BeautifulSoup(driver.page_source,"lxml")
driver.quit()
table = soup.select("table#Dividend History")[0]
print(table)
list_row = [[tab_d.text.strip().replace("\n", "") for tab_d in item.select('th,td')]
            for item in table.select('tr')]

for data in list_row[:2]:
    print(' '.join(data))



  File "F:/System/Python/dividend.py", line 9, in <module>
    table = soup.select("table#Dividend History")[0]
IndexError: list index out of range





If you mean you need to parse the whole table, with each date and dividend column, then you need to select each row separately and extract the text from it.
– Vishal Singh
Aug 22 at 17:22





That means nothing on the page matches the selector you are using. You might want to change your selector to: #divvytable > table
– hpca01
Aug 22 at 19:28






This kind of error will pop up when the selection doesn't have any data in it. It looks like "table#Dividend History" is not a valid CSS selector for that page. The table you want is nested under "div#divvytable". Try starting there.
– dustintheglass
Aug 22 at 20:30





2 Answers



This is not a direct answer, but a recommendation. Depending on what you need it for, note that the website you referenced limits usage by IP address: it can only be accessed 6 times.
Take a look at the IEX API, a dividend API that is free (I am not advertising it).



If you choose to use it, it might make your application that much more efficient. It is much easier to work with JSON data, which you can then convert to a pandas DataFrame or post to a front end via JavaScript.



Here is a sample call for Apple (aapl) dividends over the last 5 years:



https://api.iextrading.com/1.0/stock/aapl/dividends/5y



You would use requests.get(url, params).json() and traverse it through a simple for loop.
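As a rough sketch of that request-and-loop pattern: the URL shape is copied from the sample call above and may no longer be served, and the key names "exDate" and "amount" are assumptions about the payload rather than something confirmed here. The helper names fetch_dividends and summarize are hypothetical, and the sample list is made up so that the loop can be tried offline.

```python
def fetch_dividends(symbol, period="5y"):
    """Download dividend records for one ticker as a list of dicts."""
    # Imported here so the parsing helper below also works without requests installed.
    import requests

    url = "https://api.iextrading.com/1.0/stock/{}/dividends/{}".format(symbol, period)
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return response.json()


def summarize(records):
    """Turn the decoded JSON list into (date, amount) tuples."""
    rows = []
    for record in records:
        # "exDate" and "amount" are assumed key names; adjust them to the real payload.
        rows.append((record.get("exDate"), float(record.get("amount", 0))))
    return rows


# Made-up records shaped like the assumed response; not real dividend data.
sample = [
    {"exDate": "2000-01-01", "amount": 1.0},
    {"exDate": "2000-04-01", "amount": 1.5},
]

for ex_date, amount in summarize(sample):
    print("date: {} amount: {:.2f}".format(ex_date, amount))
```

With a live endpoint, summarize(fetch_dividends("aapl")) would replace the sample list.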



It seems that the layout of this page is based on tables, lots of tables. Your selector "table#Dividend History" looks for a History element inside a table with an id of "Dividend", and no such table exists on the page.



Here is your code after some tweaks. It finds the rows with data, and then extracts data from the rows:


from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

driver = webdriver.Chrome()
driver.get("https://www.dividendchannel.com/history/?symbol=ibm")

soup = BeautifulSoup(driver.page_source,"lxml")
driver.quit()

dividend_rows = soup.select("div#divvytable")[0].find_all("tr")

for row in dividend_rows:
    # Each data row holds exactly two strings: the date and the amount
    columns = list(row.stripped_strings)
    if len(columns) != 2:
        continue
    print("date: {} amount: {}".format(columns[0], columns[1]))
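If the rows are needed later rather than just printed, the same two-column filter can collect them into a list. The following is only a sketch: clean_rows is a hypothetical helper, the input rows are made up, and it assumes the amount cell is plain number text, possibly with a "$" or thousands separator. Rows whose second cell is not a number, such as a header row, are skipped.

```python
def clean_rows(raw_rows):
    """Keep rows shaped like [date, amount] and convert the amount to float."""
    cleaned = []
    for columns in raw_rows:
        if len(columns) != 2:
            continue  # same filter as the loop above
        date_text, amount_text = columns
        try:
            amount = float(amount_text.replace("$", "").replace(",", ""))
        except ValueError:
            continue  # not a number, e.g. a header row
        cleaned.append((date_text, amount))
    return cleaned


# Made-up rows standing in for list(row.stripped_strings) of each table row.
raw_rows = [
    ["Date", "Amount"],
    ["01/02/2000", "$1.50"],
    ["one", "two", "three"],
    ["04/03/2000", "1.25"],
]

for date_text, amount in clean_rows(raw_rows):
    print("date: {} amount: {}".format(date_text, amount))
```

The resulting list of tuples can be passed to pandas.DataFrame(cleaned, columns=["date", "amount"]) if a DataFrame is the end goal.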





Cheers. I really need to spend some time working out html structure! One further question - why was the google bot option necessary please?
– KitsuneMakai
Aug 22 at 21:21





@KitsuneMakai you are right, this part is not needed. I have removed it.
– user44
Aug 23 at 4:06





I also need to extract stock splits from the sister site, but it doesn't seem to have any id attributes on the relevant table.
– KitsuneMakai
Aug 24 at 21:14





