How to scrape a specific table from a website using Python (beautifulsoup4 and requests or any other library)?




https://en.wikipedia.org/wiki/Economy_of_the_European_Union



Above is the link to the website. I want to scrape the table titled: Fortune top 10 E.U. corporations by revenue (2016).





Please share the code to do this:


import requests
from bs4 import BeautifulSoup

def web_crawler(url):
    page = requests.get(url)
    plain_text = page.text
    soup = BeautifulSoup(plain_text,"html.parser")
    tables = soup.findAll("tbody")[1]
    print(tables)

soup = web_crawler("https://en.wikipedia.org/wiki/Economy_of_the_European_Union")






You need to read stackoverflow.com/help/how-to-ask before asking a question. We're here to help, not teach. Please add the code you've already attempted and the issue that comes up when you run it. I would be happy to help you at that point.

– FanMan
Sep 10 '18 at 14:22







@FanMan Sorry for the trouble of not including the code; I am new to Stack Overflow. Anyway, I didn't catch what you meant by your answer. Basically, I am looking to fetch the table and its contents. The Wikipedia link I provided has several tables, and I only want to fetch the one with the title "Fortune top 10 E.U. corporations by revenue (2016)".

– Kali
Sep 10 '18 at 17:13






@FanMan Furthermore, in the for loop in your answer you took the variable text and called text.findAll inside the loop. I don't know why, but in my PyCharm this doesn't work: I can call findAll on soup (the BeautifulSoup variable) but not on text (an element taken from soup).

– Kali
Sep 10 '18 at 17:17






I have added my answer. The answer you were referring to was not mine.

– FanMan
Sep 10 '18 at 18:39




2 Answers



Following what @FanMan said, this is simple code to help you get started. Keep in mind that you will need to clean it up and do the rest of the work on your own.


import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Economy_of_the_European_Union'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
temp_datastore = list()
# Collect the text fragments of every <p> element on the page
for text in soup.findAll('p'):
    w = text.findAll(text=True)
    if len(w) > 0:
        temp_datastore.append(w)
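The loop above calls findAll on each tag returned by an earlier findAll. The same nested pattern can be applied to a table: find the rows, then the cells inside each row. The following is only a minimal sketch; SAMPLE_HTML and table_to_rows are made-up names for illustration, and the inline markup stands in for a downloaded page so the example runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Rank</th><th>Company</th></tr>
  <tr><td>1</td><td>Alpha</td></tr>
  <tr><td>2</td><td>Beta</td></tr>
</table>
"""


def table_to_rows(table):
    """Return the table as a list of rows, each a list of cell strings."""
    rows = []
    # find_all can be called on any tag, not only on the soup object
    for tr in table.find_all("tr"):
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = table_to_rows(soup.find("table"))
for row in rows:
    print(row)
```

With a real page, the soup built from r.text would take the place of the sample markup.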



Some documentation



Beautiful Soup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/



requests: http://docs.python-requests.org/en/master/user/intro/



urllib: https://docs.python.org/2/library/urllib.html



Your first issue is that your url is not properly defined. After that, you need to find the table to extract and its class. In this case the class was "wikitable" and it was the first such table. I have started your code for you so that it gives you the extracted data from the table. Web scraping is good to learn, but if you are just starting to program, practice with some simpler stuff first.


import requests
from bs4 import BeautifulSoup

def webcrawler():
    url = "https://en.wikipedia.org/wiki/Economy_of_the_European_Union"
    page = requests.get(url)
    soup = BeautifulSoup(page.text,"html.parser")
    tables = soup.findAll("table", class_='wikitable')[0]
    print(tables)

webcrawler()
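Picking the table by index ([0]) depends on the order of tables on the page. Since the question asks for the table with a specific title, one alternative is to match on that title instead. This is a sketch under an assumption I have not verified against the live page: that the title appears either in the table's caption element or in its first row. find_table_by_title and SAMPLE_HTML are hypothetical names, and the inline markup is a stand-in for the real page.

```python
from bs4 import BeautifulSoup

TITLE = "Fortune top 10 E.U. corporations by revenue (2016)"

# Hypothetical sample markup; the real page layout may differ.
SAMPLE_HTML = """
<table class="wikitable">
  <caption>Some other table</caption>
  <tr><th>A</th></tr>
</table>
<table class="wikitable">
  <caption>Fortune top 10 E.U. corporations by revenue (2016)</caption>
  <tr><th>Rank</th><th>Corporation</th></tr>
</table>
"""


def find_table_by_title(soup, title):
    """Return the first wikitable whose caption or first row contains title."""
    for table in soup.find_all("table", class_="wikitable"):
        # Assumption: the title lives in <caption> or in the first <tr>
        candidates = [table.find("caption"), table.find("tr")]
        texts = [tag.get_text(" ", strip=True) for tag in candidates if tag is not None]
        if any(title in text for text in texts):
            return table
    return None  # no matching table found


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_by_title(soup, TITLE)
print(table)
```

If the function returns None on the real page, the title is probably stored elsewhere (for example in a heading before the table), and the lookup would need to be adjusted.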


