bs4 scraping python get contents until specific class name
I want to scrape this page:
https://www.eduvision.edu.pk/institutions-detail.php?city=51I&institute=5_allama-iqbal-open-university-islamabad
I only want the bachelor data on this page, which sits under the elements with class academicsList, and I don't want the MS (MASTERS) data that follows it.
The scraper should stop before the MS data. My idea is to keep a temporary counter on the elements with class academicsHead and stop when the second academicsHead is reached.
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()
header = {'user-agent': ua.chrome}
response = requests.get('https://www.eduvision.edu.pk/institutions-detail.php?city=51I&institute=5_allama-iqbal-open-university-islamabad', headers=header)
soup = BeautifulSoup(response.content, 'html.parser')
disciplines = soup.findAll("ul", {"class": "academicsList"})
#temp = soup.findAll("ul", {"class": "academicsHead"})
#stop at second academicsHead
for d in disciplines:
    print(d.findAll('li')[0].text)
1 Answer
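We can select both the academicsHead and the academicsList elements in document order. For each academicsHead, check whether its text is BACHELOR and break out of the loop if it is not; for each academicsList, print its first item.
Something like this would work: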
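import re

# Match both academicsHead and academicsList, in document order
disciplines = soup.findAll('ul', attrs={'class': re.compile(r'academics+(.)+')})
for i in disciplines:
    if i['class'][0] == 'academicsHead':
        # Stop at the first heading that is not BACHELOR
        if i.find('li').text.strip() != 'BACHELOR':
            break
    else:
        print(i.find('li').text.strip())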
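The counter idea from the question can also be sketched directly. This is a minimal, offline example and not a tested scraper for the live site: SAMPLE_HTML is hypothetical markup that only mimics the structure described in the question, and items_before_second_head is a helper name made up for illustration. It also assumes the first academicsHead on the page is the BACHELOR heading.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described in the question;
# the real page may be laid out differently.
SAMPLE_HTML = """
<ul class="academicsHead"><li>BACHELOR</li></ul>
<ul class="academicsList"><li>BA</li></ul>
<ul class="academicsList"><li>BS Computer Science</li></ul>
<ul class="academicsHead"><li>MS (MASTERS)</li></ul>
<ul class="academicsList"><li>MS Physics</li></ul>
"""


def items_before_second_head(html):
    """Return the first <li> text of each academicsList seen before the second academicsHead."""
    soup = BeautifulSoup(html, "html.parser")
    heads_seen = 0
    items = []
    # Passing a list to class_ matches elements with either class, in document order.
    for ul in soup.find_all("ul", class_=["academicsHead", "academicsList"]):
        if "academicsHead" in ul.get("class", []):
            heads_seen += 1
            if heads_seen == 2:
                break  # second heading reached: stop before its lists
            continue
        first_li = ul.find("li")
        if first_li is not None:
            items.append(first_li.get_text(strip=True))
    return items


bachelor_items = items_before_second_head(SAMPLE_HTML)
print(bachelor_items)
```

Counting headings does not depend on the exact heading text, but it breaks if another heading appears before BACHELOR; checking the text, as in the loop above, is more robust in that case.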