I would like to exclude bold paragraphs from the website

I use the following code to scrape the website:

    import requests
    from bs4 import BeautifulSoup

    resp = requests.get('https://www.ecb.europa.eu/press/pressconf/2018/html/ecb.is180913.en.html')
    soup = BeautifulSoup(resp.content, 'html5lib')
    article = soup.find('article')
    paragraphs = article.find_all('p')

The output looks like this:
[<p>Based on our regular economic and monetary analyses, we decided to keep the <strong>key ECB interest rates</strong> unchanged. .... to levels that are below, but close to, 2% over the medium term.</p>,
<p><strong>Has QE been used well by the various euro area countries?</strong></p>,
<p>By and large, yes, it's been used well in the sense that the intended effects of the QE – mind, ... It reduced dispersion in growth rates everywhere. An employment situation which is by and large improving almost everywhere, some countries more than others. </p>,
<p>If your question is meant to say; shouldn't governments have taken advantage of the situation of such low rates to decrease budget deficits, to restore? ... is a good situation for doing that.</p>,
<p><strong>My second question is on reinvestment. ...Have you today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>,
<p>About inflation: I said inflation is going to hover around the present level for the rest of the year and then I gave numbers for next year and 2020. ...will reach our objective over the medium term. </p>,]
I would like to exclude each bold paragraph, that is, one that starts with <p><strong>, if it has more than 15 words. The desired output is:
[<p>Based on our regular economic and monetary analyses, we decided to keep the <strong>key ECB interest rates</strong> unchanged. .... to levels that are below, but close to, 2% over the medium term.</p>,
<p>By and large, yes, it's been used well in the sense that the intended effects of the QE – mind, ... It reduced dispersion in growth rates everywhere. An employment situation which is by and large improving almost everywhere, some countries more than others. </p>,
<p>If your question is meant to say; shouldn't governments have taken advantage of the situation of such low rates to decrease budget deficits, to restore? ... is a good situation for doing that.</p>,
<p>About inflation: I said inflation is going to hover around the present level for the rest of the year and then I gave numbers for next year and 2020. ...will reach our objective over the medium term. </p>,]
I tried to write the code myself but could not get the desired output. I would really appreciate your help.
python web-scraping beautifulsoup
edited 2 days ago by petezurich
asked 2 days ago by Vinh Vo
Possible duplicate of Exclude unwanted tag on Beautifulsoup Python
– petezurich
2 days ago
My question is probably a bit different, or maybe it was not clear enough. A bold <p><strong> paragraph should be excluded only if it has more than 15 words. For example, I would not exclude <p><strong> Thank you </strong></p>.
– Vinh Vo
2 days ago
2 Answers
Accepted answer
Use str() to convert each bs4 object to a string such as <p><strong>......</strong></p>, then skip the paragraphs whose string contains <p><strong>:

    # ....
    paragraphs = article.find_all('p')
    for p in paragraphs:
        # str(p) is the tag's HTML markup, e.g. '<p><strong>...</strong></p>'
        if '<p><strong>' not in str(p):
            print(str(p))
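Note that the loop above skips every paragraph whose markup contains <p><strong>, whatever its length. If the 15-word threshold from the question matters, something along these lines might work. This is only a minimal sketch: SAMPLE_HTML is a made-up, shortened stand-in for the ECB page so that it runs offline, is_long_bold_paragraph is a hypothetical helper name, and it uses the built-in 'html.parser' instead of 'html5lib' purely to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the ECB page so the sketch runs offline.
SAMPLE_HTML = """
<article>
<p>We decided to keep the <strong>key ECB interest rates</strong> unchanged.</p>
<p><strong>My second question is on reinvestment. Have you today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>
<p><strong> Thank you </strong></p>
<p>About inflation: I said inflation is going to hover around the present level.</p>
</article>
"""


def is_long_bold_paragraph(p, max_words=15):
    """True if the paragraph consists of a single <strong> with more than max_words words."""
    # Ignore whitespace-only strings when looking at the paragraph's children.
    children = [c for c in p.contents if str(c).strip()]
    if len(children) != 1 or getattr(children[0], "name", None) != "strong":
        return False
    return len(children[0].get_text().split()) > max_words


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
paragraphs = soup.find("article").find_all("p")

# Keep everything except long, fully bold paragraphs.
kept = [p for p in paragraphs if not is_long_bold_paragraph(p)]

for p in kept:
    print(p)
```

With this check, the short <p><strong> Thank you </strong></p> paragraph is kept, and so is the first paragraph, which only has a bold phrase in the middle of it.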
answered 2 days ago by ewwink
You saved me, @ewwink. Thank you very much for your time.
– Vinh Vo
2 days ago
you're welcome.
– ewwink
2 days ago
Try the extract() function, which removes a tag from the tree:

    article = soup.find('article')
    paragraphs = article.find_all('p')
    # article.strong is the first <strong> tag inside the article
    article.strong.extract()
    paragraphs_without_bold = article.find_all('p')

See also this.
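Since article.strong refers only to the first <strong> tag, the snippet above removes a single tag and leaves its surrounding <p> in place. To drop whole bold paragraphs, extract() could be called on each matching <p> instead. The following is a rough sketch, not a tested solution for the real page: SAMPLE_HTML is a made-up, shortened stand-in for the ECB article, "bold paragraph" is assumed to mean a <p> whose entire text sits inside its <strong>, and no word-count threshold is applied.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the ECB page so the sketch runs offline.
SAMPLE_HTML = (
    "<article>"
    "<p>We decided to keep the <strong>key ECB interest rates</strong> unchanged.</p>"
    "<p><strong>Has QE been used well by the various euro area countries?</strong></p>"
    "<p>By and large, yes, it's been used well.</p>"
    "</article>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
article = soup.find("article")

removed = []
for p in article.find_all("p"):
    strong = p.find("strong")
    # Assumption: a paragraph is "bold" when all of its text is inside <strong>.
    if strong is not None and strong.get_text(strip=True) == p.get_text(strip=True):
        removed.append(p.extract())  # extract() detaches the tag and returns it

paragraphs_without_bold = article.find_all("p")
print(paragraphs_without_bold)
```

The first paragraph survives because its text is not entirely bold. The removed tags stay available in removed in case they are needed later.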
answered 2 days ago by petezurich
Thanks for your suggested link, @petezurich
– Vinh Vo
2 days ago