I would like to exclude bold paragraphs from the website
I use the following code to scrape the website:
import requests
from bs4 import BeautifulSoup
resp = requests.get('https://www.ecb.europa.eu/press/pressconf/2018/html/ecb.is180913.en.html')
soup = BeautifulSoup(resp.content, 'html5lib')
article = soup.find('article')
paragraphs = article.find_all('p')
The output looks like this:
[<p>Based on our regular economic and monetary analyses, we decided to keep the <strong>key ECB interest rates</strong> unchanged. .... to levels that are below, but close to, 2% over the medium term.</p>,
<p><strong>Has QE been used well by the various euro area countries?</strong></p>,
<p>By and large, yes, it's been used well in the sense that the intended effects of the QE – mind, ... It reduced dispersion in growth rates everywhere. An employment situation which is by and large improving almost everywhere, some countries more than others. </p>,
<p>If your question is meant to say; shouldn't governments have taken advantage of the situation of such low rates to decrease budget deficits, to restore? ... is a good situation for doing that.</p>,
<p><strong>My second question is on reinvestment. ...Have you today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>,
<p>About inflation: I said inflation is going to hover around the present level for the rest of the year and then I gave numbers for next year and 2020. ...will reach our objective over the medium term. </p>,]
I would like to exclude every bold paragraph that contains
<p><strong>
and has more than 15 words. The desired output is:
[<p>Based on our regular economic and monetary analyses, we decided to keep the <strong>key ECB interest rates</strong> unchanged. .... to levels that are below, but close to, 2% over the medium term.</p>,
<p>By and large, yes, it's been used well in the sense that the intended effects of the QE – mind, ... It reduced dispersion in growth rates everywhere. An employment situation which is by and large improving almost everywhere, some countries more than others. </p>,
<p>If your question is meant to say; shouldn't governments have taken advantage of the situation of such low rates to decrease budget deficits, to restore? ... is a good situation for doing that.</p>,
<p>About inflation: I said inflation is going to hover around the present level for the rest of the year and then I gave numbers for next year and 2020. ...will reach our objective over the medium term. </p>,]
I tried to write the code myself but could not get the desired output. I would really appreciate your help.
python web-scraping beautifulsoup
Possible duplicate of Exclude unwanted tag on Beautifulsoup Python
– petezurich
2 days ago
My question is probably a bit different, or maybe it was not clear enough. A bold paragraph <p><strong> should be excluded only if it has more than 15 words. For example, I do not want to exclude <p><strong> Thank you </strong></p>.
– Vinh Vo
2 days ago
edited 2 days ago
petezurich
asked 2 days ago
Vinh Vo
2 Answers
accepted
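Use str() to convert each bs4 object to a string such as <p><strong>......</strong></p>, then skip the paragraphs that contain that prefix:
....
paragraphs = article.find_all('p')
for p in paragraphs:
    # Skips every paragraph that opens with <strong>, regardless of its word count
    if '<p><strong>' not in str(p):
        print(str(p))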
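The string check above does not apply the 15-word threshold mentioned in the question. A minimal sketch of one way to add it is shown below. It assumes that "bold paragraph" means a <p> whose only child is a single <strong> tag; the sample markup, the helper name is_long_bold_paragraph and the min_words parameter are made up for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample markup standing in for the ECB page.
SAMPLE_HTML = """
<article>
<p>We decided to keep the <strong>key ECB interest rates</strong> unchanged.</p>
<p><strong>Thank you</strong></p>
<p><strong>My second question is on reinvestment and whether you have today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>
<p>About inflation: it will hover around the present level.</p>
</article>
"""


def is_long_bold_paragraph(p, min_words=15):
    """Return True if <p> is wrapped entirely in <strong> and has more than min_words words."""
    # Ignore whitespace-only text nodes around the <strong> tag.
    children = [c for c in p.contents if not (isinstance(c, str) and not c.strip())]
    fully_bold = (
        len(children) == 1
        and isinstance(children[0], Tag)
        and children[0].name == "strong"
    )
    return fully_bold and len(p.get_text().split()) > min_words


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
article = soup.find("article")

# Keep everything except fully bold paragraphs above the word threshold.
kept = [p for p in article.find_all("p") if not is_long_bold_paragraph(p)]

for p in kept:
    print(p)
```

With this sample, the short bold paragraph "Thank you" is kept and only the long bold question is dropped. Paragraphs that merely contain a bold phrase in the middle are not affected.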
You saved me, @ewwink. Thank you very much for your time.
– Vinh Vo
2 days ago
You're welcome.
– ewwink
2 days ago
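Try the extract() function:
article = soup.find('article')
paragraphs = article.find_all('p')
# article.strong is only the first <strong> tag in the article; extract() removes that one tag from the tree
article.strong.extract()
paragraphs_without_bold = article.find_all('p')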
See also this.
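Since article.strong refers only to the first <strong> tag, the call above removes a single tag rather than every bold paragraph. A possible sketch that applies extract() to each qualifying paragraph follows. It assumes a paragraph counts as bold when the text of its direct <strong> child equals the text of the whole paragraph; the sample markup and the names WORD_LIMIT and removed are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the ECB page.
SAMPLE_HTML = """
<article>
<p>We decided to keep the <strong>key ECB interest rates</strong> unchanged.</p>
<p><strong>Thank you</strong></p>
<p><strong>My second question is on reinvestment and whether you have today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>
<p>About inflation: it will hover around the present level.</p>
</article>
"""

WORD_LIMIT = 15  # threshold taken from the question

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
article = soup.find("article")

removed = []
# find_all returns a list, so removing tags while looping over it is safe.
for p in article.find_all("p"):
    strong = p.find("strong", recursive=False)
    text = p.get_text(" ", strip=True)
    if (
        strong is not None
        and strong.get_text(" ", strip=True) == text
        and len(text.split()) > WORD_LIMIT
    ):
        # extract() detaches the whole paragraph from the tree and returns it.
        removed.append(p.extract())

remaining = article.find_all("p")
```

Unlike a list filter, this modifies the parsed tree in place, so any later find_all call on article no longer sees the removed paragraphs.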
Thanks for your suggested link, @petezurich
– Vinh Vo
2 days ago
answered 2 days ago
ewwink
answered 2 days ago
petezurich