I would like to exclude bold paragraphs from the website

I use the following code to scrape the website:




import requests
from bs4 import BeautifulSoup

# Fetch the ECB press conference page and parse it (requires html5lib)
resp = requests.get('https://www.ecb.europa.eu/press/pressconf/2018/html/ecb.is180913.en.html')
soup = BeautifulSoup(resp.content, 'html5lib')
# Collect every <p> inside the page's <article>
article = soup.find('article')
paragraphs = article.find_all('p')


The output looks like this:



[<p>Based on our regular economic and monetary analyses, we decided to keep the <strong>key ECB interest rates</strong> unchanged. .... to levels that are below, but close to, 2% over the medium term.</p>,
<p><strong>Has QE been used well by the various euro area countries?</strong></p>,
<p>By and large, yes, it's been used well in the sense that the intended effects of the QE – mind, ... It reduced dispersion in growth rates everywhere. An employment situation which is by and large improving almost everywhere, some countries more than others. </p>,
<p>If your question is meant to say; shouldn't governments have taken advantage of the situation of such low rates to decrease budget deficits, to restore? ... is a good situation for doing that.</p>,
<p><strong>My second question is on reinvestment. ...Have you today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>,
<p>About inflation: I said inflation is going to hover around the present level for the rest of the year and then I gave numbers for next year and 2020. ...will reach our objective over the medium term. </p>,]


I would like to exclude the bold paragraphs, i.e. paragraphs whose HTML contains



 <p><strong>


and that have more than 15 words. The desired output is:



[<p>Based on our regular economic and monetary analyses, we decided to keep the <strong>key ECB interest rates</strong> unchanged. .... to levels that are below, but close to, 2% over the medium term.</p>,
<p>By and large, yes, it's been used well in the sense that the intended effects of the QE – mind, ... It reduced dispersion in growth rates everywhere. An employment situation which is by and large improving almost everywhere, some countries more than others. </p>,
<p>If your question is meant to say; shouldn't governments have taken advantage of the situation of such low rates to decrease budget deficits, to restore? ... is a good situation for doing that.</p>,
<p>About inflation: I said inflation is going to hover around the present level for the rest of the year and then I gave numbers for next year and 2020. ...will reach our objective over the medium term. </p>,]


I tried to write this filter myself but could not get the desired output. I would really appreciate any help.
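To make the rule concrete, here is a minimal sketch that counts the words in the two bold paragraphs of the sample output. It assumes a "word" is any whitespace-separated token of `get_text()` and uses Python's built-in `"html.parser"` so it runs offline:

```python
from bs4 import BeautifulSoup

# The two bold-only paragraphs from the sample output above, copied as shown
# (including the "..." elision in the second one).
html = """
<p><strong>Has QE been used well by the various euro area countries?</strong></p>
<p><strong>My second question is on reinvestment. ...Have you today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: a "word" is any whitespace-separated token of the visible text.
word_counts = [len(p.get_text().split()) for p in soup.find_all("p")]
print(word_counts)  # [11, 20]
```

By this count the first bold question has only 11 words, so a strict "more than 15 words" threshold would keep it, whereas the desired output drops it; the exact cut-off may need adjusting.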










python web-scraping beautifulsoup






edited 2 days ago by petezurich






asked 2 days ago by Vinh Vo




  • Possible duplicate of Exclude unwanted tag on Beautifulsoup Python
    – petezurich
    2 days ago






  • My question is probably a bit different, or maybe it is not clear enough: a bold <p><strong> paragraph should be excluded only if it has more than 15 words. For example, I do not want to exclude <p><strong> Thank you </strong></p>.
    – Vinh Vo
    2 days ago
















2 Answers
(accepted answer)
Use str() to convert each bs4 Tag to its HTML string (e.g. <p><strong>......</strong></p>), then skip the paragraphs whose string contains <p><strong>:



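# ... (request and parsing code from the question) ...
paragraphs = article.find_all('p')

for p in paragraphs:
    # str(p) is the tag's HTML, e.g. '<p><strong>...</strong></p>'
    if '<p><strong>' not in str(p):
        print(str(p))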
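The answer above excludes every paragraph containing <p><strong>, whatever its length, so it does not apply the question's "more than 15 words" rule. Below is a minimal sketch that adds the word count. It swaps the string match for a structural check (a paragraph counts as bold when all of its visible text is inside its <strong>), assumes words are whitespace-separated tokens, and uses a hypothetical helper, `is_long_bold_paragraph`, on a shortened inline copy of the page so it runs offline:

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample modelled on the question's output, so the
# example runs without a network request.
html = """<article>
<p>Based on our regular economic and monetary analyses, we decided to keep the <strong>key ECB interest rates</strong> unchanged.</p>
<p><strong>Thank you</strong></p>
<p><strong>My second question is on reinvestment. ...Have you today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>
<p>About inflation: I said inflation is going to hover around the present level for the rest of the year.</p>
</article>"""


def is_long_bold_paragraph(p, min_words=15):
    """Hypothetical helper: True if all of p's text is bold and has > min_words words."""
    strong = p.find("strong")
    if strong is None:
        return False
    # Bold-only paragraph: the whole visible text of <p> sits inside this <strong>.
    bold_only = p.get_text(strip=True) == strong.get_text(strip=True)
    return bold_only and len(p.get_text().split()) > min_words


soup = BeautifulSoup(html, "html.parser")
article = soup.find("article")
# Keep inline bold and short bold paragraphs ("Thank you"); drop long bold ones.
paragraphs = [p for p in article.find_all("p") if not is_long_bold_paragraph(p)]
for p in paragraphs:
    print(p)
```

Against the live page, build `article` from `BeautifulSoup(resp.content, 'html5lib')` as in the question instead of the inline `html`; `min_words` is a parameter so the threshold can be tuned.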





answered 2 days ago by ewwink











  • You saved me, @ewwink. Thank you very much for your time.
    – Vinh Vo
    2 days ago










  • You're welcome.
    – ewwink
    2 days ago
















Try the extract() method, which removes a tag from the parse tree:



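article = soup.find('article')
paragraphs = article.find_all('p')

# article.strong.extract() would remove only the first <strong> in the article,
# so extract every <p> whose whole text is bold instead.
for p in paragraphs:
    if p.strong is not None and p.get_text(strip=True) == p.strong.get_text(strip=True):
        p.extract()
paragraphs_without_bold = article.find_all('p')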


See also this.
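A minimal offline sketch of the extract() approach with the word-count rule added, using a hypothetical shortened sample of the page and an assumed 15-word threshold. It also shows why a single `article.strong.extract()` is not enough: `article.strong` returns only the first <strong> in the article, which in this sample is an inline phrase rather than a bold question.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample modelled on the question's output.
html = """<article>
<p>Based on our regular economic and monetary analyses, we decided to keep the <strong>key ECB interest rates</strong> unchanged.</p>
<p><strong>My second question is on reinvestment. ...Have you today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>
<p>About inflation: I said inflation is going to hover around the present level for the rest of the year.</p>
</article>"""

soup = BeautifulSoup(html, "html.parser")
article = soup.find("article")

# article.strong is shorthand for article.find("strong"): only the FIRST <strong>,
# which in this sample is the inline phrase inside the first paragraph.
first_bold = article.strong.get_text()

MIN_WORDS = 15  # assumption: the question's "more than 15 words" rule
for strong in article.find_all("strong"):
    p = strong.parent
    # Extract the whole <p> only if it is bold-only and longer than MIN_WORDS.
    if p.name == "p" and p.get_text(strip=True) == strong.get_text(strip=True):
        if len(p.get_text().split()) > MIN_WORDS:
            p.extract()

paragraphs_without_bold = article.find_all("p")
print(first_bold)  # key ECB interest rates
print(len(paragraphs_without_bold))  # 2
```

extract() returns the removed tag, so it can still be inspected afterwards; decompose() would also work if the removed paragraphs are not needed.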







answered 2 days ago by petezurich











  • Thanks for your suggested link, @petezurich
    – Vinh Vo
    2 days ago















