I would like to exclude bold paragraphs from the website









I use the following code to scrape the website:




import requests
from bs4 import BeautifulSoup

# Download the press conference transcript and parse it with html5lib.
resp = requests.get('https://www.ecb.europa.eu/press/pressconf/2018/html/ecb.is180913.en.html')
soup = BeautifulSoup(resp.content, 'html5lib')
# Collect every <p> inside the <article> element.
article = soup.find('article')
paragraphs = article.find_all('p')


The output looks like this:



[<p>Based on our regular economic and monetary analyses, we decided to keep the <strong>key ECB interest rates</strong> unchanged. .... to levels that are below, but close to, 2% over the medium term.</p>,
<p><strong>Has QE been used well by the various euro area countries?</strong></p>,
<p>By and large, yes, it's been used well in the sense that the intended effects of the QE – mind, ... It reduced dispersion in growth rates everywhere. An employment situation which is by and large improving almost everywhere, some countries more than others. </p>,
<p>If your question is meant to say; shouldn't governments have taken advantage of the situation of such low rates to decrease budget deficits, to restore? ... is a good situation for doing that.</p>,
<p><strong>My second question is on reinvestment. ...Have you today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>,
<p>About inflation: I said inflation is going to hover around the present level for the rest of the year and then I gave numbers for next year and 2020. ...will reach our objective over the medium term. </p>,]


I would like to exclude each bold paragraph, that is, each paragraph that starts with

 <p><strong>

and has more than 15 words. The desired output is:



[<p>Based on our regular economic and monetary analyses, we decided to keep the <strong>key ECB interest rates</strong> unchanged. .... to levels that are below, but close to, 2% over the medium term.</p>,
<p>By and large, yes, it's been used well in the sense that the intended effects of the QE – mind, ... It reduced dispersion in growth rates everywhere. An employment situation which is by and large improving almost everywhere, some countries more than others. </p>,
<p>If your question is meant to say; shouldn't governments have taken advantage of the situation of such low rates to decrease budget deficits, to restore? ... is a good situation for doing that.</p>,
<p>About inflation: I said inflation is going to hover around the present level for the rest of the year and then I gave numbers for next year and 2020. ...will reach our objective over the medium term. </p>,]


I tried to write the code myself but could not get the desired output. I would really appreciate your help.







python web-scraping beautifulsoup






edited 2 days ago by petezurich
asked 2 days ago by Vinh Vo




  • Possible duplicate of Exclude unwanted tag on Beautifulsoup Python
    – petezurich
    2 days ago






  • My question is probably a bit different, or maybe it was not clear enough. Only bold paragraphs (<p><strong>) with more than 15 words should be excluded. For example, I do not want to exclude <p><strong> Thank you </strong></p>.
    – Vinh Vo
    2 days ago
















2 Answers
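Use str() to convert each bs4 tag to its HTML string, such as <p><strong>......</strong></p>, and print only the paragraphs that do not start with <p><strong>:

# ... (setup as in the question)
paragraphs = article.find_all('p')

for p in paragraphs:
    # str(p) renders the tag as HTML, so the opening tags can be checked as text.
    if '<p><strong>' not in str(p):
        print(str(p))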






answered 2 days ago by ewwink (accepted answer)











  • You saved me, @ewwink. Thank you very much for your time.
    – Vinh Vo
    2 days ago










  • you're welcome.
    – ewwink
    2 days ago
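
Note that the accepted answer above skips every paragraph that starts with <p><strong>, whatever its length. A minimal sketch of a filter that also applies the 15-word rule from the question might look like the following. The sample HTML, the helper name is_long_bold_paragraph and the max_words parameter are illustrative assumptions, not part of the original answers, and html.parser is used only so the sketch runs without html5lib.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the ECB page; the real markup may differ.
SAMPLE_HTML = """
<article>
<p>We decided to keep the <strong>key ECB interest rates</strong> unchanged.</p>
<p><strong>Thank you</strong></p>
<p><strong>My second question is on reinvestment. Have you today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>
<p>About inflation: I said inflation is going to hover around the present level.</p>
</article>
"""

MAX_BOLD_WORDS = 15  # threshold taken from the question


def is_long_bold_paragraph(p, max_words=MAX_BOLD_WORDS):
    """Return True if <p> is entirely wrapped in <strong> and has more than max_words words."""
    strong = p.find('strong')
    if strong is None:
        return False
    text = p.get_text(' ', strip=True)
    # Entirely bold: the paragraph text equals the text of its first <strong>.
    if text != strong.get_text(' ', strip=True):
        return False
    return len(text.split()) > max_words


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
article = soup.find('article')

# Keep everything except long, fully bold paragraphs; the tree itself is not modified.
kept = [p for p in article.find_all('p') if not is_long_bold_paragraph(p)]
for p in kept:
    print(p)
```

With a strict 15-word threshold, the 11-word question "Has QE been used well by the various euro area countries?" from the sample output in the question would be kept, so max_words may need to be lowered if such paragraphs should be dropped too.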
















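Try the extract() method, which removes a tag from the tree:

article = soup.find('article')

# article.strong.extract() would remove only the first <strong> in the article,
# so loop over the paragraphs and extract those that are entirely bold.
for p in article.find_all('p'):
    if p.strong and p.get_text(strip=True) == p.strong.get_text(strip=True):
        p.extract()
paragraphs_without_bold = article.find_all('p')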


See also this.







answered 2 days ago by petezurich











  • Thanks for your suggested link, @petezurich
    – Vinh Vo
    2 days ago
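
Because extract() returns the tag it removes, the approach above can also separate the questions from the answers in place while applying the 15-word rule. This is only a sketch under stated assumptions: the helper name split_questions, the sample HTML and the max_words default are hypothetical, and html.parser stands in for html5lib so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the real ECB page markup may differ.
SAMPLE_HTML = """
<article>
<p>We decided to keep the <strong>key ECB interest rates</strong> unchanged.</p>
<p><strong>Thank you</strong></p>
<p><strong>My second question is on reinvestment. Have you today explicitly asked the committees to come up with proposals on reinvestments?</strong></p>
<p>About inflation: I said inflation is going to hover around the present level.</p>
</article>
"""


def split_questions(article, max_words=15):
    """Detach long, fully bold paragraphs from article and return them as a list."""
    removed = []
    # find_all() returns a list, so removing tags while looping over it is safe.
    for p in article.find_all('p'):
        strong = p.find('strong')
        if strong is None:
            continue
        text = p.get_text(' ', strip=True)
        fully_bold = text == strong.get_text(' ', strip=True)
        if fully_bold and len(text.split()) > max_words:
            # extract() removes the tag from the tree and returns it.
            removed.append(p.extract())
    return removed


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
article = soup.find('article')

questions = split_questions(article)   # paragraphs taken out of the tree
answers = article.find_all('p')        # what is left in the article
print(len(questions), len(answers))
```

If the removed paragraphs are not needed afterwards, decompose() can be used instead of extract(); it destroys the tag rather than returning it.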















