Remove all whitespaces in URL/Email [closed]
Remove all whitespaces in URL/Email [closed]
I'd like to remove all whitespaces in URLs / Email addresses. The addresses are in a "normal" string, like: "Today the weather is fine. Tomorrow, we'll see. More information: www.weather .com or info @weather.com"
"Today the weather is fine. Tomorrow, we'll see. More information: www.weather .com or info @weather.com"
I'm looking for a good regex (using the re
module of Python), but my versions can't handle all cases
re
re.sub(u'(www)([ .])([a-zA-Z-]+)([ .])([a-z]+)', '\1.\3.\5')
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
Couldn't you just use
replace
? "https://www . some-example.c o m".replace(' ', '')
– brittenb
Sep 10 '18 at 17:45
replace
"https://www . some-example.c o m".replace(' ', '')
1 Answer
1
Your expression for url just require a little fixing. The regex expression for email can also be inherited from url expression.
>>> #EXPRESSIONS:
>>> url = "(www)+([ .])+([a-zA-Z-]+)+([ .])+([a-z]+)"
>>> ema = "([a-zA-Z]+)+([ +@]+)+([a-zA-Z-]+.com)"
>>>
>>> #IMPORTINGS:
>>> import re
>>>
>>> #YOUR DATA:
>>> string = "Today the weather is fine. Tomorrow, we'll see. More information: www.weather .com or info @weather.com"
>>>
>>> #Scraping Data
>>> "".join(re.findall(url,string)[0])
'www.weather.com'
>>> "".join(re.findall(ema,string)[0]).replace(" ","")
'info@weather.com'
>>>
Thank you very much! Your answer helped a lot! One last question: What regex could I use if i want to add a space to "/" in a normal text, but not in a URL. Example: "Hello ladies/gentleman. You can find more information on www.skynews.com/today/act ." --> "Hello ladies / gentleman. You can find more information on www.skynews.com/today/act ."
– StMan
Sep 12 '18 at 15:06
@StMan You can accept my answer. If your URL don't contain spaces then this will work:
(www.)?([da-z.-]+).([a-z.]2,6)([/w]*)
– Black Thunder
Sep 12 '18 at 18:23
(www.)?([da-z.-]+).([a-z.]2,6)([/w]*)
what is the expected output?
– The Scientific Method
Sep 10 '18 at 17:40