String slugification in Python

I am looking for the best way to "slugify" a string (that is, to turn it into a URL-friendly "slug"). My current solution is based on an existing recipe.

I have changed it a little bit to:

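import re
import unicodedata

s = 'String to slugify'

slug = unicodedata.normalize('NFKD', s)
# encode() returns bytes on Python 3, so decode back to str before using re
slug = slug.encode('ascii', 'ignore').decode('ascii').lower()
slug = re.sub(r'[^a-z0-9]+', '-', slug).strip('-')
slug = re.sub(r'[-]+', '-', slug)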

Anyone see any problems with this code? It is working fine, but maybe I am missing something or you know a better way?

edited May 23 '17 at 10:31 by Community♦
asked Apr 6 '11 at 23:08 by Zygimantas

Are you working with Unicode a lot? If so, the last re.sub might be better if you wrap unicode() around it; this is what Django does. Also, [^a-z0-9]+ can be shortened by using \w. See django.template.defaultfilters: it's close to yours, but a bit more refined.

– Mike Ramirez
Apr 7 '11 at 0:23

Are Unicode characters allowed in a URL? Also, I changed \w to a-z0-9 because \w includes the _ character and uppercase letters. Letters are set to lowercase in advance, so there will be no uppercase letters to match.

– Zygimantas
Apr 7 '11 at 1:21

'_' is valid (but it's your choice; you did ask). Unicode is allowed as percent-encoded characters.

– Mike Ramirez
Apr 7 '11 at 1:36

Thank you, Mike. Well, I asked the wrong question. Is there any reason to convert it back to a unicode string if we have already replaced all characters except "a-z", "0-9" and "-"?

– Zygimantas
Apr 7 '11 at 1:47

For Django, I believe it's important to them to have all strings as unicode objects for compatibility. It's your choice whether you want this.

– Mike Ramirez
Apr 7 '11 at 1:51
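The difference between \w and a-z0-9 discussed in the comments above can be checked quickly. This is a minimal sketch assuming Python 3 (where \w also matches non-ASCII letters for str patterns); the sample string and the variable names are made up for illustration:

```python
import re

text = "Hello_World é 42"

# [^a-z0-9]+ treats underscores and non-ASCII letters as separators
# (uppercase is handled by lowercasing first).
strict = re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')

# \W+ is the complement of \w, so underscores and Unicode letters survive.
loose = re.sub(r'\W+', '-', text.lower()).strip('-')

print(strict)
print(loose)
```

Which of the two is preferable depends on whether underscores and non-ASCII letters are acceptable in the final slug.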



10 Answers

112

There is a Python package named python-slugify, which does a pretty good job of slugifying:

pip install python-slugify

Works like this:

from slugify import slugify

# Expected values as quoted in the answer; results may vary between package versions.
txt = "This is a test ---"
r = slugify(txt)
assert r == "this-is-a-test"

txt = "This -- is a ## test ---"
r = slugify(txt)
assert r == "this-is-a-test"

txt = "C'est déjà l'été."
r = slugify(txt)
assert r == "cest-deja-lete"

txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
r = slugify(txt)
assert r == "nin-hao-wo-shi-zhong-guo-ren"

txt = 'Компьютер'
r = slugify(txt)
assert r == "kompiuter"

txt = 'jaja---lol-méméméoo--a'
r = slugify(txt)
assert r == "jaja-lol-mememeoo-a"

See the project page for more examples.

This package does a bit more than what you posted (take a look at the source, it's just one file). The project is still active: it was updated 2 days before I originally answered, and over four years later (last checked 2017-04-26) it still gets updated.

Careful: there is a second package around, named slugify. If you have both installed, you might run into problems, because they use the same import name. The one just named slugify did not pass my quick check: "Ich heiße" became "ich-heie" (it should be "ich-heisse"). So be sure to pick the right one when using pip or easy_install.

edited May 21 '18 at 16:13 by Nick T
answered Feb 15 '13 at 2:12 by kratenko

5

python-slugify is licensed under MIT, but it uses Unidecode which is licensed under GPL, so it might not fit for some projects.

– Rotareti
Aug 6 '17 at 21:40


Install unidecode for Unicode support:

pip install unidecode

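# -*- coding: utf-8 -*-
import re
import unidecode

def slugify(text):
    # Transliterate to ASCII, then lowercase
    text = unidecode.unidecode(text).lower()
    # Collapse every run of non-word characters or underscores into one hyphen
    return re.sub(r'[\W_]+', '-', text)

text = u"My custom хелло ворлд"
print(slugify(text))

>>> my-custom-khello-vorld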

edited Nov 12 '18 at 10:02 by Arne
answered Dec 3 '11 at 9:29 by user1078810

1

Hi, it's a bit strange, but for me it gives a result like this: "my-custom-ndud-d-d3-4-d2d3-4nd-d-"

– derevo
Jul 30 '12 at 7:04

1

@derevo that happens when you don't pass unicode strings. Replace slugify("My custom хелло ворлд") with slugify(u"My custom хелло ворлд"), and it should work.

– kratenko
Dec 16 '12 at 12:10

9

I would suggest against using variable names like str. This hides the builtin str type.

– crodjer
Apr 19 '14 at 7:22

2

unidecode is GPL, which may not be suitable for some.

– Jorge Leitão
Apr 25 '15 at 6:59

What about reslugifying or deslugifying?

– Ryan Chou
Jan 24 at 3:46


There is a Python package named awesome-slugify:

pip install awesome-slugify

Works like this:

from slugify import slugify

slugify('one kožušček') # one-kozuscek

awesome-slugify GitHub page: https://github.com/dimka665/awesome-slugify

answered Mar 2 '14 at 21:01 by voronin

2

Nice package! But be careful, it's licensed under GPL.

– Rotareti
Aug 6 '17 at 21:27


The problem is with the ascii normalization line:

slug = unicodedata.normalize('NFKD', s)

This is Unicode normalization, and it cannot decompose many characters into ASCII equivalents. For example, the ASCII encoding step that follows simply drops the non-ASCII characters from the following strings:

Mørdag -> mrdag
Æther -> ther

A better way to do it is to use the unidecode module that tries to transliterate strings to ascii. So if you replace the above line with:

import unidecode
slug = unidecode.unidecode(s)

You get better results for the above strings and for many Greek and Russian characters too:

Mørdag -> mordag
Æther -> aether
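The stripping behaviour described above can be reproduced with the standard library alone. This is a small sketch assuming Python 3; ascii_fold is a hypothetical helper name that mirrors the first two lines of the question's code:

```python
import unicodedata

def ascii_fold(s):
    """NFKD-decompose, then drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize('NFKD', s)
    return decomposed.encode('ascii', 'ignore').decode('ascii').lower()

# Accented letters decompose into a base letter plus a combining mark,
# so the base letter survives the ASCII step.
print(ascii_fold('déjà'))

# 'ø' and 'Æ' have no Unicode decomposition, so they are dropped entirely.
print(ascii_fold('Mørdag'))
print(ascii_fold('Æther'))
```

A transliteration table, such as the one unidecode ships, is what fills the gap for characters like these.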

answered Sep 7 '11 at 13:16 by Björn Lindqvist

It works well in Django, so I don't see why it wouldn't be a good general purpose slugify function.

Are you having any problems with it?

answered Apr 6 '11 at 23:22 by Nick Presta

It's possible, that for some cases, it's a healthy dose of paranoia :-)

– nemesisfixx
Nov 9 '13 at 12:42

The code has moved to here.

– raylu
Jul 21 '16 at 4:43

2

For the lazies: from django.utils.text import slugify

– Spartacus
Dec 6 '17 at 23:29


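import re
import unicodedata

# Django 1.x-era helpers used by this version of the function
from django.utils import six
from django.utils.functional import allow_lazy
from django.utils.safestring import mark_safe

def slugify(value):
    """
    Converts to lowercase, removes non-word characters (alphanumerics and
    underscores) and converts spaces to hyphens. Also strips leading and
    trailing whitespace.
    """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return mark_safe(re.sub(r'[-\s]+', '-', value))
slugify = allow_lazy(slugify, six.text_type)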

This is the slugify function from django.utils.text.
It should be sufficient for your requirements.
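For use outside Django, the same logic can be sketched without the Django-specific helpers (mark_safe, allow_lazy, six). This is an adaptation for illustration, assuming Python 3 and only the standard library; it is not Django's actual code:

```python
import re
import unicodedata

def slugify(value):
    """Plain-Python variant of the function above, without Django helpers."""
    # Decompose accented characters, then drop anything that is not ASCII.
    value = unicodedata.normalize('NFKD', value)
    value = value.encode('ascii', 'ignore').decode('ascii')
    # Remove everything except word characters, whitespace and hyphens.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)

print(slugify("  C'est déjà l'été.  "))
print(slugify("Hello -- World_2"))
```

Note that underscores are kept, because \w matches them; the question's [^a-z0-9]+ pattern would replace them with hyphens instead.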

answered Dec 3 '14 at 5:35 by Animesh Sharma

Unidecode is good; however, be careful: unidecode is GPL. If this license doesn't fit then use this one

edited Nov 14 '14 at 9:21 by BomberMan
answered Apr 22 '13 at 14:29 by Mikhail Korobov

A couple of options on GitHub:

https://github.com/dimka665/awesome-slugify

https://github.com/un33k/python-slugify

https://github.com/mozilla/unicode-slugify

Each supports slightly different parameters for its API, so you'll need to look through to figure out what you prefer.

In particular, pay attention to the different options they provide for dealing with non-ASCII characters. Pydanny wrote a very helpful blog post illustrating some of the unicode handling differences in these slugify'ing libraries: http://www.pydanny.com/awesome-slugify-human-readable-url-slugs-from-any-string.html This blog post is slightly outdated because Mozilla's unicode-slugify is no longer Django-specific.

Also note that currently awesome-slugify is GPLv3, though there's an open issue where the author says they'd prefer to release as MIT/BSD, just not sure of the legality: https://github.com/dimka665/awesome-slugify/issues/24

edited Mar 16 '16 at 13:35
answered Mar 15 '16 at 10:20 by Jeff Widman

You might consider changing the last line to

slug=re.sub(r'--+',r'-',slug)

since the pattern [-]+ is no different than -+, and you don't really care about matching just one hyphen, only two or more.

But, of course, this is quite minor.
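A quick check that the two patterns give the same result, as a sketch with a made-up input string:

```python
import re

slug = 'a--b---c-d'

# Pattern from the question: also "replaces" single hyphens with themselves.
collapsed_original = re.sub(r'[-]+', '-', slug)

# Pattern suggested here: only touches runs of two or more hyphens.
collapsed_shorter = re.sub(r'--+', '-', slug)

print(collapsed_original)
print(collapsed_shorter)
```

Both calls print a-b-c-d; the only difference is that the second pattern skips hyphens that are already single.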

answered Apr 6 '11 at 23:36 by unutbu

Another option is boltons.strutils.slugify. Boltons has quite a few other useful functions as well, and is distributed under a BSD license.

edited Nov 17 '18 at 1:34
answered Nov 17 '18 at 1:27 by ostrokach


10 Answers
10

active

oldest

votes

10 Answers
10

active

oldest

votes

112

There is a python package named python-slugify, which does a pretty good job of slugifying:

pip install python-slugify

Works like this:

from slugify import slugify

txt = "This is a test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")

txt = "This -- is a ## test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")

txt = 'C'est déjà l'été.'
r = slugify(txt)
self.assertEquals(r, "cest-deja-lete")

txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
r = slugify(txt)
self.assertEquals(r, "nin-hao-wo-shi-zhong-guo-ren")

txt = 'Компьютер'
r = slugify(txt)
self.assertEquals(r, "kompiuter")

txt = 'jaja---lol-méméméoo--a'
r = slugify(txt)
self.assertEquals(r, "jaja-lol-mememeoo-a")

See More examples

edited May 21 '18 at 16:13

Nick T

14.1k55799

answered Feb 15 '13 at 2:12

kratenko

5,24342652

5

python-slugify is licensed under MIT, but it uses Unidecode which is licensed under GPL, so it might not fit for some projects.

– Rotareti
Aug 6 '17 at 21:40

add a comment |

112

There is a python package named python-slugify, which does a pretty good job of slugifying:

pip install python-slugify

Works like this:

from slugify import slugify

txt = "This is a test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")

txt = "This -- is a ## test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")

txt = 'C'est déjà l'été.'
r = slugify(txt)
self.assertEquals(r, "cest-deja-lete")

txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
r = slugify(txt)
self.assertEquals(r, "nin-hao-wo-shi-zhong-guo-ren")

txt = 'Компьютер'
r = slugify(txt)
self.assertEquals(r, "kompiuter")

txt = 'jaja---lol-méméméoo--a'
r = slugify(txt)
self.assertEquals(r, "jaja-lol-mememeoo-a")

See More examples

edited May 21 '18 at 16:13

Nick T

14.1k55799

answered Feb 15 '13 at 2:12

kratenko

5,24342652

5

python-slugify is licensed under MIT, but it uses Unidecode which is licensed under GPL, so it might not fit for some projects.

– Rotareti
Aug 6 '17 at 21:40

add a comment |

112

There is a python package named python-slugify, which does a pretty good job of slugifying:

pip install python-slugify

Works like this:

from slugify import slugify

txt = "This is a test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")

txt = "This -- is a ## test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")

txt = 'C'est déjà l'été.'
r = slugify(txt)
self.assertEquals(r, "cest-deja-lete")

txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
r = slugify(txt)
self.assertEquals(r, "nin-hao-wo-shi-zhong-guo-ren")

txt = 'Компьютер'
r = slugify(txt)
self.assertEquals(r, "kompiuter")

txt = 'jaja---lol-méméméoo--a'
r = slugify(txt)
self.assertEquals(r, "jaja-lol-mememeoo-a")

See More examples

edited May 21 '18 at 16:13

Nick T

14.1k55799

answered Feb 15 '13 at 2:12

kratenko

5,24342652

There is a python package named python-slugify, which does a pretty good job of slugifying:

pip install python-slugify

Works like this:

from slugify import slugify

txt = "This is a test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")

txt = "This -- is a ## test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")

txt = 'C'est déjà l'été.'
r = slugify(txt)
self.assertEquals(r, "cest-deja-lete")

txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
r = slugify(txt)
self.assertEquals(r, "nin-hao-wo-shi-zhong-guo-ren")

txt = 'Компьютер'
r = slugify(txt)
self.assertEquals(r, "kompiuter")

txt = 'jaja---lol-méméméoo--a'
r = slugify(txt)
self.assertEquals(r, "jaja-lol-mememeoo-a")

See More examples

edited May 21 '18 at 16:13

Nick T

14.1k55799

answered Feb 15 '13 at 2:12

kratenko

5,24342652

edited May 21 '18 at 16:13

Nick T

14.1k55799

edited May 21 '18 at 16:13

Nick T

14.1k55799

edited May 21 '18 at 16:13

Nick T

14.1k55799

answered Feb 15 '13 at 2:12

kratenko

5,24342652

answered Feb 15 '13 at 2:12

kratenko

5,24342652

answered Feb 15 '13 at 2:12

kratenko

5,24342652

5

python-slugify is licensed under MIT, but it uses Unidecode which is licensed under GPL, so it might not fit for some projects.

– Rotareti
Aug 6 '17 at 21:40

add a comment |

5

python-slugify is licensed under MIT, but it uses Unidecode which is licensed under GPL, so it might not fit for some projects.

– Rotareti
Aug 6 '17 at 21:40

python-slugify is licensed under MIT, but it uses Unidecode which is licensed under GPL, so it might not fit for some projects.

– Rotareti
Aug 6 '17 at 21:40

add a comment |

Install unidecode form from here for unicode support

pip install unidecode

# -*- coding: utf-8 -*-
import re
import unidecode

def slugify(text):
 text = unidecode.unidecode(text).lower()
 return re.sub(r'[W_]+', '-', text)

text = u"My custom хелло ворлд"
print slugify(text)

>>> my-custom-khello-vorld

edited Nov 12 '18 at 10:02

Arne

2,50322239

answered Dec 3 '11 at 9:29

user1078810

303146

1

hi, its a bit strange but it give for my res like that "my-custom-ndud-d-d3-4-d2d3-4nd-d-"

– derevo
Jul 30 '12 at 7:04

1

@derevo that happend when you don't send unicode strings. Replace slugify("My custom хелло ворлд") with slugify(u"My custom хелло ворлд"), and it should work.

– kratenko
Dec 16 '12 at 12:10

9

I would suggest against using variable names like str. This hides the builtin str type.

– crodjer
Apr 19 '14 at 7:22

2

unidecode is GPL, which may not be suitable for some.

– Jorge Leitão
Apr 25 '15 at 6:59

What about the reslugifying or deslugifying.

– Ryan Chou
Jan 24 at 3:46

add a comment |

Install unidecode form from here for unicode support

pip install unidecode

# -*- coding: utf-8 -*-
import re
import unidecode

def slugify(text):
 text = unidecode.unidecode(text).lower()
 return re.sub(r'[W_]+', '-', text)

text = u"My custom хелло ворлд"
print slugify(text)

>>> my-custom-khello-vorld

edited Nov 12 '18 at 10:02

Arne

2,50322239

answered Dec 3 '11 at 9:29

user1078810

303146

1

hi, its a bit strange but it give for my res like that "my-custom-ndud-d-d3-4-d2d3-4nd-d-"

– derevo
Jul 30 '12 at 7:04

1

@derevo that happend when you don't send unicode strings. Replace slugify("My custom хелло ворлд") with slugify(u"My custom хелло ворлд"), and it should work.

– kratenko
Dec 16 '12 at 12:10

9

I would suggest against using variable names like str. This hides the builtin str type.

– crodjer
Apr 19 '14 at 7:22

2

unidecode is GPL, which may not be suitable for some.

– Jorge Leitão
Apr 25 '15 at 6:59

What about the reslugifying or deslugifying.

– Ryan Chou
Jan 24 at 3:46

add a comment |

Install unidecode form from here for unicode support

pip install unidecode

# -*- coding: utf-8 -*-
import re
import unidecode

def slugify(text):
 text = unidecode.unidecode(text).lower()
 return re.sub(r'[W_]+', '-', text)

text = u"My custom хелло ворлд"
print slugify(text)

>>> my-custom-khello-vorld

edited Nov 12 '18 at 10:02

Arne

2,50322239

answered Dec 3 '11 at 9:29

user1078810

303146

Install unidecode form from here for unicode support

pip install unidecode

# -*- coding: utf-8 -*-
import re
import unidecode

def slugify(text):
 text = unidecode.unidecode(text).lower()
 return re.sub(r'[W_]+', '-', text)

text = u"My custom хелло ворлд"
print slugify(text)

>>> my-custom-khello-vorld

edited Nov 12 '18 at 10:02

Arne

2,50322239

answered Dec 3 '11 at 9:29

user1078810

303146

edited Nov 12 '18 at 10:02

Arne

2,50322239

edited Nov 12 '18 at 10:02

Arne

2,50322239

edited Nov 12 '18 at 10:02

Arne

2,50322239

answered Dec 3 '11 at 9:29

user1078810

303146

answered Dec 3 '11 at 9:29

user1078810

303146

answered Dec 3 '11 at 9:29

user1078810

303146

1

hi, its a bit strange but it give for my res like that "my-custom-ndud-d-d3-4-d2d3-4nd-d-"

– derevo
Jul 30 '12 at 7:04

1

@derevo that happend when you don't send unicode strings. Replace slugify("My custom хелло ворлд") with slugify(u"My custom хелло ворлд"), and it should work.

– kratenko
Dec 16 '12 at 12:10

9

I would suggest against using variable names like str. This hides the builtin str type.

– crodjer
Apr 19 '14 at 7:22

2

unidecode is GPL, which may not be suitable for some.

– Jorge Leitão
Apr 25 '15 at 6:59

What about the reslugifying or deslugifying.

– Ryan Chou
Jan 24 at 3:46

add a comment |

1

hi, its a bit strange but it give for my res like that "my-custom-ndud-d-d3-4-d2d3-4nd-d-"

– derevo
Jul 30 '12 at 7:04

1

@derevo that happend when you don't send unicode strings. Replace slugify("My custom хелло ворлд") with slugify(u"My custom хелло ворлд"), and it should work.

– kratenko
Dec 16 '12 at 12:10

9

I would suggest against using variable names like str. This hides the builtin str type.

– crodjer
Apr 19 '14 at 7:22

2

unidecode is GPL, which may not be suitable for some.

– Jorge Leitão
Apr 25 '15 at 6:59

What about the reslugifying or deslugifying.

– Ryan Chou
Jan 24 at 3:46

hi, its a bit strange but it give for my res like that "my-custom-ndud-d-d3-4-d2d3-4nd-d-"

– derevo
Jul 30 '12 at 7:04

@derevo that happend when you don't send unicode strings. Replace slugify("My custom хелло ворлд") with slugify(u"My custom хелло ворлд"), and it should work.

– kratenko
Dec 16 '12 at 12:10

I would suggest against using variable names like str. This hides the builtin str type.

– crodjer
Apr 19 '14 at 7:22

unidecode is GPL, which may not be suitable for some.

– Jorge Leitão
Apr 25 '15 at 6:59

What about the reslugifying or deslugifying.

– Ryan Chou
Jan 24 at 3:46

add a comment |

There is python package named awesome-slugify:

pip install awesome-slugify

Works like this:

from slugify import slugify

slugify('one kožušček') # one-kozuscek

awesome-slugify github page

answered Mar 2 '14 at 21:01

voronin

40644

2

Nice package! But be careful, it's licensed under GPL.

– Rotareti
Aug 6 '17 at 21:27

add a comment |

There is python package named awesome-slugify:

pip install awesome-slugify

Works like this:

from slugify import slugify

slugify('one kožušček') # one-kozuscek

awesome-slugify github page

answered Mar 2 '14 at 21:01

voronin

40644

2

Nice package! But be careful, it's licensed under GPL.

– Rotareti
Aug 6 '17 at 21:27

add a comment |

There is python package named awesome-slugify:

pip install awesome-slugify

Works like this:

from slugify import slugify

slugify('one kožušček') # one-kozuscek

awesome-slugify github page

answered Mar 2 '14 at 21:01

voronin

40644

There is python package named awesome-slugify:

pip install awesome-slugify

Works like this:

from slugify import slugify

slugify('one kožušček') # one-kozuscek

awesome-slugify github page

answered Mar 2 '14 at 21:01

voronin

40644

answered Mar 2 '14 at 21:01

voronin

40644

answered Mar 2 '14 at 21:01

voronin

40644

answered Mar 2 '14 at 21:01

voronin

40644

2

Nice package! But be careful, it's licensed under GPL.

– Rotareti
Aug 6 '17 at 21:27

add a comment |

2

Nice package! But be careful, it's licensed under GPL.

– Rotareti
Aug 6 '17 at 21:27

Nice package! But be careful, it's licensed under GPL.

– Rotareti
Aug 6 '17 at 21:27

add a comment |

The problem is with the ascii normalization line:

slug = unicodedata.normalize('NFKD', s)

It is called unicode normalization which does not decompose lots of characters to ascii. For example, it would strip non-ascii characters from the following strings:

Mørdag -> mrdag
Æther -> ther

A better way to do it is to use the unidecode module that tries to transliterate strings to ascii. So if you replace the above line with:

import unidecode
slug = unidecode.unidecode(s)

You get better results for the above strings and for many Greek and Russian characters too:

Mørdag -> mordag
Æther -> aether

answered Sep 7 '11 at 13:16

Björn Lindqvist

10.3k95487

add a comment |

The problem is with the ascii normalization line:

slug = unicodedata.normalize('NFKD', s)

It is called unicode normalization which does not decompose lots of characters to ascii. For example, it would strip non-ascii characters from the following strings:

Mørdag -> mrdag
Æther -> ther

A better way to do it is to use the unidecode module that tries to transliterate strings to ascii. So if you replace the above line with:

import unidecode
slug = unidecode.unidecode(s)

You get better results for the above strings and for many Greek and Russian characters too:

Mørdag -> mordag
Æther -> aether

answered Sep 7 '11 at 13:16

Björn Lindqvist

10.3k95487

add a comment |

The problem is with the ascii normalization line:

slug = unicodedata.normalize('NFKD', s)

It is called unicode normalization which does not decompose lots of characters to ascii. For example, it would strip non-ascii characters from the following strings:

Mørdag -> mrdag
Æther -> ther

A better way to do it is to use the unidecode module that tries to transliterate strings to ascii. So if you replace the above line with:

import unidecode
slug = unidecode.unidecode(s)

You get better results for the above strings and for many Greek and Russian characters too:

Mørdag -> mordag
Æther -> aether

answered Sep 7 '11 at 13:16

Björn Lindqvist

10.3k95487

The problem is with the ascii normalization line:

slug = unicodedata.normalize('NFKD', s)

It is called unicode normalization which does not decompose lots of characters to ascii. For example, it would strip non-ascii characters from the following strings:

Mørdag -> mrdag
Æther -> ther

A better way to do it is to use the unidecode module that tries to transliterate strings to ascii. So if you replace the above line with:

import unidecode
slug = unidecode.unidecode(s)

You get better results for the above strings and for many Greek and Russian characters too:

Mørdag -> mordag
Æther -> aether

answered Sep 7 '11 at 13:16

Björn Lindqvist

10.3k95487

answered Sep 7 '11 at 13:16

Björn Lindqvist

10.3k95487

answered Sep 7 '11 at 13:16

Björn Lindqvist

10.3k95487

answered Sep 7 '11 at 13:16

Björn Lindqvist

10.3k95487

add a comment |

It works well in Django, so I don't see why it wouldn't be a good general purpose slugify function.

Are you having any problems with it?

answered Apr 6 '11 at 23:22

Nick Presta

23.9k54672

It's possible, that for some cases, it's a healthy dose of paranoia :-)

– nemesisfixx
Nov 9 '13 at 12:42

The code has moved to here.

– raylu
Jul 21 '16 at 4:43

2

For the lazies: from django.utils.text import slugify

– Spartacus
Dec 6 '17 at 23:29

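import re
import unicodedata

# Django helpers; import paths as in older Django releases
# (allow_lazy and django.utils.six were removed in later versions).
from django.utils import six
from django.utils.functional import allow_lazy
from django.utils.safestring import mark_safe

def slugify(value):
    """
    Converts to lowercase, removes non-word characters (alphanumerics and
    underscores) and converts spaces to hyphens. Also strips leading and
    trailing whitespace.
    """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return mark_safe(re.sub(r'[-\s]+', '-', value))
slugify = allow_lazy(slugify, six.text_type)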

This is the slugify function from django.utils.text. It should be sufficient for your requirement.
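
If pulling in Django is not an option, roughly the same steps can be written with only the standard library. The following is a minimal sketch, not Django's actual code: it assumes Python 3 `str` input, drops the Django-specific `mark_safe` and `allow_lazy` wrappers, and returns a plain string.

```python
import re
import unicodedata


def slugify(value):
    """Dependency-free sketch of a Django-style slugify (an approximation)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    value = unicodedata.normalize('NFKD', value)
    value = value.encode('ascii', 'ignore').decode('ascii')
    # Remove everything except word characters, whitespace and hyphens.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse each run of whitespace and/or hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)


print(slugify('String to slugify'))   # string-to-slugify
print(slugify('Héllo,  Wörld!'))      # hello-world
```

Note that, unlike the code in the question, this version keeps underscores (they match `\w`) and does not strip leading or trailing hyphens.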

answered Dec 3 '14 at 5:35

Animesh Sharma

2,0551019

Unidecode is good; however, be careful: unidecode is GPL. If this license doesn't fit then use this one

edited Nov 14 '14 at 9:21

BomberMan

93511031

answered Apr 22 '13 at 14:29

Mikhail Korobov

16.1k35451

A couple of options on GitHub:

https://github.com/dimka665/awesome-slugify

https://github.com/un33k/python-slugify

https://github.com/mozilla/unicode-slugify

Each supports slightly different parameters for its API, so you'll need to look through to figure out what you prefer.

edited Mar 16 '16 at 13:35

answered Mar 15 '16 at 10:20

Jeff Widman

7,46934365

You might consider changing the last line to

slug = re.sub(r'--+', '-', slug)

since the pattern [-]+ is equivalent to -+, and there is no need to match a single hyphen (replacing it with itself changes nothing), only runs of two or more.

But, of course, this is quite minor.
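
As a quick check of that claim, the sketch below runs both patterns over a few sample strings (the samples are made up for illustration) and confirms they produce the same result; the only difference is that `--+` skips single hyphens instead of replacing them with themselves.

```python
import re

# Hypothetical sample inputs, chosen to cover single, double and longer runs.
samples = ['a-b', 'a--b', 'a---b--c', '--a--', 'no hyphens']

for s in samples:
    collapsed_any = re.sub(r'[-]+', '-', s)   # pattern from the question
    collapsed_two = re.sub(r'--+', '-', s)    # pattern suggested above
    assert collapsed_any == collapsed_two
    print(repr(s), '->', repr(collapsed_two))
```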

answered Apr 6 '11 at 23:36

unutbu

553k10111921242

Another option is boltons.strutils.slugify. Boltons has quite a few other useful functions as well, and is distributed under a BSD license.

edited Nov 17 '18 at 1:34

answered Nov 17 '18 at 1:27

ostrokach

5,48023547
