String slugification in Python










68















I am looking for the best way to "slugify" a string (that is, to turn it into a URL "slug"). My current solution is based on an existing recipe.



I have changed it a little bit to:



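import re
import unicodedata

s = u'String to slugify'

slug = unicodedata.normalize('NFKD', s)  # split accented letters into base + mark
slug = slug.encode('ascii', 'ignore').lower()  # drop non-ASCII, lowercase
slug = re.sub(r'[^a-z0-9]+', '-', slug).strip('-')  # other runs become one hyphen
slug = re.sub(r'[-]+', '-', slug)  # collapse repeated hyphens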


Does anyone see any problems with this code? It works fine, but maybe I am missing something, or you know a better way?


































  • Are you working with Unicode a lot? If so, the last re.sub might be better wrapped in unicode(); this is what Django does. Also, [^a-z0-9]+ can be shortened by using \w. See django.template.defaultfilters; it's close to yours, but a bit more refined.

    – Mike Ramirez
    Apr 7 '11 at 0:23











  • Are Unicode characters allowed in a URL? Also, I changed \w to a-z0-9 because \w includes the _ character and uppercase letters. Letters are lowercased in advance, so there will be no uppercase letters to match.

    – Zygimantas
    Apr 7 '11 at 1:21











  • '_' is valid (but it's your choice, you did ask); Unicode is sent as percent-encoded characters.

    – Mike Ramirez
    Apr 7 '11 at 1:36











  • Thank you, Mike. Well, I asked the wrong question. Is there any reason to convert it back to a unicode string if we have already replaced all characters except "a-z", "0-9" and "-"?

    – Zygimantas
    Apr 7 '11 at 1:47











  • For Django, I believe it's important to them to have all strings as unicode objects for compatibility. It's your choice if you want this.

    – Mike Ramirez
    Apr 7 '11 at 1:51
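
To make the two points from the comments above concrete, here is a small sketch using only the standard library. Python 3 is an assumption (the thread itself dates from Python 2), and the sample strings are made up for illustration. It shows that \w keeps underscores and uppercase letters, which the explicit a-z0-9 class does not, and that non-ASCII characters in a URL path are sent percent-encoded.

```python
import re
from urllib.parse import quote

sample = "Foo_Bar baz"

# The explicit class (applied after lowercasing, as in the question)
# treats '_' as a separator.
explicit = re.sub(r'[^a-z0-9]+', '-', sample.lower())

# \W is the negation of \w, so '_' and uppercase letters are left alone.
word_class = re.sub(r'\W+', '-', sample)

print(explicit)    # foo-bar-baz
print(word_class)  # Foo_Bar-baz

# Non-ASCII characters are percent-encoded as UTF-8 bytes in a URL path.
encoded = quote("déjà")
print(encoded)     # d%C3%A9j%C3%A0
```

Whether underscores should survive in a slug is a matter of taste, as the comments note; the explicit class simply makes that choice visible.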















python slug














Question asked Apr 6 '11 at 23:08 by Zygimantas; edited May 23 '17 at 10:31 by Community
























10 Answers


















112














There is a Python package named python-slugify, which does a pretty good job of slugifying:



pip install python-slugify


Works like this:



from slugify import slugify

txt = "This is a test ---"
r = slugify(txt)
assert r == "this-is-a-test"

txt = "This -- is a ## test ---"
r = slugify(txt)
assert r == "this-is-a-test"

# Double quotes, because the text itself contains apostrophes
txt = "C'est déjà l'été."
r = slugify(txt)
assert r == "cest-deja-lete"

txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
r = slugify(txt)
assert r == "nin-hao-wo-shi-zhong-guo-ren"

txt = 'Компьютер'
r = slugify(txt)
assert r == "kompiuter"

txt = 'jaja---lol-méméméoo--a'
r = slugify(txt)
assert r == "jaja-lol-mememeoo-a"


See More examples



This package does a bit more than what you posted (take a look at the source, it's just one file). The project is still active: it was updated 2 days before I originally answered and, over four years later (last checked 2017-04-26), it still gets updates.

Careful: there is a second package around, named slugify. If you have both installed, you may run into problems, because both are imported under the same name. The one just named slugify did not pass all of my quick checks: "Ich heiße" became "ich-heie" (it should be "ich-heisse"). So be sure to pick the right one when using pip or easy_install.


























  • 5





    python-slugify is licensed under MIT, but it uses Unidecode which is licensed under GPL, so it might not fit for some projects.

    – Rotareti
    Aug 6 '17 at 21:40


















27














Install unidecode for Unicode support:




pip install unidecode




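# -*- coding: utf-8 -*-
import re
import unidecode

def slugify(text):
    # Transliterate to ASCII, then lowercase
    text = unidecode.unidecode(text).lower()
    # Replace each run of non-word characters or underscores with a hyphen
    return re.sub(r'[\W_]+', '-', text)

text = u"My custom хелло ворлд"
print(slugify(text))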



>>> my-custom-khello-vorld



























  • 1





    Hi, it's a bit strange, but it gives me a result like "my-custom-ndud-d-d3-4-d2d3-4nd-d-".

    – derevo
    Jul 30 '12 at 7:04







  • 1





    @derevo That happens when you don't pass unicode strings. Replace slugify("My custom хелло ворлд") with slugify(u"My custom хелло ворлд"), and it should work.

    – kratenko
    Dec 16 '12 at 12:10






  • 9





    I would suggest against using variable names like str. This hides the builtin str type.

    – crodjer
    Apr 19 '14 at 7:22






  • 2





    unidecode is GPL, which may not be suitable for some.

    – Jorge Leitão
    Apr 25 '15 at 6:59











  • What about re-slugifying or de-slugifying?

    – Ryan Chou
    Jan 24 at 3:46


















8














There is a Python package named awesome-slugify:



pip install awesome-slugify


Works like this:



from slugify import slugify

slugify('one kožušček') # one-kozuscek


awesome-slugify github page
























  • 2





    Nice package! But be careful, it's licensed under GPL.

    – Rotareti
    Aug 6 '17 at 21:27


















6














The problem is with the ascii normalization line:



slug = unicodedata.normalize('NFKD', s)


This is Unicode normalization, which does not decompose many characters into ASCII equivalents. As a result, the ASCII encoding step that follows simply strips the non-ASCII characters from strings such as:



Mørdag -> mrdag
Æther -> ther


A better way to do it is to use the unidecode module that tries to transliterate strings to ascii. So if you replace the above line with:



import unidecode
slug = unidecode.unidecode(s)


You get better results for the above strings and for many Greek and Russian characters too:



Mørdag -> mordag
Æther -> aether
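
The limitation described above can be checked with the standard library alone. The sketch below assumes Python 3, and `ascii_fold` is a hypothetical helper name, not part of any library. It applies NFKD normalization followed by an ASCII encode with errors ignored: accented letters that decompose into a base letter plus a combining mark survive, while ø and Æ, which have no such decomposition, are dropped.

```python
import unicodedata

def ascii_fold(text):
    """NFKD-normalize, then discard anything that is not ASCII."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(ascii_fold('déjà'))    # deja
print(ascii_fold('Mørdag'))  # Mrdag
print(ascii_fold('Æther'))   # ther
```

Transliteration libraries such as unidecode map these characters through lookup tables instead, which is why they can return "o" and "AE" here.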



































    5














    It works well in Django, so I don't see why it wouldn't be a good general purpose slugify function.



    Are you having any problems with it?





























    • It's possible, that for some cases, it's a healthy dose of paranoia :-)

      – nemesisfixx
      Nov 9 '13 at 12:42











    • The code has moved to here.

      – raylu
      Jul 21 '16 at 4:43






    • 2





      For the lazies: from django.utils.text import slugify

      – Spartacus
      Dec 6 '17 at 23:29


















    5














    def slugify(value):
        """
        Converts to lowercase, removes non-word characters (alphanumerics and
        underscores) and converts spaces to hyphens. Also strips leading and
        trailing whitespace.
        """
        value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
        value = re.sub(r'[^\w\s-]', '', value).strip().lower()
        return mark_safe(re.sub(r'[-\s]+', '-', value))
    slugify = allow_lazy(slugify, six.text_type)


    This is the slugify function from django.utils.text.
    It should meet your requirements.
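
For use outside Django, the same steps can be sketched with the standard library only. This is an adaptation, not Django's code: mark_safe and allow_lazy are dropped, Python 3 is assumed, and the name `simple_slugify` is made up for this example.

```python
import re
import unicodedata

def simple_slugify(value):
    """Framework-free variant of the steps shown above (illustrative only)."""
    # Fold to ASCII: decompose, then drop whatever cannot be encoded.
    value = unicodedata.normalize('NFKD', value)
    value = value.encode('ascii', 'ignore').decode('ascii')
    # Remove everything except word characters, whitespace and hyphens.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)

print(simple_slugify("Hello, World!"))           # hello-world
print(simple_slugify("  C'est déjà l'été.  "))   # cest-deja-lete
print(simple_slugify("foo_bar  --  baz"))        # foo_bar-baz
```

Unlike the snippet in the question, this variant deletes punctuation rather than turning it into hyphens, and it keeps underscores, because \w matches them.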




































      3














      Unidecode is good; however, be careful: unidecode is GPL. If this license doesn't fit then use this one






































        2














        A couple of options on GitHub:



        1. https://github.com/dimka665/awesome-slugify


        2. https://github.com/un33k/python-slugify

        3. https://github.com/mozilla/unicode-slugify

        Each supports slightly different parameters for its API, so you'll need to look through to figure out what you prefer.



        In particular, pay attention to the different options they provide for dealing with non-ASCII characters. Pydanny wrote a very helpful blog post illustrating some of the unicode handling differences in these slugify'ing libraries: http://www.pydanny.com/awesome-slugify-human-readable-url-slugs-from-any-string.html This blog post is slightly outdated because Mozilla's unicode-slugify is no longer Django-specific.



        Also note that currently awesome-slugify is GPLv3, though there's an open issue where the author says they'd prefer to release as MIT/BSD, just not sure of the legality: https://github.com/dimka665/awesome-slugify/issues/24






































          1














          You might consider changing the last line to



          slug=re.sub(r'--+',r'-',slug)


          since the pattern [-]+ is no different than -+, and you don't really care about matching just one hyphen, only two or more.



          But, of course, this is quite minor.
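
A quick check of this point, as a sketch: Python 3 is assumed (hence the extra .decode('ascii')), and the input strings are made up. Both patterns give the same result on a string with hyphen runs. The second half suggests the final substitution has nothing left to do in the question's snippet, because [^a-z0-9]+ also matches hyphens and so already collapses any run of them in the previous step.

```python
import re
import unicodedata

# Both patterns collapse hyphen runs identically.
messy = 'a--b---c-d'
with_class = re.sub(r'[-]+', '-', messy)
with_two_or_more = re.sub(r'--+', '-', messy)
print(with_class, with_two_or_more)  # a-b-c-d a-b-c-d

# The question's pipeline, ported to Python 3.
s = 'String -- to   slugify!'
slug = unicodedata.normalize('NFKD', s)
slug = slug.encode('ascii', 'ignore').decode('ascii').lower()
slug = re.sub(r'[^a-z0-9]+', '-', slug).strip('-')
before_last_step = slug
slug = re.sub(r'--+', '-', slug)

print(before_last_step)  # string-to-slugify
print(slug)              # string-to-slugify
```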




































            0














            Another option is boltons.strutils.slugify. Boltons has quite a few other useful functions as well, and is distributed under a BSD license.























































Answer recommending python-slugify: answered Feb 15 '13 at 2:12 by kratenko; edited May 21 '18 at 16:13 by Nick T




























              edited Nov 12 '18 at 10:02
              Arne

              answered Dec 3 '11 at 9:29
              user1078810







              • 1





                Hi, it's a bit strange, but it gives me a result like this: "my-custom-ndud-d-d3-4-d2d3-4nd-d-"

                – derevo
                Jul 30 '12 at 7:04







              • 1





                @derevo that happens when you don't pass unicode strings. Replace slugify("My custom хелло ворлд") with slugify(u"My custom хелло ворлд"), and it should work.

                – kratenko
                Dec 16 '12 at 12:10






              • 9





                I would suggest against using variable names like str. This hides the builtin str type.

                – crodjer
                Apr 19 '14 at 7:22






              • 2





                unidecode is GPL, which may not be suitable for some.

                – Jorge Leitão
                Apr 25 '15 at 6:59











              • What about reslugifying or deslugifying?

                – Ryan Chou
                Jan 24 at 3:46























              8














              There is a Python package named awesome-slugify:



              pip install awesome-slugify


              Works like this:



              from slugify import slugify

              slugify('one kožušček') # one-kozuscek


              awesome-slugify GitHub page: https://github.com/dimka665/awesome-slugify






              answered Mar 2 '14 at 21:01
              voronin







              • 2





                Nice package! But be careful, it's licensed under GPL.

                – Rotareti
                Aug 6 '17 at 21:27























              6














              The problem is with the normalization line:



              slug = unicodedata.normalize('NFKD', s)


              This is Unicode normalization, which does not decompose many characters into ASCII. Characters without a decomposition, such as ø and Æ, are then dropped by the encode('ascii', 'ignore') step, so the following strings lose letters:



              Mørdag -> mrdag
              Æther -> ther


              A better approach is to use the unidecode module, which tries to transliterate strings to ASCII. If you replace the above line with:



              import unidecode
              slug = unidecode.unidecode(s)


              you get better results for the above strings, and for many Greek and Russian characters too:



              Mørdag -> mordag
              Æther -> aether
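The stripping behaviour described above can be reproduced with the standard library alone. This is only a sketch: the helper name ascii_fold and the sample word "Café" are made up for illustration and are not part of the answer.

```python
import unicodedata

def ascii_fold(s):
    """NFKD-normalize, then drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize('NFKD', s)
    # 'ignore' silently discards every character that cannot be encoded.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

for word in ('Café', 'Mørdag', 'Æther'):
    # é decomposes into e + a combining accent, so the e survives;
    # ø and Æ have no decomposition and are lost entirely.
    print(word, '->', ascii_fold(word).lower())
```

Accented letters that decompose into a base letter plus a combining mark survive this step, which is why the approach looks fine on many Western European strings and fails on others.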





                  answered Sep 7 '11 at 13:16
                  Björn Lindqvist





















                      5














                      It works well in Django, so I don't see why it wouldn't be a good general-purpose slugify function.



                      Are you having any problems with it?






                      answered Apr 6 '11 at 23:22
                      Nick Presta












                      • It's possible, that for some cases, it's a healthy dose of paranoia :-)

                        – nemesisfixx
                        Nov 9 '13 at 12:42











                      • The code has moved to here.

                        – raylu
                        Jul 21 '16 at 4:43






                      • 2





                        For the lazies: from django.utils.text import slugify

                        – Spartacus
                        Dec 6 '17 at 23:29




























                      5














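                      import re
                      import unicodedata
                      # Imports as in older Django releases (allow_lazy and six were later removed).
                      from django.utils import six
                      from django.utils.functional import allow_lazy
                      from django.utils.safestring import mark_safe

                      def slugify(value):
                          """
                          Converts to lowercase, removes non-word characters (alphanumerics and
                          underscores) and converts spaces to hyphens. Also strips leading and
                          trailing whitespace.
                          """
                          value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
                          value = re.sub(r'[^\w\s-]', '', value).strip().lower()
                          return mark_safe(re.sub(r'[-\s]+', '-', value))
                      slugify = allow_lazy(slugify, six.text_type)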


                      This is the slugify function from django.utils.text.
                      It should meet your requirements.
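For use outside Django, the same steps can be sketched with only the standard library. This is an adaptation rather than Django's own code: mark_safe and allow_lazy are dropped, and the name slugify_plain is made up for this illustration.

```python
import re
import unicodedata

def slugify_plain(value):
    """Django-style slug without any Django dependency (illustrative sketch)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    # Remove everything except word characters, whitespace and hyphens.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)

print(slugify_plain('  Héllo,  World -- again!  '))  # hello-world-again
```

Note that underscores count as word characters here, so they are kept in the slug; strip them separately if that is not wanted.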






                          answered Dec 3 '14 at 5:35
                          Animesh Sharma





















                              3














                              Unidecode is good; however, be careful: unidecode is GPL. If this license doesn't fit then use this one






                                  edited Nov 14 '14 at 9:21
                                  BomberMan

                                  answered Apr 22 '13 at 14:29
                                  Mikhail Korobov





















                                      2














                                      A couple of options on GitHub:



                                      1. https://github.com/dimka665/awesome-slugify


                                      2. https://github.com/un33k/python-slugify

                                      3. https://github.com/mozilla/unicode-slugify

                                      Each supports slightly different parameters for its API, so you'll need to look through to figure out what you prefer.



                                      In particular, pay attention to the different options they provide for dealing with non-ASCII characters. Pydanny wrote a very helpful blog post illustrating some of the unicode handling differences in these slugify'ing libraries: http://www.pydanny.com/awesome-slugify-human-readable-url-slugs-from-any-string.html This blog post is slightly outdated because Mozilla's unicode-slugify is no longer Django-specific.



                                      Also note that currently awesome-slugify is GPLv3, though there's an open issue where the author says they'd prefer to release as MIT/BSD, just not sure of the legality: https://github.com/dimka665/awesome-slugify/issues/24






                                          edited Mar 16 '16 at 13:35

                                          answered Mar 15 '16 at 10:20
                                          Jeff Widman





















                                              1














                                              You might consider changing the last line to



                                              slug = re.sub(r'--+', '-', slug)


                                              since the pattern [-]+ is no different than -+, and you don't really care about matching just one hyphen, only two or more.



                                              But, of course, this is quite minor.
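A quick check with the standard library suggests the two patterns give the same result. The helper name collapse_hyphens and the sample strings are made up for this illustration.

```python
import re

def collapse_hyphens(slug):
    # Replace every run of two or more hyphens with a single hyphen.
    return re.sub(r'--+', '-', slug)

for sample in ('a-b', 'a--b', 'a----b-c'):
    # [-]+ also rewrites lone hyphens (to themselves), so the output is identical.
    assert re.sub(r'[-]+', '-', sample) == collapse_hyphens(sample)
    print(sample, '->', collapse_hyphens(sample))
```

The only difference is that `--+` skips single hyphens instead of replacing them with themselves.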






                                                  answered Apr 6 '11 at 23:36
                                                  unutbu





















                                                      0














                                                      Another option is boltons.strutils.slugify. Boltons has quite a few other useful functions as well, and is distributed under a BSD license.






                                                          edited Nov 17 '18 at 1:34

























                                                          answered Nov 17 '18 at 1:27









                                                          ostrokach



























