When you want nice, readable URLs, it's usually a good idea to use the title or name of an object in the URL.
For example, if you have a post entitled "Hello World!", the slug could be "hello-world". The term slug probably originated in publishing, but ever since Django used it in its core documentation, many web developers know what it means.
Because ASCII is the common subset that nearly everyone around the world can type on their keyboard, it makes a lot of sense to limit the URL to that. The following function tries its best to generate an ASCII-only, lowercase slug from a string:
import re
import translitcodec  # importing it registers the 'translit/long' codec

_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.]+')


def slugify(text, delim=u'-'):
    """Generates an ASCII-only slug."""
    result = []
    # Split on whitespace and punctuation, then transliterate each word.
    for word in _punct_re.split(text.lower()):
        word = word.encode('translit/long')
        if word:
            result.append(word)
    return unicode(delim.join(result))
As you might have spotted, the code above uses Jason Kirtland's translit codec, which is available from PyPI as translitcodec.
What translitcodec does is convert umlauts like "ü" to "ue" instead of the improper "u" equivalent. If you don't need that, you can use the built-in unicodedata module instead:
import re
from unicodedata import normalize

_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.]+')


def slugify(text, delim=u'-'):
    """Generates a slightly worse ASCII-only slug."""
    result = []
    for word in _punct_re.split(text.lower()):
        # Decompose accented characters, then drop everything non-ASCII.
        word = normalize('NFKD', word).encode('ascii', 'ignore')
        if word:
            result.append(word)
    return unicode(delim.join(result))
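The snippets here are written for Python 2 (note the `unicode()` call). As a rough sketch, the unicodedata variant could be ported to Python 3 along the following lines. The name `slugify_py3` is made up for this example and is not part of the original snippet:

```python
import re
from unicodedata import normalize

_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.]+')


def slugify_py3(text, delim='-'):
    """Python 3 port of the unicodedata variant (illustrative only)."""
    result = []
    for word in _punct_re.split(text.lower()):
        # encode() returns bytes on Python 3, so decode back to str.
        word = normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')
        if word:
            result.append(word)
    return delim.join(result)


print(slugify_py3('Hello World!'))  # hello-world
print(slugify_py3('Müller Café'))   # muller-cafe
```

The second call shows the "slightly worse" behaviour: the umlaut is reduced to a plain "u" rather than "ue".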
Be aware that neither of the above functions works correctly on Asian characters. In both cases the return value will most likely be an empty string. Be prepared to have a backup slug for that situation (such as the post ID). You might also want to fall back to a non-ASCII slug, because modern browsers display non-ASCII characters in the URL as long as the URL is UTF-8 encoded.
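The backup-slug idea could look roughly like this. It is a minimal Python 3 sketch, not part of the original snippet: `slugify_ascii` is a condensed port of the unicodedata variant, and `slug_or_fallback`, its `post_id` parameter and the `post-<id>` format are hypothetical choices made for the illustration.

```python
import re
from unicodedata import normalize

_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.]+')


def slugify_ascii(text, delim='-'):
    """Condensed Python 3 port of the unicodedata-based slugify."""
    words = (normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')
             for word in _punct_re.split(text.lower()))
    return delim.join(word for word in words if word)


def slug_or_fallback(title, post_id, delim='-'):
    """Hypothetical helper: fall back to the post ID if no ASCII survives."""
    slug = slugify_ascii(title, delim)
    if slug:
        return slug
    # Nothing could be represented in ASCII, so build the slug from the ID.
    return 'post' + delim + str(post_id)


print(slug_or_fallback('Hello World!', 42))  # hello-world
print(slug_or_fallback('你好世界', 42))       # post-42
```

The value returned here is what you would store alongside the post, so that every post ends up with a usable URL even when its title contains no ASCII characters.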
If you expect a lot of Asian characters, or simply want to support them, you can use the Unidecode package instead, which handles them too:
import re
from unidecode import unidecode

_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.]+')


def slugify(text, delim=u'-'):
    """Generates an ASCII-only slug."""
    result = []
    for word in _punct_re.split(text.lower()):
        # unidecode() may return several space-separated words.
        result.extend(unidecode(word).split())
    return unicode(delim.join(result))
This snippet by Armin Ronacher can be used freely for anything you like. Consider it public domain.
Comments
how to use slugify to store the slug in database by A Ruzaqi on 2010-11-04 @ 10:33
Nice Tut.
could you pls show me some example of how to use it to slug a title and store the slugs in database.
I am just a beginner in python .
Thanks
Simple slugfy in Python by Álvaro Justen - Turicas on 2011-12-03 @ 23:32
What do you think about it: https://gist.github.com/1428479 ?
[]s
Comment by Christoph Schniedermeier on 2013-01-16 @ 14:01
Would there be any problem with using:
_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.]+')
Comment by Christoph Schniedermeier on 2013-01-16 @ 14:02
Would there be any problem with using:
_punct_re = re.compile(r'\W+')