Commit 191d727

Merge pull request #797 from jazzband/unicode-slugify
Preserve unicode when slugifying by default
2 parents 5dfc48d + ddb5ce6 commit 191d727

6 files changed, +98 -4 lines changed

CHANGELOG.rst

+20-1
@@ -4,7 +4,26 @@ Changelog
 (Unreleased)
 ~~~~~~~~~~~~

-* Drop Django 2.2 support.
+* **Backwards incompatible:** Tag slugification used to silently strip non-ASCII characters
+  from the tag name to make the slug. This leads to a lot of confusion for anyone using
+  languages with non-latin alphabets, as well as weird performance issues.
+
+  Tag slugification will now, by default, maintain unicode characters as-is during
+  slugification. This will lead to less surprises, but might cause issues for you if you are
+  expecting all of your tag slugs to fit within a regex like ``[a-zA-Z0-9]`` (for example in
+  URL routing configurations).
+
+  Generally speaking, this should not require action on your part as a library user, as
+  existing tag slugs are persisted in the database, and only new tags will receive the
+  enhanced unicode-compatible slug.
+
+  If you wish to maintain the old stripping behavior, set the setting
+  ``TAGGIT_STRIP_UNICODE_WHEN_SLUGIFYING`` to ``True``.
+
+  As a reminder, custom tag models can easily customize slugification behavior by overriding
+  the ``slugify`` method to your business needs.
+
+* Drop Django 2.2 support.

 2.1.0 (2022-01-24)
 ~~~~~~~~~~~~~~~~~~
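
The changelog entry above warns that unicode slugs may no longer fit an ASCII-only pattern such as ``[a-zA-Z0-9]``. A minimal sketch of that mismatch, using only Python's ``re`` module. Both patterns and their names are illustrative assumptions, not taken from this commit:

```python
import re

# Assumed ASCII-only slug pattern, in the spirit of the regex named in the changelog.
ASCII_ONLY = re.compile(r"[-a-zA-Z0-9_]+")
# Assumed unicode-aware alternative: for str patterns, \w matches unicode word characters.
UNICODE_AWARE = re.compile(r"[-\w]+")

for slug in ["django-taggit", "あい-うえお"]:
    ascii_ok = ASCII_ONLY.fullmatch(slug) is not None
    unicode_ok = UNICODE_AWARE.fullmatch(slug) is not None
    print(slug, "ascii-only:", ascii_ok, "unicode-aware:", unicode_ok)
```

A route or validator built on the first pattern would reject the new-style slug, which is the situation the entry asks library users to check for.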

docs/faq.rst

+2-1
@@ -13,7 +13,8 @@ Frequently Asked Questions


 One way to handle this is with post-generation hooks::
-    class ProductFactory(DjangoModelFactory):
+
+    class ProductFactory(DjangoModelFactory):
         # Rest of the stuff

         @post_generation

docs/getting_started.rst

+20
@@ -27,3 +27,23 @@ And then to any model you want tagging on do the following::
 If you want ``django-taggit`` to be **CASE-INSENSITIVE** when looking up existing tags, you'll have to set ``TAGGIT_CASE_INSENSITIVE`` (in ``settings.py`` or wherever you have your Django settings) to ``True`` (``False`` by default)::

     TAGGIT_CASE_INSENSITIVE = True
+
+
+Settings
+--------
+
+The following Django-level settings affect the behavior of the library:
+
+* ``TAGGIT_CASE_INSENSITIVE``
+
+  When set to ``True``, tag lookups will be case insensitive. This defaults to ``False``.
+
+* ``TAGGIT_STRIP_UNICODE_WHEN_SLUGIFYING``
+  When this is set to ``True``, tag slugs will be limited to ASCII characters. In this case, if you also have ``unidecode`` installed,
+  then tag slugification will transform a tag like ``あい うえお`` to ``ai-ueo``.
+  If you do not have ``unidecode`` installed, then you will usually be outright stripping unicode, meaning that something like ``helloあい`` will be slugified as ``hello``.
+
+  This value defaults to ``False``, meaning that unicode is preserved in slugification.
+
+  Because the behavior when ``True`` is set leads to situations where
+  slugs can be entirely stripped to an empty string, we recommend not activating this.
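
The two modes described in the settings text above can be approximated without Django. This is a minimal sketch: ``sketch_slugify`` is a hypothetical stand-in written for this example, not the library's code, and it models only the case where ``unidecode`` is not installed.

```python
import re
import unicodedata


def sketch_slugify(value, allow_unicode=False):
    """Simplified stand-in for a Django-style slugify (illustration only)."""
    if allow_unicode:
        # Keep unicode characters, only normalizing their form.
        value = unicodedata.normalize("NFKC", value)
    else:
        # Drop everything that cannot be represented in ASCII.
        value = (
            unicodedata.normalize("NFKD", value)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
    # Remove characters that are not word characters, whitespace, or hyphens.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace and hyphens, then trim separators at the ends.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


print(sketch_slugify("あい うえお", allow_unicode=True))  # unicode kept
print(repr(sketch_slugify("あい うえお")))                # stripped to an empty string
print(sketch_slugify("helloあい"))                        # only the ASCII part survives
```

The second call shows why the text recommends against the strip mode: a tag with no ASCII characters ends up with an empty slug.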
@@ -0,0 +1,20 @@
+# Generated by Django 2.2.26 on 2022-04-24 20:25
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ("taggit", "0004_alter_taggeditem_content_type_alter_taggeditem_tag"),
+    ]
+
+    operations = [
+        migrations.AlterField(
+            model_name="tag",
+            name="slug",
+            field=models.SlugField(
+                allow_unicode=True, max_length=100, unique=True, verbose_name="slug"
+            ),
+        ),
+    ]

taggit/models.py

+9-2
@@ -1,3 +1,4 @@
+from django.conf import settings
 from django.contrib.contenttypes.fields import GenericForeignKey
 from django.contrib.contenttypes.models import ContentType
 from django.db import IntegrityError, models, router, transaction
@@ -19,7 +20,10 @@ class TagBase(models.Model):
         verbose_name=pgettext_lazy("A tag name", "name"), unique=True, max_length=100
     )
     slug = models.SlugField(
-        verbose_name=pgettext_lazy("A tag slug", "slug"), unique=True, max_length=100
+        verbose_name=pgettext_lazy("A tag slug", "slug"),
+        unique=True,
+        max_length=100,
+        allow_unicode=True,
     )

     def __str__(self):
@@ -71,7 +75,10 @@ def save(self, *args, **kwargs):
             return super().save(*args, **kwargs)

     def slugify(self, tag, i=None):
-        slug = slugify(unidecode(tag))
+        if getattr(settings, "TAGGIT_STRIP_UNICODE_WHEN_SLUGIFYING", False):
+            slug = slugify(unidecode(tag))
+        else:
+            slug = slugify(tag, allow_unicode=True)
         if i is not None:
             slug += "_%d" % i
         return slug
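
The slug field is unique, so the ``i`` suffix in ``slugify`` matters most when several tags reduce to the same base slug, as happens in strip mode when all-non-ASCII tags become an empty string. A rough sketch of that interaction follows. The retry loop and the names ``with_suffix`` and ``first_free_slug`` are assumptions made for this example; the code that actually calls ``slugify`` with ``i`` is not part of this diff.

```python
def with_suffix(base_slug, i=None):
    # Same suffix step as the slugify method in the diff above.
    slug = base_slug
    if i is not None:
        slug += "_%d" % i
    return slug


def first_free_slug(base_slug, taken):
    # Hypothetical retry loop (an assumption, not shown in this diff):
    # try the bare slug first, then "_1", "_2", ... until one is unused.
    slug = with_suffix(base_slug)
    i = 0
    while slug in taken:
        i += 1
        slug = with_suffix(base_slug, i)
    return slug


taken = set()
# Three all-non-ASCII tags; in strip mode without unidecode each base slug is "".
for _tag in ["あい", "うえお", "日本語"]:
    taken.add(first_free_slug("", taken))

print(sorted(taken))
```

Under these assumptions every such tag needs one more retry than the last, which may be one way the empty-slug case becomes costly as tags accumulate.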

tests/test_models.py

+27
@@ -0,0 +1,27 @@
+from django.test import TestCase, override_settings
+
+from tests.models import TestModel
+
+
+class TestSlugification(TestCase):
+    def test_unicode_slugs(self):
+        """
+        Confirm the preservation of unicode in slugification by default
+        """
+        sample_obj = TestModel.objects.create()
+        # spaces in a unicode tag are replaced during slugification, but
+        # the unicode characters themselves are kept by default
+        sample_obj.tags.add("あい うえお")
+        self.assertEqual([tag.slug for tag in sample_obj.tags.all()], ["あい-うえお"])
+
+    def test_old_slugs(self):
+        """
+        Test that the setting that gives us the old slugification behavior
+        is in place
+        """
+        with override_settings(TAGGIT_STRIP_UNICODE_WHEN_SLUGIFYING=True):
+            sample_obj = TestModel.objects.create()
+            # with the stripping setting enabled (and unidecode not installed),
+            # the non-ASCII characters are dropped, leaving an empty slug
+            sample_obj.tags.add("あい うえお")
+            self.assertEqual([tag.slug for tag in sample_obj.tags.all()], [""])
