
PyGraz Website, MkDocs and regular expressions with named groups

Tuesday, June 3rd 2025
Realraum, Brockmanngasse 15, 8020 Graz


Behind the curtain of the new PyGraz website

Thomas Aglassinger & Dorian Santner
https://github.com/pygraz/ghp-website

Until recently, the PyGraz website at https://pygraz.org was still running on Django 1.0. Attempts to upgrade to a more recent version seemed rather futile because Django 1 and 2 differ radically in many ways. In contrast, upgrading from Django 2 to 3 to 4 to 5 typically involves only comparatively minor hiccups.

After some pondering, we realized that a static site generator is actually enough to hold our meeting notes and announce the next meeting.

The new site is based on MkDocs, a static site generator implemented in Python. For the design, it uses the Material for MkDocs theme with some minor adjustments to the color scheme.

The website is hosted on GitHub Pages. To find your way around the project, read the contribution guide. Changes to the main branch are deployed automatically by the GitHub Actions workflow in deploy.yaml.

The interactive maps use OpenStreetMap. To create such an inline map on OpenStreetMap, click the "Share" button, enable "Include marker", and select "HTML". Then copy the resulting <iframe> from the text box below and paste it into any Markdown or HTML document.

Potentially helpful MkDocs plugins

Christoph Reiter

  • mkdocs-print-site-plugin: Adds a print page to your site that combines the entire site, allowing for easy export to PDF and standalone HTML. See demo.
  • mkdocs-redirects: Create page redirects (e.g., for moved/renamed pages).
  • lychee: Check external links. For an example of integrating it into a project, see linkcheck.sh (script to run the check) and lychee.toml (configuration for lychee).

Regular expressions with named groups in Python and (not) PostgreSQL

Thomas Aglassinger

Python

To find your bearings with regular expressions or to debug ones that do not work, consider using sites like regular expressions 101.

Example regular expression to extract an issue number like #123 from a text:

>>> import re
>>> re.match(r"^(.*)(?P<task_code>#\d+)(.*)$", "#123 Do something")
<re.Match object; span=(0, 17), match='#123 Do something'>

Here, the #\d+ means "a hash (#) followed by at least one decimal digit."
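A quick way to see what #\d+ accepts is re.fullmatch(), which only succeeds if the whole string matches the pattern. The sample strings below are arbitrary examples chosen for illustration. The last two lines show a detail that is easy to miss: in Python str patterns, \d matches any Unicode decimal digit, not just 0 to 9, unless the re.ASCII flag is set.

```python
import re

# "#" followed by one or more decimal digits, and nothing else
task_code_pattern = re.compile(r"#\d+")

candidates = ["#1", "#123", "#", "123", "#12a"]
results = {candidate: task_code_pattern.fullmatch(candidate) is not None for candidate in candidates}
print(results)
# {'#1': True, '#123': True, '#': False, '123': False, '#12a': False}

# In str patterns, \d matches any Unicode decimal digit; re.ASCII limits it to 0-9
arabic_indic_digits = "#\u0661\u0662"  # "#" followed by ARABIC-INDIC DIGIT ONE and TWO
print(task_code_pattern.fullmatch(arabic_indic_digits) is not None)  # True
print(re.fullmatch(r"#\d+", arabic_indic_digits, re.ASCII) is not None)  # False
```

If only ASCII task numbers should count, either pass re.ASCII or write [0-9]+ instead of \d+.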

The (?P<task_code>#\d+) means that the part of the text matching a hash followed by digits is made available as the named group "task_code".

>>> re.match(r"^(.*)(?P<task_code>#\d+)(.*)$", "#123 Do something").group("task_code")
'#123'
>>> re.match(r"^(.*)(?P<task_code>#\d+)(.*)$", "Do something #123").group("task_code")
'#123'
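If the text contains no task code at all, re.match() returns None, and calling .group() on it raises an AttributeError. A minimal sketch of a guard against that, using a helper with the hypothetical name task_code_of:

```python
import re
from typing import Optional


def task_code_of(text: str) -> Optional[str]:
    """Return the task code such as '#123' found in text, or None if there is none."""
    match = re.match(r"^(.*)(?P<task_code>#\d+)(.*)$", text)
    return match.group("task_code") if match is not None else None


print(task_code_of("#123 Do something"))  # #123
print(task_code_of("Do something"))  # None
print(task_code_of("#1 and #2"))  # #2
```

Because the leading (.*) is greedy, it swallows as much text as possible, so a text with several codes such as "#1 and #2" yields the last one.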

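To make the code more readable and avoid processing the pattern again on each call (for example, inside loops), compile the regex once, store it in a variable with a meaningful name, and reuse the compiled variant. Python's re module already caches recently used patterns, so the main benefit is readability, plus skipping the cache lookup: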

>>> task_code_regex = re.compile(r"^(.*)(?P<task_code>#\d+)(.*)$")
>>> task_code_regex.match("Do something #123").group("task_code")
'#123'
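For example, when scanning many lines, the compiled regex is created once outside the loop and reused for every line. The lines below are made up for illustration:

```python
import re

# Compiled once, reused for every line in the loop
task_code_regex = re.compile(r"^(.*)(?P<task_code>#\d+)(.*)$")

lines = [
    "#123 Do something",
    "Fix typo in README",
    "Do something else #456",
]

task_codes = []
for line in lines:
    match = task_code_regex.match(line)
    if match is not None:  # skip lines without a task code
        task_codes.append(match.group("task_code"))

print(task_codes)  # ['#123', '#456']
```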

PostgreSQL

PostgreSQL supports regular expressions through special operators and functions. Since SQL:2008, regular expressions are also part of standard SQL, although the notation differs slightly.

With text ~ pattern you can check whether text matches pattern. This is somewhat comparable to the like operator.

select '#123 Do something' ~ '(.*)(#\d+)(.*)';
-- Result: true
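Note that ~ looks for a match anywhere in the text, so for a plain true/false check the (.*) groups around #\d+ are not strictly needed: 'Do something #123' ~ '#\d+' is true as well. In Python, the closest counterpart is re.search(), whereas re.match(), as used in the examples above, only matches at the start of the text. A small comparison:

```python
import re

text = "Do something #123"

# re.match() only tries to match at the start of the text
matches_at_start = re.match(r"#\d+", text) is not None
# re.search() looks anywhere in the text, similar to PostgreSQL's ~ operator
found_anywhere = re.search(r"#\d+", text) is not None

print(matches_at_start, found_anywhere)  # False True
```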

To extract the part of text that matches the group at position group_index in pattern (with index 1 referring to the first group), use:

select (regexp_match(text, pattern))[group_index];

For example:

select (
    regexp_match(
        '#123 Do something',  -- text
        '(.*)(#\d+)(.*)'      -- pattern
    )
)[2];                         -- group_index
-- Result: '#123'
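When developing a pattern in Python and moving it to SQL later, it can help to mimic (regexp_match(text, pattern))[group_index] on the Python side. The following is a rough sketch with the hypothetical name regexp_match_group: like the PostgreSQL function it searches anywhere in the text, returns None (SQL NULL) when nothing matches, counts groups from 1, and returns the whole match when the pattern has no groups. Python's and PostgreSQL's regex dialects overlap for simple patterns like this one but are not identical, so results may differ for more advanced patterns.

```python
import re
from typing import Optional


def regexp_match_group(text: str, pattern: str, group_index: int) -> Optional[str]:
    """Rough Python counterpart of PostgreSQL's (regexp_match(text, pattern))[group_index]."""
    # regexp_match() finds the first match anywhere in the text, like re.search()
    match = re.search(pattern, text)
    if match is None:
        return None  # regexp_match() yields NULL if nothing matches
    # Without groups, regexp_match() returns an array holding the whole match
    groups = match.groups() if match.re.groups > 0 else (match.group(0),)
    # PostgreSQL arrays are 1-based; out-of-range subscripts yield NULL
    if 1 <= group_index <= len(groups):
        return groups[group_index - 1]
    return None


print(regexp_match_group("#123 Do something", r"(.*)(#\d+)(.*)", 2))  # #123
```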

The regexp_match function is best used in the select list. It can technically appear in a where clause (unlike the set-returning regexp_matches, which PostgreSQL rejects there), but you should avoid filtering on it: similar to like with a leading wildcard, a pattern such as the one above cannot use a regular index, so queries involving it have to examine every row, perform slowly, and drain the database.

Named groups with PostgreSQL

As of version 17, PostgreSQL does not support named groups.

As a workaround, you can use Python to get the index of a named group. For example:

>>> task_code_regex.groupindex
mappingproxy({'task_code': 2})
>>> task_code_regex.groupindex["task_code"]
2
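Keep in mind that this index counts every capturing group from left to right, named or not, while non-capturing groups written as (?:...) are skipped. PostgreSQL numbers the elements returned by regexp_match the same way for simple patterns. A made-up pattern with two named groups shows this:

```python
import re

# Made-up pattern: an optional non-capturing prefix, one unnamed group, two named groups
pattern = re.compile(r"^(?:\[\w+\] )?(\w+): (?P<task_code>#\d+) (?P<title>.*)$")

print(pattern.groups)  # 3 capturing groups; (?:...) does not count
print(dict(pattern.groupindex))  # {'task_code': 2, 'title': 3}

match = pattern.match("[web] fix: #42 Broken link")
print(match.group(2), match.group("task_code"))  # #42 #42
```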

After that, convert the named group in the regex into an unnamed group. For example:

>>> r"^(.*)(?P<task_code>#\d+)(.*)$".replace("(?P<task_code>", "(")
'^(.*)(#\\d+)(.*)$'
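For patterns with several named groups, both steps can be combined. The function name to_postgres_pattern is hypothetical, and this is a naive sketch: it assumes the pattern contains no escaped text that looks like "(?P<" and no named backreferences such as (?P=name), which PostgreSQL would not understand either.

```python
import re
from typing import Dict, Tuple

# Matches the opening "(?P<name>" of a named group (naive: ignores escaping)
_NAMED_GROUP_START = re.compile(r"\(\?P<\w+>")


def to_postgres_pattern(pattern: str) -> Tuple[str, Dict[str, int]]:
    """Return the pattern with unnamed groups plus a mapping of group names to indexes."""
    group_indexes = dict(re.compile(pattern).groupindex)
    unnamed_pattern = _NAMED_GROUP_START.sub("(", pattern)
    return unnamed_pattern, group_indexes


print(to_postgres_pattern(r"^(.*)(?P<task_code>#\d+)(.*)$"))
# ('^(.*)(#\\d+)(.*)$', {'task_code': 2})
```

Since only the "?P<name>" part is removed, the number and order of capturing groups stay the same, so the collected indexes remain valid for the converted pattern.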

Finally, store the unnamed pattern together with the group index in a PostgreSQL table, so that SQL queries can pass both to regexp_match.
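A minimal sketch of what such a table and a query using it might look like. The table name task_code_pattern, its columns, and the %s placeholders (the parameter style of Python PostgreSQL drivers such as psycopg) are assumptions for illustration. The code only prepares the SQL and the parameters; it does not connect to a database.

```python
import re

# Hypothetical table holding regex patterns together with their group index
CREATE_TABLE_SQL = """
create table task_code_pattern (
    name        text primary key,
    pattern     text not null,     -- regex without named groups
    group_index integer not null   -- 1-based index for regexp_match()
)
"""

INSERT_SQL = "insert into task_code_pattern (name, pattern, group_index) values (%s, %s, %s)"

# Looks up a stored pattern by name and extracts the group from some text
SELECT_SQL = """
select (regexp_match(%s::text, p.pattern))[p.group_index]
from task_code_pattern as p
where p.name = %s
"""

named_pattern = r"^(.*)(?P<task_code>#\d+)(.*)$"
group_index = re.compile(named_pattern).groupindex["task_code"]
unnamed_pattern = named_pattern.replace("(?P<task_code>", "(")

insert_parameters = ("task_code", unnamed_pattern, group_index)
print(insert_parameters)  # ('task_code', '^(.*)(#\\d+)(.*)$', 2)
```

With a driver like psycopg, these would be passed as cursor.execute(INSERT_SQL, insert_parameters) and cursor.execute(SELECT_SQL, ("#123 Do something", "task_code")).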