NOTE

This project is discontinued. The website is now generated using 💎 Quartz instead.

This website is written in a combination of Markdown and source code files. To be displayed on the website, they must first be converted to HTML. Markdown is a user-friendly format for writing and basic formatting, and it is much more efficient than writing HTML directly. The source code files, in turn, cannot be written as HTML in the first place.

The source code for this product can be found here.

Features

  • Generate HTML from files
    • Generate HTML from Markdown
  • Generating a sitemap

Generate HTML from files

(🔀 Generate HTML Changelog)

At a high level, the process works like this:

  • Delete all existing folders (to make sure there are no accidental remains of old files).
  • For each file that needs to be transformed:
    • Create the destination folder.
    • Generate the HTML and apply the configured template file. How a file is transformed into HTML depends on its type, as described in the following sections.
  • Write the file to its final destination.
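
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: rebuild_site, the sources mapping, and the render callable are hypothetical names, and the sketch simplifies "delete all existing folders" to removing the whole destination folder.

```python
import shutil
from pathlib import Path


def rebuild_site(sources, destination_root, render):
    """Regenerate destination_root from scratch.

    sources maps a source path to a path relative to destination_root.
    render is a callable that turns a source path into an HTML string.
    Both are hypothetical stand-ins for the project's real configuration.
    """
    destination_root = Path(destination_root)

    # Step 1: remove the previous output so no stale files survive.
    shutil.rmtree(destination_root, ignore_errors=True)

    written = []
    for source_path, relative_target in sources.items():
        target = destination_root / relative_target
        # Step 2a: create the destination folder.
        target.parent.mkdir(parents=True, exist_ok=True)
        # Step 2b: generate the HTML (the template would be applied here).
        html = render(source_path)
        # Step 3: write the file to its final destination.
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written
```

Deleting the output first keeps the result reproducible: a page that was removed from the sources cannot linger in the generated site.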

Some notes on the HTML template file:

  • <meta charset="UTF-8"> is required, because by default the data is interpreted as ISO-8859-1, which incorrectly renders non-ASCII characters such as ’.
  • The viewport meta tag is required to get correct scaling; otherwise, Google Search Console will not consider the page mobile-friendly.
  • When using a collapsible navbar for responsive design with Bootstrap 5, make sure to include the Bootstrap JS and to use data-bs-toggle and data-bs-target (with the bs infix).
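
The first two requirements could be checked automatically before a build. The sketch below is an assumption, not part of the original project: MetaCollector and missing_template_requirements are hypothetical names, and the check only looks at the two meta tags mentioned above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects the attributes of every <meta> tag in a document."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))


def missing_template_requirements(template_html):
    """Return a list of problems found in the template (empty if none)."""
    collector = MetaCollector()
    collector.feed(template_html)

    problems = []
    # Attribute names are lowercased by HTMLParser; values keep their case.
    if not any((m.get("charset") or "").lower() == "utf-8" for m in collector.metas):
        problems.append('missing <meta charset="UTF-8">')
    if not any((m.get("name") or "").lower() == "viewport" for m in collector.metas):
        problems.append("missing viewport meta tag")
    return problems
```

An empty result means both tags were found; the Bootstrap attributes are not covered by this sketch.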

The files that need to be processed are defined in resource/files-to-process.yaml, so the source code does not have to be updated every time a new file needs to be processed. (Source code that gets the list of files to process)

A couple of notes on this code:

  • The format is a list of items, each containing a URL and a type (e.g., text).
  • The list must be ordered to make sure that any changes in the sitemap are the result of actual changes instead of just re-ordering, making it easier for me to see what is changing.
  • files-to-process.yaml supports filename pattern matching (Glob), for example, products/generate-website/code/**/*.py to match all Python files.
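
As a rough sketch of how such entries might be expanded, assuming each entry has already been loaded from the YAML file into a dictionary: the key names url and type and the function expand_entries are assumptions for illustration, not the project's actual schema or code.

```python
from pathlib import Path


def expand_entries(entries, root):
    """Expand glob patterns in entries into a sorted list of (relative_path, type)."""
    root = Path(root)
    expanded = []
    for entry in entries:
        # Path.glob understands ** for recursive matching; a plain path matches itself.
        for match in root.glob(entry["url"]):
            if match.is_file():
                expanded.append((match.relative_to(root).as_posix(), entry["type"]))
    # Sort so the output order does not depend on file system traversal order.
    return sorted(expanded)
```

Sorting after expansion matters because the order in which a glob returns files is not guaranteed, which would otherwise show up as spurious reordering in the sitemap.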

Generate HTML from Markdown

(Source Code)

The HTML is generated from Markdown files using 🧰 Pandoc:

# source_path and destination_path are placeholders for the actual file paths.
subprocess.run(
    ["pandoc", "--standalone", "--template", TEMPLATE, source_path, "-o", destination_path],
    check=True,
)

In my workflow, to-do items are added to the Markdown files using HTML comments. Pandoc supports YAML metadata blocks throughout the file that are stripped out automatically in the resulting HTML, but unfortunately, the Markdown previewer in VS Code does not support this. Besides that, the comments also don't stand out as much within the file, making them harder to notice.

Since I still want to add to-do items within my Markdown files without them being part of the resulting HTML file, I decided to strip them programmatically before passing the content to the Pandoc process:

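import os
import re
import subprocess
from pathlib import Path


def compile_markdown(source_path, destination_path):
    os.makedirs(Path(destination_path).parent, exist_ok=True)

    with open(source_path, 'r', encoding='utf-8') as f:
        markdown = f.read()

    # Remove every HTML comment that starts a line with <!--TODO, including multi-line ones.
    stripped_markdown = re.sub(r'^<!--TODO.*?-->', '', markdown, flags=re.DOTALL | re.MULTILINE)

    # TEMPLATE is the path to the Pandoc HTML template, defined elsewhere in the project.
    subprocess.run(
        ["pandoc", "--template", TEMPLATE, "-o", destination_path],
        input=stripped_markdown,
        text=True,
        encoding='utf-8',
        check=True,
    )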
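
To make the behaviour of the regular expression concrete, here is a small standalone example that applies the same pattern to a made-up Markdown snippet. The sample text and the name TODO_COMMENT are invented for illustration.

```python
import re

# Same pattern as above: a comment that starts a line with <!--TODO,
# matched non-greedily up to the first closing -->.
TODO_COMMENT = re.compile(r'^<!--TODO.*?-->', flags=re.DOTALL | re.MULTILINE)

sample = (
    "# Title\n"
    "<!--TODO: rewrite this intro-->\n"
    "Intro text.\n"
    "<!--TODO\n"
    "- add a diagram\n"
    "- add links\n"
    "-->\n"
    "Body text <!--TODO inline--> stays.\n"
    "<!-- regular comment -->\n"
)

stripped = TODO_COMMENT.sub('', sample)
print(stripped)
```

Both the single-line and the multi-line to-do comment are removed, leaving an empty line where each one stood. The inline comment is kept because the pattern is anchored to the start of a line, and the regular comment is kept because it does not begin with <!--TODO.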

The YAML metadata blocks are used to define the title of a file. Pandoc automatically replaces $title$ in a template with the title field in the metadata block. The metadata block looks something like this:

---
title: Generate Website - Product Page
---

Generating a sitemap

Google supports sitemaps that are just a textual list of URLs instead of the standard XML format.

As a result, a simple sitemap can be generated using:

# FileGroup and config are defined elsewhere in the project.
def generate_sitemap(file_groups: list[FileGroup]):
    """Sorts the provided file paths and writes them to a sitemap."""
    relative_urls = []
    for file_group in file_groups:
        relative_urls.extend(file_group.get_relative_urls())

    urls = [f'{config.URL_PREFIX}/{relative_url}\n' for relative_url in relative_urls]

    # An explicit sort is required at the end, because wildcards in the input can cause unsorted
    # results. E.g.:
    # - *.py -> expands to main.py and source.py
    # - resource/template.html -> would have to be in between main.py and source.py
    sorted_urls = sorted(urls)

    sitemap_path = f'{config.WEBSITE_DESTINATION_FOLDER}/sitemap.txt'

    with open(sitemap_path, 'w', encoding='utf-8') as f:
        f.writelines(sorted_urls)
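
The effect of the explicit sort can be shown with a small standalone example. The helper build_sitemap_lines and the https://example.com prefix are placeholders invented for this sketch; they stand in for config.URL_PREFIX and the file groups used above.

```python
def build_sitemap_lines(url_prefix, relative_urls):
    """Return sorted sitemap lines, one absolute URL per line."""
    return sorted(f"{url_prefix}/{relative_url}\n" for relative_url in relative_urls)


# Order in which the entries might be expanded: the wildcard results come first,
# followed by the explicitly listed template.
expanded = ["main.py", "source.py", "resource/template.html"]
lines = build_sitemap_lines("https://example.com", expanded)
print("".join(lines), end="")
```

After sorting, resource/template.html ends up between main.py and source.py, so the sitemap only changes when the set of URLs changes.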