The implementation that generates HTML from files has changed several times. Originally, only Markdown files were supported. The next step was support for source code files. Finally, index files were introduced, each representing a collection of source code files. The goal was to have an overview of a single product's source code that I could link to from the product page.

Index files brought a small problem with them, though: they aren’t stored on the hard drive but are generated on the fly. Supporting them required conceptual changes, because the implementation always assumed there was a file on the hard drive.

The new implementation does not require a source path and instead relies on the destination path. This results in a more correct abstraction that is easier to reason about and makes it easier to implement the functionality.
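As a minimal sketch of that idea, a generated page can be identified purely by where it will be written. The class name IndexPage, its attributes, and the example paths are assumptions made for illustration, not the actual implementation.

```python
import html


class IndexPage:
    """A generated page: it has a destination path but no file on disk."""

    def __init__(self, destination_path, member_paths):
        self.destination_path = destination_path
        self.member_paths = list(member_paths)

    def render(self):
        # Build a plain list of links to the pages that belong to this index.
        items = "".join(
            f'<li><a href="{html.escape(path, quote=True)}">{html.escape(path)}</a></li>'
            for path in self.member_paths
        )
        return f"<ul>{items}</ul>"


# Output is keyed by destination path, so no source path is ever needed.
pages = {}
index = IndexPage(
    "product/index.html",
    ["product/main.py.html", "product/source.py.html"],
)
pages[index.destination_path] = index.render()
```

Because the destination path is the only identifier, a page backed by a real file and a page generated on the fly can be handled the same way.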

Use YAML instead of a regular text file to define the list of files to be processed

This way I can add more information to the file, such as whether an entry is a codebase. That information can be used to automatically generate an index.html file for easier navigation.
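To illustrate, the loaded file list might look like the structure below once the YAML has been parsed (for example with yaml.safe_load from PyYAML). The key names "path" and "codebase" and the example entries are assumptions for this sketch; the real file may be organised differently.

```python
# Hypothetical shape of the parsed YAML file list.
entries = [
    {"path": "about.md"},
    {"path": "generator/*.py", "codebase": True},
]


def needs_index(entry):
    """Return True when an entry is marked as a codebase."""
    return bool(entry.get("codebase", False))


# Only entries flagged as a codebase get a generated index.html.
codebases = [entry["path"] for entry in entries if needs_index(entry)]
```

A plain text file could only carry the path itself, whereas each entry here can carry extra flags without changing the format.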

Move implementation for different file types to separate classes

Instead of having def compile_markdown(...) and def compile_code(...) in the main.py file, the implementation is moved to specific classes. This way the main source file can work with the generic concept of source files instead of seeing any implementation details.
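A rough sketch of such a split could look as follows. The class names, the compile method, and the placeholder Markdown handling are all assumptions for illustration; a real Markdown class would call an actual renderer.

```python
import html
from abc import ABC, abstractmethod


class SourceFile(ABC):
    """Generic concept the main script works with."""

    def __init__(self, text):
        self.text = text

    @abstractmethod
    def compile(self):
        """Return the HTML for this file."""


class MarkdownFile(SourceFile):
    def compile(self):
        # Placeholder only: escapes the text instead of rendering Markdown.
        return f"<article>{html.escape(self.text)}</article>"


class CodeFile(SourceFile):
    def compile(self):
        # Escape the source so it is displayed rather than interpreted.
        return f"<pre><code>{html.escape(self.text)}</code></pre>"


def compile_all(files):
    # The caller never needs to know which concrete type it is handling.
    return [source_file.compile() for source_file in files]
```

With this shape, adding another file type means adding a class rather than another function and branch in main.py.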

[BUG] Sorting the result of glob can still result in an unsorted list

The previous bugfix was not sufficient in the following example:

*.py
resource/template.html

where *.py expands to main.py and source.py. The final result would be

main.py
source.py
resource/template.html

instead of

main.py
resource/template.html
source.py

The solution is to sort the final list instead of the result of each separate glob.
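A sketch of that fix, reproducing the example above, is shown here. The function name resolve_file_paths and the expand parameter are assumptions; expand exists only so the behaviour can be shown without touching the file system.

```python
import glob


def resolve_file_paths(patterns, expand=None):
    """Expand every pattern, then sort the combined list once."""
    if expand is None:
        expand = lambda pattern: glob.glob(pattern, recursive=True)
    resolved_file_paths = []
    for pattern in patterns:
        resolved_file_paths.extend(expand(pattern))
    # Sorting here, not per pattern, interleaves matches from all patterns.
    return sorted(resolved_file_paths)


# Stand-in for glob results, deliberately unsorted.
fake_matches = {
    "*.py": ["source.py", "main.py"],
    "resource/template.html": ["resource/template.html"],
}
result = resolve_file_paths(
    ["*.py", "resource/template.html"],
    expand=fake_matches.__getitem__,
)
print(result)  # ['main.py', 'resource/template.html', 'source.py']
```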

[BUG] The result of glob must be explicitly sorted

When determining the list of files, glob is used to match file patterns. Glob does not necessarily return sorted results, but sorting the list of files is a requirement I set so that it’s easier to see if there are any changes to existing URLs.

To do this,

resolved_file_paths.extend(glob.glob(file, recursive=True))

had to be replaced with

resolved_file_paths.extend(sorted(glob.glob(file, recursive=True)))

[BUG] Only todo items at the start of a line should be stripped

Some files contain examples of to-do items to explain how the code works. Those should not be stripped, of course. To achieve that, a small change to the regex is required:

re.sub(r'^<!--TODO.*?-->', '', input, flags=re.DOTALL | re.MULTILINE)

instead of

re.sub(r'<!--TODO.*?-->', '', input, flags=re.DOTALL)

The two changes are:

  • ^ to anchor the match at the beginning of a line
  • re.MULTILINE so that ^ matches at the start of every line instead of only at the start of the string

Multiple flags can be combined by OR-ing them together (using |).
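The difference between the two patterns can be checked with a small example. The sample text and variable names are made up for illustration.

```python
import re

text = (
    "<!--TODO rewrite intro-->\n"
    "Write `<!--TODO ...-->` to leave a note.\n"
)

# New pattern: only strips a to-do item that starts a line.
stripped = re.sub(r'^<!--TODO.*?-->', '', text, flags=re.DOTALL | re.MULTILINE)

# Old pattern: strips every to-do item, including the inline example.
stripped_old = re.sub(r'<!--TODO.*?-->', '', text, flags=re.DOTALL)

print(repr(stripped))      # '\nWrite `<!--TODO ...-->` to leave a note.\n'
print(repr(stripped_old))  # '\nWrite `` to leave a note.\n'
```

Note that the line break after a stripped item stays in place, so an empty line remains where the to-do item used to be.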