[05-09-2022] | Generated Article List

The article list for this website is now generated by a script. To see the generated block of HTML, view the source of index.html and note the comment prepended to each article line. contains the full source code for the generator.

I was looking for a productive way to keep learning Python, and this seemed like an interesting project to tackle. That said, Python is probably not the most compact tool for this kind of text processing, and I could likely have done the same thing with a few lines of bash and sed/awk.

How does it work?

The Python script looks in the /articles directory and starts by counting all files within that directory. It then adds all the file names to a list, opens each file, and searches it for a line that contains the TITLE_LOCATOR string (in this case, <div class="articles">).

This class is only ever found in one place within an article - one line above the actual title of the article. A raw line looks like this:

<p><span class="bold">[05-09-2022] | </span>Generated Article List<br /></p>
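The lookup described above might be sketched as follows. This is a minimal illustration, not the generator's actual code: the helper names `find_title_line` and `collect_title_lines`, the `*.html` filter, and the UTF-8 encoding are all assumptions made for the example.

```python
from pathlib import Path

# Locator string as given in the article; the title sits one line below it.
TITLE_LOCATOR = '<div class="articles">'


def find_title_line(lines):
    """Return the line after the first line containing TITLE_LOCATOR, or None."""
    for index, line in enumerate(lines[:-1]):
        if TITLE_LOCATOR in line:
            return lines[index + 1].strip()
    return None


def collect_title_lines(articles_dir):
    """Map each article file name to its raw title line (hypothetical helper)."""
    titles = {}
    # Assumption: every article is an .html file directly inside articles_dir.
    for path in sorted(Path(articles_dir).glob("*.html")):
        lines = path.read_text(encoding="utf-8").splitlines()
        title = find_title_line(lines)
        if title is not None:
            titles[path.name] = title
    return titles
```

Keeping the file name next to the title line matters later, because the links are built from the file names rather than from the titles.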

The saved line contains <p> and </p> tags that are not needed in the article block, so they are pruned:

<span class="bold">[05-09-2022] | </span>Generated Article List<br />
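A small sketch of that pruning step, assuming the tags appear exactly once at the very start and end of the line (the function name `prune_paragraph_tags` is made up for this example):

```python
def prune_paragraph_tags(line):
    """Strip one leading <p> and one trailing </p> from a title line."""
    line = line.strip()
    if line.startswith("<p>"):
        line = line[len("<p>"):]
    if line.endswith("</p>"):
        line = line[:-len("</p>")]
    return line
```

Lines that carry no paragraph tags pass through unchanged, so the step is safe to apply to every saved line.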

<a href> tags are added between the </span> and <br /> tags by searching each line for the first and second tags, then inserting the opening link tag after </span> and the closing link tag before <br />. Because article titles are not tied directly to file names, the file name list is used to build the links.

<span class="bold">[03-09-2022] | </span><a href="articles/articleListGenerator.html">Article List Generator</a><br />
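One way the link insertion could look, as a sketch under the assumption that each line contains exactly one </span> followed by one <br />. The function name `add_link` and the `articles_dir` parameter are illustrative, not taken from the generator.

```python
def add_link(line, file_name, articles_dir="articles"):
    """Wrap the text between </span> and <br /> in an <a href> tag."""
    span_close = "</span>"
    span_pos = line.find(span_close)
    if span_pos == -1:
        raise ValueError("no </span> tag found in line")
    title_start = span_pos + len(span_close)
    title_end = line.find("<br />", title_start)
    if title_end == -1:
        raise ValueError("no <br /> tag found after </span>")
    opening = f'<a href="{articles_dir}/{file_name}">'
    return (
        line[:title_start]
        + opening
        + line[title_start:title_end]
        + "</a>"
        + line[title_end:]
    )
```

The file name comes from the list built while scanning the directory, which is why that list has to stay aligned with the saved title lines.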

The links are now built, but they are in alphabetical order by file name and need to be sorted by date. So a sort with key=lambda x: datetime.strptime(x[37:49], '[%d-%m-%Y]') is run on the list. This works, but it's rickety: it depends on the date always starting at column 37 of the line. It seems like there's no easy way to sort on a date that can sit anywhere within a string that contains other data.
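One possible way to loosen the column dependency, offered as a sketch rather than what the generator actually does, is to pull the date out with a regular expression before parsing it. The names `DATE_PATTERN`, `date_key`, and `sort_by_date` are invented for this example, and sorting newest first is an assumption.

```python
import re
from datetime import datetime

# Matches a bracketed day-month-year date such as [05-09-2022] anywhere in a line.
DATE_PATTERN = re.compile(r"\[(\d{2}-\d{2}-\d{4})\]")


def date_key(line):
    """Return the first bracketed date in the line as a datetime."""
    match = DATE_PATTERN.search(line)
    if match is None:
        # Assumption: undated lines get the earliest possible date.
        return datetime.min
    return datetime.strptime(match.group(1), "%d-%m-%Y")


def sort_by_date(lines, newest_first=True):
    """Return a new list of lines ordered by their embedded date."""
    return sorted(lines, key=date_key, reverse=newest_first)
```

The trade-off is that the sort now depends on the bracketed date format instead of a fixed column, so a second bracketed date earlier in a line would be picked up first.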

The data is ready to be injected into index.html. Care had to be taken to make sure the script can be run no matter what state the previous generation left index.html in. To mark the lines that are added by the script, a <!--generated--> comment is added at the beginning of each injected line.

<!--generated--><span class="bold">[05-09-2022] | </span><a href="articles/articleListGenerator.html">Generated Article List</a><br />

Before injection, the file needs to be prepared. The generator reads index.html and writes back every line except those that contain the <!--generated--> comment, which clears out the previously generated article block.

Another comment, <!--articles begin-->, is used as a locator for the beginning of the articles block. With the old article lines removed and the start of the block located, the list of finished article strings is injected into index.html.
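The clean-then-inject cycle could be sketched like this. It works on lists of lines without trailing newlines, and the function name `rebuild_index` is an assumption for illustration; only the two comment strings come from the description above.

```python
GENERATED_MARKER = "<!--generated-->"
BLOCK_START = "<!--articles begin-->"


def rebuild_index(index_lines, article_lines):
    """Drop previously generated lines, then inject fresh ones after BLOCK_START."""
    # Step 1: keep everything except lines written by an earlier run.
    kept = [line for line in index_lines if GENERATED_MARKER not in line]

    # Step 2: copy the kept lines and inject right after the locator comment.
    result = []
    injected = False
    for line in kept:
        result.append(line)
        if not injected and BLOCK_START in line:
            result.extend(GENERATED_MARKER + article for article in article_lines)
            injected = True
    if not injected:
        raise ValueError("locator comment not found in index lines")
    return result
```

Because the old generated lines are always removed first, running this twice with the same article list gives the same result, which is the property the marker comments exist to provide.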