
Wednesday, 5 February 2025

gitblog-by-ralscha

 

The application that powers my blog: https://golb.hplar.ch

For more information, check out https://golb.hplar.ch/2020/05/gitblog.html

from https://github.com/ralscha/gitblog

-------------------------------------------------------

Gitblog - the software that powers my blog


A few people have asked me what software I use for this blog. Here's a blog post that answers those questions and walks you through the different parts of the application.

Requirements

I had a few requirements for the blog software:

  • Write something from scratch with Java. It's always fun to play with new libraries and technologies.
  • The generated blog posts should be static HTML files, with no JavaScript on the pages.
  • Blog posts should be stored in Markdown files, one file per blog post.
  • Blog posts should be stored in a Git repository.
  • Full-text search.
  • Code syntax highlighting.

Overview

I wrote a Java application with Spring Boot, called Gitblog, that meets all my requirements. Here is how it works.

[Figure: Gitblog overview]

I write the blog posts on my computer in Markdown files (1). Then I commit and push the files to a self-hosted private Git repository (Gitea) (2). The Git server sends a POST request to the Gitblog application (3). Gitblog listens for these requests and pulls the changes from the Git repository into a local repository (4). It then determines which Markdown files are new or changed and converts them to HTML (5). The generated HTML files are stored in the filesystem (6). Additionally, Gitblog writes new sitemap.xml, feed.atom, and feed.rss files and "pings" the Google and Bing search engines with a GET request (7). Nginx handles incoming HTTP requests from readers (8) and sends the generated HTML back (9).
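The text doesn't say how Gitblog decides in step (5) which Markdown files are new or changed; since it pulls from Git, it might simply ask Git for the diff. As a dependency-free illustration of the idea, here is a minimal sketch that instead remembers a SHA-256 hash per file and reports files whose hash differs from the previous run. `ChangeDetector` and `changedFiles` are hypothetical names, not Gitblog's code, and deleted files are not handled. Requires Java 17 for `HexFormat`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: remembers a SHA-256 hash per Markdown file name and
// reports which files are new or changed since the previous call.
final class ChangeDetector {
    private final Map<String, String> knownHashes = new HashMap<>();

    // Input maps file name -> file content; returns new or changed names, sorted by name.
    List<String> changedFiles(Map<String, String> markdownFiles) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(markdownFiles).entrySet()) {
            String hash = sha256(e.getValue());
            // put() returns the previous hash, or null for a file we have never seen.
            String previous = knownHashes.put(e.getKey(), hash);
            if (!hash.equals(previous)) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    private static String sha256(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(content.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(ex);
        }
    }
}
```

In a real setup the map would be filled by reading the Markdown files of the local repository, for example with `Files.readString`.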

Not all pages are static. The exceptions are the index and feedback pages, which Gitblog generates dynamically. Gitblog also maintains a full-text search index with Lucene so readers can run full-text queries over all blog posts, and it forwards reader feedback to me by email.
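Gitblog uses Lucene for full-text search. To show the core idea behind such an index without pulling in Lucene, here is a deliberately simplified inverted index in plain Java: it maps each lowercase word to the set of post URLs that contain it and answers queries where every word must match. This is not Lucene's API and has no stemming, ranking, or phrase queries; `TinyIndex`, `add`, and `search` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, minimal inverted index: word -> sorted set of post URLs.
// A stand-in to explain the concept; Gitblog itself uses Lucene.
final class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Records every word of the post text under the post's URL.
    void add(String url, String text) {
        for (String word : tokenize(text)) {
            postings.computeIfAbsent(word, w -> new TreeSet<>()).add(url);
        }
    }

    // Returns the URLs that contain every word of the query (AND semantics).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String word : tokenize(query)) {
            Set<String> hits = postings.getOrDefault(word, Set.of());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? Set.of() : result;
    }

    // Lowercases the text and splits it on runs of non-letter, non-digit characters.
    private static String[] tokenize(String text) {
        String normalized = text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
        return normalized.isEmpty() ? new String[0] : normalized.split(" ");
    }
}
```

Lucene adds what this sketch lacks: analyzers for stemming and stop words, relevance scoring, and an on-disk index that survives restarts.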

Blog post format

I needed a way to add metadata, such as the title and date, to each blog post. Because this system only transfers Markdown files and the blog software has no management user interface, I put this information at the beginning of the Markdown file, enclosed in --- lines. The following example shows a blog post with all supported headers.

---
summary: The summary for the index page
tags: [java, database]
title: The title for the index page
draft: false
published: 2020-01-01T10:10:59.167Z
updated: 2020-01-10T06:29:02.130Z
---

... the blog post goes here ...

Most of this metadata is used for the index page and for querying posts. A blog post with draft: true is converted to HTML but does not appear on the index.html page, is not included in the full-text search index, and is not added to sitemap.xml or to the RSS and Atom feeds. I use draft mode to review posts before I publish them. Publishing means removing the draft line or setting its value to false.
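Reading this header block is straightforward. The following sketch splits a Markdown file into its --- delimited header and body, reads flat `key: value` lines plus the bracketed `tags` list, and treats `draft: true` as unpublished. It is a hypothetical illustration, not Gitblog's code: `Post`, `FrontMatter.parse`, and `isPublished` are made-up names, and a real implementation would more likely hand the header to a YAML parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result type: header fields, tags, and the remaining Markdown body.
record Post(Map<String, String> header, List<String> tags, String body) {
    // Drafts are converted to HTML but kept off the index, search index, sitemap, and feeds.
    boolean isPublished() {
        return !Boolean.parseBoolean(header.getOrDefault("draft", "false"));
    }
}

final class FrontMatter {
    // Splits "---\nkey: value\n...\n---\nbody" into header fields and body.
    // Handles only flat "key: value" lines and a "[a, b]" tags list, not full YAML.
    static Post parse(String markdown) {
        String[] lines = markdown.split("\\R", -1);
        Map<String, String> header = new LinkedHashMap<>();
        List<String> tags = new ArrayList<>();
        int bodyStart = 0;
        if (lines.length > 0 && lines[0].trim().equals("---")) {
            int i = 1;
            for (; i < lines.length && !lines[i].trim().equals("---"); i++) {
                int colon = lines[i].indexOf(':');
                if (colon < 0) {
                    continue; // ignore lines that are not key: value pairs
                }
                // Split at the first colon only, so timestamps like 10:10:59 stay intact.
                String key = lines[i].substring(0, colon).trim();
                String value = lines[i].substring(colon + 1).trim();
                if (key.equals("tags") && value.startsWith("[") && value.endsWith("]")) {
                    for (String tag : value.substring(1, value.length() - 1).split(",")) {
                        if (!tag.isBlank()) {
                            tags.add(tag.trim());
                        }
                    }
                } else {
                    header.put(key, value);
                }
            }
            // Skip the closing --- line; without one, the body is empty.
            bodyStart = Math.min(i + 1, lines.length);
        }
        String body = String.join("\n", List.of(lines).subList(bodyStart, lines.length));
        return new Post(header, tags, body);
    }
}
```

Filtering the index, search index, sitemap, and feeds then comes down to keeping only posts where `isPublished()` returns true.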

Implementation

Here's an overview of all the components of Gitblog.

[Figure: Gitblog components in detail]

Gitblog has these HTTP endpoint controllers:

  • GiteaWebhook: Handles requests coming from the Git server and starts the conversion process via MainService.
  • IndexController: Handles requests to / and /index.html. Queries the Lucene index and returns an HTML page with a list of blog posts.
  • FeedbackController: Handles feedback requests and sends them as emails to my account.
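The rendering half of the IndexController can be sketched as follows, assuming each search hit carries a URL, title, summary, and publication date and that the list is shown newest first. The `IndexEntry` record and `IndexPage.render` are hypothetical; the real controller queries Lucene first and may well use a template engine instead of string building. Values are HTML-escaped because titles and summaries are plain text taken from the Markdown headers.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Hypothetical shape of one search hit shown on the index page.
record IndexEntry(String url, String title, String summary, Instant published) {}

final class IndexPage {
    // Renders the entries newest first as a simple HTML list.
    static String render(List<IndexEntry> entries) {
        StringBuilder html = new StringBuilder("<ul>\n");
        entries.stream()
                .sorted(Comparator.comparing(IndexEntry::published, Comparator.reverseOrder()))
                .forEach(e -> html.append("<li><a href=\"").append(escape(e.url())).append("\">")
                        .append(escape(e.title())).append("</a> ")
                        .append(escape(e.summary())).append("</li>\n"));
        return html.append("</ul>\n").toString();
    }

    // Minimal HTML escaping for text and attribute values; & must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&#39;");
    }
}
```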

These components handle the Markdown-to-HTML conversion, create the feed and sitemap files, and update the Lucene full-text index:

  • MainService: Runs the other services when changed Markdown files arrive.
  • GitService: Clones and pulls files from the remote Git repository.
  • FileService: Manages the conversion process with the help of the following three components:
      ◦ GitHubCodeService: Fetches code from GitHub and embeds it into the blog post.
      ◦ MarkdownService: Converts Markdown to HTML.
      ◦ PrimsJsService: Syntax-highlights code blocks.
  • FeedService: Creates the feed files feed.rss and feed.atom.
  • SitemapService: Creates sitemap.xml and pings Google and Microsoft Bing.
  • LuceneService: Updates and queries the Lucene full-text index.
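To make the SitemapService's output concrete, here is a sketch that builds a sitemap.xml in the sitemaps.org 0.9 format: a `urlset` element containing one `url` entry per published post, each with `loc` and `lastmod`. The `SitemapEntry` record and `SitemapWriter.toXml` are hypothetical names; it is assumed that drafts were already filtered out and that the caller supplies absolute URLs.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical input: absolute URL of a published post and its last modification date.
record SitemapEntry(String loc, LocalDate lastModified) {}

final class SitemapWriter {
    // Builds a sitemap document in the sitemaps.org 0.9 format.
    static String toXml(List<SitemapEntry> entries) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (SitemapEntry e : entries) {
            xml.append("  <url>\n");
            xml.append("    <loc>").append(escapeXml(e.loc())).append("</loc>\n");
            // LocalDate.toString() yields YYYY-MM-DD, a valid W3C Datetime value for lastmod.
            xml.append("    <lastmod>").append(e.lastModified()).append("</lastmod>\n");
            xml.append("  </url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }

    // URLs in a sitemap must be entity-escaped; & must be replaced first.
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }
}
```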

Two scheduled jobs run regularly:

  • URLChecker: Runs once a month, checks all URLs in all blog posts, and creates an HTML report of the errors.
  • S3Backup: Runs daily, backs up the local Git repository, and uploads the backup to Amazon S3.
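The core of a link checker like URLChecker can be sketched with the JDK's built-in HTTP client: extract absolute http(s) links from a post, send a HEAD request to each, and collect the ones that fail or answer with status 400 or higher. The regular expression, the choice of HEAD, the 10-second timeouts, and the `LinkChecker` names are assumptions for illustration, not Gitblog's implementation. Some servers reject HEAD, so a real checker might retry with GET, and the monthly schedule would be configured separately (in a Spring Boot application, for example, with the `@Scheduled` annotation).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkChecker {
    // Simplified pattern for absolute links; stops at whitespace, quotes, angle brackets, and ')'.
    private static final Pattern URL_PATTERN = Pattern.compile("https?://[^\\s\"'<>)]+");

    private final HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Returns each distinct absolute URL in the text, in order of first appearance.
    static Set<String> extractUrls(String text) {
        Set<String> urls = new LinkedHashSet<>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    // Sends a HEAD request to every URL and reports the ones that fail or answer with >= 400.
    List<String> findBrokenLinks(String text) {
        List<String> broken = new ArrayList<>();
        for (String url : extractUrls(text)) {
            try {
                HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                        .method("HEAD", HttpRequest.BodyPublishers.noBody())
                        .timeout(Duration.ofSeconds(10))
                        .build();
                int status = client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
                if (status >= 400) {
                    broken.add(url + " -> HTTP " + status);
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop checking
                break;
            } catch (IOException | IllegalArgumentException ex) {
                broken.add(url + " -> " + ex.getClass().getSimpleName());
            }
        }
        return broken;
    }
}
```

The returned list is the raw material for a report; turning it into the HTML page mentioned above is a matter of formatting.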

You can find the complete source code of the application on GitHub:
https://github.com/ralscha/gitblog

from https://golb.hplar.ch/2020/05/gitblog.html

 

 
