The application that powers my blog: https://golb.hplar.ch
For more information check out https://golb.hplar.ch/2020/05/gitblog.html
from https://github.com/ralscha/gitblog
-------------------------------------------------------
Gitblog - the software that powers my blog
A few people have asked me what software I use for this blog. Here's a blog post that answers those questions and walks you through the different parts of the application.
Requirements
I had a few requirements for the blog software:
- Write something from scratch with Java. It's always fun to play with new libraries and technologies.
- The generated blog posts should be static HTML files, with no JavaScript on the pages.
- Blog posts should be stored in Markdown files, one file per blog post.
- Blog posts should be stored in a Git repository.
- Full-text search.
- Code syntax highlighting.
Overview
I wrote a Java application with Spring Boot called Gitblog that covers all my requirements, and here is how it works.
I write the blog posts on my computer in Markdown files (1). Then I
commit and push the files to a self-hosted private Git repository (Gitea)
(2). The Git server sends a POST request to the Gitblog application
(3). Gitblog listens for these requests and pulls the changes from the
Git repository to a local repository (4). Then it figures out what has
changed or is new and creates HTML files of all new and changed Markdown
files (5). The generated HTML files are stored in the filesystem.
Additionally, Gitblog writes a new sitemap.xml, feed.atom, and feed.rss
file and "pings" the Google and Bing search engines with a GET request
(7). Nginx handles incoming HTTP requests from readers (8) and sends the
generated HTML back (9).
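Steps (4) and (5) can be pictured as a small planning function that takes the list of files changed by a pull and decides which HTML files need to be regenerated. This is only a sketch, not Gitblog's actual code: the class `ConversionPlan`, its `plan` method, and the rule that an HTML file keeps the relative path of its Markdown file are assumptions made for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not taken from the Gitblog source.
final class ConversionPlan {

    // Maps each changed Markdown file to the HTML file that should be (re)generated.
    // Assumption: the HTML file keeps the relative path of the Markdown file,
    // e.g. 2020/05/gitblog.md becomes <outputDir>/2020/05/gitblog.html.
    static Map<Path, Path> plan(List<String> changedFiles, Path outputDir) {
        Map<Path, Path> plan = new LinkedHashMap<>();
        for (String file : changedFiles) {
            if (!file.endsWith(".md")) {
                continue; // images and other assets are not converted
            }
            String htmlName = file.substring(0, file.length() - ".md".length()) + ".html";
            plan.put(Paths.get(file), outputDir.resolve(htmlName));
        }
        return plan;
    }

    public static void main(String[] args) {
        // Example input: the paths a pull reported as new or changed.
        List<String> changed = Arrays.asList("2020/05/gitblog.md", "2020/05/overview.png");
        plan(changed, Paths.get("html"))
                .forEach((markdown, html) -> System.out.println(markdown + " -> " + html));
    }
}
```

Keeping the planning step free of Git and file I/O makes it easy to test on its own; the real application would feed it the changes detected after the pull.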
Not all pages are static. The exceptions are the index and feedback pages, which Gitblog generates dynamically. Gitblog also maintains a full-text search index with Lucene, so readers can run full-text queries over all blog posts. When a reader submits feedback, Gitblog sends it to me by email.
Blog Post format
I needed a way to add metadata, like title and date, to each blog post.
Because I only transfer Markdown files in this system and the blog
software has no management user interface, I add this information at the
beginning of the Markdown file, enclosed between two --- lines. The following example shows a blog post with all supported headers.
---
summary: The summary for the index page
tags: [java, database]
title: The title for the index page
draft: false
published: 2020-01-01T10:10:59.167Z
updated: 2020-01-10T06:29:02.130Z
---
... Here the blog post ...
Most of this information is used for the index page and for querying the posts. A blog post with draft: true is converted to HTML but does not appear on the index.html page, is not included in the full-text search index, and is not added to sitemap.xml or to the RSS and Atom feeds. I use the draft mode to review posts before I publish them. Publishing means removing the draft line or setting the value to false.
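Reading such a header block could look roughly like the following. This is a minimal sketch and not Gitblog's actual parser: the class `FrontMatter` and its members are hypothetical, and it assumes plain `key: value` lines between the two --- lines instead of using a full YAML parser. Values such as the tags list are kept as raw strings.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not taken from the Gitblog source.
final class FrontMatter {
    final Map<String, String> headers;
    final String body;

    private FrontMatter(Map<String, String> headers, String body) {
        this.headers = headers;
        this.body = body;
    }

    // Splits a Markdown document into its header block and the post body.
    static FrontMatter parse(String markdown) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = markdown.split("\\R", -1);
        if (!lines[0].trim().equals("---")) {
            return new FrontMatter(headers, markdown); // no header block at all
        }
        int i = 1;
        while (i < lines.length && !lines[i].trim().equals("---")) {
            // Split at the first colon only, so timestamps keep their colons.
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                        lines[i].substring(colon + 1).trim());
            }
            i++;
        }
        // Skip the closing --- line; an unterminated header leaves the body empty.
        int start = Math.min(i + 1, lines.length);
        String body = String.join("\n", Arrays.copyOfRange(lines, start, lines.length));
        return new FrontMatter(headers, body);
    }

    // A missing draft header counts as published, matching the rule described above.
    boolean isDraft() {
        return Boolean.parseBoolean(headers.getOrDefault("draft", "false"));
    }

    public static void main(String[] args) {
        FrontMatter post = parse("---\ntitle: Hello\ndraft: true\n---\n... Here the blog post ...");
        System.out.println(post.headers + " draft=" + post.isDraft());
    }
}
```

A caller would check `isDraft()` before adding a post to the index page, the search index, the sitemap, or the feeds.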
Implementation
Here's an overview of all the components of Gitblog.
The system consists of these HTTP endpoint controllers.
| GiteaWebhook | Handles requests coming from the Git server and starts the conversion process via MainService |
| IndexController | Handles requests to / and /index.html. Queries the Lucene index and returns an HTML page with a list of blog posts |
| FeedbackController | Handles feedback requests and sends emails to my account |
Components that are responsible for the Markdown -> HTML conversion, creating the feed and sitemap files, and updating the Lucene full-text index:
| MainService | Runs the other services when changed Markdown files arrive |
| GitService | Responsible for cloning and pulling files from the remote Git repository |
| FileService | Manages the conversion process with the help of the following three components |
| GitHubCodeService | Fetches code from GitHub and embeds it into the blog post |
| MarkdownService | Converts Markdown to HTML |
| PrismJsService | Syntax highlights code blocks |
| FeedService | Creates the feed files feed.rss and feed.atom |
| SitemapService | Creates sitemap.xml and pings Google and Microsoft Bing |
| LuceneService | Updates and queries the Lucene full-text index |
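To give an idea of what the sitemap part of this list involves, here is a minimal sketch of building a sitemap.xml document from a list of posts while leaving out drafts. It is not the code of SitemapService: the class `SitemapSketch`, the nested `Entry` type, and the example URLs are hypothetical, and the "ping" to the search engines is left out. Only the XML structure follows the public sitemaps.org format.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration; not taken from the Gitblog source.
final class SitemapSketch {

    // One blog post as far as the sitemap is concerned.
    static final class Entry {
        final String url;
        final String lastModified; // timestamp such as 2020-01-10T06:29:02.130Z
        final boolean draft;

        Entry(String url, String lastModified, boolean draft) {
            this.url = url;
            this.lastModified = lastModified;
            this.draft = draft;
        }
    }

    // Escapes the characters that are not allowed verbatim in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Builds the sitemap document; drafts are skipped.
    static String build(List<Entry> entries) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (Entry e : entries) {
            if (e.draft) {
                continue;
            }
            xml.append("  <url><loc>").append(escape(e.url)).append("</loc>");
            xml.append("<lastmod>").append(escape(e.lastModified)).append("</lastmod></url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(Arrays.asList(
                new Entry("https://example.com/2020/05/post.html", "2020-01-10T06:29:02.130Z", false),
                new Entry("https://example.com/2020/05/draft.html", "2020-01-01T10:10:59.167Z", true))));
    }
}
```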
Two scheduled jobs run tasks regularly:
| URLChecker | Runs once a month and checks all URLs of all blog posts and creates an HTML report with the errors |
| S3Backup | Runs daily, backs up the local Git repository, and sends it to Amazon S3 |
You can find the complete source code of the application on GitHub:
https://github.com/ralscha/gitblog
from https://golb.hplar.ch/2020/05/gitblog.html