
Sunday, 19 January 2014

Generating static sites with your favorite Perl framework

(I have been using ttree myself for years, but writing code in the DSL of Template Toolkit, http://template-toolkit.org/, can be limiting.)
The setup is always the same: take a bunch of files in some format (usually Markdown, reStructuredText or Textile), plus some configuration, and run the tool. The problem is also always the same: the model fits the original author's needs, and you have to follow their rules. Personalisation not included.

Web frameworks

Perl has plenty of awesome web frameworks, such as Catalyst, Dancer, Mojolicious, and many others, to let you write your web application the way you want. Each has its own set of advantages and disadvantages, but that is not the point of this article.
The point is that using them to run a blog or the framework's marketing^Whome page may seem wasteful, as there's probably little need to regenerate a page on every request, whether or not the content has changed.

Static web sites made with web frameworks

PSGI is an interface between web servers and web applications written in Perl. Plack, an implementation of PSGI, is supported by most Perl web frameworks. It's also possible to write your own application (a PSGI application is just a subroutine) and connect it to any supported web server, and most web servers are supported.
After trying to write my own static site generator, and failing to make it as flexible as I would have liked (which, in retrospect, would probably have turned it into a web framework of its own), I decided it was wiser to build the site with one of those nice web frameworks and to use Plack as my entry point to get to the content.
wallflower is a command-line tool that takes a PSGI application, uses Plack to access its content, and saves that content to local files, ready to be uploaded to your static web server.
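To make "just a subroutine" concrete, here is a minimal sketch of a PSGI application in plain Perl, with no framework involved. The echoed path is only for illustration; the calling convention (an environment hash reference in, a status/headers/body array reference out) is what the PSGI specification defines.

```perl
use strict;
use warnings;

# A PSGI application is just a code reference: it receives the request
# environment (a hash reference) and returns a three-element array reference:
# status code, array of header name/value pairs, and an array of body chunks.
my $app = sub {
    my ($env) = @_;
    my $path = $env->{PATH_INFO} // '/';
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "You asked for $path\n" ],
    ];
};

# Any PSGI-compliant server could run it; here we simply call it directly
# with a hand-built environment.
my $res = $app->( { REQUEST_METHOD => 'GET', PATH_INFO => '/hello.html' } );
print "$res->[0]\n";    # 200
print $res->[2][0];     # You asked for /hello.html
```

In a real app.psgi file, the coderef would be the last expression of the file, so that `plackup app.psgi` can load and serve it.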
After obtaining the coderef for your application, it repeatedly creates the PSGI environment for the URL you want to process and runs your app on it (using Plack::Util::run_app), saving the response content to a local file. If the response content type is text/html or text/css, it will automatically look for embedded links and add them to its queue, thus enabling auto-discovery of the entire web site.
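The crawl loop can be pictured with the following toy sketch. It is not wallflower's actual code: the application, its pages and the link-extracting regular expressions are hypothetical, the app is called directly instead of through Plack::Util::run_app, and a hash stands in for the files written to the output directory.

```perl
use strict;
use warnings;

# Hypothetical toy application standing in for a real framework app.
my %pages = (
    '/'           => '<a href="/about.html">About</a> <link href="/style.css">',
    '/about.html' => '<a href="/">Home</a>',
    '/style.css'  => 'body { background: url(/bg.png) }',
    '/bg.png'     => "\x89PNG",
);
my %types = ( html => 'text/html', css => 'text/css', png => 'image/png' );

my $app = sub {
    my ($env) = @_;
    my $path = $env->{PATH_INFO};
    return [ 404, [ 'Content-Type' => 'text/plain' ], ['Not found'] ]
        unless exists $pages{$path};
    my ($ext) = $path =~ /\.(\w+)$/;
    return [ 200, [ 'Content-Type' => $types{ $ext // 'html' } ], [ $pages{$path} ] ];
};

# Crawl: fetch each URL once, keep the body, and queue links found in HTML/CSS.
my @queue = ('/');
my %seen;
my %saved;    # stands in for the files written to the output directory
while ( defined( my $url = shift @queue ) ) {
    next if $seen{$url}++;
    my $res = $app->( { REQUEST_METHOD => 'GET', PATH_INFO => $url } );
    next unless $res->[0] == 200;
    my %header = @{ $res->[1] };
    my $body   = join '', @{ $res->[2] };
    $saved{$url} = $body;
    next unless $header{'Content-Type'} =~ m{^text/(?:html|css)};

    # Naive link extraction: href="..."/src="..." attributes and CSS url(...).
    push @queue, $body =~ m{(?:href|src)="(/[^"]*)"}g, $body =~ m{url\((/[^)]*)\)}g;
}
print join( "\n", sort keys %saved ), "\n";
```

A real crawler also has to cope with relative links, query strings and bodies returned as filehandles, all of which this sketch ignores.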
The point of wallflower is to let you write any static website using all the power of your favorite web framework. Since it follows links inside your Plack application, if your site is properly organized, you only need to point it to /.

Blogging statically with your favorite framework

The obvious example for this would be to write a blog. I'll use Dancer, because it's the only web framework I know, but keep in mind that this will work with any PSGI-compliant framework. You could actually write your own PSGI application, if no existing framework suited you.
Since our target is a static web site, the main thing to keep in mind is that the target web server will determine the content type by looking at the file extension, so all of our URLs must have an extension.
The sources for our basic blog will be a set of text files in the public/ subdirectory, with the content written in Markdown. URLs will simply be mapped to those files: a URL ending in .html corresponds to the .txt file with the same name.
So, we start by writing a route to handle all URLs ending with .html:
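To see why the extension matters, consider this sketch of how a static server might guess a content type. The extension table and the guess_type() helper are hypothetical; real servers ship much larger tables, but the principle is the same: a URL such as /archives/2014/01 gives the server nothing to go on.

```perl
use strict;
use warnings;

# A hypothetical, minimal extension-to-type table, similar in spirit to what
# a static web server uses when it serves files from disk.
my %type_for = (
    html => 'text/html',
    css  => 'text/css',
    js   => 'application/javascript',
    json => 'application/json',
    txt  => 'text/plain',
);

# Return the content type a static server would guess for a URL path,
# or undef when the path has no (known) extension.
sub guess_type {
    my ($path) = @_;
    my ($ext) = $path =~ m{\.([^./]+)$} or return;
    return $type_for{ lc $ext };
}

for my $url ( '/2014/01/hello.html', '/tags.json', '/archives/2014/01' ) {
    my $type = guess_type($url);
    printf "%-22s %s\n", $url, $type // 'unknown: needs an extension';
}
```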

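package ShyBlog;
use Dancer ':syntax';
use Text::Markdown;
use Path::Class qw( file );

# one Markdown converter, shared by all requests
my $m = Text::Markdown->new;

# /some/post.html is rendered from public/some/post.txt
get qr{^/(.*)\.html$} => sub {
    my ($path) = splat;
    my $source = file( setting('public') => "$path.txt" );

    # no source file: answer with a 404 instead of dying
    return send_error( 'Not found', 404 ) if !-f "$source";

    my $text = $source->slurp;    # scalar context: the whole file
    template 'blog', { content => $m->markdown($text) };
};

1;
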
Since we put our blog entries in the public/ directory, Dancer will automatically serve the source when the URL ends in .txt! And we didn't even need to write a route for that!
Now, if we want to go any further than a single blog post, for example by showing a main page with the latest post, sidebars on every page pointing to the monthly archives, and maybe a JSON file with all our tags for a nice tag cloud in JavaScript, we have a bit of a problem: we need to know about all of our blog's posts when generating any individual one.
Remember that our PSGI application is ultimately a subroutine that will be called repeatedly by wallflower, so we just have to make the needed data available to the subroutine by building the list of all posts, once and for all, during the initialisation phase of the application.
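Here is a framework-free sketch of that idea: the list is built once, when the code is loaded, and the PSGI coderef simply closes over it. The hard-coded entries and the HTML are hypothetical placeholders for what the directory scan and the templates would produce.

```perl
use strict;
use warnings;

# Initialisation phase: runs once, when the application is loaded.
# A hypothetical hard-coded list stands in for scanning public/.
my @entries = (
    { url => '/2014/01/first.html',  title => 'First post',  date => '2014-01-12' },
    { url => '/2014/01/second.html', title => 'Second post', date => '2014-01-19' },
);
my ($latest) = sort { $b->{date} cmp $a->{date} } @entries;

# Request phase: the PSGI coderef closes over @entries and $latest,
# so every request (every page wallflower fetches) sees the same data
# without rescanning the disk.
my $app = sub {
    my ($env) = @_;
    my $sidebar = join '',
        map { sprintf '<li><a href="%s">%s</a></li>', $_->{url}, $_->{title} } @entries;
    return [
        200,
        [ 'Content-Type' => 'text/html' ],
        [ "<h1>$latest->{title}</h1><ul>$sidebar</ul>" ],
    ];
};

print $app->( { PATH_INFO => '/' } )->[2][0], "\n";
```

In a Dancer application, the equivalent is to put the initialisation code at the top level of the application module, outside of any route, so that it runs once when the module is loaded.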
A simple call to File::Find will help us generate the list of all posts, from which we can create a data structure. In this example, it's an array of hash references:

use File::Find;
use Path::Class qw( file );

my @entries;

find(
    {   no_chdir => 1,    # don't chdir; $_ is then the same path as $File::Find::name
        wanted   => sub {

            # we only care about blog entries
            return if !/\.txt$/;

            # get a Path::Class::File for it
            my $file = file($File::Find::name);
            my $fh   = $file->openr;

            # parse a simple three-line header (title, date, tags) using the
            # kite secret operator: ~~<$fh> reads a single line
            chomp( my ( $title, $date, $tags ) = ( ~~<$fh>, ~~<$fh>, ~~<$fh> ) );

            # update the structure with all relevant information
            my $source = substr( $File::Find::name, length( setting('public') ) );
            ( my $url = $source ) =~ s/\.txt$/.html/;

            push @entries, {
                url    => $url,
                title  => $title,
                date   => $date,
                tags   => [ split /\s*,\s*/, $tags // '' ],
                source => $source,
            };
        },
    },
    setting('public')
);
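With @entries in hand, the extra pages mentioned earlier mostly boil down to grouping and counting. A minimal sketch, assuming dates are written as YYYY-MM-DD in the file headers (the header format above doesn't specify one) and using the core JSON::PP module for the tag cloud data; the sample entries are made up.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical sample data, shaped like the @entries list built above.
my @entries = (
    { url => '/2014/01/a.html', title => 'A', date => '2014-01-05', tags => [ 'perl', 'plack' ] },
    { url => '/2014/01/b.html', title => 'B', date => '2014-01-19', tags => ['perl'] },
    { url => '/2013/12/c.html', title => 'C', date => '2013-12-24', tags => ['advent'] },
);

# Sidebar data: entries grouped by month ("YYYY-MM").
my %by_month;
for my $entry (@entries) {
    my ($month) = $entry->{date} =~ /^(\d{4}-\d{2})/ or next;
    push @{ $by_month{$month} }, $entry;
}

# Tag cloud data: number of posts per tag.
my %tag_count;
$tag_count{$_}++ for map { @{ $_->{tags} } } @entries;

# Canonical (sorted-key) JSON, e.g. for a /tags.json route.
my $json = JSON::PP->new->canonical->encode( \%tag_count );

for my $month ( sort { $b cmp $a } keys %by_month ) {
    printf "%s: %d post(s)\n", $month, scalar @{ $by_month{$month} };
}
print "$json\n";    # {"advent":1,"perl":2,"plack":1}
```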

 
Actually, for simplicity and better integration with the framework, it would make sense to create a temporary SQLite database, with a few tables for blog entry meta-information, tags, etc. The code in the templates and some special routes (like the main page) can then use that database to fetch all the information they need.
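A minimal sketch of that idea with DBI and DBD::SQLite (the latter from CPAN), using an in-memory database. The table layout, column names and sample rows are assumptions made for illustration, not a prescribed schema.

```perl
use strict;
use warnings;
use DBI;    # requires DBD::SQLite to be installed

# An in-memory database lives only as long as the process, which is
# all we need while wallflower walks the site.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 } );

# Hypothetical schema: one row per entry, one row per (entry, tag) pair.
$dbh->do('CREATE TABLE entry ( url TEXT PRIMARY KEY, title TEXT, date TEXT )');
$dbh->do('CREATE TABLE tag ( url TEXT REFERENCES entry(url), tag TEXT )');

# Hypothetical sample rows; in the blog they would come from @entries.
my @entries = (
    { url => '/2014/01/a.html', title => 'A', date => '2014-01-05', tags => [ 'perl', 'plack' ] },
    { url => '/2014/01/b.html', title => 'B', date => '2014-01-19', tags => ['perl'] },
);
my $add_entry = $dbh->prepare('INSERT INTO entry (url, title, date) VALUES (?, ?, ?)');
my $add_tag   = $dbh->prepare('INSERT INTO tag (url, tag) VALUES (?, ?)');
for my $e (@entries) {
    $add_entry->execute( @{$e}{qw( url title date )} );
    $add_tag->execute( $e->{url}, $_ ) for @{ $e->{tags} };
}

# What the main page might ask for: the title of the latest entry.
my ($latest) = $dbh->selectrow_array('SELECT title FROM entry ORDER BY date DESC LIMIT 1');

# What a tag cloud route might ask for: posts per tag.
my $cloud = $dbh->selectall_arrayref(
    'SELECT tag, COUNT(*) AS n FROM tag GROUP BY tag ORDER BY tag',
    { Slice => {} } );

print "latest: $latest\n";
print "$_->{tag}: $_->{n}\n" for @$cloud;
```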
Generating the website is now simply a matter of running:
    $ wallflower -a bin/app.pl -d /path/to/the/output/
wallflower will start browsing the application from / and will follow all links (from HTML and CSS files) to generate your site content.
You can then copy the content of the output directory to the proper location on the target web server, and you're done!

See Also