Thursday, 25 August 2016

Create docker image to speed up initial build of hakyll

As mentioned in @axil’s comment, a custom Docker image could speed up the initial build.
The problem is that a hakyll site is a full-fledged application with its own dependencies (hakyll is only one of them). We could have something like
FROM haskell:7.10.3
RUN stack install hakyll
which would install hakyll globally, but then we would have no guarantee that the installed hakyll version matches the version specified in the site's dependencies. (Note that the same thing happens with the Haskell version, but it is less severe because Haskell is updated less frequently and individual Haskell versions are usually mostly compatible.)
Another problem is caching additional dependencies other than hakyll. Stack caches the already built libraries in something like $STACK_ROOT/precompiled/x86_64-linux/ghc-7.10.3/ (where $STACK_ROOT is $HOME/.stack by default). If we install, for example, the gd package for creating thumbnails, the built library will be stored in that directory. Between CI builds, though, we are only able to cache files inside the project directory, which is why we currently set $STACK_ROOT to be inside the project directory.
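As a rough illustration of that last point, a CI script could point Stack at a directory inside the checkout before building. This is only a sketch: the directory name .stack-root and the helper function are made up for the example and are not taken from the actual template.

```shell
#!/bin/sh
# Sketch: keep Stack's root inside the project so the CI cache can pick it up.
# The directory name ".stack-root" is an assumption, not the template's real value.

# Print the Stack root path for a given project directory.
stack_root_in_project() {
    printf '%s/.stack-root\n' "$1"
}

# Stack reads the STACK_ROOT environment variable instead of $HOME/.stack.
STACK_ROOT=$(stack_root_in_project "$PWD")
export STACK_ROOT
echo "Stack will keep built libraries under $STACK_ROOT"
```

The same directory would then have to be listed as a cached path in the CI configuration, otherwise nothing is gained.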
We can create a default Docker image with only hakyll installed, but the main reason for using hakyll is its customizability. If I had to choose between
  • longer wait on first build, subsequent builds are cached (current state)
  • fast first build, fast subsequent builds if you did not use any additional libraries, slow subsequent builds if you did (default Docker image)
  • every build is fast but you have to roll your own docker image (kinda beats the whole purpose of these templates)
I would choose the first one.
  • Achilleas Pipinellis 
    @axil commented 
    @jtojnar agreed, but my concern about the cache is that it might not always be there, since we use shared Runners running on DigitalOcean and they are pretty ephemeral. If you update your hakyll site once a week, then you have to wait 30 min for it to build every time. Where do you declare the Haskell dependencies? I see there is and then you have the import declaration in site.hs.
    Anyway, our goal is to provide a generic .gitlab-ci.yml that works for almost everybody, no need to go into details. An experienced user will just edit .gitlab-ci.yml to their needs.
  • Jan Tojnar 
    @jtojnar commented 
    Oh, I hoped cache would be somewhat more lasting.
    The point is, unlike most other SSGs, Hakyll is only a library – you have to write a generator (site.hs) yourself. Also, a majority of hakyll users will be experienced and will use additional dependencies.
    I am more and more inclined to use a custom Docker image and add instructions on how to upgrade the image to the README, although it will mean the dependencies will have to be specified in two places (cabal file and Dockerfile).
  • Jan Tojnar 
    @jtojnar commented 
    @axil Maybe GitLab container registry could be used for this – build a Docker image each time one of the generator files changes and use that image for building the site.
    I do not have any experience with Docker yet, though.
  • Achilleas Pipinellis 
    @axil commented 
    Maybe GitLab container registry could be used for this 
    That would be a nice solution indeed and it crossed my mind too. I may give it a shot when I find some free time :)
  • Jan Tojnar 
    @jtojnar commented 
    OK, I have managed to get the docker build to work in my fork. Only two things to figure out:
    • Repo name in .gitlab-ci.yml has to be customised for each fork.
    • Image is rebuilt with each commit, not only when the source changes. I thought pulling the image from repository before build would make Docker realize it was already built from those files but nope.
    As a side note, so far I quite like Docker except for one thing – the way ADD behaves. The person who came up with it could not have been in his right mind. COPY is a little better but still, WTF:
    Note: The directory itself is not copied, just its contents.
    Also, the GitLab infrastructure is a failure: gitlab-org/gitlab-ce#19719
  • Jan Tojnar 
    @jtojnar commented 
    Repo name is now specified using an environment variable $CI_PROJECT_SLUG. Currently it has to be set manually; I filed an MR in the Runner repo: gitlab-org/gitlab-ci-multi-runner!226
    Now, only the source change detection remains to be implemented. Any ideas?
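    One possible approach, sketched here and not taken from the thread, is to tag the image with a hash of the files that define the generator and to rebuild only when no image with that tag can be pulled. The file names, the $REGISTRY_IMAGE variable and the sources_tag helper below are all placeholders, and the sketch assumes sha256sum from GNU coreutils is available on the runner.

```shell
#!/bin/sh
# Sketch: derive an image tag from the contents of the generator sources.
# sources_tag is a hypothetical helper; it assumes GNU coreutils' sha256sum.

# Print a 16-character hash that changes whenever one of the given files changes.
sources_tag() {
    # Hash each file, then hash the list of (hash, name) lines into one value.
    sha256sum "$@" | sha256sum | cut -c1-16
}

# Hypothetical use in a CI job (file list and REGISTRY_IMAGE are placeholders):
#   tag=$(sources_tag Dockerfile site.hs my-site.cabal stack.yaml)
#   image="$REGISTRY_IMAGE:$tag"
#   docker pull "$image" || { docker build -t "$image" . && docker push "$image"; }
```

    With this, a commit that only touches the site content yields the same tag, so the pull succeeds and the build step is skipped. The price is that every file affecting the image has to be listed by hand.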