Tuesday 14 June 2016

Optimus Cache Prime

Optimus Cache Prime (OCP) is a smart cache preloader for websites with XML sitemaps. It crawls all URLs in a given sitemap so the web server builds cached versions of the pages before visitors or search engine spiders arrive.

Since Google began penalizing websites with long response times in their rankings, serving all of your pages quickly has become more important than ever. Optimus Cache Prime helps you do that by making sure your cache — be it an in-memory cache like memcached or APC, or a flat file cache like WP Super Cache or W3 Total Cache — is primed so random requests are served lightning fast.

Download

Platform          Version            Package                            Size
Linux (32-bit)    2.6 – 2012-04-23   ocp-2.6-i386.tar.gz                1.2 MB
Linux (64-bit)    2.7 – 2014-12-27   ocp-2.7-amd64.tar.gz               1.8 MB
Windows (64-bit)  2.7 – 2014-12-27   ocp-2.7.zip                        1.7 MB
Other platforms   1.1 – 2010-12-20   Legacy Python Version
Git repository    Development        https://github.com/patrickmn/ocp

Usage Examples

  • ./ocp /home/patrick/sitemap.xml
    Prime all URLs in a local sitemap
  • ./ocp http://mysite.com/sitemap.xml
    Prime all URLs in a remote sitemap
  • ./ocp -c 10 http://mysite.com/sitemap.xml.gz
    Prime all URLs in a remote sitemap, priming up to 10 URLs at once
  • ./ocp -l /var/www/mysite.com/wp-content/cache/supercache/mysite.com http://mysite.com/sitemap.xml
    Prime each URL in a remote sitemap only if a cached version of the page (<-l path>/<page name>/index.html) doesn’t already exist
  • ./ocp -l /var/www/mysite.com/wp-content/w3tc/pgcache/ -ls _index.html http://mysite.com/sitemap.xml
    Prime each URL in a remote sitemap only if a cached version of the page (<-l path>/<page name>/_index.html) doesn’t already exist
  • ./ocp -l /var/www/mysite.com/wp-content/cache/supercache/mysite.com --max 20 http://mysite.com/sitemap.xml
    Prime each URL in a remote sitemap only if a cached version of the page doesn’t already exist, and stop after priming 20 (uncached) pages
  • ./ocp --print http://mysite.com/sitemap.xml | xargs curl -I
    Don’t prime, but run curl -I to get the response headers from each of the URLs in a sitemap or set of nested sitemaps
Run ./ocp without any parameters to get an overview and explanation of all available options.

Features

  • Cache checking
  • Nested sitemap/sitemap index support
  • Priming an arbitrary number of URLs simultaneously
  • HTTP KeepAlive and session re-use (when applicable)
  • Warns if the pages in your sitemap don’t load, return a 404 Not Found status, etc.
  • Customizable User-Agent (--ua flag)
  • Doubles as a general-purpose (nested) sitemap parser for use with external commands (--print flag)
Run OCP on any machine specifying a target sitemap — e.g. one that links to all of your pages, or one that lists only the high-priority pages — and Optimus primes the links therein.
If run locally on your web server, OCP can probe your static file cache before making any requests to your web server, drastically reducing the number of requests and redundant log messages. Pages are only crawled if they aren’t already cached.
OCP checks up to 10,000 pages per second with local mode enabled if the cache is mostly primed from previous runs. Local mode was designed for use with W3 Total Cache and WP Super Cache for WordPress, but will work with any system that uses a URL-relative flat file cache (i.e. /about/ is cached as e.g. ‘about’ or ‘about/index.html’ on the disk.)
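The URL-to-file mapping that Local mode relies on can be sketched in shell. This is only an illustration of the idea, not OCP's actual code: the function names cache_path and is_cached are made up for this example, and the second and third arguments stand in for the -l path and the -ls suffix (assumed here to default to index.html, as in the WP Super Cache examples).

```shell
#!/bin/sh
# Hypothetical helper: derive the flat-file cache path for a URL.
# Usage: cache_path URL CACHE_ROOT [SUFFIX]
cache_path() {
    url=$1
    cache_root=$2
    suffix=${3:-index.html}
    # Strip the scheme and host, leaving only the URL path (e.g. /about/).
    path=$(printf '%s\n' "$url" | sed -E 's#^[a-zA-Z]+://[^/]+##')
    # Trim leading and trailing slashes so the pieces join cleanly.
    path=$(printf '%s\n' "$path" | sed -E 's#^/+##; s#/+$##')
    root=${cache_root%/}
    if [ -n "$path" ]; then
        printf '%s/%s/%s\n' "$root" "$path" "$suffix"
    else
        printf '%s/%s\n' "$root" "$suffix"
    fi
}

# Hypothetical helper: a page counts as cached when the derived file exists.
is_cached() {
    [ -f "$(cache_path "$@")" ]
}
```

With these definitions, cache_path http://mysite.com/about/ /var/www/cache prints /var/www/cache/about/index.html, and passing _index.html as a third argument gives the W3 Total Cache layout instead.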

FAQ

Q: How do I install and use OCP?
A: On Windows, download and extract the zip file above, then run ocp.exe either from a command prompt, or by right-clicking ocp.exe, creating a shortcut, changing the parameters for that shortcut (in Properties) to e.g.: “D:.exe” -c 10 http://mysite.com/sitemap.xml, and then running the shortcut. On Linux, copy the link for your architecture above and run curl -s <link> | tar xvz to download and extract it. Then cd ocp and run e.g. ./ocp -c 10 http://mysite.com/sitemap.xml.
Q: How do I make an XML sitemap?
A: You can use a sitemap generator like OCP’s brother Optimus Sitemap Generator, or online/addon versions like XML-Sitemaps.com and Arne Brachhold’s XML Sitemap Generator (for WordPress.) You can also make one manually — here’s an example.
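For a hand-written sitemap, something like the following should be enough to get started. It is a minimal sketch following the sitemaps.org format; the file names, URLs, and priorities are placeholders. Note the all-lower-case tags, which OCP has required since version 2.5. The second file shows a sitemap index, which lists other sitemaps instead of pages.

```shell
#!/bin/sh
# Write a minimal two-page sitemap to sitemap.xml in the current directory.
# The URLs and priorities below are placeholders.
cat > sitemap.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://mysite.com/</loc>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://mysite.com/about/</loc>
    <priority>0.8</priority>
  </url>
</urlset>
EOF

# Write a sitemap index that points at nested sitemaps (placeholder URLs).
cat > sitemap-index.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://mysite.com/sitemap-pages.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://mysite.com/sitemap-posts.xml</loc>
  </sitemap>
</sitemapindex>
EOF
```

Either file can then be passed to OCP as a local path, e.g. ./ocp sitemap.xml.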
Q: Do I need to have WordPress, W3 Total Cache, WP Super Cache, memcached, … to use OCP?
A: No. All you need is an XML sitemap. To use Local mode you need something which stores its cached pages with file/directory names that are relative to the original URLs. (Both W3 Total Cache and WP Super Cache do just that.)
Q: Can you demonstrate how to use Local mode?
A: You can run ocp with parameters like:
  • WordPress with W3 Total Cache:
    ./ocp -l /var/www/patrickmylund.com/wp-content/w3tc/pgcache -ls _index.html /var/www/patrickmylund.com/sitemap.xml
    Translation: Look for already-cached files in /var/www/patrickmylund.com/wp-content/w3tc/pgcache, where the cached file for e.g. https://patrickmn.com/about/ is <path>/about/_index.html
  • WordPress with WP Super Cache:
    ./ocp -l /var/www/patrickmylund.com/wp-content/cache/supercache/patrickmylund.com /var/www/patrickmylund.com/sitemap.xml
    Translation: Look for already-cached files in /var/www/patrickmylund.com/wp-content/cache/supercache/patrickmylund.com, where the cached file for e.g. https://patrickmn.com/about/ is <path>/about/index.html
Q: How do I know if Local mode is working?
A: You shouldn’t see any requests from “Optimus Cache Prime” in your web server’s access log, and runs subsequent to the first should complete in less than a second.
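One way to check the first point is to count the access log lines that mention OCP. The sketch below assumes the default User-Agent contains the string “Optimus Cache Prime” (as the answer above implies) and that your server logs the User-Agent. The function name count_ocp_requests is made up for this example, and the log path depends on your web server.

```shell
#!/bin/sh
# Hypothetical helper: count access log lines that mention OCP's User-Agent.
# Usage: count_ocp_requests /path/to/access.log
count_ocp_requests() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are none, so "|| true" keeps the function from reporting failure.
    grep -c 'Optimus Cache Prime' "$1" || true
}
```

Run it before and after a second OCP run, e.g. count_ocp_requests /var/log/nginx/access.log (an assumed path). If Local mode is working and the cache is fully primed, the count should not change.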
Q: How do I preload the cache regularly?
A: The easiest way is to set up a cron job. On most Linux distributions you can do this by adding a cron entry using crontab -e. The entry can be e.g. */5 * * * * /home/patrick/ocp https://patrickmn.com/sitemap.xml, which will run OCP every five minutes. For more information, see Ubuntu’s Cron Howto.
(Note that Cron’s environment/path is very minimal, and you might need to use full paths to your commands.)
Q: How can I make sure only one OCP process runs at the same time?
A: If you’d like to run OCP e.g. every minute via cron, but don’t want several copies of the program to launch if the priming process is taking a while, you can use one.sh to launch OCP.
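If you would rather see the idea spelled out, here is a minimal sketch of single-instance locking in plain shell. It is not one.sh itself, just a comparable technique built on the fact that mkdir either creates a directory or fails atomically. The function name run_once and the lock path are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: run a command only if no other run holds the lock.
# Usage: run_once LOCKDIR COMMAND [ARGS...]
run_once() {
    lockdir=$1
    shift
    # mkdir is atomic: only one process can create the directory.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "already running (lock: $lockdir)" >&2
        return 1
    fi
    # Run the command, remember its exit status, then release the lock.
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}
```

A script run from cron could then call run_once /tmp/ocp.lock /home/patrick/ocp https://patrickmn.com/sitemap.xml. Note that this sketch leaves a stale lock behind if the process is killed before rmdir runs. On systems with util-linux, flock -n /tmp/ocp.lock followed by the command is a sturdier alternative.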

Support

If you have any problems with OCP, or have a question that isn’t answered in the FAQ, please search through or send a message to the Optimus Cache Prime group (optimus-cache-prime@googlegroups.com).

Changelog

Here are the highlights of each release, from the current version back to the first:
2.7 – 2014-12-27
  • Compiled with the latest Go release, fixing some issues with priming sites using new SSL configurations
  • The new command-line parameter --insecure-ssl allows you to skip SSL certificate verification when priming sites using self-signed certificates
  • The Windows version of OCP is now 64-bit (let me know if you need a 32-bit version)
2.6 – 2012-04-23
  • The new command-line parameter --ua allows you to customize the User-Agent header set by OCP in each GET request. (Can be used if a site behaves differently depending on whether the requests “come from” a mobile device, Firefox, Internet Explorer, and so on.) Example: ocp -ua "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Q312461; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)" https://patrickmn.com/sitemap.xml
2.5 – 2012-02-11
  • Fixed a bug where OCP would crash when unable to establish a TCP connection to the server with a remote sitemap
  • Fixed a bug where OCP would crash when unable to establish a TCP connection to the host specified in the URL from an entry in a sitemap
  • Significant performance improvement when priming many uncached pages
  • Sitemap parser is now case-sensitive (per the XML standard): sitemaps must have all-lower-case tags
  • Now shows how many URLs will be primed in a given run
2.4 – 2011-12-22
  • New command-line flag, --max [number], which makes OCP exit after priming e.g. 5 (uncached) pages
  • It is now apparent from the log shown with -v whether a GET request is sent to the web server, or if the cached page already exists on disk
2.3 – 2011-11-23
  • New command-line flag, --print (used exclusively), which causes OCP to simply print all of the URLs read from the sitemap (or set of nested sitemaps) sorted by priority. This can be used with xargs to run arbitrary commands on the URLs, e.g. ocp --print sitemap.xml | xargs curl -I
2.2 – 2011-11-17
  • OCP now reports dead/broken links and pages that can’t be loaded (e.g. HTTP status 500 is received from the web server). The warnings can be turned off with the --no-warn flag
2.1 – 2011-11-17
  • Sitemapindex/nested sitemaps support; OCP will now prime all URLs in all listed sitemaps of a Sitemapindex XML file
2.0 – 2011-08-06
  • Rewritten in Go; use KeepAlive/existing sessions, optional probing concurrency
1.1 – 2010-12-20
  • Switched XML parsing library to improve compatibility
1.0 – 2010-11-12
  • Probe all URLs in XML sitemaps