Here are a few samples of HTML compression results with default settings:
Site Name | Original | Compressed | Decrease |
---|---|---|---|
BBC | 77,054b | 55,324b | 28.2% |
CNet | 86,492b | 61,896b | 28.4% |
FOX News | 75,266b | 64,221b | 14.7% |
GameTrailers | 112,199b | 92,851b | 17.2% |
Kotaku | 134,938b | 116,280b | 13.8% |
National Post | 75,006b | 55,628b | 25.8% |
SlashDot | 158,137b | 142,346b | 10.0% |
StackOverflow | 116,032b | 100,478b | 13.4% |
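The Decrease column is the saving relative to the original size: (original - compressed) / original × 100. Below is a small Python check of that arithmetic against two rows of the table. `decrease_percent` is a hypothetical helper, not part of the library.

```python
def decrease_percent(original: int, compressed: int) -> float:
    """Size reduction as a percentage of the original, rounded to one decimal place."""
    if original <= 0:
        raise ValueError("original size must be positive")
    return round((original - compressed) / original * 100, 1)


# Two rows copied from the table above (sizes in bytes).
for site, original, compressed in [("BBC", 77_054, 55_324), ("SlashDot", 158_137, 142_346)]:
    print(f"{site}: {decrease_percent(original, compressed)}%")  # BBC: 28.2%, SlashDot: 10.0%
```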
Table of Contents
- How it Works
- How to Use
- Dependencies
- Compressing HTML and XML files from a command line
- Defining custom preservation rules
- Compressing whole directories and multiple files at once
- HTML Analyzer
- Using HTML Compressor from Java API
- Creating your own block preservation rules
- Using different JavaScript and CSS compressor implementations
- Retrieving HTML compression statistics
- Using XML Compressor from Java API
- Compressing selective content in JSP pages
- Compressing selective content in Velocity templates
- Setting up Ant task to compress files
- Maven Integration
- Who Uses it
How it Works
During HTML compression, the following is applied to the page source:
- Any content within <pre>, <textarea>, <script> and <style> tags is preserved and remains untouched (except for <script type="text/x-jquery-tmpl"> tags, which are compressed as HTML). Inline JavaScript inside tags (onclick="test()") is preserved as well. You can wrap any part of the page in <!-- {{{ -->...<!-- }}} --> comments to preserve it, or provide your own set of preservation rules (<?php...?>, <%...%>, and <!--#... --> are also supported out of the box).
- Comments are removed (except IE conditional comments). Can be disabled.
- Multiple spaces are replaced with a single space. Can be disabled.
- Unneeded spaces inside tags (around = and before />) are removed.
- Quotes around tag attributes can be removed when safe (off by default).
- All spaces between tags can be removed (off by default).
- Spaces around selected tags can be removed (off by default).
- An existing doctype declaration can be replaced with the simple <!DOCTYPE html> declaration (off by default).
- Default attributes can be removed from <script>, <style>, <link>, <form>, and <input> tags (off by default).
- Values of boolean tag attributes can be removed (off by default).
- The javascript: pseudo-protocol can be removed from inline event handlers (off by default).
- The http:// and https:// protocols can be replaced with // inside href, src, cite, and action tag attributes (tags marked with rel="external" are skipped).
- Content inside <style> tags can optionally be compressed using YUI Compressor or your own compressor implementation.
- Content inside <script> tags can optionally be compressed using YUI Compressor, Google Closure Compiler, or your own compressor implementation.
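The preservation rules above follow a common pattern: hide protected blocks behind placeholders, minify the rest, then put the blocks back. The Python sketch below illustrates that idea only. It is not htmlcompressor's code, and the regexes are assumptions. It also skips refinements such as compressing <script type="text/x-jquery-tmpl"> as HTML.

```python
import re

# Blocks that must come through untouched. These regexes are illustrative
# assumptions covering the cases listed above, not the library's own rules.
PRESERVE_PATTERNS = [
    re.compile(r"<!--\s*\{\{\{\s*-->.*?<!--\s*\}\}\}\s*-->", re.S),  # skip blocks
    re.compile(r"<(pre|textarea|script|style)\b.*?</\1\s*>", re.S | re.I),
    re.compile(r"<\?php.*?\?>", re.S),  # PHP
    re.compile(r"<%.*?%>", re.S),       # JSP/ASP
    re.compile(r"<!--#.*?-->", re.S),   # server-side includes
]
COMMENT = re.compile(r"<!--(?!\[if).*?-->", re.S)  # spares IE conditional comments
SPACES = re.compile(r"\s+")


def compress(html: str) -> str:
    preserved: list[str] = []

    def stash(match: re.Match) -> str:
        preserved.append(match.group(0))
        return f"\x00{len(preserved) - 1}\x00"

    # 1. Replace each preserved block with a placeholder token.
    for pattern in PRESERVE_PATTERNS:
        html = pattern.sub(stash, html)
    # 2. Drop ordinary comments and collapse whitespace runs to one space.
    html = COMMENT.sub("", html)
    html = SPACES.sub(" ", html)
    # 3. Restore blocks newest first, so placeholders nested inside them resolve too.
    for index in reversed(range(len(preserved))):
        html = html.replace(f"\x00{index}\x00", preserved[index])
    return html
```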
You can optionally remove all unnecessary quotes from tag attributes (attributes that consist of a single word: <div id="example"> would become <div id=example>). This usually gives a page size decrease of around 3% at no performance cost. However, it might break strict HTML validation, so this option is disabled by default.
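One way to decide when dropping quotes is safe is the HTML standard's rule for unquoted attribute values. The value must be non-empty and contain no ASCII whitespace and none of " ' = < > or the backtick. This Python sketch assumes that rule. htmlcompressor's own "single word" check may well be stricter, and `can_unquote`/`render_attr` are hypothetical names.

```python
# Characters the HTML standard forbids in an unquoted attribute value:
# ASCII whitespace (tab, LF, FF, CR, space) plus " ' = < > and the backtick.
_FORBIDDEN = set("\t\n\f\r \"'=<>`")


def can_unquote(value: str) -> bool:
    """True if the value may be written without quotes (it must also be non-empty)."""
    return value != "" and not any(ch in _FORBIDDEN for ch in value)


def render_attr(name: str, value: str) -> str:
    if can_unquote(value):
        return f"{name}={value}"  # e.g. id="example" becomes id=example
    escaped = value.replace('"', "&quot;")  # keep the quotes, escape any inner "
    return f'{name}="{escaped}"'
```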
Removing inter-tag spaces saves roughly another 3% of page size. It is fairly safe to turn this option on unless you rely on spaces for page formatting. Even if you do, you can always preserve required spaces with an HTML entity such as &nbsp;. This option has no performance impact.
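Inter-tag space removal amounts to deleting whitespace that sits directly between one tag's > and the next tag's <. Here is a minimal Python sketch; a real compressor would apply it only outside preserved blocks. It also shows why a plain space between two inline elements disappears while an &nbsp; entity survives. `remove_intertag_spaces` is a hypothetical name.

```python
import re

# Whitespace lying entirely between the end of one tag (">") and the start of the next ("<").
_INTERTAG = re.compile(r">\s+<")


def remove_intertag_spaces(html: str) -> str:
    """Delete whitespace between adjacent tags; text next to a tag keeps its spaces."""
    return _INTERTAG.sub("><", html)


print(remove_intertag_spaces("<ul>\n  <li>a</li>\n  <li>b</li>\n</ul>"))  # <ul><li>a</li><li>b</li></ul>
print(remove_intertag_spaces("<b>Hello</b> <i>world</i>"))       # the space is lost
print(remove_intertag_spaces("<b>Hello</b>&nbsp;<i>world</i>"))  # the entity survives
```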
You can quickly test how each compressor setting would affect the file size of your page by running the command-line HTML Analyzer.
During XML compression:
- Any content inside <![CDATA[...]]> is preserved.
- All comments are removed. Can be disabled.
- All spaces between tags are removed. Can be disabled.
- Unneeded spaces inside tags (multiple spaces, spaces around =, spaces before />) are removed.
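The XML steps can be sketched the same way in Python. First protect CDATA sections. Then optionally drop comments and inter-tag whitespace. Finally, tidy the inside of each tag without touching quoted attribute values. This is an illustrative sketch, not XmlCompressor's implementation, and `compress_xml` and its parameter names are hypothetical.

```python
import re

_CDATA = re.compile(r"<!\[CDATA\[.*?\]\]>", re.S)
_COMMENT = re.compile(r"<!--.*?-->", re.S)
_INTERTAG = re.compile(r">\s+<")
_TAG = re.compile(r"<[^>]+>")
_QUOTED = re.compile(r"(\"[^\"]*\"|'[^']*')")


def _tidy_tag(tag: str) -> str:
    # Split off quoted attribute values so their contents are never touched.
    parts = _QUOTED.split(tag)
    for i in range(0, len(parts), 2):  # even indices lie outside quotes
        collapsed = re.sub(r"\s+", " ", parts[i])  # multiple spaces -> one
        parts[i] = re.sub(r" ?= ?", "=", collapsed)  # spaces around =
    return re.sub(r" />$", "/>", "".join(parts))  # space before />


def compress_xml(xml: str, remove_comments: bool = True, remove_intertag_spaces: bool = True) -> str:
    stashed: list[str] = []

    def stash(match: re.Match) -> str:
        stashed.append(match.group(0))
        return f"\x00{len(stashed) - 1}\x00"

    xml = _CDATA.sub(stash, xml)  # CDATA sections are preserved verbatim
    xml = _COMMENT.sub("" if remove_comments else stash, xml)  # drop or protect comments
    if remove_intertag_spaces:
        xml = _INTERTAG.sub("><", xml)
    xml = _TAG.sub(lambda m: _tidy_tag(m.group(0)), xml)
    for index in reversed(range(len(stashed))):
        xml = xml.replace(f"\x00{index}\x00", stashed[index])
    return xml
```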
How to Use
Before reading further: if you are not serving your HTML/XML content with GZip compression, you should really look into that first. It gives a very significant compression ratio (sometimes up to 10 times) and is usually very easy to implement. For further reading on GZip compression, see this article, for example.
If you want to decrease size even further, the next step is removing characters that are insignificant from the browser's perspective, and that's where this library comes in handy.
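To get a feel for how much GZip alone buys, Python's standard gzip module can measure the ratio for a page. The sample page here is made up and deliberately repetitive, so real pages will usually compress less. `gzip_ratio` is a hypothetical helper.

```python
import gzip


def gzip_ratio(text: str) -> float:
    """Original UTF-8 size divided by gzip-compressed size."""
    raw = text.encode("utf-8")
    return len(raw) / len(gzip.compress(raw))


rows = "".join(
    f'<tr><td class="cell">Row {i}</td><td class="cell">Value</td></tr>\n' for i in range(500)
)
page = f"<html><body><table>\n{rows}</table></body></html>"
print(f"{gzip_ratio(page):.1f}x smaller with gzip")
```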
Juriy Zaytsev did excellent, detailed research on HTML minification techniques, which you can use as a guide to the HTML compression settings that would work best for your project. Please see his Optimizing HTML and Experimenting with html minifier articles.
For Java Projects
If you are generating static HTML files on the server, the most flexible solution is to call HTML compression before writing output to the file. If you generate HTML files once in a while and then upload them to a production server, the easiest solution that doesn't require any code modifications is an Ant task. The task calls the command-line version of the library and rewrites files with their compressed versions.
For dynamic sites that use JSP, the best way of compressing the output is the compressor taglib.
For dynamic sites using Velocity, you can either wrap your templates with compressor directives or call the compressor manually after merging the template.
For other dynamic cases you will probably have to call compressors directly from the code before serving a page to the client.
The HtmlCompressor and XmlCompressor classes are considered thread-safe* and can be used in a multi-threaded environment. (*The only unsafe part is setting compression options. It is therefore a good idea to initialize a compressor with the required settings once per application and then use it to compress different pages in parallel in multiple threads. Please note that enabling statistics generation makes the compressor unsafe.)
For Maven projects, the HtmlCompressor library is available as a Maven artifact.
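The configure-once, share-everywhere pattern above looks roughly like this. HtmlCompressor itself is Java; this Python sketch only mirrors the idea, using a frozen settings object that worker threads read but never modify. `CompressorSettings`, `SharedCompressor` and their option names are hypothetical. Statistics collection is left out because, as noted above, it makes the compressor unsafe.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressorSettings:
    # Hypothetical options; frozen=True makes them read-only after construction.
    remove_comments: bool = True
    remove_intertag_spaces: bool = False


class SharedCompressor:
    """Holds settings fixed at construction; compress() keeps no per-call state on self."""

    _COMMENT = re.compile(r"<!--.*?-->", re.S)
    _INTERTAG = re.compile(r">\s+<")

    def __init__(self, settings: CompressorSettings) -> None:
        self._settings = settings

    def compress(self, html: str) -> str:
        if self._settings.remove_comments:
            html = self._COMMENT.sub("", html)
        if self._settings.remove_intertag_spaces:
            html = self._INTERTAG.sub("><", html)
        return html


# Configure once per application...
COMPRESSOR = SharedCompressor(CompressorSettings(remove_intertag_spaces=True))


def compress_all(pages: list[str]) -> list[str]:
    # ...then let several threads use the same instance in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(COMPRESSOR.compress, pages))
```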
For Non-Java Projects
If you generate HTML for your site (or have a simple site in pure HTML), you can use the command-line version of the library (Java must still be installed). For dynamic sites written in other languages, the only option is to programmatically execute a shell command that runs the command-line compressor.
Bad Practices
- Don't feed the compressor actual templates (PHP, JSP, etc.). This most likely won't work. Even if it does, it would be a bad idea anyway: you will lose their readability, and further development will be very inconvenient. Instead of compressing templates, consider compressing the resulting HTML after a template is merged. If you absolutely have to compress templates, you need to set custom block preservation rules for HTML Compressor.
- If your site is in pure HTML, always keep the original files and compress only the copies that will be served to the client. If you compress your only copy of the sources, further development will again be very hard, and there is no easy way to decompress pages back.
Known Issues
- When a <script> tag contains a custom preserved block (for example <?php ... ?>), enabling inline JavaScript compression will fail. Such <script> tags can be skipped by wrapping them in <!-- {{{ -->...<!-- }}} --> comments (skip blocks).
- Removing inter-tag spaces might break text formatting; for example, spaces between words wrapped in <b> tags will be removed. Such spaces can be preserved by replacing them with an HTML entity such as &nbsp;.
Dependencies
The XML compressor doesn't rely on any external libraries. The HTML compressor with default settings doesn't require any dependencies either.
Inline CSS compression requires the YUI Compressor library.
Inline JavaScript compression requires either the YUI Compressor library (default) or the Google Closure Compiler library.
All dependencies can be found in the /lib/ folder of the package.
Please note that when using the command-line compressor, there are strict restrictions on jar filenames for dependencies (more details in the command-line compressor section below).
Compressing HTML and XML files from a command line
If you have Java installed, you can run the HTML and XML compressors from the command line.
For inline JavaScript compression, you can choose between YUI Compressor (default) and Google Closure Compiler. If YUI Compressor is used, the jar file yuicompressor-2.4.6.jar (or yuicompressor-2.4.*.jar, or yuicompressor.jar) must be present in the same directory as the HtmlCompressor jar. For Closure Compiler, compiler.jar must be present. (Please note that jar filenames cannot be changed.)
Usage: java -jar htmlcompressor.jar [options] [input]
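Sites written in other languages have to shell out to this command. Below is a minimal Python sketch using subprocess. It passes no options, so the library defaults apply. It assumes that, without an output option, the compressed markup is printed to stdout; verify this against the usage text of your htmlcompressor version. `build_command` and `compress_file` are hypothetical names.

```python
import shutil
import subprocess


def build_command(input_path: str, jar: str = "htmlcompressor.jar") -> list[str]:
    """argv for: java -jar htmlcompressor.jar [options] [input], with no options (defaults)."""
    return ["java", "-jar", jar, input_path]


def compress_file(input_path: str, jar: str = "htmlcompressor.jar") -> str:
    """Run the command-line compressor and return what it prints.

    Assumption: with no output option, the compressed markup is written to stdout.
    """
    if shutil.which("java") is None:
        raise RuntimeError("Java must be installed to run the command-line compressor")
    result = subprocess.run(
        build_command(input_path, jar),
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```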
from http://code.google.com/p/htmlcompressor/
JavaScript-based HTML compressor/minifier (with Node.js support)
kangax.github.io/html-minifier/
HTMLMinifier
HTMLMinifier is a highly configurable, well-tested, JavaScript-based HTML minifier.
See the corresponding blog post for all the gory details of how it works, a description of each option, testing results and conclusions.
The test suite is available online.
Also see the corresponding Ruby wrapper, and for Node.js, the Grunt plugin, Gulp module, Koa middleware wrapper and Express middleware wrapper.
For lint-like capabilities take a look at HTMLLint.
Minification comparison
How does HTMLMinifier compare to other solutions — HTML Minifier from Will Peavy (1st result in Google search for "html minifier") as well as htmlcompressor.com and minimize?
Site | Original size (KB) | HTMLMinifier | minimize | Will Peavy | htmlcompressor.com |
---|---|---|---|---|---|
 | 46 | 42 | 46 | 48 | 46 |
HTMLMinifier | 125 | 98 | 111 | 117 | 111 |
 | 207 | 165 | 200 | 224 | 200 |
Stack Overflow | 253 | 195 | 207 | 215 | 204 |
Bootstrap CSS | 271 | 260 | 269 | 228 | 269 |
BBC | 298 | 239 | 290 | 291 | 280 |
Amazon | 422 | 316 | 412 | 425 | n/a |
NBC | 553 | 530 | 552 | 553 | 534 |
Wikipedia | 565 | 461 | 548 | 569 | 548 |
New York Times | 678 | 606 | 675 | 670 | n/a |
Eloquent Javascript | 870 | 815 | 840 | 864 | n/a |
ES6 table | 5911 | 5051 | 5595 | n/a | n/a |
ES draft | 6126 | 5495 | 5664 | n/a | n/a |
Options Quick Reference
Most of the options are disabled by default.
Option | Description | Default |
---|---|---|
caseSensitive | Treat attributes in a case-sensitive manner (useful for custom HTML tags) | false |
collapseBooleanAttributes | Omit attribute values from boolean attributes | false |
collapseInlineTagWhitespace | Don't leave any spaces between display:inline; elements when collapsing. Must be used in conjunction with collapseWhitespace=true | false |
collapseWhitespace | Collapse white space that contributes to text nodes in a document tree | false |
conservativeCollapse | Always collapse to 1 space (never remove it entirely). Must be used in conjunction with collapseWhitespace=true | false |
continueOnParseError | Handle parse errors instead of aborting | false |
customAttrAssign | Array of regexes that allow support for custom attribute assign expressions (e.g. '<div flex?="{{mode != cover}}"></div>') | [ ] |
customAttrCollapse | Regex that specifies custom attribute to strip newlines from (e.g. /ng-class/) | |
customAttrSurround | Array of regexes that allow support for custom attribute surround expressions (e.g. <input {{#if value}}checked="checked"{{/if}}>) | [ ] |
customEventAttributes | Array of regexes that allow support for custom event attributes for minifyJS (e.g. ng-click) | [ /^on[a-z]{3,}$/ ] |
decodeEntities | Use direct Unicode characters whenever possible | false |
html5 | Parse input according to HTML5 specifications | true |
ignoreCustomComments | Array of regexes that allow ignoring certain comments, when matched | [ /^!/ ] |
ignoreCustomFragments | Array of regexes that allow ignoring certain fragments, when matched (e.g. <?php ... ?>, {{ ... }}, etc.) | [ /<%[\s\S]*?%>/, /<\?[\s\S]*?\?>/ ] |
includeAutoGeneratedTags | Insert tags generated by HTML parser | true |
keepClosingSlash | Keep the trailing slash on singleton elements | false |
maxLineLength | Specify a maximum line length. Compressed output will be split by newlines at valid HTML split-points | |
minifyCSS | Minify CSS in style elements and style attributes (uses clean-css) | false (could be true, Object, Function(text, type)) |
minifyJS | Minify JavaScript in script elements and event attributes (uses UglifyJS) | false (could be true, Object, Function(text, inline)) |
minifyURLs | Minify URLs in various attributes (uses relateurl) | false (could be String, Object, Function(text)) |
preserveLineBreaks | Always collapse to 1 line break (never remove it entirely) when whitespace between tags includes a line break. Must be used in conjunction with collapseWhitespace=true | false |
preventAttributesEscaping | Prevents the escaping of the values of attributes | false |
processConditionalComments | Process contents of conditional comments through minifier | false |
processScripts | Array of strings corresponding to types of script elements to process through minifier (e.g. text/ng-template, text/x-handlebars-template, etc.) | [ ] |
quoteCharacter | Type of quote to use for attribute values (' or ") | |
removeAttributeQuotes | Remove quotes around attributes when possible | false |
removeComments | Strip HTML comments | false |
removeEmptyAttributes | Remove all attributes with whitespace-only values | false (could be true, Function(attrName, tag)) |
removeEmptyElements | Remove all elements with empty contents | false |
removeOptionalTags | Remove optional tags | false |
removeRedundantAttributes | Remove attributes when value matches default | false |
removeScriptTypeAttributes | Remove type="text/javascript" from script tags. Other type attribute values are left intact | false |
removeStyleLinkTypeAttributes | Remove type="text/css" from style and link tags. Other type attribute values are left intact | false |
removeTagWhitespace | Remove space between attributes whenever possible. Note that this will result in invalid HTML! | false |
sortAttributes | Sort attributes by frequency | false |
sortClassName | Sort style classes by frequency | false |
trimCustomFragments | Trim white space around ignoreCustomFragments | false |
useShortDoctype | Replaces the doctype with the short (HTML5) doctype | false |
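How collapseWhitespace, conservativeCollapse and preserveLineBreaks interact is easiest to see on a single whitespace run between two block-level tags. The Python function below is a simplified model of the table's descriptions, not HTMLMinifier's code. In particular, letting preserveLineBreaks win when both options are set is an assumption of this sketch. `collapse_between_blocks` is a hypothetical name.

```python
def collapse_between_blocks(ws: str, conservative_collapse: bool = False,
                            preserve_line_breaks: bool = False) -> str:
    """Simplified model of the documented options for a whitespace-only run
    between two block-level tags, assuming collapseWhitespace=true."""
    if ws.strip():
        raise ValueError("expected a whitespace-only run")
    if preserve_line_breaks and ("\n" in ws or "\r" in ws):
        return "\n"  # collapsed to a single line break
    if conservative_collapse:
        return " "  # never removed entirely, but shrunk to one space
    return ""  # plain collapseWhitespace drops it between block tags
```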
Sorting attributes / style classes
Minifier options like sortAttributes and sortClassName won't impact the plain-text size of the output. However, they form long repetitive chains of characters that should improve the compression ratio of gzip used in HTTP compression.
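A quick way to see this effect is to compare DEFLATE output sizes for the same markup with shuffled versus consistently ordered attributes. DEFLATE is the algorithm gzip uses; zlib is used here to produce it. This Python sketch orders attributes alphabetically as a stand-in, whereas HTMLMinifier sorts by frequency. The sample page and helper names are made up.

```python
import random
import re
import zlib

_START_TAG = re.compile(r'<(\w+)((?:\s+\w+="[^"]*")*)\s*>')
_ATTR = re.compile(r'\w+="[^"]*"')


def canonical_attribute_order(html: str) -> str:
    """Rewrite start tags so attributes appear in one fixed (alphabetical) order."""
    def reorder(match: re.Match) -> str:
        attrs = sorted(_ATTR.findall(match.group(2)))
        return "<" + " ".join([match.group(1)] + attrs) + ">"
    return _START_TAG.sub(reorder, html)


def sample_page(rows: int = 300, seed: int = 42) -> str:
    """Many identical inputs whose attributes are written in shuffled order."""
    rng = random.Random(seed)
    attrs = ['type="text"', 'class="field"', 'name="q"', 'maxlength="40"']
    lines = []
    for _ in range(rows):
        rng.shuffle(attrs)
        lines.append("<input " + " ".join(attrs) + ">")
    return "\n".join(lines)


def deflated_size(text: str) -> int:
    return len(zlib.compress(text.encode("utf-8"), 9))


page = sample_page()
sorted_page = canonical_attribute_order(page)
print(len(page), len(sorted_page))                      # plain-text size is unchanged
print(deflated_size(page), deflated_size(sorted_page))  # the sorted copy deflates smaller
```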
Special cases
Ignoring chunks of markup
If you have chunks of markup you would like preserved, you can wrap them in a pair of <!-- htmlmin:ignore --> comments.
Preserving SVG tags
SVG tags are automatically recognized, and when they are minified, both case-sensitivity and closing-slashes are preserved, regardless of the minification settings used for the rest of the file.
Working with invalid markup
HTMLMinifier can't work with invalid or partial chunks of markup. This is because it parses markup into a tree structure, then modifies it (removing anything that was specified for removal, ignoring anything that was specified to be ignored, etc.), and then creates markup from that tree and returns it.
Input markup (e.g. <p id="">foo)
↓
Internal representation of markup in the form of a tree (e.g. { tag: "p", attr: "id", children: ["foo"] })
↓
Transformation of internal representation (e.g. removal of id attribute)
↓
Output of resulting markup (e.g. <p>foo</p>)
HTMLMinifier can't know that the original markup was only half of the tree. It does its best to parse it as a full tree, and in doing so it loses the information that the tree was malformed or partial to begin with. As a result, it can't reproduce a partial or malformed tree on output.
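The parse, transform, serialize pipeline above can be mimicked with Python's standard html.parser, used here as a stand-in for HTMLMinifier's own JavaScript parser. The sketch removes id attributes and re-serializes, and it shows both halves of the problem. The unclosed <p id="">foo comes back closed, and a stray </p> with no opening tag vanishes. `TreeBuilder` and `minify` are hypothetical names, and text and attribute escaping is skipped for brevity.

```python
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "meta", "link"}


class TreeBuilder(HTMLParser):
    """Parse markup into nested dicts, roughly like the tree described above."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": None, "attrs": [], "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": attrs, "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; an unmatched end tag is dropped.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1]["children"].append(data)


def remove_attribute(node, name):
    if isinstance(node, dict):
        node["attrs"] = [(k, v) for k, v in node["attrs"] if k != name]
        for child in node["children"]:
            remove_attribute(child, name)


def serialize(node):
    if isinstance(node, str):
        return node  # no escaping: sketch only
    inner = "".join(serialize(child) for child in node["children"])
    if node["tag"] is None:
        return inner
    attrs = "".join(f' {k}="{v}"' if v is not None else f" {k}" for k, v in node["attrs"])
    if node["tag"] in VOID:
        return f"<{node['tag']}{attrs}>"
    # Every element is closed on output, whether or not the input closed it.
    return f"<{node['tag']}{attrs}>{inner}</{node['tag']}>"


def minify(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    remove_attribute(builder.root, "id")
    return serialize(builder.root)


print(minify('<p id="">foo'))  # <p>foo</p>
```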
Installation Instructions
From NPM for use as a command line app:
npm install html-minifier -g
From NPM for programmatic use:
npm install html-minifier
From Git:
git clone https://github.com/kangax/html-minifier.git
cd html-minifier
npm link .
Usage
Note that almost all options are disabled by default. For command-line usage, see html-minifier --help for a list of available options. Experiment and find what works best for you and your project.
- Sample command line:
html-minifier --collapse-whitespace --remove-comments --remove-optional-tags --remove-redundant-attributes --remove-script-type-attributes --remove-tag-whitespace --use-short-doctype --minify-css true --minify-js true
from https://github.com/kangax/html-minifier
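(HTML open-source compression tool: html-minifier
html-minifier is an open-source HTML compression tool released under the MIT license. Test results show that it compresses better than minimize, Will Peavy, and htmlcompressor.com.
html-minifier requires Node.js to run; once running, it minifies HTML code according to the selected options.)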