<!-- 
.. title: Better graphic formats
.. slug: better-graphic-formats
.. date: 2016-12-26 14:08:43 UTC
.. tags: presentations, web
.. category: 
.. link: 
.. description: 
.. type: text
-->

The most frequently used ([and abused](http://pdes-net.org/cobra/posts/generation-jpeg.html)) raster image format—JPEG—recently celebrated its [25th anniversary.](https://www.heise.de/newsticker/meldung/JPEG-25-Jahre-und-kein-bisschen-alt-3342519.html) Its cousins are mostly even older: [TIFF](https://en.wikipedia.org/wiki/TIFF) stems from 1986, [GIF](https://en.wikipedia.org/wiki/GIF) from 1987, and only [PNG](https://en.wikipedia.org/wiki/Portable_Network_Graphics), the latter's intended replacement, was developed a few years later, namely, in 1995. 

What kind of computer did I have in 1995? [A Pentium 90 with 16 MB RAM and a 512 MB HDD.](http://pdes-net.org/cobra/posts/archeology.html) And that's what these formats were designed for. Today, 20 years later, we enjoy a factor of about 1000 with regard to CPU speed, memory, and storage size, but despite this enormous difference, our image file formats have so far remained the same.

Several new formats have been proposed in the past few years, such as Google's [WEBP](https://en.wikipedia.org/wiki/WebP) in 2010, [BPG](https://en.wikipedia.org/wiki/Better_Portable_Graphics) (better portable graphics), which is essentially owned by the [MPEG LA](https://en.wikipedia.org/wiki/MPEG_LA), in 2014, and [FLIF](https://en.wikipedia.org/wiki/Free_Lossless_Image_Format) (free lossless image format) in 2015. Only WEBP is supported to a degree that allows one to actually use it, while BPG and FLIF are essentially still at the level of technology demonstrations.

[This page](http://xooyoozoo.github.io/yolo-octo-bugfixes) offers a most illustrative comparison between the different lossy image formats, among them JPEG and its intended successors as well as BPG and WEBP. There's absolutely no question about the winner. Just look at [Tennis](http://xooyoozoo.github.io/yolo-octo-bugfixes/#tennis&jpg=s&bpg=s) or [Steinway](http://xooyoozoo.github.io/yolo-octo-bugfixes/#steinway&jpg=s&bpg=s), for Pete's sake. No question, were it not for the sodding [patents.](https://www.philipstorry.net/thoughts/bpg-vs-jpeg-vs-webp-vs-jpeg-xr) *sigh*
_________________________________

But forget the patents for the moment; let's look at something interesting instead. In this post, I look at these new image formats from a different perspective: how well can they compress essentially black-and-white line art?

Not that one should ever even consider doing that. Line art should always be stored as vector graphics, that much is obvious to anyone with even the faintest knowledge of graphic formats. Even a few scientific publishers know that. In the author guide to [Nature Communications,](http://www.nature.com/ncomms/submit/how-to-submit) for example, we find the statement:

> All line art, graphs, charts and schematics should be supplied in vector format [...]. 

The author guides of most other publishers lack such explicit statements and rather breathe the spirit of the 1990s. For example, in an [Elsevier](https://www.elsevier.com/authors/author-schemas/artwork-frequently-asked-questions) FAQ we can read:

> <strong>Why don't you accept PNG files?</strong>

> We will constantly review technological developments in the graphics industry including emerging file formats - new recommended formats will be introduced where appropriate. PNG files do not cause issues in processing, but our submission systems are in progress of updating to allow for this useful new format. 

In practice, however, most publishers have no problem with accepting vector graphics in EPS or PDF format and, most importantly, also use them 1:1 for the final publication. With one prominent exception: the [American Chemical Society](https://en.wikipedia.org/wiki/American_Chemical_Society) (ACS). Vector graphics submitted to any of the [numerous ACS journals](https://en.wikipedia.org/wiki/American_Chemical_Society#Journals_and_magazines) are invariably converted to a raster image. [Some](http://pubs.acs.org/paragonplus/submission/jacsat/jacsat_authguide.pdf) of their author guides even include a corresponding note:

> NOTE: While EPS files are accepted, the vectorbased graphics will be rasterized for production. 

Regarding the format and resolution of these raster graphics, we find the following [exemplary](http://pubs.acs.org/paragonplus/submission/ancac3/ancac3_checklist.pdf) recommendation in this guide:

> Figures containing photographic images must be at least 300 dpi tif files in CMYK format; line art should be at least 1200 dpi eps files.

To specify a resolution for EPS files demonstrates a complete lack of understanding of vector graphics. And in the same spirit, we read:
	
> Cover images should be 21.5 cm in width and 28 cm in height, with a resolution of 300 dpi at this size (this should be a file of at least 8 MB).
	
Oh, we cannot even handle compressed TIFFs? How wonderful to work with professionals.
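A quick back-of-the-envelope check of that number, as a sketch: the helper `raw_bytes` below is a hypothetical name, and it assumes 8 bits per channel and no compression at all. It converts the quoted dimensions to pixels at 300 dpi and multiplies by the number of channels.

```shell
# Uncompressed size in bytes of an image of w x h centimetres at a given dpi.
# Assumption: 8 bits (one byte) per channel, no compression, no metadata.
raw_bytes() {
    awk -v w="$1" -v h="$2" -v dpi="$3" -v ch="$4" 'BEGIN {
        px_w = int(w / 2.54 * dpi + 0.5)   # width in pixels, rounded
        px_h = int(h / 2.54 * dpi + 0.5)   # height in pixels, rounded
        printf "%d\n", px_w * px_h * ch
    }'
}

raw_bytes 21.5 28 300 1   # grayscale: 8396473
raw_bytes 21.5 28 300 3   # RGB:       25189419
raw_bytes 21.5 28 300 4   # CMYK:      33585892
```

So the cover is about 2539 × 3307 pixels, and roughly 8.4 MB is what it takes at one byte per pixel without any compression. An uncompressed RGB or CMYK file would be three to four times that. How the guide arrived at its 8 MB is not stated.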

Perhaps as a direct consequence of the resulting size of 1200 dpi bitmaps, I have never seen any figure in an ACS journal whose resolution would have exceeded 300 dpi. At least these figures are compressed, contrary to the implicit recommendation in the author guide. Depending on the preference of the technical staff at the respective ACS journal, the figures are included in the manuscript either as overcompressed JPEGs, exhibiting plainly visible compression artefacts, or as insufficiently compressed PNG files.

Insufficiently compressed? Yes: in contrast to JPEG, PNG employs lossless compression, and one can and should thus always employ the maximum compression level (9). Not doing so only increases the file size. The technical staff at ACS typically invokes only the minimum compression level 1. Furthermore, the file format is invariably 8 bit/color RGB, even for black and white line art. As a result, the 692 kB of a 295 dpi figure (extracted as described [here](http://pdes-net.org/cobra/posts/extract-bitmaps.html)) in one of my recent ACS publications could have been easily reduced to 138 kB. Or, alternatively, one could have produced a 1200 dpi version with a file size of only 787 kB, barely larger than that included in the galley proofs.
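To put these numbers in proportion, here is a minimal sketch using only the figures quoted above. The helper names `ratio` and `pixel_factor` are made up for this illustration.

```shell
# Ratio of two numbers, printed with one decimal.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Factor by which the pixel count grows when the resolution is raised
# from dpi_lo to dpi_hi (it scales with the square of the resolution).
pixel_factor() {
    awk -v dpi_hi="$1" -v dpi_lo="$2" 'BEGIN { f = dpi_hi / dpi_lo; printf "%.1f\n", f * f }'
}

ratio 692 138           # published PNG vs. optimized PNG: 5.0
pixel_factor 1200 295   # pixels at 1200 dpi vs. 295 dpi:  16.5
ratio 787 138           # 1200 dpi PNG vs. 295 dpi PNG:    5.7
```

In other words, the published file is about five times larger than necessary, and going to 1200 dpi means about 16.5 times as many pixels for less than six times the bytes of the optimized 295 dpi version.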

And for all this “professional” service, we even pay handsomely. Why, then, do we publish there at all? Because of the [impact factor](https://en.wikipedia.org/wiki/Impact_factor), of course. I'll write more about this much too powerful incentive in the near future.     
_________________________________

But let's come back now to the actual topic of this post, and consider the following grayscale line art that has been created with the help of [graph](http://pdes-net.org/cobra/posts/plotting-challenge.html) and inkscape:

![](../images/cairo.svg)

The original SVGZ has 21.6 kB, a PDF saved by inkscape 52 kB. Now let's see what happens if we convert this PDF into various raster image formats with a resolution of 1200 dpi.
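The commands below use a pixel width of 4000 and a density of 483.87, and it may help to see how such numbers relate to 1200 dpi. The following sketch rests on two assumptions that are not stated anywhere in this post: that the figure is meant to be printed about 8.5 cm wide, and that the PDF page is about 595.2 pt wide (roughly A4). Both values are back-calculated from the numbers in the commands; the helper names are hypothetical.

```shell
# Density (in dpi) at which a page of width_pt points has to be rendered
# to end up px pixels wide (1 inch = 72 pt).
density_for_width() {
    awk -v px="$1" -v pt="$2" 'BEGIN { printf "%.2f\n", px * 72 / pt }'
}

# Physical width in cm of px pixels printed at a given dpi.
print_width_cm() {
    awk -v px="$1" -v dpi="$2" 'BEGIN { printf "%.2f\n", px / dpi * 2.54 }'
}

density_for_width 4000 595.2   # assumed page width in pt; prints 483.87
print_width_cm 4000 1200       # prints 8.47
```

Under these assumptions, both the 4000 pixel width and the 483.87 density describe the same bitmap: 4000 pixels across, which amounts to 1200 dpi at a print width of about 8.5 cm.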

**PNG:**

The obvious choice of format is PNG. We can convert the SVGZ or the PDF in various ways. We could export a PNG directly from inkscape, of course. Alternatively, we could open the PDF with gimp and export it as PNG. Both are viable ways, but the CLI is actually more flexible and powerful. So let's open a terminal and enter

	pdftocairo -png -scale-to-x 4000 -scale-to-y -1 -gray -antialias gray valence_bands.pdf valence_bands_cairo.png

That would be my usual way. Results in a nice grayscale png with 356 kB.

Another possibility is 

	convert -verbose -density 483.87 valence_bands.pdf -depth 8 valence_bands_convert.png

In this particular case, '-colorspace gray' is equivalent to '-depth 8'. Either way, we get a file of 330 kB. Can we do better? Oh yes, by tuning the PNG compression parameters:

	convert -density 483.87 valence_bands.pdf -colorspace gray -define png:compression-filter=1 -define png:compression-level=9 -define png:compression-strategy=1 def.png

300 kB! For the parameters, see [here](http://stackoverflow.com/questions/27267073/imagemagick-lossless-max-compression-for-png).

Now, that seems to be a fairly optimized PNG, but it is still almost six times larger than its source, the PDF. Time for the PNG optimizers! Let's apply them to the smallest PNG we have obtained so far, the one with 300 kB.

[optipng](http://optipng.sourceforge.net/)

	optipng def.png -out opti.png

225 kB.

[pngquant](https://pngquant.org/) 

	pngquant def.png 
	
In contrast to the other optimizers, pngquant converts to a color palette! But with unexpected success:

220 kB.

[pngout](https://en.wikipedia.org/wiki/PNGOUT)

	pngout def.png out.png		
	
189 kB. Needs ages. But it's the tool of the [duke](https://en.wikipedia.org/wiki/Duke_Nukem_3D).

[zopflipng](https://github.com/google/zopfli)

	zopflipng def.png zopfli.png	
	
190 kB. [Google](https://en.wikipedia.org/wiki/Zopfli) vs [Ken Silverman](https://en.wikipedia.org/wiki/Ken_Silverman): 0:1! 

That's about the limit for PNG. 
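For a side-by-side view of the results, a small helper along these lines can be handy. It is only a sketch: `size_table` is a hypothetical name, it counts 1 kB as 1000 bytes (which may differ from what a file manager reports), and `def-fs8.png` is assumed to be the name pngquant gives its output when none is specified.

```shell
# Print the size of every existing file in kB (1 kB = 1000 bytes), smallest first.
size_table() {
    for f in "$@"; do
        [ -f "$f" ] || continue     # skip outputs that were never generated
        printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -n | awk '{ printf "%7.1f kB  %s\n", $1 / 1000, $2 }'
}

size_table def.png opti.png def-fs8.png out.png zopfli.png
```

File names containing spaces are not handled, which is fine for the names used here.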

Let's check other lossless formats.

**TIFF**:

	convert -verbose -density 483.87 valence_bands.pdf -depth 8 -flatten -compress lzma valence_bands.tiff
	
188 kB. Surprise, surprise: basically equal in size to the smallest PNG.

**WEBP**:

	convert def.png -define webp:lossless=true def.webp
	
159 kB! Not bad at all.

**BPG**:

	bpgenc -lossless def.png -o def.bpg

387 kB. Not a format for lossless compression.

**FLIF**:

	flif def.png def.flif
	
92 kB. Now that's a statement! 

But still way larger than the PDF. Is there perhaps a lossy algorithm capable of creating a 1200 dpi image smaller in file size than the PDF? Note that the present graphics with its hard contrasts is a worst-case scenario for JPEG and, I presume, for essentially all lossy image formats.

**JPEG (**[libjpeg-turbo](https://en.wikipedia.org/wiki/Libjpeg#libjpeg-turbo)**)**

	convert def.png -flatten -quality 1 def_default.jpeg

165 kB. Hardly smaller than the lossless variants and with the characteristic [ringing](https://en.wikipedia.org/wiki/Ringing_artifacts#JPEG) and [quilting](https://en.wikipedia.org/wiki/Compression_artifact#Block_boundary_artifacts) artefacts surrounding every edge and corner (see below).   

**JPEG (**[mozjpeg](https://en.wikipedia.org/wiki/Libjpeg#mozjpeg)**)**

	convert def.png -flatten -quality 1 def_moz.jpeg

83 kB. Better than the default above, but still larger than the PDF. The compression artefacts are different from those of the default JPEG implementation, but the image is still of terrible quality (see below). 

**WEBP**:

	convert def.png def_lossy.webp
	
203 kB. Worse than lossless (but I didn't explore the various parameters convert offers for WEBP).

**BPG**:

	convert def.png -flatten def_spec.png
	bpgenc -q 44 def_spec.png -o def_lossy.bpg
	
50 kB. I had to preprocess the image since I needed a screenshot of the final BPG for the comparison below. The result is indeed smaller than the PDF, and exhibits (compared to the JPEG) only moderate compression artefacts (see below). Very impressive. 

Here's a comparison of a section of the above graphics.

![](../images/lossies_comparison.svg)

BPG is certainly a major improvement over JPEG also for line art. However, nothing beats vector formats: the PDF is of similar size and is arbitrarily scalable. A version for an A0 poster would still be 54 kB in size, whereas a corresponding BPG of the same quality as shown above would be truly gigantic.

An ideal strategy for scientific artwork would look like this: line art, labels, and annotations as vector graphics (SVG or PDF), photography as BPG, stored together in a PDF or SVGZ container. That's imagery for the 21st century. And, in case you didn't notice, I didn't find any reason to mention WEBP or FLIF. For either of them, there's always a better alternative. If we disregard the patents. 😉
