<!-- 
.. title: Vector screenshot
.. slug: vector-screenshot
.. date: 2014-02-23 16:31:34 UTC+01:00 
.. tags: presentations, desktop, 
.. link: 
.. description: 
.. type: text 
--> 

I just had to prepare a poster based on roughly 30 publications, and for several of them I had only the manuscript as a PDF file, not the original figures. Using Okular at a magnification of 800%, I took screenshots of these figures as comparatively high-resolution bitmaps, but the price I paid was that editing the poster in LibreOffice (which I used as the lowest common denominator) became almost unbearably slow.

I couldn't silence the thought that it should be possible to take a 'vector screenshot' from a PDF file. I had the vague idea that pdftocairo could be useful here, since it can output arbitrary parts of a PDF file as PDF or SVG. And it turned out that [Peter Williams](http://newton.cx/~peter/about-me/ "Peter Williams"), a young radio astronomer from Harvard, had the same idea and came up with a [script](http://newton.cx/~peter/2012/10/extracting-pdf-figures-as-pdfs-in-linux/ "script") which does exactly what I wanted.
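
To make the idea concrete, here is a rough sketch of the kind of pdftocairo call I had in mind. It assumes a poppler build of pdftocairo with SVG support; the function name `vector_crop_cmd`, the file names and the box coordinates are placeholders made up for illustration. The function only prints the command, so nothing is run until you copy it to a prompt.

```shell
#!/bin/bash
# Sketch only: vector_crop_cmd, paper.pdf, fig.svg and the numbers below are
# hypothetical placeholders, not taken from Peter's script.
# Print the pdftocairo command that cuts a w x h box (in points) whose
# top-left corner is at (x, y) in Cairo coordinates out of a single page
# and writes it as SVG.
vector_crop_cmd() {
    local file="$1" page="$2" x="$3" y="$4" w="$5" h="$6" out="$7"
    printf '%s ' pdftocairo -svg -f "$page" -l "$page" \
        -x "$x" -y "$y" -W "$w" -H "$h" -paperw "$w" -paperh "$h" "$file"
    printf '%s\n' "$out"
}

# Show the command for a 201 x 201 pt box on page 3.
vector_crop_cmd paper.pdf 3 100 140 201 201 fig.svg
```

The `-x`, `-y`, `-W` and `-H` options select the crop box, and `-paperw` and `-paperh` shrink the output page to that box.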

I've fixed a small error (_pageh_ should also be an integer) and ensured Arch and Fedora compatibility (python2), but otherwise it's Peter's script:

	#! /bin/bash

	# original: <https://gist.github.com/pkgw/3892706>
	# see <http://newton.cx/~peter/2012/10/extracting-pdf-figures-as-pdfs-in-linux/>

	margin=1

	# XPDF gives its y coordinates in terms of the standard PDF coordinate
	# system, where (0,0) is the bottom left corner and y increases going
	# up. But pdftocairo uses Cairo coordinates, in which (0,0) is the top
	# left corner and y increases going down. We can use pdfinfo to get
	# the page size to translate between these conventions.

	file="$1"
	page="$2"
	pageh=$(pdfinfo -f $page -l $page "$file" |grep '^Page.*size' \
	    |sed -e 's/.* x //' -e 's/ pts.*$//')

	# Our variables end up in Cairo convention, so the box height is ybr -
	# ytl.

	xtl=$(python2 -c "import math; print int (math.floor ($3))")
	ytl=$(python2 -c "import math; print int ($pageh) - int (math.ceil ($4))")
	xbr=$(python2 -c "import math; print int (math.ceil ($5))")
	ybr=$(python2 -c "import math; print int ($pageh) - int (math.floor ($6))")
	w=$(python2 -c "print $xbr - $xtl")
	h=$(python2 -c "print $ybr - $ytl")

	# Lamebrained uniqifying of output filename.

	n=1

	while [ -f fig$n.pdf ] ; do
	    n=$((n + 1))
	done

	# OK to go.

	echo pdftocairo -pdf -f $page -l $page -x $xtl -y $ytl -W $w -H $h \
	  -paperw $w -paperh $h "$file" - '|' pdfcrop --margin $margin - fig$n.pdf
	exec pdftocairo -pdf -f $page -l $page -x $xtl -y $ytl -W $w -H $h \
	  -paperw $w -paperh $h "$file" - | pdfcrop --margin $margin - fig$n.pdf
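
The coordinate flip is the part of the script that is easiest to get wrong, so here is a small sketch of it as a standalone shell function. It is only an illustration and not part of Peter's script: the name `pdf_box_to_cairo` and the sample numbers are made up, and it uses awk instead of python2 for the rounding. The arguments are the page height followed by the top-left and bottom-right corners of the box in PDF coordinates, in the same order the script expects them.

```shell
#!/bin/bash
# Hypothetical helper for illustration: convert a box given in PDF
# coordinates (origin at the bottom left, y increasing upwards) into the
# x, y, width and height that pdftocairo expects (origin at the top left).
# Usage: pdf_box_to_cairo PAGE_HEIGHT X_TL Y_TL X_BR Y_BR
pdf_box_to_cairo() {
    awk -v pageh="$1" -v x1="$2" -v y1="$3" -v x2="$4" -v y2="$5" '
        function floor(v,  i) { i = int(v); return (i > v) ? i - 1 : i }
        function ceil(v,  i)  { i = int(v); return (i < v) ? i + 1 : i }
        BEGIN {
            ph  = int(pageh)        # page height truncated to an integer
            xtl = floor(x1)         # round the box outwards on every side
            ytl = ph - ceil(y1)     # flip y: top edge measured from the top
            xbr = ceil(x2)
            ybr = ph - floor(y2)    # flip y: bottom edge measured from the top
            print xtl, ytl, xbr - xtl, ybr - ytl
        }'
}

# A4 page (841.89 pt high), box from (100.2, 700.7) to (300.5, 500.1)
pdf_box_to_cairo 841.89 100.2 700.7 300.5 500.1   # prints: 100 140 201 201
```

The four numbers printed are the values the script passes to `-x`, `-y`, `-W` and `-H`.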

Unlike Peter (and thanks to piet and haui), I can show an actual vector screenshot made by this script:

<img src="../images/vectorshot.svg" width="600" alt="Vectorshot!" />

The size of this shot is 2.7 kB. A bitmap of this size showing the same section is so terribly ugly that I've decided not to present it here.

