Thu Dec 29 18:21:24 CET 2011
Visual paradigms
Fashion also rules in the digital world. What do you feel when you see the default Luna wallpaper? What do you think when you see those never-changing glassy icons Apple is using on all of their gadgets?
Personally, I need a change from time to time, and that's why I
decided to give the look of this blog an overhaul. Hope you like
it. 
Mon Dec 26 18:05:33 CET 2011
Modern plague
Since I was just talking about publications: one of the cornerstones of the modern publishing business is the peer review process. Non-scientists usually have a hard time believing what this term actually means. Believe me, I've had a hard time too.
In short, peer review means that you have to read the manuscripts of absolute strangers and provide an authoritative and tactfully written review which details the reasons for acceptance or rejection. All for the editors of the journals for which you pay to publish, and for which you pay to read, and all for free, of course. That's right, we don't get anything for that. Some of the journals send a Christmas card, but the main motivation is that the community sells it as an honor and a moral obligation.
That wouldn't be a problem if they asked you once a month, and you had nothing more important to do anyway. I'm currently getting asked twice a week, and I know that in 95% of all cases, the manuscript under consideration is from China or Korea and I will reject it anyway.
Why? Well, let me quote an insider:
'One Chinese scientist has referred to the majority of China's publications as “pollution”'
Poorly written, zero content. And if they continue to flood the journals with their trash, the peer review system is going to end.

Mon Dec 26 15:38:12 CET 2011
Act of desperation
An essential, even defining element of modern science is that it takes place in public. Scientific results must be published: this principle has been valid for centuries, but the process itself has changed drastically only over the past decades.
Back in 1958, for example, John J. Hopfield obtained his PhD in physics at Cornell University. The results of his work are summarized in one paper submitted to the Physical Review.
The manuscript itself was handwritten, as was usual in those days. It was probably John's girlfriend rather than the department's secretary who transcribed the text using a mechanical typewriter, excluding, of course, the numerous equations which John carefully inserted into the transcript. John also provided rough sketches of the figures, which were then traced by the ladies in Cornell's drafters' office and photographed for submission. The people at Cornell's postal office helped with the submission of the paper.
Upon acceptance, the manuscript was carefully proofread and converted into a typographically correct typescript by the Physical Review's assistant editor responsible for the manuscript. In particular, the handwritten equations of John's manuscript had to be transcribed correctly, and that was no small task:

Now, let's compare that to the situation encountered in 2012 by the imaginary PhD student Hans Weißwurst. Hans has been told that prospective employers will not be very impressed by a single Physical Review paper. He has heard a lot of talk about illustrious "high-impact" journals, and he has the vague hope of submitting his work to one of them.
What he doesn't know, and what nobody prepared him for: he will be author, secretary, designer, drafter, editor, first critic and proofreader in one person. Unfortunately, he does not have a clue about any of these jobs. When he's told that it's about time to write a paper, the drama unfolds.
Hans believes that the first thing he has to work on is the title, which is not altogether unreasonable if nobody tells you otherwise. He gives himself a week to find an appropriate title, but is unsatisfied even after the second week. He thus starts to write the abstract and the introduction, but finds himself brooding over the title even after six weeks instead.
After three months, his supervisor asks about the progress of the paper, and Hans hurries to finish his work. What's still missing is the presentation and discussion of his results, but that's not too difficult, since he knows them all by heart. He thus quickly describes them and finally rushes to create some rough plots of the data.
Hans is imaginary, but his manuscript is not. In the past two
years I've edited and rewritten many manuscripts by young
scientists, and several of them reflected a shocking naïvety and an
astounding ignorance. This year, the situation has worsened to such
a degree that I couldn't see any other way than to write a tutorial. At first, this tutorial was intended for
internal use only, but on second thought I believe it may also be
read profitably by a broader audience. Have fun. 
Posted by cobra | Permanent link | File under: presentations, thoughts, mathematics, latex, linux, hardware
Sun Nov 20 16:58:48 CET 2011
Growth
Every now and then I pick up Indy and step on our scale to check his weight. The results are displayed in the figure below.

The thick, light blue line is the result of a linear fit, revealing a weekly weight increase of about 140 g. A deviation from linearity is not yet apparent. That's why the second fit by a logistic function (thin dark blue line) and the resulting prediction of a final weight of about 6 kg is not very accurate at the moment.
Here's the gnuplot script generating this figure:
set terminal svg size 600,400 dynamic enhanced fname 'palatino' fsize 12 solid
set output 'indyweight.svg'
set xlabel "date"
set ylabel "weight (kg)"
set xtics nomirror
set ytics nomirror
set style line 1 ps 1.5 pt 4 lc -1
set style line 2 lw 8 lc 8
set style line 3 lw 2 lc 6
set key right bottom
set xdata time
set timefmt "%d.%m.%y"
set xrange ["11.07.11":"27.11.11"]
toffset=strptime("%d.%m.%y","18.07.11")
linear(x) = a + b*(x-toffset)
c = 5; d = 0.1; e = 3e-9
logistic(x) = c/(1 + d*exp(-e*(x-toffset)))
fit linear(x) "/home/cobra/Documents/indy" using 1:2 via a,b
fit logistic(x) "/home/cobra/Documents/indy" using 1:2 via c,d,e
gain = 1000*7*24*60*60*b
finalweight = c
# Format the fitted values into the label text with sprintf
set label sprintf("linear gain: %g g/week", gain) at graph 0.02, graph 0.95
set label sprintf("predicted final weight: %g kg", finalweight) at graph 0.02, graph 0.9
plot "/home/cobra/Documents/indy" using 1:2 notitle with points ls 1, linear(x) with line ls 2, logistic(x) with lines ls 3
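The two numbers printed in the labels follow directly from the fit parameters. Since gnuplot measures time in seconds, the slope b comes out in kg per second, and the logistic function saturates at c for late times. Here is a minimal Python sketch of that arithmetic. The helper names and the sample values are illustrative assumptions, not fit results:

```python
import math

SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def weekly_gain_g(b):
    """Convert a slope b given in kg per second to grams per week."""
    return 1000 * SECONDS_PER_WEEK * b


def logistic(t, c, d, e):
    """Logistic growth curve; t is the time in seconds since the offset date."""
    return c / (1 + d * math.exp(-e * t))


# Hypothetical slope chosen so that the gain is 140 g/week.
b = 0.140 / SECONDS_PER_WEEK
print(weekly_gain_g(b))

# For very late times the exponential vanishes and the curve approaches c,
# which is why the script reports c as the predicted final weight.
print(logistic(1e12, 6.0, 0.1, 3e-9))
```
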
Update: For some reason, WebKit-based browsers such as Chromium don't handle the particular svg above correctly, but add huge upper and lower margins. I have thus replaced it for the moment with a png.
Sun Nov 13 17:13:57 CET 2011
The encoding hell
At work, we've now discarded JabRef in favor of Mendeley for reference management. This step turned out to be a breakthrough: adding papers to Mendeley is so easy (you simply drag and drop from the browser to the Mendeley client) that people actually do it unsolicited. Our databases are thus rapidly growing.
Mendeley extracts all bibliographic information directly from the pdf and is fully unicode aware. That's nice, since all author names and special characters in the title will be displayed correctly in the Mendeley client. However, what will happen when this information is exported for further use as a LaTeX bibliography? Mendeley itself actually sports a conversion facility, but will that be sufficient?
Well, let's try and analyze the resulting bibliography with the help of the little script I've shown previously. Just as I feared, the encoding is shown to be ok (the file does not contain any non-utf8 encoded characters) but compilation fails. The reason is simple: good ol' LaTeX is ASCII only, unicode support via inputenc is of a very limited nature, and Mendeley translates only very few characters (and perversely just those where such an action would not be required).
What to do now?
The right thing to do would be to use a modern TeX system. Both XeTeX and LuaTeX fully support unicode, and so does the BibTeX successor 'biber'. The main problem with that approach is simply that the journals to which we submit will change from the original TeX to one of its modern incarnations not earlier than...say, 2017. And that's optimistic.
I hoped that biber alone could solve the problem, since it has the capability to convert from one encoding (in the input) to another one (in the output). However, it turned out that biber also knows only a very limited set of unicode characters. What's worse is that biber/biblatex is not compatible with natbib, a prerequisite for RevTeX. Of course, the fact that biber is not available in the standard repositories of the major Linux distributions will not contribute to its further dissemination.
A partial solution is the package 'mat2bib' which contains the python script 'utf8_to_latex.py'. Using a conversion map contained in 'latex.py', calling this script converts the majority of characters to LaTeX-compliant command sequences. Those it doesn't know will be converted to an expression like '\char{xxxx}', where xxxx is the decimal (html or utf-16) descriptor for the character in question.
What you will thus see whenever you attempt to convert a bibliography from Mendeley containing names such as 'Sánchez-García' to pure LaTeX are the following sequences: 769, 771, and 776. Those sequences do not correspond to actual characters, but to accents accompanying certain German and Spanish letters:
769 [Unicode Character 'COMBINING ACUTE ACCENT' (U+0301)] i.e., an acute accent as in á
771 [Unicode Character 'COMBINING TILDE' (U+0303)] i.e., a tilde as in ñ
776 [Unicode Character 'COMBINING DIAERESIS' (U+0308)] i.e., an umlaut as in ö
The character 'á' can thus be represented in two different ways using unicode...
(i) a + 'U+0301' (letter first)
(ii) 'U+00E1'
...while in LaTeX, this character is represented by
\'{a} (letter last)
The python script mentioned above lacks the ability to translate these characters. Looks like we have to do it ourselves. Stay tuned.
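One possible approach, sketched here under the assumption that only the three combining marks listed above matter: normalize the text to NFD with Python's standard unicodedata module, so that both unicode representations of an accented letter become "letter plus combining mark", and then swap the order into LaTeX's accent-first form. The function name accents_to_latex and the mapping table are illustrative and not part of mat2bib.

```python
import unicodedata

# Combining marks from the list above, mapped to LaTeX accent commands.
COMBINING_TO_LATEX = {
    "\u0301": "\\'",  # COMBINING ACUTE ACCENT (769)
    "\u0303": "\\~",  # COMBINING TILDE (771)
    "\u0308": '\\"',  # COMBINING DIAERESIS (776)
}


def accents_to_latex(text):
    """Rewrite accented letters as LaTeX accent commands, e.g. \\'{a}."""
    # NFD splits precomposed characters into base letter + combining mark,
    # so both unicode representations are handled by the same code path.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        accent = COMBINING_TO_LATEX.get(ch)
        if accent is not None and out:
            # The mark follows its letter in unicode; LaTeX wants it first.
            base = out.pop()
            out.append(accent + "{" + base + "}")
        else:
            out.append(ch)
    return "".join(out)


print(accents_to_latex("Sánchez-García"))
```

Characters carrying other combining marks pass through in decomposed form, so a real converter would need a larger table or a fallback.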
Sun Oct 30 16:34:00 CET 2011
Cloudy
Geek [giːk]: A person discovering the cloud the day the iCloud is announced.
Nerd [nɜːd]: A person bored to death by geeks chattering about the cloud.
Freak [fɹi:k]: A person believing that clouds are in the sky.
Synchronization of my data has been an issue for me long before Dropbox materialized in 2008. I used a crude but simple solution based on rsync scripts started manually or via the crontab. Something more elegant and efficient would have been possible with inotify as described here. Lsyncd is another option aiming at the same purpose. However, building an automatic two-way sync service based on these tools comparable to Dropbox or Ubuntu One is far from trivial. Since the amount of data I have to sync is steadily increasing, I am starting to feel a little frustrated with this situation.
As much as I disapprove of the recent hype of cloud services, I cannot deny that Dropbox & Co. are far more complete synchronization services than the primitive and rudimentary solutions I've been using. Particularly, the real-time synchronization offered by these services results in a data integrity unattainable by conventional sync or even backup schemes. For example, while I'm typing this very blog entry, any one of these services would ensure that not a word would be lost even if my cat suddenly hits the power button, because everything I type would be synced in real time to the cloud.
Well, then, why don't I use these services if they are so great? For reasons of control, security, and privacy. As a general rule, I prefer to have control over my data rather than turning them over to an organization which I do not trust by default (and why should I?). This attitude is corroborated by experience, and indeed, there was never a better example than Dropbox. How can we be expected to trust the system if it's proven to be broken by design after one critical glance (see also here and here)? These security concerns compromise the usability of Dropbox: I really don't want to have to think about which data would better be contained in an encfs encrypted folder before putting them on the cloud.
Wouldn't it be great if there were a free and trustworthy service capable of the same effortless, instantaneous synchronization of data as offered by Dropbox & Co? Ideally, this service could be installed on our own servers, so that there'd be no need to register or pay, no size limit, and no one to trust except ourselves. Meet sparkleshare.

Server
Any machine running openssh-server and git-core will do. On
pdes-net.org, piet took care of the latter dependency some days
ago—thx, piet! 
After installation, issuing
cd
git init --bare sync.git
will initialize a git repository in the directory /home/user/sync.git.
For older versions of git, do the following:
mkdir sync.git
cd sync.git
git --bare init-db
Client
I assume that you already have a public-key ssh connection to the server of your choice. If you connect to server.org via a non-standard port, for example 1234, define it in ~/.ssh/config:
Host server.org
    Port 1234
Now, install Sparkleshare and its dependencies. Both Archlinux and Ubuntu had the latest version in their repositories, but YMMV.
If you did not use git before, introduce yourself:
git config --global user.name "Firstname Lastname"
git config --global user.email "first.last@email.com"
Now, start Sparkleshare from the menu or via the commandline by issuing 'sparkleshare start'. Answer the questions. Note that the server address should be in the form "user@server" and the subsequent path should be absolute.
That's it. You have established your own personal cloud.

Sun Oct 9 12:48:57 CEST 2011
Bundestrojaner
Almost 5 years ago, I speculated that a state trojan launched by the German government and the BKA would soon be detected by common anti-virus scanners ("Assuming a wide distribution, all malware scanners will presumably also be able to detect the Bundestrojaner after a certain lead time.")
The following screenshot illustrates the situation just one day after the CCC disassembled code they believe to represent such a state trojan.

Sun Oct 2 17:46:39 CEST 2011
Groundhog Day
Every year at this time we are preparing the annual report, and every year I'm shocked by the horrid look of many of the submitted figures. This year I decided to try to understand what people do and why (C: Cobra, S: Student).
C: Your figures are not suitable for the annual report. They
look...eh...horrible.
S: Why?
C: Well, don't you see all the compression artifacts here *point*
and there *point* and all these pixels all over the place?
S: Now that you say that...but what can I do? *shrug*
C: What about telling me what you did?
S: Nothing special, the standard way.
C: The standard way?
S: Sure. I create the figure with Powerpoint, copy and paste it
into this Gimp thing and then save it as eps.
C: Hm...do you actually know the difference between vector and
pixel graphics?
S: Of course! Most certainly!
C: Tell me.
S: In pixel graphics, the information, I mean the color and so on,
is encoded per pixel. In vector graphics, each pixel is represented
by three vectors, one for each color. That's why vector graphics is
so much bigger. But the advantage is that you can distort the image
as you like, while pixel graphics is fixed because the pixel is
always square shaped.
C: ...
Let's examine these statements with the help of an example.
Here's the original Archlinux Logo as vector art, scaled to the column width of 600 px. Regardless of scaling, its size is 4 kB when saved as svgz. Saving it as pdf reduces its size further to 2.9 kB.
And here's the same image when saved in the format of the 21st century, scaled to the column width of 600 px, and reduced in quality to finally yield a size of 4.8 kB (and thus similar to the vector art above).

For educational reasons, I invite you to press the '+' key on
your keyboard five, no, ten times. I'm sure you'll see the
difference and understand it, at least from a practical point of
view. 
When we finally save the pathetic remains of the logo as a vector graphic, the resulting size is indeed on the order of a hundred kB. The reason is obvious.
If not, magnify the images again and have a closer look.
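For those who prefer numbers to magnifying glasses, here is a minimal Python sketch of the scaling argument. It assumes a single filled circle as a stand-in for the logo and uncompressed 24-bit RGB as the pixel format. The function names and the color value are illustrative.

```python
def svg_circle(size_px):
    """Vector description of a filled circle, rendered at size_px x size_px."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'width="{0}" height="{0}" viewBox="0 0 100 100">'
        '<circle cx="50" cy="50" r="40" fill="#1793d1"/></svg>'
    ).format(size_px)


def raw_rgb_bytes(size_px):
    """Bytes needed for the same square image as uncompressed 24-bit RGB."""
    return size_px * size_px * 3


for size in (600, 6000):
    # The vector file only stores the geometry, so its length barely changes;
    # the pixel data grows with the square of the edge length.
    print(size, len(svg_circle(size)), raw_rgb_bytes(size))
```

Wrapping the pixel data in a vector container such as eps does not change this: the file then simply carries the bitmap, pixel by pixel.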
PS: I'm not alone:

Sun Sep 11 14:26:40 CEST 2011
The last day of summer
Or: How to lose loyal users: a beginner's guide for soon-to-be extinct Linux distributions
1. Promise that the badly needed upgrade will be recognized by
the update manager.
2. Just in case it is not, put a description on the Wiki which can't
possibly work. Let the user find out why.
3. After the user found out and forced the upgrade, arrange
numerous conflicts which increase the user's problem-solving
ability.
4. When the user has sorted out all the challenges, present a
kernel panic upon reboot. Give him the real deal!
Farewell, Mandriva! You were my trusted companion for a decade, and I'm sure to miss many of your amenities. But I can't use a system which offers a TeX distribution from 2007, and which breaks upon an online upgrade.
There was no question what I'd install instead: that had been clear since my discovery of Arch Linux more than two years ago. I use Debian Testing on all workstations, but it's not quite up-to-date enough for a desktop if you ask me (I was an avid user of Mandriva Cooker until I decided that this platform, while offering comparatively current packages, is simply too unstable to be of use). In contrast, I have not seen Arch break in the two years I've been following its progress in two virtual machines. I also became moderately familiar with Arch Linux itself, which I still believe to be the most transparent and, in a sense, simplest distribution I've ever tested and used.
The installation and configuration was, as usual, straightforward, but two issues remain. First, 'keychain' works, but neither 'openssh-askpass' nor 'ksshaskpass' do. I thus have to manually call 'ssh-add' upon each reboot. Admittedly not a big thing. The second issue is more disturbing: while both 'privoxy' and 'pdnsd' work perfectly separately, they don't work together. I just get 404s when trying, and I have no clue as to the reason.
Everything else, however, functions perfectly. There are, of
course, many small things to be taken care of when changing from a
very old to a very new distribution (just think about python 2.x
and 3.x), but most of this tinkering is over and done. I can lean
back and enjoy. 
The little pacman you see in the tray, by the way, is the icon
of yapan, a cute little update manager which keeps the
system up-to-date in its own cute little way. 
