.. title: The fastest search, or: find your password
.. slug: the-fastest-search-or-find-your-password
.. date: 2019-02-21 19:03:37 UTC
.. tags: encryption, linux
.. category: 
.. link: 
.. description: 
.. type: text

A couple of weeks ago, `a monstrous set of login data was passed around in the dark web <https://www.wired.com/story/collection-leak-usernames-passwords-billions/>`_. Now known as collections #1 – #5, the files contain an estimated 2.2 billion of unique email address/password combinations. And that means cracked passwords, mind you, not only hashes.

Some of my account data have been leaked ages ago by Adobe and Dropbox (at that time, I believed in simple – `8-digit <https://cobra.pdes-net.org/posts/passwords.html>`_ – passwords for test accounts). Incidentally, I had already deleted these accounts when the breach became public. 

Currently, all of my accounts are secured by an at least 25-digit random password with an actual `entropy <https://cobra.pdes-net.org/posts/password-hysteria.html>`_ not lower than 100 bits, which is essentially impossible to brute force even from unsalted SHA1 hashes. Hence, none of my accounts should show up in the collections mentioned above. If one did, it would show that the password had been saved in plain text, and I would react correspondingly by deleting this account.

The most comprehensive list of password hashes (including collection #1) has been assembled by `Troy Hunt <https://haveibeenpwned.com/>`_. We can download and search this list very easily as shown in the following.   
	
::

	$ cd /hdd/Downloads/pwned
	$ wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-sha1-ordered-by-hash-v4.7z
	$ cp pwned-passwords-sha1-ordered-by-hash-v4.7z /ssd/temp/pwned
	$ cd /ssd/temp/pwned/
	$ unp ppwned-passwords-sha1-ordered-by-hash-v4.7z
	$ wc -l <pwned-passwords-sha1-ordered-by-hash-v4.txt 
	551509767
	$ awk -F: '{ SUM += $2 } END { print SUM }' pwned-passwords-sha1-ordered-by-hash-v4.txt 
	3344070078

551 million passwords, and 3.344 billion accounts: holy cow. What's the fastest way to search for a single string in such a huge file? C't 5/2019 has actually `an entire article on this issue <https://www.heise.de/select/ct/2019/5/1551437903574108>`_, complete with a `github repository <https://github.com/pinae/HaveIBeenPwnedOffline>`_. 

Pina Merkert points out that grep, the standard tool for searching text files under Linux, does not exploit the fact that Troy Hunt offers the download also in a version where the passwords are ordered by hash (that's the version I've downloaded above). The `linear search <https://en.wikipedia.org/wiki/Linear_search>`_ performed by grep is indeed rather slow:

::

	$ time echo -n "111111" | sha1sum | awk '{print toupper($1)}' | grep -f - pwned-passwords-sha1-ordered-by-hash-v4.txt 
	3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220
	
	real	0m48.227s
	user	0m30.939s
	sys	0m5.752s

The possibility to order the data allows a `binary search <https://en.wikipedia.org/wiki/Binary_search_algorithm>`_ to be performed, which is potentially orders of magnitude faster – O(log[n]) vs. O(n) – than a linear search. Pina's python script is indeed dramatically faster:

::

	$ time ./binary_search.py '111111'
	Searching for hash 3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D of password "111111".
	Password found at byte  5824044773: "3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220"
	Your password "111111" was in 3093220 leaks or hacked databases! Please change it immediately.
	
	real	0m0.042s
	user	0m0.035s
	sys	0m0.007s

But my one-liner is four times faster than her 57-line script: 😉

::

	$ time look $(echo -n "111111" | sha1sum | awk '{print toupper($1)}') pwned-passwords-sha1-ordered-by-hash-v4.txt
	3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220

	real	0m0.011s
	user	0m0.009s
	sys	0m0.005s

'Look', by the way, is a binary search tool, part of the util-linux package and thus present on any Linux installation (and also on the Windows subsystem for Linux, I guess). Unfortunately, several distributions (such as Debian and Ubuntu) manage to ship a `broken version <https://bugs.launchpad.net/ubuntu/+source/bsdmainutils/+bug/510613>`_ of look since about 10 years. There's an `unofficial patched version <https://github.com/stuartraetaylor/bsdmainutils-look>`_ available on GitHub if you'd like to try.

With the help of 'look', one can also very easily go through lists of passwords – let's take these completely arbitrary examples:

::

	less topten.dat
	
	123456
	123456789
	qwerty
	password
	111111
	12345678
	abc123
	1234567
	password1
	12345

	$ time while read -r password; do look $(echo -n "$password" | sha1sum | awk '{print toupper($1)}') pwned-passwords-sha1-ordered-by-hash-v4.txt; done <topten.dat 
	
	7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662
	F7C3BC1D808E04732ADF679965CCC34CA7AE3441:7671364
	B1B3773A05C0ED0176787A4F1574FF0075F7521E:3810555
	5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804
	3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220
	7C222FB2927D828AF22F592134E8932480637C0D:2889079
	6367C48DD193D56EA7B0BAAD25B19455E529F5EE:2834058
	20EABE5D64B0E216796E834F52D61FD0B70332FC:2484157
	E38AD214943DAAD1D64C102FAEC29DE4AFE9DA3D:2401761
	8CB2237D0679CA88DB6464EAC60DA96345513964:2333232
	
	real	0m0.053s
	user	0m0.048s
	sys	0m0.023s
	
If you ever had a shred of doubt that humanity has a glorious future, look at these 10 ingenious passwords, which secure the 54 million accounts of our most brilliant minds!
