21 Jan 2022

Git Steganography

Here is a repository on GitHub. Look at the commit hashes: they count!

This was fairly easy to accomplish. When constructing the commit hash, git includes the author and committer timestamps (alas, with one-second precision only) as well as the commit message. Meanwhile, the default abbreviated hash is seven hexadecimal digits long, and there are \(16^7 \approx 2.7 \times 10^8\) such prefixes. So if we just try \(\sim 3 \times 10^8\) different times and commit messages (while holding the actual content of the commit fixed), we’ll probably find one whose hash begins with the desired seven digits.
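A minimal Python sketch of the search, assuming an unsigned commit whose author and committer lines carry the same identity and timestamp. The identity is hypothetical, and git's empty-tree id stands in for a real tree (which `git write-tree` would give you). The sketch rebuilds the raw commit object the way git serializes it (`commit <size>\0` followed by the tree, parent, author, committer and message lines) and hashes it with SHA-1, stepping the timestamp forward one second at a time.

```python
import hashlib
from typing import Optional, Sequence

# Hypothetical values for illustration: a real run would use the repository's
# own tree id, parent ids and identity.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree
AUTHOR = "Alice Example <alice@example.com>"


def commit_hash(tree: str, parents: Sequence[str], author: str,
                timestamp: int, tz: str, message: str) -> str:
    """SHA-1 id of an unsigned commit whose author and committer coincide."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {timestamp} {tz}")
    lines.append(f"committer {author} {timestamp} {tz}")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    # Git hashes "<type> <size>\0" followed by the raw object contents.
    header = f"commit {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def find_timestamp(prefix: str, tree: str, parents: Sequence[str], author: str,
                   start: int, tz: str, message: str,
                   max_tries: int) -> Optional[int]:
    """Try start, start+1, ... and return the first timestamp whose hash
    begins with `prefix`, or None if no candidate in range works."""
    for ts in range(start, start + max_tries):
        if commit_hash(tree, parents, author, ts, tz, message).startswith(prefix):
            return ts
    return None


if __name__ == "__main__":
    # A 2-digit prefix needs ~16**2 = 256 tries on average; 7 digits need ~16**7.
    start = 1642723200  # 2022-01-21 00:00:00 UTC
    ts = find_timestamp("00", EMPTY_TREE, [], AUTHOR, start, "+0000",
                        "Initial commit\n", max_tries=10_000)
    print(ts, commit_hash(EMPTY_TREE, [], AUTHOR, ts, "+0000", "Initial commit\n"))
```

To use a found timestamp, one could export it as `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` in git's internal date format (e.g. `1642723200 +0000`) before running `git commit`; comparing `git cat-file commit HEAD` with the string the sketch builds is a useful sanity check. A full seven-digit search in pure Python means hundreds of millions of SHA-1 calls, which is why the example only asks for two digits.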

Some care is required. If we demand that all seven digits take the desired value and don’t change the commit message, then, with only one candidate timestamp per second, we may need to shift the commit date by several years (\(16^7\) seconds is about 8.5 years). So either we control only the first few characters of the hash, or we must accept changing something else about the commit (probably the message).
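The arithmetic behind “several years”, as a quick Python check: with one candidate per second, matching n hex digits takes about \(16^n\) seconds’ worth of timestamps on average, and \(16^7\) seconds is roughly 8.5 years. The second helper shows the trade-off: if the date may only move within a (hypothetical) one-hour window, about 74,566 distinct messages would be needed to cover the same \(16^7\) candidates.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average (Julian) year


def expected_tries(n_hex_digits: int) -> int:
    """Each hex digit matches with probability 1/16, so a given n-digit
    prefix needs about 16**n independent tries on average."""
    return 16 ** n_hex_digits


def message_variants_needed(n_hex_digits: int, window_seconds: int) -> int:
    """If the timestamp may only move within `window_seconds` (one candidate
    per second), how many distinct messages must be tried as well?"""
    # Integer ceiling division, avoiding floating point.
    return -(-expected_tries(n_hex_digits) // window_seconds)


if __name__ == "__main__":
    for n in range(1, 8):
        s = expected_tries(n)
        print(f"{n} digit(s): {s:>11,} s = {s / 3600:>10.2f} h = "
              f"{s / SECONDS_PER_YEAR:5.2f} yr")
    print(message_variants_needed(7, 3600))
```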

Variations on this trick are possible with a bit more effort. For instance, one can take an existing repository and shift commits forward or backward by a few seconds (a few minutes, for two characters) so that the first character or two of each hash take desired values. Since every commit records its parent’s hash, the commits have to be rewritten in order, oldest first. The repository would look perfectly normal in all other respects, but would encode a secret message (hence “git steganography”).
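One way this could look, as a Python sketch rather than a tool for real repositories: it assumes a linear history of unsigned commits modeled by a small hypothetical `Commit` record, a hypothetical identity, and a stand-in tree id. Each commit’s timestamp is nudged by the smallest shift (0, +1, −1, +2, …) that makes the first byte of its hash equal the next byte of the secret, and the new hash becomes the next commit’s parent.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

AUTHOR = "Alice Example <alice@example.com>"  # hypothetical identity
TZ = "+0000"
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # stand-in tree id


@dataclass
class Commit:  # hypothetical minimal model of a commit
    tree: str
    timestamp: int
    message: str


def commit_hash(tree: str, parent: Optional[str], timestamp: int, message: str) -> str:
    """SHA-1 id of an unsigned root or single-parent commit."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {AUTHOR} {timestamp} {TZ}")
    lines.append(f"committer {AUTHOR} {timestamp} {TZ}")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    return hashlib.sha1(f"commit {len(body)}\0".encode("ascii") + body).hexdigest()


def offsets(max_shift: int) -> Iterator[int]:
    """Yield 0, +1, -1, +2, -2, ... so the smallest shift is preferred."""
    yield 0
    for d in range(1, max_shift + 1):
        yield d
        yield -d


def embed(history: List[Commit], secret: bytes,
          max_shift: int = 3600) -> List[Tuple[int, str]]:
    """Rewrite the history oldest first so the first byte of commit i's hash
    equals secret[i]. Returns (new_timestamp, new_hash) pairs."""
    if len(secret) > len(history):
        raise ValueError("need at least one commit per byte of the secret")
    parent: Optional[str] = None
    rewritten = []
    for i, c in enumerate(history):
        # Commits past the end of the secret keep their timestamp
        # (startswith("") is always true), but their hash still depends
        # on the possibly rewritten parent.
        want = f"{secret[i]:02x}" if i < len(secret) else ""
        for d in offsets(max_shift):
            h = commit_hash(c.tree, parent, c.timestamp + d, c.message)
            if h.startswith(want):
                break
        else:
            raise RuntimeError(f"no timestamp within ±{max_shift}s for commit {i}")
        rewritten.append((c.timestamp + d, h))
        parent = h
    return rewritten


def extract(hashes: List[str], length: int) -> bytes:
    """Read the hidden bytes back from the first two hex digits of each hash."""
    return bytes(int(h[:2], 16) for h in hashes[:length])


if __name__ == "__main__":
    history = [Commit(EMPTY_TREE, 1642723200 + 3600 * i, f"Commit {i}\n")
               for i in range(4)]
    new = embed(history, b"hi")
    print(extract([h for _, h in new], 2))  # b'hi'
```

Hiding one byte per commit matches the “first couple characters” case; hiding a single hex digit instead would need shifts of only several seconds on average but carries half as much per commit. The sketch does not force the shifted commits to stay in chronological order, which a more careful version might want to enforce.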

More interestingly, this sort of trick can be used maliciously! Suppose you’re preparing a large changeset for a popular repository that merges rather than rebases. The commits you prepare will then be in the repository for all time. It’s computationally cheap (about 256 tries per commit) to tweak the first byte of every hash so that the commits spell out a desired message, one character each (“Henry’s marriage to Anne Boleyn is invalid”). Removing that message, however, is expensive in a different sense: it means rewriting the offending commits, which changes their hashes and the hash of every commit built on top of them, so everyone tracking the repository will have to force-pull. This makes people suspicious, and so the circumstances must be explained. The censorship cannot be accomplished surreptitiously.
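Why the removal cannot stay quiet, as a small Python sketch under simplifying assumptions (a linear history, a hypothetical identity, a stand-in tree id): because every commit records its parent’s hash, re-rolling one commit’s timestamp to scrub its hidden byte changes that commit’s hash and the hash of every commit after it.

```python
import hashlib
from typing import List, Optional

AUTHOR = "Alice Example <alice@example.com>"  # hypothetical identity
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # stand-in tree id


def commit_hash(parent: Optional[str], timestamp: int, message: str) -> str:
    """SHA-1 id of an unsigned root or single-parent commit."""
    lines = [f"tree {EMPTY_TREE}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {AUTHOR} {timestamp} +0000")
    lines.append(f"committer {AUTHOR} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    return hashlib.sha1(f"commit {len(body)}\0".encode("ascii") + body).hexdigest()


def chain_hashes(timestamps: List[int], messages: List[str]) -> List[str]:
    """Hash a linear history oldest first; each commit embeds its parent's id."""
    hashes: List[str] = []
    parent: Optional[str] = None
    for ts, msg in zip(timestamps, messages):
        parent = commit_hash(parent, ts, msg)
        hashes.append(parent)
    return hashes


def changed_after_edit(n: int, edited_index: int) -> List[int]:
    """Nudge one commit's timestamp by a second; report which hashes change."""
    timestamps = [1642723200 + 3600 * i for i in range(n)]
    messages = [f"Commit {i}\n" for i in range(n)]
    before = chain_hashes(timestamps, messages)
    edited = list(timestamps)
    edited[edited_index] += 1  # e.g. re-rolling a hash to erase a hidden byte
    after = chain_hashes(edited, messages)
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


if __name__ == "__main__":
    print(changed_after_edit(10, 3))  # [3, 4, 5, 6, 7, 8, 9]
```

Every clone that already holds the old commits will see its history diverge from the rewritten one at the edited commit, which is exactly what forces the visible force-pull.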