Bury it under the noise floor (Steganography)
2013-03-22 01:59 by Ian
This post discusses a PHP program that hides encrypted messages and files inside images without significantly changing how the image looks. It is meant as an instructional write-up that touches on some common ideas and mechanisms in cryptography.
A test fixture for the program is available at this location.
Features:
- AES-256 encryption with optional BZ2 compression.
- Multi-round password hashing to derive key material from passwords.
- No predictable patterns to search for. The starting offset is derived from the password, and the stride between pixels is arrhythmic, seeded with a value derived from the password.
- Each color is treated as a discrete channel, with messages able to span channels. This allows up to three independent messages to be overlaid on the same carrier, or one message can occupy multiple channels.
- Option for automatic re-scaling of the carrier image to minimize waste.
- All relevant parameters are derived from the key, or stored as discrete bits, allowing for simple decoding. Just supply a carrier image and a correct password.
- MD5 integrity verification.
- File storage and retrieval.
- Built-in trivial logging facility.
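Several of these features depend on key derivation: one password yields the AES key, the starting offset and the stride seed. Below is a minimal Python sketch of that idea. It uses PBKDF2-HMAC-SHA256 from Python's hashlib as a stand-in for the program's own multi-round hashing. The fixed salt, the round count, the name `derive_parameters`, and the way the 48 derived bytes are split are all illustrative assumptions, not the PHP code's actual layout.

```python
import hashlib

def derive_parameters(password, pixel_count,
                      salt=b"hypothetical-fixed-salt", rounds=100_000):
    """Derive (aes_key, offset, stride_seed) from a password.

    PBKDF2-HMAC-SHA256 stands in for the original multi-round hashing; the
    salt, round count and byte split are illustrative assumptions.
    """
    material = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                   salt, rounds, dklen=48)
    aes_key = material[:32]                                         # 256-bit AES key
    offset = int.from_bytes(material[32:40], "big") % pixel_count   # first pixel used
    stride_seed = int.from_bytes(material[40:48], "big")            # seeds the pixel walk
    return aes_key, offset, stride_seed

key, offset, seed = derive_parameters("test keys", pixel_count=640 * 480)
```

Everything here is derived deterministically, so a decoder holding only the password and the carrier's dimensions can reproduce the same key, offset and seed. A fixed salt preserves that property. A random per-image salt would instead have to be stored in the carrier as discrete bits.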
Possible uses:
- Hiding data and messages (bitcoins, password lists, etc.).
- Watermarking intellectual property.
Limitations:
- The modulated carrier must remain exactly as it was output. If it is scaled, resampled, or lossily compressed, the data it contains will be lost forever.
- The output format must be lossless (PNG, BMP, etc.). Compression may be used, but only if it is lossless, and converting between lossless formats is likewise safe.
- Animated GIF doesn't work.
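To see why the carrier must survive bit-for-bit, here is a hedged Python sketch. It frames a payload with an MD5 digest, echoing the integrity feature above. It then hides the framed payload in the least-significant bits of one color channel and applies a crude stand-in for lossy compression. For simplicity it writes to consecutive values rather than following a password-derived walk. The frame layout (digest followed by payload) and all function names are assumptions.

```python
import hashlib
import random

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its MD5 digest so corruption can be detected."""
    return hashlib.md5(payload).digest() + payload

def unframe(blob: bytes) -> bytes:
    """Check the MD5 prefix and return the payload, or raise if it does not match."""
    digest, payload = blob[:16], blob[16:]
    if hashlib.md5(payload).digest() != digest:
        raise ValueError("integrity check failed: carrier was altered")
    return payload

def to_bits(data: bytes) -> list:
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]

def from_bits(bits: list) -> bytes:
    return bytes(sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
                 for i in range(0, len(bits), 8))

def embed(values: list, bits: list) -> list:
    """Overwrite the least-significant bit of the first len(bits) channel values."""
    out = list(values)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out

def extract(values: list, count: int) -> list:
    return [v & 1 for v in values[:count]]

def lossy(values: list, step: int = 4) -> list:
    """Crude stand-in for a lossy re-encode: snap each value down to a multiple of step."""
    return [(v // step) * step for v in values]

rng = random.Random(7)
channel = [rng.randrange(256) for _ in range(400)]   # one color channel, flattened
bits = to_bits(frame(b"hello, carrier"))

stego = embed(channel, bits)
recovered = unframe(from_bits(extract(stego, len(bits))))

try:
    unframe(from_bits(extract(lossy(stego), len(bits))))
    damaged_ok = True
except ValueError:
    damaged_ok = False
```

`recovered` comes back intact. The "lossy" copy fails the integrity check because every least-significant bit has been overwritten, even though no value moved by more than 3.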
Here are some examples of images that are well-suited for hiding data in this manner...
This image is good because it was taken with a crappy cellphone camera, and has lots of noise.
This image is good because it was converted from a lossy format and suffers from some bad compression artifacts. Note, however, that this particular kind of noise has a distinct character that can be told apart from the flat, even noise the program introduces, though it would take very close scrutiny to notice.
These images are ill-suited for hiding data...
This image is bad because it was made from scratch in an image editor and has no noise and few colors.
This image is bad because it has large fields of pure colors.
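The difference between the good and bad examples can be roughly quantified. The sketch below shows a hypothetical heuristic, not taken from the original code. It counts how often horizontally adjacent pixels are exactly equal, and how many distinct colors appear. It assumes pixels are given as a flat, row-major list of (r, g, b) tuples.

```python
import random

def flatness(pixels, width):
    """Fraction of horizontally adjacent pixel pairs that are exactly equal."""
    same = total = 0
    for start in range(0, len(pixels), width):
        row = pixels[start:start + width]
        for left, right in zip(row, row[1:]):
            total += 1
            same += left == right
    return same / total if total else 0.0

def distinct_colors(pixels):
    return len(set(pixels))

flat = [(200, 200, 200)] * 64                    # an 8x8 field of one pure color
rng = random.Random(0)
noisy = [(rng.randrange(256), rng.randrange(256), rng.randrange(256))
         for _ in range(64)]                     # an 8x8 patch of pure noise
```

A score near 1.0 with only a handful of distinct colors describes the "bad" images above. In such large pure-color fields, flipped low bits form a pattern that simple analysis can pick out. Real photographs fall somewhere between these two extremes.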
To demonstrate just how little the image quality is impacted, I will deliberately use an ill-suited image. The image below has the key "test keys". It was originally a GIF with a 256-color palette, and the program automatically up-sampled it to true color before modulating it. There is a JPG image embedded in this one:
Because the three channels are treated independently, I can distribute both of the source files required to implement the program within a single image. I will also take this chance to show visually what goes where. But first, the source code:
I'm not jerking your chain. That image contains about 52KB of source code encrypted in parallel with two separate keys. Those keys are:
key_for_steg-img.php and key_for_form.php
That is, the image contains all the code required to decode itself. If you take it over to the tool I linked to earlier, you can dig it out. Some character sequences in the source code may bork the browser, so you might have to use "View Source" to see everything.
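Channel independence is what makes this layering possible. In the sketch below, two messages are written into the red and green least-significant bits of the same carrier. Each message is placed at pixel positions chosen by its own seed. The positions overlap heavily, yet neither message disturbs the other because they never share a channel. Choosing positions with `random.Random(seed).sample` is an assumption made for brevity, not the program's stride scheme. The function names are hypothetical.

```python
import random

RED, GREEN, BLUE = 0, 1, 2

def positions_for(seed, count, n_pixels):
    """Pseudo-random, non-repeating pixel indices for one message (assumed scheme)."""
    return random.Random(seed).sample(range(n_pixels), count)

def embed_channel(pixels, channel, bits, seed):
    """Write bits into the least-significant bit of a single color channel."""
    out = [list(p) for p in pixels]
    for pos, bit in zip(positions_for(seed, len(bits), len(pixels)), bits):
        out[pos][channel] = (out[pos][channel] & ~1) | bit
    return [tuple(p) for p in out]

def extract_channel(pixels, channel, count, seed):
    """Revisit the same positions and collect that channel's low bit."""
    return [pixels[pos][channel] & 1 for pos in positions_for(seed, count, len(pixels))]

rng = random.Random(42)
carrier = [tuple(rng.randrange(256) for _ in range(3)) for _ in range(500)]
bits_a = [rng.getrandbits(1) for _ in range(300)]   # first message
bits_b = [rng.getrandbits(1) for _ in range(300)]   # second message, different key

stego = embed_channel(carrier, RED, bits_a, seed=1)
stego = embed_channel(stego, GREEN, bits_b, seed=2)
got_a = extract_channel(stego, RED, len(bits_a), seed=1)
got_b = extract_channel(stego, GREEN, len(bits_b), seed=2)
```

Both messages come back unchanged, and the blue channel is left untouched for a possible third message.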
As many as three messages may be layered into one file, on the following conditions...
- StegImage::testPasswordCompatibility() must return true for the given passwords, or else one or more messages will be lost.
- Rescaling must be disabled after the first message, or all previously-encoded messages will be lost.
- A channel used for one message cannot be used for any others. This means a maximum of 3 messages per carrier.
The class includes example usage of the static function, which will help you test passwords.
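The rescaling and channel rules above are mechanical enough to check before encoding anything. The hypothetical planner below flags a plan that rescales after the first message or reuses a channel. `Layer` and `validate_layers` are illustrative names, not part of StegImage. The planner does not model StegImage::testPasswordCompatibility(), whose internals are not described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    """One message destined for a shared carrier (hypothetical structure)."""
    name: str
    channels: frozenset      # subset of {"r", "g", "b"}
    rescale: bool = False

def validate_layers(layers):
    """Return rule violations for an ordered list of layers; [] means usable."""
    problems = []
    used = set()
    for index, layer in enumerate(layers):
        if index > 0 and layer.rescale:
            problems.append(f"{layer.name}: rescaling would destroy earlier messages")
        overlap = used & layer.channels
        if overlap:
            problems.append(f"{layer.name}: channel(s) {sorted(overlap)} already taken")
        used |= layer.channels
    return problems

good_plan = [
    Layer("steg-img.php", frozenset({"r", "g"}), rescale=True),
    Layer("form.php", frozenset({"b"})),        # blue assumed for illustration
]
bad_plan = [
    Layer("steg-img.php", frozenset({"r", "g"}), rescale=True),
    Layer("form.php", frozenset({"g", "b"}), rescale=True),
]
```

`good_plan` follows the layout used in this post, with the form assumed to sit in the remaining blue channel, and yields no problems. `bad_plan` yields two: the rescale and the shared green channel.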
When encrypting, if you use the option to expose affected pixels, you can generate maps like those below. These maps are useless for data storage, because in this mode the modulation function simply drives each affected pixel to full red instead of encoding any data. This is the map for steg-img.php (41790 bytes), which was modulated into the red and green channels. It was the first file to go into the image (because it was the biggest), so it set the size of the rescaled carrier. Essentially every red pixel marks a pixel that carries data.
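The exposing mode can be pictured as a switch inside the modulation step. Instead of replacing a channel's low bit, the step paints the pixel pure red and discards the bit. This Python sketch assumes that structure. `modulate` and `render_map` are hypothetical names.

```python
RED_PIXEL = (255, 0, 0)

def modulate(pixel, channel, bit, expose=False):
    """Store one bit in a channel's LSB, or, in expose mode, paint the pixel red."""
    if expose:
        return RED_PIXEL              # diagnostic only: the bit is discarded
    values = list(pixel)
    values[channel] = (values[channel] & ~1) | bit
    return tuple(values)

def render_map(pixels, positions):
    """Apply the exposing modulation at every position a message would touch."""
    out = list(pixels)
    for pos in positions:
        out[pos] = modulate(out[pos], channel=0, bit=0, expose=True)
    return out

carrier = [(10, 20, 30)] * 6
normal = modulate(carrier[0], channel=0, bit=1)   # (11, 20, 30)
exposed = render_map(carrier, [0, 2, 3])
```

Merging two maps, as in the final image of this post, amounts to calling `render_map` with the union of both messages' position sets.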
Below is the map for the form (11120 bytes). As you can see, it is much denser despite the smaller file size. This is because the maximum arrhythmic stride was 3, so there are never more than two unaffected pixels in any given stretch.
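The density follows directly from the stride bound. Below is a sketch of an arrhythmic walk. It assumes strides are drawn uniformly from 1 to `max_stride` using Python's seeded `random.Random`, a Mersenne Twister chosen here for reproducibility, not security. The seed would come from the password, as described in the feature list. Consecutive hits are at most `max_stride` apart, so at most `max_stride - 1` untouched pixels sit between them. `stride_positions` is a hypothetical name.

```python
import random

def stride_positions(seed, start, count, n_pixels, max_stride=3):
    """Pixel indices visited by an arrhythmic walk with strides in [1, max_stride]."""
    rng = random.Random(seed)
    positions, pos = [], start
    for _ in range(count):
        if pos >= n_pixels:
            raise ValueError("carrier too small for this message")
        positions.append(pos)
        pos += rng.randint(1, max_stride)   # randint is inclusive on both ends
    return positions

walk = stride_positions(seed=1234, start=17, count=1000, n_pixels=5000)
gaps = [b - a - 1 for a, b in zip(walk, walk[1:])]   # untouched pixels between hits
```

Under this uniform-stride assumption, the walk advances two pixels per bit on average when `max_stride` is 3. A small maximum stride therefore packs even a short message into a dense band of modified pixels.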
It is worth mentioning that, to keep the noise profile consistent across the image, random data is written into the carrier once the message bit-stream is exhausted. If this were not done, the image would show serious weakness against simple statistical tests, even if a human being couldn't tell the difference.
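Here is a sketch of that padding step. It assumes the message has already been encrypted and that the number of available bit slots is known in advance. Message bits go first, and every remaining slot gets a bit from `random.SystemRandom`, which draws on `os.urandom`. Good ciphertext is already indistinguishable from random data, so the padded stream keeps roughly half ones and half zeros all the way through. `padded_bitstream` is a hypothetical name.

```python
import random

def to_bits(data):
    """Big-endian bit expansion of a byte string."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]

def padded_bitstream(message, total_bits, rng=None):
    """Message bits first, then random filler until every slot is used."""
    bits = to_bits(message)
    if len(bits) > total_bits:
        raise ValueError("message does not fit in the available slots")
    if rng is None:
        rng = random.SystemRandom()   # OS entropy via os.urandom
    bits.extend(rng.getrandbits(1) for _ in range(total_bits - len(bits)))
    return bits

stream = padded_bitstream(b"secret", total_bits=4096)
ones_ratio = sum(stream) / len(stream)   # close to 0.5 across the whole stream
```

Padding with zeros, or leaving the remaining pixels untouched, would create a visible boundary where the low-bit statistics change abruptly. Random filler hides that boundary.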
And the merger of both maps, showing every altered pixel.
In a later entry, I will cover this and other details of the code that I think are difficult to understand, broken, or interesting for some other reason.