Android --> GZip --> PHP

2013-09-12 05:36 by Ian

Sometimes you need your Android app to ship a large volume of data to a server, and bandwidth matters more than CPU. I ran up against this recently while working on a log-dump function. If the user tries to upload 5000 lines of log, that might be several megabytes of low-density data going over their connection.

If this were a Linux-native environment, I'd have chosen bzip2, but Java/Android doesn't have it. We do, however, have gzip. Good enough, I suppose.

Sadly, PHP's support for gzip is considerably less clear. There are several different functions, support mismatches between versions, and lots of over-simplification from users on PHP.net and StackOverflow. For what it's worth, I am using PHP 5.3.27 on my dev machine, with the gzinflate() function.

One more resource informs the example below. If you read no other documentation I've linked here, make sure you read this one. Where one is available, the RFC should be considered the authority on how a given idea works: http://www.faqs.org/rfcs/rfc1952.html

If you read the RFC, you will discover why so many people get confused over this problem. Android exhibits what might be considered to be a basic, no-frills implementation of gzip compression. The library on JellyBean doesn't appear to take advantage of many of the features that gzip provides, although there is certainly nothing stopping us from writing a wrapper class that does. But this is beyond our scope here.
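
To see how bare that header is, here is a small sketch that compresses a string with GZIPOutputStream and dumps the first ten bytes, which RFC 1952 defines as the fixed part of the header. The class and method names (GzipHeaderDump, gzip, hex) are made up for illustration, and it runs on a desktop JVM rather than on a device, so treat the exact bytes on Android as something to verify yourself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of the original post.
class GzipHeaderDump {

    // Compresses text into one complete gzip member: header, DEFLATE data, trailer.
    static byte[] gzip(String text) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(os)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    // Formats the first `count` bytes as space-separated two-digit hex values.
    static String hex(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count && i < data.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] gz = gzip("some log line\n");
        // RFC 1952 field order: ID1 ID2 CM FLG MTIME(4 bytes) XFL OS
        System.out.println(hex(gz, 10));
    }
}
```

The first three bytes (1f 8b 08) are the magic number and the DEFLATE method code. The fourth is the FLG byte, and when it is zero none of the optional fields (file name, comment, extra data, header CRC) are present, so the header is exactly ten bytes long.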

PHP, on the other hand, doesn't seem to care what the RFC says: gzinflate() disregards the gzip header completely and expects a raw DEFLATE stream (RFC 1951). This is not to say that it produces incorrect results. But if you do feed the header to gzinflate(), it will not decompress the data, because it will interpret the header as the actual start of the data stream. So remember: strip the gzip header before you hand the data to gzinflate().
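
The same relationship can be reproduced in Java, which may help when debugging the client side. The sketch below is only an analogy to gzinflate(), not a description of PHP internals: it uses java.util.zip.Inflater in "nowrap" mode, which also expects raw DEFLATE data, and hands it everything after the 10-byte header. The class name and constants are illustrative, and the fixed header length is an assumption that only holds when the FLG byte is zero.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of the original post.
class RawInflateDemo {
    // Assumption: no optional header fields, so the header is exactly 10 bytes.
    static final int HEADER_LEN = 10;

    static byte[] gzip(String text) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(os)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    // Skips the gzip header and inflates the raw DEFLATE stream that follows it.
    // The 8-byte trailer (CRC32 + ISIZE) is left in the input; the inflater
    // stops at the end of the DEFLATE stream and simply does not consume it.
    static String inflateBody(byte[] gz) {
        Inflater inflater = new Inflater(true); // true = raw DEFLATE, no wrapper
        inflater.setInput(gz, HEADER_LEN, gz.length - HEADER_LEN);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid DEFLATE stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a raw DEFLATE stream", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(inflateBody(gzip("hello from the log dump")));
    }
}
```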



After much argument with PHP, I finally figured this out. Here is what I ended up with.

The java function that does the compression...

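import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public static byte[] compress(String str) throws IOException {
    ByteArrayOutputStream os = new ByteArrayOutputStream(str.length());
    GZIPOutputStream gz_out = new GZIPOutputStream(os);
    // Name the charset explicitly so the bytes don't depend on the platform default.
    gz_out.write(str.getBytes("UTF-8"));
    // close() finishes the gzip stream (writes the trailer) and closes os too.
    gz_out.close();
    return os.toByteArray();
}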
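
Before fighting with the server, it can be worth checking the compressed bytes locally. The following is a rough sketch of a round-trip check using GZIPInputStream. The class name and the sample log lines are invented for illustration, and the sizes it prints will depend on your data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of the original post.
class GzipRoundTrip {

    static byte[] compress(String text) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(os)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    // Reads the gzip data back; GZIPInputStream parses the header and checks the trailer.
    static String decompress(byte[] gz) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Invented sample data: 5000 repetitive log lines.
        StringBuilder log = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            log.append("DEBUG sample log line ").append(i).append('\n');
        }
        String raw = log.toString();
        byte[] gz = compress(raw);
        System.out.println("raw bytes:        " + raw.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("compressed bytes: " + gz.length);
        System.out.println("round trip ok:    " + raw.equals(decompress(gz)));
    }
}
```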


The PHP code that does the decompression...

function convertBinToString($bin, $p_len = -1) {
    $temp_bin_str = "";
    $len = ($p_len == -1) ? strlen($bin) : $p_len;
    if ($len > strlen($bin)) {
        $len = strlen($bin);
    }
    for ($i = 0; $i < $len; $i++) {
        $temp_bin_str .= ord($bin[$i]). ' ';
    }
    return $temp_bin_str;
}

// Returns decompressed data on success, false on failure.
function decompress($compressed) { 
    $return_value = false;
    if (isset($compressed)) {
        $encoded_text = str_replace('-','+',$compressed);  // PHP doesn't like URL-SAFE base64.
        $encoded_text = str_replace('_','/',$encoded_text);
        $decoded = base64_decode($encoded_text);
        // base64_decode() returns false on failure, and the gzip header alone takes 10 bytes.
        if ($decoded !== false && strlen($decoded) > 10) {
            if (substr($decoded, 0, 3) == (chr(0x1F).chr(0x8B).chr(0x08))) {
                $decomp = gzinflate(substr($decoded, 10));
                if ($decomp !== false) {
                    $return_value = $decomp;
                }
                else {
                    error_log('Failed to decompress data. We found this header: '.convertBinToString($decoded, 10));
                }
            }
            else {
                error_log('Data does not appear to be valid GZIP. Header not found.');
            }
        }
        else {
            error_log('Data failed to base64_decode.');
        }
    }
    else {
        error_log('Someone tried to post data that did not exist.');
    }
    return $return_value;
}
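
The decompress() function above skips a fixed ten bytes, which matches what GZIPOutputStream produced in my testing. RFC 1952 allows optional fields after those ten bytes, signalled by bits in the FLG byte, so data from other gzip producers may have a longer header. Below is a sketch, in Java for consistency with the client code, of how the real header length could be computed; the same logic would port to PHP with ord() and substr(). The class and method names are hypothetical.

```java
// Hypothetical helper class, not part of the original post.
class GzipHeaderLength {
    // FLG bits defined in RFC 1952, section 2.3.1.
    static final int FHCRC = 0x02;
    static final int FEXTRA = 0x04;
    static final int FNAME = 0x08;
    static final int FCOMMENT = 0x10;

    // Returns the offset of the first DEFLATE byte, or -1 if the header is invalid.
    static int headerLength(byte[] gz) {
        if (gz.length < 10 || (gz[0] & 0xFF) != 0x1F || (gz[1] & 0xFF) != 0x8B || gz[2] != 8) {
            return -1;
        }
        int flags = gz[3] & 0xFF;
        int pos = 10; // fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS
        if ((flags & FEXTRA) != 0) {
            if (pos + 2 > gz.length) return -1;
            // XLEN is a little-endian 16-bit length, followed by that many bytes.
            int xlen = (gz[pos] & 0xFF) | ((gz[pos + 1] & 0xFF) << 8);
            pos += 2 + xlen;
        }
        if ((flags & FNAME) != 0) pos = skipZeroTerminated(gz, pos);
        if (pos >= 0 && (flags & FCOMMENT) != 0) pos = skipZeroTerminated(gz, pos);
        if (pos >= 0 && (flags & FHCRC) != 0) pos += 2; // 16-bit header CRC
        return (pos >= 0 && pos <= gz.length) ? pos : -1;
    }

    // Returns the index just past the next zero byte, or -1 if there is none.
    static int skipZeroTerminated(byte[] gz, int pos) {
        while (pos >= 0 && pos < gz.length) {
            if (gz[pos++] == 0) return pos;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] plain = {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, 0};
        byte[] named = {0x1f, (byte) 0x8b, 8, FNAME, 0, 0, 0, 0, 0, 0, 'a', 0};
        System.out.println(headerLength(plain));
        System.out.println(headerLength(named));
    }
}
```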

As a side-note, there is an under-rated block of code in there that is easy to gloss over...

$encoded_text = str_replace('-','+',$compressed);  // PHP doesn't like URL-SAFE base64.
$encoded_text = str_replace('_','/',$encoded_text);


This has the potential to drive you bananas if you forget it: base64_decode() only understands the standard base64 alphabet, not the URL-safe alphabet that RFC 4648 also specifies. If you don't convert it, as done above, your data will sometimes decode and sometimes fail, depending on the presence or absence of the contentious characters ('-' and '_'). This will be a non-issue for POST data, because it doesn't need to be URL safe. But if your compressed data needs to pass through a binary-unsafe layer along its journey, you must base64 (or similar) encode it. And sometimes those layers have strange rules. So there you have it.
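
To make the difference concrete, here is a small Java sketch that encodes the same three bytes with both alphabets and then applies the same substitution as the two str_replace() calls. It uses java.util.Base64 (Java 8 and later) as a stand-in; on Android of this era you would more likely use android.util.Base64 with the URL_SAFE flag, which is an assumption on my part about your client code. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, not part of the original post.
class UrlSafeBase64 {

    // Same substitution as the PHP side: '-' becomes '+', '_' becomes '/'.
    static String toStandardAlphabet(String urlSafe) {
        return urlSafe.replace('-', '+').replace('_', '/');
    }

    public static void main(String[] args) {
        // Bytes chosen so that every 6-bit group maps to value 62 or 63,
        // the only two positions where the alphabets differ.
        byte[] data = {(byte) 0xFB, (byte) 0xFF, (byte) 0xBE};

        String urlSafe = Base64.getUrlEncoder().encodeToString(data);
        String standard = Base64.getEncoder().encodeToString(data);
        System.out.println("URL-safe: " + urlSafe);
        System.out.println("standard: " + standard);

        // After the substitution, a standard decoder accepts the URL-safe text.
        byte[] decoded = Base64.getDecoder().decode(toStandardAlphabet(urlSafe));
        System.out.println("matches:  " + Arrays.equals(decoded, data));
    }
}
```

Data that happens to contain neither value 62 nor 63 encodes identically in both alphabets, which is why the failure shows up only some of the time.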

Three Red Bulls were harmed during the research and writing of this post.
