The JPEG file format is one of the most ubiquitous formats on the web, but the actual technology that powers the compression is old. A new method, based on machine learning, might change what photography really looks like.
Before we get into the new format, let’s talk a little bit about how JPEG actually works. JPEG is a lossy, variably compressed file format first introduced in 1992. Breaking down those two key terms reveals what makes JPEG important and different. A file format can be compressed, meaning the size is reduced from its original form. This compression can be lossy or lossless — a lossless format compresses the data as best it can, without throwing anything out, while lossy formats can discard some (hopefully unimportant) data to make the file even smaller.
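To make the lossy versus lossless distinction concrete, here is a minimal sketch in Python. It is not how JPEG itself works: the made-up row of pixel values, the `STEP` coarseness, and the use of `zlib` as the compressor are all assumptions chosen purely for illustration.

```python
import zlib

# Hypothetical 8-bit grayscale "scanline": a gentle ramp with a little texture.
pixels = bytes((i // 4 + (i * 7) % 5) % 256 for i in range(1024))

# Lossless: zlib gives back every byte exactly as it went in.
packed = zlib.compress(pixels, 9)
restored = zlib.decompress(packed)
lossless_exact = restored == pixels

# Lossy (toy version): round every value down to a multiple of STEP first.
# The fine texture is thrown away for good, but the result compresses better.
STEP = 16  # assumed coarseness, picked only for this illustration
quantized = bytes((p // STEP) * STEP for p in pixels)
packed_lossy = zlib.compress(quantized, 9)
restored_lossy = zlib.decompress(packed_lossy)

# Largest per-pixel difference between the original and the lossy copy.
max_error = max(abs(a - b) for a, b in zip(pixels, restored_lossy))

print("lossless round trip exact:", lossless_exact)
print("lossless size:", len(packed), "bytes; lossy size:", len(packed_lossy), "bytes")
print("worst per-pixel error after lossy round trip:", max_error)
```

The lossy copy is smaller, but no amount of decompression brings the discarded texture back. That missing data is exactly what a generative approach tries to fill in.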
What’s important to understand as a photographer is the trade-off between size and quality. At a quality setting of around 80, a JPEG is virtually indistinguishable from the uncompressed image, but at very high compression ratios (small file sizes), artifacts and quality loss become severe. JPEG compression also doesn’t handle repeated operations well, since each re-save can discard a little more data. Remember the blocky images of early memes that were saved, sent around, and saved again?
There have been a number of attempts to replace JPEG over the years, but they’ve all lost out for various reasons. Partly, it’s a chicken-and-egg problem. Nobody is going to save into a new file type if nobody else is going to be able to open it. Some companies, like Apple, have tried to use their large installed user base to push for a new standard. In Apple’s case, the relatively new HEIF is even the default for some functions of iOS, but typically it’s converted back to JPEG anyway for most apps and purposes.
As a result, a new file format has to offer several things at once. It’s not enough for it to be good at compression or versatile. It also has to be relatively unencumbered by patents or licensing and, above all, widely adopted.
Some of those attributes are business decisions made by the gatekeepers of the web: think Google’s Chrome browser, Apple’s WebKit, and the like. There won’t be a push for them to change things until something really revolutionary comes along. A new research effort, led by Fabian Mentzer and a team from Google Research, might have just made that leap forward.
They call their method of compression HiFic. HiFic “combines a generative adversarial neural network with learned compression to obtain a state of the art generative lossy compression system”. To put this simply, it trains a neural network to plausibly rebuild the thrown-out data when the image is decoded, creating a higher quality result at a smaller size than current compression techniques. What’s key about this technique is that it relies on a generator that even the authors acknowledge “in theory, can produce images that are very different from the input”.
The research is fascinating, and it’s very interesting to see just how far things have come in a few years. If you’re at all interested, check out the link to the paper’s site. There are more comparisons of different images and quality levels than can fit in this article.
Why Does This Have the Potential to Change Photography?
Photography and videography have always faced contention over how closely they represent reality. Whether it’s dodging and burning or color temperature adjustments, all the way up to compositing, focal length blending, CGI, and deep fakes, editing choices can affect how true-to-life a picture is. In the era of “fake news”, the verifiability of an image or video clip can be hugely important. I’d even say there’s an ethical dimension to it for any type of photography – not just journalism and news coverage.
All of those actions, however, are deliberate choices made by the photographer or editor after the shot (setting aside arguments about the impact of color versus black and white, or the choice of focal length). If HiFic, or another GAN-powered file format, takes hold, there’s going to be a new source of concern right at the time of file creation, even without any editing.
Consider this comparison between the original shot and the HiFic compressed version. It’s easy to imagine a scenario where the time on that clock was important, for instance to verify someone’s alibi. If the image were heavily compressed with HiFic, should the time that clock showed actually be taken as the truth? Perhaps the algorithm generated the clock hands in the wrong position.
This scenario has already happened with a different compression algorithm used by Xerox copiers. In that case, the algorithm changed 6s to 8s when used with a certain typeface. Now, extrapolate that behavior out to a compression method that is deliberately “dreaming up” details found in an image, and it’s easy to imagine the impact in a broader range of scenarios.
Beyond the very serious evidentiary implications of this technology, there’s also an artistic question. In its current implementation, the algorithm does a good job of retaining the overall appearance of the input image, but at what point does the image your viewers see stop being your image? Addressing the ship of Theseus is beyond the scope of this article, but there’s a more concrete concern as it relates to image quality and artistic merit.
There’s already a huge divide between viewing devices in HDR support, color accuracy, and plain screen size, so this may seem minor. I’d argue it’s not, as I’m sure this technique won’t be applied in moderation. If Instagram already tries to stomp your image down to a few hundred KB, why wouldn’t they use this to shrink it to tens of KB? At that point, are so many details being reconstructed that your image isn’t even being conveyed anymore? Instead, it could be argued your viewer’s device is just making up a picture based on a rough description of your original.
The Philosophy of Photography
The debate around what’s acceptable in photography is ever-evolving and inherently connected to your values. I feel that the standard for journalistic photography should be far stricter than that of artistic pursuits, but regardless, a file format that can meaningfully alter the subject matter of the image is worth understanding. This technology is still in the lab and could change significantly before it comes to a device near you. In the meantime, what do you think the implications of this would be for photography, or more broadly for the role that images and video play in shaping discourse?
Lead image courtesy of Morning Brew