Question: Can you use .zip compression in ED?

Nov 1, 2011

Question:

Does the XML ITS 1.0 data types specification allow sending zip-compressed content with the ED data type?

ED support for compression

ED has a compression flag, with the definition: “Indicates whether the raw byte data is compressed, and what compression algorithm was used.” The possible values are:

Code | Name | Status | Definition
DF | deflate | required | The deflate compressed data format as specified in RFC 1951 [http://www.ietf.org/rfc/rfc1951.txt].
GZ | gzip | indifferent | A compressed data format that is compatible with the widely used GZIP utility as specified in RFC 1952 [http://www.ietf.org/rfc/rfc1952.txt] (uses the deflate algorithm).
ZL | zlib | indifferent | A compressed data format that also uses the deflate algorithm. Specified as RFC 1950 [http://www.ietf.org/rfc/rfc1950.txt].
Z | compress | deprecated | Original UNIX compress algorithm and file format using the LZC algorithm (a variant of LZW). Patent encumbered and less efficient than deflate.
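To make the difference between the three deflate-based codes concrete, here is a minimal Python sketch that produces each format from the same bytes using only the standard library. The payload is made up for illustration, and the mapping of DF, ZL and GZ to these library calls is my reading of the RFC references above, not something the data types specification spells out in code.

```python
import gzip
import zlib

# Hypothetical payload, used only for illustration.
payload = b"example clinical document text " * 20

# DF: raw deflate stream (RFC 1951), with no header or checksum.
# A negative wbits value tells zlib to omit the zlib wrapper.
compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
df_bytes = compressor.compress(payload) + compressor.flush()

# ZL: zlib format (RFC 1950), deflate plus a small header and an Adler-32 trailer.
zl_bytes = zlib.compress(payload)

# GZ: gzip format (RFC 1952), deflate plus a gzip header and a CRC-32 trailer.
gz_bytes = gzip.compress(payload)

# Each format round-trips back to the same bytes.
assert zlib.decompress(df_bytes, wbits=-zlib.MAX_WBITS) == payload
assert zlib.decompress(zl_bytes) == payload
assert gzip.decompress(gz_bytes) == payload
```

All three carry the same deflate-compressed data; they differ only in the wrapper around it, which is why a receiver needs the compression code to know how to unwrap the bytes.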

The status means that implementations are required to support deflate - i.e. receiving systems that process EDs must be able to handle deflate. And “Z” should not be used - even though we defined a code for it, something we declined to do for other deprecated things. I don’t know why that is.
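On the receiving side, that could look something like the following Python sketch. The function name `decode_ed_bytes` is hypothetical, and it assumes the inline data arrives base64 encoded; choosing to reject “Z” outright is my choice for the example, not a rule from the specification.

```python
import base64
import gzip
import zlib


def decode_ed_bytes(b64_text, compression=None):
    """Return the uncompressed bytes of inline ED content (hypothetical helper)."""
    raw = base64.b64decode(b64_text)
    if compression is None:
        return raw  # no compression flag: the bytes are the content
    if compression == "DF":
        # Required: raw deflate (RFC 1951), no zlib wrapper.
        return zlib.decompress(raw, wbits=-zlib.MAX_WBITS)
    if compression == "GZ":
        return gzip.decompress(raw)
    if compression == "ZL":
        return zlib.decompress(raw)
    if compression == "Z":
        # Deprecated; this sketch chooses not to support it.
        raise ValueError("compression code Z (UNIX compress) is deprecated")
    raise ValueError("unknown compression code: %r" % (compression,))


# Usage: compress with raw deflate, base64 encode, then decode again.
_c = zlib.compressobj(wbits=-zlib.MAX_WBITS)
_wire = base64.b64encode(_c.compress(b"hello ED") + _c.flush())
assert decode_ed_bytes(_wire, "DF") == b"hello ED"
```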

It can be a bit confusing when dealing with ED, because there are multiple levels of compression around. ED has its own compression, but the content might already be compressed - the compression is an inherent part of the content (such as, say, JPEG). Or the ED might be a reference to content on a web server that might gzip compress the http response. After much discussion, we clarified this in data types R2, and this clarification should be taken to hold for R1:

The compression applies to the data applied in line, not to data provided by reference, even if there is no data provided in line. Note that some compression formats allow multiple archive files to be embedded within a single compressed volume. Applications SHALL ensure that the decompressed form of the data conforms to the stated media type. The stated media type applies to the uncompressed data.

I understand this to mean that it’s wrong to specify a compression if there’s no inline data.
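That reading can be expressed as a simple check. The Python sketch below is only an illustration of my interpretation: the function name is hypothetical, the element name `text` and the example URL are made up, and it assumes inline data appears as the text content of the ED element with the flag carried in a `compression` attribute.

```python
import xml.etree.ElementTree as ET


def compression_is_consistent(ed_element):
    """True unless a compression code is given with no inline data (hypothetical check)."""
    has_inline_data = bool((ed_element.text or "").strip())
    has_compression = ed_element.get("compression") is not None
    return has_inline_data or not has_compression


# Reference only, but a compression code is stated: flagged as inconsistent.
by_reference = ET.fromstring(
    '<text mediaType="application/pdf" compression="GZ">'
    '<reference value="http://example.org/doc.pdf"/></text>'
)
assert not compression_is_consistent(by_reference)

# Inline base64 data with a compression code: fine.
inline = ET.fromstring(
    '<text mediaType="text/plain" representation="B64" compression="DF">S0vMSQUA</text>'
)
assert compression_is_consistent(inline)
```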

.Zip

The obvious thing missing from that list is .zip - and people often ask about that: why isn’t it supported?

Well, .Zip is not actually a compression method; it’s an archive format that uses other compression methods as part of the archive. Users don’t usually worry about which method is used - they just take the default or the highest compression level - and instead focus on which files are in the .Zip. And that’s exactly why .Zip isn’t one of the compression methods that ED supports - an ED is a single stream with a fixed media type. So there’s no real point in using a .zip archive - unless the media type itself specifies a .zip archive (say, for a docx file, which is “application/vnd.openxmlformats-officedocument.
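The archive-versus-compression distinction is easy to see in code. This Python sketch builds a small .zip in memory with the standard `zipfile` module; the file names and contents are made up. Note that each member carries its own compression method, and the archive holds several named files rather than one stream.

```python
import io
import zipfile

# Build a small archive in memory with two hypothetical members.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # This member is compressed with deflate inside the archive.
    archive.writestr("report.txt", "first file " * 50,
                     compress_type=zipfile.ZIP_DEFLATED)
    # This member is stored with no compression at all.
    archive.writestr("image.bin", b"\x00\x01\x02",
                     compress_type=zipfile.ZIP_STORED)

# Read it back: the archive is a directory of files, each with its own method.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    members = {info.filename: info.compress_type for info in archive.infolist()}

assert members["report.txt"] == zipfile.ZIP_DEFLATED
assert members["image.bin"] == zipfile.ZIP_STORED
```

There is no single "uncompressed form" of such an archive that could match one stated media type, which is the mismatch with ED described above.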

Summary: .Zip is not a supported compression method, but you can include .zip archives if you use a mediaType that specifies a zip archive.