The Developer's Library for D

Ticket #31 (closed task: fixed)

Opened 6 years ago

Last modified 4 years ago

Compression library

Reported by: larsivi
Assigned to: larsivi
Priority: major
Milestone: 0.99.4
Component: Tango
Version:
Keywords: vfs
Cc: daniel.keep+tango@gmail.com

Description

Tango needs a compression package, preferably supporting as many of the common compression algorithms as possible.

An easy start might be to use std.zip from Phobos.

Change History

06/12/06 17:37:17 changed by kris

  • owner changed from kris to larsivi.

(in reply to: ↑ description ; follow-up: ↓ 3 ) 07/23/06 23:20:36 changed by jpelcis

After a discussion on IRC, here's the current outline for the Tango compression libraries:

  • The base class will be tango.io.Conduit, with the IConduitFilter functionality used for the compression algorithm.
  • All of the Tango classes to handle the compression libraries will go in the tango.compression namespace.

The following compression formats will be supported (eventually):

  • Zip
  • Gzip
  • Tar
  • Bz2
  • 7z (pending license question)

The discussion is still open as coding hasn't started.

(in reply to: ↑ 2 ) 07/25/06 22:03:26 changed by jpelcis

* 7z (pending license question)

Usage has been approved by Igor Pavlov.

(follow-up: ↓ 5 ) 08/14/06 15:47:56 changed by larsivi

Unless we have some serious hope of getting this done for 1.0, it should be moved to 2.0

(in reply to: ↑ 4 ) 08/15/06 17:55:15 changed by jpelcis

  • milestone changed from 1.0 to 2.0.

Replying to larsivi:

Unless we have some serious hope of getting this done for 1.0, it should be moved to 2.0

It's been moved.

01/02/07 15:17:57 changed by larsivi

See CompressionPackageDesign for further discussions.

04/16/07 17:14:02 changed by kris

notes from Oskar:

[12 Mar 07 09:30] Kris: so what does the (final) folder hierarchy look like?
[12 Mar 07 09:31] Oskar_: suggestion: under transform: digest, cipher, compression, crc, encoding(?)
[12 Mar 07 09:31] Kris: and what lives in each of them?
[12 Mar 07 09:32] Oskar_: digest: Sha0, Sha1, Sha256, Sha512, Tiger, Md4, Md5, etc... Also possibly a digest version of Crc32
[12 Mar 07 09:33] Oskar_: cipher: AES, DES, TripleDES, Blowfish, etc
[12 Mar 07 09:33] Oskar_: Compression: Zlib, RLE, LZW, etc...
[12 Mar 07 09:35] Oskar_: Crc: possibly: Crc32, Adler32, Crc64
[12 Mar 07 09:35] Oskar_: But nowadays, more expensive crcs are generally replaced by hashes/digests instead
[12 Mar 07 09:36] Oskar_: encoding: Base64, MimeQuotedPrintable(?), MimeBase64, UUEncode, ...

07/25/07 11:30:19 changed by DRK

  • cc set to daniel.keep+tango@gmail.com.

I've been playing with this and I've got a functioning (but not thoroughly tested) implementation of Zlib and Bzip2 compression done. They both compress and decompress a short sample of text which has also been independently compressed. Both are implemented as Input/Output Filters (separate filters for compression and decompression).

However, this has turned up a potential problem with the current IO system. Basically, both zlib and bzip2 will only produce output once they've gotten enough input to compress a full block. When you've finished feeding them all your input data, you need to "finish" the stream, which will cause them to pad and compress any remaining input data.

The problem is that filters don't have any kind of a "finish" method. The closest they have is a "flush" method, which I've been told is always called before a conduit is closed; however, the expectation with flush would be that you can call it any time you want to force output. If a programmer calls flush on a conduit with an attached compression filter, they will prematurely truncate the compression stream, preventing any further data from being compressed.
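The difference between forcing output and finishing a stream can be illustrated outside Tango. The following is a minimal sketch using Python's standard zlib binding, not the Tango filters discussed here; the variable names are made up for illustration. It assumes "flush" maps to zlib's Z_SYNC_FLUSH and "finish" maps to Z_FINISH, which is one possible mapping rather than what the filters above necessarily do.

```python
import zlib

comp = zlib.compressobj()

# Z_SYNC_FLUSH forces out everything compressed so far but keeps the stream open.
prefix = comp.compress(b"first part, ") + comp.flush(zlib.Z_SYNC_FLUSH)
visible = zlib.decompressobj().decompress(prefix)  # a reader can already decode the prefix

# Z_FINISH writes the trailer and terminates the stream.
rest = comp.compress(b"second part") + comp.flush(zlib.Z_FINISH)
whole = zlib.decompress(prefix + rest)

# After finishing, the compressor rejects further input.
try:
    comp.compress(b"too late")
    accepted_after_finish = True
except zlib.error:
    accepted_after_finish = False
```

If a filter's flush is implemented as a finish, the last step is what a caller runs into: any write after the flush has nowhere to go.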

One potential workaround would be to simply allow multiple streams to be read/written back-to-back. The problem with this is that Tango would then be incompatible with any other programs that use zlib/bzip2 streams, which would probably be not so good.
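To make the compatibility concern concrete, here is a small sketch, again using Python's standard zlib module rather than Tango. It shows that a reader expecting a single zlib stream stops at the first end-of-stream marker, and that reading back-to-back streams needs an explicit restart loop. The helper name read_all_streams is hypothetical.

```python
import zlib

stream_a = zlib.compress(b"first stream")
stream_b = zlib.compress(b"second stream")
combined = stream_a + stream_b

# A reader that expects one stream stops at the first end-of-stream marker.
reader = zlib.decompressobj()
first = reader.decompress(combined)
leftover = reader.unused_data  # bytes that follow the end of the first stream


def read_all_streams(data):
    """Decode back-to-back zlib streams by restarting on the leftover bytes."""
    parts = []
    while data:
        d = zlib.decompressobj()
        parts.append(d.decompress(data))
        data = d.unused_data
    return parts


all_parts = read_all_streams(combined)
```

Only a consumer that knows about the convention and loops like read_all_streams recovers the second stream; a plain single-stream reader silently sees just the first.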

The short of it is that programmers will need to be aware that they must not call flush on a conduit with attached compression filters until they are finished writing to it. The only real solution to this is to add an explicit "finish" or "detach" method to filters.

The current implementations of the Zlib and Bzip2 modules are at http://users.on.net/~drkeep/Zlib.d and http://users.on.net/~drkeep/Bzip2.d.

08/17/07 18:10:35 changed by kris

  • milestone changed from 2.0 to 1.0.

OutputStream now has a commit() method, which could be used for the purpose you describe?

I've been at odds over how to deal with this commit issue, since adding such a method would subsequently require users to basically always invoke commit() before closure; just in case there's a dependent filter attached.

To avoid additional baggage, a solution would have been to place a close() method on the streams also. Unfortunately, that leads to potential conflict between multiple streams attached to a common conduit (input and output, for example). Whereas, explicitly closing the underlying conduit doesn't appear to have the same potential for misconception? Perhaps I'm being reactive on that point, so it may be worthy of discussion?

Anyway ... after trying a few ideas, the trunk code now contains a Conduit.dispose() method which invokes flush + commit, then closes the conduit. The notion is that we'll encourage ppl to use .dispose instead of .close, or swap the functionality around, or something like that. Either way, the approach also allows for an explicit commit() to occur without flush or close being invoked. The latter may be useful in some contexts (such as socket usage).
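The flush / commit / dispose split described above might look roughly like the following. This is a sketch in Python, not the Tango trunk code: the class CompressingOutput and its internals are hypothetical, and it assumes commit() is where the compressed stream gets finished while flush() only forces pending output.

```python
import io
import zlib


class CompressingOutput:
    """Hypothetical output filter for illustration; not Tango's actual API."""

    def __init__(self, sink):
        self.sink = sink
        self.comp = zlib.compressobj()
        self.closed = False

    def write(self, data):
        self.sink.write(self.comp.compress(data))

    def flush(self):
        # Push pending output downstream without ending the stream.
        self.sink.write(self.comp.flush(zlib.Z_SYNC_FLUSH))

    def commit(self):
        # Terminate the compressed stream; no further writes are possible.
        self.sink.write(self.comp.flush(zlib.Z_FINISH))

    def close(self):
        # Only marks this object closed; the sink stays readable for the demo.
        self.closed = True

    def dispose(self):
        # flush + commit, then close, in that order.
        self.flush()
        self.commit()
        self.close()


sink = io.BytesIO()
out = CompressingOutput(sink)
out.write(b"hello, ")
out.flush()          # safe mid-stream: the stream stays open
out.write(b"world")
out.dispose()        # finishes the stream and closes
result = zlib.decompress(sink.getvalue())
```

With this split, calling flush() mid-stream is harmless, and a caller who uses dispose() instead of close() never has to remember the commit step.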

I'd like to ask if you'd try this out, Daniel, and let us know how it goes? Alternative ideas and/or approaches are most welcome too :)

08/29/07 08:01:57 changed by larsivi

Zlib and Bzip2 filters are committed to trunk

11/12/07 06:05:12 changed by larsivi

  • keywords set to vfs.

The VFS abstraction is now ready for archive support to be added.

12/17/07 22:54:59 changed by kris

Daniel is suggesting we consider moving some stuff around in the io subtree. We should probably discuss this before the next release.

12/19/07 04:25:53 changed by larsivi

  • status changed from new to closed.
  • resolution set to fixed.
  • milestone changed from 1.0 to 0.99.4.

Zip archive support was added in [3044]. It is time to close this ticket. Further formats and features should be governed by new tickets.