What do I use for a general-purpose "read or write some crap" interface?

Moderators: kris

JarrettBillingsley

Joined: 06/21/06

Posts: 26

Posted: 06/02/07 02:37:14 Modified: 06/02/07 02:48:20

In Phobos, you have Stream. Everything is a Stream, and so all you need to do is have a function which takes a Stream if it wants to do some reading or writing. It doesn't matter what the underlying device is, whether it's buffered or not, whether it has filters on it or not etc. It's just a Stream.

In Tango, I can't find any one, consistent alternative.

If I use IReader and IWriter, they don't provide all the capabilities necessary for reading and writing -- namely, you can't just read or write a chunk of data of a given size without accessing the underlying IBuffer. Why is an IReader or IWriter tied to an IBuffer, anyway? What if you wanted to stick one directly to a non-buffered conduit? One thing they do provide is a bunch of convenient methods for reading and writing a lot of simple types.

If I use an IConduit, I'm restricted to a very basic read and write interface which just reads and writes raw bytes. But what if I want to use an abstraction of data that doesn't even have an IConduit, such as a memory-based IBuffer? So IConduit isn't universal.

If I use an IBuffer, I don't have access to the nice methods that IReader and IWriter provide. And if I have some kind of IO thing that doesn't have an IBuffer stuck on it, I can't use this. And why is there a .readExact method, but no corresponding .writeExact method?

I don't know where the InputStream and OutputStream interfaces are fitting into this system yet, or if they're completely defined yet (as of now, they only provide simple void[] read and write methods). Will they replace some of these other abstractions, or work alongside them in an even more confusing and inconsistent fashion? And where do filters fit into all of this? It looks like they will be implemented as Input/OutputStreams, so hopefully that'll be more consistent..

I don't know why I'd make a new scripting language. I mean, I might as well just draw some lines in the sand with a stick.

r.lph50

Joined: 11/27/06

Posts: 21

Posted: 06/02/07 06:05:36 -- Modified: 06/02/07 06:06:56 by r.lph50

If I use an IBuffer, I don't have access to the nice methods that IReader and IWriter provide.

Do you mean in the same interface? Because while I haven't used them, the reader and writer interfaces seem to easily hook up to any IBuffer: auto read = new Reader(buffer);

why is there a .readExact method, but no corresponding .writeExact method?

Isn't append() on a buffer (in its various forms) basically a writeExact?

kris

Joined: 03/27/04

Posts: 581

Posted: 06/02/07 07:52:07 -- Modified: 06/02/07 08:03:27 by kris
JarrettBillingsley wrote:
In Phobos, you have Stream. Everything is a Stream, and so all you need to do is have a function which takes a Stream if it wants to do some reading or writing. It doesn't matter what the underlying device is, whether it's buffered or not, whether it has filters on it or not etc. It's just a Stream.

In Tango, I can't find any one, consistent alternative.

If I use IReader and IWriter, they don't provide all the capabilities necessary for reading and writing -- namely, you can't just read or write a chunk of data of a given size without accessing the underlying IBuffer. Why is an IReader or IWriter tied to an IBuffer, anyway? What if you wanted to stick one directly to a non-buffered conduit? One thing they do provide is a bunch of convenient methods for reading and writing a lot of simple types.

If I use an IConduit, I'm restricted to a very basic read and write interface which just reads and writes raw bytes. But what if I want to use an abstraction of data that doesn't even have an IConduit, such as a memory-based IBuffer? So IConduit isn't universal.

If I use an IBuffer, I don't have access to the nice methods that IReader and IWriter provide. And if I have some kind of IO thing that doesn't have an IBuffer stuck on it, I can't use this. And why is there a .readExact method, but no corresponding .writeExact method?

I don't know where the InputStream and OutputStream interfaces are fitting into this system yet, or if they're completely defined yet (as of now, they only provide simple void[] read and write methods). Will they replace some of these other abstractions, or work alongside them in an even more confusing and inconsistent fashion? And where do filters fit into all of this? It looks like they will be implemented as Input/OutputStreams, so hopefully that'll be more consistent..

Reader/Writer handle arrays also, so you can happily work with big chunks of data. They are buffered for performance reasons. I can't think of any reason right now to access the underlying buffer instead, in the general case.

Memory based IConduit is represented by MemoryConduit.d, which is a bidi-stream. Conduits are nothing more than bidi streams.

Buffer doesn't have the Reader/Writer methods because those reside at a different abstraction layer. Buffer is actually a switchpoint between conduit-based content, memory-only content, and memory-mapped content. Once you have a Buffer, there's really not much reason to care about Conduit at all. One can attach a variety of different clients to a Buffer such as Readers/Writers, iterators, etc. Each will remain in synch with the others. Buffer doesn't need a writeExact() since it always appends everything it is given. Buffer is also the central broker for streaming tokenization, as used by the iterators and readline() style functions.

As for Streams, they will operate as views upon a conduit. That is, a conduit is a host for both an input and output stream. You can treat any conduit as either one (e.g. when passed as an argument). Streams are very simple, supporting a minimal number of methods only. Buffer also masquerades as both an input and output buffer, so you can use that as either or both also (MemoryConduit may be redundant due to this). Reader/Writer et al will be updated to support streams too. In other words, one can happily design around InputStream and/or OutputStream and pass conduits, buffers, and so on as arguments.
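As a rough illustration of "design around InputStream and/or OutputStream", here is a minimal sketch in present-day D. The interface declarations, the MemoryBuffer class and the copy function are all hypothetical stand-ins written for this example; they are not Tango's actual declarations, and the sketch favours clarity over the heap-avoidance discussed in this thread.

```d
// Hypothetical minimal interfaces, modelled on the description above.
// They are NOT Tango's real InputStream/OutputStream declarations.
interface InputStream
{
    // Reads up to dst.length bytes; returns the count read, 0 at end of data.
    size_t read(void[] dst);
}

interface OutputStream
{
    // Accepts all of src and returns the number of bytes taken.
    size_t write(const(void)[] src);
}

// A memory-backed object usable as either view, like the Buffer described above.
class MemoryBuffer : InputStream, OutputStream
{
    private ubyte[] data;
    private size_t pos;

    size_t read(void[] dst)
    {
        auto left = data.length - pos;
        auto n = dst.length < left ? dst.length : left;
        (cast(ubyte[]) dst)[0 .. n] = data[pos .. pos + n];
        pos += n;
        return n;
    }

    size_t write(const(void)[] src)
    {
        data ~= cast(const(ubyte)[]) src; // appends (and may allocate)
        return src.length;
    }
}

// Client code depends only on the two stream views, not on what hosts them.
size_t copy(InputStream src, OutputStream dst)
{
    ubyte[4] chunk; // deliberately tiny so the loop runs several times
    size_t total;
    for (size_t n; (n = src.read(chunk[])) != 0; )
        total += dst.write(chunk[0 .. n]);
    return total;
}
```

Anything that implements the two interfaces (a file, a socket, a memory buffer) could be passed to copy unchanged, which is the point being made about streams as views.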

To access a file InputStream, for example, open a FileConduit and call the conduit.input() method.

The optional filter mechanism is a chained set of streams attached to a conduit. One set for input and another for output. When you access conduit.input or conduit.output, it actually gives you the head of the assigned filter chain. It's done this way rather than using 'decorators' so that conduit specializations remain exposed to the user. We're talking about seek() facilities on a FileConduit, join/leave facilities on a MulticastConduit, etc. The traditional 'streams' also exhibit the 'decorator' problem, since they are "least common denominator" approaches.
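The filter-chain idea can be sketched as follows, again in present-day D with hypothetical names (ArraySource, InputFilter, UpperCaseFilter, ArrayConduit and attach are all invented for illustration and are not Tango classes). The host keeps its specialised seek() visible, while input() hands out the head of whatever chain has been attached.

```d
interface InputStream
{
    size_t read(void[] dst); // returns bytes read, 0 at end of data
}

// The raw source at the tail of the chain.
class ArraySource : InputStream
{
    private const(ubyte)[] data;
    private size_t pos;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(void[] dst)
    {
        auto left = data.length - pos;
        auto n = dst.length < left ? dst.length : left;
        (cast(ubyte[]) dst)[0 .. n] = data[pos .. pos + n];
        pos += n;
        return n;
    }
}

// A filter is a stream that pulls from another stream.
abstract class InputFilter : InputStream
{
    InputStream upstream;
    abstract size_t read(void[] dst);
}

class UpperCaseFilter : InputFilter
{
    override size_t read(void[] dst)
    {
        auto n = upstream.read(dst);
        foreach (ref b; cast(ubyte[]) dst[0 .. n])
            if (b >= 'a' && b <= 'z')
                b -= 32; // ASCII lower case to upper case
        return n;
    }
}

// Conduit-like host: owns the source and the head of the filter chain.
class ArrayConduit
{
    private ArraySource source;
    private InputStream head;

    this(const(ubyte)[] data)
    {
        source = new ArraySource(data);
        head = source;
    }

    // Put a new filter in front of the current chain.
    void attach(InputFilter filter)
    {
        filter.upstream = head;
        head = filter;
    }

    InputStream input() { return head; }

    // Specialised facility that a plain decorator would have hidden.
    void seek(size_t pos)
    {
        source.pos = pos < source.data.length ? pos : source.data.length;
    }
}
```

With a decorator design the caller would typically hold only the outermost wrapper and lose direct access to seek(); here the conduit stays in the caller's hands and the filters hang off it.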

It is also possible to attach a buffer as a conduit-filter, thus hiding the buffering altogether. This is handy for certain types of usage, while some (such as Reader/Writer) will continue to take advantage of what Buffer exposes directly.

Tango.io will likely remain a layered design, since that's how we provide high performance and flexibility, and eliminate some serious bloat. Recall that D has char/wchar/dchar variations ... this can cause serious issues for an IO API unless it is carefully layered, and is why the core of tango.io operates upon void[] only.

We're aware that some folks are having difficulty with the approach taken by tango.io, but we are trying to address those. Keep in mind that tango.io is probably the most efficient IO lib around ... it was originally constructed to support high-performance server IO in a manner that avoids /all/ ongoing heap activity. It also extends gracefully to Selector-style IO with zero changes to the model (in tango.io.selector). It's quite a powerful model, but perhaps needs to come down to earth a bit. We hope the stream perspective will help get there.

kris

Joined: 03/27/04

Posts: 581

Posted: 06/02/07 07:52:55
r.lph50 wrote:

If I use an IBuffer, I don't have access to the nice methods that IReader and IWriter provide.

Do you mean in the same interface? Because while I haven't used them, the reader and writer interfaces seem to easily hook up to any IBuffer: auto read = new Reader(buffer);

why is there a .readExact method, but no corresponding .writeExact method?

Isn't append() on a buffer (in its various forms) basically a writeExact?

correct on both counts

JarrettBillingsley

Joined: 06/21/06

Posts: 26

Posted: 06/02/07 18:27:38

Do you mean in the same interface? Because while I haven't used them, the reader and writer interfaces seem to easily hook up to any IBuffer: auto read = new Reader(buffer);

I guess that works, but it seems clumsy, and I'm not fond of just instantiating classes left and right. Class instances also have to be stored somewhere static if you want better performance, and then things just start looking ugly.

Isn't append() on a buffer (in its various forms) basically a writeExact?

That's not entirely obvious from the name :P but yes, that's exactly what I need.

Reader/Writer handle arrays also, so you can happily work with big chunks of data. They are buffered for performance reasons. I can't think of any reason right now to access the underlying buffer instead, in the general case.

For reading/writing a struct that doesn't have a read/write method as a chunk, perhaps? Or an array of structs? It just seems weird that the Readers and Writers don't support all the functionality of their underlying Buffers.

Buffer doesn't have the Reader/Writer methods because those reside at a different abstraction layer. Buffer is actually a switchpoint between conduit-based content, memory-only content, and memory-mapped content. Once you have a Buffer, there's really not much reason to care about Conduit at all. One can attach a variety of different clients to a Buffer such as Readers/Writers, iterators, etc. Each will remain in synch with the others. Buffer doesn't need a writeExact() since it always appends everything it is given. Buffer is also the central broker for streaming tokenization, as used by the iterators and readline() style functions.

So Buffer is looking like the closest thing to a universal interface.

Thanks for the replies.

r.lph50

Joined: 11/27/06

Posts: 21

Posted: 06/03/07 10:27:24
JarrettBillingsley wrote:
kris wrote:
Reader/Writer handle arrays also, so you can happily work with big chunks of data. They are buffered for performance reasons. I can't think of any reason right now to access the underlying buffer instead, in the general case.

For reading/writing a struct that doesn't have a read/write method as a chunk, perhaps? Or an array of structs? It just seems weird that the Readers and Writers don't support all the functionality of their underlying Buffers.

I hope I'm not misunderstanding, but I don't see the need either. Is using a reader/writer on a struct's members not possible (or tasteful) using the 'whisper' style? e.g. reader(struct.firstByte)(cast(int) struct.someEnum);. Or by chunk did you mean casting from raw bytes?

As for an array of structs, the simplest way would be to give the struct reader/writer methods and prepend the array size before any serialisation of each of the structs (which would pass in the reader/writer delegate)... I see you already sorta do something similar in MiniD.
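A small sketch of both suggestions, assuming present-day D and entirely hypothetical names (SketchWriter and Point are made up here and are not Tango's Writer or any Tango type). Each opCall returns the writer itself, which is what makes the chained 'whisper' syntax possible, and the array overload writes the element count first.

```d
struct Point
{
    int x, y;

    // The struct serialises itself member by member, whisper style.
    void write(SketchWriter w) const
    {
        w(x)(y);
    }
}

final class SketchWriter
{
    ubyte[] bytes; // everything written so far

    // Append one int, most significant byte first, and return the
    // writer so that calls can be chained: w(a)(b)(c).
    SketchWriter opCall(int value)
    {
        foreach (shift; [24, 16, 8, 0])
            bytes ~= cast(ubyte)(value >> shift);
        return this;
    }

    // An array of structs: the length goes first, then each element
    // writes itself through its own write method.
    SketchWriter opCall(const(Point)[] points)
    {
        opCall(cast(int) points.length);
        foreach (p; points)
            p.write(this);
        return this;
    }
}
```

A reader built the same way would consume the count first and then call a matching read method on each element, so neither side needs to know the array size in advance.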

Although, would there be room in Tango for a class that is a buffer, reader, and writer in one that takes a conduit? Call it SimpleStream or something.

kris

Joined: 03/27/04

Posts: 581
Posted: 06/03/07 21:26:27 -- Modified: 06/03/07 21:35:15 by kris

Serializing aggregates (classes, structs) is handled by making them Reader/Writer compatible. Simply writing the raw bytes of a struct is rarely sufficient, so it's not done that way. Think how that would mess things up when, for example, passing a struct from one machine to another? If the two have different endianness, the struct content would be invalid upon arrival. Thus, Tango serialization is performed at the attribute level, and controlled explicitly by the encompassing aggregate. For example:
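// an aggregate opts in to serialization by implementing both interfaces
class Foo : IReadable, IWritable
{
   double x;
   int    y;
   long[] z;

   // emit each attribute, in a fixed order
   void write (Writer write)
   {
        write (x) (y) (z);
   }

   // consume the attributes in the same order they were written
   void read (Reader read)
   {
        read (x) (y) (z);
   }
}

auto write = new Writer (new SocketConduit("foo.bar.com:8080"));
auto foo = new Foo;

// serialize foo
write (foo);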
The above should probably attach a network-agnostic Protocol to the writer, to ensure the content of foo remains consistent across machines when transmitted in binary form. Alternatively, the Protocol could be text-based instead, using something like a JSON protocol.
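To make the endianness concern concrete, here is a minimal sketch in present-day D. The function names toWire, fromWire and rawBytes are hypothetical and this is not how Tango's Protocol classes are implemented; it only contrasts a fixed wire order with dumping the in-memory representation.

```d
// Encode a 32-bit value most significant byte first,
// whatever the byte order of the host CPU.
ubyte[4] toWire(uint value)
{
    ubyte[4] bytes = [
        cast(ubyte)(value >> 24),
        cast(ubyte)(value >> 16),
        cast(ubyte)(value >> 8),
        cast(ubyte) value
    ];
    return bytes;
}

// Decode the same fixed order back into a native value.
uint fromWire(const ubyte[4] bytes)
{
    return (cast(uint) bytes[0] << 24)
         | (cast(uint) bytes[1] << 16)
         | (cast(uint) bytes[2] << 8)
         |  bytes[3];
}

// The raw in-memory bytes of the value; their order depends on the host,
// which is why sending them unchanged to another machine is fragile.
ubyte[4] rawBytes(uint value)
{
    ubyte[4] bytes = (cast(ubyte*) &value)[0 .. 4];
    return bytes;
}
```

toWire and fromWire round-trip on any machine, whereas the result of rawBytes only reads back correctly on a host with the same byte order as the sender.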

Reading operates in a similar manner:
auto file = new FileConduit ("foo.bin", FileConduit.ReadWriteCreate);
auto write = new Writer (file);
auto read = new Reader (file);
auto foo = new Foo;

write (foo);
read (foo);
Notice that reading does not have to concern itself with the size of the array 'z'. There are configurable 'allocators' under the covers, which manage array allocation. One of these is a SliceAllocator (which slices the buffer content directly -- no heap activity for transient usage); a few others manage memory from a local pool, or hit the heap directly, etc. One can also read directly into a stack-array, if there's a need to manage things explicitly.
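The slicing idea can be illustrated with a short sketch in present-day D. The function readArray and its one-byte length prefix are assumptions made for this example only; Tango's actual wire format and allocator interfaces are not shown here.

```d
// Reads a one-byte length prefix at buffer[pos], then returns the
// following `length` bytes as a slice of the buffer itself.
// Nothing is copied and nothing is allocated on the heap.
const(ubyte)[] readArray(const(ubyte)[] buffer, ref size_t pos)
{
    size_t length = buffer[pos];
    auto start = pos + 1;
    pos = start + length;        // advance past prefix and payload
    return buffer[start .. pos]; // out-of-range input fails D's bounds check
}
```

The returned slice aliases the buffer, so it is only valid until the buffer content is overwritten or refilled. That trade-off is presumably why a slicing strategy suits transient use, while pooled or heap allocation suits data that must outlive the buffer.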

A note on classes and heap-activity: I've heard a few people grumble about allocating classes and so on, versus using some other approach. What's usually missing in such an 'argument' is consideration for the implicit or explicit heap activity prevalent in subsequent actions; e.g. a look at the bigger picture of what's going on. I see people allocating willy-nilly during the processing of content, doing things such as array concatenation and calling static functions which allocate from the heap. That's frustrating. On the other hand, tango.io is intended to avoid all heap allocation where needed, through the allocation or pooling of constructs such as Buffer and Conduit "up front" and reusing them. And hey, one can always create class instances on the stack in D :)

In fact, if there's one general notion prevalent through Tango, it is to generally avoid unwarranted heap activity. This is absolutely not the case in other common D libraries ;)
JarrettBillingsley

Joined: 06/21/06

Posts: 26

Posted: 06/03/07 21:47:06

Think how that would mess things up when, for example, passing a struct from one machine to another? If the two have different endianness, the struct content would be invalid upon arrival.

What if certain file formats checked for that and didn't allow loading of files of the wrong endianness ;)

I just thought it would be a performance enhancement to write out some structs in chunks.

And hey, one can always create class instances on the stack in D :)

And I've been making plenty of use of that, too.

kris

Joined: 03/27/04

Posts: 581

Posted: 06/03/07 22:05:09

It's only a notable performance enhancement if the output is unbuffered. In the long run, buffered IO is generally preferable :)
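A sketch of why buffering removes most of the benefit of writing a struct as one chunk, in present-day D with hypothetical classes (CountingSink stands in for an unbuffered device and BufferedSink for a buffer in front of it; neither is a Tango type). Many small field-sized writes collapse into a single call on the underlying device.

```d
// Stand-in for an unbuffered device: counts how often it is hit.
class CountingSink
{
    size_t calls;
    size_t total;

    void write(const(ubyte)[] src)
    {
        ++calls;
        total += src.length;
    }
}

// Collects small writes in a fixed block allocated once, up front,
// and forwards them to the sink only when full or when flushed.
class BufferedSink
{
    private CountingSink sink;
    private ubyte[] store;
    private size_t used;

    this(CountingSink sink, size_t capacity)
    {
        this.sink = sink;
        store = new ubyte[capacity];
    }

    void write(const(ubyte)[] src)
    {
        if (src.length > store.length - used)
            flush();                 // make room first
        if (src.length > store.length)
        {
            sink.write(src);         // too large to buffer: pass through
            return;
        }
        store[used .. used + src.length] = src[];
        used += src.length;
    }

    void flush()
    {
        if (used)
            sink.write(store[0 .. used]);
        used = 0;
    }
}
```

Writing ten four-byte fields through a BufferedSink with a 64-byte capacity and then flushing reaches the sink once, the same number of device calls as writing the 40 bytes as a single chunk.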