Odds and Ends

 
pragma



Joined: 28 May 2004
Posts: 607
Location: Washington, DC

Posted: Mon Feb 13, 2006 12:10 pm    Post subject: Odds and Ends

Kris,

I just thought I'd drop a new thread here about some observations I've had in my current work with Mango. Maybe some of these would make good candidates for inclusion into the library.

- peek() methods for Reader. The current reader doesn't support any kind of look-ahead, making it very difficult to handle some parsing tasks. Something like peek() would allow for examining what's next in the buffer without calling getBuffer.get(x,false) as it would be a bit more typesafe.

- The Text.text package seems to be missing a replace(T[], T[], T[]) method. Other variants like replace(T[], T, T[]) and replace(T[], T[], T) are also missing, but are probably far less important/useful.

- Maybe I'm working with an out of date Mango distro, but it looks like EofException isn't used. This makes it very tricky to use the "while not eof" idiom. Also, catching the EoF exception seems to be the only way to determine if you've hit the end of the file or not; there's no IConduit.isEof() or somesuch.

Also, is there a quick and dirty way to get all of a file's contents via an IConduit or IBuffer? I see that FileConduit has a great shorthand method to get the job done, but it relies on obtaining the file's length before reading. I suppose I could seek to the end and read the position, then read a buffer of that size, but then what about non-seekable conduits?

IMO, something like this would be very useful as a getAll() or getRemaining() method on IReader.
_________________
-- !Eric.t.Anderton at gmail
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Mon Feb 13, 2006 1:30 pm    Post subject: Re: Odds and Ends

Appreciate the feedback, as always, Eric.

pragma wrote:
- peek() methods for Reader. The current reader doesn't support any kind of look-ahead, making it very difficult to handle some parsing tasks. Something like peek() would allow for examining what's next in the buffer without calling getBuffer.get(x,false) as it would be a bit more typesafe.

peek() is something I've toyed with, but it doesn't seem to suit the 'Reader' idiom very well. For example, Reader expects you to know what's next in the buffer ~ it simply tries to convert whatever is there into what you asked for. How should peek() distinguish a char from a double?

I don't suppose Buffer.readable() would be of any help there?

pragma wrote:
- The Text.text package seems to be missing a replace(T[], T[], T[]) method. Other variants like replace(T[], T, T[]) and replace(T[], T[], T) are also missing, but are probably far less important/useful.

I'll add the appropriate methods to the Text package. Thanks for the prodding!

pragma wrote:
- Maybe I'm working with an out of date Mango distro, but it looks like EofException isn't used. This makes it very tricky to use the "while not eof" idiom. Also, catching the EoF exception seems to be the only way to determine if you've hit the end of the file or not; there's no IConduit.isEof() or somesuch.

EOF testing is, I think, one of those Iterator style things that go hand-in-hand with lookahead, and I'll add that to the list of things to address. There's no explicit Conduit.isEof() since the idiom doesn't seem to apply in some cases. For example, how would it work with a socket that's meant to be kept open (a la "keep alive")? The Reader would perhaps need to wrap the comms-protocol to understand when a virtual-eof was reached? The problem is that, in general, one doesn't have the necessary info to support isEof() ~ at least, not before it actually happens :)

Having said that, Conduit will return Eof when it has nothing more to provide. Atop of that, both Reader & Buffer will throw an IOException when you try to read more than the conduit can provide (Buffer.get() will throw the exception). You can also use the Buffer.readable() method to see how many bytes are left unread; but that's for the buffered content, not for the entire conduit. The idea there was that since you can't reliably perform isEof() testing (in any generic IO design I know of) then the input should always be buffered such that you can easily see what's left there instead.

pragma wrote:
Also, is there a quick and dirty way to get all of a file's contents via an IConduit or IBuffer? I see that FileConduit has a great shorthand method to get the job done, but it relies on obtaining the file's length before reading. I suppose I could seek to the end and read the position, then read a buffer of that size, but then what about non-seekable conduits?

IMO, something like this would be very useful as a getAll() or getRemaining() method on IReader.

Getting the remaining content from a buffer can be done via toString(), whose name is somewhat of a misnomer. I agree it should be doable though.

One question is, where would the content go? Should the Buffer just keep increasing in size until everything fits? If so, then GrowBuffer would perhaps be the place to host such a method? Presumably you'd call GrowBuffer.readAll() and it would expand until Conduit.read() had returned an Eof?

An interim (cook your own) alternative might look something like this:

Code:
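void[] readAll(IConduit conduit)
{
  uint filled;                     // bytes read so far (zero-initialized by D)
  auto content = new byte[8192];

  while (true)
  {
    // read into the unused tail of the array
    uint chunk = conduit.read(content[filled..$]);
    if (chunk is conduit.Eof)
      break;

    filled += chunk;

    // grow by another 8KB once less than 1KB of free space remains
    if (content.length - filled < 1024)
      content.length = content.length + 8192;
  }
  return content[0..filled];
}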
pragma



Joined: 28 May 2004
Posts: 607
Location: Washington, DC

Posted: Mon Feb 13, 2006 3:33 pm    Post subject: Re: Odds and Ends

kris wrote:
Appreciate the feedback, as always, Eric.


NP ;) My apologies if this reply seems a bit off-kilter. Seems I got ahold of some bad Guinness last night - or rather, it got ahold of me. :oops:

Quote:
peek() is something I've toyed with, but it doesn't seem to suit the 'Reader' idiom very well. For example, Reader expects you to know what's next in the buffer ~ it simply tries to convert whatever is there into what you asked for. How should peek() distinguish a char from a double?

I don't suppose Buffer.readable() would be of any help there?


readable() won't exactly do the job for reasons you cited at the end of your post. End of buffer != end of conduit, so you need more logic than that method provides, much like Reader.read().

Anyway, I had a longer response drafted, but I scrubbed it as I realized a much more elegant solution: How about a LookAheadReader, that simply extends Reader and overrides the behavior to not advance on read?

By itself it wouldn't look very useful as it would never advance the buffer's internal pointer. But were it used in conjunction with a normal reader on a shared buffer, it could get some pretty nice results.

Barring that, I would advocate for a simple peek(n) method that returns a ubyte[], representing the next n bytes regardless of the buffer's state (pull in more bytes from the conduit if needed). My vote is for it to live on the Reader class, but it could just as easily be a part of Buffer.
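
To make the peek(n) idea a bit more concrete, here is a rough standalone sketch. It is written in plain C++ rather than against Mango's real classes, and every name in it (ChunkSource, PeekBuffer, fill) is made up for illustration. The only point is the behaviour: peek(n) returns the next n bytes without consuming them, pulling more from the source when the buffered content is too short.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a conduit: hands out its data in small chunks
// and returns 0 once it has nothing more to provide.
class ChunkSource {
public:
    ChunkSource(std::string data, std::size_t chunk)
        : data_(std::move(data)), chunk_(chunk) {}

    std::size_t read(unsigned char* dst, std::size_t max) {
        std::size_t n = std::min({max, chunk_, data_.size() - pos_});
        std::copy_n(data_.begin() + pos_, n, dst);
        pos_ += n;
        return n;
    }

private:
    std::string data_;
    std::size_t chunk_;
    std::size_t pos_ = 0;
};

// Hypothetical buffer with look-ahead (not Mango's Buffer).
class PeekBuffer {
public:
    explicit PeekBuffer(ChunkSource& src) : src_(src) {}

    // Next n bytes, left in place for a later get().
    std::vector<unsigned char> peek(std::size_t n) {
        fill(n);
        return std::vector<unsigned char>(pending_.begin(), pending_.begin() + n);
    }

    // Next n bytes, consumed.
    std::vector<unsigned char> get(std::size_t n) {
        auto out = peek(n);
        pending_.erase(pending_.begin(), pending_.begin() + n);
        return out;
    }

    // Bytes already buffered; says nothing about what the source still holds.
    std::size_t readable() const { return pending_.size(); }

private:
    // Pull from the source until at least n bytes are buffered.
    void fill(std::size_t n) {
        unsigned char tmp[64];
        while (pending_.size() < n) {
            std::size_t got = src_.read(tmp, sizeof tmp);
            if (got == 0)
                throw std::runtime_error("end of input");
            pending_.insert(pending_.end(), tmp, tmp + got);
        }
    }

    ChunkSource& src_;
    std::vector<unsigned char> pending_;
};
```

A parser would call peek(1) to decide what comes next and then get() the right amount. Note that peek() still has to throw when the source runs dry, so it does not remove the end-of-input problem; it only moves it.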

Quote:
I'll add the appropriate methods to the Text package. Thanks for the prodding!


Woot!

Quote:
EOF testing is, I think, one of those Iterator style things that go hand-in-hand with lookahead, and I'll add that to the list of things to address. There's no explicit Conduit.isEof() since the idiom doesn't seem to apply in some cases. For example, how would it work with a socket that's meant to be kept open (a la "keep alive")? The Reader would perhaps need to wrap the comms-protocol to understand when a virtual-eof was reached?


Gotcha. My $0.02 on the matter is that if you have an isEof on a conduit, it would simply return 'false' in situations where it doesn't apply - much like isSeekable(). While not technically correct for conduits that have no discernible end (except when closed), IMO it's not totally misleading either.

Quote:
The problem is that, in general, one doesn't have the necessary info to support isEof() ~ at least, not before it actually happens :)


Right. I think I understand. I can see how even a one-byte lookahead could cause issues design-wise (non-blocking sockets and non-allocating designs being two of them). So using exceptions is the only way to go since you have to be committed to handling the read output?

Quote:
Having said that, Conduit will return Eof when it has nothing more to provide. Atop of that, both Reader & Buffer will throw an IOException when you try to read more than the conduit can provide (Buffer.get() will throw the exception).


Well, the problem here is that IOException by itself isn't very useful, since the only way to discern an EOF from other exceptions is by comparing the string contents of the exception itself:

Code:
// in Reader.d - read() method:
buffer.error ("end of input"); //Reader.d line 696


Code:
// in Buffer.d
final void error (char[] msg)
{
   throw new IOException (msg);
}


To me, everything seems to be funneled through buffer.error(), which only generates IOException. Meanwhile, EofException seems oddly left out of the design. Is it being used anywhere? ;)

Quote:
Getting the remaining content from a buffer can be done via toString(), whose name is somewhat of a misnomer. I agree it should be doable though.


That's still not 100% what's needed though, since it only gets you [position..limit] rather than everything that's in the underlying Conduit. But it's still pretty darn handy (if misnamed).

Quote:
One question is, where would the content go? Should the Buffer just keep increasing in size until everything fits? If so, then GrowBuffer would perhaps be the place to host such a method? Presumably you'd call GrowBuffer.readAll() and it would expand until Conduit.read() had returned an Eof?


That makes the most sense. It's either that, or place it on Reader and use the array allocator to do the allocation work instead?

Quote:
An interim (cook your own) alternative might look something like this:
Code:
auto content = new byte[8192];


:D I was afraid you might say that. I'll probably do just that for now, but I'm keeping my eye open for a more obvious "mango style" solution - the library has done so well without using set-aside buffers that it's starting to make me rethink how I approach I/O tasks like this.
_________________
-- !Eric.t.Anderton at gmail
sean



Joined: 24 Jun 2004
Posts: 609
Location: Bay Area, CA

Posted: Mon Feb 13, 2006 5:49 pm    Post subject: Re: Odds and Ends

kris wrote:
Appreciate the feedback, as always, Eric.
peek() is something I've toyed with, but it doesn't seem to suit the 'Reader' idiom very well. For example, Reader expects you to know what's next in the buffer ~ it simply tries to convert whatever is there into what you asked for. How should peek() distinguish a char from a double?

Is putback or unget available? If so, missing peek shouldn't be a big deal. Though this does seem to be a feature more suited to Buffer. How does one peek into an unbuffered conduit?
Quote:
EOF testing is, I think, one of those Iterator style things that go hand-in-hand with lookahead, and I'll add that to the list of things to address. There's no explicit Conduit.isEof() since the idiom doesn't seem to apply in some cases. For example, how would it work with a socket that's meant to be kept open (a la "keep alive")? The Reader would perhaps need to wrap the comms-protocol to understand when a virtual-eof was reached? The problem is that, in general, one doesn't have the necessary info to support isEof() ~ at least, not before it actually happens :)

Aye, in most cases there's no way to know of an EOF condition until a read failure occurs. Though at that point, is a "bad reader state" flag set somewhere? Streams in C++ allow state flags to be set and reset, for example, but I'm not certain if such behavior is appropriate here.
Quote:
pragma wrote:
Also, is there a quick and dirty way to get all of a file's contents via an IConduit or IBuffer? I see that FileConduit has a great shorthand method to get the job done, but it relies on obtaining the file's length before reading. I suppose I could seek to the end and read the position, then read a buffer of that size, but then what about non-seekable conduits?

IMO, something like this would be very useful as a getAll() or getRemaining() method on IReader.

Getting the remaining content from a buffer can be done via toString(), whose name is somewhat of a misnomer. I agree it should be doable though.

One question is, where would the content go? Should the Buffer just keep increasing in size until everything fits? If so, then GrowBuffer would perhaps be the place to host such a method? Presumably you'd call GrowBuffer.readAll() and it would expand until Conduit.read() had returned an Eof?

The C++ stream model simply provides a method to access the underlying buffer. So an optimal file copy is simply a matter of doing this:
Code:
std::ifstream ifile( "in.txt" );
std::ofstream ofile( "out.txt" );
ofile << ifile.rdbuf();

In this case, rdbuf returns a reference to the underlying read buffer, and the write method for stream buffers simply consumes the buffer until its state changes. This is far more efficient than a looping get into a local buffer followed by a write to the destination.
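
For anyone who wants to try the idiom without touching the filesystem, here is a small self-contained variant. The helper names copyStream and slurp are my own, purely for illustration; the only standard pieces involved are rdbuf() and the operator<< overload that accepts a stream buffer.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: drain everything left in `in` into `out` by handing
// the output stream the input's stream buffer directly.
void copyStream(std::istream& in, std::ostream& out) {
    out << in.rdbuf();
}

// Hypothetical helper: load whatever remains in a stream into a string,
// using the same mechanism.
std::string slurp(std::istream& in) {
    std::ostringstream collected;
    collected << in.rdbuf();
    return collected.str();
}
```

Both helpers start from the stream's current read position, so slurp() after a partial read returns only the remainder. One wrinkle worth knowing: if the source has nothing left to give, the insertion sets failbit on the destination stream.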

Aside from efficient stream redirection/copying, the only use I can think of for this is to load an entire file into memory for later use, and doesn't Mango have a MemFile class for this purpose?
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Mon Feb 13, 2006 9:29 pm    Post subject: Re: Odds and Ends

pragma wrote:
Anyway, I had a longer response drafted, but I scrubbed it as I realized a much more elegant solution: How about a LookAheadReader, that simply extends Reader and overrides the behavior to not advance on read?

By itself it wouldn't look very useful as it would never advance the buffer's internal pointer. But were it used in conjunction with a normal reader on a shared buffer, it could get some pretty nice results.

:D That's what I really like about shared-buffer designs. However, there's a potential problem where the buffer needs to be reloaded to perform the lookahead, but then you can't easily "back up" to the prior state. Thus, it would need a design whereby you commit yourself to moving forward (in general). Sean suggested using unget() instead, which would always work in this scenario (at least, where the size is less than how much has been read from the buffer). Buffer currently has a skip() method which accepts negative offsets. So, a Reader might be constructed to maintain the size of the last chunk read from the buffer, and use that as the unget() value? Alternatively, the client code could handle that via the x.sizeof property? I suppose a first pass would be to expose the skip() or unget() method at the Reader level, yes?

pragma wrote:
Quote:
The problem is that, in general, one doesn't have the necessary info to support isEof() ~ at least, not before it actually happens :)


Right. I think I understand. I can see how even a one-byte lookahead could cause issues design-wise (non-blocking sockets and non-allocating designs being two of them). So using exceptions is the only way to go since you have to be committed to handling the read output?

With a Reader design, yes, I think so. A reader is a pretty pedantic creature ~ it's really just a converter of existing data. When you ask it to convert something and there's no more data, it's treated as an exceptional condition since there are no default values anywhere. Iterators, on the other hand, are typically much more relaxed about such conditions (because one has to explicitly test for 'more' anyway). This is partly why I've been looking into a hybrid approach ~ just to see what comes out of it.

pragma wrote:
Quote:
Having said that, Conduit will return Eof when it has nothing more to provide. Atop of that, both Reader & Buffer will throw an IOException when you try to read more than the conduit can provide (Buffer.get() will throw the exception).


Well, the problem here is that IOException by itself isn't very useful, since the only way to discern an EOF from other exceptions is by comparing the string contents of the exception itself:

Code:
// in Reader.d - read() method:
buffer.error ("end of input"); //Reader.d line 696


Code:
// in Buffer.d
final void error (char[] msg)
{
   throw new IOException (msg);
}


To me, everything seems to be funneled through buffer.error(), which only generates IOException. Meanwhile, EofException seems oddly left out of the design. Is it being used anywhere? ;)

Nope <g>. Perhaps each of the primary IO type of exceptions should be isolated? Underflow, overflow, Eof, and so on?
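
As a rough illustration of what isolating them might buy, here is a standalone sketch. It is plain C++ rather than Mango code, and the names (IoError, EofError, UnderflowError, CharReader) are invented for the example. The point is simply that a caller can trap end-of-input by type instead of comparing message strings, while every other I/O failure still propagates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical hierarchy: one base type for I/O failures, with a distinct
// subclass per condition so callers can catch selectively.
struct IoError : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct EofError : IoError {
    EofError() : IoError("end of input") {}
};
struct UnderflowError : IoError {
    UnderflowError() : IoError("buffer underflow") {}
};

// Toy reader over a string; throws EofError once the data runs out.
class CharReader {
public:
    explicit CharReader(std::string data) : data_(std::move(data)) {}

    char get() {
        if (pos_ >= data_.size())
            throw EofError();
        return data_[pos_++];
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// The "while not eof" idiom, driven by the exception type.
std::string readAll(CharReader& reader) {
    std::string out;
    try {
        for (;;)
            out.push_back(reader.get());
    } catch (const EofError&) {
        // Expected: end of input reached. Any other IoError still propagates.
    }
    return out;
}
```

Code that doesn't care about the distinction can keep catching the base type, so existing handlers for the general exception would continue to work.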

pragma wrote:
Quote:
One question is, where would the content go? Should the Buffer just keep increasing in size until everything fits? If so, then GrowBuffer would perhaps be the place to host such a method? Presumably you'd call GrowBuffer.readAll() and it would expand until Conduit.read() had returned an Eof?


That makes the most sense. It's either that, or place it on Reader and use the array allocator to do the allocation work instead?

I suspect it belongs on the Buffer, since Reader is more about conversion.

pragma wrote:
Quote:
An interim (cook your own) alternative might look something like this:
Code:
auto content = new byte[8192];


:D I was afraid you might say that. I'll probably do just that for now, but I'm keeping my eye open for a more obvious "mango style" solution - the library has done so well without using set-aside buffers that it's starting to make me rethink how I approach I/O tasks like this.

Another approach would be to read the entire file into a buffer before you start. The mango.io.File class (or UnicodeFile) will load content into an array for you, and you can just pass that to a Buffer ctor. Then, when you get to the stage where you need the rest of the content, use buffer.toString() or equivalent?

One might do something similar to mango.io.File:
Code:
auto file = new FileConduit(path);
auto data = new byte[file.length];
file.read (data);

auto buffer = new Buffer (data);

I know that's not what you're looking for, but thought it worth tossing into the pool anyway :)


Last edited by kris on Mon Feb 13, 2006 9:58 pm; edited 1 time in total
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Mon Feb 13, 2006 9:48 pm    Post subject: Re: Odds and Ends

sean wrote:
kris wrote:
Appreciate the feedback, as always, Eric.
peek() is something I've toyed with, but it doesn't seem to suit the 'Reader' idiom very well. For example, Reader expects you to know what's next in the buffer ~ it simply tries to convert whatever is there into what you asked for. How should peek() distinguish a char from a double?

Is putback or unget available? If so, missing peek shouldn't be a big deal. Though this does seem to be a feature more suited to Buffer. How does one peek into an unbuffered conduit?

Yes ~ there's a skip() method on the buffer, which will reload the buffer going forward (as necessary) and also accepts negative arguments for backing up as much as has currently been read within the buffer extent. This, as you note, allows one to push content back after it's been extracted.

sean wrote:
Quote:
EOF testing is, I think, one of those Iterator style things that go hand-in-hand with lookahead, and I'll add that to the list of things to address. There's no explicit Conduit.isEof() since the idiom doesn't seem to apply in some cases. For example, how would it work with a socket that's meant to be kept open (a la "keep alive")? The Reader would perhaps need to wrap the comms-protocol to understand when a virtual-eof was reached? The problem is that, in general, one doesn't have the necessary info to support isEof() ~ at least, not before it actually happens :)

Aye, in most cases there's no way to know of an EOF condition until a read failure occurs. Though at that point, is a "bad reader state" flag set somewhere? Streams in C++ allow state flags to be set and reset, for example, but I'm not certain if such behavior is appropriate here.

Mango should just keep returning Eof from Conduit.read(), which everything else is routed onto. Thus, appropriate exceptions (where applicable) should continue to be thrown upon subsequent Reader requests.

sean wrote:
The C++ stream model simply provides a method to access the underlying buffer. So an optimal file copy is simply a matter of doing this:
Code:
std::ifstream ifile( "in.txt" );
std::ofstream ofile( "out.txt" );
ofile << ifile.rdbuf();

In this case, rdbuf returns a reference to the underlying read buffer, and the write method for stream buffers simply consumes the buffer until its state changes. This is far more efficient than a looping get into a local buffer followed by a write to the destination.

Aside from efficient stream redirection/copying, the only use I can think of for this is to load an entire file into memory for later use, and doesn't Mango have a MemFile class for this purpose?

Mango does something very similar, and uses Buffer to cross the bridge. The equivalent is
Code:
dstConduit.copy (srcConduit);

An example might be
Code:
Cout.conduit.copy (new FileConduit("myFile"));

Or
Code:
Stdout.conduit.copy (new SocketConduit(new InternetAddress("someplace")));

As you say, there's also mango.io.File which reads a file into memory. There's also UnicodeFile for BOM oriented files, and MappedBuffer for memory-mapped IO. Each of these can feed Reader (or Iterator) one way or another.
pragma



Joined: 28 May 2004
Posts: 607
Location: Washington, DC

Posted: Thu Feb 16, 2006 4:42 am    Post subject: Re: Odds and Ends

kris wrote:
Sean suggested using unget() instead, which would always work in this scenario (at least, where the size is less than how much has been read from the buffer). Buffer currently has a skip() method which accepts negative offsets. So, a Reader might be constructed to maintain the size of the last chunk read from the buffer, and use that as the unget() value? Alternatively, the client code could handle that via the x.sizeof property? I suppose a first pass would be to expose the skip() or unget() method at the Reader level, yes?


That would work. I guess the Reader would keep the unget value as a sort of look-aside buffer, or would an unget push the data back to the underlying buffer instead?

Anyway, I'm going to make a stab at a LookAheadReader to see what I can come up with.

Quote:
Nope <g>. Perhaps each of the primary IO type of exceptions should be isolated? Underflow, overflow, Eof, and so on?


That might be a good idea. If nothing else, trapping just Eof or just Overflow would be far more useful than catching IOException everywhere. :)
_________________
-- !Eric.t.Anderton at gmail
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Thu Feb 16, 2006 11:09 am    Post subject: Re: Odds and Ends

pragma wrote:
That would work. I guess the Reader would keep the unget value as a sort of look-aside buffer, or would an unget push the data back to the underlying buffer instead?

It would do a Buffer.skip (-sizeof(lastElement));
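
Roughly along these lines, as a standalone sketch. This is plain C++ with invented names (ToyBuffer, ToyReader), not the actual Mango classes, and it assumes the whole content is already in memory so that backing up never crosses a reload boundary.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical in-memory buffer with a read position. skip() accepts
// negative offsets so already-read content can be pushed back.
class ToyBuffer {
public:
    explicit ToyBuffer(std::vector<unsigned char> data) : data_(std::move(data)) {}

    void get(void* dst, std::size_t n) {
        if (n > data_.size() - pos_)
            throw std::runtime_error("end of input");
        std::memcpy(dst, data_.data() + pos_, n);
        pos_ += n;
    }

    void skip(std::ptrdiff_t offset) {
        std::ptrdiff_t target = static_cast<std::ptrdiff_t>(pos_) + offset;
        if (target < 0 || static_cast<std::size_t>(target) > data_.size())
            throw std::out_of_range("skip outside buffer");
        pos_ = static_cast<std::size_t>(target);
    }

private:
    std::vector<unsigned char> data_;
    std::size_t pos_ = 0;
};

// Hypothetical reader that remembers the size of the last element it
// converted, so unget() can back up by exactly that much.
class ToyReader {
public:
    explicit ToyReader(ToyBuffer& buffer) : buffer_(buffer) {}

    // T is assumed to be trivially copyable (char, int, double, ...).
    template <typename T>
    T read() {
        T value;
        buffer_.get(&value, sizeof(T));
        last_ = sizeof(T);
        return value;
    }

    void unget() {
        buffer_.skip(-static_cast<std::ptrdiff_t>(last_));
        last_ = 0;  // only one level of unget
    }

private:
    ToyBuffer& buffer_;
    std::size_t last_ = 0;
};
```

So nothing is copied aside: the data stays in the buffer and only the read position moves back. A peek then amounts to a read() followed immediately by unget().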
All times are GMT - 6 Hours