The Developer's Library for D

Reading a line longer than 16 384


Posted: 04/28/07 17:32:38

DeviceConduit's buffer size is 1024 * 16, or 16 384. This means, among other things, that there's no way to read a line longer than that from stdin, at least not through Cin.

Or perhaps I just can't find such a way. Either way, I think this should work by default, or the limitation should be clearly documented somewhere.


Posted: 04/28/07 18:21:09

Is this a real problem? As in, do you have lines that long which need to be read as one? Or is proper documentation of this limit what you're after? It seems unlikely that data with newlines spaced that far apart is meant to be read line by line, and explicit support for it could be kludgy and/or hurt performance.

Posted: 04/28/07 18:38:06

It's not a real problem per se (I just ran into it while generating profiling data for a program that doesn't really expect such long input), apart from my inability to get my head around the Tango IO idioms, which is why I can't read such a line at all. It doesn't matter whether I read it in one go or not; all I want is an array containing the whole line, without reading too far: the input buffer should still contain the rest of stdin. Given a line of length, say, 32768, how do I accomplish this?

But yes, I feel that this should be clearly documented as well.

Posted: 04/28/07 20:12:15

I don't entirely get what you're trying to do, but the code below works if I understand you correctly.

char[] line = new char[32768];
Cin.conduit.read(line);

Posted: 04/28/07 20:22:47

You've understood me correctly, but what if I don't know the length of the line beforehand? For instance, I don't want to read too many bytes if the line is less than 32768 bytes long, and if the next line is 33000 I want to be able to read that, too.

Posted: 05/01/07 20:04:33

Deewiant wrote:

You've understood me correctly, but what if I don't know the length of the line beforehand? For instance, I don't want to read too many bytes if the line is less than 32768 bytes long, and if the next line is 33000 I want to be able to read that, too.

Tokenization in Tango is designed to be more efficient than in the C library, and it uses a buffer to support that. The buffer is typically a fixed size, whatever size you want, though it can default to a size indicated by an attached conduit. Part of the reason for doing this is to avoid unnecessary duplication of content, since each token can simply be sliced from the buffer instead. You only .dup when the app needs to save the result.
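To make the slicing point concrete, here is a minimal sketch in plain D. It makes no Tango calls, and `sliceLines` is a hypothetical helper, not a Tango function. Every line it returns is a slice aliasing the buffer, so it changes when the buffer is overwritten; only a `.dup` gives the application its own copy.

```d
// Sketch only, in plain D (no Tango calls). sliceLines is a hypothetical
// helper: every line it returns is a slice of buf, not a copy.
char[][] sliceLines(char[] buf)
{
    char[][] lines;
    size_t start = 0;
    foreach (i, c; buf)
    {
        if (c == '\n')
        {
            lines ~= buf[start .. i];
            start = i + 1;
        }
    }
    if (start < buf.length)
        lines ~= buf[start .. $];  // trailing line without a newline
    return lines;
}

void main()
{
    char[] buf = "alpha\nbeta\ngamma".dup;
    char[][] lines = sliceLines(buf);
    char[] saved = lines[1].dup;  // private copy of "beta"

    buf[6] = 'B';                 // simulate the buffer being refilled
    assert(lines[1] == "Beta");   // the slice sees the new content
    assert(saved == "beta");      // the .dup is unaffected
}
```

Slicing costs nothing per token; the price is that a caller who keeps a token past the next refill must copy it first.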

It would be possible to ask the buffer to resize itself when a token is too long to fit within it, but then what happens when you tokenize a huge file with no delimiters at all? I think we're now into application-specific behaviour, in which case it would be appropriate for the application to take some action instead. After all, this does seem to be a very rare case: one where a massive token is apparently not considered a critical application error.

Note that Tango will not "read too many bytes": it simply streams content into a buffer and tokenizes the array represented by the buffer. When the current buffer content is consumed, more is read. Unread content remains in the buffer for subsequent read activity.
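The refill cycle described above can be sketched in plain D. This is not Tango's implementation: `Source` stands in for a conduit such as stdin, and `LineReader`, `next` and the capacity parameter are hypothetical names. Lines come back as slices of a fixed buffer, unread bytes are moved to the front before each refill, and a line that cannot fit raises an exception instead of growing the buffer, leaving that policy to the application.

```d
// Sketch only, in plain D: NOT Tango's Buffer. Source, LineReader and
// next() are hypothetical names. Source stands in for a conduit (stdin).
struct Source
{
    const(char)[] data;
    size_t pos;

    // Copies up to dst.length bytes into dst; returns 0 at end of input.
    size_t read(char[] dst)
    {
        size_t n = data.length - pos;
        if (n > dst.length) n = dst.length;
        dst[0 .. n] = data[pos .. pos + n];
        pos += n;
        return n;
    }
}

struct LineReader
{
    Source* src;
    char[] buf;   // fixed capacity, chosen by the caller
    size_t head;  // start of unread content
    size_t tail;  // end of valid content

    this(Source* src, size_t capacity)
    {
        this.src = src;
        buf = new char[capacity];
    }

    // On success, line is a slice of buf, valid only until the next call.
    // Returns false at end of input; throws unless line < buffer capacity.
    bool next(out char[] line)
    {
        size_t scanFrom = head;
        while (true)
        {
            foreach (i; scanFrom .. tail)
            {
                if (buf[i] == '\n')
                {
                    line = buf[head .. i];
                    head = i + 1;
                    return true;
                }
            }
            // No newline yet: move unread bytes to the front. The regions
            // may overlap, so copy element by element, front to back.
            size_t pending = tail - head;
            foreach (k; 0 .. pending)
                buf[k] = buf[head + k];
            head = 0;
            tail = pending;
            if (tail == buf.length)
                throw new Exception("line does not fit in the buffer");
            scanFrom = tail;
            size_t n = src.read(buf[tail .. $]);
            if (n == 0)
            {
                if (tail == 0)
                    return false;       // clean end of input
                line = buf[0 .. tail];  // final line without a newline
                head = tail;
                return true;
            }
            tail += n;
        }
    }
}

void main()
{
    char[] longLine = new char[40];
    longLine[] = 'x';
    const(char)[] text = "short\n" ~ longLine ~ "\nend";
    char[] line;

    // A 16-byte buffer cannot hold the 40-byte line.
    auto s1 = Source(text);
    auto small = LineReader(&s1, 16);
    bool ok = small.next(line);
    assert(ok && line == "short");
    bool failed = false;
    try { small.next(line); } catch (Exception e) { failed = true; }
    assert(failed);

    // With a 64-byte buffer, every line comes through.
    auto s2 = Source(text);
    auto big = LineReader(&s2, 64);
    ok = big.next(line); assert(ok && line == "short");
    ok = big.next(line); assert(ok && line.length == 40);
    ok = big.next(line); assert(ok && line == "end");
    ok = big.next(line); assert(!ok);
}
```

Passing a larger capacity is only loosely analogous to giving Cin's buffer a bigger array via setContent; the sketch says nothing about how Tango's buffer behaves internally. Note that the reader never consumes input beyond what fits in its buffer, and unread bytes stay there for the next call.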

To reset the size of the Cin buffer, try the equivalent of this: Cin.buffer.setContent(new byte[yourMaxTokenSize]);

(Note that all streaming-token processing is typically handled this way in Tango: lines, delimiters, patterns, etc.)