teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
Posted: Fri Nov 04, 2005 4:42 pm Post subject: HUGE memory allocations |
So for some reason the following code
Code:
    import mango.io.FileConduit;
    import mango.io.Reader;
    import mango.io.TextReader;
    import mango.io.model.IReader;

    void main()
    {
        auto FileConduit fc = new FileConduit ("1.n", FileStyle.ReadExisting);
        uint length = fc.length();
        IReader r = new TextReader (fc.createBuffer());

        while (fc.getPosition() < length)
        {
            char[] str;
            r (str);
        }
    }
(with "1.n" being located at:
http://www.wam.umd.edu/~teqdruid/1.n)
tries to allocate a massive amount of memory on both of my Linux boxes.
Not surprisingly, when I change Reader to TextReader, it works fine. Although this is my program's bug, this shouldn't happen.
This was an interesting issue... three separate bugs came together to appear as one and totally clobber my program! I don't see that too often.
~John
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
Posted: Fri Nov 04, 2005 7:06 pm Post subject: |
That sucks.
With a Reader, array input is handled by reading the element count from the file (it prefixes the array data). In this case, the first four bytes are treated as the number of following chars to read into the char array ... hence the enormous allocation.
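The failure mode kris describes is easy to reproduce in any language. Here is a minimal sketch in Python (an illustration of the mechanism only, not Mango code): interpret the first four bytes of an ordinary text file as a little-endian element count, and an innocent line of text becomes a gigantic allocation request.

```python
import struct

# The start of a plain text file, e.g. a serialized string per line.
data = b"Hello, world\n"

# A binary-style reader expecting a length-prefixed array treats the
# first four bytes as the element count (little-endian here for the sake
# of illustration; the actual byte order depends on the writer).
count = struct.unpack("<I", data[:4])[0]

print(count)  # "Hell" as a 32-bit integer: 1819043144, i.e. ~1.7 GB of chars
```

So a reader that trusts the "length prefix" will happily try to allocate well over a gigabyte before reading a single real element.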
I've been bitten by something similar in the past, and have seen posts from others about the same issue with the Phobos IO modules. Any suggestions on how to avoid it happening? How does one know how much of a binary file to read for an array, without using counters? Failing that, how would one prevent the use of the plain, binary-style Reader in such cases?
(BTW: it's not clear to me what the code is trying to do. Is this purely a test example?)
sean
Joined: 24 Jun 2004 Posts: 609 Location: Bay Area, CA
Posted: Fri Nov 04, 2005 9:31 pm Post subject: |
For binary input, there's little else you can do :p Though I suppose you could set the file size as an upper bound in the instances where that information is available.
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
Posted: Fri Nov 04, 2005 9:52 pm Post subject: |
sean wrote:
    Though I suppose you could set the file size as an upper bound in the instances where that information is available.

That would probably be a good idea, to prevent Mango from trying to allocate OBSCENE amounts of memory (it was trying to allocate an ~888 MB chunk) for a 40-byte file.
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
Posted: Sat Nov 05, 2005 1:31 am Post subject: |
sean wrote:
    For binary input, there's little else you can do :p Though I suppose you could set the file size as an upper bound in the instances where that information is available.

Unfortunately, there's no way to consistently know the size ~ the content could be coming in from, for example, a socket ~ as you imply (though not a GB, one would hope!). Perhaps a perceptual change would help somewhat? How about changing the name to BinaryReader or something?
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
Posted: Sat Nov 05, 2005 1:34 am Post subject: |
teqdruid wrote:
    sean wrote:
        Though I suppose you could set the file size as an upper bound in the instances where that information is available.

    That would probably be a good idea, to prevent Mango from trying to allocate OBSCENE amounts of memory (it was trying to allocate an ~888 MB chunk) for a 40-byte file.

That, most certainly, is obscene! Please bring up any and all suggestions.
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
Posted: Sun Nov 06, 2005 1:18 am Post subject: Re: HUGE memory allocations |
teqdruid wrote:
    So for some reason the following code
    Code:
        import mango.io.FileConduit;
        import mango.io.Reader;
        import mango.io.TextReader;
        import mango.io.model.IReader;

        void main()
        {
            auto FileConduit fc = new FileConduit ("1.n", FileStyle.ReadExisting);
            uint length = fc.length();
            IReader r = new TextReader (fc.createBuffer());

            while (fc.getPosition() < length)
            {
                char[] str;
                r (str);
            }
        }

John: this is quite non-intuitive to me, which would indicate a design issue within Mango.io ~ I'd like to understand what the intent is, please?
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
Posted: Sun Nov 06, 2005 2:17 am Post subject: Re: HUGE memory allocations |
kris wrote:
    teqdruid wrote:
        So for some reason the following code
        Code:
            import mango.io.FileConduit;
            import mango.io.Reader;
            import mango.io.TextReader;
            import mango.io.model.IReader;

            void main()
            {
                auto FileConduit fc = new FileConduit ("1.n", FileStyle.ReadExisting);
                uint length = fc.length();
                IReader r = new TextReader (fc.createBuffer());

                while (fc.getPosition() < length)
                {
                    char[] str;
                    r (str);
                }
            }

    John: this is quite non-intuitive to me, which would indicate a design issue within Mango.io ~ I'd like to understand what the intent is, please?

Right. So, I'm basically serializing some objects. I'm using a TextWriter to output the strings separated by line breaks so that I can edit the files in a text editor. My reasoning was that since I'm writing them out with a TextWriter, I should read them back in with a TextReader, and it should work perfectly since it uses a line tokenizer by default. Well, not so, it turns out. Furthermore, I couldn't figure out how to find out from the reader whether or not the stream is done, so I figured I'd just use fc.getPosition (dumb move, I know). That should have been my indication that I wasn't doing this correctly. Obviously, the above code doesn't work as I thought it would, since the FileConduit will get to the end before I finish reading from the buffer via the TextReader.
I've since switched from code similar to the above to using Tokenizers.Line and a buffer from fc.createBuffer. I run the while loop with the condition (Tokenizers.Line.next(buffer, token)) and it seems to be working rather nicely thus far.
So, the system was at least a little unintuitive.
~John
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
Posted: Sun Nov 06, 2005 2:21 am Post subject: |
kris wrote:
    sean wrote:
        For binary input, there's little else you can do :p Though I suppose you could set the file size as an upper bound in the instances where that information is available.

    Unfortunately, there's no way to consistently know the size ~ the content could be coming in from, for example, a socket ~ as you imply (though not a GB, one would hope!). Perhaps a perceptual change would help somewhat? How about changing the name to BinaryReader or something?

Consistent? Who cares? Why not set the cap whenever that information is available? Whenever a Buffer is reading from a FileConduit, cap the memory allocations at the file size. Whenever it's drawing from something else, set the cap to a reasonable constant defined in Mango ('cause an 888 MB memory allocation is too much no matter what the situation; OTOH, it did help me find my bug, so there is something to be said for consistent operation).
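The capping policy described above could look something like the following sketch, written in Python purely to show the shape of the guard; `checked_count`, `MAX_ALLOC`, and `source_length` are invented names, not part of Mango's API:

```python
# Hypothetical guard for a length-prefixed array read. All names here are
# invented for illustration; this is not the Mango API.
MAX_ALLOC = 16 * 1024 * 1024  # fallback cap when the source size is unknown

def checked_count(count, source_length=None):
    """Reject an element count that cannot possibly fit in the source."""
    # When reading from a file, no embedded array can be longer than the
    # file itself; otherwise fall back to a fixed library-defined cap.
    limit = source_length if source_length is not None else MAX_ALLOC
    if count > limit:
        raise IOError(f"refusing to allocate {count} elements (limit {limit})")
    return count

# A 40-byte file can never hold an ~888 MB array, so this fails fast:
try:
    checked_count(888 * 1024 * 1024, source_length=40)
except IOError as e:
    print(e)
```

The trade-off is exactly the one noted above: a hard cap turns a runaway allocation into an immediate, diagnosable error, at the cost of a somewhat arbitrary limit for unseekable sources such as sockets.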
~John
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
Posted: Sun Nov 06, 2005 3:05 am Post subject: Re: HUGE memory allocations |
teqdruid wrote:
    Right. So, I'm basically serializing some objects. I'm using a TextWriter to output the strings separated by line breaks so that I can edit the files in a text editor. My reasoning was that since I'm writing them out with a TextWriter, I should read them back in with a TextReader, and it should work perfectly since it uses a line tokenizer by default. Well, not so, it turns out. Furthermore, I couldn't figure out how to find out from the reader whether or not the stream is done, so I figured I'd just use fc.getPosition (dumb move, I know). That should have been my indication that I wasn't doing this correctly. Obviously, the above code doesn't work as I thought it would, since the FileConduit will get to the end before I finish reading from the buffer via the TextReader.
    I've since switched from code similar to the above to using Tokenizers.Line and a buffer from fc.createBuffer. I run the while loop with the condition (Tokenizers.Line.next(buffer, token)) and it seems to be working rather nicely thus far.
    So, the system was at least a little unintuitive.
    ~John
OK. Thanks
CR/LF terminated lines can be read as follows:
Code:
    import mango.io.Token;
    import mango.io.FileConduit;

    // open a file for reading
    FileConduit fc = new FileConduit ("test.txt");

    // create a Token and bind it to both the file and a line-tokenizer
    CompositeToken line = new CompositeToken (Tokenizers.line, fc);

    // read the file a line at a time. Method next() returns false when no
    // more delimiters are found; note there may be an unterminated line at eof
    while (line.next)
        Stdout (line) (CR);
TextReader should have been doing something similar with char[] input, but apparently did not. You're right that this is somewhat non-intuitive; I'll see what can be done about it ...
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
Posted: Sun Nov 06, 2005 4:03 am Post subject: Re: HUGE memory allocations |
kris wrote:
    CR/LF terminated lines can be read as follows:
    Code:
        import mango.io.Token;
        import mango.io.FileConduit;

        // open a file for reading
        FileConduit fc = new FileConduit ("test.txt");

        // create a Token and bind it to both the file and a line-tokenizer
        CompositeToken line = new CompositeToken (Tokenizers.line, fc);

        // read the file a line at a time. Method next() returns false when
        // no more delimiters are found; note there may be an unterminated
        // line at eof
        while (line.next)
            Stdout (line) (CR);

    TextReader should have been doing something similar with char[] input, but apparently did not. You're right that this is somewhat non-intuitive; I'll see what can be done about it ...

Right. That's essentially what I'm doing now.
~John
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
Posted: Wed Nov 30, 2005 10:27 pm Post subject: |
Finally came up with a way to handle this problem.
Pending changes include full UTF support throughout the I/O subsystem. One of those changes indicates whether the content is binary or text-based. This is then checked within the Reader/Writer ctor, and an exception is thrown on a mismatch.
Naturally, one has to set this text-based attribute somewhere ~ one of those places will be FileConduit, when a file is opened. Additionally, a new subclass called TextFile (or something similar) will be introduced. This new class will also handle BOM concerns.
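The proposed mismatch check can be sketched as follows (Python for clarity; `Conduit`, `is_text`, and `BinaryReader` are invented names standing in for whatever the real Mango classes end up being):

```python
# Illustration of the proposal: conduits carry a text/binary flag, and a
# binary reader's constructor refuses a mismatched conduit up front,
# instead of silently misreading text as length prefixes later on.
class Conduit:
    def __init__(self, is_text):
        self.is_text = is_text

class BinaryReader:
    def __init__(self, conduit):
        if conduit.is_text:
            raise ValueError("binary Reader attached to a text conduit")
        self.conduit = conduit

# e.g. the flag a TextFile-style subclass would set when opening a file
text_file = Conduit(is_text=True)
try:
    BinaryReader(text_file)
except ValueError as e:
    print(e)  # fails fast in the ctor, as proposed
```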
Do you think that will be sufficient?
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
Posted: Thu Dec 01, 2005 9:22 am Post subject: |
kris wrote:
    Finally came up with a way to handle this problem.
    Pending changes include full UTF support throughout the I/O subsystem. One of those changes indicates whether the content is binary or text-based. This is then checked within the Reader/Writer ctor, and an exception is thrown on a mismatch.
    Naturally, one has to set this text-based attribute somewhere ~ one of those places will be FileConduit, when a file is opened. Additionally, a new subclass called TextFile (or something similar) will be introduced. This new class will also handle BOM concerns.
    Do you think that will be sufficient?

This all sounds like a reasonable way of handling it. I'll comment on it again when I see the new code, however.
~John