
HUGE memory allocations

 
teqdruid



Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Fri Nov 04, 2005 4:42 pm    Post subject: HUGE memory allocations

So for some reason the following code
Code:

import mango.io.FileConduit;
import mango.io.Reader;
import mango.io.TextReader;
import mango.io.model.IReader;

void main(){
  auto FileConduit fc = new FileConduit("1.n", FileStyle.ReadExisting);
  uint length = fc.length();
  IReader r = new TextReader(fc.createBuffer());
  while (fc.getPosition() < length) {
    char[] str;
    r(str);
  }
}

(with "1.n" being located at:
http://www.wam.umd.edu/~teqdruid/1.n)

tries to allocate MASSIVE amounts of memory on both my Linux boxes.
Not surprisingly, when I change Reader to TextReader, it works fine. Although this is my program's bug, this shouldn't happen.

This was an interesting issue... Three separate bugs came together to appear as one and totally clobber my program! I don't see that too often.

~John
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Fri Nov 04, 2005 7:06 pm

That sucks.

With a Reader, array input is handled by first reading the number of elements from the file (the count prefixes the array data). In this case, the first four bytes of the file are treated as the number of following chars to read into the char array ... hence the enormous allocation :(
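
To make the mechanism above concrete, here is a minimal D sketch. It is not Mango's actual code: it assumes a little-endian 32-bit count prefix, the file content is made up, and `prefixAsLength` is a hypothetical helper. It only shows how the first four bytes of ordinary text decode to a count of over a billion elements.

```d
import std.stdio : writefln;

/// Interprets the first four bytes of `data` as a little-endian element
/// count, the way a length-prefixed binary reader would (assumed layout).
uint prefixAsLength(const(ubyte)[] data)
{
    assert(data.length >= 4, "need at least four bytes for the prefix");
    return (cast(uint) data[0])
         | (cast(uint) data[1] << 8)
         | (cast(uint) data[2] << 16)
         | (cast(uint) data[3] << 24);
}

void main()
{
    // Hypothetical file content: plain text, not length-prefixed binary.
    auto text = cast(const(ubyte)[]) "name=John\n";

    // The bytes 'n' 'a' 'm' 'e' are read as the array length.
    uint count = prefixAsLength(text);
    writefln("bogus element count: %s (0x%08X)", count, count);
}
```

A binary-style reader that trusts this count will try to allocate that many chars before it notices the file is only a few bytes long.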

I've been bitten by something similar in the past, and have seen posts by others with respect to the same issue via Phobos IO modules. Any suggestions on how to avoid it happening? How does one know how much of a binary file to read for an array, without using counters? Failing that, how would one prevent the use of the plain, binary-style Reader in such cases?

(BTW: it's not clear to me what the code is trying to do. Is this purely a test example?)
sean



Joined: 24 Jun 2004
Posts: 609
Location: Bay Area, CA

Posted: Fri Nov 04, 2005 9:31 pm

For binary input, there's little else you can do :p Though I suppose you could set the file size as an upper bound in the instances where that information is available.
teqdruid



Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Fri Nov 04, 2005 9:52 pm

sean wrote:
Though I suppose you could set the file size as an upper bound in the instances where that information is available.


That would probably be a good idea, to prevent Mango from trying to allocate OBSCENE amounts of memory for a 40-byte file (it was trying to allocate an ~888 MB chunk).
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sat Nov 05, 2005 1:31 am

sean wrote:
For binary input, there's little else you can do :p Though I suppose you could set the file size as an upper bound in the instances where that information is available.

Unfortunately, there's no way to consistently know the size ~ the content could be coming in from, for example, a socket ~ as you imply (though not a GB, one would hope!). Perhaps a perceptual change would help somewhat? How about changing the name to BinaryReader or something?
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sat Nov 05, 2005 1:34 am

teqdruid wrote:
sean wrote:
Though I suppose you could set the file size as an upper bound in the instances where that information is available.


That would probably be a good idea, to prevent Mango from trying to allocate OBSCENE amounts of memory for a 40-byte file (it was trying to allocate an ~888 MB chunk).

That, most certainly, is obscene! Please bring up any and all suggestions.
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sun Nov 06, 2005 1:18 am    Post subject: Re: HUGE memory allocations

teqdruid wrote:
So for some reason the following code
Code:

import mango.io.FileConduit;
import mango.io.Reader;
import mango.io.TextReader;
import mango.io.model.IReader;

void main(){
  auto FileConduit fc = new FileConduit("1.n", FileStyle.ReadExisting);
  uint length = fc.length();
  IReader r = new TextReader(fc.createBuffer());
  while (fc.getPosition() < length) {
    char[] str;
    r(str);
  }
}


John; this is quite non-intuitive to me, which would indicate a design issue within Mango.io ~ I'd like to understand what the intent is, please?
teqdruid



Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Sun Nov 06, 2005 2:17 am    Post subject: Re: HUGE memory allocations

kris wrote:
teqdruid wrote:
So for some reason the following code
Code:

import mango.io.FileConduit;
import mango.io.Reader;
import mango.io.TextReader;
import mango.io.model.IReader;

void main(){
  auto FileConduit fc = new FileConduit("1.n", FileStyle.ReadExisting);
  uint length = fc.length();
  IReader r = new TextReader(fc.createBuffer());
  while (fc.getPosition() < length) {
    char[] str;
    r(str);
  }
}


John; this is quite non-intuitive to me, which would indicate a design issue within Mango.io ~ I'd like to understand what the intent is, please?


Right. So, I'm basically serializing some objects. I'm using a TextWriter to output the strings separated by line breaks so that I can edit the files in a text editor. My reasoning was that since I'm writing them out using a TextWriter, I should read them in using a TextReader, and it should work perfectly since it uses a Line tokenizer by default. Well, not so, it turns out. Furthermore, I couldn't figure out how to tell from the reader whether or not the stream is done, so I figured I'd just use fc.getPosition (dumb move, I know). This should have been my indication that I wasn't doing this correctly. Obviously, the above code doesn't work as I thought it would, since the FileConduit will get to the end before I finish reading from the buffer via the TextReader.

I've since switched from code similar to the above to using Tokenizers.Line and a buffer from fc.createBuffer. I run the while loop with the condition (Tokenizers.Line.next(buffer, token)) and it seems to be working rather nicely thus far.

So, the system was at least a little un-intuitive.

~John
teqdruid



Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Sun Nov 06, 2005 2:21 am

kris wrote:
sean wrote:
For binary input, there's little else you can do :p Though I suppose you could set the file size as an upper bound in the instances where that information is available.

Unfortunately, there's no way to consistently know the size ~ the content could be coming in from, for example, a socket ~ as you imply (though not a GB, one would hope!). Perhaps a perceptual change would help somewhat? How about changing the name to BinaryReader or something?


Consistent? Who cares? Why not set the cap whenever that information is available? Whenever Buffer is reading from a FileConduit, cap the memory allocations at the file size. Whenever it's drawing from something else, set the cap to a reasonable constant defined in Mango. ('Cause an 888 MB memory allocation is too much no matter what the situation. OTOH, it did help me find my bug, so there is something to be said for consistent operation.)
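
A rough D sketch of the capping idea, under stated assumptions: this is not Mango code, the prefix is assumed to be a little-endian 32-bit count, and `readCounted` and `defaultCap` are hypothetical names with an arbitrary default value. The caller passes the file size as the cap when it is known and falls back to the constant otherwise.

```d
import std.exception : assertThrown, enforce;
import std.stdio : writeln;

/// Fallback bound for sources of unknown size (e.g. a socket).
/// The value is an arbitrary placeholder, not a constant defined by Mango.
enum size_t defaultCap = 1 << 20;

/// Reads a length-prefixed char array from `data`, rejecting any prefix
/// that claims more than `cap` or more than the input actually holds.
const(char)[] readCounted(const(ubyte)[] data, size_t cap = defaultCap)
{
    enforce(data.length >= 4, "truncated length prefix");
    uint count = (cast(uint) data[0])
               | (cast(uint) data[1] << 8)
               | (cast(uint) data[2] << 16)
               | (cast(uint) data[3] << 24);
    auto rest = data[4 .. $];
    enforce(count <= cap && count <= rest.length,
            "length prefix exceeds available data");
    return cast(const(char)[]) rest[0 .. count];
}

void main()
{
    // Well-formed input: a count of 5 followed by five chars.
    auto good = cast(const(ubyte)[]) "\x05\x00\x00\x00hello";
    writeln(readCounted(good, good.length));

    // Text mistaken for binary: rejected before anything is allocated.
    auto bad = cast(const(ubyte)[]) "name=John\n";
    assertThrown(readCounted(bad, bad.length));
}
```

With a check like this, the failure still surfaces (as an exception rather than a silent success), so the bug-finding benefit mentioned above is kept without the huge allocation.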

~John
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sun Nov 06, 2005 3:05 am    Post subject: Re: HUGE memory allocations

teqdruid wrote:
Right. So, I'm basically serializing some objects. I'm using a TextWriter to output the strings separated by line breaks so that I can edit the files in a text editor. My reasoning was that since I'm writing them out using a TextWriter, I should read them in using a TextReader, and it should work perfectly since it uses a Line tokenizer by default. Well, not so, it turns out. Furthermore, I couldn't figure out how to tell from the reader whether or not the stream is done, so I figured I'd just use fc.getPosition (dumb move, I know). This should have been my indication that I wasn't doing this correctly. Obviously, the above code doesn't work as I thought it would, since the FileConduit will get to the end before I finish reading from the buffer via the TextReader.

I've since switched from code similar to the above to using Tokenizers.Line and a buffer from fc.createBuffer. I run the while loop with the condition (Tokenizers.Line.next(buffer, token)) and it seems to be working rather nicely thus far.

So, the system was at least a little un-intuitive.

~John

OK. Thanks

CR/LF terminated lines can be read as follows:

Code:

import mango.io.Token;
import mango.io.FileConduit;

// open a file for reading
FileConduit fc = new FileConduit ("test.txt");

// create a Token and bind it to both the file and a line-tokenizer
CompositeToken line = new CompositeToken (Tokenizers.line, fc);

// read file a line at a time. Method next() returns false when no more
// delimiters are found. Note there may be an unterminated line at eof
while (line.next)
       Stdout (line) (CR);

TextReader should have been doing something similar with char[] input, but apparently did not. You're right that this is somewhat non-intuitive; I'll see what can be done about it ...
teqdruid



Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Sun Nov 06, 2005 4:03 am    Post subject: Re: HUGE memory allocations

kris wrote:


CR/LF terminated lines can be read as follows:

Code:

import mango.io.Token;
import mango.io.FileConduit;

// open a file for reading
FileConduit fc = new FileConduit ("test.txt");

// create a Token and bind it to both the file and a line-tokenizer
CompositeToken line = new CompositeToken (Tokenizers.line, fc);

// read file a line at a time. Method next() returns false when no more
// delimiters are found. Note there may be an unterminated line at eof
while (line.next)
       Stdout (line) (CR);

TextReader should have been doing something similar with char[] input, but apparently did not. You're right that this is somewhat non-intuitive; I'll see what can be done about it ...


Right. That's essentially what I'm doing now.

~John
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Wed Nov 30, 2005 10:27 pm

Finally came up with a way to handle this problem.

Pending changes include full UTF support throughout the I/O subsystem. One of those changes indicates whether the content is binary or text-based. This is then checked within the Reader/Writer ctor, and an exception is thrown on a mismatch.

Naturally, one has to set this text-based attribute somewhere ~ one of those places will be in FileConduit, when a file is opened. Additionally, a new subclass called TextFile (or something similar) will be introduced. This new class will also handle BOM concerns.
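
The constructor check described above might look roughly like the following D sketch. None of these names come from Mango: `ContentKind`, `Source`, `BinaryOnlyReader` and `TextOnlyReader` are hypothetical stand-ins, and the real design may differ.

```d
import std.exception : assertThrown, enforce;

/// How a source declares its content (hypothetical, not a Mango type).
enum ContentKind { binary, text }

/// Stand-in for a conduit or buffer that records what it holds.
struct Source
{
    ContentKind kind;
    const(ubyte)[] data;
}

/// Binary-style reader: refuses to bind to a source tagged as text.
class BinaryOnlyReader
{
    private Source source;

    this(Source source)
    {
        enforce(source.kind == ContentKind.binary,
                "binary reader bound to text content");
        this.source = source;
    }
}

/// Text-style reader: refuses to bind to a source tagged as binary.
class TextOnlyReader
{
    private Source source;

    this(Source source)
    {
        enforce(source.kind == ContentKind.text,
                "text reader bound to binary content");
        this.source = source;
    }
}

void main()
{
    // A file opened as text would carry the text tag.
    auto textFile = Source(ContentKind.text, cast(const(ubyte)[]) "line one\n");

    auto reader = new TextOnlyReader(textFile);   // kinds match
    assertThrown(new BinaryOnlyReader(textFile)); // mismatch caught up front
}
```

The point of checking in the constructor is that the mismatch is reported when the reader is created, before any length prefix is ever read.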

Do you think that will be sufficient?
teqdruid



Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Thu Dec 01, 2005 9:22 am

kris wrote:
Finally came up with a way to handle this problem.

Pending changes include full UTF support throughout the I/O subsystem. One of those changes indicates whether the content is binary or text-based. This is then checked within the Reader/Writer ctor, and an exception is thrown on a mismatch.

Naturally, one has to set this text-based attribute somewhere ~ one of those places will be in FileConduit, when a file is opened. Additionally, a new subclass called TextFile (or something similar) will be introduced. This new class will also handle BOM concerns.

Do you think that will be sufficient?


This all sounds like a reasonable way of handling it. I'll comment on it again when I see the new code, however.

~John
All times are GMT - 6 Hours