
Binary serialization - the 64 bit question

 
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

Posted: Thu Jun 26, 2008 4:28 pm    Post subject: Binary serialization - the 64 bit question

So any thoughts on how to deal with 64-bit platforms?

I don't have any experience in that area, but from what I understand, the length field of an array will be 64 bits wide on such a platform. I think that's the main thing that will affect serialization, since no pointers are saved in the stream directly.

But I also think size_t and maybe a handful of other types have different sizes, so anyone who saves a size_t on a 64-bit platform and loads it on a 32-bit platform is in trouble. It will be saved as a ulong but loaded as just a uint, and there will be 4 extra bytes left sitting in the stream.
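
To make the mismatch concrete, here is a small sketch in Python rather than D, using the standard struct module to stand in for the serializer. The little-endian layout and the "abc" payload are assumptions made up for the illustration.

```python
import struct

# Writer on a hypothetical 64-bit platform: size_t is 8 bytes (D's ulong).
count = 3
stream = struct.pack("<Q", count) + b"abc"

# Reader on a hypothetical 32-bit platform: size_t is 4 bytes (D's uint).
(loaded,) = struct.unpack_from("<I", stream, 0)
rest = stream[4:]

# With a little-endian layout the low 4 bytes still decode to 3 ...
print(loaded)
# ... but the high 4 bytes of the ulong are now in front of the payload.
print(rest)
```

Everything read after that point is shifted by 4 bytes, which is why the reader needs to know how wide the stored value was.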

For arrays we can do something about it. But for size_t and the like, I don't see how the current automatic approach can deal with that, given that size_t is just an alias.
aarti_pl



Joined: 25 Jul 2006
Posts: 28

Posted: Thu Jun 26, 2008 5:45 pm    Post subject:

Well, now I see that this is just a variation on the problem of 'real' having a different size on different platforms.

There is no problem with array types, e.g. strings, because the actual length of the array is always written to the byte stream. Numbers are somewhat different, as their size is kept implicitly in their type.

So the solution would be to store the type size explicitly in the stream. Then you have options when deserializing: you can check whether the stored size matches the type size on your platform. If it doesn't, you can fail fast, or try to convert and throw only when an overflow occurs.

I think this should be configurable through some kind of policy on the serializer. Trying to convert should be the default, I think. The same applies to reals, although conversion between different sizes will be more difficult there.
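
A rough sketch of what I mean, written in Python for brevity and not taken from the serializer itself. The names write_uint and read_uint, the one-byte size prefix, and the convert flag standing in for the policy are all made up for the example.

```python
import struct

# Unsigned little-endian formats keyed by size in bytes (an assumed layout).
FORMATS = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}

def write_uint(value, size):
    """Emit one size byte followed by the value in that many bytes."""
    return bytes([size]) + struct.pack(FORMATS[size], value)

def read_uint(data, native_size, convert=True):
    """Read a size-prefixed unsigned int into a native_size-byte type.

    convert=False fails fast on any size mismatch.
    convert=True accepts the value if it fits and raises only on overflow.
    """
    stored_size = data[0]
    (value,) = struct.unpack_from(FORMATS[stored_size], data, 1)
    if stored_size != native_size:
        if not convert:
            raise ValueError(
                "stored size %d != native size %d" % (stored_size, native_size))
        if value >= 1 << (8 * native_size):
            raise OverflowError(
                "value %d does not fit in %d bytes" % (value, native_size))
    return value

# An 8-byte value written on a 64-bit platform, read back as a 4-byte type.
print(read_uint(write_uint(1000, 8), native_size=4))
```

With convert=True the 1000 loads fine even though it was stored in 8 bytes, while a stored value of 2**40 raises OverflowError. With convert=False both cases raise ValueError because the sizes differ.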
aarti_pl



Joined: 25 Jul 2006
Posts: 28

Posted: Thu Jun 26, 2008 6:02 pm    Post subject:

In 8 bits there should be enough room for the length of the number (4 bits) plus some simple type information (the other 4 bits). I mean info about the simple types, plus extra codes for cases like size_t. It should be possible, and it would allow some additional checks.
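
Something like this, sketched in Python. The type codes are invented for the example, and putting the type in the high nibble is just one possible choice.

```python
# Hypothetical type codes; any numbering that fits in 4 bits would do.
TYPE_UINT, TYPE_INT, TYPE_FLOAT, TYPE_SIZE_T = 0, 1, 2, 3

def pack_tag(type_code, byte_len):
    """Combine a 4-bit type code and a 4-bit byte length into one byte."""
    if not (0 <= type_code < 16 and 0 <= byte_len < 16):
        raise ValueError("each field must fit in 4 bits")
    return (type_code << 4) | byte_len

def unpack_tag(tag):
    """Split a tag byte back into (type_code, byte_len)."""
    return tag >> 4, tag & 0x0F

tag = pack_tag(TYPE_SIZE_T, 8)   # a size_t written as 8 bytes
print(hex(tag), unpack_tag(tag))
```

One limit to keep in mind: 4 bits can only hold lengths from 0 to 15, so a 16-byte type would need the field to store the length minus one.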
baxissimo



Joined: 23 Oct 2006
Posts: 241
Location: Tokyo, Japan

Posted: Thu Jun 26, 2008 6:09 pm    Post subject:

aarti_pl wrote:
In 8 bits there should be enough room for the length of the number (4 bits) plus some simple type information (the other 4 bits). I mean info about the simple types, plus extra codes for cases like size_t. It should be possible, and it would allow some additional checks.


I got some handy routines from Tom S over on IRC for saving a variable-length encoded int. Kind of like UTF-8 for integers.

So I think I'll use that for array lengths. It means the length takes only 1 byte for short arrays, and can grow to hold a full 64-bit value as needed. On a 32-bit platform you'll obviously never write a length that needs 64 bits. But on the input side you can generate an intelligent error if an int that's too big gets read in.
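
For anyone who hasn't seen this kind of encoding, here is a minimal sketch in Python. It is not Tom S's code; it assumes the common scheme of 7 payload bits per byte with the high bit as a "more bytes follow" flag, and the names encode_varuint and decode_varuint are made up.

```python
def encode_varuint(value):
    """Encode a non-negative int, 7 bits per byte, low bits first."""
    if value < 0:
        raise ValueError("only unsigned values are supported")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)          # high bit clear: last byte
            return bytes(out)

def decode_varuint(data, max_bits, pos=0):
    """Decode one value starting at pos; return (value, next_pos).

    Raises OverflowError if the value needs more than max_bits bits,
    e.g. max_bits=32 when loading into a 32-bit size_t.
    """
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if result >= 1 << max_bits:
        raise OverflowError("value needs more than %d bits" % max_bits)
    return result, pos

print(encode_varuint(5), encode_varuint(300))
```

Under this scheme lengths below 128 take a single byte, and the largest 64-bit value takes 10 bytes because each byte carries only 7 bits of payload.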

[edit]
Actually, thinking about it some more, the only things likely to be affected by the 32/64 split are size_t's. I guess ptrdiff_t is the other one, but we don't save pointers, so I can't see any legitimate reason for saving the difference of two pointers. So size_t can be either uint or ulong. That means if we just save all uints and ulongs in variable-length format, then (A) we'll probably save some space, and (B) we don't have to worry about whether size_t is uint or ulong. We just try to load it, and if it doesn't fit into the type it's being loaded into, we report an error.

The only problem I see is if someone is using special codes like cast(uint)-1 to mean "NOT_FOUND" or "NO_INDEX". If they are, those won't be reconstructed properly when going across the 32/64 divide, because the all-ones value is a different number at each width.
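
The arithmetic behind that, as a quick Python sketch. It assumes the sentinel is stored in a size_t, so it is -1 cast to whatever the unsigned width is on each platform; all_ones is a made-up helper.

```python
def all_ones(bits):
    """The value you get from casting -1 to an unsigned type of this width."""
    return (1 << bits) - 1

NOT_FOUND_32 = all_ones(32)   # 4294967295
NOT_FOUND_64 = all_ones(64)   # 18446744073709551615

# 32-bit writer, 64-bit reader: the number round-trips exactly, but it no
# longer equals the reader's sentinel, so it looks like an ordinary index.
loaded_on_64 = NOT_FOUND_32
still_sentinel_on_64 = (loaded_on_64 == NOT_FOUND_64)   # False

# 64-bit writer, 32-bit reader: the sentinel is too big for 32 bits, so a
# range check on load rejects it instead of mapping it to NOT_FOUND_32.
fits_on_32 = (NOT_FOUND_64 <= all_ones(32))             # False

print(still_sentinel_on_64, fits_on_32)
```

So in one direction the sentinel silently turns into a plain large number, and in the other it triggers the "doesn't fit" error.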