The Developer's Library for D

fopen with large file support


Posted: 07/09/09 11:44:25

Is Tango's stdc (0.99.8) compiled with large file support (64-bit)? I'm used to fopen, and I need support for 64-bit offsets/pointers.
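For context, here is how large file support is usually switched on in plain C on glibc-style POSIX systems, whose stdio API Tango's stdc bindings wrap. This is a minimal sketch under that assumption: defining `_FILE_OFFSET_BITS=64` before any header makes `off_t` 64 bits wide, and `fseeko`/`ftello` then take and return `off_t` instead of `long`. Whether Tango 0.99.8's bindings were built this way is not confirmed in this thread, and the helper name `file_size` is hypothetical.

```c
/* Both macros must come before every #include. _FILE_OFFSET_BITS=64 asks
 * glibc for a 64-bit off_t (and 64-bit fopen/fseeko/ftello on 32-bit
 * builds); _POSIX_C_SOURCE makes fseeko/ftello visible in strict C modes. */
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Fails to compile if large file support is not in effect. */
_Static_assert(sizeof(off_t) == 8, "off_t is not 64 bits wide");

/* Hypothetical helper: returns the size of the file at `path` in bytes,
 * or -1 on error. Because off_t is 64 bits, sizes above 2 GiB fit. */
static off_t file_size(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    off_t size = -1;
    if (fseeko(f, 0, SEEK_END) == 0)
        size = ftello(f);   /* ftello itself returns -1 on failure */

    fclose(f);
    return size;
}
```

With plain `fseek`/`ftell` the offset type is `long`, which is only 32 bits on 32-bit builds (and on 64-bit Windows), so positions past 2 GiB cannot be represented there.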


Posted: 07/28/09 01:52:47 -- Modified: 07/28/09 11:08:10 by kar

*removed*

Posted: 07/28/09 13:40:17

Sorry for the late reply. Tango should use large file support, but it has historically been somewhat clumsy to support, so I can't promise that you won't run into issues with it.

Posted: 07/28/09 13:54:31

Btw, I had the time to read your original post, and we'd like to hear about any multithreading issues you've had, why you found it necessary to use curl (I assume it is used for fetching web pages), and why you used libxml2 instead of Tango's XML package (to my knowledge, libxml2 is orders of magnitude slower than Tango's).

I understand that you may not always have the time to wait for a response from us, but some issues may be solvable in relatively short time. Or we could try to solve them for later releases - it would certainly be useful to know of them.

Posted: 07/29/09 01:46:53

Thanks. At first, my attempt at D/Tango was inspired by dlucene (D/Phobos), but we are actually building a real search engine, not an open-source library. We used HttpClient + the Uri module in our crawler (multiple crawler instances + a single indexer setup), but it failed after 200-300 docs. We tried different seeds and got the same result: it stopped at a random URL, not at the same problematic URL each time (I'll try to reproduce the error/exception codes), so we resorted to libcurl. The issues with tango.xml are minor, but we need a strong and stable HTML parser, and that was our main reason for using libxml. There are lots of things to consider when building a web search engine. Our primary focus is performance and stability, and in most areas we just use Tango's modules, and they did great.

I'll have to dig up an old backup archive of the tango.xml and tango.net versions to reproduce the bugs. I'll post it here as soon as I find it.

* Some info on our project:

- Distributed search engine, with each server holding up to 100-150 million docs for performance.
- Custom index format (word-level inverted index, with the original source text packed in).
- Independent indexers, each with a built-in crawler.
- Custom ranking algorithm: modified BM25 + phrase proximity.
- URL queue server (Tango's linked list + SQLite).
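For reference, the standard Okapi BM25 score that such a ranking starts from is shown below. The post does not describe the modification or the phrase-proximity term, so neither is included; this is only the textbook form. Here $f(q_i, D)$ is the frequency of query term $q_i$ in document $D$, $|D|$ is the document length in words, $\mathrm{avgdl}$ is the average document length, $N$ is the number of documents, $n(q_i)$ is the number of documents containing $q_i$, and $k_1$ and $b$ are tuning parameters (commonly $k_1 \in [1.2, 2.0]$ and $b = 0.75$).

```latex
\mathrm{score}(D, Q) = \sum_{q_i \in Q} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q_i) = \ln \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}
```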

Also, we had to build a custom file stream to support our 3-byte uint and 5-byte ulong integer data. We used this method instead of vint to pack integers, since vint requires twice the I/O overhead, which is expensive.
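A fixed-width packing like the one described can be sketched as follows. The little-endian byte order and the helper names `put_u24`, `get_u24`, `put_u40` and `get_u40` are assumptions for illustration; the post does not give the actual stream format. Every value takes exactly 3 or 5 bytes, so a reader can fetch a record with one fixed-size read instead of reading byte by byte until a varint continuation bit is clear. A 3-byte value covers 0 to 16,777,215 and a 5-byte value covers 0 to 1,099,511,627,775.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: fixed-width little-endian packing (byte order is
 * an assumption). Each value always occupies exactly 3 or 5 bytes. */

/* Store v (must fit in 24 bits) in out[0..2], least significant byte first. */
static void put_u24(uint8_t *out, uint32_t v)
{
    assert(v <= 0xFFFFFFu);
    out[0] = (uint8_t)(v);
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
}

/* Read a 24-bit value written by put_u24. */
static uint32_t get_u24(const uint8_t *in)
{
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) | ((uint32_t)in[2] << 16);
}

/* Store v (must fit in 40 bits) in out[0..4], least significant byte first. */
static void put_u40(uint8_t *out, uint64_t v)
{
    assert(v <= 0xFFFFFFFFFFull);
    for (int i = 0; i < 5; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Read a 40-bit value written by put_u40. */
static uint64_t get_u40(const uint8_t *in)
{
    uint64_t v = 0;
    for (int i = 0; i < 5; i++)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}
```

The trade-off is space: small values still cost the full 3 or 5 bytes, whereas a varint would store them in fewer.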

Thanks again for the replies.

Posted: 08/08/09 11:27:56

I experienced the same problem with the HttpClient module. It broke after a while, mostly after about 7000 GET requests, though we were using 50 parallel threads. I'll try to break it down too. Weird stuff.

Posted: 08/13/09 18:16:05

OK, it looks like the exceptions thrown by HttpClient stop it from closing the sockets correctly. Therefore, after a while you simply run out of sockets.
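The failure mode described here (an error path that skips closing the socket, so descriptors leak until none are left) is a general one. Below is a minimal C sketch of the usual fix, assuming POSIX sockets: every exit after the socket is created goes through a single cleanup label that calls `close()`. In D the same guarantee would come from `scope(exit)` or `try`/`finally`. `do_request` and its `fail` flag are hypothetical stand-ins for one HTTP GET, not Tango's code.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for one HTTP GET. A nonzero `fail` simulates an
 * error (timeout, bad status, parse failure) raised after the socket is open. */
static int do_request(int fail)
{
    int rc = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;              /* nothing has been acquired yet */

    if (fail)
        goto cleanup;           /* error path still reaches close() below */

    /* ... connect(), send the request, read the response ... */
    rc = 0;

cleanup:
    close(fd);                  /* runs on success and on every error path */
    return rc;
}
```

Without the `close()` on the error path, each failed request would keep one descriptor open, and a process with the common default limit of 1024 open files would start failing with `EMFILE` after roughly a thousand errors.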

Posted: 08/14/09 11:21:36

see Ticket #1723