
SocketConduit + LineIterator Performance Troubles


Posted: 08/10/07 20:37:56

After spending some time with oprofile trying to figure out why my SMTP daemon was using 77%+ CPU while sustaining 100 Mbit/s of throughput, I found the culprit:

CPU: AMD64 processors, speed 1994.89 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
70617    41.7924  smtp                     _D5tango4text6stream12LineIterator20__T12LineIteratorTaZ12LineIterator4scanMFAvZk
  70617  100.000  smtp                     _D5tango4text6stream12LineIterator20__T12LineIteratorTaZ12LineIterator4scanMFAvZk


Any ideas what can be done to increase the performance in this situation?

Basically doing:
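    import tango.text.stream.LineIterator;

    // socketConduit: an already-connected SocketConduit
    LineIterator!(char) it = new LineIterator!(char)(socketConduit);
    foreach (char[] line; it)
    {
        // process one line (the trailing "\n" or "\r\n" is already stripped)
    }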

The sample messages I'm sending through are about 114MB in size, so there are several thousand lines being sent.


Posted: 08/11/07 00:04:03

This is really Kris' domain, but looking over LineIterator, StreamIterator, and Buffer, I /think/ Buffer works like this (based on the code in Buffer.next):

    while LineIterator can't find a newline in buf[0 .. $] do
        get more data

So if the SocketConduit is only returning a few bytes at a time, LineIterator will pass over the same buffer contents many times before a newline is read. Assuming this is the problem, I'm not sure how best to fix it. Perhaps use a specialized Buffer that scans only from the endpoint of the previous scan rather than from the current read position every time. Beyond that, for real throughput I'd probably opt for a push-oriented IO model like Selector or the announced-but-not-yet-implemented multiplexing IO design juanjoc is developing. Both /should/ allow you to scan for newlines in a more customized manner, though they will likely require a somewhat different programming model.
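Sean's suggestion of scanning only from where the previous scan stopped can be sketched as follows. This is a hypothetical, self-contained D2 example, not Tango's Buffer or LineIterator API: `LineSplitter`, `feed`, `next` and `bytesExamined` are made-up names, and an in-memory array stands in for the conduit. It counts how many bytes each strategy inspects when a line trickles in a few bytes at a time.

```d
/// Hypothetical line splitter (not Tango's API). When `incremental` is set,
/// it remembers how far it has already scanned, so bytes that were checked
/// before more data arrived are not checked again.
struct LineSplitter
{
    char[] data;       // bytes received but not yet consumed
    size_t scanned;    // data[0 .. scanned] is known to contain no '\n'
    bool incremental;  // false: rescan from the read position every time
    size_t examined;   // total bytes inspected, for comparing strategies

    /// Append newly received bytes (stands in for a conduit read).
    void feed(const(char)[] chunk)
    {
        data ~= chunk;
    }

    /// If a complete line is buffered, set `line` (without "\n" or "\r\n"),
    /// consume it and return true; otherwise return false.
    bool next(out char[] line)
    {
        size_t from = incremental ? scanned : 0;
        foreach (i; from .. data.length)
        {
            ++examined;
            if (data[i] == '\n')
            {
                size_t end = i;
                if (end > 0 && data[end - 1] == '\r')
                    --end;
                line = data[0 .. end];
                data = data[i + 1 .. $];
                scanned = 0;
                return true;
            }
        }
        scanned = data.length;  // nothing found: remember where we stopped
        return false;
    }
}

/// Feed one CRLF-terminated line of `lineLen` bytes (lineLen >= 2) in
/// `chunk`-byte pieces and return how many bytes were inspected in total.
size_t bytesExamined(bool incremental, size_t lineLen, size_t chunk)
{
    char[] msg = new char[lineLen];
    msg[] = 'a';
    msg[$ - 2] = '\r';
    msg[$ - 1] = '\n';

    LineSplitter s;
    s.incremental = incremental;
    char[] line;
    for (size_t off = 0; off < msg.length; off += chunk)
    {
        size_t end = off + chunk < msg.length ? off + chunk : msg.length;
        s.feed(msg[off .. end]);
        s.next(line);
    }
    return s.examined;
}
```

With a 76-byte line arriving in 4-byte pieces, `bytesExamined(false, 76, 4)` inspects 760 bytes while `bytesExamined(true, 76, 4)` inspects 76, which is the repeated-rescan cost described above. The gap grows with line length and shrinks as the chunks get bigger.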

Posted: 08/11/07 18:20:19

As long as the data is arriving in reasonably large chunks and the buffer is suitably sized, scanning for line endings in a streaming iterator is pretty darned fast (much faster than concatenating one char at a time, and with no heap activity). I'm at a loss to explain why you're seeing such high CPU activity, but what is the overall throughput? Perhaps it's chewing through the email fast enough?

You will see some data shuffling in a Buffer when it runs out of data: the tail of the buffer is moved to the front, and new content is read in to fill the space behind it. As Sean says, that scenario also results in a rescan of the existing tail. The bigger the buffer, the less often this happens, though the overhead is actually pretty low.
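The compaction step described above (move the unconsumed tail to the front, then read new data in behind it) might look roughly like this in D2. `CompactingBuffer` and its methods are hypothetical names for illustration only, not Tango's Buffer class. It uses druntime's `memmove` from `core.stdc.string` because the source and destination regions can overlap.

```d
import core.stdc.string : memmove;

/// Hypothetical fixed-capacity buffer (not Tango's Buffer class) showing
/// the "move the tail to the front, then refill" step.
struct CompactingBuffer
{
    char[] storage;   // fixed-capacity backing array
    size_t readPos;   // start of unconsumed data
    size_t writePos;  // end of valid data

    this(size_t capacity)
    {
        storage = new char[capacity];
    }

    /// Unconsumed bytes; a line scanner would search this slice.
    char[] data()
    {
        return storage[readPos .. writePos];
    }

    /// Mark `n` bytes as consumed (e.g. after a line has been returned).
    void consume(size_t n)
    {
        assert(n <= writePos - readPos);
        readPos += n;
    }

    /// Move the unconsumed tail to the front of the storage.
    void compact()
    {
        size_t len = writePos - readPos;
        if (len > 0)  // regions may overlap, so memmove rather than memcpy
            memmove(storage.ptr, storage.ptr + readPos, len);
        readPos = 0;
        writePos = len;
    }

    /// Copy as much of `src` as fits, compacting first if the end of the
    /// storage is full. Returns the number of bytes copied.
    size_t fill(const(char)[] src)
    {
        if (writePos == storage.length && readPos > 0)
            compact();
        size_t room = storage.length - writePos;
        size_t n = src.length < room ? src.length : room;
        storage[writePos .. writePos + n] = src[0 .. n];
        writePos += n;
        return n;
    }
}
```

Any scanner that restarts at `readPos` after `compact()` will look at the moved tail bytes again; a larger capacity makes compaction, and therefore that rescan, less frequent.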

Another way to speed things up would be to use the tango.text.Util code to search for line endings within LineIterator, since it uses assembler on x86 machines. The fastest way overall would be to read the whole thing into memory at once and thus avoid the streaming overhead, but I rather doubt the streaming overhead is such a big issue, and that approach might even take longer overall.
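For comparison, the read-everything-first approach can be sketched as a single pass over a fully buffered message. `splitLinesInMemory` is a hypothetical D2 helper, not part of Tango. It returns slices into the original array, so each byte is examined once and no line contents are copied, at the cost of holding the whole message in memory.

```d
/// Hypothetical helper (not part of Tango): split a fully buffered message
/// into lines in a single pass. The returned lines are slices of `msg`
/// with "\n" or "\r\n" removed; line contents are not copied.
char[][] splitLinesInMemory(char[] msg)
{
    char[][] lines;
    size_t start = 0;
    foreach (i, c; msg)
    {
        if (c == '\n')
        {
            size_t end = i;
            if (end > start && msg[end - 1] == '\r')
                --end;
            lines ~= msg[start .. end];
            start = i + 1;
        }
    }
    if (start < msg.length)  // trailing text without a final newline
        lines ~= msg[start .. $];
    return lines;
}
```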

Posted: 08/14/07 17:07:47

I'll try modifying the scan function to use the assembler-backed search in tango.text.Util.

It definitely seems to be the scan function itself that's slow. For comparison, in C I would just use strchr to search, which is fairly fast: with an algorithm identical to what Sean describes, that uses about 20% CPU on the same machine at 100 Mbit/s.
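The C approach can be approximated in D by calling the C library directly through druntime's `core.stdc.string`. This sketch deliberately swaps in `memchr` for `strchr` (an assumption, not what the poster used): socket data is not NUL-terminated and may contain NUL bytes, and `memchr` takes an explicit length. `findNewline` is a hypothetical name; it returns `buf.length` when there is no newline, matching the "not found" convention of the indexOf-based scan function in this thread.

```d
import core.stdc.string : memchr;

/// Hypothetical helper: index of the first '\n' in `buf`, or buf.length
/// if there is none.
size_t findNewline(const(char)[] buf)
{
    // memchr searches an explicit byte count, so unlike strchr it needs no
    // NUL terminator and does not stop at NUL bytes inside the data.
    auto p = cast(const(char)*) memchr(buf.ptr, '\n', buf.length);
    return p is null ? buf.length : cast(size_t)(p - buf.ptr);
}
```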

Posted: 08/15/07 17:23:40

So, after some testing... I was wrong.

The real issue was that I wasn't using a buffer for writes, so every write was flushed immediately. Doing a straight socket read (no writing), CPU usage on this box is about 10% (at 100 Mbit/s throughput, with ~76-byte lines).
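The fix here was buffering writes rather than flushing each one. A hypothetical D2 sketch of the idea: `BatchedWriter` is not a Tango class, and its `sink` delegate stands in for whatever actually writes to the socket. Small writes accumulate in a fixed buffer and reach the sink only when the buffer fills or `flush()` is called.

```d
/// Hypothetical sketch (not a Tango API): collect small writes in a fixed
/// buffer and pass them to `sink` only when the buffer is full or flush()
/// is called, instead of once per line.
struct BatchedWriter
{
    char[] buf;
    size_t used;
    void delegate(const(char)[]) sink;  // stands in for a conduit/socket write
    size_t sinkCalls;                   // how many writes reached the sink

    this(size_t capacity, void delegate(const(char)[]) sink)
    {
        buf = new char[capacity];
        this.sink = sink;
    }

    void put(const(char)[] s)
    {
        if (s.length > buf.length - used)
            flush();                    // make room first
        if (s.length > buf.length)      // larger than the whole buffer:
        {                               // write it straight through
            sink(s);
            ++sinkCalls;
            return;
        }
        buf[used .. used + s.length] = s[];
        used += s.length;
    }

    void flush()
    {
        if (used == 0)
            return;
        sink(buf[0 .. used]);
        ++sinkCalls;
        used = 0;
    }
}
```

With a 64-byte buffer, writing one hundred 8-byte replies such as `"250 OK\r\n"` and then flushing reaches the sink 13 times instead of 100.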

The difference between the regular foreach scan function and indexOf is about 4% or so: with indexOf CPU usage is 6-10%, with the regular scan function 10-12%.

Here's the new scan function I'm using:

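    // Text is tango.text.Util, e.g. via: import Text = tango.text.Util;
    protected uint scan(void[] data)
    {
        // view the raw buffer contents as T[]
        T[] content = convert (data);
        if (content)
        {
            // indexOf returns content.length when no '\n' is present
            uint pos = Text.indexOf!(T)(content.ptr, '\n', content.length);
            if (pos != content.length)
            {
                // exclude the newline, and a preceding '\r' for CRLF input
                int slice = pos;
                if (pos && content[pos-1] is '\r')
                    --slice;
                // expose content[0 .. slice] as the current line
                set(content.ptr, 0, slice);
                return found(pos);
            }
        }
        // no newline yet: more data is needed
        return notFound(content);
    }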

Here's the CPU usage difference between the two functions using oprofile:

Without indexOf (regular foreach scan):

CPU: AMD64 processors, speed 1994.87 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               app name                 symbol name
-------------------------------------------------------------------------------
6361     15.9029  smtp                     smtp                     _D5tango4text6stream20LineIteratorEnhanced20__T12LineIteratorTaZ12LineIterator4scanMFAvZk
  6361     100.000  smtp                     smtp                     _D5tango4text6stream20LineIteratorEnhanced20__T12LineIteratorTaZ12LineIterator4scanMFAvZk [self]
-------------------------------------------------------------------------------
3953      9.8827  vmlinux                  vmlinux                  nv_nic_irq
  3953     100.000  vmlinux                  vmlinux                  nv_nic_irq [self]
-------------------------------------------------------------------------------

With indexOf:

CPU: AMD64 processors, speed 1994.87 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               app name                 symbol name
-------------------------------------------------------------------------------
7085     11.2750  vmlinux                  vmlinux                  nv_nic_irq
  7085     100.000  vmlinux                  vmlinux                  nv_nic_irq [self]
-------------------------------------------------------------------------------
6278      9.9908  smtp                     smtp                     _D5tango4text4Util14__T7indexOfTaZ7indexOfFPaakZk
  6278     100.000  smtp                     smtp                     _D5tango4text4Util14__T7indexOfTaZ7indexOfFPaakZk [self]
-------------------------------------------------------------------------------