[Mono-devel-list] mono AES performance woes (was: poor PPC JIT output)
allan at counterpop.net
Mon Jul 18 15:18:34 EDT 2005
On Jul 18, 2005, at 2:59 AM, Paolo Molaro wrote:
> On 07/15/05 Allan Hsu wrote:
>> Is there any reference on what sorts of things you can change using
>> mono_set_defaults? Following the mono source for references to that
>> function wasn't particularly enlightening. It would be useful if the
> grep mono_set_defaults *.c
> mini.c:mono_set_defaults (int verbose_level, guint32 opts)
> Should be pretty evident. Just always use the result of
> mono_parse_default_optimizations (NULL) as the opts value.
I understood the verbose_level parameter, but the opts parameter was
what mystified me. I should have been more specific about what I was
looking for. At the time, I didn't understand the value that
mono_parse_default_optimizations() returns or what values you can
pass in to affect it. I've since traced it back to the relevant code
in driver.c and the mini-X.c platform code, and now see how it works.
Is it safe to mess with those parameters, or will it cause undefined
>> To be fair, the native implementation is able to take advantage of
>> 64-bit processors when available, while all mono builds in the above
>> benchmarks are 32-bit. The Windows XP machine is the standard 32-bit
>> install, even though the processor is 64-bit. This is a pretty
>> informal benchmark, but all I'm interested in showing here is how bad
>> the AES performance under mono is.
> The current implementation causes lots of spilling and other
> unnecessary work which the jit doesn't remove (the work massi is
> doing should improve this). Some parts of it can be easily changed
> to use unsafe code and that should improve performance a lot: I'll
> leave that to Sebastien :-)
This is good to hear. I hope the benchmarking I did will provide some
information that somebody will find useful.
For my specific application, there is no such thing as "enough"
performance:) I plan on writing a managed wrapper around libcrypto
for this reason. This will be the subject of another email.
>>> Some of the data looks definitely bogus: it reports a stall even on
>>> the addi, here:
>>> 0x2e143c8 lwz r4,32(r1) 3:1 Stall=2
>>> 0x2e143cc lwz r5,12(r4) 3:1 Stall=2
>>> 0x2e143d0 cmplwi r5,0x0000 3:1 Stall=2
>>> 0x2e143d4 blel $+696 <0x2e1468c [8B]> 2:1
>>> 0.4% 0x2e143d8 addi r4,r4,16 2:1 Stall=1
>> As for the stall statistics, you have misread them. Each line that
>> says "Stall=N" is saying that the instruction latency of the marked
>> instruction will cause a subsequent dependent instruction to stall,
>> not that the marked instruction itself will stall. N is the maximum
>> number of stall cycles for the nearest dependent instruction. The
> Since the tool reports that the addi stalls only sometimes (check the
> similar code sequences where no stall is reported), my take
> is that your interpretation or the data reported is not correct.
I'm not sure if my meaning came across. The line next to the addi
instruction that says "Stall=1" means that a dependent instruction
*following* the addi looks like it will stall while waiting for the
results from addi, not that the addi instruction itself will stall.
The code that follows that specific instruction looks like this:
0.4% 0x2e143d8 addi r4,r4,16 2:1 Stall=1
0x2e143dc lbz r4,0(r4) 3:1 Stall=2
0x2e143e0 add r3,r3,r4 2:1 Stall=1
0x2e143e4 stw r3,44(r1) 3:1
The instruction latency of the addi instruction is 2 cycles; the lbz
that immediately follows the addi is dependent on the addi. The lbz
will stall for 1 cycle. That is what the Shark output is trying to say.
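
To make that reading concrete, here is a rough sketch of it as a toy model. It is not how Shark actually computes its annotations. It assumes an in-order, single-issue pipeline, and it assumes the first number in the "2:1" / "3:1" column is the instruction latency in cycles. The names SEQUENCE and stall_annotations are made up for illustration.

```python
# Toy in-order, single-issue model (an assumption, not Shark's algorithm).
# Each entry: (mnemonic, destination register, source registers, latency).
# Latencies are taken from the first number of the "L:T" column above.
SEQUENCE = [
    ("addi", "r4", ["r4"], 2),        # addi r4,r4,16
    ("lbz",  "r4", ["r4"], 3),        # lbz  r4,0(r4)
    ("add",  "r3", ["r3", "r4"], 2),  # add  r3,r3,r4
    ("stw",  None, ["r3", "r1"], 3),  # stw  r3,44(r1) writes no register
]


def stall_annotations(seq):
    """For each instruction, return the stall it imposes on the nearest
    later instruction that reads its destination register."""
    notes = []
    for i, (name, dest, _srcs, latency) in enumerate(seq):
        stall = 0
        if dest is not None:
            later = seq[i + 1:]
            for distance, (_n, later_dest, later_srcs, _l) in enumerate(later, start=1):
                if dest in later_srcs:
                    # The consumer issues `distance` cycles after the producer,
                    # but the result is only ready after `latency` cycles.
                    stall = max(0, latency - distance)
                    break
                if later_dest == dest:
                    # Overwritten before anyone reads it: nothing can stall.
                    break
        notes.append((name, stall))
    return notes


for name, stall in stall_annotations(SEQUENCE):
    print(name, "Stall=%d" % stall if stall else "")
```

Under those assumptions the model marks addi with a 1-cycle stall, lbz with 2, add with 1, and stw with none, which lines up with the annotations in the excerpt above. The stall is attached to the producing instruction even though it is the consumer that waits.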
Allan Hsu <allan at counterpop dot net>
1E64 E20F 34D9 CBA7 1300 1457 AC37 CBBB 0E92 C779