[Mono-dev] More updates on Mono (before the call)

Sergei Dyshel qyron.private at gmail.com
Thu Sep 16 13:15:47 EDT 2010

I'm very sorry, this post was intended for another mailing list.
Moderators, please delete it.
Sergei Dyshel

On Wed, Sep 15, 2010 at 22:59, Sergei Dyshel <qyron.private at gmail.com> wrote:

> Hi,
> I've almost finished tuning Mono's Altivec performance. The results are,
> as usual, in this table:
> https://spreadsheets.google.com/ccc?key=0AhjvSAvEoHopdG1LUE9Zdkd1TTZIQ0FCWl82bU5Fa1E&hl=en&authkey=COqyrPMD
> There are many more "blue" ratios now, but there are still some optimization
> issues I couldn't solve:
> 1) 'mmm_intrchage' uses a different expression for alignment checking
> (versioning), and somehow this expression isn't constant-folded during
> JITing. This results in twice as much code, and the register allocator just
> can't act effectively there. By enabling full optimizations in Mono I could
> partially solve this problem, but that is not the best solution (since it
> increases compilation time).
> 2) 'video_dissolve_fp', 'saxpy_fp' and 'dscal_fp' are all variations of the
> simple floating-point loop 'a[i]=b*c[i]+d[i]'. The aligned version generated
> by the vectorizer looks like "*(&a+i) = b * (*(&c+i)) + *(&d+i)" in Gimple,
> and this is converted further to CIL. Since Mono has no inter-bb constant
> propagation and all the arrays' addresses are known at JIT time, all 3
> addresses are generated by Mono in each iteration (and each address takes
> 3 PPC instructions). I think this is the reason for the bad results,
> although the ratios for these benchmarks behave rather differently. Anyway,
> it would be much better if the arrays' addresses were saved to locals in the
> loop prolog and then used in each iteration.
> 'video_dissolve_s8' and 'small_sad' still need to be implemented/analyzed.
> Tomorrow I'll update the numbers for SSE. I anticipate an improvement after
> the recent tweaks I've added to Mono, but it won't be as good as with Altivec,
> mostly because the x87 instruction set is more stack-based, so floating-point
> code doesn't get optimized as easily as on PowerPC. Anyway, let's
> wait until tomorrow's results...
> That's all, folks! (c)
> --
> Regards,
> Sergei Dyshel
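The change requested in item 2 amounts to loop-invariant code motion of the array base addresses. What follows is a minimal C sketch of the two loop shapes at source level, assuming hypothetical global float arrays a, c and d of length N = 8. The helper names kernel_per_iteration, kernel_hoisted and kernels_agree are illustrative only and are not part of Mono or the benchmarks. A C compiler would normally hoist these addresses on its own, so the sketch only shows what the JIT is being asked to do. It is not a model of Mono's code generation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 8

/* Hypothetical global arrays: their addresses are constants known at
   JIT time, as with the benchmark arrays described in the post. */
static float a[N], c[N], d[N];

/* Shape emitted after vectorization: the address of each global array
   is written out again inside the loop body on every iteration. */
static void kernel_per_iteration(float b)
{
    for (size_t i = 0; i < N; i++)
        *(&a[0] + i) = b * *(&c[0] + i) + *(&d[0] + i);
}

/* Requested shape: each base address is loaded into a local once, in the
   loop prolog, and only the index changes inside the loop. */
static void kernel_hoisted(float b)
{
    float *pa = a;
    const float *pc = c;
    const float *pd = d;
    for (size_t i = 0; i < N; i++)
        pa[i] = b * pc[i] + pd[i];
}

/* Runs both kernels on the current inputs and reports whether they wrote
   identical results into a (returns 1 if so, 0 otherwise). */
static int kernels_agree(float b)
{
    float first[N];
    kernel_per_iteration(b);
    memcpy(first, a, sizeof first);
    kernel_hoisted(b);
    return memcmp(first, a, sizeof first) == 0;
}
```

Both kernels compute the same values. The difference described in the post lies only in how many instructions the JIT emits per iteration to form each address.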