[Mono-dev] [PATCH] A "fastpath" dead code elimination
massi at ximian.com
Tue Nov 15 11:29:35 EST 2005
The attached patch implements deadce, but without using SSA.
Instead, a simple alias analysis pass takes care of understanding
when local variables [may] change value, and liveness computation
has been modified to use this aliasing info if present.
The alias analysis pass has O(n) complexity (n = code size), it is
just a linear sweep on the list of instructions.
Then, deadce operates one BB at a time, scanning the code linearly
and using the liveness bits as start/end conditions (so it is O(n)
I positioned the deadce pass just between the liveness computation
and the linear scan register allocator, so that:
- it can reuse the liveness computation already computed for the
linear regalloc, without requiring a new pass, and
- it is "downstream" in the compilation pipeline, so it has the
possibility to catch as many dead instructions as possible.
The name "fastpath deadce" comes from Patrik Torstensson :-)
He called it like this on IRC because it is on the fast compilation
path (the one without SSA).
But now, without other technical stuff, some numbers...
First of all, the JIT overhead: the time to JIT mscorlib.dll with
various compilation options, in seconds (time mono --compile-all):
Options -all -all,ssa
JIT time 1.05 1.35
Here we see that just computing SSA has a 30% overhead with respect
to a compilation with no optimizations at all.
Options default ssa deadce ssa,deadce
JIT time 1.32 1.55 1.45 1.6
And here we see that the overhead of fastpath deadce on a "default"
compilation is almost 10%, while ssa deadce has 21% (and note that
in practice, since ssa does not work with aliasing, the new fastpath
deadce is applied to more methods).
So, the JIT overhead is reasonably small.
And finally, some lovely benchmark...
I tried SciMark, and here are the results:
(MFlops, just the composite score)
So, the idea is that fastpath deadce has some effect :-)
It is not as effective as ssa deadce, but it is useful anyway and
most of all it does not incur the same JIT time overhead.
BTW, I spent a *lot* of time trying to be sure that those numbers
are accurate. I had to kill evolution, gaim, beagle, the wireless
card, and carefully monitor system load otherwise the results were
erratic (just of a few percent, but this is what we're looking at).
I then executed the benchmark several (>10) times for every case
and took the *best* score for each (with the idea that it is the
one where other things interfered less), and anyway all the best
scores were almost identical.
A "funny" thing: these are the results I have on a SciMark.exe
compiled with MS .NET 2.0:
As you can see, they are *low*!!!
I still have to understand what the 2.0 csc does exactly, but it
looks a real disaster for us :-(
Watch out this quirk when doing/examining benchmarks!
OK, that's all... please approve the patch :-)
We can always refine it later...
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 58443 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20051115/a66e3e20/attachment.bin
More information about the Mono-devel-list