[Mono-dev] CIL to CIL optimizer
Bjarke Hammersholt Roune
bjarke.roune at gmail.com
Mon Aug 14 13:39:45 EDT 2006
Massimiliano Mantione skrev:
> First things first: it's nice you'll do the CIL-CIL optimizer, but
> there are plans to do this in the JIT anyway, where these things
> I say "where these things belong" not to discourage you, but because
> the JIT has generally more visibility on how the code will really
> look like (it can inline methods, knows that the address of certain
> things maybe is a already constant, and in the future will also have
> visibility of the method prologue-epilogue), so things like SCCP and
> PRE should be performed by the JIT anyway if we want to take full
> advantage of them.
> Not to mention the issues involved in tuning redundancy elimination,
> which can easily get architecture dependent...
> That said, obviously until the JIT "is not there" having an external
> optimizer can be useful :-)
I think optimization of something like CIL belongs in BOTH the JIT and
in a CIL-CIL optimizer (I abbreviate this as CCO in the rest of the email).
There are several reasons that I think this. You have already given some
good reasons that optimization belongs in the JIT even in the face of an
excellent CCO. The CCO also has to take extra care to preserve the
constraints of the CIL format, which can be burdensome. E.g. CIL
concepts cannot be eliminated by translating them to something simpler,
at least not without being able to translate them back later when the
optimized assembly is to be output.
There are reasons that a CCO is desirable too, though:
1. The JIT cannot afford to do expensive whole-program optimizations and
a CCO can, so the CCO can optimize the program in ways that the JIT
cannot. In the same vein, the JIT will only aggressively optimize hot
parts of the code while a CCO can take its time and optimize everything.
2. It takes time before the JIT has optimized the running program, while
the effect of the CCO will be available at start up.
3. A CCO can decrease the size of an assembly by such optimizations as
dead code/function/class elimination (the last two only in the absence
of reflection), better use of the stack, better instruction selection,
redundancy elimination, better ordering of basic blocks and so on. This
decreases start up time.
4. Some JITs do not optimize very aggressively. CCO optimizations are
available to all JITs, and different CCOs can be run in sequence.
5. Some optimizations, like dead code elimination, can be done in a CCO
and then the JIT does not have to do them (or does not have to do them
as much). The CCO could embed some metadata in the assembly to indicate
which optimizations have already been performed. The same trick applies
to the results of expensive analyzes, though relying on that information
without checking it could be unsafe.
6. A CCO can output data that enables the JIT to start up faster and
optimize better. An example is SafeTSA which is a Java-equivalent
bytecode format where the code is already in SSA form and where some
array bounds checks and some null checks can be safely eliminated. On
top of this the SafeTSA format actually takes up less space then the
normal Java bytecode format. Another example is a paper I read that
explained how to annotate Java bytecode so that the JIT could do fast
and high quality register allocation in little time. This scheme is,
surprisingly, independent of the number of registers on the target machine.
The last two points have the drawback that they require respectively
slight and extensive cooperation between the CCO and the JIT. I don't
plan to implement point 6, btw.
As a practical matter, I also think that it will take less effort to
implement optimizations in ML than in C. On a personal note I have been
programming a Groebner Basis computation program for quite a while in
C++, and I am looking forward to programming in a functional-style
I cannot foresee how problematical the natural barrier to optimization
in a CCO that Miguel mentions will be. It is certainly there in the
sense that a JIT can do more optimization than a CCO if it had all the
time in the world. I guess I will have to write a CCO and see what
happens to find out.
In any case, my CCO would certainly serve the purpose that Miguel
mentions of cleaning up the output of mcs.
> Then, about the CFG shape and issues when try[/catch]/finally clauses
> are present, this discussion already reached mono-devel-list:
> I'd really like you to comment on that thread.
I think that discussion is a good example of how the different
constraints on JITs and CCOs play out.
You are discussing whether to let the CFG reflect the possibility that
an instruction could throw an exception. On one hand this would improve
precision of analyzes, but it would also make the CFG much larger which
could increase memory consumption and decrease performance of the JIT
optimizer. It sounded like you ended up agreeing on the less precise but
more efficient approach.
The less precise approach did not even occur to me as a CCO is not as
resource constrained as a JIT. I did think for quite a while about
whether or not it was a good idea to let all my analyzes know about
extended basic blocks, which in a sense is the best of both worlds, but
I decided it was not worth the effort.
I'm sure you know, but for any other people reading this, the issue I
raised in my initial email on the finally construct was on how to make
the CFG even more precise than is considered in the discussion you
linked to. I.e. the issue of what to do about the CFG edges going out
from the finally block.
I have decided what to do about that. My first implementation will
simply eliminate finally by inlining the finally block. A more
sophisticated approach that I might look into later would be to extract
the finally block into a function with ref arguments and insert calls to
this function that are impervious to code motion. These functions calls
could then be replaced by the finally construct just before emitting the
optimized CIL. This should work well together with whole-program
optimization. A hybrid approach that inlines small finally blocks and
preserves large finally blocks would probably be a good idea, especially
if Mono cannot analyze finally blocks precisely.
> Finally, about building SSA form, my favorite paper is this:
> I really like the dominance algorithm at the beginning of the paper
> you mentioned, an implemented it once, it's so easy it just took a
> few hours to have it running and fully debugged :-)
> But the rest of the paper (about dominance frontiers) was a bit hard
> for me, I think the one I linked is well explained.
> And BTW, there's already a C implementation of it for Mono in my
> old HSSA patch here:
> Keep in mind that HSSA is a big beast, including GVN and full alias
> analysis, tracing also heap bases values, so one can do PRE and SCCP
> taking all those things into account, and also register promotion.
> That patch implements most of the infrastructure, enough to have a
> working deadce implementation.
Thanks for the tip!
Do you know if there is a performance impact of entering and exiting try
blocks in Mono, even if no exception is ever (dynamically) thrown?
More information about the Mono-devel-list