[Mono-dev] Wishlist for the new IR
massi at ximian.com
Thu Nov 17 07:12:22 EST 2005
I see that work on the new IR will start soon...
Here is a list of things that I think it should have.
This is just off the top of my head; I didn't re-read all of the
HSSA code to see every point in which I would have liked something
more, but anyway if we have the infrastructure to handle all the
issues I present here, adding the missing bits will be trivial ;-)
Of course these are just "wishes", let's discuss them...
I just wanted to discuss them before the work starts.
*** Declarative opcode metadata
For each opcode, this information should be easily available,
also to offline tools that could be used to generate pieces of
C code at build time:
- Arity: now there's mono_burg_arity, and it is invaluable, but I
wish the information were provided offline (and just checked by
monoburg or anything that will replace instruction selection).
- Possible (i.e. allowed) "stack types" for arguments and result.
I know we'll not have the concept of "evaluation stack" anymore
because the IR will be linear, but the info is still useful in
itself applied to input and output virtual registers.
Moreover, it should be exactly known which are the input values
used in the opcode, and in which MonoInst field they can be
found.
- It should be clear if the opcode has special "side effects" (like
OP_CALL), or if its *only* effect is read from its argument[s]
and write to its destination virtual registers.
Moreover, it should be clear if the semantics of the opcode is
such that, given identical input values, it will always give the
same result.
This is really important because certain classes of optimizations
can be applied (or not) depending on this.
Now in HSSA, SSAPRE, alias analysis, and in practice everything
else I rely on giant switch statements, which are *fugly* (and
often quite fragile, because there's always some opcode which I
don't fully understand so the "default" case is generally a
"safe fallback" and not an "assert this doesn't happen").
- It should also be clearly stated in which "stage" of the JIT
each opcode can be legally found.
Now we have a general distinction between CEE_* and OP_*, but
it is almost meaningless because most CEE_* opcodes are re-used
in the IR.
If it were up to me, I'd also consider a radical change, which
would be to avoid reusing CEE_* values at all (and of course create
all the necessary OP_* values to handle this).
Quite often, the semantics of CEE_ opcodes changes subtly between
their use in the CIL stream and their use in the IR, and this
would give us trouble in keeping the opcode metadata meaningful.
Also, consider that mono_method_to_ir is a giant switch anyway,
and inside it the benefit of reusing the CEE_* values is so small
that I would say it is nonexistent.
This would also make things much clearer "by default": CEE_*
opcodes would be allowed in the CIL stream (plus the CEE_MONO_*
additions in the case of wrappers I guess).
Inside the IR we'd just have OP_* opcodes, and their numbers
would have *nothing* to do with the CEE_ ones (they would simply
restart from zero).
Note that in any case we should have declarative knowledge of which
OP_* opcodes are allowed in each JIT stage.
- Then there are a couple of operations where we lose too much
information in our IR: field accesses and method calls.
We keep CEE_LDELEMA opcodes, but we lose field access opcodes
after mono_method_to_ir. In HSSA I coded around this, but having
the information explicit would make things easier, and also
allow more effective alias analysis.
And about method calls the issue I have is that it is not possible
to relate each OP_OUTARG[_*] opcode (which is an actual argument)
to its formal argument. Again, I coded around this but in the
future having the information would allow more precise alias
analysis (distinction between out and in-out parameters comes to
mind, but most of all *global* analysis when doing AOT).
Also, sometimes we read vtable values, which are likely to be
read-only in practice (or under the right conditions).
Now this information is totally lost; in the IR there's just a
lot of pointer arithmetic.
- I'd also like to see a general rework of the "ssa_op" and "flags"
MonoInst fields. Quite often they encode information that should
be known at build time, or information that belongs to local
variables (see below for a discussion on how they should change).
And anyway, we should have declarative knowledge of the exact set
of flags allowed for each opcode.
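
To make the "declarative metadata" idea concrete, here is a minimal
sketch of one possible shape: a single opcode list that expands into
both the enum and a lookup table. Everything in it is an assumption
made for illustration (the SK_* names, the three stages, the arity and
flag values); none of it is existing Mono code. The same list could
just as well live in a separate file read by offline tools, with more
columns for operand types and allowed flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical JIT stages in which an opcode may legally appear. */
enum {
    SK_STAGE_BUILD   = 1 << 0,  /* right after CIL-to-IR conversion */
    SK_STAGE_SSA     = 1 << 1,  /* while in SSA form */
    SK_STAGE_LOWERED = 1 << 2   /* after lowering, before codegen */
};

/* One row per opcode: name, arity, has side effects,
 * same inputs give same result, allowed stages.
 * All rows are invented for this sketch. */
#define SK_OPCODES(X) \
    X(SK_OP_ICONST, 0, false, true,  SK_STAGE_BUILD | SK_STAGE_SSA | SK_STAGE_LOWERED) \
    X(SK_OP_IADD,   2, false, true,  SK_STAGE_BUILD | SK_STAGE_SSA | SK_STAGE_LOWERED) \
    X(SK_OP_LOAD,   1, false, false, SK_STAGE_BUILD | SK_STAGE_SSA) \
    X(SK_OP_CALL,   0, true,  false, SK_STAGE_BUILD | SK_STAGE_SSA)

/* Opcode numbers simply start from zero. */
typedef enum {
#define X(name, arity, fx, same, stages) name,
    SK_OPCODES(X)
#undef X
    SK_OP_COUNT
} SkOpcode;

typedef struct {
    const char *name;
    unsigned char arity;      /* number of input operands */
    bool has_side_effects;    /* does more than read inputs and write dest */
    bool same_in_same_out;    /* identical inputs always give identical result */
    unsigned stages;          /* bitmask of SK_STAGE_* */
} SkOpcodeInfo;

static const SkOpcodeInfo sk_opcode_info[SK_OP_COUNT] = {
#define X(name, arity, fx, same, stages) { #name, arity, fx, same, stages },
    SK_OPCODES(X)
#undef X
};

/* Unknown opcodes trip an assert instead of hitting a "safe fallback". */
static const SkOpcodeInfo *sk_op_info(SkOpcode op) {
    assert((unsigned) op < SK_OP_COUNT);
    return &sk_opcode_info[op];
}

/* A redundancy-elimination pass could ask this instead of using a switch. */
static bool sk_op_is_redundancy_candidate(SkOpcode op) {
    const SkOpcodeInfo *info = sk_op_info(op);
    return info->same_in_same_out && !info->has_side_effects;
}

static bool sk_op_allowed_in(SkOpcode op, unsigned stage) {
    return (sk_op_info(op)->stages & stage) != 0;
}
```

With something like this, a pass would replace its giant switch with a
table lookup, and a verifier run between stages could call
sk_op_allowed_in on every instruction.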
*** Storage (local variables and virtual registers):
- Local variables should not have a MonoInst anymore IMHO (which
means that OP_LOCAL and OP_ARG opcodes should disappear).
The rationale is that in a linear IR it is obvious that the
instruction operands are virtual registers and not MonoInst
structs, so it's pointless having special MonoInst opcodes to
represent them.
This would have the nice side effect of making the two parallel
arrays for locals ("varinfo" and "vars") go away, and be replaced
by a single one.
- As part of this change, we should make "MonoMethodVar" go on a
diet.
Without jokes, with the linear IR the number of locals will go
up, so it would be nice to make them cheaper.
Ideally, every virtual register (VR) should just have:
- An id (a number).
- A MonoType (what kind of value can be stored in the VR).
- A few flags (volatile, address needed => don't allocate on
a HW register...).
- Liveness info, and maybe space for "regalloc hints".
Note that with HSSA all the following fields are *not* needed:
dfrontier, def_in, def, def_bb, uses, cpstate.
HSSA is already working *without* them, so when HSSA replaces
SSA they could just go away.
- More generally, this means we should think about how exactly we
should represent VRs and their relation to the original locals
and arguments in the CIL code.
- Another idea I have about VRs: now many values are "homeless",
because they implicitly "flow" in the MonoInst tree nodes (which
is just another representation for an evaluation stack).
These "homeless" values are necessarily local to one BB (the BB
which now contains the MonoInst tree).
Moreover, various "temp" locals created by the JIT (but not all
of them) clearly have the same property.
Increasing the number of local variables (VRs) has bad effects
on the performance of global data flow analysis (like liveness,
but in general that kind of algorithm).
I would propose to have two kinds of VRs:
- Global to the method.
- Local to a specific BB.
This way, if we keep the global ones indexed by one single array
where the local ones do not appear, every pass of "global data
flow analysis" will just work on this array, which will likely be
much smaller than the full set of VRs used in the method.
Also, "local" versions of algorithms will be allowed to work freely
on the local VRs, knowing that they will be dead at the end of the
BB, which is very handy.
- All the opcodes that now represent "special" storage and/or values
(OP_RETARG, OP_THREAD_LOCAL, OP_AOTCONST...) should be handled in
one of these two ways:
- They become "special VRs" that contain the "special value".
- They become special operations, with no arguments, that write
that "special value" in their destination VR.
In any case their semantics should be clarified.
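
As a rough illustration of the "cheap VR" and global/local split
described above, here is a minimal sketch. All the names (SkVReg,
SkVRegTable, the flag bits, the fixed-size arrays, the high bit used to
mark BB-local ids) are assumptions for the example, not a proposal for
the actual layout; SkType is only a stand-in for MonoType.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SkType SkType;        /* stand-in for MonoType */

enum {
    SK_VR_VOLATILE    = 1 << 0,
    SK_VR_ADDR_NEEDED = 1 << 1       /* don't allocate on a HW register */
};

#define SK_MAX_VREGS 64              /* arbitrary limit for the sketch */
#define SK_LOCAL_BIT 0x80000000u     /* set in ids of BB-local VRs */

typedef struct {
    uint32_t id;                     /* index, plus SK_LOCAL_BIT if BB-local */
    const SkType *type;              /* kind of value the VR can hold */
    uint32_t flags;                  /* SK_VR_* bits */
    int32_t regalloc_hint;           /* -1 means "no hint" */
} SkVReg;

/* Global and BB-local VRs live in separate arrays, so global data flow
 * passes only ever index the (hopefully much smaller) global one. */
typedef struct {
    SkVReg global_vregs[SK_MAX_VREGS];
    uint32_t num_global;
    SkVReg local_vregs[SK_MAX_VREGS];
    uint32_t num_local;
} SkVRegTable;

static uint32_t sk_new_vreg(SkVRegTable *t, const SkType *type,
                            uint32_t flags, bool bb_local) {
    SkVReg *vr;
    if (bb_local) {
        assert(t->num_local < SK_MAX_VREGS);
        vr = &t->local_vregs[t->num_local];
        vr->id = t->num_local++ | SK_LOCAL_BIT;
    } else {
        assert(t->num_global < SK_MAX_VREGS);
        vr = &t->global_vregs[t->num_global];
        vr->id = t->num_global++;
    }
    vr->type = type;
    vr->flags = flags;
    vr->regalloc_hint = -1;
    return vr->id;
}

static bool sk_vreg_is_bb_local(uint32_t id) {
    return (id & SK_LOCAL_BIT) != 0;
}

/* Width of the bit sets a global liveness pass would need. */
static uint32_t sk_global_dataflow_width(const SkVRegTable *t) {
    return t->num_global;
}
```

The point of the split is visible in sk_global_dataflow_width: the
"homeless" values and JIT temps that become BB-local VRs never enlarge
the bit sets used by global analysis.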
OK, that's it.
Take this list as something between "requests", "suggestions" and
"wishes", and anyway as a starting point for discussion :-)