[Mono-dev] Stability regression on recent git head

Dick Porter dporter at codicesoftware.com
Wed Sep 15 06:55:54 EDT 2010

Hi all

We've been testing the sgen GC with our server, as part of the effort to
stabilise it.  Recently, however, we've noticed that the runtime has been
very unstable, in particular when using the _Boehm_ GC.

I can start our server with this morning's git head runtime, with Boehm
GC, and as soon as I issue a client command (which uses remoting) I get
the following stack trace on the server:

Plastic SCM daemon up. 1151 ms startup time

  at (wrapper managed-to-native) Mono.Unix.UnixSignal.WaitAny (intptr[],int,int) <0x00003>
  at (wrapper managed-to-native) Mono.Unix.UnixSignal.WaitAny (intptr[],int,int) <0x00003>
  at Mono.Unix.UnixSignal.WaitAny (Mono.Unix.UnixSignal[],int) <0x0011e>
  at Mono.Unix.UnixSignal.WaitAny (Mono.Unix.UnixSignal[]) <0x00012>
  at Codice.CM.Daemon.Daemon.HandleSignals () <0x0013a>
  at Codice.CM.Daemon.Daemon.LaunchUnixDaemon (Codice.CM.Server.ISystemRunner,string) <0x00036>
  at xy.c (Codice.CM.Server.SystemRunner) <0x0005f>
  at xy.a (an) <0x0035c>
  at xy.a (string[]) <0x000b1>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_int_object (object,intptr,intptr,intptr) <0x0008f>

This is 100% repeatable, with the identical stack trace every time.
Interestingly, the same steps don't trigger it with the sgen GC, though I
have seen this stack trace appear intermittently with sgen, which suggests
to me that there might be some memory corruption going on that is more
likely to be tickled by the Boehm GC.

The actual line of code that triggers the segfault is in
mono/support/signal.c, in wait_for_any():

diff --git a/support/signal.c b/support/signal.c
index abd7638..a7f97fa 100644
--- a/support/signal.c
+++ b/support/signal.c
@@ -351,7 +351,7 @@ wait_for_any (signal_info** signals, int count, int
                        ptv = &tv;
                r = poll (fd_structs, count, timeout);
-       } while (keep_trying (r) && !shutting_down ());
+       } while (keep_trying (r) /*&& !shutting_down ()*/);

        idx = -1;
        if (r == 0)

Commenting out the delegate call (shutting_down ()) cures the crash for me.
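
For anyone not familiar with that loop, here is a minimal standalone
sketch of the pattern the diff touches: poll () is retried while it is
interrupted by a signal and a caller-supplied callback reports that the
runtime is not shutting down.  This is not the code from
support/signal.c.  The names wait_for_any_fd and shutdown_check_fn are
made up for illustration, the body of keep_trying is my guess at what a
helper of that name does, and the NULL check on the callback is only a
defensive assumption, not a diagnosis of the crash.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical callback type: returns non-zero once the runtime is shutting down. */
typedef int (*shutdown_check_fn) (void);

/* Assumed behaviour: retry only when poll () was interrupted by a signal. */
static int
keep_trying (int r)
{
	return r == -1 && errno == EINTR;
}

/*
 * Wait until one of the descriptors becomes readable.
 * Returns the index of the first readable descriptor, or -1 on
 * timeout, error, or an interrupted wait during shutdown.
 */
static int
wait_for_any_fd (struct pollfd *fds, int count, int timeout_ms,
		 shutdown_check_fn shutting_down)
{
	int r, i;

	do {
		r = poll (fds, (nfds_t) count, timeout_ms);
		/* A NULL callback is treated as "never shutting down" (assumption). */
	} while (keep_trying (r) && !(shutting_down != NULL && shutting_down ()));

	if (r <= 0)
		return -1;

	for (i = 0; i < count; ++i)
		if (fds[i].revents & POLLIN)
			return i;

	return -1;
}
```

In the real runtime the callback is a delegate handed in from managed
code, so the function pointer is only valid for as long as the managed
side keeps that delegate alive.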

I reopened bug https://bugzilla.novell.com/show_bug.cgi?id=592981 with these traces, but
as no-one has commented on it in a couple of weeks I'm highlighting it here too.

- Dick
