From xemacs-m  Fri Feb 14 11:10:45 1997
Received: from portofix.ida.liu.se (portofix.ida.liu.se [130.236.177.25])
	by xemacs.org (8.8.5/8.8.5) with ESMTP id LAA22021
	for <xemacs-beta@xemacs.org>; Fri, 14 Feb 1997 11:10:43 -0600 (CST)
Received: from sen2.ida.liu.se (sen2.ida.liu.se [130.236.176.112]) by portofix.ida.liu.se (8.8.3/8.8.3) with SMTP id SAA02842 for <xemacs-beta@xemacs.org>; Fri, 14 Feb 1997 18:10:42 +0100 (MET)
Received: by sen2.ida.liu.se (SMI-8.6/ida.slave-V1.0b6d6S2)
	id SAA01144; Fri, 14 Feb 1997 18:10:42 +0100
Date: Fri, 14 Feb 1997 18:10:42 +0100
Message-Id: <199702141710.SAA01144@sen2.ida.liu.se>
From: David Byers <davby@ida.liu.se>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: xemacs-beta@xemacs.org
Subject: Re: Whoops.. a new way to crash it? (20.0-final3)
In-Reply-To: <m220akqsc4.fsf@altair.xemacs.org>
References: <199702131837.NAA36776@black-ice.cc.vt.edu>
	<m220akqsc4.fsf@altair.xemacs.org>
X-Face: (@~#v$c[GP"T}a;|MU<%Dpm5*6yv"NR|7k;uk8MAISFxdZ(Og$C{u(j"9X7v$qonp}SKfhT
 g|5[Pu~/3F7XQEk70gK'4z%1R%%gg7]}=>/jD`qcBeHDgo&HS,^S!&.zoTSxh<>-O6EB?SSy96&m37

> > Lstream_flush_out(??) at 0x10123880
> > Lstream_flush(??) at 0x10123aac
> > encoding_flusher(??) at 0x1013b060
> > Lstream_flush_out(??) at 0x10123a30
> > Lstream_flush(??) at 0x10123aac
> > Lstream_pseudo_close(??) at 0x10121574
> > Lstream_close(??) at 0x10122860
> > sweep_lcrecords_1(??, ??) at 0x10029e8c
> > gc_sweep() at 0x10027af4
> 
> Hmm, a backtrace without line numbers in garbage collection. :-(

I think I can explain it, because this is almost exactly the same
backtrace I got from encode-coding-region.

When you close an lstream using Lstream_close, any buffered output is
flushed by Lstream_flush. This calls whatever flusher is associated
with the lstream. Some of these will signal errors through the use of
signal_1. For instance, a flusher that writes to a Lisp buffer will
signal a read-only buffer error this way.

The garbage collector, when sweeping garbage, calls a function called
a finalizer for each object it sweeps. The finalizer for an lstream
calls Lstream_close to make sure the lstream is closed before it is
removed. If the stream is open and has buffered output, that output
will be flushed using Lstream_flush. 

The problem Valdis Kletnieks is running into is that an error is
being signaled through signal_1 while the finalizer method for the
lstream is executing. But signal_1 is not permitted during garbage
collection, and so an assertion fails and Emacs dumps core. This is
nearly identical to the problem I had with encode-coding-region and
decode-coding-region.
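
To make the chain of events concrete, here is a toy model in C. It is
only a sketch: every toy_* name is made up for illustration, none of
this is the real lstream code, and where the real signal_1 trips an
assertion and exits non-locally, the model just bumps a counter and
returns so that it can run to completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these are NOT the real XEmacs definitions. */
static bool toy_in_gc = false;   /* true while the collector is sweeping */
static int toy_error_count = 0;  /* stands in for an ordinary Lisp error */
static int toy_fatal_count = 0;  /* stands in for the failed assertion */

struct toy_stream {
    bool open;
    size_t buffered;        /* bytes of unflushed output */
    bool target_read_only;  /* flushing into a read-only buffer fails */
};

/* Stand-in for signal_1: fine outside GC, fatal inside it. */
static void toy_signal(void)
{
    if (toy_in_gc)
        toy_fatal_count++;  /* the real code would dump core here */
    else
        toy_error_count++;
}

/* Stand-in for Lstream_flush: returns false if the flusher signaled. */
static bool toy_flush(struct toy_stream *s)
{
    if (s->buffered == 0)
        return true;
    if (s->target_read_only) {
        toy_signal();
        return false;
    }
    s->buffered = 0;
    return true;
}

/* Stand-in for Lstream_close: flush first, close only on success. */
static bool toy_close(struct toy_stream *s)
{
    if (!s->open)
        return true;
    if (!toy_flush(s))
        return false;       /* the stream is left open */
    s->open = false;
    return true;
}

/* Stand-in for the lstream finalizer called by the sweep. */
static void toy_finalize(struct toy_stream *s)
{
    toy_in_gc = true;
    toy_close(s);
    toy_in_gc = false;
}
```

In this model, closing a stream whose target is read-only gives an
ordinary error and leaves the stream open; finalizing that same stream
later takes the fatal path.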

One possible source of problems like these is this type of code (and
code similar to this can be found in several places):

    outstream = make_encoding_output_stream (XLSTREAM (outstream), system);
    GCPRO (outstream);
        /* Do some work here */
    UNGCPRO;
    Lstream_close (outstream);
    /* More cleanup here */

If Lstream_close fails, outstream will be left open but unprotected
from garbage collection, and the rest of the cleanup code will not be
executed. If the state that caused Lstream_close to fail still exists
when the garbage collector runs, Lstream_close will fail again, but
this time with fatal results.

What you probably need is a way to call Lstream_close that will close
the stream, errors or not. If an error occurs while flushing the
stream it may make more sense to lose it and signal an error than to
keep it around and try to flush it again during garbage collection,
hoping that conditions have changed to permit flushing the data. I
experimented with this: I made an Lstream_flush fail, then removed the
cause of the failure (a read-only buffer) to see what would happen in
garbage collection. I was rewarded with an assertion failure at line
1874 in eval.c and a core dump because of some other error in
Lstream_flush.
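
A rough sketch of what such an "always close" call might look like,
again as a toy model in C. The names toy_stream, toy_try_flush and
toy_close_always are hypothetical, and throwing away the unwritten
data is my assumption about what "losing" the stream would mean:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these are NOT the real XEmacs definitions. */
struct toy_stream {
    bool open;
    size_t buffered;        /* bytes of unflushed output */
    bool target_read_only;  /* flushing into a read-only buffer fails */
};

/* Try to write out buffered data; false means the flusher hit an error. */
static bool toy_try_flush(struct toy_stream *s)
{
    if (s->buffered > 0 && s->target_read_only)
        return false;
    s->buffered = 0;
    return true;
}

/* Close the stream no matter what. Returns false if buffered data had
   to be discarded, so the caller can signal an error afterwards, when
   the stream is already closed and GC has nothing left to flush. */
static bool toy_close_always(struct toy_stream *s)
{
    bool flushed = true;
    if (s->open) {
        flushed = toy_try_flush(s);
        s->buffered = 0;    /* discard whatever could not be written */
        s->open = false;
    }
    return flushed;
}
```

The point of returning a status instead of signaling inside the close
is that the error is reported only after the stream has reached a
state the finalizer can safely ignore.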

Another option would be to have a way to call Lstream_flush that
ensures that no calls to signal_1 are made, and use this in the
finalizer for lstreams. 
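
As a sketch of this second option, a flush could take a mode argument
that tells it whether signaling is allowed. This is a toy model in C
with made-up names (toy_flush_mode, toy_flush, toy_finalize); quietly
dropping the data in the no-signal case is an assumption on my part,
not something the real code does today:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these are NOT the real XEmacs definitions. */
enum toy_flush_mode { TOY_MAY_SIGNAL, TOY_NO_SIGNAL };

struct toy_stream {
    bool open;
    size_t buffered;        /* bytes of unflushed output */
    bool target_read_only;  /* flushing into a read-only buffer fails */
};

static int toy_signals_raised = 0;    /* stands in for calls to signal_1 */
static size_t toy_bytes_dropped = 0;  /* data lost in no-signal mode */

/* Flush buffered output. In TOY_NO_SIGNAL mode a failure never
   signals; the unwritten data is dropped instead. */
static bool toy_flush(struct toy_stream *s, enum toy_flush_mode mode)
{
    if (s->buffered == 0)
        return true;
    if (!s->target_read_only) {
        s->buffered = 0;
        return true;
    }
    if (mode == TOY_MAY_SIGNAL) {
        toy_signals_raised++;  /* normal code path: report the error */
        return false;
    }
    toy_bytes_dropped += s->buffered;
    s->buffered = 0;
    return false;
}

/* Finalizer: safe to run during GC because it can never signal. */
static void toy_finalize(struct toy_stream *s)
{
    if (s->open) {
        (void) toy_flush(s, TOY_NO_SIGNAL);
        s->open = false;
    }
}
```

Ordinary callers would keep using the signaling mode; only the
finalizer would ask for the quiet one.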

No matter which option you select, you probably have to clean up code
like the example I gave above. Unwind-protecting it will probably do
the job, but it won't be pretty.


Hope all that made sense...


In case anyone was getting ideas:

No, I'm not volunteering. Even if I may know where to start, I have no
clue about where to go next. Besides that, I really don't have time.

--
David Byers.

