From xemacs-m  Sat Mar  1 15:56:07 1997
Received: from mecca.spd.louisville.edu (mecca.spd.louisville.edu [136.165.40.148])
	by xemacs.org (8.8.5/8.8.5) with SMTP id PAA21915
	for <xemacs-beta@xemacs.org>; Sat, 1 Mar 1997 15:56:04 -0600 (CST)
Received: (from tjchol01@localhost) by mecca.spd.louisville.edu (950413.SGI.8.6.12/8.6.12) id VAA26348; Sat, 1 Mar 1997 21:56:09 GMT
Date: Sat, 1 Mar 1997 21:56:09 GMT
Message-Id: <199703012156.VAA26348@mecca.spd.louisville.edu>
From: "Tomasz J. Cholewo" <tjchol01@mecca.spd.louisville.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: XEmacs-Beta Listserv <xemacs-beta@xemacs.org>
Subject: [patch] core dump in open-network-stream-internal

I have a simple recipe (unfortunately not for cookies).  It "works" on
IRIX and HP-UX for XEmacs versions since 19.14:

1.  M-: (open-network-stream-internal "x" "y" "136.165.99.250" 80)
2.  Wait for 6 to 25 seconds.                  ^^^^^^^^ any nonexistent address
3.  Press C-g.
4.  Repeat step 1.
5.  Wait for about 25 seconds.
6.  Optionally press C-g again.

Ready!
=====================================================
Fatal error: assertion failed, file signal.c, line 243, async_timer_suppress_count > 0
Fatal error (6).
...
  open-network-stream-internal("xx" "xx" "136.165.99.250" 80)
  eval((open-network-stream-internal "xx" "xx" "136.165.99.250" 80))
  # bind (expression)
  #<compiled-function (from "simple.elc") (expression) "...(16)" [eval expression values prin1 t] 3 1015478 (list (read-from-minibuffer "Eval: " nil read-expression-map t ...))>((open-network-stream-internal "xx" "xx" "136.165.99.250" 80))
  call-interactively(eval-expression)
  # (condition-case ... . error)
  # (catch top-level ...)
=====================================================
(gdb)
#0  0xfa610e8 in _kill () at kill.s:15
#1  0xfab4e48 in raise () at raise.c:22
#2  0xfa66638 in abort () at abort.c:38
#3  0x4cf5c0 in assert_failed (file=0x0, line=13749, expr=0x0) at emacs.c:2201
#4  0x660550 in start_async_timeouts () at signal.c:243
#5  0x662064 in start_interrupts () at signal.c:616
#6  0x6606c4 in speed_up_interrupts () at signal.c:288
#7  0x600838 in get_internet_address (host=807521916, address=0x7fff2060,
    errb={
      really_unlikely_name_to_have_accidentally_in_a_non_errb_structure = 666})
    at process.c:1308
#8  0x600da4 in Fopen_network_stream_internal (name=807521868,
    buffer=807521904, host=807521916, service=80) at process.c:1395
#9  0x4dbed8 in primitive_funcall (fn=0x4ca5b4 <fatal_error_signal>, nargs=6,
    args=0x35b5) at eval.c:3459
#10 0x4dc230 in funcall_subr (subr=0x0, args=0x7fff1f18) at eval.c:3481
#11 0x4da274 in Feval (form=540358804) at eval.c:3029
...
(gdb) p interrupts_slowed_down
$3 = 3
(gdb) p async_timer_suppress_count
$5 = 0
=====================================================

It is pretty easy to trigger this core dump inadvertently while using W3
and trying to interrupt a connection.  It has been reported multiple times
since last May.

The source of the problem is the use of Fsleep_for to wait before a
retry after receiving EADDRINUSE from `connect'.  One gets this
misleading error number if `connect' gets interrupted, and avoiding it
was the main reason behind the code for "slowing down" interrupts.
Unfortunately, when a user presses C-g in `sleep-for', the retry loop is
left with interrupts_slowed_down > 0.  Replacing Fsleep_for (which BTW
is perfectly OK in FSFmacs) with `sleep' removes this possibility.
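
The failure mode described above can be illustrated with a small
standalone model.  This is only a sketch, not the code in signal.c or
process.c: the names slowed_down, slow_down, speed_up, quittable_sleep
and connect_attempt are all hypothetical, and longjmp merely stands in
for the non-local exit taken when the user quits.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-ins; none of these names exist in XEmacs. */
static int slowed_down;        /* models interrupts_slowed_down */
static jmp_buf quit_target;    /* models the non-local exit taken on C-g */

static void slow_down(void)
{
    slowed_down++;
}

static void speed_up(void)
{
    assert(slowed_down > 0);   /* every speed_up must pair with a slow_down */
    slowed_down--;
}

/* A sleep that can be "quit": it jumps out instead of returning. */
static void quittable_sleep(int user_pressed_quit)
{
    if (user_pressed_quit)
        longjmp(quit_target, 1);
}

/* Models one pass through the retry path and returns the counter
   value that is left behind afterwards. */
int connect_attempt(int user_pressed_quit)
{
    slowed_down = 0;
    if (setjmp(quit_target) == 0) {
        slow_down();
        quittable_sleep(user_pressed_quit);  /* may never return */
        speed_up();                          /* skipped when we quit */
    }
    return slowed_down;
}
```

In this model connect_attempt(0) leaves the counter at 0, while
connect_attempt(1) leaves it at 1, because the matching speed_up call
is never reached.  A plain `sleep' cannot take that exit, so the pair
always stays balanced.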

There is a general problem with EADDRINUSE handling as I see it on
IRIX.  Once `connect' is interrupted, it will keep returning this code
for the same socket.  The timeout for the 'one shot timer', currently
set to 5 seconds, is definitely too short for many W3 connections.  Any
`connect' which doesn't succeed within this time will return EADDRINUSE
for all 20 retries while the user just sits there for 20 more seconds
before seeing an error message.  I increased the timeout arbitrarily to
15 seconds so more connections have a chance to complete, but the number
of retries could probably also be cut down.

Probably we could improve the current wait-and-retry approach by closing
the socket, opening another and retrying after that.  Does anyone know
the proper way to handle this?
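
For discussion, the close-and-reopen idea could look roughly like the
following.  This is a sketch under assumptions, not a tested patch for
process.c: connect_with_fresh_socket is a hypothetical helper, it uses
only the standard socket, connect, close and sleep calls, and treating
both EINTR and EADDRINUSE as "retry" is my guess at what an interrupted
`connect' may report.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of process.c.  Returns a connected
   descriptor, or -1 on failure. */
int connect_with_fresh_socket(const struct sockaddr *addr, socklen_t len,
                              int max_tries)
{
    int attempt;

    for (attempt = 0; attempt < max_tries; attempt++) {
        int saved_errno;
        int fd = socket(addr->sa_family, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;
        if (connect(fd, addr, len) == 0)
            return fd;             /* connected */

        saved_errno = errno;
        close(fd);                 /* never reuse the interrupted socket */
        errno = saved_errno;
        if (saved_errno != EINTR && saved_errno != EADDRINUSE)
            return -1;             /* a genuine failure: give up */
        sleep(1);                  /* then retry with a brand-new socket */
    }
    return -1;                     /* retries exhausted */
}
```

Each retry starts from a new descriptor, so a socket that keeps
answering EADDRINUSE after an interrupted `connect' is simply thrown
away.  Whether this behaves well on IRIX and HP-UX is exactly the open
question.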

Tom

diff -urd xemacs-20.1-b3/src/process.c xemacs-20.1-b3-work/src/process.c
--- xemacs-20.1-b3/src/process.c	Mon Feb 24 20:35:00 1997
+++ xemacs-20.1-b3-work/src/process.c	Sat Mar  1 15:39:39 1997
@@ -1446,8 +1446,11 @@
 	{
 	  /* A delay here is needed on some FreeBSD systems,
 	     and it is harmless, since this retrying takes time anyway
-	     and should be infrequent.  */
-	  Fsleep_for (make_int (1));
+	     and should be infrequent.
+             `sleep-for' allowed for quitting this loop with interrupts
+             slowed down so it can't be used here.  Async timers should
+             already be disabled at this point so we can use `sleep'. */
+          sleep (1);
 	  retry++;
 	  goto loop;
 	}
diff -urd xemacs-20.1-b3/src/signal.c xemacs-20.1-b3-work/src/signal.c
--- xemacs-20.1-b3/src/signal.c	Wed Dec 18 17:44:07 1996
+++ xemacs-20.1-b3-work/src/signal.c	Sat Mar  1 15:41:02 1997
@@ -75,7 +75,7 @@
 
 static int interrupts_slowed_down;
 
-#define SLOWED_DOWN_INTERRUPTS_SECS 5
+#define SLOWED_DOWN_INTERRUPTS_SECS 15
 #define NORMAL_QUIT_CHECK_TIMEOUT_MSECS 250
 #define NORMAL_SIGCHLD_CHECK_TIMEOUT_MSECS 250
 

