From owner-tcp-impl  Wed Oct 30 22:04:02 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA04214 for tcp-impl-list; Wed, 30 Oct 1996 22:04:01 GMT
Return-Path: <owner-tcp-impl>
Received: from bigfun.engr.sgi.com (fddi-bigfun.engr.sgi.com [192.26.75.20]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA04206 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 30 Oct 1996 14:03:58 -0800
Received: by bigfun.engr.sgi.com (950413.SGI.8.6.12/940406.SGI.AUTO)
	for tcp-impl id OAA11116; Wed, 30 Oct 1996 14:03:58 -0800
Date: Wed, 30 Oct 1996 14:03:58 -0800
From: kessler@bigfun (Tom Kessler)
Message-Id: <199610302203.OAA11116@bigfun.engr.sgi.com>
To: tcp-impl@bigfun
Subject: a Test
Sender: owner-tcp-impl
Precedence: bulk


Hi guys, this is a test.

From owner-tcp-impl  Fri Nov 22 17:19:24 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA12827 for tcp-impl-list; Fri, 22 Nov 1996 17:18:55 GMT
Return-Path: <owner-tcp-impl>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA12815 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 22 Nov 1996 09:18:54 -0800
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA01102 for <tcp-impl@engr.sgi.com>; Fri, 22 Nov 1996 09:18:38 -0800
Message-Id: <199611221718.JAA01102@refugee.engr.sgi.com>
To: tcp-impl
Subject: BOF Description
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <4345.848683118.1@refugee.engr.sgi.com>
Date: Fri, 22 Nov 1996 09:18:38 -0800
From: Steve Alexander <sca@refugee>
Sender: owner-tcp-impl
Precedence: bulk

Description of BOF Session
--------------------------
 
   The objective of this meeting is to decide how to best address known
   problems in existing implementations of the current TCP standard(s).  The
   overall goal is to improve conditions in the existing Internet by enhancing
   the quality of current TCP/IP implementations.  It is hoped that both 
   performance and correctness issues can be resolved by making implementors
   aware of the problems and their solutions.  In the long term, it is felt
   that this will provide a reduction in unnecessary traffic on the network,
   the rate of connection failures due to protocol errors, and load on network
   servers due to time spent processing both unsuccessful connections and
   retransmitted data.

   The BOF is intended to give an overview of the current list of known
   problems and to consider approaches a Working Group could take to improve
   TCP implementations.
   
   There will also be discussion on the proposal to create a working group to
   document the issues.  The charter and other issues surrounding working group
   creation will be discussed in an effort to reach consensus on the best way
   to proceed.

   Attendees should come prepared to discuss both TCP technical issues and
   IETF process issues regarding working group formation, document creation,
   etc.

Mailing List
------------
   A mailing list has been set up to facilitate discussion:
		tcp-impl@engr.sgi.com 

   Subscribe by sending mail to:
		Majordomo@engr.sgi.com

   with a one-line message that says
		subscribe tcp-impl
   in the body.

From owner-tcp-impl  Fri Nov 22 17:19:49 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA13062 for tcp-impl-list; Fri, 22 Nov 1996 17:19:34 GMT
Return-Path: <owner-tcp-impl>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA13055 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 22 Nov 1996 09:19:32 -0800
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA04637 for <tcp-impl@engr.sgi.com>; Fri, 22 Nov 1996 09:19:17 -0800
Message-Id: <199611221719.JAA04637@refugee.engr.sgi.com>
To: tcp-impl
Subject: BOF Agenda Draft
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <1091.848683156.1@refugee.engr.sgi.com>
Date: Fri, 22 Nov 1996 09:19:17 -0800
From: Steve Alexander <sca@refugee>
Sender: owner-tcp-impl
Precedence: bulk

TCP Implementation BOF Agenda
-----------------------------
 
o Motivation for the BOF
        - Experience has shown a number of current implementation problems with
	  existing TCP/IP stacks
		+ correctness
		+ performance
        - We would like to catalog and document these, so that implementors
	  are aware of them and address them
        - Long-term goal is to improve the overall quality of the Internet by
	  improving TCP implementations

o Presentation(s)
	- Observed Behavior of TCP in the Internet - Vern Paxson (25 min)
	- Short descriptions of known TCP implementation problems - various

o Possible Solutions
	- Discussion of possible ways to solve the problems, including:
		+ Documents (Recommendations, BCP, Other)
		+ Tests?
		+ Other Approaches?

o Working Group Discussion
        - Outline WG proposal
        - Proposed Charter
        - Discussion
 
o WG Administration
        - Chair(s)
        - Mailing list

From owner-tcp-impl  Fri Nov 22 17:21:05 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA13418 for tcp-impl-list; Fri, 22 Nov 1996 17:20:49 GMT
Return-Path: <owner-tcp-impl>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA13411 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 22 Nov 1996 09:20:48 -0800
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA01070 for <tcp-impl@engr.sgi.com>; Fri, 22 Nov 1996 09:20:32 -0800
Message-Id: <199611221720.JAA01070@refugee.engr.sgi.com>
To: tcp-impl
Subject: BOF Discussion Topics
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <4617.848683232.1@refugee.engr.sgi.com>
Date: Fri, 22 Nov 1996 09:20:32 -0800
From: Steve Alexander <sca@refugee>
Sender: owner-tcp-impl
Precedence: bulk

Welcome to the TCP Implementation BOF Mailing list.

We have scheduled a BOF session in San Jose to discuss known problems in
current TCP implementations.  In order to ensure that the discussion is as
detailed as possible, we are soliciting short descriptions of existing bugs or
performance problems that users and vendors have experienced when using TCP
in the Internet.  Please come prepared with your favorite(s) and if possible,
let me know in advance.

The BOF will be held on Wednesday, December 11th, at 3:30 PM.

Steve Alexander (sca@sgi.com)
Allyn Romanow	(allyn@eng.sun.com)
Jamshid Mahdavi (mahdavi@psc.edu)
Vern Paxson	(vern@ee.lbl.gov)
Allison Mankin	(mankin@isi.edu)

From owner-tcp-impl  Fri Nov 22 18:17:05 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA24407 for tcp-impl-list; Fri, 22 Nov 1996 18:16:41 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA24400 for <tcp-impl@engr.sgi.com>; Fri, 22 Nov 1996 10:16:40 -0800
Received: from mail-out2.apple.com (mail-out2.apple.com [17.254.0.51]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA19916 for <tcp-impl@engr.sgi.com>; Fri, 22 Nov 1996 10:16:36 -0800
Received: from federal-excess.apple.com (federal-excess.apple.com [17.255.0.16]) by mail-out2.apple.com (8.7.5/8.7.3) with ESMTP id KAA21480 for <tcp-impl@engr.sgi.com>; Fri, 22 Nov 1996 10:11:36 -0800
Received: from [17.202.32.184] (cooldo3.apple.com [17.202.32.184]) by federal-excess.apple.com (8.7.5/8.7.3) with SMTP id KAA09565 for <tcp-impl@engr.sgi.com>; Fri, 22 Nov 1996 10:13:44 -0800 (PST)
Date: Fri, 22 Nov 1996 10:13:44 -0800 (PST)
X-Sender: dfc@mail.apple.com
Message-Id: <aebb28fc0f02100475aa@[17.202.32.184]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: tcp-impl
From: dfc@apple.com (Don Coolidge)
Subject: Re: BOF Description
Sender: owner-tcp-impl
Precedence: bulk

>Description of BOF Session
>--------------------------
>
>   The objective of this meeting is to decide how to best address known
>   problems in existing implementations of the current TCP standard(s).  The
>   overall goal is to improve conditions in the existing Internet by enhancing
>   the quality of current TCP/IP implementations.  It is hoped that both
>   performance and correctness issues can be resolved by making implementors
>   aware of the problems and their solutions.  In the long term, it is felt
>   that this will provide a reduction in unnecessary traffic on the network,
>   the rate of connection failures due to protocol errors, and load on network
>   servers due to time spent processing both unsuccessful connections and
>   retransmitted data.
>
>   The BOF is intended to give an overview of the current list of known
>   problems and to consider approaches a Working Group could take to improve
>   TCP implementations.
>
>   There will also be discussion on the proposal to create a working group to
>   document the issues.  The charter and other issues surrounding working group
>   creation will be discussed in an effort to reach consensus on the best way
>   to proceed.
>
>   Attendees should come prepared to discuss both TCP technical issues and
>   IETF process issues regarding working group formation, document creation,
>   etc.

This is all well and good, but thus far a bit too nebulous for my liking. I
would ask of anybody who is planning on attending the BOF and who has a
specific TCP implementation problem to discuss, that they bring it up, in
advance, on this mailing list. In addition, I'd think it considerate if an
effort were made to contact TCP implementors not on the mailing list whose
stacks will be under discussion at the BOF and on the list - they deserve
that courtesy.

I, for one, am not fond of surprises being sprung at the IETF, or any other
"formal" meeting. It's not only reasonable, but also fair, that
implementors or maintainers of stacks with known "problems" be advised in
advance that those issues will be discussed in a public forum, allowing
them to do their homework and contribute substantively to the discussion.
(Not to mention fix problems that perhaps they don't know about!)

Please don't misunderstand my concern here - I'm not implying that
"surprises" are being planned. But it's clearly to everyone's advantage that
each TCP implementation work as cleanly and neighborly as possible, and I
welcome any effort that advances that goal. I would simply prefer that, as
with most IETF WGs (and the plan apparently is to turn this into a WG), the
majority of work be done on the mailing list rather than at a meeting.
That's a far more efficient method, it's less likely to ruffle feathers,
and it gives the people responsible for the code a chance to get up to
speed.

-- Don Coolidge



From owner-tcp-impl  Sat Nov 23 00:49:53 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA13942 for tcp-impl-list; Sat, 23 Nov 1996 00:49:33 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA13931 for <tcp-impl@relay.engr.SGI.COM>; Fri, 22 Nov 1996 16:49:31 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA20244 for <tcp-impl@relay.engr.SGI.COM>; Fri, 22 Nov 1996 16:49:27 -0800
Received: by daffy.ee.lbl.gov (8.7.5/1.43r)
	id QAA28732; Fri, 22 Nov 1996 16:39:15 -0800 (PST)
Message-Id: <199611230039.QAA28732@daffy.ee.lbl.gov>
To: dfc@apple.com (Don Coolidge)
Cc: tcp-impl
Subject: Re: BOF Description
In-reply-to: Your message of Fri, 22 Nov 1996 10:13:44 PST.
Date: Fri, 22 Nov 1996 16:39:15 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> I would ask of anybody who is planning on attending the BOF and who has a
> specific TCP implementation problem to discuss, that they bring it up, in
> advance, on this mailing list.

I agree.  In putting together the BOF we discussed the issue of identifying
specific flaws.  We recognize that it is crucial that TCP implementors find
the working group (if one is formed) useful and not hostile.  Implementors
understandably don't want problems blown out of proportion, particularly
ones that have already been fixed.  On the other hand, people inevitably
want to know which TCP's exhibit which sorts of behavior, as this can save
immense amounts of work in trying to debug networking problems.

I think the key distinction is between implementor "bashing" and implementor
"naming".  It will be vital for those participating in the working group
to recognize that TCP implementation is very tricky - that's the whole reason
why the BOF is valuable - and that the goal is to assist the process of
improving the state of the art.  Related to this, it seems like one of the
likely products of the working group will be a catalog of known implementation
problems.  I would argue that this catalog should include implementation
details, as well as notes discussing fixes in later or pending releases.

The agenda has me kicking off the BOF with a "motivation" presentation.
I haven't put this together yet, but here's a sketch of some of the
implementation problems I intend to talk about.

		Vern


- The Comer/Lin study on probing TCP implementations.  As I recall, the main
  implementation problems they found regard zero-window probes in SunOS
  4.0.3 and Solaris 2.1.  I don't know whether these have been fixed
  (I would certainly guess they have); if someone knows, please
  let me know.  I may omit this because it's now fairly dated, or maybe I'll
  talk about it later in the BOF as one approach for identifying flaws.

- The Brakmo/Peterson study on problems in 4.4 BSD.  They found medium-serious
  header prediction and fast retransmit bugs, and suboptimal RTO estimation.
  This is not specific to a particular vendor, though many implementations
  are derived from 4.4 BSD.

- A recent study by Dawson, Jahanian, and Mitton (of eecs.umich.edu), not yet
  published, on testing TCP implementations using a "fault injection tool".
  It's available from

	http://www.eecs.umich.edu/~sdawson

  They look at SunOS 4.1.3, AIX 3.2.3, NeXT Mach, Solaris 2.3, OS/2, and
  Windows 95 TCP implementations, finding a number of flaws.  I haven't
  read the paper yet.

- Findings from a large-scale TCP dynamics study I'm close to finishing.
  These have not yet been published.  The ones I plan to talk about are:

	- Windows 3.1 with the Trumpet/Winsock 2.0B TCP stack doesn't do
	  slow start, and discards any packets that arrive above a sequence
	  hole, forcing go-back-N retransmission.  It also sometimes
	  generates only one ack for a large number of packets.

	- Linux 1.1 retransmits the entire unacknowledged window on loss
	  rather than just the first unacknowledged packet.  Alan Cox tells
	  me this has been fixed.

	- Solaris 2.3 and 2.4 can time out prematurely if the RTT is fairly
	  large (since fixed).  Solaris 2.3 sometimes only acks if the segment
	  has PSH set or if the delayed ack timer expires.

	- A bug leading to slow start being skipped by a number of Reno-derived
	  implementations if the remote TCP doesn't send an MSS option in
	  its SYN ack.

Some minor problems I may talk about if there's time:

	- Some Reno implementations are inconsistent about whether they
	  include the length of TCP options in determining the maximum segment
	  size (Matt Mathis discussed this in the past, along with how to
	  fix it).

	- Solaris 2.3 does not recognize FIN's set on packets received out
	  of sequence.

	- Windows NT 4.0 with the Microsoft TCP stack does not do
	  fast retransmit.

	- The duplicate-ack counter in HP/UX 9.0 is not cleared on a timeout.

	- In HP/UX 9.0 and 10.0, after a timeout subsequent dup acks (rare)
	  advance the congestion window if the timeout was for a packet
	  previously sent using fast retransmission.

	- SunOS omits generating dup acks for packets received above a
	  sequence hole.

	- Solaris 2.3 and 2.4 have a bug in the fast recovery code so they
	  don't send packets when they could.
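
To make the duplicate-ack items above concrete, here is a minimal sketch of
the sender-side bookkeeping they refer to.  It is illustrative only and is
not code from any of the stacks named: the class and method names and the
threshold of three duplicate acks are assumptions chosen for the example,
and sequence-number wraparound is ignored.

```python
# Hypothetical sender-side duplicate-ack bookkeeping (names are illustrative).
DUP_ACK_THRESHOLD = 3  # assumed fast-retransmit trigger


class DupAckTracker:
    def __init__(self, initial_ack=0):
        self.last_ack = initial_ack   # highest cumulative ack seen so far
        self.dup_acks = 0             # consecutive duplicates of last_ack

    def on_ack(self, ack_no):
        """Return True when this ack should trigger a fast retransmit."""
        if ack_no > self.last_ack:
            # New data acknowledged: advance and reset the counter.
            self.last_ack = ack_no
            self.dup_acks = 0
            return False
        if ack_no == self.last_ack:
            self.dup_acks += 1
            # Fire exactly once, on the threshold-th duplicate.
            return self.dup_acks == DUP_ACK_THRESHOLD
        return False  # stale ack below last_ack: ignore

    def on_timeout(self):
        # Clear the counter on a retransmission timeout, so duplicates
        # counted before the timeout cannot combine with later ones.
        self.dup_acks = 0
```

The on_timeout() reset is the step that the HP/UX 9.0 item describes as
missing; without it, two dup acks seen before a timeout plus one seen after
would wrongly look like three in a row.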

From owner-tcp-impl  Sat Nov 23 01:57:49 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA25868 for tcp-impl-list; Sat, 23 Nov 1996 01:57:27 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA25856 for <tcp-impl@relay.engr.SGI.COM>; Fri, 22 Nov 1996 17:57:25 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA03062 for <tcp-impl@relay.engr.SGI.COM>; Fri, 22 Nov 1996 17:57:14 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.7.1/8.7.1) with UUCP id BAA04594; Sat, 23 Nov 1996 01:37:59 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vR74o-0005KgC; Sat, 23 Nov 96 01:40 GMT
Message-Id: <m0vR74o-0005KgC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: BOF Description
To: dfc@apple.com (Don Coolidge)
Date: Sat, 23 Nov 1996 01:40:13 +0000 (GMT)
Cc: tcp-impl
In-Reply-To: <aebb28fc0f02100475aa@[17.202.32.184]> from "Don Coolidge" at Nov 22, 96 10:13:44 am
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

Ok just to start things off. Some minor items on correctness that have come
up recently -- mostly trivial

1.	BSD stacks have a 75-second (i.e. very short) timeout, which means
	you have to hack the kernel to make it usable on amateur radio

2.	RFC 1337 and Draft-heavans seem to have been ignored by all of us

3.	Not everyone is implementing secure sequence numbers

4.	Most of us are outputting different responses to random ACK frames
	for a closed socket or listening socket, making stealth scanners
	too easy

5.	TTCP doesn't work well on slow links. If someone does a new TTCP
	spec, can they lower the 4K initial assumed send size to something
	like min(2*dev->mtu, 4096)?

6.	Question more than anything else: has anyone finally proved Vegas
	is good or bad?

7.	How do we handle spoofed 0 window frames, especially when window
	scaling is making the chance of guessing an in-window seq far
	higher?
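
The clamp suggested in item 5 can be written out as a small sketch.  Only
the min(2*mtu, 4096) expression comes from the item itself; the function
name and the example MTU values are assumptions for illustration.

```python
def initial_send_size(mtu, ceiling=4096):
    """Clamp the assumed initial send size to two MTUs, at most `ceiling` bytes."""
    return min(2 * mtu, ceiling)


# Example link MTUs (assumed values): a low-MTU slow link, Ethernet, FDDI.
for mtu in (256, 1500, 4352):
    print(mtu, initial_send_size(mtu))
```

With this clamp a low-MTU link starts from two packets' worth of data
rather than a fixed 4K, while large-MTU links keep the existing ceiling.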

Alan

From owner-tcp-impl  Sat Nov 23 03:50:52 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA08943 for tcp-impl-list; Sat, 23 Nov 1996 03:50:31 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA08936 for <tcp-impl@relay.engr.SGI.COM>; Fri, 22 Nov 1996 19:50:30 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id TAA19618 for <tcp-impl@relay.engr.SGI.COM>; Fri, 22 Nov 1996 19:50:28 -0800
Received: by daffy.ee.lbl.gov (8.7.5/1.43r)
	id TAA29047; Fri, 22 Nov 1996 19:40:29 -0800 (PST)
Message-Id: <199611230340.TAA29047@daffy.ee.lbl.gov>
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
Cc: tcp-impl
Subject: Re: BOF Description
In-reply-to: Your message of Sat, 23 Nov 1996 01:40:13 PST.
Date: Fri, 22 Nov 1996 19:40:29 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

Another thing we discussed while putting together the BOF is what we thought
was the best scope for the working group's goals.  We all felt that a line
needs to be drawn between implementation issues and research issues.  While
there are many interesting and important research questions in how TCP might
be modified, a working group focussing on these has a different nature than
one focussing on aiding the correctness and performance of existing TCP
implementations following existing standards.

I think a good place to draw the line is something like: current TCP standards
and common practice, as well as TCP extensions that are essentially "done
deals".  This last might have as a rule of thumb something like "has a draft
RFC and two independent implementations".  By this rule, an option like
SACK is within scope, but an extension like Vegas is beyond the scope.

With that in mind:

> 1.	BSD stacks have a 75-second (i.e. very short) timeout, which means
> 	you have to hack the kernel to make it usable on amateur radio
> 
> 3.	Not everyone is implementing secure sequence numbers
> 
> 4.	Most of us are outputting different responses to random ACK frames
> 	for a closed socket or listening socket, making stealth scanners
> 	too easy
> 
> 5.	TTCP doesn't work well on slow links. If someone does a new TTCP
> 	spec, can they lower the 4K initial assumed send size to something
> 	like min(2*dev->mtu, 4096)?
> 
> 7.	How do we handle spoofed 0 window frames, especially when window
> 	scaling is making the chance of guessing an in-window seq far
> 	higher?

These strike me as being in scope.

> 2.	RFC 1337 and Draft-heavans seem to have been ignored by all of us
> 
> 6.	Question more than anything else: has anyone finally proved Vegas
> 	is good or bad?

These strike me as being outside the scope, with (2) being borderline (it
would be interesting to know if there are particular reasons why these have
not drawn much attention).

A last comment: the main theme of the San Jose BOF is administrative.
Probably discussion of addressing problems like these (and the ones I talk
about) needs to wait until after the BOF.  This would then be on the
mailing list and at the next IETF.

	IMHO,

		Vern

From owner-tcp-impl  Sat Nov 23 13:07:14 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA09275 for tcp-impl-list; Sat, 23 Nov 1996 13:06:54 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA09269 for <tcp-impl@relay.engr.SGI.COM>; Sat, 23 Nov 1996 05:06:52 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id FAA13685 for <tcp-impl@relay.engr.SGI.COM>; Sat, 23 Nov 1996 05:06:46 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.7.1/8.7.1) with UUCP id MAA23278; Sat, 23 Nov 1996 12:55:27 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vR7P4-0005KgC; Sat, 23 Nov 96 02:01 GMT
Message-Id: <m0vR7P4-0005KgC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: BOF Description
To: vern@ee.lbl.gov (Vern Paxson)
Date: Sat, 23 Nov 1996 02:01:09 +0000 (GMT)
Cc: dfc@apple.com, tcp-impl
In-Reply-To: <199611230039.QAA28732@daffy.ee.lbl.gov> from "Vern Paxson" at Nov 22, 96 04:39:15 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

>   header prediction and fast retransmit bugs, and suboptimal RTO estimation.
>   This is not specific to a particular vendor, though many implementations
>   are derived from 4.4 BSD.

There are lots of surprises when you get superbly accurate RTT estimation.
Linux is doing 1/100th of a second quality, which means you start to get
funnies on links where the packet length is a significant factor in
transmit time.
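
A rough worked example of that effect, under assumed numbers: the link
rates and packet sizes below are illustrative rather than measured, and the
helper names are made up.  It compares the time needed to clock a packet
onto the wire with a 1/100th-second timer tick.

```python
TICK = 0.01  # seconds; the 1/100th-second RTT granularity


def transmit_time(packet_bytes, link_bps):
    """Seconds needed to serialize one packet onto a link of link_bps bits/s."""
    return packet_bytes * 8 / link_bps


def ticks(packet_bytes, link_bps):
    """Serialization time expressed in timer ticks."""
    return transmit_time(packet_bytes, link_bps) / TICK


# Assumed example: a 1200 bit/s radio link versus 10 Mbit/s Ethernet.
for bps in (1200, 10_000_000):
    small, large = ticks(40, bps), ticks(1500, bps)
    print(f"{bps} bit/s: 40-byte packet {small:.2f} ticks, "
          f"1500-byte packet {large:.2f} ticks")
```

On the slow link the two packet sizes differ by hundreds of ticks, so a
fine-grained estimator sees the packet length directly in its RTT samples;
on the fast link both sizes fall well under one tick.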

> 	- Linux 1.1 retransmits the entire unacknowledged window on loss
> 	  rather than just the first unacknowledged packet.  Alan Cox tells
> 	  me this has been fixed.

Yep, quite a while ago. 1.2 isn't a brilliant TCP stack - it's quite noisy.
2.0 should be as polite as BSD. 2.1.x is a development stack and has
Vegas and stuff in it. I'd like to get figures on that but I don't have
good ones yet.

> 	- Some Reno implementations are inconsistent about whether they
> 	  include the length of TCP options in determining the maximum segment
> 	  size (Matt Mathis discussed this in the past, along with how to
> 	  fix it).

Linux 1.2 gets the MSS too low on devices with variable length headers (AX.25,
one or two other weird protocol layers).


> 	- Windows NT 4.0 with the Microsoft TCP stack does not do
> 	  fast retransmit.

Linux 1.2 does not do fast retransmit. 2.0 does.

> 	- SunOS omits generating dup acks for packets received above a
> 	  sequence hole.

SunOS, if you follow the Sun security recommendation for protection against
SYN attacks, is badly vulnerable to SYN attacks. The existing Linux 2.0 patch
is too. People send packets with SYN[64K of 0 byte data]. A 512-long
listen queue eats 512x64K = deep cack.
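
The arithmetic behind that figure, as a quick sketch.  It assumes every
queued SYN pins a full 64K (65536 bytes) of buffered data, which is the
worst case described above; the function name is illustrative.

```python
def listen_queue_bytes(queue_len, bytes_per_syn):
    """Worst-case memory pinned by a full listen queue of data-carrying SYNs."""
    return queue_len * bytes_per_syn


total = listen_queue_bytes(512, 64 * 1024)
print(total, "bytes =", total // (1024 * 1024), "MB")
```

That is 32 MB held per listening socket under these assumptions.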

Not directly a TCP bug. But checking addresses carefully and how to handle
ICMP errors ought to get bashed about. Things like "should we treat
HOST unreachable off a SYN frame as fatal", "is an administratively
unreachable a fatal TCP error", etc.

Do we want to add 'that tcp_drain bug'? I'm not at liberty to reveal it
here but I think someone who is at liberty should do so, even if only at
the BOF to people they know they can trust.

Alan


From owner-tcp-impl  Sat Nov 23 22:36:57 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA11685 for tcp-impl-list; Sat, 23 Nov 1996 22:36:37 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA11679 for <tcp-impl@relay.engr.SGI.COM>; Sat, 23 Nov 1996 14:36:35 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA24498 for <tcp-impl@relay.engr.SGI.COM>; Sat, 23 Nov 1996 14:36:34 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id OAA29076; Sat, 23 Nov 1996 14:26:29 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id OAA08667; Sat, 23 Nov 1996 14:26:27 -0800
Received: from fstop. by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id OAA18117; Sat, 23 Nov 1996 14:22:14 -0800
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id OAA15994; Sat, 23 Nov 1996 14:18:03 -0800
Message-Id: <199611232218.OAA15994@fstop.>
From: sparker@Eng.Sun.COM
To: Vern Paxson <vern@ee.lbl.gov>
cc: tcp-impl
Subject: Re: BOF Description 
Date: Sat, 23 Nov 1996 14:18:03 -0800
Sender: owner-tcp-impl
Precedence: bulk


- > I would ask of anybody who is planning on attending the BOF and who has a
- > specific TCP implementation problem to discuss, that they bring it up, in
- > advance, on this mailing list.
- 
- I agree.  In putting together the BOF we discussed the issue of identifying
- specific flaws.  We recognize that it is crucial that TCP implementors find
- the working group (if one is formed) useful and not hostile.  Implementors
- understandably don't want problems blown out of proportion, particularly
- ones that have already been fixed.  On the other hand, people inevitably
- want to know which TCP's exhibit which sorts of behavior, as this can save
- immense amounts of work in trying to debug networking problems.
- 
- I think the key distinction is between implementor "bashing" and implementor
- "naming".

I would like to suggest one way we might achieve this is by trying to
identify, develop, and make available tools and/or methods which
identify the problems.  By doing this, and not mentioning vendor names,
the group can produce the means by which anyone may check any system
for a given defect.  In this way, "bashing" is clearly avoided.  Those
who are curious whether one of their systems is broken can take the tools
and methodologies which come from this group and ascertain for
themselves the truth.

Cheers,

	~sparker

From owner-tcp-impl  Sun Nov 24 01:05:50 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA20219 for tcp-impl-list; Sun, 24 Nov 1996 01:05:30 GMT
Return-Path: <owner-tcp-impl>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA20211 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 23 Nov 1996 17:05:28 -0800
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA26280; Sat, 23 Nov 1996 17:05:07 -0800
Message-Id: <199611240105.RAA26280@refugee.engr.sgi.com>
X-Mailer: exmh version 1.6.9 8/22/96
To: dfc@apple.com (Don Coolidge)
Cc: tcp-impl
Subject: Re: BOF Description 
In-reply-to: Message from dfc@apple.com of 22 Nov 1996 10:13:44 PST
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sat, 23 Nov 1996 17:05:06 -0800
From: Steve Alexander <sca@refugee>
Sender: owner-tcp-impl
Precedence: bulk

[
  I sent a reply to this before, which I haven't seen, so I'm assuming that
  it was eaten somewhere.  Since others have done a good job of covering the
  issues raised, I can get away with a shorter reply at this point. ;-> My
  apologies if you've already seen the other message.
]

dfc@apple.com (Don Coolidge) writes:
>I would simply prefer that, as with most IETF WGs (and the plan apparently is
>to turn this into a WG), the majority of work be done on the mailing list
>rather than at a meeting.
>That's a far more efficient method, it's less likely to ruffle feathers,
>and it gives the people responsible for the code a chance to get up to
>speed.

If the result of the BOF is that a working group is formed (which is not
guaranteed), I can assure you that most business will be conducted on the
list because:

	- It's the IETF way

	- It is not possible to do much useful work in the meeting milieu

	- Even if it were possible, not everyone can attend every meeting

	- Even if everybody were able to attend each meeting, meetings do
	  not occur often enough to allow progress to be made in a timely
	  manner

-- Steve



From owner-tcp-impl  Sun Nov 24 07:33:51 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA08943 for tcp-impl-list; Sun, 24 Nov 1996 07:33:30 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA08937 for <tcp-impl@relay.engr.SGI.COM>; Sat, 23 Nov 1996 23:33:29 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id XAA26319 for <tcp-impl@relay.engr.SGI.COM>; Sat, 23 Nov 1996 23:33:28 -0800
Received: by daffy.ee.lbl.gov (8.7.5/1.43r)
	id XAA00636; Sat, 23 Nov 1996 23:23:32 -0800 (PST)
Message-Id: <199611240723.XAA00636@daffy.ee.lbl.gov>
To: sparker@Eng.Sun.COM
Cc: tcp-impl
Subject: Re: BOF Description 
In-reply-to: Your message of Sat, 23 Nov 1996 14:18:03 PST.
Date: Sat, 23 Nov 1996 23:23:31 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> - I think the key distinction is between implementor "bashing" and implementor
> - "naming".
> 
> I would like to suggest one way we might achieve this is by trying to
> identify, develop, and make available tools and/or methods which
> identify the problems.

I think this would be an excellent product for the working group, and also
an excellent way to avoid bashing.  I'd like to see it be the mainstream
technique used for identifying problems.

I imagine though that there will be cases for which it doesn't work -
implementation problems that are hard to reproduce, or for which it will
take a while to make a diagnostic tool available.  An example of the latter
is that a number of the problems mentioned in my earlier message were found
using an analysis tool I'm working on.  This tool is not yet ready for
public release, though.  I wouldn't want that to hold up identifying in
which implementations the problems exist.

Another thorny area concerns security issues - Alan brought up a number of
these.  Depending on the particular security problem, it might not be
appropriate for the working group to encourage development of tools that
detect whether particular implementations suffer from it, if the mechanism
of detection also allows exploitation.  Likewise, it seems that a reasonable
policy for identifying implementations with security problems is to do so
only if there has already been public disclosure that the implementation
has the problem.

		Vern

From owner-tcp-impl  Sun Nov 24 07:49:26 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA09566 for tcp-impl-list; Sun, 24 Nov 1996 07:49:06 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA09560 for <tcp-impl@relay.engr.SGI.COM>; Sat, 23 Nov 1996 23:49:04 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id XAA27904 for <tcp-impl@relay.engr.SGI.COM>; Sat, 23 Nov 1996 23:49:03 -0800
Received: by daffy.ee.lbl.gov (8.7.5/1.43r)
	id XAA00707; Sat, 23 Nov 1996 23:38:56 -0800 (PST)
Message-Id: <199611240738.XAA00707@daffy.ee.lbl.gov>
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
Cc: tcp-impl
Subject: Re: BOF Description
In-reply-to: Your message of Sat, 23 Nov 1996 02:01:09 PST.
Date: Sat, 23 Nov 1996 23:38:56 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> There are lots of surprises when you get superbly accurate RTT estimation.
> Linux is doing 1/100th of a second quality, that means you start to get
> funnies on links where the packet length is a significant factor in
> transmit time

I would add that RTT estimation when you have highly accurate clocks, or
when you're timing more than one packet per RTT, is a research area.  It's
not at all obvious that the existing algorithms can be directly applied.
So I'd argue that this particular issue is outside the likely scope of
the working group.

> Do we want to add 'that tcp_drain bug' ...

Hmmmmm, the general question of how to address security problems is
difficult.  Many of them lack fixes having "an RFC draft plus two
independent implementations" to make them within the scope I proposed.
I think the working group needs to avoid getting involved in trying to
develop fixes for security problems, and needs to stick with implementation
issues for fixes devised in other venues.  Without this, the group could
easily lose focus - witness the immense energy put into fixing SYN flooding
attacks on various mailing lists.

		Vern

From owner-tcp-impl  Sun Nov 24 13:06:37 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA26110 for tcp-impl-list; Sun, 24 Nov 1996 13:06:18 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA26104 for <tcp-impl@relay.engr.SGI.COM>; Sun, 24 Nov 1996 05:06:16 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id FAA04186 for <tcp-impl@relay.engr.SGI.COM>; Sun, 24 Nov 1996 05:06:12 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.7.1/8.7.1) with UUCP id MAA30083; Sun, 24 Nov 1996 12:59:40 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vRd2M-0005FbC; Sun, 24 Nov 96 11:47 GMT
Message-Id: <m0vRd2M-0005FbC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: BOF Description
To: vern@ee.lbl.gov (Vern Paxson)
Date: Sun, 24 Nov 1996 11:47:50 +0000 (GMT)
Cc: sparker@Eng.Sun.COM, tcp-impl
In-Reply-To: <199611240723.XAA00636@daffy.ee.lbl.gov> from "Vern Paxson" at Nov 23, 96 11:23:31 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> using an analysis tool I'm working on.  This tool is not yet ready for
> public release, though.  I wouldn't want that to hold up identifying in
> which implementations the problems exist.

It is going to be far better that we have tools, even partially useful ones
than getting them from other sources. I get most of my network test tools
from either phrack types or firewalls folk. The phrack types are learning
far faster than the firewalls folk right now and some of them (Uriel springs
to mind) are getting very good at finding the tiniest gaps.

> of detection also allows exploitation.  Likewise, it seems that a reasonable
> policy for identifying implementations with security problems is to do so
> only if there has already been public disclosure that the implementation
> has the problem.

The current policy appears to be to sit on a problem and pray like crazy it
doesn't get out. That IMHO has never been an acceptable policy and isn't one
the Linux community (or I suspect OpenBSD/NetBSD/FreeBSD community) can really
subscribe to. We ship source. Source code patches pretty much have to
explain a problem.

Alan


From owner-tcp-impl  Sun Nov 24 19:31:39 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA16114 for tcp-impl-list; Sun, 24 Nov 1996 19:31:15 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA16108 for <tcp-impl@relay.engr.SGI.COM>; Sun, 24 Nov 1996 11:31:12 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA16604 for <tcp-impl@relay.engr.SGI.COM>; Sun, 24 Nov 1996 11:31:11 -0800
Received: by daffy.ee.lbl.gov (8.7.5/1.43r)
	id LAA01391; Sun, 24 Nov 1996 11:21:12 -0800 (PST)
Message-Id: <199611241921.LAA01391@daffy.ee.lbl.gov>
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
Cc: tcp-impl
Subject: Re: BOF Descriptio
In-reply-to: Your message of Sun, 24 Nov 1996 11:47:50 PST.
Date: Sun, 24 Nov 1996 11:21:12 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> It is going to be far better that we have tools, even partially useful ones
> than getting them from other sources.

If you mean the most preferable way of characterizing an implementation problem
is with a tool that detects it, I fully agree (I'm not sure exactly what you
meant by "than getting them from other sources").  I'm just saying that we
also need to consider that sometimes the tools will not be available, and
I think it's still useful to talk about implementation problems in the
absence of a diagnostic tool.  Hopefully this will be rare.  One idea is
to acknowledge a range of ways to characterize problems: a diagnostic tool
is best, a description of how to reliably reproduce the problem is okay, and
evidence such as a packet trace is still useful, though not as greatly as the
others.

> I get most of my network test tools
> from either phrack types or firewalls folk.

There are classes of implementation problems such as security ones for which
diagnostic tools are often already available.  There are other classes such
as congestion behavior for which devising such tools is a lot harder.

> The current policy appears to sit on a problem and pray like crazy it doesn't
> get out. That IMHO has never been an acceptable policy ...

It's crucial to keep in mind that the working group will be most successful
by including as large a group of implementors as possible.  You might well
argue that the policies of some of the implementors for reporting security
problems are misguided, but I think for purposes of the working group these
policies need to be recognized.  I'd be surprised if an implementor would
choose to change their policy in order to participate in the group, rather
than deciding the group isn't useful because joining it would require
changing their policy.

Security issues raise enough hard problems that I wonder whether security
shouldn't have its own effort, either as a subgroup within the implementor
WG, or as its own WG.

		Vern

From owner-tcp-impl  Mon Nov 25 07:33:30 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA05023 for tcp-impl-list; Mon, 25 Nov 1996 07:33:06 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA05008 for <tcp-impl@relay.engr.SGI.COM>; Sun, 24 Nov 1996 23:33:05 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id XAA18762 for <tcp-impl@relay.engr.SGI.COM>; Sun, 24 Nov 1996 23:33:02 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm002-18.dialip.mich.net [141.211.7.154]) by merit.edu (8.7.6/merit-2.0) with SMTP id CAA01479 for <tcp-impl@relay.engr.SGI.COM>; Mon, 25 Nov 1996 02:29:15 -0500 (EST)
Date: Sun, 24 Nov 96 15:43:56 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5506.wsimpson@greendragon.com>
To: tcp-impl
Subject: Re: BOF Description
Sender: owner-tcp-impl
Precedence: bulk

> From: alan@lxorguk.ukuu.org.uk (Alan Cox)
> 1.	BSD stacks have a 75second (ie very short) timeout, which means
> you have to hack the kernel to make it usable on amateur radio
>
Actually, the problem isn't the 75 seconds, which is way too _long_
in most cases.  The problem is both send and receive use the same wait
time (receive could be much shorter) and that they are non-configurable.

Indeed, configuration should be on our list.  IRTT comes to mind.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl  Mon Nov 25 07:33:33 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA05029 for tcp-impl-list; Mon, 25 Nov 1996 07:33:07 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA05015 for <tcp-impl@relay.engr.SGI.COM>; Sun, 24 Nov 1996 23:33:06 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id XAA18760 for <tcp-impl@relay.engr.SGI.COM>; Sun, 24 Nov 1996 23:33:00 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm002-18.dialip.mich.net [141.211.7.154]) by merit.edu (8.7.6/merit-2.0) with SMTP id CAA01476 for <tcp-impl@relay.engr.SGI.COM>; Mon, 25 Nov 1996 02:29:13 -0500 (EST)
Date: Sun, 24 Nov 96 15:25:09 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5505.wsimpson@greendragon.com>
To: tcp-impl
Subject: Re: BOF Description
Sender: owner-tcp-impl
Precedence: bulk

Looks like the list is having some problems, as I am seeing replies to
messages that I didn't get.  That means others are getting them....

Anyway, I rather disagree about naming specific implementation problems.
I'm not sure that "bashing" is the likely result.  A better result is to
use the openness to get folks to upgrade their implementations, and
customers to get the upgrades.

Witness MacTCP.  A horrendous implementation.  Folks need to be clear
that it is in bad shape, and upgrade to MacOT.  Having a test tool for
demo would be helpful.  Might find out if there are any MacOT problems.

In any case, having tools to test with would be wonderful.  I'm not sure
how easy it is to test for some TCP features, though.

Some topics not mentioned so far that I'd like to see written down as
implementation notes in a comprehensive RFC (and in the tests):

 - slow start
 - RTT calculation
 - RTT ceilings and floors
 - Nagle algorithm
 - silly windows
 - delayed Ack
 - proper MSS calculation
 - proper MSS advertisement
 - MTU discovery

And of course, we probably ought to document somewhere the "best"
solutions to the Syn attack....

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl  Mon Nov 25 11:29:47 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA22398 for tcp-impl-list; Mon, 25 Nov 1996 11:29:27 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA22392 for <tcp-impl@relay.engr.sgi.com>; Mon, 25 Nov 1996 03:29:25 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id DAA20717 for <tcp-impl@relay.engr.sgi.com>; Mon, 25 Nov 1996 03:29:24 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id DAA09464; Mon, 25 Nov 1996 03:19:19 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id DAA05632; Mon, 25 Nov 1996 03:19:17 -0800
Received: by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id DAA08030; Mon, 25 Nov 1996 03:19:14 -0800
Date: Mon, 25 Nov 1996 03:19:14 -0800
From: jerry.chu@Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199611251119.DAA08030@jurassic.eng.sun.com>
To: vern@ee.lbl.gov
Subject: Re: BOF Description
Cc: tcp-impl
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

>The objective of this meeting is to decide how to best address known
>problems in existing implementations of the current TCP standard(s).

Sometimes it's the lack of detail in the specification that causes
implementation defects. Of course the line between specification and
implementation is not all that clear. Sometimes I wish the original TCP
state diagram had been augmented with all possible input to each state,
including all combinations of header bits, good/bad sequence number,
good/bad ack number... so that we TCP implementors could just check
the diagram to do the right thing w/o worrying about recurring bugs.
The extreme could even be simply feeding a complete spec to a protocol
compiler to get the "implementation", so that all of us can focus on new,
less well-defined/understood areas of the protocol.

>- The Comer/Lin study on probing TCP implementations.  As I recall, the main
>  implementation problems they found regard zero-window probes in SunOS
>  4.0.3 and Solaris 2.1.  I don't know the status of whether these have
>  been fixed (I would certainly guess they have), if someone knows, please

Those versions are antique. So any statements regarding bugs on them are
likely to be outdated. (We normally ask our customers running older
versions to move up to Solaris 2.4 or later, or SunOS 4.1.3 or later if
they want to stick to SunOS.)

Yes, the zero-win probe bug was fixed a while back in Solaris. The bug
had to do with the code failing to recognize zero-win probe state, and
using snd_nxt instead of snd_una as th_seq. I'll verify SunOS later.

>Solaris 2.3 sometimes only acks if the segment
>has PSH set or if the delayed ack timer expires.

The Mentat-derived Solaris TCP implementation adopted a delayed ack
approach where in the streamlined case acks are delayed until either
half of the window is filled, or when the delayed ack timer fires. The
timer has a default value of 50ms.
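
Read literally, the acking rule described above can be sketched as
follows. This is an illustration of the description only, not code from
the Solaris implementation; the function and parameter names are
hypothetical, and only the half-window threshold and the 50ms default
come from the text.

```python
def should_ack_now(unacked_bytes, window_bytes, ms_since_first_unacked,
                   delack_timer_ms=50):
    """Decide whether a receiver following the described rule sends an ACK.

    An ACK goes out once at least half of the receive window holds data
    that has not been acknowledged yet, or once the delayed-ACK timer
    (50 ms by default, per the text) has expired.
    """
    half_window_filled = unacked_bytes * 2 >= window_bytes
    timer_fired = ms_since_first_unacked >= delack_timer_ms
    return half_window_filled or timer_fired


if __name__ == "__main__":
    # 4000 of 8192 bytes buffered after 10 ms: keep delaying.
    print(should_ack_now(4000, 8192, 10))
    # Half the window is filled: ACK immediately.
    print(should_ack_now(4096, 8192, 10))
```

With a large window and a slow sender, the timer branch is the one that
fires, which is where the WAN concern in the next paragraph comes from.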

The delayed ack scheme helps to cut down CPU time in a LAN environment,
but is controversial in a WAN (e.g. the Internet) environment where
frequent feedback from the other side is considered necessary to keep Van's
algorithm functioning. We're still debating what the best approach is...

>- Solaris 2.3 does not recognize FIN's set on packets received out
>  of sequence.

True. Our reassembly code currently doesn't record an out-of-order FIN, and
relies on the remote to retransmit the FIN. This may be fixed soon.

>- SunOS omits generating dup acks for packets received above a
>  sequence hole.

Are you referring to the latest version (414)?

>- Solaris 2.3 and 2.4 have a bug in the fast recovery code so they
>  don't send packets when they could.

I wouldn't call this a bug, (is rstevens' "TCP ... fast recovery" a BCP
already?) and I don't see how it could happen right away. But there
are known deficiencies in this area that will be addressed soon in
Solaris, especially concerning a large window size where a burst of
packets got dropped:

1. Vanilla fast retransmit code can only handle the first dropped packet.
It is possible to enhance the code to handle more than one drop w/o
resorting to SACK. (Sally Floyd described one in her paper.)

2. The timer gets backed off multiple times due to multiple drops from
one window. This *bug* probably exists in many implementations.

Another problematic area for us has been RTO estimation. With the growth
and load of the Internet causing large variation of RTTs, and the ever
increasing combinations of bandwidth-delay, the original algorithm
recommended by Van may need an overhaul.
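
For reference, a minimal sketch of the kind of estimator under
discussion: a smoothed RTT plus a smoothed mean deviation, with the
retransmit timeout set to the mean plus a multiple of the deviation.
The gains of 1/8 and 1/4 and the factor of 4 are the commonly cited
values and are assumptions here, as are the class and method names; no
particular stack's code is being reproduced.

```python
class RttEstimator:
    """Smoothed RTT and mean-deviation estimator (illustrative sketch)."""

    def __init__(self, gain=0.125, var_gain=0.25, k=4.0):
        self.gain = gain          # weight of a new sample in srtt
        self.var_gain = var_gain  # weight of a new sample in rttvar
        self.k = k                # deviation multiplier in the RTO
        self.srtt = None
        self.rttvar = None

    def update(self, sample):
        """Fold one RTT sample (seconds) into the estimate; return the RTO."""
        if self.srtt is None:
            # First measurement: seed the mean, assume deviation of half.
            self.srtt = sample
            self.rttvar = sample / 2.0
        else:
            err = sample - self.srtt
            self.srtt += self.gain * err
            self.rttvar += self.var_gain * (abs(err) - self.rttvar)
        return self.rto()

    def rto(self):
        return self.srtt + self.k * self.rttvar


if __name__ == "__main__":
    est = RttEstimator()
    for sample in (1.0, 2.0, 0.5, 3.0):
        print(sample, est.update(sample))
```

Feeding it widely varying samples shows how quickly the deviation term
comes to dominate the timeout, which is the behavior in question when
RTTs fluctuate a lot.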

I also have some issues regarding window management and ISS calculation
I'll post later.

Jerry Chu
Internet Engineering
SunSoft

From owner-tcp-impl  Mon Nov 25 16:21:18 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA14209 for tcp-impl-list; Mon, 25 Nov 1996 16:19:47 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA14199 for <tcp-impl@relay.engr.SGI.COM>; Mon, 25 Nov 1996 08:19:44 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id IAA10150 for <tcp-impl@relay.engr.SGI.COM>; Mon, 25 Nov 1996 08:19:43 -0800
Received: from rtpdce02.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA40032; Mon, 25 Nov 1996 11:14:46 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce02.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id LAA50954; Mon, 25 Nov 1996 11:14:44 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA21120; Mon, 25 Nov 1996 11:16:28 -0500
Message-Id: <9611251616.AA21120@ludwigia.raleigh.ibm.com>
To: Steve Alexander <sca@refugee>
Cc: tcp-impl
Subject: Re: BOF Discussion Topics 
In-Reply-To: Your message of "Fri, 22 Nov 1996 09:20:32 PST."
             <199611221720.JAA01070@refugee.engr.sgi.com> 
Date: Mon, 25 Nov 1996 11:16:28 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl
Precedence: bulk

>We have scheduled a BOF session in San Jose to discuss known problems in
>current TCP implementations.  In order to ensure that the discussion is as
>detailed as possible, we are soliciting short descriptions of existing bugs or
>performance problems that users and vendors have experienced when using TCP
>in the Internet.

I don't have a specific problem to point to. However, I dial in over
28.8 modems a fair amount and have often gotten the feeling that TCP
is not behaving all that well over the low bandwidth link, and I
wonder if it is due to implementation problems. For example, I'll be
ftp'ing (showing hash marks) and things will freeze for much longer
than I'd expect. One thing I'm pretty sure I've seen is the usage of
windows that are much larger than useful, which make recovery time
after a loss really long.
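
A rough piece of arithmetic shows why an oversized window hurts on such
a link. The sketch below computes how long a full window takes to drain
through the modem, and how much data the path can actually hold. The
function names, the 16 KB window and the 200 ms round-trip time are
illustrative assumptions; modem compression and framing overhead are
ignored.

```python
def drain_time_seconds(window_bytes, link_bits_per_second):
    """Time for a full window queued at the bottleneck to cross the link."""
    return window_bytes * 8 / link_bits_per_second


def bandwidth_delay_bytes(link_bits_per_second, rtt_seconds):
    """Amount of data the path holds in flight (bandwidth-delay product)."""
    return link_bits_per_second * rtt_seconds / 8


if __name__ == "__main__":
    link = 28800  # bits per second
    print("16 KB window drains in %.1f s" % drain_time_seconds(16384, link))
    print("path holds about %.0f bytes at 200 ms RTT"
          % bandwidth_delay_bytes(link, 0.2))
```

Anything in the window beyond the bandwidth-delay product just sits in
a queue, so a retransmission sent after a loss has to wait behind
whatever is already queued.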

If there are any tools/benchmarks/tricks that specifically address
this area, I'd love to hear about them.

Thomas

From owner-tcp-impl  Mon Nov 25 22:27:56 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA12495 for tcp-impl-list; Mon, 25 Nov 1996 22:27:35 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA12488 for <tcp-impl@relay.engr.sgi.com>; Mon, 25 Nov 1996 14:27:33 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA17158 for <tcp-impl@relay.engr.sgi.com>; Mon, 25 Nov 1996 14:27:29 -0800
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.7.3/8.7.3) with ESMTP id RAA28556; Mon, 25 Nov 1996 17:26:08 -0500 (EST)
Message-Id: <199611252226.RAA28556@brookfield.ans.net>
To: "William Allen Simpson" <wsimpson@greendragon.com>
cc: tcp-impl
Reply-To: curtis@ans.net
Subject: Re: BOF Description 
In-reply-to: Your message of "Sun, 24 Nov 1996 15:25:09 GMT."
             <5505.wsimpson@greendragon.com> 
Date: Mon, 25 Nov 1996 17:26:07 -0500
From: Curtis Villamizar <curtis@ans.net>
Sender: owner-tcp-impl
Precedence: bulk


In message <5505.wsimpson@greendragon.com>, "William Allen Simpson" writes:
> Looks like the list is having some problems, as I am seeing replies to
> messages that I didn't get.  That means others are getting them....
> 
> Anyway, I rather disagree about naming specific implementation problems.
> I'm not sure that "bashing" is the likely result.  A better result is to
> use the openness to get folks to upgrade their implementations, and
> customers to get the upgrades.
> 
> Witness MacTCP.  A horrendous implementation.  Folks need to be clear
> that it is in bad shape, and upgrade to MacOT.  Having a test tool for
> demo would be helpful.  Might find out if there are any MacOT problems.
> 
> In any case, having tools to test with would be wonderful.  I'm not sure
> how easy it is to test for some TCP features, though.
> 
> Some topics not mentioned so far that I'd like to see written down as
> implementation notes in a comprehensive RFC (and in the tests):
> 
>  - slow start
>  - RTT calculation
>  - RTT ceilings and floors
>  - Nagle algorithm
>  - silly windows
>  - delayed Ack
>  - proper MSS calculation
>  - proper MSS advertisement
>  - MTU discovery

   - fast retransmit
   - fast recovery
	(apparently some implementations do not do it at all)
   - rfc1323 features - window scale, PAWS, ...
   - fast recovery fix - multiple drop in one window
   - SACK?

I'm not sure if not having rfc1323 and SACK is a bug, just a lack of
high performance features, though SACK can be enormously useful even
at lower speeds.

> And of course, we probably ought to document somewhere the "best"
> solutions to the Syn attack....

That's been discussed too much already on quite a few lists.  Only
reference the "best" if there is clear and immediate consensus on an
existing summary.

> WSimpson@UMich.edu
>     Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
> BSimpson@MorningStar.com
>     Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

Curtis

From owner-tcp-impl  Mon Nov 25 22:41:10 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA15141 for tcp-impl-list; Mon, 25 Nov 1996 22:40:44 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA15119 for <tcp-impl@relay.engr.sgi.com>; Mon, 25 Nov 1996 14:40:42 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA20116 for <tcp-impl@relay.engr.sgi.com>; Mon, 25 Nov 1996 14:40:38 -0800
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.7.3/8.7.3) with ESMTP id RAA28602; Mon, 25 Nov 1996 17:38:34 -0500 (EST)
Message-Id: <199611252238.RAA28602@brookfield.ans.net>
To: Vern Paxson <vern@ee.lbl.gov>
cc: alan@lxorguk.ukuu.org.uk (Alan Cox), tcp-impl
Reply-To: curtis@ans.net
Subject: Re: BOF Descriptio 
In-reply-to: Your message of "Sun, 24 Nov 1996 11:21:12 PST."
             <199611241921.LAA01391@daffy.ee.lbl.gov> 
Date: Mon, 25 Nov 1996 17:38:24 -0500
From: Curtis Villamizar <curtis@ans.net>
Sender: owner-tcp-impl
Precedence: bulk


In message <199611241921.LAA01391@daffy.ee.lbl.gov>, Vern Paxson writes:
> 
> > I get most of my network test tools
> > from either phrack types or firewalls folk.
> 
> There are classes of implementation problems such as security ones for which
> diagnostic tools are often already available.  There are other classes such
> as congestion behavior for which devising such tools is a lot harder.

There are also classes of performance problems that don't show
up unless you have delays in some range of values (10s or 100s of msec
of geographic distance delay) and certain levels of congestion and
accompanying loss, mix of implementations, type of intermediate
network equipment (ATM switches without PPD come to mind as having
unique queue overflow characteristics which SACK improves but probably
does not solve, though most problems are not as severe).

In these sorts of cases TCP works, sort of.  It may run really slowly.
It may be a bad thing for the end user as a result.  It may retransmit
excessively.  It may be a bad thing for the network as a result.

These problems are not impossible to make test code for, but you might
have to write TCP packets out on a raw socket to get the timing right.

Curtis

From owner-tcp-impl  Mon Nov 25 22:52:41 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA18225 for tcp-impl-list; Mon, 25 Nov 1996 22:52:11 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA18212 for <tcp-impl@relay.engr.sgi.com>; Mon, 25 Nov 1996 14:52:09 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA22969 for <tcp-impl@relay.engr.sgi.com>; Mon, 25 Nov 1996 14:51:36 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.7.1/8.7.1) with UUCP id WAA08629; Mon, 25 Nov 1996 22:38:14 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vS6gf-0005FbC; Mon, 25 Nov 96 19:27 GMT
Message-Id: <m0vS6gf-0005FbC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: BOF Description
To: wsimpson@greendragon.com (William Allen Simpson)
Date: Mon, 25 Nov 1996 19:27:25 +0000 (GMT)
Cc: tcp-impl
In-Reply-To: <5506.wsimpson@greendragon.com> from "William Allen Simpson" at Nov 24, 96 03:43:56 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> Actually, the problem isn't the 75 seconds, which is way too _long_
> in most cases.  The problem is both send and receive use the same wait
> time (receive could be much shorter) and that they are non-configurable.
> 
> Indeed, configuration should be on our list.  IRTT comes to mind.

irtt is a Linux per route configurable item. Unlike Solaris 2 we don't learn
irtts. I'm still not sure whether that's a good or bad idea, but it's one I
do need to play with.

Alan


From owner-tcp-impl  Mon Nov 25 23:09:01 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA21130 for tcp-impl-list; Mon, 25 Nov 1996 23:08:39 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA21124 for <tcp-impl@relay.engr.SGI.COM>; Mon, 25 Nov 1996 15:08:37 -0800
Received: from kalae.kohala.com (kalae.kohala.com [206.62.226.35]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA27119 for <tcp-impl@relay.engr.SGI.COM>; Mon, 25 Nov 1996 15:08:15 -0800
Received: from kohala.kohala.com (kohala.kohala.com [206.62.226.33]) by kalae.kohala.com (8.8.3/8.7.3) with ESMTP id QAA23799 for <tcp-impl@relay.engr.SGI.COM>; Mon, 25 Nov 1996 16:07:32 -0700 (MST)
Received: (from rstevens@localhost) by kohala.kohala.com (8.8.3/8.8.3) id QAA02145 for tcp-impl@relay.engr.SGI.COM; Mon, 25 Nov 1996 16:07:31 -0700 (MST)
Message-Id: <199611252307.QAA02145@kohala.kohala.com>
From: rstevens@kohala.com (W. Richard Stevens)
Date: Mon, 25 Nov 1996 16:07:31 -0700
Reply-To: "W. Richard Stevens" <rstevens@kohala.com>
X-Phone: +1 520 297 9416
X-Homepage: http://www.noao.edu/~rstevens
X-Mailer: Mail User's Shell (7.2.5 10/14/92)
To: tcp-impl
Subject: Re: BOF Description
Sender: owner-tcp-impl
Precedence: bulk

[In your message of Nov 25,  5:26pm you write:]
> 
> > And of course, we probably ought to document somewhere the "best"
> > solutions to the Syn attack....
> 
> That's been discussed too much already on quite a few lists.  Only
> reference the "best" if there is clear and immediate consensus on an
> existing summary.

I think a summary of existing-and-implemented solutions would be handy.
I was at a major vendor last week talking to the people who'd put out
their patch for the problem, and they were unaware of Dave Borman's
fix that was implemented.  And there might be one or two other solutions
that I am not aware of, as I cannot follow all the lists either.

	Rich Stevens

From owner-tcp-impl  Mon Nov 25 23:09:42 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA21351 for tcp-impl-list; Mon, 25 Nov 1996 23:09:25 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA21337 for <tcp-impl@engr.sgi.com>; Mon, 25 Nov 1996 15:09:23 -0800
Received: from darkstar.isi.edu (darkstar.isi.edu [128.9.128.127]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id PAA27312 for <tcp-impl@engr.sgi.com>; Mon, 25 Nov 1996 15:09:21 -0800
Received: from dash.isi.edu by darkstar.isi.edu (5.65c/5.61+local-23)
	id <AA01231>; Mon, 25 Nov 1996 15:05:35 -0800
Received: from dash.isi.edu (johnh@localhost.isi.edu [127.0.0.1]) by dash.isi.edu (8.7.5/8.7.3) with ESMTP id PAA07553; Mon, 25 Nov 1996 15:05:30 -0800
Message-Id: <199611252305.PAA07553@dash.isi.edu>
X-Url: <http://www.isi.edu/~johnh/>
To: tcp-impl
Cc: johnh@ISI.EDU, touch@ISI.EDU, katia@ISI.EDU, faber@ISI.EDU
Subject: slow-start related performance bugs in TCP
Date: Mon, 25 Nov 1996 15:05:28 -0800
From: John Heidemann <johnh@ISI.EDU>
Sender: owner-tcp-impl
Precedence: bulk


Dan Coolidge wrote:
> This is all well and good, but thus far a bit too nebulous for my liking. I
> would ask of anybody who is planning on attending the BOF and who has a
> specific TCP implementation problem to discuss, that they bring it up, in
> advance, on this mailing list.

We would like to discuss a couple of performance problems in many TCP
implementations:

Slow-start/Delayed-ACK interactions
	We have found two interactions between the slow-start and
	delayed ACK algorithms that force TCP to stall until
	a delayed ACK triggers (which takes an average of 100ms
	in BSD-based TCP stacks).  For short connections
	(common in HTTP traffic) the effects of this stall
	can be significant.

	The first interaction occurs when the congestion window (cwnd)
	is at 1x MSS (as at the start of a connection, or after an idle
	period as described below).  The sender transmits the single
	packet allowed by cwnd, but the receiver delays ACKing this
	packet because immediate ACKs are only required after two
	full packets.

	The second interaction occurs when the cwnd is at 2x MSS.
	If the sender transmits a less-than full-size segment
	the receiver will refuse to ACK immediately
	because delayed ACKs require receipt of two *full-MSS*
	segments for an immediate ACK.
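
A minimal sketch of the receiver-side rule behind both stalls, assuming a
simplified BSD-style receiver that ACKs at once only when two full-MSS
segments are pending and otherwise waits for the delayed-ACK timer.  The
function name and the 1460-byte MSS are illustrative assumptions, not taken
from any particular stack.

```python
MSS = 1460  # assumed segment size in bytes, for illustration only

def acks_immediately(segment_sizes, mss=MSS):
    """Return True if the receiver ACKs at once, False if it waits for
    the delayed-ACK timer.

    Simplified model: an immediate ACK requires two full-MSS segments
    to have arrived since the last ACK was sent.
    """
    full_segments = sum(1 for size in segment_sizes if size >= mss)
    return full_segments >= 2

# Case 1: cwnd = 1 MSS, so only one segment is in flight.
case_one = acks_immediately([MSS])
# Case 2: cwnd = 2 MSS, but the second segment is short.
case_two = acks_immediately([MSS, 500])
# Control: two full segments trigger an immediate ACK.
control = acks_immediately([MSS, MSS])
```

In this model both problem cases fall through to the timer, while the
control case does not.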

Slow-start restart inconsistencies
	Current TCPs restart slow-start in two ways that are not
	always optimal:  when the connection has been idle for a long
	time and when the back-end of the window jumps by a large
	amount.  TCP initiates slow-start to avoid sending a large
	number of back-to-back packets since the ACK clock has become
	disconnected.

	There are two problems here:  first, many implementations fail to
	slow-start after connection idleness (thus sending large
	packet trains).  4.4BSD (and derivatives) and Linux close the
	window (to 1x MSS); many other TCP stacks don't (including at least
	SunOS).  Is there consensus on what should be done here?

	Second, we believe that slow-starting is overly conservative.
	We are currently experimenting with rate-based pacing
	to re-start the ACK clock.  Rather than send back-to-back
	packets or slow-start, we use the old connection statistics
	to pace outgoing packets at a rate conservatively below
	the prior rate, but faster than slow-start.  This approach
	provides better throughput while avoiding 
	overly aggressive transmission.
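
The pacing idea can be sketched as follows, assuming the sender remembers
the congestion window and RTT estimate from before the idle period.  The
scaling factor of 0.5 and every name here are hypothetical; this is not the
TCP del Rey algorithm itself, only an illustration of spacing packets at a
rate below the prior one.

```python
def pacing_interval(prev_cwnd_bytes, prev_rtt_s, mss, fraction=0.5):
    """Seconds to wait between paced segments after an idle period.

    prev_cwnd_bytes / prev_rtt_s approximates the rate before idling;
    `fraction` (an assumed knob) scales it down to stay conservative.
    """
    if prev_cwnd_bytes <= 0 or prev_rtt_s <= 0 or not 0 < fraction <= 1:
        raise ValueError("need positive cwnd and RTT, and 0 < fraction <= 1")
    prior_rate = prev_cwnd_bytes / prev_rtt_s      # bytes per second
    paced_rate = prior_rate * fraction
    return mss / paced_rate

def send_times(n_segments, interval_s):
    """Offsets (seconds) at which each paced segment would leave."""
    return [i * interval_s for i in range(n_segments)]

# Example: 16 segments of 1460 bytes were in flight over a 100 ms RTT.
interval = pacing_interval(16 * 1460, 0.100, 1460)
schedule = send_times(4, interval)
```

With these numbers a segment leaves roughly every 12.5 ms, about eight per
RTT, compared with a single segment in the first RTT under slow-start.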

We have encountered each of these problems in the context of P-HTTP
(see ``Performance Interactions Between P-HTTP and TCP
Implementations'' at
<URL:http://www.isi.edu/~johnh/PAPERS/Heidemann96b.html>).  We are
investigating these problems as a part of TCP del Rey, an effort at
ISI to tune TCP for 5-20KB transactions common for web and distributed
object systems.

In addition to the work described here we also plan on examining TCP
control-block sharing issues raised by T/TCP and concurrent TCP
connections between pairs of hosts (see
draft-touch-tcp-interdep-00.txt for details of this work).

Are these problems implementation or research issues?  
Let's look at the parts:

    - rate-based pacing:  research issue

    - TCP control block sharing:  research issue

    - cwnd on connection idle:  I think that clarification of what
	should be done here is an important implementation issue.
	Jacobson introduced the problem in his 1990 revision of
	``Congestion Avoidance and Control'' but it hasn't been
	nailed down by an RFC one way or the other.

    - slow-start/delayed-ACK interactions:  These seem like small
	performance bugs that may not have been anticipated in the
	specification.  I'd like to see them resolved.

Comments?

   -John Heidemann, Joe Touch, Katia Obraczka, Ted Faber


From owner-tcp-impl  Mon Nov 25 23:19:35 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA23171 for tcp-impl-list; Mon, 25 Nov 1996 23:19:13 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA23164 for <tcp-impl@relay.engr.SGI.COM>; Mon, 25 Nov 1996 15:19:11 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA29578 for <tcp-impl@relay.engr.SGI.COM>; Mon, 25 Nov 1996 15:19:07 -0800
Received: by daffy.ee.lbl.gov (8.7.5/1.43r)
	id PAA04109; Mon, 25 Nov 1996 15:08:33 -0800 (PST)
Message-Id: <199611252308.PAA04109@daffy.ee.lbl.gov>
To: "William Allen Simpson" <wsimpson@greendragon.com>
Cc: tcp-impl
Subject: Re: BOF Description 
Date: Mon, 25 Nov 1996 15:08:32 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> Indeed, configuration should be on our list.  IRTT comes to mind.

Configuration is an excellent suggestion.  (Stevens Vol 3 has an eye-opening
chapter on the range of behavior of packets seen arriving at a busy HTTP
server.  My guess is that many of the more unusual ones are due to naive
users over-configuring their TCP's.  Since Rich can't make the BOF, I'll
give a brief overview of his findings.)

The list of classes of implementation issues I have now reads:

	interoperability
	congestion behavior
	performance
	security
	configuration

> Anyway, I rather disagree about naming specific implementation problems.
> I'm not sure that "bashing" is the likely result.  A better result is to
> use the openness to get folks to upgrade their implementations, and
> customers to get the upgrades.

I agree that we should name implementations (except this issue gets sticky
with security problems).  I just want to include the caution that by doing
so we need to be sure it doesn't turn into bashing.

> Some topics not mentioned so far that I'd like to see written down as
> implementation notes in a comprehensive RFC (and in the tests):

I think these all fit well with the general classes above.

> And of course, we probably ought to document somewhere the "best"
> solutions to the Syn attack....

I agree with Curtis that since solutions are still being worked out,
this is something that the WG should wait on documenting.

		Vern

From owner-tcp-impl  Mon Nov 25 23:38:29 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA26337 for tcp-impl-list; Mon, 25 Nov 1996 23:38:03 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA26322 for <tcp-impl@relay.engr.sgi.com>; Mon, 25 Nov 1996 15:38:00 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA03840 for <tcp-impl@relay.engr.sgi.com>; Mon, 25 Nov 1996 15:37:59 -0800
Received: by daffy.ee.lbl.gov (8.7.5/1.43r)
	id PAA04179; Mon, 25 Nov 1996 15:27:52 -0800 (PST)
Message-Id: <199611252327.PAA04179@daffy.ee.lbl.gov>
To: jerry.chu@Eng.Sun.COM (Hsiao-keng Jerry Chu)
Cc: tcp-impl
Subject: Re: BOF Description
In-reply-to: Your message of Mon, 25 Nov 1996 03:19:14 PST.
Date: Mon, 25 Nov 1996 15:27:52 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> The Mentat-derived Solaris TCP implementation adopted a delayed ack
> approach where in the streamlined case acks are delayed until either
> half of the window is filled, or when the delayed ack timer fires. The
> timer has a default value of 50ms.
> 
> The delayed ack scheme helps to cut down CPU time in a LAN environment,
> but is controversial in a WAN (e.g. the Internet) environment where
> frequent feedback from the other side is considered necessary to keep Van's
> algorithm functioning. We're still debating what the best approach is...

I have lots of traces that show that in WANs this leads to quite bursty
behavior.  If the loss rate for the acks is non-negligible, it can also lead
to unnecessary retransmissions.

> >- SunOS omits generating dup acks for packets received above a
> >  sequence hole.
> 
> Are you referring to the latest version (414)?

Yes.  I have a trace in which 35 packets are received above a sequence
hole (so many because of using a big receiver window), but the SunOS 4.1.4
TCP only generates one duplicate ack, apparently on the dup ack timer.

> >- Solaris 2.3 and 2.4 have a bug in the fast recovery code so they
> >  don't send packets when they could.
> 
> I wouldn't call this a bug, (is rstevens' "TCP ... fast recovery" a BCP
> already?) and I don't see how it could happen right away ...

I didn't give much detail, sorry about that.  This is a genuine bug.
The problem comes about because Solaris TCP is careful to advance
the congestion window in terms of the amount of data ack'd.  That's more
faithful to Van's paper than the usual technique of just assuming each
ack is for an MSS' worth of data.  But for fast recovery the acks coming
in are dups and don't ack any new data, so the inflated congestion window
doesn't advance.  (I imagine it's a one line fix to treat acks in this
case as ack'ing a full MSS.)
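
A small sketch of the behavior described above, assuming a sender that
inflates cwnd once per incoming ACK during fast recovery.  This is one
reading of the description, not the actual Solaris code; the function names
and the 1460-byte MSS are made up for illustration.

```python
MSS = 1460  # assumed segment size in bytes

def inflate_by_bytes_acked(cwnd, bytes_acked):
    """Grow cwnd by the amount of new data the ACK covers."""
    return cwnd + bytes_acked

def inflate_by_mss(cwnd, mss=MSS):
    """Grow cwnd by one MSS per incoming ACK, regardless of bytes ACKed."""
    return cwnd + mss

def window_after_dup_acks(cwnd, n_dups, per_mss):
    """cwnd after n_dups duplicate ACKs arrive during fast recovery."""
    for _ in range(n_dups):
        # A duplicate ACK acknowledges no new data, so bytes_acked == 0.
        if per_mss:
            cwnd = inflate_by_mss(cwnd)
        else:
            cwnd = inflate_by_bytes_acked(cwnd, 0)
    return cwnd

start = 4 * MSS
stuck = window_after_dup_acks(start, 5, per_mss=False)
inflated = window_after_dup_acks(start, 5, per_mss=True)
```

Counting bytes leaves the window where it started, so no new packets can be
sent; counting one MSS per duplicate ACK lets it grow.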

> Another problematic area for us has been RTO estimation. With the growth
> and load of the Internet causing large variation of RTTs, and the ever
> increasing combinations of bandwidth-delay, the original algorithm
> recommended by Van may need an overhaul.

This is something I'm looking into as part of an Internet packet dynamics
study I'm working on.  (So it's research and not well established how to
revise the RTO algorithm.)

		Vern

From owner-tcp-impl  Tue Nov 26 04:39:37 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA18290 for tcp-impl-list; Tue, 26 Nov 1996 04:39:16 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA18281 for <tcp-impl@relay.engr.SGI.COM>; Mon, 25 Nov 1996 20:39:14 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id UAA01552 for <tcp-impl@relay.engr.SGI.COM>; Mon, 25 Nov 1996 20:39:13 -0800
Received: from [128.9.32.190] (ppp8.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA20745>; Mon, 25 Nov 1996 20:34:55 -0800
X-Sender: touch@zephyr.isi.edu
Message-Id: <v02130513aec01ddf7999@[128.9.32.190]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 25 Nov 1996 20:35:13 -0800
To: curtis@ans.net, Vern Paxson <vern@ee.lbl.gov>
From: touch@isi.edu (Joe Touch)
Subject: Re: BOF Descriptio
Cc: alan@lxorguk.ukuu.org.uk (Alan Cox), tcp-impl
Sender: owner-tcp-impl
Precedence: bulk

At 5:38 PM 11/25/96, Curtis Villamizar wrote:
>
>In these sorts of cases TCP works, sort of.  It may run really slowly.
>It may be a bad thing for the end user as a result.  It may retransmit
>excessively.  It may be a bad thing for the network as a result.

Which brings us to another, related issue, regarding the charter of the
discussion.

Are we purely discussing current implementations, or is there
room to consider the limitations thereof, notably in cases
the following optimizations do not affect:
   fast, few-packet exchanges (T/TCP)
   large, bulk transfers (TCP)
   character exchanges (Nagle optimizations)
   long, fat pipes (Large windows)

Joe



From owner-tcp-impl  Tue Nov 26 18:22:31 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA09086 for tcp-impl-list; Tue, 26 Nov 1996 18:22:07 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA08870 for <tcp-impl@relay.engr.sgi.com>; Tue, 26 Nov 1996 10:21:21 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA22367 for <tcp-impl@relay.engr.sgi.com>; Tue, 26 Nov 1996 10:21:12 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.7.1/8.7.1) with UUCP id SAA10909; Tue, 26 Nov 1996 18:11:25 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vSAt4-0005FbC; Mon, 25 Nov 96 23:56 GMT
Message-Id: <m0vSAt4-0005FbC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: BOF Description
To: curtis@ans.net
Date: Mon, 25 Nov 1996 23:56:30 +0000 (GMT)
Cc: wsimpson@greendragon.com, tcp-impl
In-Reply-To: <199611252226.RAA28556@brookfield.ans.net> from "Curtis Villamizar" at Nov 25, 96 05:26:07 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> I'm not sure if not having rfc1323 and SACK is a bug, just a lack of
> high performance features, though SACK can be enormously useful even
> at lower speeds.

RFC1323 makes 0 window spoofing (phrack 49) a lot harder. It is also
subtly mandatory for IPv6[1]. It's something to discuss. SACK is new but
seems to have great potential. It's a pity the RFC rules are so hard to
follow.

> reference the "best" if there is clear and immediate consensus on an
> existing summary.

Sensible

Alan


[1] IPv6 removes the time guarantee on TTL making it a hop counter. That
means without PAWS we break the TIME_WAIT magic (what of it isn't broken
already according to RFC1337 and draft-heavens). I suspect however IPv6
should be off topic for now.




From owner-tcp-impl  Tue Nov 26 23:41:56 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA15901 for tcp-impl-list; Tue, 26 Nov 1996 23:41:33 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA15894 for <tcp-impl@relay.engr.SGI.COM>; Tue, 26 Nov 1996 15:41:31 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA10323 for <tcp-impl@relay.engr.SGI.COM>; Tue, 26 Nov 1996 15:41:28 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.7.1/8.7.1) with UUCP id XAA19869; Tue, 26 Nov 1996 23:16:42 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vSUhd-0005FbC; Tue, 26 Nov 96 21:06 GMT
Message-Id: <m0vSUhd-0005FbC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: BOF Descriptio
To: touch@isi.edu (Joe Touch)
Date: Tue, 26 Nov 1996 21:06:01 +0000 (GMT)
Cc: curtis@ans.net, vern@ee.lbl.gov, alan@lxorguk.ukuu.org.uk, tcp-impl
In-Reply-To: <v02130513aec01ddf7999@[128.9.32.190]> from "Joe Touch" at Nov 25, 96 08:35:13 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> Which brings us to another, related issue, regarding the charter of the
> discussion.
> 
> Are we purely discussing current implementations, or is there
> room to consider the limitations thereof, notably in cases
> the following optimizations do not affect:
>    fast, few-packet exchanges (T/TCP)
>    large, bulk transfers (TCP)
>    character exchanges (Nagle optimizations)
>    long, fat pipes (Large windows)

I think all of them matter. Except for the words T/TCP. That's for the
experimental folk to play with, not real systems.

The amount of character exchanges on the internet I would assume is dropping
a lot nowadays. Does anyone have good figures on percentages for the four
quoted categories?


From owner-tcp-impl  Wed Nov 27 00:12:54 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA22014 for tcp-impl-list; Wed, 27 Nov 1996 00:12:30 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA21999 for <tcp-impl@relay.engr.SGI.COM>; Tue, 26 Nov 1996 16:12:24 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA17903 for <tcp-impl@relay.engr.SGI.COM>; Tue, 26 Nov 1996 16:12:22 -0800
Received: by daffy.ee.lbl.gov (8.7.5/1.43r)
	id QAA07965; Tue, 26 Nov 1996 16:02:19 -0800 (PST)
Message-Id: <199611270002.QAA07965@daffy.ee.lbl.gov>
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
Cc: tcp-impl
Subject: Re: BOF Descriptio
In-reply-to: Your message of Tue, 26 Nov 1996 21:06:01 PST.
Date: Tue, 26 Nov 1996 16:02:18 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> >    fast, few-packet exchanges (T/TCP)
> >    large, bulk transfers (TCP)
> >    character exchanges (Nagle optimizations)
> >    long, fat pipes (Large windows)
> 
> I think all of them matter. Except for the words T/TCP. That's for the
> experimental folk to play with, not real systems.

I disagree.  T/TCP has an RFC defining it, a book about it, and independent
implementations.  That certainly seems to me to qualify it.

> The amount of character exchanges on the internet I would assume is dropping
> a lot nowadays. Does anyone have good figures on percentages for the four
> quoted categories?

There is tremendous diversity across sites in their traffic mixes, so you
can't meaningfully talk about "typical" mixes except perhaps in the backbone.
Both Web and FTP connections have heavy-tailed size distributions.  This means
that you get a lot of "fast, few-packet exchanges" but that if what you care
about is where all the bytes are coming from, they're due to a (very) few
"large, bulk transfers".  Regarding Joe's last item, certainly there are
plenty of long pipes, and they do get fatter over time.

Finally, while character exchange isn't a heavy hitter in terms of bytes
generated (it does a bit better in terms of packets generated), I wouldn't
write it off.  For example, there were 2,600 telnet and rlogin connections
into and out of LBL yesterday (we only have 2,200 employees).  Your site
might look completely different - as noted above!

		Vern

From owner-tcp-impl  Wed Nov 27 20:54:49 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA10897 for tcp-impl-list; Wed, 27 Nov 1996 20:54:27 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA10891 for <tcp-impl@relay.engr.SGI.COM>; Wed, 27 Nov 1996 12:54:26 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA25649 for <tcp-impl@relay.engr.SGI.COM>; Wed, 27 Nov 1996 12:53:56 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.7.1/8.7.1) with UUCP id UAA25873; Wed, 27 Nov 1996 20:46:21 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vSq0W-0005FbC; Wed, 27 Nov 96 19:50 GMT
Message-Id: <m0vSq0W-0005FbC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: BOF Descriptio
To: vern@ee.lbl.gov (Vern Paxson)
Date: Wed, 27 Nov 1996 19:50:56 +0000 (GMT)
Cc: alan@lxorguk.ukuu.org.uk, tcp-impl
In-Reply-To: <199611270002.QAA07965@daffy.ee.lbl.gov> from "Vern Paxson" at Nov 26, 96 04:02:18 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> > I think all of them matter. Except for the words T/TCP. That's for the
> > experimental folk to play with, not real systems.
> 
> I disagree.  T/TCP has an RFC defining it, a book about it, and independent
> implementations.  That certainly seems to me to qualify it.

The RFC explicitly states it's experimental and not recommended for general
implementation. Maybe it is worth looking at as it shows up tcp bugs in
older KA9Q (and I think the old Solaris 2.1 stack).

All this reminds me of another one. Emulex printservers show a bug I've seen
in a ton of embedded products. I'm guessing someone sells these folks a buggy
stack. The problem seen is:

	Linux sends data to the printer and the printer keeps shrinking window
		quite legally.
	The window goes below 1 MSS. Linux waits and then sends to fill the
		window (only then will it open the window)
	The window opens
	Linux sends the complete packet it sent partially to fill the window.
	The printer ignores it
	[Repeat until timeout]

Basically the stack doesn't cope with partially overlapping frames. If the
frame starts before the expected next frame it goes in the bitbucket.
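
For contrast, a sketch of what a receiver could do with a segment that
starts before the next expected sequence number: keep the new bytes and
discard only the part already received.  The function name is hypothetical
and sequence-number wraparound is deliberately ignored to keep it short.

```python
def trim_to_window(seg_seq, payload, rcv_nxt):
    """Drop the part of a segment that has already been received.

    Returns (new_seq, new_payload), or None if the whole segment is old.
    Sequence-number wraparound is ignored to keep the sketch short.
    """
    overlap = rcv_nxt - seg_seq
    if overlap <= 0:
        return seg_seq, payload          # nothing duplicated
    if overlap >= len(payload):
        return None                      # entirely duplicate data
    return rcv_nxt, payload[overlap:]    # keep only the new bytes

# The receiver already holds bytes 1000..1099; the sender retransmits
# the full segment covering 1000..1499.
result = trim_to_window(1000, bytes(500), rcv_nxt=1100)
```

Here the first 100 bytes are dropped and the remaining 400 are accepted
starting at sequence number 1100, rather than the whole frame going in the
bitbucket.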

I've modded Linux 2.0.16+ to cope with this case by splitting the packet,
which is a pain but one I can live with.

I'd love to know whose stack these printservers all originate from so I can
actually get a bug report to the right place.

Alan

	

From owner-tcp-impl  Thu Nov 28 03:39:38 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA19378 for tcp-impl-list; Thu, 28 Nov 1996 03:38:59 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA19372 for <tcp-impl@relay.engr.SGI.COM>; Wed, 27 Nov 1996 19:38:58 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id TAA07946 for <tcp-impl@relay.engr.SGI.COM>; Wed, 27 Nov 1996 19:38:51 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm035-03.dialip.mich.net [141.211.7.14]) by merit.edu (8.7.6/merit-2.0) with SMTP id WAA00864 for <tcp-impl@relay.engr.SGI.COM>; Wed, 27 Nov 1996 22:35:04 -0500 (EST)
Date: Thu, 28 Nov 96 03:15:27 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5511.wsimpson@greendragon.com>
To: tcp-impl
Subject: Re: slow-start related performance bugs in TCP
Sender: owner-tcp-impl
Precedence: bulk

> From: John Heidemann <johnh@isi.edu>
> 	The first interaction occurs when the congestion window (cwnd)
> 	is at 1x MSS (as at the start of a connection, or after an idle
> 	period as described below).  The sender transmits the single
> 	packet allowed by cwnd, but the receiver delays ACKing this
> 	packet because immediate ACKs are only required after two
> 	full packets.
>
> 	The second interaction occurs when the cwnd is at 2x MSS.
> 	If the sender transmits a less-than full-size segment
> 	the receiver will refuse to ACK immediately
> 	because delayed ACKs require receipt of two *full-MSS*
> 	segments for an immediate ACK.
>
Well, on my typically low speed links into a fat pipe, I've always
viewed these both as a "feature"....  The delayed Ack helps keep the
SRTT from dropping too quickly on these less than full size packets,
preventing retransmissions of full MSS packets later.

Unnecessary retransmission of a 1000 ms packet is a much bigger
performance loser over a short term or bursty connection than an extra
50 ms delayed Ack at the start.
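
To make the SRTT argument concrete, here is a sketch of the usual Jacobson
estimator (gains of 1/8 and 1/4, RTO = SRTT + 4 * RTTVAR) fed with short
RTT samples with and without an extra 50 ms of ACK delay.  The starting
values and sample RTTs are invented for illustration; the code only shows
the arithmetic and does not settle whether the delay is a feature or a bug.

```python
def update_rto(srtt, rttvar, sample):
    """One step of the Jacobson estimator with gains 1/8 and 1/4."""
    err = sample - srtt
    srtt = srtt + err / 8
    rttvar = rttvar + (abs(err) - rttvar) / 4
    return srtt, rttvar, srtt + 4 * rttvar

def run(samples, srtt, rttvar):
    """Feed a list of RTT samples (seconds) through the estimator."""
    rto = srtt + 4 * rttvar
    for sample in samples:
        srtt, rttvar, rto = update_rto(srtt, rttvar, sample)
    return srtt, rttvar, rto

# Assumed scenario: SRTT starts at 1 s, then three small packets are
# timed at 200 ms each, or at 250 ms each when the ACK is delayed.
plain = run([0.20] * 3, srtt=1.0, rttvar=0.5)
delayed = run([0.25] * 3, srtt=1.0, rttvar=0.5)
```

The delayed samples leave SRTT slightly higher after the same number of
updates, which is the effect being described.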

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl  Thu Nov 28 12:13:47 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA19377 for tcp-impl-list; Thu, 28 Nov 1996 12:13:25 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA19337 for <tcp-impl@relay.engr.SGI.COM>; Thu, 28 Nov 1996 04:12:29 -0800
Received: from oberon.di.fc.ul.pt (oberon.di.fc.ul.pt [192.67.76.44]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id EAA04305 for <tcp-impl@relay.engr.SGI.COM>; Thu, 28 Nov 1996 04:10:44 -0800
Received: (from roque@localhost) by oberon.di.fc.ul.pt (8.7.5/8.7.3) id MAA26179; Thu, 28 Nov 1996 12:08:29 GMT
Date: Thu, 28 Nov 1996 12:08:29 GMT
Message-Id: <199611281208.MAA26179@oberon.di.fc.ul.pt>
From: Pedro Roque <roque@di.fc.ul.pt>
To: "William Allen Simpson" <wsimpson@greendragon.com>
Cc: tcp-impl
Subject: Re: slow-start related performance bugs in TCP
In-Reply-To: <5511.wsimpson@greendragon.com>
References: <5511.wsimpson@greendragon.com>
Sender: owner-tcp-impl
Precedence: bulk

>>>>> "William" == William Allen Simpson <wsimpson@greendragon.com> writes:

    >> From: John Heidemann <johnh@isi.edu> The first interaction
    >> occurs when the congestion window (cwnd) is at 1x MSS (as at
    >> the start of a connection, or after an idle period as described
    >> below).  The sender transmits the single packet allowed by
    >> cwnd, but the receiver delays ACKing this packet because
    >> immediate ACKs are only required after two full packets.
    >> 
    >> The second interaction occurs when the cwnd is at 2x MSS.  If
    >> the sender transmits a less-than full-size segment the receiver
    >> will refuse to ACK immediately because delayed ACKs require
    >> receipt of two *full-MSS* segments for an immediate ACK.
    >> 
    William> Well, on my typically low speed links into a fat pipe,
    William> I've always viewed these both as a "feature"....  The
    William> delayed Ack helps keep the SRTT from dropping too quickly
    William> on these less than full size packets, preventing
    William> retransmissions of full MSS packets later.

    William> Unnecessary retransmission of a 1000 ms packet is a much
    William> bigger performance loser over a short term or bursty
    William> connection than an extra 50 ms delayed Ack at the start.

One interesting point about delayed acks is that RFC 1122 recommends
that delayed acks be implemented according to:

[TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813,
     July 1982.

Which BSD based stacks usually don't conform to (they use the random 0-50ms
interval).

Is this a bug?  Or is RFC 1122 superseded in this regard by some random
document I'm not aware of?

./Pedro.

From owner-tcp-impl  Fri Nov 29 20:40:07 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA00345 for tcp-impl-list; Fri, 29 Nov 1996 20:39:45 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA00339 for <tcp-impl@relay.engr.SGI.COM>; Fri, 29 Nov 1996 12:39:43 -0800
Received: from fore.co.uk (zander.fore.co.uk [193.132.138.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA00993 for <tcp-impl@relay.engr.SGI.COM>; Fri, 29 Nov 1996 12:39:39 -0800
Received: from ih-linux-pc.fore.co.uk ([193.132.138.143]) by fore.co.uk (4.1/SMI-4.1)
	id AA08623; Fri, 29 Nov 96 20:32:51 GMT
Message-Id: <329F44DF.14A5@fore.co.uk>
Date: Fri, 29 Nov 1996 20:17:35 +0000
From: Ian Heavens <iheavens@fore.co.uk>
Organization: Fore Systems
X-Mailer: Mozilla 3.0Gold (Win95; I)
Mime-Version: 1.0
To: tcp-impl
Subject: my 2p worth
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

The following occurred to me:

1. It would be interesting to know how many independent
  TCP implementations exist, and the rough proportions in
  which they are represented.  Once the derivation of
  an implementation is known, one can deduce a lot about the
  kind of bugs it has (e.g. anything derived from BSD 4.4 is
  likely to have similar bugs).

2. It used to be the case that there were bugs in common implementations
  that you had to work around (e.g. BSD4.3 did a strange telnet echo
  negotiation to ascertain whether the peer was BSD4.2 or BSD4.3- maybe
  not a violation but surprising if the peer TCP is neither).  Since
  these are impossible to find except by rigorous interoperability
  testing, it would be enormously useful to have these listed.  OK,
  the idea is to get the buggy implementations fixed but in the 
  short term it is a good idea to work around flaws in e.g. BSD TCP/IP.
  I guess this is what we're discussing, but adding a workaround
  if it exists would be nice.

3.  Most of the bugs mentioned are protocol issues or end-to-end
   performance issues on a single connection.  Is it within scope to
   mention scalability issues, e.g. supporting large numbers of
   connections in TIME-WAIT?  This would reduce some of the problems
   with RSTs.  I tried to summarise the issues, mostly researched
   by Jeff Mogul (and the Sequent SIGCOMM paper) in section 5.1
   of draft-heavens-problems-rsts-03.txt.  

4.  Are there bugs that crop up persistently in independent TCP
    implementations?  e.g. mishandling of queued data for transmission
    when a close is issued?  These would point to the most difficult
    issues.

5.   Should we separate buggy implementations of algorithms and
     inappropriate (or debatable) algorithms?   

regards

ian

-- 
Ian Heavens, Fore Systems.  email: iheavens@eu.fore.com


From owner-tcp-impl  Fri Dec  6 01:20:47 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA19160 for tcp-impl-list; Fri, 6 Dec 1996 01:20:23 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA19128 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Dec 1996 17:20:15 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA04836 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Dec 1996 17:20:14 -0800
Received: by daffy.ee.lbl.gov (8.7.5/1.43r)
	id RAA16291; Thu, 5 Dec 1996 17:10:12 -0800 (PST)
Message-Id: <199612060110.RAA16291@daffy.ee.lbl.gov>
To: Ian Heavens <iheavens@fore.co.uk>
Cc: tcp-impl
Subject: Re: my 2p worth
In-reply-to: Your message of Fri, 29 Nov 1996 20:17:35 PST.
Date: Thu, 05 Dec 1996 17:10:12 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> 1. It would be interesting to know how many independent
>   TCP implementations exist, and the rough proportions in
>   which they are represented.

Also, it would be very interesting (but hard to determine!) to know how much
traffic each implementation contributes.  This would let you gauge to
what degree a widespread PC implementation actually influences Internet
traffic.

One idea floating around is having as one of the group's products a Web
page that would have this sort of info.  Something along the lines of

	http://www.psc.edu/networking/perf_tune.html

which Jamshid Mahdavi et al put together.  There are IETF issues regarding
a WG providing such a page which Allison & Allyn are looking into.  We
should have something more concrete to say about this at the BOF.

> 2. It used to be the case that there were bugs in common implementations
>   that you had to workaround ...
>   it would be enormously useful to have these listed.

Definitely!  This is one of the central goals of the WG as I see it.

> 3.  Most of the bugs mentioned are protocol issues or end-to-end
>    performance issues on a single connection.  Is it within scope to
>    mention scalability issues, e.g. supporting large numbers of
>    connections in TIME-WAIT?

This also strikes me as within scope, though with the line drawn at
the point where things cross into the research frontier.  So, for example,
the issues raised in Joe Touch's draft RFC are good things to discuss;
but for some of them their resolution (e.g., how to share cwnd across
multiple connections) is research and needs to be dealt with in another
forum.  It seems an important benefit of the WG is to highlight where
the research frontier is, too, so implementors can gauge where different
features lie.

> 4.  Are there bugs that crop up persistently in independent TCP
>     implementations?  e.g. mishandling of queued data for transmission
>     when a close is issued?  These would point to the most difficult
>     issues.

Likewise, this strikes me as a mainstream goal for the proposed WG.

> 5.   Should we separate buggy implementations of algorithms and
>      inappropriate (or debatable) algorithms?   

Definitely, if by algorithm you mean what's specified in an RFC.  We need
to consider whether some of the RFC's need clarification or expansion, but
this is a considerably more significant step than cataloging implementation
issues.

		Vern

From owner-tcp-impl  Fri Dec  6 22:06:22 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA27831 for tcp-impl-list; Fri, 6 Dec 1996 22:05:54 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA27810 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Dec 1996 14:05:52 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id OAA13316 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Dec 1996 14:05:50 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA02844>; Fri, 6 Dec 1996 14:02:03 -0800
Date: Fri, 6 Dec 1996 14:01:34 -0800
Posted-Date: Fri, 6 Dec 1996 14:01:34 -0800
Message-Id: <199612062201.AA23900@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA23900>; Fri, 6 Dec 1996 14:01:34 -0800
To: iheavens@fore.co.uk, vern@ee.lbl.gov
Subject: Re: my 2p worth
Cc: tcp-impl
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl
Precedence: bulk

> > 3.  Most of the bugs mentioned are protocol issues or end-to-end
> >    performance issues on a single connection.  Is it within scope to
> >    mention scalability issues, e.g. supporting large numbers of
> >    connections in TIME-WAIT?
> 
> This also strikes me as within scope, though with the line drawn at
> the point where things cross into the research frontier.  So, for example,
> the issues raised in Joe Touch's draft RFC are good things to discuss;
> but for some of them their resolution (e.g., how to share cwnd across
> multiple connections) is research and needs to be dealt with in another
> forum.  It seems an important benefit of the WG is to highlight where
> the research frontier is, too, so implementors can gauge where different
> features lie.

Has anyone brought up the issue of "kinds" of implementation issues?

I.e.,

	implementations that don't follow RFCs

	implementations that don't follow reference implementations
		e.g., Reno, Tahoe, etc., but were never spec'd

	errors in the RFCs/reference implementations (if any)
		i.e., things that don't work as "intended"

	interactions between the RFCs/reference implementations

	underspecification of the RFCs/reference implementations

and to differentiate between flaws that cause errors, vs. just
decrease performance.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl  Fri Dec 13 02:21:38 1996
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA07032 for tcp-impl-list; Fri, 13 Dec 1996 02:20:15 GMT
Return-Path: <owner-tcp-impl>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA07011 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 12 Dec 1996 18:20:13 -0800
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA08805 for <tcp-impl@engr.sgi.com>; Thu, 12 Dec 1996 18:20:13 -0800
Message-Id: <199612130220.SAA08805@refugee.engr.sgi.com>
To: tcp-impl
Subject: archive now available
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <8798.850443613.1@refugee.engr.sgi.com>
Date: Thu, 12 Dec 1996 18:20:13 -0800
From: Steve Alexander <sca@refugee>
Sender: owner-tcp-impl
Precedence: bulk

I've set up a mechanism for copying the mailing list archive outside our
firewall.  Assuming this works correctly, the archive should now be available
at:

	ftp://ftp.sgi.com/pub/tcp-impl/mail.archive

It will be updated every two hours for now.

-- Steve

From owner-tcp-impl  Fri Jan 10 04:44:05 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA14811 for tcp-impl-list; Fri, 10 Jan 1997 04:43:30 GMT
Return-Path: <owner-tcp-impl>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA14804 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 9 Jan 1997 20:43:29 -0800
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA20328; Thu, 9 Jan 1997 20:42:17 -0800
Message-Id: <199701100442.UAA20328@refugee.engr.sgi.com>
To: minutes@buzzsaw.mti.sgi.com, tcp-impl
Subject: San Jose TCP Implementation BOF Minutes
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <20261.852871336.1@refugee.engr.sgi.com>
Date: Thu, 09 Jan 1997 20:42:16 -0800
From: Steve Alexander <sca@refugee>
Sender: owner-tcp-impl
Precedence: bulk

[ My apologies for being tardy with these ]

The TCP Implementation BOF was held at 3:30 PM, on Wednesday, December 11th.
The BOF was intended to determine whether or not consensus exists for forming
a working group for the purpose of making implementors aware of problems in
current TCP implementations.

The BOF was co-chaired by Steve Alexander (sca@sgi.com) from Silicon Graphics
and Vern Paxson (vern@ee.lbl.gov) of Lawrence Berkeley Labs.

Steve started off by presenting the BOF Agenda and then gave a motivation for
having the BOF.  The idea (originally suggested by Jamshid Mahdavi and Matt
Mathis) is to help TCP implementors improve the quality of their products by
making them aware of problems in existing implementations and any tools or test
suites that might make the development process more productive.

Vern Paxson then gave a presentation on the results of several research studies,
which included work by Brakmo and Peterson; Comer and Lin; Stevens; Dawson,
Jahanian, and Milton; and Paxson.

The above studies showed that many TCP implementations exhibit bad behavior
at times, and that some of these problems can lead to a great deal of data
being needlessly retransmitted.  Among the other problems mentioned were:
	- Some TCP implementations send data after having reset the connection
	- Some do not acknowledge zero-window probes
	- Some throw away all data after a hole in the sequence space
	- Wide variety of MSS values (some very bizarre)
	- Many problems with keepalives
	- Congestion avoidance algorithms such as slow-start are not ubiquitous

Vern wrapped up with an overview of two tools, ORCHESTRA, and tcpanaly.
ORCHESTRA (http://www.eecs.umich.edu/~sdawson) is an x-kernel based tool that
allows instrumentation of networking code.  It allows a developer to trace
packets and generate probe packets.

tcpanaly is a tool that Vern has developed for post-processing of tcpdump
traces.  It can detect anomalies in a TCP connection, particularly WRT
congestion avoidance.

Steve Parker from SunSoft then gave a brief presentation on the Packet Shell,
which is a tool that he and Chris Schmechel have been developing.  The packet
shell is a set of TCL extensions that allows packet-level tests to be developed
easily using the TCL scripting language.  Steve presented several examples of
how this could be used to verify that a TCP is behaving correctly.  The packet
shell is available at ftp://playground.sun.com/pub/sparker.

After the presentations concluded, discussion turned to the proposed charter
and milestones for the working group, which were:

   - Produce a compilation of known problems and their solutions.  This will
     raise awareness of these issues.
     
   - (optional) Determine if any problems found are the result of ambiguities
     in the TCP specification.  If necessary, produce a document which
     clarifies the specification.

   - Catalog existing TCP test suites, diagnostic tools, testing organizations,
     and procedures that can be used by implementors to improve their code, and
     produce a document enumerating them.

 Goals and Milestones:

   Dec 96       Working group formation

   Mar 97       Produce initial document describing problems and fixes

   May 97       Produce I-D which clarifies 793, 1122, 1323, if necessary

   May 97       Produce initial catalog of test suites, etc.

Much of the discussion centered on publicly naming vendors
with sub-optimal implementations.  It was emphasized several times that the
group is targeted at developers, not at users or administrators.  In that
context, it is not clear that mentioning implementors by name serves the
group's purpose.  Allyn Romanow mentioned that the IESG is still determining
whether doing so has any legal ramifications.  It was not clear that consensus
was reached on this issue.

Another potentially controversial issue has to do with handling security
issues.  Steve Bellovin argued in favor of being as open as possible about such
matters.

Although the goal of updating the TCP specifications is potentially an
unbounded problem, several people felt that it will be necessary.  In
particular, Dave Borman presented a list of several items that he felt needed
clarification.  Although there were few other specific issues mentioned, there
seemed to be a growing belief that some enhancements to existing specifications
will be needed.

It was generally felt that the scope of the group will need to be carefully
defined in order to make forward progress.  For example, drawing the line
between what is "research" and what is "production code" will be an issue.
Discussion with the IRTF might be useful in evaluating this issue.

There was some discussion on test suites and test tools, although it did not
appear that any conclusions were drawn.  At the present time, it is not clear
that everyone agrees on what a test suite should do, or whether a test suite
could be a product of an IETF group.

Bob Braden suggested that the proposed dates were wildly optimistic.  There
was general agreement on this point.

Several other specific issues were raised:

There was a brief mention of SACK, and of whether it is implemented widely
enough to be addressed (no consensus yet).

The question of issues surrounding asymmetric paths was raised as well.

Fred Baker raised the issue of performance over satellites as something that
might be appropriate (there was no clear consensus yet on whether it is).

One concern was that a working group could become a perpetually ongoing effort.
Some felt that this group could never really finish as long as TCP
implementations keep evolving.  The consensus seemed to be that this might be
true but that a first effort has to be successful prior to continuing on.  If
the group cannot be productive initially, then the question is moot.

Some questions were raised about environmental assumptions present in the
specifications, namely are there some, and if so are they clear?  It seems
that further discussion is needed on this topic.

It was suggested that another potential deliverable is a guide to
tuning/defaults for administrators.  Again, this needs further discussion, and
may require working with the User Services area.

There seemed to be general agreement that a working group should be formed, but
many details remain to be worked out about the charter and deliverables.

Steve Alexander

From owner-tcp-impl  Fri Jan 10 05:09:48 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA18528 for tcp-impl-list; Fri, 10 Jan 1997 05:09:21 GMT
Return-Path: <owner-tcp-impl>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA18520 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 9 Jan 1997 21:09:19 -0800
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id VAA12667 for <tcp-impl@engr.sgi.com>; Thu, 9 Jan 1997 21:09:19 -0800
Message-Id: <199701100509.VAA12667@refugee.engr.sgi.com>
To: tcp-impl
Subject: web page now available
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <20871.852872958.1@refugee.engr.sgi.com>
Date: Thu, 09 Jan 1997 21:09:18 -0800
From: Steve Alexander <sca@refugee>
Sender: owner-tcp-impl
Precedence: bulk

http://reality.sgi.com/sca/tcp-impl

It has the meeting slides, minutes, etc., and will be updated as needed.

-- Steve

From owner-tcp-impl  Fri Jan 10 17:30:17 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA12702 for tcp-impl-list; Fri, 10 Jan 1997 17:29:50 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA12692 for <tcp-impl@relay.engr.SGI.COM>; Fri, 10 Jan 1997 09:29:49 -0800
Received: from postoffice.Reston.mci.net (postoffice.Reston.mci.net [204.70.128.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA18620 for <tcp-impl@relay.engr.SGI.COM>; Fri, 10 Jan 1997 09:29:47 -0800
Received: from huddle.reston.mci.net ([166.45.3.198]) by postoffice.Reston.mci.net (8.7.5/8.7.3) with SMTP id MAA13169 for <tcp-impl@relay.engr.SGI.COM>; Fri, 10 Jan 1997 12:19:40 -0500 (EST)
Message-Id: <2.2.32.19970110171852.008a441c@mci.net>
X-Sender: huddle@mci.net
X-Mailer: Windows Eudora Pro Version 2.2 (32)
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Fri, 10 Jan 1997 11:18:52 -0600
To: tcp-impl
From: Scott Huddle <huddle@mci.net>
Subject: More details of packet traces 
Sender: owner-tcp-impl
Precedence: bulk

One of the things I'd like to better understand from the results
that were presented at the BOF is the ratio of "bad"
implementations to "good" implementations that was observed.
While it's interesting that you can scan tcpdump traces and pick out
particularly evil characteristics of an implementation, it's not
necessarily a problem worth fixing if the implementation is
relatively rare.  Obviously, a seriously flawed implementation of,
say, the Win95 stack, is a bigger operational problem than
a flawed implementation of an Atari2600 stack.

I'd like to propose that one of the goals of the group be to come
up with a list of "good" characteristics, perhaps document them
as a BCP, and let vendors say they conform to the BCP.

I'd also like to support Fred's interest in TCP over satellite,
especially high bandwidth flows.  

-scott


From owner-tcp-impl  Mon Jan 20 20:57:40 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA16529 for tcp-impl-list; Mon, 20 Jan 1997 20:57:12 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA16523 for <tcp-impl@relay.engr.sgi.com>; Mon, 20 Jan 1997 12:57:10 -0800
Received: from fore.co.uk (zander.fore.co.uk [193.132.138.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA16590 for <tcp-impl@relay.engr.sgi.com>; Mon, 20 Jan 1997 12:57:00 -0800
Received: by fore.co.uk (4.1/SMI-4.1)
	id AA00148; Mon, 20 Jan 97 20:47:22 GMT
From: iheavens@fore.co.uk (Ian Heavens)
Message-Id: <9701202047.AA00148@fore.co.uk>
Subject: Re: cleaning up TIME_WAIT states
To: rstevens@kohala.com
Date: Mon, 20 Jan 1997 20:47:21 +0000 (WET)
Cc: end2end-interest@isi.edu, tcp-impl
In-Reply-To: <199701182032.NAA05344@kohala.kohala.com_fore.ext.ietf.end2end> from "W. Richard Stevens" at Jan 18, 97 03:39:06 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 3682      
Sender: owner-tcp-impl
Precedence: bulk



W. Richard Stevens wrote:

[ end of thread on moving the burden of TIME-WAIT from web servers to
clients. I cc:ed this to the TCP implementors WG mailing list since it 
seems appropriate ]

>
> > This has the effect of making the server-side active close convert into
> > 'active' at the client, and effectively distributed the storage
> > of old TCBs at heavily-loaded servers.
> 
> The problem with all of the suggestions so far is that they require
> a change at the client end.  If this is really a problem (I recall a
> Jeff Mogul paper not too long ago that actually plotted the number of
> connections in the TIME_WAIT state on a busy Web server) then I think
> it's better to work on the problem at the server.  Trying to get all the
> different clients that are out there (most are PC stacks, I'd guess) to
> implement something new sounds impossible.

You make an interesting point - it's always easier to improve the
quality of servers, which must be more robust, support more connections 
etc.  On the other hand, the idea of the TCP implementors' WG is to improve TCP
implementations; from the BOF it appears that it is the implementations 
that generally act as clients that need the most improvement.  Sounds 
like a dual approach of fixing the server side first if possible, but 
making recommendations for the clients, might be a good idea.

> When BSDI upgraded their stack this past summer to make their Web server
> "faster", they moved all the connections in the TIME_WAIT state onto their
> own queue, to get them out of the tcp_slowtimo() function.  I'd bet that's
> the majority of the CPU savings right there.  (I've always thought that the
> BSD tcp_{slow,fast}timo() functions must be one of the biggest bottlenecks
> on a busy system.)

I agree.  PCB lookup costs have been analysed (the Sequent paper on
efficient demultiplexing & Jeff Mogul's "Case for Persistent HTTP" and 
"Network Behaviour of a Busy Web Server and its Clients"). I wonder about 
the effect of timer traversals of thousands of PCBs, especially on the 
cache.
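
[ A minimal sketch of the separate-queue idea quoted above, not BSDI's
actual code.  It assumes every TIME_WAIT entry waits the same fixed time
and that the clock passed in never goes backwards, so a FIFO stays in
expiry order and the timer only ever looks at the head.  The class and
method names are hypothetical. ]

```python
from collections import deque

TIME_WAIT_SECS = 60  # assumed 2*MSL; the thread mentions both 60 and 240


class TimeWaitQueue:
    """FIFO of (expiry_time, conn_id) pairs; illustrative only."""

    def __init__(self, hold_secs=TIME_WAIT_SECS):
        self.hold_secs = hold_secs
        self.queue = deque()

    def add(self, conn_id, now):
        # All entries wait the same time, so appending preserves expiry order.
        self.queue.append((now + self.hold_secs, conn_id))

    def expire(self, now):
        """Pop and return expired connections, inspecting only the head."""
        expired = []
        while self.queue and self.queue[0][0] <= now:
            expired.append(self.queue.popleft()[1])
        return expired
```

The cost of a timer run here depends on how many entries have expired,
not on how many connections are sitting in TIME_WAIT.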

> 
> If memory is then a problem (I think the BSDI code still saves the inpcb{}
> and the tcpcb{}; 384 bytes if my memory is right) then I'd bet you could
> save a lot less state information for the TIME_WAIT state, similar to the
> savings they obtained with the changes to help with the SYN flood attacks
> this fall.  If you could reduce the amount of state information down to
> 32 bytes, then you have increased by one order or magnitude the number of
> these TIME_WAIT connections for a given amount of memory.
> 

Jeff Mogul's conclusions in his SIGCOMM paper were that memory occupancy
was not a problem (less than a Mbyte of state for TIME-WAIT connections 
in a busy web server) - at least for a web server.  I think you could get 
the state down to about 13 bytes (1 address + 2 ports + some bits) and 
reduce timer search costs at the same time, by reducing the granularity 
of the TIME-WAIT expiry timeout. If the burden of TIME-WAIT is moved to 
the client (or for applications where the client enters TIME-WAIT), this 
might be an idea.  Certainly it is a fairly easy way to free up around 
0.5 Mbyte on a web server.
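
[ One way to read "reducing the granularity of the expiry timeout" is to
bucket entries by a coarse tick, so that the slot an entry sits in is its
expiry time and no per-entry timer is stored.  The sketch below is only
an illustration under that assumption; the 10-second tick, the class
name, and the method names are all hypothetical. ]

```python
class CoarseTimeWaitWheel:
    """Buckets TIME-WAIT entries by coarse expiry tick; illustrative only."""

    def __init__(self, hold_secs=60, tick_secs=10):
        self.tick_secs = tick_secs
        # Round the hold time up to whole ticks, plus one extra tick so
        # that no entry is ever released before hold_secs has elapsed.
        self.hold_ticks = -(-hold_secs // tick_secs) + 1
        self.slots = [[] for _ in range(self.hold_ticks + 1)]

    def add(self, conn_id, now):
        tick = int(now // self.tick_secs)
        self.slots[(tick + self.hold_ticks) % len(self.slots)].append(conn_id)

    def advance(self, tick):
        """Call once per tick, in order; releases that tick's whole slot."""
        slot = tick % len(self.slots)
        expired, self.slots[slot] = self.slots[slot], []
        return expired
```

With these defaults an entry is held for somewhere between 60 and 70
seconds (assuming advance() runs at the start of each tick), and each
timer run touches exactly one slot.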

The perceived costs of TIME-WAIT are still too high, if people are 
reducing the MSL from the (low) 60 seconds down to 10 seconds, or zero; 
in addition it looks like a lot of RSTs are being used to avoid it 
(5-10% on the DEC election server: see ftp://ftp.digital.com/pub/DEC/traces/netstat).  
I think this should be flagged as a concern of the TCP implementors' 
working group.

ian
-- 
Ian Heavens, Fore Systems.  email: iheavens@eu.fore.com  

From owner-tcp-impl  Mon Jan 20 21:38:23 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA22156 for tcp-impl-list; Mon, 20 Jan 1997 21:37:55 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA22148 for <tcp-impl@relay.engr.SGI.COM>; Mon, 20 Jan 1997 13:37:53 -0800
Received: from palrel3.hp.com (palrel3.hp.com [15.253.88.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA23377 for <tcp-impl@relay.engr.SGI.COM>; Mon, 20 Jan 1997 13:37:52 -0800
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185]) by palrel3.hp.com with SMTP (8.7.5/8.7.3) id NAA01025 for <tcp-impl@relay.engr.SGI.COM>; Mon, 20 Jan 1997 13:34:15 -0800 (PST)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA04972; Mon, 20 Jan 1997 13:27:39 -0800
Message-Id: <32E3E34A.5B1D@cup.hp.com>
Date: Mon, 20 Jan 1997 13:27:38 -0800
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: end2end-interest@isi.edu, tcp-impl
Subject: Re: cleaning up TIME_WAIT states
References: <9701202047.AA00148@fore.co.uk>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Ian Heavens wrote:
> 
> W. Richard Stevens wrote:
> > it's better to work on the problem at the server.  Trying to get all the
> > different clients that are out there (most are PC stacks, I'd guess) to
> > implement something new sounds impossible.
> 
> You make an interesting point - it's always easier to improve the
> quality of servers, which must be more robust, support more connections
> ...
> that generally act as clients that need the most improvement.  Sounds
> like a dual approach of fixing the server side first if possible, but
> making recommendations for the clients, might be a good idea.

Or gack, the three-pronged attack of getting the applications that
behave poorly to do something a bit better? I suspect it is much easier
to get people to download new versions of web browser foo that, say,
shut down first than it is to get them to upgrade their stack, and it is
probably easier to get people to update webserver foo than the server
stack. Or at least it looks like the web server and browser software is
rolling with greater frequency than the transport implementations.

> 
> > When BSDI upgraded their stack this past summer to make their Web server
> > "faster", they moved all the connections in the TIME_WAIT state onto their
> > own queue, to get them out of the tcp_slowtimo() function.  I'd bet that's
> > the majority of the CPU savings right there.  (I've always thought that the
> > BSD tcp_{slow,fast}timo() functions must be one of the biggest bottlenecks
> > on a busy system.)
> 
> I agree.  PCB lookup costs have been analysed (the Sequent paper on
> efficient demultiplexing & Jeff Mogul's "Case for Persistent HTTP" and
> "Network Behaviour of a Busy Web Server and its Clients"). I wonder about
> the effect of timer traversals of thousands of PCBs, especially on the
> cache.

I suspect that many (bletch :) commercial stacks are getting tuned-up to
handle large numbers of TIME_WAITs quite well. For some examples, you
might look at the SPECweb96 results on www.specbench.org. There are some
systems being benchmarked at over 1000 connections per second for
fifteen minutes or more, which is a decent number of connections in the
TIME_WAIT state with 2*MSL at either 60 or 240 seconds.

> > If memory is then a problem (I think the BSDI code still saves the inpcb{}
> > and the tcpcb{}; 384 bytes if my memory is right) then I'd bet you could
> > save a lot less state information for the TIME_WAIT state, similar to the
> > savings they obtained with the changes to help with the SYN flood attacks
> > this fall.  If you could reduce the amount of state information down to
> > 32 bytes, then you have increased by one order or magnitude the number of
> > these TIME_WAIT connections for a given amount of memory.
> >
> 
> Jeff Mogul's conclusions in his SIGCOMM paper were that memory occupancy
> was not a problem (less than a Mbyte of state for TIME-WAIT connections
> in a busy web server) - at least for a web server.  I think you could get

That depends on your definition of busy, the size of the PCB, and the
length of the TIME_WAIT state. At a high enough connection per second
level, the memory occupied by TIME_WAITs can be rather larger than the
web server's URL working-set - even for something like SPECweb96, which
increases the working-set as the square-root of the requested load.
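
[ The dependence on rate, hold time, and entry size is just a product.
As a rough sketch, assume the server does the active close on every
connection, so each one ends up in TIME_WAIT there, and assume a steady
arrival rate.  The 1000 connections per second, the 60 and 240 second
hold times, and the 384, 32, and 13 byte entry sizes are the figures
mentioned in this thread; the function name is hypothetical. ]

```python
def time_wait_bytes(conns_per_sec, time_wait_secs, bytes_per_entry):
    """Steady-state memory held by TIME_WAIT entries: rate x hold x size."""
    return conns_per_sec * time_wait_secs * bytes_per_entry


for secs in (60, 240):
    for size in (384, 32, 13):
        total = time_wait_bytes(1000, secs, size)
        print(f"{secs:3d} s, {size:3d} B/entry: {total / 1e6:6.2f} MB")
```

At 1000 connections per second and 384 bytes per entry, that is about
23 MB for a 60 second TIME_WAIT and about 92 MB for 240 seconds, which
is well above the sub-megabyte figure quoted for the web server that
was measured.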

> the state down to about 13 bytes (1 address + 2 ports + some bits) and

Wouldn't we want a sequence number in there, or are we precluding the
possibility of restarting a connection in TIME_WAIT?
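
[ For what it's worth, a 32-bit sequence number can fit in a budget of
that size.  The layout below is purely an assumption for illustration,
since the quoted text does not say how the 13 bytes were apportioned:
one IPv4 address, two ports, one sequence number, and a single byte for
the remaining bits. ]

```python
import socket
import struct

# Hypothetical layout: remote IPv4 address, remote port, local port,
# last sequence number, and one byte of flags or coarse expiry bits.
ENTRY = struct.Struct("!4sHHIB")  # network byte order, no padding


def pack_entry(remote_ip, remote_port, local_port, seq, bits):
    """Pack one TIME_WAIT entry into ENTRY.size bytes."""
    return ENTRY.pack(socket.inet_aton(remote_ip), remote_port,
                      local_port, seq, bits)


def unpack_entry(blob):
    """Inverse of pack_entry()."""
    ip, rport, lport, seq, bits = ENTRY.unpack(blob)
    return socket.inet_ntoa(ip), rport, lport, seq, bits
```

ENTRY.size comes to 4 + 2 + 2 + 4 + 1 = 13 bytes, assuming a single
local address so that only the remote address needs to be stored.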

> The perceived costs of TIME-WAIT are still too high, if people are
> reducing the MSL from the (low) 60 seconds down to 10 seconds, or zero;
> in addition it looks like a lot of RSTs are being used to avoid it
> (5-10% on the DEC election server: see ftp://ftp.digital.com/pub/DEC/traces/netstat).
> I think this should be flagged as a concern of the TCP implementors'
> working group.

Indeed, let's make sure that abortive close does not get out of hand.

rick jones
http://www.cup.hp.com/netperf/NetperfPage.html

From owner-tcp-impl  Mon Jan 20 23:13:30 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA04844 for tcp-impl-list; Mon, 20 Jan 1997 23:12:54 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA04838 for <tcp-impl@relay.engr.sgi.com>; Mon, 20 Jan 1997 15:12:52 -0800
Received: from tipper.oit.unc.edu ([152.2.22.85]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA08737 for <tcp-impl@relay.engr.sgi.com>; Mon, 20 Jan 1997 15:12:47 -0800
Received: from tipper.oit.unc.edu (tipper.oit.unc.edu [152.2.22.85]) by tipper.oit.unc.edu (8.6.12/8.6.10) with SMTP id SAA21811; Mon, 20 Jan 1997 18:05:42 -0500
Date: Mon, 20 Jan 1997 18:05:41 -0500 (EST)
From: Simon Spero <ses@tipper.oit.unc.edu>
To: Ian Heavens <iheavens@fore.co.uk>
cc: rstevens@kohala.com, end2end-interest@ISI.EDU, tcp-impl
Subject: Re: cleaning up TIME_WAIT states
In-Reply-To: <9701202047.AA00148@fore.co.uk>
Message-ID: <Pine.SUN.3.91.970120175509.21668C-100000@tipper.oit.unc.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

On Mon, 20 Jan 1997, Ian Heavens wrote:
> 
> The perceived costs of TIME-WAIT are still too high, if people are 
> reducing the MSL from the (low) 60 seconds down to 10 seconds, or zero; 
> in addition it looks like a lot of RSTs are being used to avoid it 
> (5-10% on the DEC election server: see 
> ftp://ftp.digital.com/pub/DEC/traces/netstat).  
> I think this should be flagged as a concern of the TCP implementors' 
> working group.

The real pain was when solaris made the mistake of following the host 
recs and actually defaulting to the suggested 240 seconds. 

I don't know if this is the old election data or new election data, so 
I'll just witter on and hope for the best :-) 

Web traffic causes a hell of a lot of resets due to a hell of a lot of 
connections being canceled when users move to a new page before all 
images have been completely downloaded. I posted something on this
subject to www-talk around Nov 94 with the subject
"The Silent Scream - the tragic tale of aborted connections".


 -----
Still waiting for the last  	We had to destroy The Kings Head in order	
helicopter out of Chigwell	to save it.
					   You can take the Lad out of Essex, 
				  But you can't take the Essex out of the Lad


From owner-tcp-impl  Mon Jan 20 23:56:50 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA08726 for tcp-impl-list; Mon, 20 Jan 1997 23:56:24 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA08717 for <tcp-impl@relay.engr.sgi.com>; Mon, 20 Jan 1997 15:56:22 -0800
Received: from palrel3.hp.com (palrel3.hp.com [15.253.88.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA15389 for <tcp-impl@relay.engr.sgi.com>; Mon, 20 Jan 1997 15:56:20 -0800
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185]) by palrel3.hp.com with SMTP (8.7.5/8.7.3) id PAA22259 for <tcp-impl@relay.engr.sgi.com>; Mon, 20 Jan 1997 15:52:43 -0800 (PST)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA06492; Mon, 20 Jan 1997 15:46:10 -0800
Message-Id: <32E403C1.7AB5@cup.hp.com>
Date: Mon, 20 Jan 1997 15:46:10 -0800
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: end2end-interest@ISI.EDU, tcp-impl
Subject: Re: cleaning up TIME_WAIT states
References: <Pine.SUN.3.91.970120175509.21668C-100000@tipper.oit.unc.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Simon Spero wrote:
> The real pain was when solaris made the mistake of following the host
> recs and actually defaulting to the suggested 240 seconds.

I do not agree that was a mistake. It was simply unfortunate that when
that happened, the rest of the stack was not yet insensitive to the
number of TIME_WAIT states on the machine.

60 seconds is probably sufficient to the task, but it is a bit on the
thin side. (IMHO).

rick jones
BTW, what is the required quoted-to-new-text ratio for end2end-interest?

From owner-tcp-impl  Tue Jan 21 03:07:08 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA00708 for tcp-impl-list; Tue, 21 Jan 1997 03:06:39 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA00702 for <tcp-impl@relay.engr.sgi.com>; Mon, 20 Jan 1997 19:06:38 -0800
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id TAA14745 for <tcp-impl@relay.engr.sgi.com>; Mon, 20 Jan 1997 19:06:37 -0800
Received: from Eng.Sun.COM ([129.146.1.13]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id SAA24242; Mon, 20 Jan 1997 18:56:32 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id SAA16914; Mon, 20 Jan 1997 18:56:30 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id SAA12225; Mon, 20 Jan 1997 18:56:27 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id SAA13670; Mon, 20 Jan 1997 18:52:38 -0800
Date: Mon, 20 Jan 1997 18:52:38 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199701210252.SAA13670@taipei.eng.sun.com>
To: ses@tipper.oit.unc.edu
Subject: Re: cleaning up TIME_WAIT states
Cc: end2end-interest@ISI.EDU, tcp-impl
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

>The real pain was when solaris made the mistake of following the host 
>recs and actually defaulting to the suggested 240 seconds. 

Note that in Solaris this is tunable by using the "name dispatch"
facility "ndd" on the "tcp_close_wait_interval" parameter (the name is
really a misnomer, and will be fixed soon).

In the 2.6 release the timeout processing of all TIME_WAIT states has
also been made much more efficient by running a single timeout periodically.

Perhaps it's time for the TCP-IMPL WG to recommend a better default MSL.

Jerry Chu
Internet Engineering
SunSoft

From owner-tcp-impl  Tue Jan 21 03:40:52 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA03064 for tcp-impl-list; Tue, 21 Jan 1997 03:40:23 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA03058 for <tcp-impl@relay.engr.SGI.COM>; Mon, 20 Jan 1997 19:40:21 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id TAA18851 for <tcp-impl@relay.engr.SGI.COM>; Mon, 20 Jan 1997 19:40:20 -0800
Received: by daffy.ee.lbl.gov (8.8.4/1.43r)
	id TAA19498; Mon, 20 Jan 1997 19:30:22 -0800 (PST)
Message-Id: <199701210330.TAA19498@daffy.ee.lbl.gov>
To: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Cc: ses@tipper.oit.unc.edu, end2end-interest@ISI.EDU, tcp-impl
Subject: Re: cleaning up TIME_WAIT states
In-reply-to: Your message of Mon, 20 Jan 1997 18:52:38 PST.
Date: Mon, 20 Jan 1997 19:30:21 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> Perhaps it's time for the TCP-IMPL WG to recommend a better default MSL.

I think that crosses the line between implementation issues & TCP research
issues, so this is the sort of issue the WG refers to the IRTF (End-to-end
in particular).

		Vern

From owner-tcp-impl  Tue Jan 21 17:01:40 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA18164 for tcp-impl-list; Tue, 21 Jan 1997 17:01:04 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA18154 for <tcp-impl@relay.engr.sgi.com>; Tue, 21 Jan 1997 09:01:03 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA09136 for <tcp-impl@relay.engr.sgi.com>; Tue, 21 Jan 1997 09:01:01 -0800
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA28837>; Tue, 21 Jan 1997 08:57:06 -0800
Date: Tue, 21 Jan 97 08:58:23 PST
From: braden@ISI.EDU
Posted-Date: Tue, 21 Jan 97 08:58:23 PST
Message-Id: <9701211658.AA05859@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA05859>; Tue, 21 Jan 97 08:58:23 PST
To: ses@tipper.oit.unc.edu
Subject: Re: cleaning up TIME_WAIT states
Cc: end2end-interest@ISI.EDU, tcp-impl
Sender: owner-tcp-impl
Precedence: bulk



  *> 
  *> The real pain was when solaris made the mistake of following the host 
  *> recs and actually defaulting to the suggested 240 seconds. 
  *> 

"Actually"?  The TIME-WAIT state delay is there for a good reason.  I
would not want to quibble about a factor of 2 or 4, but to
significantly reduce TW delay would seem to me to be a Bad Idea, and to
encourage vendors to do so is an even Worse Idea.

Bob Braden

From owner-tcp-impl  Tue Feb  4 09:50:38 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA10044 for tcp-impl-list; Tue, 4 Feb 1997 09:50:10 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA10028 for <tcp-impl@relay.engr.SGI.COM>; Tue, 4 Feb 1997 01:50:07 -0800
Received: from mercury.spider.com (mercury.spider.com [194.217.109.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id BAA07739 for <tcp-impl@relay.engr.SGI.COM>; Tue, 4 Feb 1997 01:50:01 -0800
Received: from asimov.spider.com (asimov.spider.com [194.217.109.66]) by mercury.spider.com (8.8.3/8.6.12) with SMTP id JAA29915 for <tcp-impl@relay.engr.SGI.COM>; Tue, 4 Feb 1997 09:46:12 GMT
Received: from malatesta. by asimov.spider.com (SMI-8.6/SMI-SVR4)
	id JAA22050; Tue, 4 Feb 1997 09:41:33 GMT
Received: by malatesta. (SMI-8.6/SMI-SVR4)
	id JAA06652; Tue, 4 Feb 1997 09:45:13 GMT
Date: Tue, 4 Feb 1997 09:45:13 GMT
From: ian@cova-tech.com (Ian Heavens)
Message-Id: <199702040945.JAA06652@malatesta.>
X-Mailer: Mail User's Shell (7.2.6 beta(2) 2/29/96)
To: tcp-impl
Subject: tcpdump analysis tools?
Sender: owner-tcp-impl
Precedence: bulk

I'd like to analyse TCP connections collected by tcpdump in terms of
state transitions:

how many simultaneous opens (few)
how many closed by RST from state X (lots)
how many closed first by server, how many by client

I think treating each connection as a list of consecutive states and
a segment sent/received to effect the transition, and listing the
statistics by each combination, will yield interesting results.

- it would show how applications use TCP
- it could highlight particular bugs.

Is there anything out there that does this, or should I start on something?
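
As a rough illustration of the kind of tally described above, here is a
minimal Python sketch.  The segment records, the function names and the
sample connections are all hypothetical; a real tool would have to build
them from the tcpdump output and track the full TCP state per side.

```python
from collections import Counter

def classify_close(segments):
    """Say how a connection ended, given its segments in capture order.

    Each segment is a (sender, flags) pair: sender is "client" or "server",
    flags is a set such as {"SYN"}, {"ACK"} or {"FIN", "ACK"}.
    """
    for sender, flags in segments:
        if "RST" in flags:
            return f"reset by {sender}"
        if "FIN" in flags:
            return f"closed first by {sender}"
    return "still open"

def tally(connections):
    """Count connections by the way they were closed."""
    return Counter(classify_close(segs) for segs in connections)

# Hypothetical sample data, not taken from any real capture.
connections = [
    [("client", {"SYN"}), ("server", {"SYN", "ACK"}), ("client", {"ACK"}),
     ("server", {"FIN", "ACK"}), ("client", {"FIN", "ACK"})],
    [("client", {"SYN"}), ("server", {"SYN", "ACK"}), ("client", {"ACK"}),
     ("client", {"RST"})],
    [("client", {"SYN"}), ("server", {"SYN", "ACK"}), ("client", {"ACK"}),
     ("client", {"FIN", "ACK"}), ("server", {"FIN", "ACK"})],
]

for outcome, count in sorted(tally(connections).items()):
    print(f"{count:3d}  {outcome}")
```

Listing the statistics by each (state, segment) combination would mean
replacing the single pass in classify_close() with a per-side state machine.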

ian

Ian Heavens, Spider Software Ltd.
ian@spider.com

From owner-tcp-impl  Tue Feb  4 18:13:14 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA29578 for tcp-impl-list; Tue, 4 Feb 1997 18:12:30 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA29547 for <tcp-impl@relay.engr.SGI.COM>; Tue, 4 Feb 1997 10:12:27 -0800
Received: from picard.cs.ohiou.edu (picard.cs.ohiou.edu [132.235.3.128]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA02735 for <tcp-impl@relay.engr.SGI.COM>; Tue, 4 Feb 1997 10:12:20 -0800
Received: from picard by picard.cs.ohiou.edu (8.6.11/1.930630)
	id SAA03465; Tue, 4 Feb 1997 18:02:35 GMT
Message-Id: <199702041802.SAA03465@picard.cs.ohiou.edu>
To: ian@cova-tech.com (Ian Heavens)
Cc: tcp-impl
From: "Shawn Ostermann" <sdo@picard.cs.OhioU.Edu>
Subject: Re: tcpdump analysis tools? 
Date: Tue, 04 Feb 1997 13:02:34 -0500
Sender: owner-tcp-impl
Precedence: bulk



>> I'd like to analyse TCP connections collected by tcpdump in terms of
>> state transitions:
>> 
>> how many simultaneous opens (few)
>> how many closed by RST from state X (lots)
>> how many closed first by server, how many by client

I have a public domain program called tcptrace that might give you a
head start.  It doesn't answer any of the questions you've asked, but it
wouldn't take much coding to get that information out.  It already
sifts through tcpdump files gathering information about individual TCP
connections.  Probably overkill for what you need, but it already does
enough of the busy work that you could code up something quickly.  If
you're interested, see:

	http://jarok.cs.ohiou.edu/software/tcptrace/tcptrace.html

--sdo


For example, it might tell you (for one particular connection):


connection 4:
        host g:        lawyers.cs.ohiou.edu:3999
        host h:        indigo.cs.ohiou.edu:1034
        complete conn: yes
        first packet:  Fri Jan 31 17:27:47.565430
        last packet:   Fri Jan 31 17:29:10.496596
        elapsed time:  0:01:22.931166
        total packets: 36739
   g->h:                              h->g:
     total packets:          9368           total packets:         27371      
     ack pkts sent:          9367           ack pkts sent:         27371      
     unique bytes sent:         0           unique bytes sent:  12589056      
     actual data pkts:          0           actual data pkts:      27368      
     actual data bytes:         0           actual data bytes:  12589056      
     rexmt data pkts:           0           rexmt data pkts:           0      
     rexmt data bytes:          0           rexmt data bytes:          0      
     outoforder pkts:           0           outoforder pkts:           0      
     SYN/FIN pkts sent:       1/1           SYN/FIN pkts sent:       1/1      
     req 1323 ws/ts:          Y/Y           req 1323 ws/ts:          Y/Y      
     adv wind scale:            1           adv wind scale:            1      
     req sack:                  Y           req sack:                  Y      
     sacks sent:                0           sacks sent:                0      
     mss requested:          1460 bytes     mss requested:           472 bytes
     max segm size:             0 bytes     max segm size:           460 bytes
     min segm size:             0 bytes     min segm size:           236 bytes
     avg segm size:             0 bytes     avg segm size:           459 bytes
     max win adv:          114696 bytes     max win adv:          131070 bytes
     min win adv:          113680 bytes     min win adv:          114696 bytes
     zero win adv:              0 times     zero win adv:              0 times
     avg win adv:          114685 bytes     avg win adv:          114696 bytes
     throughput:                0 Bps       throughput:           151801 Bps  

     RTT samples:               0           RTT samples:            9364      
     RTT min:                 0.0 ms        RTT min:               586.2 ms   
     RTT max:                 0.0 ms        RTT max:               796.6 ms   
     RTT avg:                 0.0 ms        RTT avg:               662.5 ms   
     RTT stdev:               0.0 ms        RTT stdev:              10.8 ms   
     segs cum acked:            0           segs cum acked:        18004      
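
The throughput and average segment size in the h->g column can be checked by
hand from the other fields.  The short Python sketch below assumes that
throughput is unique bytes divided by elapsed time and that the average
segment size is data bytes divided by data packets; these are guesses at how
the figures were derived, but both reproduce the reported values once the
fraction is dropped.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"
first = datetime.strptime("17:27:47.565430", FMT)   # first packet
last = datetime.strptime("17:29:10.496596", FMT)    # last packet
elapsed = (last - first).total_seconds()             # 82.931166 s

unique_bytes = 12589056   # h->g unique bytes sent
data_pkts = 27368         # h->g actual data pkts

throughput = unique_bytes / elapsed     # bytes per second
avg_segment = unique_bytes / data_pkts  # bytes per data packet

print(f"elapsed:     {elapsed:.6f} s")
print(f"throughput:  {int(throughput)} Bps")
print(f"avg segment: {int(avg_segment)} bytes")
```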

From owner-tcp-impl  Tue Feb  4 18:28:29 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA03444 for tcp-impl-list; Tue, 4 Feb 1997 18:27:59 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA03437 for <tcp-impl@relay.engr.SGI.COM>; Tue, 4 Feb 1997 10:27:56 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA06458 for <tcp-impl@relay.engr.SGI.COM>; Tue, 4 Feb 1997 10:27:55 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id KAA24380; Tue, 4 Feb 1997 10:17:52 -0800 (PST)
Message-Id: <199702041817.KAA24380@daffy.ee.lbl.gov>
To: ian@spider.com (Ian Heavens)
Cc: tcp-impl
Subject: Re: tcpdump analysis tools?
In-reply-to: Your message of Tue, 04 Feb 1997 09:45:13 PST.
Date: Tue, 04 Feb 1997 10:17:52 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> how many simultaneous opens (few)
> how many closed by RST from state X (lots)
> how many closed first by server, how many by client
> ...
> Is there anything out there that does this, or should I start on something?

You might be able to hack "tcp-reduce" in the Internet Traffic Archive

	http://town.hall.org/Archives/pub/ITA/

to do this.

		Vern

From owner-tcp-impl  Tue Feb 11 08:16:32 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA12769 for tcp-impl-list; Tue, 11 Feb 1997 08:15:52 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA12761 for <tcp-impl@relay.engr.sgi.com>; Tue, 11 Feb 1997 00:15:50 -0800
Received: from mercury.spider.com (mercury.spider.com [194.217.109.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id AAA17121 for <tcp-impl@relay.engr.sgi.com>; Tue, 11 Feb 1997 00:15:47 -0800
Received: from asimov.spider.com (asimov.spider.com [194.217.109.66]) by mercury.spider.com (8.8.3/8.6.12) with SMTP id IAA13276 for <tcp-impl@relay.engr.sgi.com>; Tue, 11 Feb 1997 08:12:00 GMT
Received: from malatesta. by asimov.spider.com (SMI-8.6/SMI-SVR4)
	id IAA03426; Tue, 11 Feb 1997 08:07:14 GMT
Received: by malatesta. (SMI-8.6/SMI-SVR4)
	id IAA14919; Tue, 11 Feb 1997 08:07:13 GMT
Date: Tue, 11 Feb 1997 08:07:13 GMT
From: ian@spider.com (Ian Heavens)
Message-Id: <199702110807.IAA14919@malatesta.>
X-Mailer: Mail User's Shell (7.2.6 beta(2) 2/29/96)
To: tcp-impl
Subject: HTTP  and RFC1122 half duplex close
Sender: owner-tcp-impl
Precedence: bulk


There was a discussion a while back on the end2end-interest list about
the behaviour of HTTP clients on PC systems, why they generate RSTs
and whether it is the application or the stack that does this.  I talked
to Josh Cohen at Netscape who enlightened me, so I post it here as it
has implications for the WG. 
 
It relates to user abort of web page download (more and more frequent as
people follow more interesting links - up to 40% of connections abort,
according to Vern Paxson's logs at LBL).  The web server blocks on a
write() if there is unacknowledged data in the pipe - there usually is
- and the client does a close.  RFC1122 says that as it is a half duplex
close and there is unread data, a RST should be sent.  And it is needed
to flush the data and unblock the write - there is a PC stack out there
that doesn't do this, and the server process hangs for ever.  To get
round this, Netscape (but not all web browsers) sets SO_LINGER to zero,
i.e. the application forces the RST.
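
For reference, a minimal Python sketch of what setting SO_LINGER to zero
looks like from the application side.  The helper names and the loopback
demo are illustrative only, and the struct layout (two C ints) is an
assumption that holds on typical Unix-like systems but not on Windows.

```python
import socket
import struct

def abortive_close(sock: socket.socket) -> None:
    """Close sock with linger on and a zero timeout, so the stack aborts
    the connection with a RST instead of the normal FIN handshake."""
    # struct linger { int l_onoff; int l_linger; } is assumed here.
    linger = struct.pack("ii", 1, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)
    sock.close()

def demo() -> str:
    """Abort a loopback connection and report what the peer observes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(5)
    try:
        abortive_close(client)
        server_side.recv(1)          # blocks until the RST (or EOF) arrives
        return "no reset seen"
    except ConnectionResetError:
        return "connection reset by peer"
    finally:
        server_side.close()
        listener.close()

if __name__ == "__main__":
    print(demo())
```

The reset surfaces at the peer as an error from its next read or write,
which is what unblocks a server stuck in write().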
 
This is a workaround for a TCP/IP stack problem.  I think there are two
conclusions here:
 
1.  It is important (VERY!) for TCP/IP stacks to follow RFC1122 half duplex
close, and there are some that don't.
 
2.  RFC1122 half duplex close is a MUST rather than a SHOULD.  The above
example implies that with current applications using a blocking I/O model,
unread data on close easily leads to deadlock unless the data is flushed.
 
ian


From owner-tcp-impl  Tue Feb 11 17:13:36 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA27587 for tcp-impl-list; Tue, 11 Feb 1997 17:12:57 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA27561 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 09:12:54 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA25774 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 09:12:49 -0800
Received: from rtpdce03.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA06814; Tue, 11 Feb 1997 12:08:20 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce03.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id MAA76410; Tue, 11 Feb 1997 12:08:14 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA20950; Tue, 11 Feb 1997 12:08:20 -0500
Message-Id: <9702111708.AA20950@ludwigia.raleigh.ibm.com>
To: ian@spider.com (Ian Heavens)
Cc: tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
In-Reply-To: Your message of "Tue, 11 Feb 1997 08:07:13 GMT."
             <199702110807.IAA14919@malatesta.> 
Date: Tue, 11 Feb 1997 12:08:20 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl
Precedence: bulk

I don't quite follow some of the details here:

> It relates to user abort of web page download (more and more frequent as
> people follow more interesting links - up to 40% of connections abort,
> according to Vern Paxson's logs at LBL).  The web server blocks on a
> write() if there is unacknowledged data in the pipe - there usually is
> - and the client does a close.

I don't quite follow the above. Do you really mean:

a) the server blocks on a close() if there is unacknowledged data in
the pipe (this would be independent of whether the client has closed
its end of the connection)?

b) the server blocks on a write() if a send window's worth of data is
unacknowledged.

c) Some TCP implementations actually block the caller on a write() if
there is *any* unacknowledged data in the pipe. (These stacks using
stop-and-wait must get really great throughput :-)

I don't see the problem with the first two behaviors. I'd argue that
the third case is a broken stack on the server, and there is little
the client can/should do.

At the same time, if the client has "closed" its end of the connection
and couldn't care less to receive any more data from the server, then
it should do a "half duplex" close as outlined in RFC 1122 which would
cause any subsequent traffic from the server to trigger a RST back to
the server.

> RFC1122 says that as it is a half duplex
> close and there is unread data, a RST should be sent.  And it is needed
> to flush the data and unblock the write - there is a PC stack out there
> that doesn't do this, and the server process hangs for ever.
                  ^^^^

What exactly does the "this" refer to? After the client has issued a
"half-duplex" close, it silently ignores any additional traffic from
the server (including retransmissions)? That would be majorly broken
(since the "blocked" server will eventually retransmit, which should
trigger the RST, preventing the server from hanging/deadlocking).

> To get round this, Netscape (but not all web browsers) sets
> SO_LINGER to zero, i.e. the application forces the RST.

But this would only send out a single RST, which might get lost. What
happens if the RST is lost and the server retransmits? Aren't we now
back in the same situation described in the first paragraph, which
presumably can lead to hung servers too, right?

Thomas

From owner-tcp-impl  Tue Feb 11 17:54:48 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA08603 for tcp-impl-list; Tue, 11 Feb 1997 17:54:16 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA08584 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 09:54:15 -0800
Received: from malatesta. (malatesta.spider.com [194.217.109.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA06705 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 09:54:12 -0800
Received: from malatesta by malatesta. (SMI-8.6/SMI-SVR4)
	id RAA23788; Tue, 11 Feb 1997 17:38:06 GMT
Message-ID: <3300AE7E.60F9@spider.com>
Date: Tue, 11 Feb 1997 17:38:06 +0000
From: Ian Heavens <ian@spider.com>
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Thomas Narten <narten@raleigh.ibm.com>
CC: tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
References: <9702111708.AA20950@ludwigia.raleigh.ibm.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Thomas Narten wrote:
> 
> I don't quite follow some of the details here:
> 

My apologies for not being clear enough.  The scenario to which
I was referring  was a client TCP that does not support RFC1122
half duplex close with RST (it is a SHOULD, not a MUST).  This
causes the server process to hang if the client closes the connection
with traffic in the pipe.  Maybe it only happens for your second
scenario: if the server blocks on a write() when a send window of data
is unacknowledged;  if there is enough data to send then the send
window will go down to zero - presumably this ends up blocking the
write.  I don't know if the average web page contains enough data
to trigger flow control from the receiving application back to the
sender.

I think the sequence of events is as follows:

1.  server  is blocked on write()
2.  client closes normally and server ACKs (client in FIN-WAIT-2
and server in CLOSE-WAIT).  No RST because RFC1122 is not followed.
3.  server never closes

> > To get round this, Netscape (but not all web browsers) sets
> > SO_LINGER to zero, i.e. the application forces the RST.
> 
> But this would only send out a single RST, which might get lost. What
> happens if the RST is lost and the server retransmits? Aren't we now
> back in the same situation described in the first paragraph, which
> presumably can lead to hung servers too, right?
> 

But the client has closed, so any data received will be RST because
there is no socket, isn't that the case?

It's a small point maybe, but it's another bug that should be listed
(and a clarification of the TCP specification).

I forgot to mention that multiple concurrent TCP connections exacerbate
the situation, as might be expected.

regards

ian

From owner-tcp-impl  Tue Feb 11 18:00:05 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA09722 for tcp-impl-list; Tue, 11 Feb 1997 17:59:40 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA09705 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 09:59:39 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA08084 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 09:59:37 -0800
Received: from rtpdce01.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA24412; Tue, 11 Feb 1997 12:55:34 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce01.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id MAA82116; Tue, 11 Feb 1997 12:55:33 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA16096; Tue, 11 Feb 1997 12:55:40 -0500
Message-Id: <9702111755.AA16096@ludwigia.raleigh.ibm.com>
To: Ian Heavens <ian@spider.com>
Cc: tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
In-Reply-To: Your message of "Tue, 11 Feb 1997 17:38:06 GMT."
             <3300AE7E.60F9@spider.com> 
Date: Tue, 11 Feb 1997 12:55:40 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl
Precedence: bulk

> My apologies for not being clear enough.  The scenario to which
> I was referring  was a client TCP that does not support RFC1122
> half duplex close with RST (it is a SHOULD, not a MUST).

Minor nit, but text on page 88 says "MAY", not "SHOULD":

            A host MAY implement a "half-duplex" TCP close sequence, so
            that an application that has called CLOSE cannot continue to
            read data from the connection.  If such a host issues a
            CLOSE call while received data is still pending in TCP, or
            if new data is received after CLOSE is called, its TCP
            SHOULD send a RST to show that data was lost.

> This causes the server process to hang if the client closes the
> connection with traffic in the pipe.

I still don't understand this. Under no circumstances should a server
"hang" and stop (re)transmitting TCP segments if it has data it wants
to send or there is unacknowledged data in the pipe. If the client is
advertising a zero window, the server is required to periodically
probe. If the send window is full, the server must retransmit when
ACKs don't come back. If there is unsent data and the send window is
not full, the server will send. So the server should never "hang" as a
result of not sending TCP segments unless it is broken.
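
The three cases above can be written down as a small decision function.
This Python sketch is only a simplified model of the argument: the function
name, its parameters and the returned strings are made up for illustration,
and a real TCP sender tracks far more state than this.

```python
def sender_action(peer_window: int, in_flight: int, unsent: int) -> str:
    """What a correct sender keeps doing while it still has work.

    peer_window: receive window last advertised by the peer, in bytes
    in_flight:   bytes sent but not yet acknowledged
    unsent:      bytes the application has queued but TCP has not sent
    """
    if in_flight == 0 and unsent == 0:
        return "idle"                                   # nothing to do
    if peer_window == 0:
        return "probe the zero window periodically"
    if in_flight >= peer_window:
        return "retransmit when ACKs do not come back"  # send window full
    if unsent > 0:
        return "send new data"                          # window has room
    return "retransmit when ACKs do not come back"      # waiting on ACKs

for args in [(0, 0, 4096), (8760, 8760, 4096), (8760, 1460, 4096)]:
    print(args, "->", sender_action(*args))
```

In every case with outstanding work the sender emits segments sooner or
later, so a client that answers segments on a closed connection with a RST
always gets the chance to do so.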

If the server is sending segments, but the client is responding to
them incorrectly (e.g., by silently discarding them), then the client
is broken.

>  Maybe it only happens for your second
> scenario: if the server blocks on a write() when a send window of data
> is unacknowledged;  if there is enough data to send then the send
> window will go down to zero - presumably this ends up blocking the
> write.

But normal TCP retransmit timers should take care of this case. The
server shouldn't stop transmitting.

> I think the sequence of events is as follows:
> 1.  server  is blocked on write()

Just because the server (more specifically, the calling application)
is blocked on a write call doesn't mean that TCP can stop sending
packets (i.e., it still needs to retransmit lost packets in order to
advance the window if the send window is full).

> 2.  client closes normally and server ACKs (client in FIN-WAIT-2
> and server in CLOSE-WAIT).  No RST because RFC1122 is not followed.
> 3.  server never closes

Seems to me that the server is never closing because it is not
retransmitting unacknowledged data.

What am I missing?

Thomas

From owner-tcp-impl  Tue Feb 11 18:24:40 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA15906 for tcp-impl-list; Tue, 11 Feb 1997 18:22:19 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA15841 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 10:22:10 -0800
Received: from malatesta. (malatesta.spider.com [194.217.109.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA14674 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 10:22:06 -0800
Received: from malatesta by malatesta. (SMI-8.6/SMI-SVR4)
	id SAA23867; Tue, 11 Feb 1997 18:06:16 GMT
Message-ID: <3300B518.3C75@spider.com>
Date: Tue, 11 Feb 1997 18:06:16 +0000
From: Ian Heavens <ian@spider.com>
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Thomas Narten <narten@raleigh.ibm.com>
CC: tcp-impl, josh@birdcage.mcom.com
Subject: Re: HTTP and RFC1122 half duplex close
References: <9702111755.AA16096@ludwigia.raleigh.ibm.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Thomas Narten wrote:
> 
> > My apologies for not being clear enough.  The scenario to which
> > I was referring  was a client TCP that does not support RFC1122
> > half duplex close with RST (it is a SHOULD, not a MUST).
> 
> Minor nit, but text on page 88 says "MAY", not "SHOULD":
> 
>             A host MAY implement a "half-duplex" TCP close sequence, so
>             that an application that has called CLOSE cannot continue to
>             read data from the connection.  If such a host issues a
>             CLOSE call while received data is still pending in TCP, or
>             if new data is received after CLOSE is called, its TCP
>             SHOULD send a RST to show that data was lost.
> 

The half duplex close is a MAY, but if it is implemented then
the RST transmission is a SHOULD...I'm not suggesting that half
duplex close be mandatory, but that when it is used, RSTs are mandatory
if received data is still pending.

>  This causes the server process to hang if the client closes the
> > connection with traffic in the pipe.
> 
> I still don't understand this. Under no circumstances should a server
> "hang" and stop (re)transmitting TCP segments if it has data it wants
> to send or there is unacknowledged data in the pipe. If the client is
> advertising a zero window, the server is required to periodically
> probe. If the send window is full, the server must retransmit when
> ACKs don't come back. If there is unsent data and the send window is
> not full, the server will send. So the server should never "hang" as a
> result of not sending TCP segments unless it is broken.
> 

> If the server is sending segments, but the client is responding to
> them incorrectly (e.g., by silently discarding them), then the client
> is broken.
> 
> >  Maybe it only happens for your second
> > scenario: if the server blocks on a write() when a send window of data
> > is unacknowledged;  if there is enough data to send then the send
> > window will go down to zero - presumably this ends up blocking the
> > write.
> 
> But normal TCP retransmit timers should take care of this case. The
> server shouldn't stop transmitting.
> 
> > I think the sequence of events is as follows:
> > 1.  server  is blocked on write()
> 
> Just because the server (more specifically, the calling application)
> is blocked on a write call doesn't mean that TCP can stop sending
> packets (i.e., it still needs to retransmit lost packets in order to
> advance the window if the send window is full).
> 
> > 2.  client closes normally and server ACKs (client in FIN-WAIT-2
> > and server in CLOSE-WAIT).  No RST because RFC1122 is not followed.
> > 3.  server never closes
> 
> Seems to me that the server is never closing because it is not
> retransmitting unacknowledged data.
>
> What am I missing?
>

The application process hangs, not the server TCP. Maybe I'm 
getting confused too.  My guess is the server TCP is
continually probing at this point (and so the application process
hangs).  I'll try and get a tcpdump of this behaviour (Josh, do
you have one handy?) by modifying our TCP to not do RFC1122 RSTs.

ian

From owner-tcp-impl  Tue Feb 11 18:55:55 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA27348 for tcp-impl-list; Tue, 11 Feb 1997 18:55:03 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA27337 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 10:55:02 -0800
Received: from c3po.mcom.com (h-205-217-237-46.netscape.com [205.217.237.46]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA25785 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 10:54:54 -0800
Received: from birdcage.mcom.com (birdcage.mcom.com [205.217.250.80]) by c3po.mcom.com (8.7.5/8.7.3) with SMTP id KAA04913; Tue, 11 Feb 1997 10:44:59 -0800 (PST)
Received: by birdcage.mcom.com (SMI-8.6/SMI-SVR4)
	id KAA25296; Tue, 11 Feb 1997 10:44:56 -0800
From: josh@birdcage.mcom.com (Josh Cohen)
Message-Id: <199702111844.KAA25296@birdcage.mcom.com>
Subject: Re: HTTP and RFC1122 half duplex close
To: ian@spider.com (Ian Heavens)
Date: Tue, 11 Feb 1997 10:44:56 -0800 (PST)
Cc: narten@raleigh.ibm.com, tcp-impl
In-Reply-To: <3300B518.3C75@spider.com> from "Ian Heavens" at Feb 11, 97 06:06:16 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk


> >
> > What am I missing?
> >
see below...
> 
> The application process hangs, not the server TCP. Maybe I'm 
> getting confused too.  My guess is the server TCP is
> continually probing at this point (and so the application process
> hangs).  I'll try and get a tcpdump of this behaviour (Josh, do
> you have one handy?) by modifying our TCP to not do RFC1122 RSTs.
> 
I don't have the misbehaving one handy; I'll try to find it.

It seems that there is some confusion.  Let me take a stab at summarizing
the problem.

Let's have a look at a time when RFC1122 comes into play:


Note that in trace 1, the client is the active closer.
You'll see an initial RST which is presumably from the SO_LINGER,
but then a series of 5 or so RSTs.  I presume that these
are from the 'data in pipe' or RFC 1122 issue we talked about.

TRACE 1  'STOP' with RST
  1   0.00000     birdcage -> orac.early.com TCP D=80 S=59500 Syn Seq=1463588337 Len=0 Win=8760
  2   0.08457 orac.early.com -> birdcage     TCP D=59500 S=80 Syn Ack=1463588338 Seq=693680153 Len=0 Win=8760
  3   0.00009     birdcage -> orac.early.com TCP D=80 S=59500     Ack=693680154 Seq=1463588338 Len=0 Win=8760

 [..  snip , normal data ..]

 44   0.00009     birdcage -> orac.early.com TCP D=80 S=59500     Ack=693714559 Seq=1463588596 Len=0 Win=8760
 45   0.00972 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693714559 Len=1460 Win=8760
 46   0.00368     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=8760
[ client sends a RST to abort.  The server is in the process of sending
  a lot of data, eg the HTTP response... This could be a FIN, too ]

 47   0.04554 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693716019 Len=1460 Win=8760
 48   0.00011     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=0
 49   0.01098 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693717479 Len=1460 Win=8760
 50   0.00008     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=0
 51   0.03070 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693718939 Len=1460 Win=8760
 52   0.00013     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=0
 53   0.00500 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693720399 Len=892 Win=8760
 54   0.00007     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=0
 55   0.01253 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693721291 Len=1460 Win=8760
 56   0.00007     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=0

Notice the series of RSTs: one for each packet that was 'on its way'
when the client aborted.
It is these RSTs which cause the server's write() to return with an
error, i.e. connection reset by peer.
The problem that occurs with a bad stack is that the server side
application will hang forever without the RSTs.
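
The one-RST-per-segment pattern can be checked mechanically.  The Python
sketch below parses lines 45 to 56 of trace 1 as shown above; the regular
expression and the field names are ad hoc choices for this particular trace
format, not part of any existing tool.

```python
import re

TRACE = """\
 45   0.00972 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693714559 Len=1460 Win=8760
 46   0.00368     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=8760
 47   0.04554 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693716019 Len=1460 Win=8760
 48   0.00011     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=0
 49   0.01098 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693717479 Len=1460 Win=8760
 50   0.00008     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=0
 51   0.03070 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693718939 Len=1460 Win=8760
 52   0.00013     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=0
 53   0.00500 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693720399 Len=892 Win=8760
 54   0.00007     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=0
 55   0.01253 orac.early.com -> birdcage     TCP D=59500 S=80     Ack=1463588596 Seq=693721291 Len=1460 Win=8760
 56   0.00007     birdcage -> orac.early.com TCP D=80 S=59500 Rst Seq=1463588596 Len=0 Win=0
"""

# packet number, relative time, source, destination, then the TCP fields
ROW = re.compile(r"\s*(\d+)\s+[\d.]+\s+(\S+) -> (\S+)\s+TCP\s+(.*)")

def parse(trace):
    """Turn each trace line into a small dict of the fields used below."""
    rows = []
    for line in trace.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        num, src, _dst, rest = m.groups()
        rows.append({
            "num": int(num),
            "src": src,
            "rst": "Rst" in rest.split(),
            "len": int(re.search(r"Len=(\d+)", rest).group(1)),
        })
    return rows

rows = parse(TRACE)
first_rst = next(i for i, r in enumerate(rows) if r["rst"])
later = rows[first_rst + 1:]                  # everything after the abort
data = [r for r in later if r["len"] > 0]     # server segments still arriving
rsts = [r for r in later if r["rst"]]         # client replies
each_answered = all(nxt["rst"] for cur, nxt in zip(later, later[1:])
                    if cur["len"] > 0)

print("first RST is packet", rows[first_rst]["num"])
print(len(data), "data segments after it,", len(rsts), "further RSTs")
print("bytes discarded:", sum(r["len"] for r in data))
print("every data segment answered by a RST:", each_answered)
```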

Assume we are in the middle of this connection, before the first
RST was sent.  (I'm demonstrating a bad stack scenario.)

1S = step 1, Server side
1C = step 1, client side

1S. Server is in a write() sending data to the client

1C. User hits STOP or something, the application does a close().

2C. Client aborts and sends either a FIN or RST
	(let's say a FIN, to show the problem and the reasons for a RST instead)

3C. The client application has destroyed the socket, and it is detached
 	from the stack for that connection.
	From the client stack's side, the connection no longer exists.

3S. Server continues to be in a write().
    Server stack goes to CLOSE_WAIT (having received the FIN).

4S. Server will continue to write to the client until it has exhausted
    the client's advertised window.  This is OK for a half-close.

5S. The window is exhausted, and the server will wait until the client
    opens the window (sending window probes).

4C. The client should, but doesn't, send a RST upon receiving the
	remaining data from the server or the window probes.

Once this state is achieved, the application on the server side
will never return from the write(), and the stack will sit in
CLOSE_WAIT forever.  The evil client side has 'gone away'.

By sending a RST instead of the FIN to abort the connection, we can
cause both the server side and client side stacks to tear the connection
down immediately, and avoid the half-close where it gets caught.
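
A minimal sketch of such an abortive close, assuming a BSD-style sockets
API in which SO_LINGER with l_onoff=1 and l_linger=0 makes close() send a
RST rather than a FIN.  The helper names abortive_close() and demo() are
made up for illustration, and the "ii" packing assumes struct linger is
two C ints (as on Linux).  The demo observes the reset through recv()
rather than write(), only because that is easier to reproduce on loopback.

```python
import socket
import struct


def abortive_close(sock):
    """Close `sock` so that the stack sends a RST instead of a FIN.

    Assumption: SO_LINGER with l_onoff=1, l_linger=0 requests an abortive
    close, and struct linger is two C ints on this platform.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def demo():
    """Return the exception type the server side sees after the client aborts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    server.settimeout(5)  # do not hang forever if no RST shows up
    abortive_close(client)
    try:
        server.recv(1)
        return None  # no error: the peer closed in some other way
    except OSError as exc:
        return type(exc)
    finally:
        server.close()
        listener.close()
```

As the thread goes on to discuss, this only narrows the problem: the
single RST can still be lost in transit.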

The most common manifestation of the problem on the server side is
that web server admins find that over time, their server threads or
processes become hung.

Unfortunately, the bad stack implementation is in widespread use,
and the current versions of it *still* do not fix the problem.

-- 
-----------------------------------------------------------------------------
Josh Cohen				        Netscape Communications Corp.
Netscape Fire Department	       
Server Engineering
josh@netscape.com                       http://home.netscape.com/people/josh/
-----------------------------------------------------------------------------

From owner-tcp-impl  Tue Feb 11 19:18:45 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA05556 for tcp-impl-list; Tue, 11 Feb 1997 19:17:59 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA05531 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 11:17:57 -0800
Received: from c3po.mcom.com (h-205-217-237-46.netscape.com [205.217.237.46]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA03590 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 11:17:54 -0800
Received: from birdcage.mcom.com (birdcage.mcom.com [205.217.250.80]) by c3po.mcom.com (8.7.5/8.7.3) with SMTP id LAA06705 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 11:07:59 -0800 (PST)
Received: by birdcage.mcom.com (SMI-8.6/SMI-SVR4)
	id LAA25771; Tue, 11 Feb 1997 11:07:56 -0800
From: josh@birdcage.mcom.com (Josh Cohen)
Message-Id: <199702111907.LAA25771@birdcage.mcom.com>
Subject: minor clarification
To: tcp-impl
Date: Tue, 11 Feb 1997 11:07:56 -0800 (PST)
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Hi,
	I've sent my subscribe request to the list, but haven't
seen the response yet, so I'm sending this 'blind'.

>> RFC1122 says that as it is a half duplex
>> close and there is unread data, a RST should be sent.  And it is needed
>> to flush the data and unblock the write - there is a PC stack out there
>> that doesn't do this, and the server process hangs for ever.
                  ^^^^

>What exactly does the "this" refer to? After the client has issued a
>"half-duplex" close, it silently ignores any additional traffic from

That is the problem.  The broken stack doesn't do RFC1122-style resets.
I don't like having to work around a broken stack, but it
*is* quite widely deployed.  It's quite popular at large
companies in their intranet setups.


-- 
-----------------------------------------------------------------------------
Josh Cohen				        Netscape Communications Corp.
Netscape Fire Department	       
Server Engineering
josh@netscape.com                       http://home.netscape.com/people/josh/
-----------------------------------------------------------------------------

From owner-tcp-impl  Tue Feb 11 19:22:06 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA07345 for tcp-impl-list; Tue, 11 Feb 1997 19:21:40 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA07332 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 11:21:38 -0800
Received: from palrel3.hp.com (palrel3.hp.com [15.253.88.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA04966 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 11:21:37 -0800
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185]) by palrel3.hp.com with SMTP (8.7.5/8.7.3) id LAA07350 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 11:20:06 -0800 (PST)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA09345; Tue, 11 Feb 1997 11:12:47 -0800
Message-Id: <3300C4AE.B34@cup.hp.com>
Date: Tue, 11 Feb 1997 11:12:46 -0800
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: Josh Cohen <josh@birdcage.mcom.com>
Cc: Ian Heavens <ian@spider.com>, narten@raleigh.ibm.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
References: <199702111844.KAA25296@birdcage.mcom.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

> By sending a RST instead of the FIN to abort the connection, we can
> cause both the server side and client side stacks to tear the connection
> down immediately, and avoid the half-close where it gets caught.

All this does is (unreliably) kludge around a bug in the transport
stack. There is no guarantee that the RST will ever make it to the server
(witness all the connections that get waylaid in FIN_WAIT_2...) And
besides, if the server has outstanding data, then if nothing else that
data is supposed to hit a retransmission timeout, so the server should
not really be "hung", only taking a somewhat longish time to figure
things out.

> The most common manifestation of the problem on the server side is
> that web server admins find that over time, their server threads or
> processes become hung.

If the server's stack is not dropping the connection to RTX exhaustion,
that is a bug in the server stack. If it is indeed dropping the
connection, then the admins are simply being too impatient. We are
likely talking several minutes here.

> Unfortunately, the bad stack implementation is in widespread use,
> and the current versions of it *still* do not fix the problem.

So we should compound the problem by committing a second wrong? (liberal
use of abortive close)

rick jones

From owner-tcp-impl  Tue Feb 11 19:33:48 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA10187 for tcp-impl-list; Tue, 11 Feb 1997 19:33:08 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA10168 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 11:33:06 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id LAA08145 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 11:33:04 -0800
Received: from rtpdce01.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA69464; Tue, 11 Feb 1997 14:28:39 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce01.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id OAA49804; Tue, 11 Feb 1997 14:28:38 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA15928; Tue, 11 Feb 1997 14:28:45 -0500
Message-Id: <9702111928.AA15928@ludwigia.raleigh.ibm.com>
To: josh@birdcage.mcom.com (Josh Cohen)
Cc: ian@spider.com (Ian Heavens), tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
In-Reply-To: Your message of "Tue, 11 Feb 1997 10:44:56 PST."
             <199702111844.KAA25296@birdcage.mcom.com> 
Date: Tue, 11 Feb 1997 14:28:45 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl
Precedence: bulk

> notice the series of RSTs.  This is for each packet that was 'on its way'
> when the client aborted.
> It is these RSTs which cause the server's write() to return with an 
> error, ie connection reset by peer...

I think we're all in agreement that the above is exactly what is
supposed to happen.

> The problem that occurs with a bad stack is that the server side
> application will hang forever without the RSTs.

What I haven't quite been able to get a handle on (until below) is
just who has the bad stack (client or server), and in what way it is
bad.

> 2C. Client aborts and sends either a FIN or RST 
> 	(lets say a FIN to show the problem, and reasons for RST instead)

> 3C. The client application has destroyed the socket, and it detached
>  	from the stack for that connection. 
> 	From the client stack side, the connection no longer exists

> 3S Server continues to be in a write
>    Server Stack goes to CLOSE_WAIT ( having received the FIN )
>    
> 4S Server will continue to write to the client, until it has exhausted
>    its window. This is OK for a half-close.  

> 5S The window is exhausted, and the server will wait until the client
>    opens the window. (window probes )

To be perfectly precise, the server won't "wait"; it should send
periodic probes. So in this case the server is still sending TCP
segments.  If it really is waiting for the client to send it
something, the server stack is broken.

> 4C. The client should, but doesnt  send a RST upon receiving the
> 	 remaining data from the server or window probes.

The client is clearly broken, and in a major way that just doesn't
make sense to me. In step 3C, the client has destroyed the
connection. Yet in step 4C, when the client gets a TCP segment for
which it doesn't have a control block, it does nothing, as opposed to
sending a RST.

The client stack needs to be fixed; the fix is to follow the TCP spec,
which has nothing to do with whether it issued a half-duplex close or
not.  Just what stack has this property, i.e., doesn't send RSTs when
receiving a TCP segment for which it has no associated connection?
This is majorly broken.
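
For reference, the reset rule being appealed to can be sketched as
follows.  This is a simplified model of the RFC 793 behaviour for a
segment that arrives when no connection exists; the Segment type and the
function name are invented for the sketch and are not taken from any
stack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    seq: int
    ack: int = 0
    length: int = 0        # payload bytes
    has_ack: bool = False  # ACK control bit
    syn: bool = False
    fin: bool = False
    rst: bool = False


def reply_when_no_connection(seg: Segment) -> Optional[Segment]:
    """Reply RFC 793 asks for when a segment matches no control block."""
    if seg.rst:
        return None  # an incoming RST is discarded, never answered
    if seg.has_ack:
        # <SEQ=SEG.ACK><CTL=RST>
        return Segment(seq=seg.ack, rst=True)
    # <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>; SYN and FIN each count as one
    seg_len = seg.length + int(seg.syn) + int(seg.fin)
    return Segment(seq=0, ack=(seg.seq + seg_len) % 2**32, has_ack=True, rst=True)
```

Feeding it a data segment such as packet 47 of the trace (Seq=693716019,
Ack=1463588596, Len=1460) yields a RST whose sequence number is the
incoming Ack value, which is what the good client in the trace sends.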

> By [having the client] sending a RST instead of the FIN to abort the
> connection, we can cause both the server side and client side stacks
> to tear the connection down immediately, and avoid the half-close
> where it gets caught.

This is a hack that doesn't completely fix the problem. If the RST is
lost, you have the same problem as before. It seems to me that the
client stack needs to send RSTs as called for in the spec.

Thomas

From owner-tcp-impl  Tue Feb 11 19:54:05 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA14338 for tcp-impl-list; Tue, 11 Feb 1997 19:53:31 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA14307 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 11:53:26 -0800
Received: from malatesta. (malatesta.spider.com [194.217.109.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA13035 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 11:53:23 -0800
Received: from malatesta by malatesta. (SMI-8.6/SMI-SVR4)
	id TAA24196; Tue, 11 Feb 1997 19:29:57 GMT
Message-ID: <3300C8B5.5407@spider.com>
Date: Tue, 11 Feb 1997 19:29:57 +0000
From: Ian Heavens <ian@spider.com>
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Rick Jones <raj@hpisrdq.cup.hp.com>
CC: Josh Cohen <josh@birdcage.mcom.com>, narten@raleigh.ibm.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
References: <199702111844.KAA25296@birdcage.mcom.com> <3300C4AE.B34@cup.hp.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Rick Jones wrote:
> 
> > By sending a RST instead of the FIN to abort the connection, we can
> > cause both the server side and client side stacks to tear the connection
> > down immediately, and avoid the half-close where it gets caught.
> 
> All this does is (unreliably) kludge around a bug in the transport
> stack. There is no guarantee that RST will ever make it to the server
> (witness all the connections that get waylaid in FIN_WAIT_2...) And
> besides, if the server has outstanding data, if nothing else that is
> supposed to retransmit timeout, so it should not be really "hung" only
> taking a somewhat longish time to figure things out.

I'd like to see the network trace to see if the server TCP is
retransmitting or probing - in the latter case it should never
time out (as RFC1122 mandates for probing zero windows).

> 
> > The most common manifestation of the problem on the server side is
> > that web server admins find that over time, their server threads or
> > processes become hung.
> 
> If the server's stack is not dropping the connection to RTX exhaustion,
> that is a bug in the server stack. If it is indeed dropping the
> connection, then the admins are simply being too impatient. We are
> likely talking several minutes here.
> 
> > Unfortunately, the bad stack implementation is in widespread use,
> > and the current versions of it *still* do not fix the problem.
> 
> So we should compound the problem by committing a second wrong? (liberal
> use of abortive close)
> 

Actually, I think that it doesn't make too much difference whether the
application or the stack gives rise to the RST: TIME-WAIT is avoided
in either case.  But this is a protocol issue, not an implementation
issue; it is just a good example of where the RST mechanism is
needed (if the server is probing) or advisable (if the server is
retransmitting).

ian

From owner-tcp-impl  Tue Feb 11 20:11:23 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA17873 for tcp-impl-list; Tue, 11 Feb 1997 20:10:43 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA17843 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 12:10:41 -0800
Received: from malatesta. (malatesta.spider.com [194.217.109.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA17645 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 12:10:37 -0800
Received: from malatesta by malatesta. (SMI-8.6/SMI-SVR4)
	id TAA24207; Tue, 11 Feb 1997 19:54:30 GMT
Message-ID: <3300CE76.2917@spider.com>
Date: Tue, 11 Feb 1997 19:54:30 +0000
From: Ian Heavens <ian@spider.com>
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Thomas Narten <narten@raleigh.ibm.com>
CC: Josh Cohen <josh@birdcage.mcom.com>, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
References: <9702111928.AA15928@ludwigia.raleigh.ibm.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Thomas Narten wrote:

> > 2C. Client aborts and sends either a FIN or RST
> >       (lets say a FIN to show the problem, and reasons for RST instead)
> 
> > 3C. The client application has destroyed the socket, and it detached
> >       from the stack for that connection.
> >       From the client stack side, the connection no longer exists
> 
> > 3S Server continues to be in a write
> >    Server Stack goes to CLOSE_WAIT ( having received the FIN )
> >
> > 4S Server will continue to write to the client, until it has exhausted
> >    its window. This is OK for a half-close.
> 
> > 5S The window is exhausted, and the server will wait until the client
> >    opens the window. (window probes )
> 
> To be perfectly precise, the server won't "wait"; it should send
> periodic probes. So in this case the server is still sending TCP
> segments.  If it really is waiting for the client to send it
> something, the server stack is broken.

I think we are using confusing nomenclature here.  The server HTTPD
process waits but the server TCP/IP continues to probe, is that it?

> > 4C. The client should, but doesnt  send a RST upon receiving the
> >        remaining data from the server or window probes.
> 
> The client is clearly broken, and in a major way that just doesn't
> make sense to me. In step 3C, the client has destroyed the
> connection. Yet in step 4C, when the client gets a TCP segment for
> which it doesn't have a control block, it does nothing, as opposed to
> sending a RST.
> 

No, the client application has destroyed the _socket_; isn't the
connection still there, in FIN_WAIT_2, happily sending zero windows
in response to the server probes?

I would really like to see the network trace for this...

> The client stack needs to be fixed; the fix is to follow the TCP spec,
> which has nothing to do with whether it issued a half-duplex close or
> not.  Just what stack has this property, i.e., doesn't send RSTs when
> receiving a TCP segment for which it has no associated connection?
> This is majorly broken.

Indeed, I can't imagine anyone getting away with this...if you opened a 
connection to a port for which there was no listener, it would continue
trying until it timed out rather than returning with ECONNREFUSED.
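
The ECONNREFUSED behaviour is easy to observe on a conforming stack.  A
small sketch, assuming the loopback port that was just released is still
unused when the connect happens (the function name is hypothetical):

```python
import socket


def connect_to_unused_port():
    """Connect to a loopback port with no listener; return the error type."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind(("127.0.0.1", 0))  # let the kernel pick a free port
    addr = probe.getsockname()
    probe.close()                 # the port is now (very probably) unused
    try:
        with socket.create_connection(addr, timeout=5):
            return None           # something was listening after all
    except OSError as exc:
        return type(exc)
```

The refusal is the visible effect of the RST sent in answer to the SYN; a
stack that stayed silent would instead leave the caller waiting for the
connect timeout.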

ian

From owner-tcp-impl  Tue Feb 11 20:26:00 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA21256 for tcp-impl-list; Tue, 11 Feb 1997 20:25:21 GMT
Return-Path: <owner-tcp-impl>
Received: from odin.corp.sgi.com (odin.corp.sgi.com [192.26.51.194]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA21238 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 11 Feb 1997 12:25:19 -0800
Received: from sgi.sgi.com by odin.corp.sgi.com via ESMTP (951211.SGI.8.6.12.PATCH1502/951211.SGI)
	for <tcp-impl@relay.engr.SGI.COM> id MAA17503; Tue, 11 Feb 1997 12:19:43 -0800
Received: from c3po.mcom.com (h-205-217-237-46.netscape.com [205.217.237.46]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA20812 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 12:19:16 -0800
Received: from birdcage.mcom.com (birdcage.mcom.com [205.217.250.80]) by c3po.mcom.com (8.7.5/8.7.3) with SMTP id MAA09754; Tue, 11 Feb 1997 12:08:50 -0800 (PST)
Received: by birdcage.mcom.com (SMI-8.6/SMI-SVR4)
	id MAA26403; Tue, 11 Feb 1997 12:08:48 -0800
From: josh@birdcage.mcom.com (Josh Cohen)
Message-Id: <199702112008.MAA26403@birdcage.mcom.com>
Subject: Re: HTTP and RFC1122 half duplex close
To: raj@hpisrdq.cup.hp.com (Rick Jones)
Date: Tue, 11 Feb 1997 12:08:48 -0800 (PST)
Cc: ian@spider.com, narten@raleigh.ibm.com, tcp-impl
In-Reply-To: <3300C4AE.B34@cup.hp.com> from "Rick Jones" at Feb 11, 97 11:12:46 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

> 
> > By sending a RST instead of the FIN to abort the connection, we can
> > cause both the server side and client side stacks to tear the connection
> > down immediately, and avoid the half-close where it gets caught.
> 
> All this does is (unreliably) kludge around a bug in the transport
> stack. There is no guarantee that RST will ever make it to the server
> (witness all the connections that get waylaid in FIN_WAIT_2...) And
Yes, but it's a much greater chance than zero, which is what happens now.
> 
> > The most common manifestation of the problem on the server side is
> > that web server admins find that over time, their server threads or
> > processes become hung.
> 
> If the server's stack is not dropping the connection to RTX exhaustion,
> that is a bug in the server stack. If it is indeed dropping the
No.  If the client window is full, the window probes will continue
forever.
I think this statement is true:
There will be no retransmit.  There can't be a retransmit if the window
is closed.

In Stevens, TCP/IP Illustrated, page 325:
"The characteristic of the persist state that is different from the 
retransmission timeout in Chapter 21 is that TCP 'never' gives up sending
window probes.  These window probes will continue to be sent at 60 sec 
intervals until the window opens up or either of the applications
using the connection is terminated."

I need to check whether the bad stack is responding at all to the
probes, and to find out what happens to TCP if the probes are simply
unacknowledged.

I'll get back on these issues.

> connection, then the admins are simply being too impatient. We are
> likely talking several minutes here.
No, it appears to be forever.  
> 
> So we should compound the problem by committing a second wrong? (liberal
> use of abortive close)
Well, this is a tough issue.  If a connection is to be aborted, the
only way to have the server discard data it wants to send appears to
be via RST.  Yes, I know, this is the intent of the half-close.
Unfortunately, in HTTP 'aborted connections' are very common.
On top of which, HTTP seems to thwart almost every optimization in TCP
altogether.

An idea I have bounced around is to suggest prohibiting the half-close in HTTP.
By this I mean that before every write, you must check the read
status of the socket for EOF, and abort if so
(which makes a half-close a full close).
Don't flame me on that; I'm still looking into what kind of sense it
makes, if any.
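
A rough sketch of what that check could look like, assuming a
select()-capable sockets API with MSG_PEEK.  The names peer_has_closed()
and guarded_send() are invented for the example.  It only helps before a
write is issued; it does nothing for a write() that is already blocked.

```python
import select
import socket


def peer_has_closed(sock):
    """True if the peer's FIN (EOF) or a reset is already pending on `sock`."""
    readable, _, _ = select.select([sock], [], [], 0)  # poll, do not block
    if not readable:
        return False
    try:
        # Peek so that real request data, if any, is left in the buffer.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionResetError:
        return True


def guarded_send(sock, data):
    """Send `data` unless the peer has closed; on EOF, close and return False."""
    if peer_has_closed(sock):
        sock.close()  # treat the half-close as a full close
        return False
    sock.sendall(data)
    return True
```

The obvious cost is that a client which legitimately half-closes after
sending its request would never get its response.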

-- 
-----------------------------------------------------------------------------
Josh Cohen				        Netscape Communications Corp.
Netscape Fire Department	       
Server Engineering
josh@netscape.com                       http://home.netscape.com/people/josh/
-----------------------------------------------------------------------------

From owner-tcp-impl  Tue Feb 11 20:36:52 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA23878 for tcp-impl-list; Tue, 11 Feb 1997 20:36:16 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA23870 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 12:36:13 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA24667 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 12:36:11 -0800
Received: from rtpdce01.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA18316; Tue, 11 Feb 1997 15:31:21 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce01.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id PAA82038; Tue, 11 Feb 1997 15:31:20 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA22266; Tue, 11 Feb 1997 15:31:27 -0500
Message-Id: <9702112031.AA22266@ludwigia.raleigh.ibm.com>
To: Ian Heavens <ian@spider.com>
Cc: Josh Cohen <josh@birdcage.mcom.com>, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
In-Reply-To: Your message of "Tue, 11 Feb 1997 19:54:30 GMT."
             <3300CE76.2917@spider.com> 
Date: Tue, 11 Feb 1997 15:31:27 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl
Precedence: bulk

> > To be perfectly precise, the server won't "wait"; it should send
> > periodic probes. So in this case the server is still sending TCP
> > segments.  If it really is waiting for the client to send it
> > something, the server stack is broken.

> I think we are using confusing nomenclature here.  The server HTTPD
> process waits but the server TCP/IP continues to probe, is that it?

Yes (on both points!).

> No, the client application has destroyed the _socket_; isn't the
> connection still there, in FIN_WAIT_2, happily sending zero windows
> in response to the server probes?

I think now we need to look carefully at what it means to "destroy" a
socket. Destroying a socket is more than just sending a FIN or
RST. Two cases:

1) If the client wants to close the socket gracefully, the client TCP
sends a FIN and waits for the server's FIN. This is the normal
FIN_WAIT_1 > FIN_WAIT_2 > TIME_WAIT transition. This will not hang the
server and the client is required to hang around reading data until
the server is done.

2) If the client wants to close the socket immediately, and it doesn't
care to receive any more data from the peer, it might as well just
delete the control block. Any subsequent packets from the peer would
solicit a RST. (Actually, a new state is probably called for, since
you want RSTs to be generated on old packets, but you want to accept
new connection requests provided that the sequence numbers of the new
connection are high enough -- but there are some subtleties here).

It doesn't make sense to have the client send out a reset, and yet
still be in FIN_WAIT_2 state. If it sends a reset, the server is
supposed to delete its end and stop sending anything else. What will
take the client out of the FIN_WAIT_2 state (normally, it is a FIN
from the server!)?

If the client application is no longer associated with the socket,
then data delivered to that connection can't be delivered to an
application. It makes no sense for the client TCP to be advertising a
window of 0 in that case. If it is, I'd say that stack is
broken. Furthermore, it seems to me that the client should send back a
RST in this case --- the data can't be delivered to the application,
which means the sending TCP needs to be notified of a delivery
failure.

> I would really like to see the network trace for this...

That would be quite useful indeed!

Also, going back a few messages:

> The half duplex close is a MAY, but if it is implemented then
> the RST transmission is a SHOULD.

OK. I misunderstood that.

> ..I'm not suggesting that half duplex close be mandatory, but that
> when it is used, RSTs are mandatory if received data is still
> pending.

I really don't see this as a big issue. I'd venture that in the vast
majority of cases, at the exact time the client half-duplex closes a
connection, there will be no received data queued by TCP (at the
receiver), so this scenario won't happen frequently in practice. What
is absolutely critical, however, is that any subsequent TCP packets
that arrive for that connection cause a RST to be generated.

Thomas

From owner-tcp-impl  Tue Feb 11 20:48:05 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA26035 for tcp-impl-list; Tue, 11 Feb 1997 20:46:32 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA25916 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 12:46:14 -0800
Received: from kalae.kohala.com (kalae.kohala.com [206.62.226.35]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA26807 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 12:46:01 -0800
Received: from kohala.kohala.com (kohala.kohala.com [206.62.226.33]) by kalae.kohala.com (8.8.5/8.7.3) with ESMTP id NAA17271; Tue, 11 Feb 1997 13:42:10 -0700 (MST)
Received: (from rstevens@localhost) by kohala.kohala.com (8.8.5/8.8.3) id NAA02648; Tue, 11 Feb 1997 13:41:34 -0700 (MST)
Message-Id: <199702112041.NAA02648@kohala.kohala.com>
From: rstevens@kohala.com (W. Richard Stevens)
Date: Tue, 11 Feb 1997 13:41:34 -0700
Reply-To: "W. Richard Stevens" <rstevens@kohala.com>
X-Phone: +1 520 297 9416
X-Homepage: http://www.noao.edu/~rstevens
X-Mailer: Mail User's Shell (7.2.6 beta(3) 11/17/96)
To: josh@birdcage.mcom.com (Josh Cohen), raj@hpisrdq.cup.hp.com (Rick Jones)
Subject: Re: HTTP and RFC1122 half duplex close
Cc: ian@spider.com, narten@raleigh.ibm.com, tcp-impl
Sender: owner-tcp-impl
Precedence: bulk

[In your message of Feb 11, 12:08pm you write:]
> 
> In Stevens, TCP/IP Illustrated, page 325:
> "The characteristic of the persist state that is different from the 
> retransmission timeout in Chapter 21 is that TCP 'never' gives up sending
> window probes.  These window probes will continue to be sent at 60 sec 
> intervals until the window opens up or either of the applications
> using the connection is terminated."

Section 14.9 of Volume 3 talks about this more, and shows the 7-line
fix for a Berkeley stack (from Lite2, 1995) that stops these persist
probes if no response is ever heard from the peer.  RFC 1122 requires
that you keep sending the probes only if the peer is sending ACKs of
the probes.
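
As a toy model of the difference (not the Lite2 code itself): suppose the
peer never answers, the probe interval doubles from 5 seconds up to a
60-second cap, and the connection is dropped once the peer has been
silent for max_idle seconds.  The function name, the max_idle parameter
and the exact backoff are assumptions made for this sketch.

```python
def persist_probe_times(max_idle, first=5.0, cap=60.0):
    """Times (seconds after the window closed) at which probes are sent.

    Models a peer that never responds; probing stops once the next probe
    would fall after `max_idle` seconds of silence.
    """
    times = []
    now = 0.0
    interval = first
    while now + interval <= max_idle:
        now += interval
        times.append(now)
        interval = min(interval * 2, cap)  # exponential backoff, capped
    return times
```

With max_idle=120 the model sends probes at 5, 15, 35 and 75 seconds and
then gives up; without such a limit the loop would never end, which is
the "hang forever" case described earlier in the thread.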

I would certainly hope that any serious Web server today is running a
stack that times out persist probes.

	Rich Stevens

From owner-tcp-impl  Tue Feb 11 21:00:30 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA29169 for tcp-impl-list; Tue, 11 Feb 1997 20:59:48 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA29156 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 12:59:46 -0800
Received: from kalae.kohala.com (kalae.kohala.com [206.62.226.35]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA00027 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 12:59:39 -0800
Received: from kohala.kohala.com (kohala.kohala.com [206.62.226.33]) by kalae.kohala.com (8.8.5/8.7.3) with ESMTP id NAA17294; Tue, 11 Feb 1997 13:56:03 -0700 (MST)
Received: (from rstevens@localhost) by kohala.kohala.com (8.8.5/8.8.3) id NAA02705; Tue, 11 Feb 1997 13:56:03 -0700 (MST)
Message-Id: <199702112056.NAA02705@kohala.kohala.com>
From: rstevens@kohala.com (W. Richard Stevens)
Date: Tue, 11 Feb 1997 13:56:03 -0700
Reply-To: "W. Richard Stevens" <rstevens@kohala.com>
X-Phone: +1 520 297 9416
X-Homepage: http://www.noao.edu/~rstevens
X-Mailer: Mail User's Shell (7.2.6 beta(3) 11/17/96)
To: Thomas Narten <narten@raleigh.ibm.com>, Ian Heavens <ian@spider.com>
Subject: Re: HTTP and RFC1122 half duplex close
Cc: Josh Cohen <josh@birdcage.mcom.com>, tcp-impl
Sender: owner-tcp-impl
Precedence: bulk

[In your message of Feb 11,  3:31pm you write:]
> 
> I'd venture that in the vast
> majority of cases, at the exact time the client half-duplex closes a
> connection, there will be no received data queued by TCP (at the
> receiver), so this scenario won't happen frequently in practice.

*Nothing* is infrequent on the Web, as we have all learned, finding
all the latent bugs in most TCP/IP implementations that have just
never been tickled before.  0.01% is a big number when you're dealing
with millions of connections per day.

> What
> is absolutely critical, however, is that any subsequent TCP packets
> that arrive for that connection cause a RST to be generated.

Absolutely.  And if stacks don't do this today, it's time to
name-that-vendor.

	Rich Stevens

From owner-tcp-impl  Tue Feb 11 21:00:38 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA29371 for tcp-impl-list; Tue, 11 Feb 1997 21:00:11 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA29353 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:00:10 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA00115 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:00:06 -0800
Received: from rtpdce01.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA36354; Tue, 11 Feb 1997 15:53:16 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce01.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id PAA70180; Tue, 11 Feb 1997 15:53:15 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA11910; Tue, 11 Feb 1997 15:53:21 -0500
Message-Id: <9702112053.AA11910@ludwigia.raleigh.ibm.com>
To: josh@birdcage.mcom.com (Josh Cohen)
Cc: raj@hpisrdq.cup.hp.com (Rick Jones), ian@spider.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
In-Reply-To: Your message of "Tue, 11 Feb 1997 12:08:48 PST."
             <199702112008.MAA26403@birdcage.mcom.com> 
Date: Tue, 11 Feb 1997 15:53:21 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl
Precedence: bulk

josh@birdcage.mcom.com (Josh Cohen) writes:

> > All this does is (unreliably) kludge around a bug in the transport
> > stack. There is no guarantee that RST will ever make it to the server
> > (witness all the connections that get waylaid in FIN_WAIT_2...) And

> Yes, but it's a much greater chance than zero, which is what happens
> now.

If you need to modify the client, might as well modify it correctly.
No sense in adding a kludge to get around a bug in your own stack. :-)

> > If the server's stack is not dropping the connection to RTX exhaustion,
> > that is a bug in the server stack. If it is indeed dropping the

> No.  If the client window is full, the window probes will continue
> forever.

Ah, but if the server sends a TCP segment to the client, the client
should send a RST. If the client is returning an ACK with a window of
0, that is clearly not right (if it is the case that the client
application has "destroyed" the socket and is no longer reading data).

> I think this statement is true:
> There will be no retransmit.  There can't be a retransmit if the window
> is closed.

I don't think the above has meaning. Whether the server TCP is
retransmitting a data segment, or sending a probe, the effect should
be the same. The client either returns a RST, or an ACK saying what
the current window size is. The apparent bug is (which needs to be
verified via a packet trace) that the client is not sending back the
correct response.
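
The two acceptable replies can be written down as a tiny decision
function. This is only a sketch of the rule as RFC 793 states it for a
segment that arrives for a connection with no control block; the
function name and its string results are made up for illustration.

```python
from typing import Optional


def reply_to_segment(has_control_block: bool, segment_is_rst: bool,
                     receive_window: int) -> Optional[str]:
    """Return the reply a receiver is expected to send, or None for no reply."""
    if segment_is_rst:
        # A RST is never answered, whether or not the connection exists.
        return None
    if not has_control_block:
        # The socket was destroyed: any data segment or probe draws a RST.
        return "RST"
    # The connection still exists: acknowledge and report the current window.
    return f"ACK win={receive_window}"
```

An ACK with win=0 from a client whose application has already destroyed
the socket is the one combination this function never produces.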

> In Stevens, TCP/IP Illustrated, page 325:
> "The characteristic of the persist state that is different from the 
> retransmission timeout in Chapter 21 is that TCP 'never' gives up sending
> window probes.

Unless, of course, the peer returns a RST. Also, this assumes that the
peer continues to return ACKs saying the window is still 0. 

> I need to check if the bad stack is responding, at all, to the
> probes, and to find out: what happens to TCP if the probes are
> simply unacknowledged?

If no ACKs are returned, the sender should time out the connection the
same way as it does when retransmitting data.

> Well, this is a tough issue.  if a connection is to be aborted, the
> only way to have the server discard data it wants to send, the only
> way appears to be via RST.

I think that this actually has a happy resolution. We probably all
agree that RST needs to be sent. We're just disagreeing as to the
precise conditions. I'm arguing that the TCP spec already handles this
case, and the client stack is (currently) doing the wrong thing. You
are suggesting a modification to the client stack to kludge around the
symptoms of the problem.

Thomas

From owner-tcp-impl  Tue Feb 11 21:04:39 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA00194 for tcp-impl-list; Tue, 11 Feb 1997 21:04:15 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA00189 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:04:13 -0800
Received: from c3po.mcom.com (h-205-217-237-46.netscape.com [205.217.237.46]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA01145 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:04:12 -0800
Received: from birdcage.mcom.com (birdcage.mcom.com [205.217.250.80]) by c3po.mcom.com (8.7.5/8.7.3) with SMTP id MAA11948; Tue, 11 Feb 1997 12:54:17 -0800 (PST)
Received: by birdcage.mcom.com (SMI-8.6/SMI-SVR4)
	id MAA26884; Tue, 11 Feb 1997 12:54:15 -0800
From: josh@birdcage.mcom.com (Josh Cohen)
Message-Id: <199702112054.MAA26884@birdcage.mcom.com>
Subject: Re: HTTP and RFC1122 half duplex close
To: narten@raleigh.ibm.com (Thomas Narten)
Date: Tue, 11 Feb 1997 12:54:15 -0800 (PST)
Cc: ian@spider.com, tcp-impl
In-Reply-To: <9702112031.AA22266@ludwigia.raleigh.ibm.com> from "Thomas Narten" at Feb 11, 97 03:31:27 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

> It doesn't make sense to have the client send out a reset, and yet
> still be in FIN_WAIT_2 state. If it sends a reset, the server is
> supposed to delete its end and stop sending anything else. What will
> take the client out of the FIN_WAIT_2 state (normally, it is a FIN
> from the server!)?
We're talking about two different things here.
The problem occurs when the client sends a FIN to abort a connection,
i.e., the user hits 'stop'.  In this case we go to FIN_WAIT_2 (as well as
the other prior states).

The workaround has been that the web client forces the stack to send
a RST, avoiding the later states, and the hang problem.
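
For readers who have not seen it, one common way for an application to
force a RST on close is SO_LINGER with a zero linger time. Whether the
web client in question does exactly this is an assumption; the sketch
below only shows the general technique. The helper name is made up, and
the "ii" struct layout is assumed to match the platform's struct linger
(true on Linux and the BSDs; Windows uses two shorts).

```python
import socket
import struct


def abortive_close(sock: socket.socket) -> None:
    """Close sock so that the stack sends a RST instead of a FIN."""
    # l_onoff=1, l_linger=0: discard unsent data and reset the connection.
    # Assumed layout: struct linger { int l_onoff; int l_linger; }
    linger_on_zero_seconds = struct.pack("ii", 1, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger_on_zero_seconds)
    sock.close()
```

The peer sees a connection reset rather than an end-of-file, so a
server blocked in a write gets an error instead of waiting.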
> 
> I really don't see this as a big issue. I'd venture that in the vast
> majority of cases, at the exact time the client half-duplex closes a
> connection, there will be no received data queued by TCP (at the
> receiver), so this scenario won't happen frequently in practice. What
> is absolutely critical, however, is that any subsequent TCP packets
> that arrive for that connection cause a RST to be generated.
Actually, it's a tremendously big issue, and it happens more often than
many people think, in terms of the web.
(yes yes, HTTP is evil for TCP)
With HTTP, virtually *any* time the client is the active closer,
it is in an interrupt case.  The server almost *always* has
data to send afterwards.

Think about it:

case 1:
	This page is taking too long, I hit stop.  Sooner or later
the server is going to get around to sending the requested page.
It will probably send a series of packets before it receives
my client's first RST.
(the client will end up sending RSTs for a whole bunch of packets,
more on slower links where there is more data 'in the pipe', and 
guess what, there are *plenty* of people on slow links and dialup lines)

case 2:
	I'm 'joe surfer'.  I don't even let the page finish loading
before I make my next click.  This is virtually the same as hitting
stop, except that now the client goes on to load the next page.
If I do a series of these, I have aborted many connections.
Eventually I'll get to a page I want, then I'll read it, which
takes a long time, relatively speaking.

So for 20 connections, 5 links traveled, 4 gifs on each, I
aborted 16 out of 20 connections.  Each aborted connection
was a case where the server did have data to send.
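
Spelled out, the arithmetic looks like this. The one assumption not
stated above is that only the last page visited is allowed to finish
loading; the variable names are just for illustration.

```python
links_traveled = 5             # pages visited, from the example above
gifs_per_page = 4              # one connection per gif
pages_read_to_completion = 1   # assumption: only the final page finishes

total = links_traveled * gifs_per_page
aborted = (links_traveled - pages_read_to_completion) * gifs_per_page
print(total, aborted)          # prints: 20 16
```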

In both cases, it isn't just one connection, but multiple concurrent
connections for each gif on the page, up to the browser's max
concurrent limit.



-- 
-----------------------------------------------------------------------------
Josh Cohen				        Netscape Communications Corp.
Netscape Fire Department	       
Server Engineering
josh@netscape.com                       http://home.netscape.com/people/josh/
-----------------------------------------------------------------------------

From owner-tcp-impl  Tue Feb 11 21:06:30 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA00536 for tcp-impl-list; Tue, 11 Feb 1997 21:06:04 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA00529 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:06:01 -0800
Received: from c3po.mcom.com (h-205-217-237-46.netscape.com [205.217.237.46]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA01533 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:06:01 -0800
Received: from birdcage.mcom.com (birdcage.mcom.com [205.217.250.80]) by c3po.mcom.com (8.7.5/8.7.3) with SMTP id MAA12040; Tue, 11 Feb 1997 12:56:06 -0800 (PST)
Received: by birdcage.mcom.com (SMI-8.6/SMI-SVR4)
	id MAA26914; Tue, 11 Feb 1997 12:56:04 -0800
From: josh@birdcage.mcom.com (Josh Cohen)
Message-Id: <199702112056.MAA26914@birdcage.mcom.com>
Subject: Re: HTTP and RFC1122 half duplex close
To: rstevens@kohala.com
Date: Tue, 11 Feb 1997 12:56:04 -0800 (PST)
Cc: raj@hpisrdq.cup.hp.com, ian@spider.com, narten@raleigh.ibm.com, tcp-impl
In-Reply-To: <199702112041.NAA02648@kohala.kohala.com> from "W. Richard Stevens" at Feb 11, 97 01:41:34 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

> 
> [In your message of Feb 11, 12:08pm you write:]
> > 
> > In Stevens, TCP/IP Illustrated, page 325:
> > "The characteristic of the persist state that is different from the 
> > retransmission timeout in Chapter 21 is that TCP 'never' gives up sending
> > window probes.  These window probes will continue to be sent at 60 sec 
> > intervals until the window opens up or either of the applications
> > using the connection is terminated."
> 
> Section 14.9 of Volume 3 talks about this more, and shows the 7-line
> fix for a Berkeley stack (from Lite2, 1995) that stops these persist
> probes if no response is ever heard from the peer.  RFC 1122 requires
> that you keep sending the probes only if the peer is sending ACKs of
> the probes.
I'll look at that.

> 
> I would certainly hope that any serious Web server today is running a
> stack that times out persist probes.
Is that common in most stacks today?


-- 
-----------------------------------------------------------------------------
Josh Cohen				        Netscape Communications Corp.
Netscape Fire Department	       
Server Engineering
josh@netscape.com                       http://home.netscape.com/people/josh/
-----------------------------------------------------------------------------

From owner-tcp-impl  Tue Feb 11 21:36:14 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA06266 for tcp-impl-list; Tue, 11 Feb 1997 21:35:38 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA06231 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:35:32 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA08296 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:35:25 -0800
Received: from rtpdce01.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA29738; Tue, 11 Feb 1997 16:31:09 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce01.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id QAA41374; Tue, 11 Feb 1997 16:31:07 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA12732; Tue, 11 Feb 1997 16:31:14 -0500
Message-Id: <9702112131.AA12732@ludwigia.raleigh.ibm.com>
To: josh@birdcage.mcom.com (Josh Cohen)
Cc: ian@spider.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
In-Reply-To: Your message of "Tue, 11 Feb 1997 12:54:15 PST."
             <199702112054.MAA26884@birdcage.mcom.com> 
Date: Tue, 11 Feb 1997 16:31:14 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl
Precedence: bulk

josh@birdcage.mcom.com (Josh Cohen) writes:

> > It doesn't make sense to have the client send out a reset, and yet
> > still be in FIN_WAIT_2 state. If it sends a reset, the server is
> > supposed to delete its end and stop sending anything else. What will
> > take the client out of the FIN_WAIT_2 state (normally, it is a FIN
> > from the server!)?

> We're talking about two different things here.

OK. I think I'm getting somewhat misled by terminology and
nomenclature.

> The problem occurs when the client sends a FIN to abort a connection,
> i.e., the user hits 'stop'.  In this case we go to FIN_WAIT_2 (as well as
> the other prior states).

I don't think it's quite correct to say "abort a connection" and at
the same time send a FIN and go through the normal closing
procedure. If you are going through the normal connection closing
steps, the application (I would assume) remains "attached" to the
socket, and the problematic scenario shouldn't arise. Right?

I've been assuming that "abort a connection" means sending a RST,
i.e., the case where the application "destroys" the socket.

> > 
> > I really don't see this as a big issue. I'd venture that in the vast
> > majority of cases, at the exact time the client half-duplex closes a
> > connection, there will be no received data queued by TCP (at the
> > receiver), so this scenario won't happen frequently in practice. What
> > is absolutely critical, however, is that any subsequent TCP packets
> > that arrive for that connection cause a RST to be generated.
>
> Actually, it's a tremendously big issue, and it happens more often than
> many people think, in terms of the web.
>

I think my point didn't come through clearly (though Rich is correct
to point out that my assumption is somewhat flawed). I think the issue
of whether or not to send a RST if the connection is half-duplex
closed AND there is queued data in the receive queue is not the main
issue, with respect to keeping servers from hanging. If the server has
unacknowledged data (including possibly a FIN), it will retransmit, at
which point the reset will be generated.  So the server shouldn't hang
forever, which is what started this long thread.

I agree that sending a RST may reduce the total number of packets
that ultimately get sent, which is of course a good thing.

Thomas

From owner-tcp-impl  Tue Feb 11 21:43:42 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA07748 for tcp-impl-list; Tue, 11 Feb 1997 21:43:17 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA07741 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:43:15 -0800
Received: from palrel1.hp.com (palrel1.hp.com [15.253.72.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA10405 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:43:14 -0800
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185]) by palrel1.hp.com with SMTP (8.7.5/8.7.3) id NAA17027 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:41:24 -0800 (PST)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA09667; Tue, 11 Feb 1997 13:34:02 -0800
Message-Id: <3300E5CA.6578@cup.hp.com>
Date: Tue, 11 Feb 1997 13:34:02 -0800
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: Josh Cohen <josh@birdcage.mcom.com>
Cc: ian@spider.com, narten@raleigh.ibm.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
References: <199702112008.MAA26403@birdcage.mcom.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

> > > The most common manifestation of the problem on the server side is
> > > that web server admins find that over time, their server threads or
> > > processes become hung.
> >
> > If the server's stack is not dropping the connection to RTX exhaustion,
> > that is a bug in the server stack. If it is indeed dropping the
> No.  If the client window is full, the window probes will continue
> forever.
> I think this statement is true:
> There will be no retransmit.  There can't be a retransmit if the window
> is closed.

Is that correct? A retransmit is not trying to go beyond the existing
window, only resending data that is within the current window.

Now, if the client has ACKed the data, but not updated the window, I
could see a situation where there would be no data segments
retransmitted. Maybe I missed that in the initial description.

rick jones

From owner-tcp-impl  Tue Feb 11 21:57:47 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA11054 for tcp-impl-list; Tue, 11 Feb 1997 21:57:11 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA11030 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:57:08 -0800
Received: from palrel1.hp.com (palrel1.hp.com [15.253.72.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA13743 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:57:06 -0800
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185]) by palrel1.hp.com with SMTP (8.7.5/8.7.3) id NAA18700 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 13:56:56 -0800 (PST)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA09672; Tue, 11 Feb 1997 13:49:41 -0800
Message-Id: <3300E974.5D99@cup.hp.com>
Date: Tue, 11 Feb 1997 13:49:40 -0800
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: Thomas Narten <narten@raleigh.ibm.com>
Cc: Josh Cohen <josh@birdcage.mcom.com>, ian@spider.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
References: <9702112131.AA12732@ludwigia.raleigh.ibm.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

So if I've understood the discussion correctly (and I probably haven't
:) when a user moves on to another link, or decides to hit the stop
button, the browser decides to initiate termination of the connection(s)
(presumably via close()) in a way that says he no longer wishes to
receive data - the "socket" is evaporated and presumably any subsequent
data from the web server system will be greeted (in a properly
functioning stack) with an RST.

Would most if not all of this be avoided if the browser simply called
shutdown(fd, 1) ("I'm not sending any data") instead and then sank any
data from the server to "/dev/null" until it got an EOF from the server
executing a shutdown()? And then, and only then, actually call close on
the descriptor?

Seems this would correct the problem with windows not getting updated,
take us beyond the point where we care (in this instance) if the client
TCP stack sent RSTs for nonexistent connections, and best of all, leave
the *client* with the TIME_WAIT state instead of a server with a
FIN_WAIT_2 or some other connection state that has to be timed-out after
several score minutes!-)
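
A rough sketch of that sequence, using Python's socket module for
brevity. The function name and buffer size are made up; SHUT_WR is the
symbolic name for a shutdown "how" value of 1.

```python
import socket


def graceful_abandon(sock: socket.socket, bufsize: int = 4096) -> int:
    """Half-close, discard whatever the peer still sends, then close.

    Returns the number of bytes discarded.
    """
    sock.shutdown(socket.SHUT_WR)      # send FIN: "I'm not sending any data"
    discarded = 0
    while True:
        chunk = sock.recv(bufsize)     # keep reading so the window stays open
        if not chunk:                  # b"" means the peer has sent its FIN
            break
        discarded += len(chunk)
    sock.close()                       # only now give up the descriptor
    return discarded
```

The cost is that the client keeps reading data it no longer wants, and
a real browser would probably want a bound on how long it is willing to
keep draining.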

rick jones

From owner-tcp-impl  Tue Feb 11 22:28:45 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA18963 for tcp-impl-list; Tue, 11 Feb 1997 22:28:03 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA18928 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 14:28:00 -0800
Received: from border.com ([199.71.190.98]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA21550 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 14:27:42 -0800
Received: by janus.border.com id <30787-2>; Tue, 11 Feb 1997 17:20:57 -0500
Message-Id: <97Feb11.172057est.30787-2@janus.border.com>
To: Thomas Narten <narten@raleigh.ibm.com>
cc: josh@birdcage.mcom.com (Josh Cohen), ian@spider.com (Ian Heavens),
        tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
References: <9702111928.AA15928@ludwigia.raleigh.ibm.com>
In-reply-to: narten's message of "Tue, 11 Feb 1997 13:28:45 -0500".
	 <9702111928.AA15928@ludwigia.raleigh.ibm.com> 
From: "C. Harald Koch" <chk@border.com>
Organization: Secure Computing Canada Ltd.
Phone: +1 416 813 2054
X-uri: <URL:http://www.eng.border.com/homes/chk/>
X-Face: )@F:jK?*}hv!eJ}*r*0DD"k8x1.d#i>7`ETe2;hSD2T!:Fh#wu`0pW7lO|Dfe'AbyNy[\Pw
 z'.bAtgTM!+iq2$yXiv4gf<:D*rZ-|f$\YQi7"D"=CG!JB?[^_7v>8Mm;z:NJ7pss)l__Cw+.>xUJ)
 did@Pr9
Date: Tue, 11 Feb 1997 17:23:07 -0500
Sender: owner-tcp-impl
Precedence: bulk

In message <3300AE7E.60F9@spider.com>, Ian Heavens writes:
>
> 1.  server  is blocked on write()
> 2.  client closes normally and server ACKs (client in FIN-WAIT-2
> and server in CLOSE-WAIT).  No RST because RFC1122 is not followed.
> 3.  server never closes
 
We saw this problem with the web server (and web proxies) on our firewall
product, and ended up using the (evil) expedient of having the server/proxy
time out all connections (after 7200 seconds of zero traffic, i.e. a *long*
time). 
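
In application terms, such a timeout might look roughly like the sketch
below. It is not the code from our product; the helper name is made up
and the timeout is a parameter (7200 seconds in the case above). Each
send() call gets its own timeout, so the clock restarts whenever the
peer accepts more data.

```python
import socket


def send_or_give_up(sock: socket.socket, data: bytes, idle_timeout: float) -> bool:
    """Send all of data; return False if no progress is made for idle_timeout seconds."""
    sock.settimeout(idle_timeout)
    view = memoryview(data)
    while len(view) > 0:
        try:
            sent = sock.send(view)     # waits at most idle_timeout for buffer space
        except socket.timeout:
            return False               # peer accepted nothing for too long
        view = view[sent:]
    return True
```

On a False return the caller would close the socket and free the thread
or process that was serving it.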

In message <9702111928.AA15928@ludwigia.raleigh.ibm.com>, Thomas Narten writes:
> 
> The client is clearly broken, and in a major way that just doesn't
> make sense to me. In step 3C, the client has destroyed the
> connection. Yet in step 4C, when the client gets a TCP segment for
> which it doesn't have a control block, it does nothing, as opposed to
> sending a RST.

We most commonly see this when a PPP user has hung up. Since they're *gone*,
you don't get any response from the TCP stack. Because of CIDR route
aggregation, buggy dialup servers, and/or firewalls, you often don't get
host/net unreachables. The only indication you have that the client has
disappeared is a timeout.

All the TCP stack fixes on the client side in the world won't solve this
particular problem (the dial-up user hung up).

-- 
C. Harald Koch          | Senior System Developer, Secure Computing Canada Ltd.
chk@border.com          | 100 University Ave., 7th Floor, Toronto ON M5J 1V6
+1 416 813 2054 (voice) | "Madness takes its toll. Please have exact change."
+1 416 813 2001 (fax)   |		-Karen Murphy <karenm@descartes.com>

From owner-tcp-impl  Tue Feb 11 22:59:38 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA26886 for tcp-impl-list; Tue, 11 Feb 1997 22:58:58 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA26857 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 14:58:52 -0800
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA28843 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 14:58:44 -0800
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id RAA16386; Tue, 11 Feb 1997 17:53:33 -0500 (EST)
Message-Id: <199702112253.RAA16386@grinch.eecs.umich.edu>
To: josh@birdcage.mcom.com (Josh Cohen)
Cc: ian@spider.com (Ian Heavens), narten@raleigh.ibm.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
References: <199702111844.KAA25296@birdcage.mcom.com>
From: Scott Dawson <sdawson@eecs.umich.edu>
In-Reply-To: josh@birdcage.mcom.com's message of Tue, 11 Feb 1997 10:44:56 -0800 (PST)
Lines: 86
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 11 Feb 1997 17:53:32 -0500
Sender: owner-tcp-impl
Precedence: bulk


> 1S. Server is in a write() sending data to the client
> 
> 1C. User hits STOP or something, the application does a close().
> 
> 2C. Client aborts and sends either a FIN or RST 
> 	(lets say a FIN to show the problem, and reasons for RST instead)
> 
> 3C. The client application has destroyed the socket, and it detached
>  	from the stack for that connection. 
> 	From the client stack side, the connection no longer exists
> 
> 3S Server continues to be in a write
>    Server Stack goes to CLOSE_WAIT ( having received the FIN )
>    
> 4S Server will continue to write to the client, until it has exhausted
>    its window. This is OK for a half-close.  
> 
> 5S The window is exhausted, and the server will wait until the client
>    opens the window. (window probes )
> 
> 4C. The client should, but doesnt  send a RST upon receiving the
> 	 remaining data from the server or window probes.

If the server is sending into what it thinks is an open window, and
the client doesn't reply at all (RST or otherwise), then the server
would simply time out the connection, right?  However, if the client
has already advertised a zero window before the socket was torn down,
then the server will be in a zero window probing state in which,
according to the spec, it MUST send zero window probes periodically to
see if the window has opened.  It must be the case then that the
server sends enough data to fill the client window before the client
decides to tear down the connection and send the RST, or otherwise it
would just kill the connection when the client doesn't respond.  For
example, maybe when the 'stop' button is clicked, the client stops
reading long enough for a zero window to be advertised before the
connection is closed down.  The spec states:

      A TCP MAY keep its offered receive window closed indefinitely.
      As long as the receiving TCP continues to send acknowledgments in
      response to the probe segments, the sending TCP MUST allow the
      connection to remain open.  (page 92, RFC-1122).

However, this does not imply that the connection MUST be kept
open *indefinitely*.  In this case, it seems that the client not
replying at all causes the server TCP to stay in zero window probing
state forever.  But the server COULD time the connection out, it just
doesn't.  This seems to be a place where the specification needs to be
more concrete.  Maybe it should also say, "If the receiving TCP does
not acknowledge the probes, the sending TCP SHOULD close the
connection."  Or say "bad things might happen to you if you don't."

We have a tool called Orchestra that we have used to test several
different vendor TCP implementations by injecting faults (drop, delay
messages, fabricate new messages) into the segments sent by the TCP
peers.  In one of our tests, we dropped zero window probes to see what
participants do.  What we found in most of the TCPs we tested (SunOS,
Solaris, AIX, OS/2, Next Mach) was that the TCP did not time out
connections that are in the zero window probe state, even if the other
TCP does not ACK the probes.  Only one implementation (Windows 95)
actually timed out the connection once it was in a zero window probe
state if the peer did not ACK the probes.

It seems that the Win 95 TCP is doing a good thing by timing out
connections whose peer does not ACK zero window probes.  (At least it
isn't going against spec.)  If the server platform were Win 95, this
"server hanging" condition would not happen.

> Unfortunately, the bad stack implementation is in widespread use,
> and the current versions of it *still* do not fix the problem.

But if other TCPs dropped connections that do not ACK zero window
probes, this "bad stack" would not affect the server any more than if
it refused to ACK data messages.  Even though the client TCP is messed
up, the server would simply time out the connection and go on about
its business (dying, cleanup, whatever).  Even though we'd want the
bad stack to be fixed, it wouldn't be as much of a problem if we were
more resilient to the way that it's bad.

I'm not sticking up for the 'bad stack'.  I don't even know which
stack it is, and definitely think it should be fixed.  I'm saying that
just because it's out there doesn't mean that people who run web
servers (or OS's that run web servers) are at its mercy.

-Scott


From owner-tcp-impl  Tue Feb 11 23:33:28 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA05024 for tcp-impl-list; Tue, 11 Feb 1997 23:32:45 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA04997 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 15:32:37 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA06941 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 15:32:35 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id PAA20040; Tue, 11 Feb 1997 15:22:30 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id PAA04615; Tue, 11 Feb 1997 15:22:28 -0800
Received: from fstop. by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id PAA23761; Tue, 11 Feb 1997 15:22:06 -0800
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id PAA03321; Tue, 11 Feb 1997 15:21:41 -0800
Message-Id: <199702112321.PAA03321@fstop.>
From: sparker@Eng.Sun.COM
To: Scott Dawson <sdawson@eecs.umich.edu>
cc: josh@birdcage.mcom.com (Josh Cohen), ian@spider.com (Ian Heavens),
        narten@raleigh.ibm.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
Date: Tue, 11 Feb 1997 15:21:41 -0800
Sender: owner-tcp-impl
Precedence: bulk


-       A TCP MAY keep its offered receive window closed indefinitely.
-       As long as the receiving TCP continues to send acknowledgments in
-       response to the probe segments, the sending TCP MUST allow the
-       connection to remain open.  (page 92, RFC-1122).
- 
- However, this does not imply that the connection MUST be kept
- open *indefinitely*.  In this case, it seems that the client not
- replying at all causes the server TCP to stay in zero window probing
- state forever.  But the server COULD time the connection out, it just
- doesn't.  This seems to be a place where the specification needs to be
- more concrete.  Maybe it should also say, "If the receiving TCP does
- not acknowledge the probes, the sending TCP SHOULD close the
- connection."  Or say "bad things might happen to you if you don't."

Yes.  This bug showed up on our doorstep at Sun, and we resolved it this
way a couple of years back.  If the receiver doesn't send back an ACK
of some kind to the zero window probe, after 8 minutes, we give up.
Most TCPs had it, since it was in the BSD base up until at least
4.4BSD, if my memory serves.
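
As a toy model of that rule (not the actual Sun code; the class and its
names are invented for illustration), the bookkeeping amounts to
remembering when the peer last acknowledged a probe:

```python
from dataclasses import dataclass

GIVE_UP_AFTER = 8 * 60   # seconds of silence, the figure quoted above


@dataclass
class PersistTimer:
    """Tracks zero-window probing for one connection (times in seconds)."""
    last_ack_time: float

    def on_ack(self, now: float) -> None:
        # Any ACK of a probe, even one still advertising win=0, resets the clock.
        self.last_ack_time = now

    def should_drop(self, now: float) -> bool:
        # Drop only if the peer has been completely silent for too long.
        return now - self.last_ack_time >= GIVE_UP_AFTER
```

A peer that keeps answering probes with win=0 never trips should_drop(),
which matches the RFC 1122 text quoted above; only a silent peer does.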

The spec isn't as clear about this as it could be, I think.

Cheers,

	~sparker

From owner-tcp-impl  Tue Feb 11 23:56:12 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA10860 for tcp-impl-list; Tue, 11 Feb 1997 23:55:35 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA10841 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 15:55:34 -0800
Received: from c3po.mcom.com (h-205-217-237-46.netscape.com [205.217.237.46]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA10769 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 15:55:32 -0800
Received: from birdcage.mcom.com (birdcage.mcom.com [205.217.250.80]) by c3po.mcom.com (8.7.5/8.7.3) with SMTP id PAA20669 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Feb 1997 15:40:32 -0800 (PST)
Received: by birdcage.mcom.com (SMI-8.6/SMI-SVR4)
	id PAA27437; Tue, 11 Feb 1997 15:40:31 -0800
From: josh@birdcage.mcom.com (Josh Cohen)
Message-Id: <199702112340.PAA27437@birdcage.mcom.com>
Subject: The trace ( HTTP/RFC1122 )
To: tcp-impl
Date: Tue, 11 Feb 1997 15:40:31 -0800 (PST)
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Hey!,
	I got the trace of the evil behavior.

A few notes on my previous posts..
1. In the problem state, the client stack is in FIN_WAIT_2
2. Server is in CLOSE_WAIT
3. write() never times out if you do a blocking write.
4. It's the client *stack* which is broken, not the web client.
5. Our products are commonly run on this stack, on PCs.
6. When I say aborted connection I mean at the HTTP level.
	The client hits 'stop' or equivalent.
	If we handle this with a client FIN, we may allow this
	problem (in the client).  If we do it with a RST, we have a
	better chance of avoiding it.
	(Yes, the RST can be lost, but chances are it'll make it
	more often than not.  Even if lost, we're in the same boat.)
	
Now, what the trace tells us.
1. Client does the http abort via a FIN (not our client)
2. Client *stack* doesn't send RSTs to the server's data when it's in
	FIN_WAIT_X
3. Client *does* respond to window probes, with win=0 forever.
	I think this is ok for FIN_WAIT_X in general, but
	in this case, it's bad news.  How can we decide to
	close the server side?
4. The client's window drops to zero instantly when the FIN is sent.

notes:

I'm not suggesting any hacks to the protocols.  The reality is that
the problem occurs often, so workarounds are needed.

It is the servers (proxy, web, whatever) which take the damage.  The
misbehaving client goes on with its life.

(timeouts as a workaround)	
Timeouts help, but not enough.  A busy web server
can handle hundreds of connections per second.  If the timeout
is 2 minutes, it's still way too long.  The server must scale
to be very large (memory, processes/threads, etc.) to take
care of these dead connections until they time out.
To use 2 min as an example, it's too long to help the problem on
a really busy server, yet it's too short from the client perspective,
especially for a proxy.  Consider a CGI app which does some
database queries.  It may not send any output for over two
minutes while a large query is being run on the DBMS.  In this case,
the proxy would close the connection due to a timeout, prematurely.
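
A back-of-the-envelope way to see why a 2-minute timeout does not scale:
in steady state, the number of dead connections a server is holding is
roughly the arrival rate, times the fraction that go bad, times the
timeout.  The sketch below just does that multiplication.  The function
name is hypothetical, and the 200 connections per second and 1% failure
fraction are made-up illustrative values, not measurements.

```python
def dead_connections_held(arrivals_per_s, stuck_fraction, timeout_s):
    """Steady-state count of stuck connections waiting to time out.

    Each stuck connection occupies a slot for timeout_s seconds, so the
    population is (rate of stuck arrivals) * (time each one lingers).
    """
    return arrivals_per_s * stuck_fraction * timeout_s


if __name__ == "__main__":
    # Illustrative numbers only: 200 conn/s, 1% stuck, 2-minute timeout.
    held = dead_connections_held(200, 0.01, 120)
    print(f"about {held:.0f} dead connections held at any moment")
```

Halving the timeout halves the load, but as noted above it also makes it
more likely that a slow but healthy request gets cut off.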

Prohibit half-close in HTTP.
If the client has done a half-close, close the connection.
My gut reaction is that in HTTP, the half-close isn't necessary.
I'll raise it in http-wg.

--- here's the trace ---
I had to add some comments and data from the detail section (which I
didn't post); my additions are in <>'s.
The trace came from a trace tool which isn't really very friendly.

The client is a PC running the bad stack, with a web browser which does
not work around the problem by sending a reset.

The server is an HPUX box running Netscape Proxy Server 2.5 without
any workarounds in place.

Si client.1391 > server.8080: S 0:1(0) ack: 0 win: 2000 <mss: 5b4>
Si server.8080 > client.1391: [DF] SA 20768c01:20768c02(0) ack: 1 win: 8000 <mss: 100>
Si client.1391 > server.8080: PA 
Si client.1391 > server.8080: PA 1:1c2(1c1) ack: 20768c02 win: 2000
Si server.8080 > client.1391: [DF] PA 20768c02:20768cde(dc) ack: 1c2 win: 8000
Si server.8080 > client.1391: [DF] A 20768cde:20769292(5b4) ack: 1c2 win: 8000
Si server.8080 > client.1391: [DF] A 20769292:20769846(5b4) ack: 1c2 win: 8000
Si server.8080 > client.1391: [DF] A 20769846:20769dfa(5b4) ack: 1c2 win: 8000
Si client.1391 > server.8080: PA 
Si server.8080 > client.1391: [DF] A 20769dfa:2076a3ae(5b4) ack: 1c2 win: 8000
Si server.8080 > client.1391: [DF] A 2076a3ae:2076a962(5b4) ack: 1c2 win: 8000
Si server.8080 > client.1391: [DF] A 2076a962:2076af16(5b4) ack: 1c2 win: 8000
Si server.8080 > client.1391: [DF] A 2076af16:2076b4ca(5b4) ack: 1c2 win: 8000
Si client.1391 > server.8080: PA 
Si server.8080 > client.1391: [DF] A 2076b4ca:2076ba7e(5b4) ack: 1c2 win: 8000
Si server.8080 > client.1391: [DF] A 2076b4ca:2076ba7e(5b4) ack: 1c2 win: 8000
Si client.1391 > server.8080: PA 
Si server.8080 > client.1391: [DF] A 2076ba7e:2076bdfa(37c) ack: 1c2 win: 8000
Si client.1391 > server.8080: PA 
Si server.8080 > client.1391: [DF] A 2076bdfa:2076bdfb(1) ack: 1c2 win: 8000
Si client.1391 > server.8080: PA 
Si client.1391 > server.8080: FPA 
< client window drops to zero and proceeds to FIN_WAIT_1 >
Si server.8080 > client.1391: [DF] A 
< server ACKs the FIN >
< client now in FIN_WAIT_2 >
Si server.8080 > client.1391: [DF] A 2076bdfa:2076bdfb(1) ack: 1c3 win: 8000
< server continues to try to send its data >
Si client.1391 > server.8080: PA < window = 0 >
Si server.8080 > client.1391: [DF] A 2076bdfa:2076bdfb(1) ack: 1c3 win: 8000
Si client.1391 > server.8080: PA < window = 0 >
Si server.8080 > client.1391: [DF] A 2076bdfa:2076bdfb(1) ack: 1c3 win: 8000
Si client.1391 > server.8080: PA < window = 0 >
Si server.8080 > client.1391: [DF] A 2076bdfa:2076bdfb(1) ack: 1c3 win: 8000
Si client.1391 > server.8080: PA < window = 0 >
Si server.8080 > client.1391: [DF] A 2076bdfa:2076bdfb(1) ack: 1c3 win: 8000
Si client.1391 > server.8080: PA < window = 0 >
.. repeat ad nauseam ..
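
For anyone who wants to pick the stuck pattern out of a longer capture,
here is a minimal sketch that parses lines in the format shown above and
counts how often the same 1-byte segment is resent.  It assumes the
sequence numbers and lengths are hexadecimal (consistent with the
"mss: 5b4" option in the SYN), and the function names are hypothetical.
Lines without a start:end(len) field, such as the bare "PA" lines, are
skipped.

```python
import re
from collections import Counter

# Matches e.g. "server.8080 > client.1391: [DF] A 20768cde:20769292(5b4)"
SEGMENT = re.compile(
    r"(?P<src>\S+) > (?P<dst>\S+): (?:\[DF\] )?(?P<flags>[A-Z]+) "
    r"(?P<start>[0-9a-f]+):(?P<end>[0-9a-f]+)\((?P<length>[0-9a-f]+)\)"
)


def parse_segment(line):
    """Return (src, start, end, length) as integers, or None if absent."""
    m = SEGMENT.search(line)
    if m is None:
        return None
    return (m["src"], int(m["start"], 16), int(m["end"], 16),
            int(m["length"], 16))


def probe_repeats(lines):
    """Largest number of times any single 1-byte segment appears."""
    counts = Counter()
    for line in lines:
        seg = parse_segment(line)
        if seg is not None and seg[3] == 1:
            counts[seg] += 1
    return max(counts.values(), default=0)
```

A high repeat count for one 1-byte segment is only a hint of a persist
loop; it says nothing about whether the probes are being acknowledged.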


 

-- 
-----------------------------------------------------------------------------
Josh Cohen				        Netscape Communications Corp.
Netscape Fire Department	       
Server Engineering
josh@netscape.com                       http://home.netscape.com/people/josh/
-----------------------------------------------------------------------------

From owner-tcp-impl  Wed Feb 12 10:03:37 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA03359 for tcp-impl-list; Wed, 12 Feb 1997 10:03:04 GMT
Return-Path: <owner-tcp-impl>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA03354 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 12 Feb 1997 02:03:03 -0800
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id CAA28206; Wed, 12 Feb 1997 02:01:20 -0800
Message-Id: <199702121001.CAA28206@refugee.engr.sgi.com>
X-Mailer: exmh version 2.0alpha 12/3/96
To: sparker@Eng.Sun.COM
Cc: Scott Dawson <sdawson@eecs.umich.edu>, josh@birdcage.mcom.com (Josh Cohen),
        ian@spider.com (Ian Heavens), narten@raleigh.ibm.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
In-reply-to: Message from sparker@Eng.Sun.COM of 11 Feb 1997 15:21:41 PST
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 12 Feb 1997 02:01:18 -0800
From: Steve Alexander <sca@refugee>
Sender: owner-tcp-impl
Precedence: bulk

sparker@Eng.Sun.COM writes:
>Yes.  This bug showed up on our doorstep at Sun, and resolved it this
>way a couple of years back.  If the receiver doesn't send back an ACK
>of some kind to the zero window probe, after 8 minutes, we give up.
>Most TCP's had it, since it was in the BSD base up until at least
>4.4BSD, if my memory serves.

I believe this is what Rich Stevens was referring to.  The fix was added to
BSD between 4.4-Lite and 4.4-Lite2, I think, so the indefinite retry behavior
is pretty widespread.  We adopted the 4.4-Lite2 fix for IRIX a while back.

-- Steve



From owner-tcp-impl  Wed Feb 12 14:51:44 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA20018 for tcp-impl-list; Wed, 12 Feb 1997 14:51:07 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA20006 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 06:51:02 -0800
Received: from malatesta. (malatesta.spider.com [194.217.109.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id GAA23036 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 06:50:59 -0800
Received: from malatesta by malatesta. (SMI-8.6/SMI-SVR4)
	id OAA29143; Wed, 12 Feb 1997 14:33:55 GMT
Message-ID: <3301D4D3.5471@spider.com>
Date: Wed, 12 Feb 1997 14:33:55 +0000
From: Ian Heavens <ian@spider.com>
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: "W. Richard Stevens" <rstevens@kohala.com>
CC: Thomas Narten <narten@raleigh.ibm.com>,
        Josh Cohen <josh@birdcage.mcom.com>, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
References: <199702112056.NAA02705@kohala.kohala.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

W. Richard Stevens wrote:
> 
> [In your message of Feb 11,  3:31pm you write:]
> >
> > I'd venture that in the vast
> > majority of cases, at the exact time the client half-duplex closes a
> > connection, there will be no received data queued by TCP (at the
> > receiver), so this scenario won't happen frequently in practice.
> 
> *Nothing* is infrequent on the Web, as we have all learned, finding
> all the latent bugs in most TCP/IP implementations that have just
> never been tickled before.  0.01% is a big number when you're dealing
> with millions of connections per day.
> 

And here, the figure is 40%, or the proportion of web page downloads
where the user decides to stop or try another link before it is
complete (hence the large amount of data being pumped out by the
server).  What we have here, IMHO, is a dramatic change in the
characteristic closing conditions of a TCP connection: graceful
close with all data sent and acknowledged cannot be assumed to
be the norm.

> > What
> > is absolutely critical, however, is that any subsequent TCP packets
> > that arrive for that connection cause a RST to be generated.

Yes, and in this case it is critical that RFC 1122 is followed for half
duplex close (the issues are the same: to flush the data and kill the
server application process).

> 
> Absolutely.  And if stacks don't do this today, it's time to
> name-that-vendor.

Have we decided whether it is OK to name vendors, at least on this
list?

ian

From owner-tcp-impl  Wed Feb 12 14:55:01 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA20410 for tcp-impl-list; Wed, 12 Feb 1997 14:54:37 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA20404 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 06:54:36 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id GAA23732 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 06:54:33 -0800
Received: from rtpdce03.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA56816; Wed, 12 Feb 1997 09:50:14 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce03.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id JAA35808; Wed, 12 Feb 1997 09:50:10 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA23128; Wed, 12 Feb 1997 09:50:23 -0500
Message-Id: <9702121450.AA23128@ludwigia.raleigh.ibm.com>
To: josh@birdcage.mcom.com (Josh Cohen)
Cc: tcp-impl
Subject: Re: The trace ( HTTP/RFC1122 ) 
In-Reply-To: Your message of "Tue, 11 Feb 1997 15:40:31 PST."
             <199702112340.PAA27437@birdcage.mcom.com> 
Date: Wed, 12 Feb 1997 09:50:23 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl
Precedence: bulk

Josh,

> 6. When I say aborted connection I mean at the HTTP level.
> 	The client hits 'stop' or equivalent.
> 	If we handle this with a client FIN, we may allow this
> 	problem.(in the client)  If we do it with a RST, we have a 
> 	better chance of avoiding it.
> 	(yes, the RST can be lost, but chances are itll make it
> 	more often than not.  Even if lost, we're in the same boat)

And:

> 1. Client does the http abort via a FIN (not our client)

Could you say a bit more about exactly how a client 'aborts' a
connection, in terms of library/API calls? The application doesn't
send FINs, it tells TCP to do so via an API call. I suspect that you
mean it issues a 'half-duplex close', which is *supposed* to send RSTs
and cause subsequent received data to trigger a RST.

> 2. Client *stack* doesnt send RSTs to the servers data when its in
> 	FIN_WAIT_X

I actually think this is correct. The FIN_WAIT_x states imply that the
application is still using the socket, waiting for the peer to finish
sending data. If, on the other hand, the application had invoked
'half-duplex close' (and has destroyed the socket), the client TCP
should go into some state other than FIN_WAIT_x.

> Im not suggesting any hacks to the protocols.

But it seems to me like you are. How is it that the client can send a RST
(the suggested workaround) without modifying the stack to do so? That
would be modifying the protocol implemented by the stack. But if you
have to modify the stack to get the desired behavior anyway, why not
just fix the actual problem, so the workaround isn't needed anymore?

Thomas

From owner-tcp-impl  Wed Feb 12 15:31:58 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA24784 for tcp-impl-list; Wed, 12 Feb 1997 15:31:23 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA24778 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 07:31:21 -0800
Received: from malatesta. (malatesta.spider.com [194.217.109.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA00861 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 07:31:19 -0800
Received: from malatesta by malatesta. (SMI-8.6/SMI-SVR4)
	id PAA29165; Wed, 12 Feb 1997 15:11:14 GMT
Message-ID: <3301DD92.520D@spider.com>
Date: Wed, 12 Feb 1997 15:11:14 +0000
From: Ian Heavens <ian@spider.com>
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Thomas Narten <narten@raleigh.ibm.com>
CC: Josh Cohen <josh@birdcage.mcom.com>, tcp-impl
Subject: Re: The trace ( HTTP/RFC1122 )
References: <9702121450.AA23128@ludwigia.raleigh.ibm.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Thomas Narten wrote:
> 
> Josh,
> 
> > 6. When I say aborted connection I mean at the HTTP level.
> >       The client hits 'stop' or equivalent.
> >       If we handle this with a client FIN, we may allow this
> >       problem.(in the client)  If we do it with a RST, we have a
> >       better chance of avoiding it.
> >       (yes, the RST can be lost, but chances are itll make it
> >       more often than not.  Even if lost, we're in the same boat)
> 
> And:
> 
> > 1. Client does the http abort via a FIN (not our client)
> 
> Could you say a bit more about exactly how a client 'aborts' a
> connection, in terms of library/API calls? The application doesn't
> send FINs, it tells TCP to do so via an API call. I suspect that you
> mean it issues a 'half-duplex close', which is *supposed* to send RSTs
> and cause subsequent received data to trigger a RST.

I think the close() call, or the Winsock equivalent, sends a FIN
as normal, unless there is pending data to be read, in which
case a RST is sent.  If data arrives after the FIN has been sent,
a RST is sent (RFC1122).

> > 2. Client *stack* doesnt send RSTs to the servers data when its in
> >       FIN_WAIT_X
> 
> I actually think this is correct. The FIN_WAIT_x states imply that the
> application is still using the socket, waiting for the peer to finish
> sending data. If, on the other hand, the application had invoked
> 'half-duplex close' (and has destroyed the socket), the client TCP
> should go into some state other than FIN_WAIT_x.
> 
> > Im not suggesting any hacks to the protocols.
> 
> But it seems to me like you are. How is it that the client can send a RST
> (the suggested workaround) without modifying the stack to do so? That
> would be modifying the protocol implemented by the stack. But if you
> have to modify the stack to get the desired behavior anyway, why not
> just fix the actual problem, so the workaround isn't needed anymore?
>

The client application sets SO_LINGER to zero before close(), and
this sends a RST rather than a FIN.  So the application can generate
a RST by itself.

Here, the stack is wrong, but one of my points is that the application
use of SO_LINGER=0 is reasonable, even though abortive closes are in
general a bad idea.  And RFC1122 RSTs for Half Duplex Close are vital.
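
For concreteness, a minimal sketch of the SO_LINGER workaround described
above, written against the BSD sockets API as exposed by Python.  The
function names are hypothetical.  The "ii" packing of struct linger is
an assumption that holds on common Unix systems; other platforms may lay
the structure out differently.

```python
import socket
import struct


def set_abortive_close(sock):
    """Ask the stack to reset, not gracefully close, when sock is closed.

    struct linger { l_onoff = 1, l_linger = 0 }: lingering enabled with a
    zero timeout, which conventionally makes close() send a RST.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))


def abortive_close(sock):
    """Abort the connection: discard unsent data and close immediately."""
    set_abortive_close(sock)
    sock.close()
```

A client would call abortive_close(sock) when the user hits 'stop',
instead of a plain close().  Whether a RST actually goes out on the wire
still depends on the stack underneath.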

ian

From owner-tcp-impl  Wed Feb 12 15:32:02 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA24801 for tcp-impl-list; Wed, 12 Feb 1997 15:31:26 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA24791 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 07:31:24 -0800
Received: from malatesta. (malatesta.spider.com [194.217.109.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA00894 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 07:31:21 -0800
Received: from malatesta by malatesta. (SMI-8.6/SMI-SVR4)
	id PAA29168; Wed, 12 Feb 1997 15:12:18 GMT
Message-ID: <3301DDD2.7469@spider.com>
Date: Wed, 12 Feb 1997 15:12:18 +0000
From: Ian Heavens <ian@spider.com>
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Josh Cohen <josh@birdcage.mcom.com>
CC: tcp-impl
Subject: Re: The trace ( HTTP/RFC1122 )
References: <199702112340.PAA27437@birdcage.mcom.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Ah! Real data...I think we argued our way to this conclusion
(painfully :-).

Two points I think worth flagging: a clarification of terminology
for clients and servers in terms of closing/aborting/processes
hanging/TCP connections neither transmitting nor receiving would
be a good thing, particularly to distinguish application and protocol
stack behaviour.  We got confused and we're supposed to understand this
stuff :-) (oh, and I should have got hold of a network trace before
mentioning it).

The other point is that there are sufficient ambiguities in application
and TCP closing behaviour to merit a close look and a separate section
in the WG deliverables (bug listing & spec clarification).

On the trace:

client closes with FIN, server ACKs: 
client in FIN-WAIT-2 advertising zero window (which makes a lot
of sense since it cannot read any more)
server in CLOSE-WAIT probing, all probes are ACKed with zero window.

It seems to me there are three approaches to fix this:

1.  client flushes the connection, ideally with a RFC 1122 RST
(application has to work around with SO_LINGER otherwise)

2.  server times out eventually.  Apart from the load, here the
server would have to deduce from the zero window received in CLOSE-WAIT
that the client really meant to do a half duplex close, as opposed
to advertising a zero window because of momentary buffer exhaustion.

3.  Moving the problem up one layer, either to HTTP as Josh suggests,
or Rick Jones's suggestion of sucking all the data off and throwing
it away and then closing (thereby moving the TIME-WAIT from server
to client, a good thing)

The problem with 3 is that you never get a chance to flush unwanted
data.  2 still ties up resources, as Josh says.
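
The "suck all the data off and throw it away" variant of approach 3
might look roughly like the sketch below on the client side.  This is an
illustration under stated assumptions, not anyone's shipping code: the
function name and chunk size are hypothetical, and a real client would
also want a timeout or byte limit so a slow server cannot hold it
hostage.

```python
import socket


def drain_and_close(sock, chunk=4096):
    """Half-close, discard whatever the peer still sends, then close.

    Returns the number of bytes thrown away.
    """
    sock.shutdown(socket.SHUT_WR)   # send our FIN but keep reading
    discarded = 0
    while True:
        data = sock.recv(chunk)     # keeps our receive window open
        if not data:                # empty read: peer has sent its FIN
            break
        discarded += len(data)
    sock.close()
    return discarded
```

Because the client keeps reading, its window never sticks at zero and
the server can finish and close normally, at the cost of transferring
data nobody wants.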

ian

From owner-tcp-impl  Wed Feb 12 16:12:50 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA29128 for tcp-impl-list; Wed, 12 Feb 1997 16:12:16 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA29120 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 08:12:13 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id IAA10150 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 08:12:09 -0800
Received: from rtpdce02.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA63136; Wed, 12 Feb 1997 11:07:32 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce02.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id LAA91952; Wed, 12 Feb 1997 11:07:30 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA20792; Wed, 12 Feb 1997 11:07:42 -0500
Message-Id: <9702121607.AA20792@ludwigia.raleigh.ibm.com>
To: Ian Heavens <ian@spider.com>
Cc: Josh Cohen <josh@birdcage.mcom.com>, tcp-impl
Subject: Re: The trace ( HTTP/RFC1122 ) 
In-Reply-To: Your message of "Wed, 12 Feb 1997 15:12:18 GMT."
             <3301DDD2.7469@spider.com> 
Date: Wed, 12 Feb 1997 11:07:42 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl
Precedence: bulk

> Two points I think worth flagging: a clarification of terminology
> for clients and servers in terms of closing/aborting/processes
> hanging/TCP connections neither transmitting nor receiving would
> be a good thing, particularly to distinguish application and protocol
> stack behaviour.  We got confused and we're supposed to understand this
> stuff :-) (oh, and I should have got hold of a network trace before
> mentioning it).

100% agreed!

> The other point is that there's sufficient ambiguities in application
> and TCP closing behaviour to merit a close look and a separate section
> in the WG deliverables (bug listing & spec clarification).

Agree here as well.

> On the trace:

> client closes with FIN, server ACKs: 
> client in FIN-WAIT-2 advertising zero window (which makes a lot
> of sense since it cannot read any more)

Here is where we still seem to be having a disagreement. If the client
has "destroyed" the socket and is no longer using the socket, then TCP
will never be able to deliver additional received data to the client
application. In this scenario, it is wrong for the client TCP to
respond with an ACK and window advertisement of zero, since doing so
effectively tells its peer (the server): "I'm out of buffer space
right now, but try again in a little while and things should get
better". Things will *never* get better in this case, leading to the
problem of hung servers.

The trace doesn't give the full story. The trace is correct in the
sense that a TCP connection in the FIN_WAIT_2 state is allowed to
advertise a window of zero. However, it is correct to do so only if
the client application is still reading data received on the socket,
implying that the zero window is a temporary condition. That is not
the scenario we have been discussing (i.e., the client destroys the
socket)

> server in CLOSE-WAIT probing, all probes are ACKed with zero window.

> It seems to me there are three approaches to fix this:

Again, I think the correct fix is to prevent the client TCP from going
into the FIN_WAIT_x state, if in fact the application has disconnected
from the socket.

> 1.  client flushes the connection, ideally with a RFC 1122 RST
> (application has to workaround with a SO_LINGER otherwise)

I would agree that this is what should happen. Please note though,
that the TCP should *not* go into FIN_WAIT_2 state at this point.

> 2.  server times out eventually. Apart from the load, here the
> server would have to deduce from the zero window received in CLOSE-WAIT
> that the client really meant to do a half duplex close, as opposed
> to advertising a zero window because of momentary buffer exhaustion.

Doing the above has the side effect of breaking other applications
that are implemented correctly. For example, I could suspend a telnet
session, which could cause its TCP to (eventually) advertise a
receive window of 0. The window advertisement would remain zero until
I resumed the telnet session (i.e., an indeterminate amount of
time). It would be wrong to terminate such connections after some
arbitrary time limit (e.g., 15 minutes).

Thomas

From owner-tcp-impl  Wed Feb 12 16:52:50 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA07460 for tcp-impl-list; Wed, 12 Feb 1997 16:52:15 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA07442 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 08:52:13 -0800
Received: from malatesta. (malatesta.spider.com [194.217.109.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA18167 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 08:52:09 -0800
Received: from malatesta by malatesta. (SMI-8.6/SMI-SVR4)
	id QAA29240; Wed, 12 Feb 1997 16:31:02 GMT
Message-ID: <3301F045.5E86@spider.com>
Date: Wed, 12 Feb 1997 16:31:01 +0000
From: Ian Heavens <ian@spider.com>
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Thomas Narten <narten@raleigh.ibm.com>
CC: Josh Cohen <josh@birdcage.mcom.com>, tcp-impl
Subject: Re: The trace ( HTTP/RFC1122 )
References: <9702121607.AA20792@ludwigia.raleigh.ibm.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

> > On the trace:
> 
> > client closes with FIN, server ACKs:
> > client in FIN-WAIT-2 advertising zero window (which makes a lot
> > of sense since it cannot read any more)
> 
> Here is where we still seem to be having a disagreement. If the client
> has "destroyed" the socket and is no longer using the socket, then TCP
> will never be able to deliver additional received data to the client
> application. In this scenario, it is wrong for the client TCP to
> respond with an ACK and window advertisement of zero, since doing so
> effectively tells its peer (the server): "I'm out of buffer space
> right now, but try again in a little while and things should get
> better". Things will *never* get better in this case, leading to the
> problem of hung servers.

A very good point.   No alternative to that RST.
 
> > 1.  client flushes the connection, ideally with a RFC 1122 RST
> > (application has to workaround with a SO_LINGER otherwise)
> 
> I would agree that this is what should happen. Please note though,
> that the TCP should *not* go into FIN_WAIT_2 state at this point.
> 

Sure, after a RST it goes to CLOSED.  The problem without the RST
is that the client socket hangs around forever in FIN-WAIT-2.

> > 2.  server times out eventually. Apart from the load, here the
> > server would have to deduce from the zero window received in CLOSE-WAIT
> > that the client really meant to do a half duplex close, as opposed
> > to advertising a zero window because of momentary buffer exhaustion.
> 
> Doing the above has the side effect of breaking other applications
> that are implemented correctly. For example, I could suspend a telnet
> session, which could cause its TCP to (eventually) advertise a
> receive window of 0. The window advertisement would remain zero until
> I resumed the telnet session (i.e., an indeterminate amount of
> time). It would be wrong to terminate such connections after some
> arbitrary time limit (e.g., 15 minutes).
>

Yes, that's a good example of what I meant - you can't time out
acknowledged probes, even in half closed states.  You can't detect
a half duplex close on the peer without some kind of 'interrupt'
(RST or URG or something if you want to do it in HTTP).

We agree :-)

ian

From owner-tcp-impl  Wed Feb 12 17:03:33 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA09281 for tcp-impl-list; Wed, 12 Feb 1997 17:02:37 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA09269 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 09:02:30 -0800
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA20614 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 09:02:28 -0800
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id LAA17965; Wed, 12 Feb 1997 11:40:27 -0500 (EST)
Message-Id: <199702121640.LAA17965@grinch.eecs.umich.edu>
To: sparker@Eng.Sun.COM
Cc: Scott Dawson <sdawson@eecs.umich.edu>, josh@birdcage.mcom.com (Josh Cohen),
        ian@spider.com (Ian Heavens), narten@raleigh.ibm.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
References: <199702112321.PAA03321@fstop.>
From: Scott Dawson <sdawson@eecs.umich.edu>
In-Reply-To: sparker@Eng.Sun.COM's message of Tue, 11 Feb 1997 15:21:41 -0800
Lines: 24
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Wed, 12 Feb 1997 11:40:27 -0500
Sender: owner-tcp-impl
Precedence: bulk


> Yes.  This bug showed up on our doorstep at Sun, and resolved it this
> way a couple of years back.  If the receiver doesn't send back an ACK
> of some kind to the zero window probe, after 8 minutes, we give up.
> Most TCP's had it, since it was in the BSD base up until at least
> 4.4BSD, if my memory serves.

I should've mentioned in my first message that the Solaris version
that we tested that did not time out connections in zero window probe
state was 2.3.  I just checked out Solaris 2.5.1, and it does exactly
as Steve says: it times out the connection in about 8 minutes.

> The spec isn't as clear about this as it could be, I think.

I think so too.

When I sent my first message, I figured that the server must be in a
probing state sending probes forever, even though they aren't acked.
However, Josh's trace shows that the client is actually acking the
probes, even though no one will ever read any of the data out of the
closed window and open it again.  This is definitely very bad behavior
on the part of the client, and ought to be fixed.

-Scott

From owner-tcp-impl  Wed Feb 12 20:37:36 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA08998 for tcp-impl-list; Wed, 12 Feb 1997 20:36:23 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA08969 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 12:36:10 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA13904 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 12:35:54 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5/8.7.1) with UUCP id UAA19724; Wed, 12 Feb 1997 20:31:31 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vuQu9-0005FcC; Tue, 11 Feb 97 22:42 GMT
Message-Id: <m0vuQu9-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: HTTP and RFC1122 half duplex close
To: josh@birdcage.mcom.com (Josh Cohen)
Date: Tue, 11 Feb 1997 22:42:25 +0000 (GMT)
Cc: raj@hpisrdq.cup.hp.com, ian@spider.com, narten@raleigh.ibm.com, tcp-impl
In-Reply-To: <199702112008.MAA26403@birdcage.mcom.com> from "Josh Cohen" at Feb 11, 97 12:08:48 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> "The characteristic of the persist state that is different from the 
> retransmission timeout in Chapter 21 is that TCP 'never' gives up sending
> window probes.  These window probes will continue to be sent at 60 sec 
> intervals until the window opens up or either of the applications
> using the connection is terminated."
> 
> I need to check if the bad stack is responding, at all, to the
> probes, and to find out:
> what happens to TCP if the probes are simply unacknowledged?

It varies. The Linux stack does time out on continual window probes, to stop
people doing zero-window TCB and buffer starvation attacks and the like on the
machine. I was under the impression that at least OpenBSD was also doing this.
What are the 'vendor' stacks doing?


From owner-tcp-impl  Wed Feb 12 21:04:51 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA14698 for tcp-impl-list; Wed, 12 Feb 1997 21:04:11 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA14675 for <tcp-impl@relay.engr.sgi.com>; Wed, 12 Feb 1997 13:04:08 -0800
Received: from alpha.xerox.com (alpha.Xerox.COM [13.1.64.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA20401 for <tcp-impl@relay.engr.sgi.com>; Wed, 12 Feb 1997 13:04:03 -0800
Received: from crevenia.parc.xerox.com ([13.2.116.11]) by alpha.xerox.com with SMTP id <15569(2)>; Wed, 12 Feb 1997 12:59:19 PST
Received: from localhost ([127.0.0.1]) by crevenia.parc.xerox.com with SMTP id <177476>; Wed, 12 Feb 1997 12:59:05 -0800
X-Mailer: exmh version 1.6.9 8/22/96
To: josh@birdcage.mcom.com (Josh Cohen)
cc: raj@hpisrdq.cup.hp.com (Rick Jones), ian@spider.com,
        narten@raleigh.ibm.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
In-reply-to: Your message of "Tue, 11 Feb 1997 12:08:48 PST."
             <199702112008.MAA26403@birdcage.mcom.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 12 Feb 1997 12:59:04 PST
From: Bill Fenner <fenner@parc.xerox.com>
Message-Id: <97Feb12.125905pst.177476@crevenia.parc.xerox.com>
Sender: owner-tcp-impl
Precedence: bulk

In message <199702112008.MAA26403@birdcage.mcom.com>Josh Cohen wrote:
>An idea I have bounced around is to suggest prohibiting the half-close in HTTP
>By this I mean that before every write, you must check the read 
>status of the socket for EOF, and abort if so. 
>( makes half-close a full close )

This would prevent migrating HTTP towards using T/TCP, since if a T/TCP client 
sends to a non-T/TCP server it ends up looking like a half-closed connection.

  Bill



From owner-tcp-impl  Wed Feb 12 21:12:25 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA16157 for tcp-impl-list; Wed, 12 Feb 1997 21:11:59 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA16151 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 13:11:57 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA22295 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 13:11:56 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id MAA11734; Wed, 12 Feb 1997 12:45:56 -0800
Received: from skybolt.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id MAA28875; Wed, 12 Feb 1997 12:45:50 -0800
Received: by skybolt.eng.sun.com (SMI-8.6/SMI-SVR4)
	id MAA18766; Wed, 12 Feb 1997 12:42:39 -0800
Date: Wed, 12 Feb 1997 12:42:39 -0800
From: Richard.Fox@Eng.Sun.COM (Richard Fox)
Message-Id: <199702122042.MAA18766@skybolt.eng.sun.com>
To: josh@birdcage.mcom.com, alan@lxorguk.ukuu.org.uk
Subject: Re: HTTP and RFC1122 half duplex close
Cc: raj@hpisrdq.cup.hp.com, ian@spider.com, narten@raleigh.ibm.com, tcp-impl
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl
Precedence: bulk


> From owner-tcp-impl@relay.engr.SGI.COM Wed Feb 12 12:38 PST 1997
> From: alan@lxorguk.ukuu.org.uk (Alan Cox)
> Subject: Re: HTTP and RFC1122 half duplex close
> To: josh@birdcage.mcom.com (Josh Cohen)
> Date: Tue, 11 Feb 1997 22:42:25 +0000 (GMT)
> Cc: raj@hpisrdq.cup.hp.com, ian@spider.com, narten@raleigh.ibm.com,
>         tcp-impl@relay.engr.SGI.COM
> 
> > "The characteristic of the persist state that is different from the 
> > retransmission timeout in Chapter 21 is that TCP 'never' gives up sending
> > window probes.  These window probes will continue to be sent at 60 sec 
> > intervals until the window opens up or either of the applications
> > using the connection is terminated."
> > 
> > I need to check if the bad stack is responding, at all, to the
> > probes, and to find out:
> > what happens to TCP if the probes are simply unacknowledged?
> 
> It varies. The Linux stack does time out on continual window probes, to stop
> people doing zero-window TCB and buffer starvation attacks and the like on the
> machine. I was under the impression that at least OpenBSD was also doing this.
> What are the 'vendor' stacks doing?


This seems to be in violation of the standard. I at one time believed this
was the right way to go. However, it seemed that no matter what I used as
the timeout value, somebody always came back saying: I just came back to my
desk and was going to get data when the connection was terminated. In other
words, no matter what timeout value I picked, somebody always found fault
with it:-)

--rich

From owner-tcp-impl  Wed Feb 12 22:57:53 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA14230 for tcp-impl-list; Wed, 12 Feb 1997 22:57:08 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA14208 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 14:57:05 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA17112 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 14:56:48 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5/8.7.1) with UUCP id WAA23815; Wed, 12 Feb 1997 22:54:09 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vunG5-0005FcC; Wed, 12 Feb 97 22:34 GMT
Message-Id: <m0vunG5-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: The trace ( HTTP/RFC1122 )
To: ian@spider.com (Ian Heavens)
Date: Wed, 12 Feb 1997 22:34:33 +0000 (GMT)
Cc: narten@raleigh.ibm.com, josh@birdcage.mcom.com, tcp-impl
In-Reply-To: <3301DD92.520D@spider.com> from "Ian Heavens" at Feb 12, 97 03:11:14 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> Here, the stack is wrong, but one of my points is that the application
> use of SO_LINGER=0 is reasonable, even though abortive closes are in
> general a bad idea.  And RFC1122 RSTs for Half Duplex Close are vital.

Assuming the RSTs will arrive is a fundamental no-no, simply because there
is a legal way for them not to occur, as happens with stuff like DOSLynx:
the machine ceases to be a TCP/IP node. And of course it also happens every
time the routing tables go for their hourly flap. The server side has to
time out at some point.

Alan


From owner-tcp-impl  Wed Feb 12 23:00:12 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA14857 for tcp-impl-list; Wed, 12 Feb 1997 22:59:45 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA14849 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 14:59:43 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA17724 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 14:59:40 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5/8.7.1) with UUCP id WAA23824; Wed, 12 Feb 1997 22:54:26 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vunZV-0005FcC; Wed, 12 Feb 97 22:54 GMT
Message-Id: <m0vunZV-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: HTTP and RFC1122 half duplex close
To: Richard.Fox@Eng.Sun.COM (Richard Fox)
Date: Wed, 12 Feb 1997 22:54:37 +0000 (GMT)
Cc: josh@birdcage.mcom.com, alan@lxorguk.ukuu.org.uk, raj@hpisrdq.cup.hp.com,
        ian@spider.com, narten@raleigh.ibm.com, tcp-impl
In-Reply-To: <199702122042.MAA18766@skybolt.eng.sun.com> from "Richard Fox" at Feb 12, 97 12:42:39 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> This seems to be in violation of the standard. I at one time believed this
> was the right way to go. However, it seemed that no matter what I used as
> the timeout value, somebody always came back saying: I just came back to my
> desk and was going to get data when the connection was terminated. In other
> words, no matter what timeout value I picked, somebody always found fault
> with it:-)

With it set at 2 hours, that implies the remote end got constipated for
2 hours solid, and I've had no moans. The original 15 minutes produced a
barrage of complaints; to my surprise, people were having real stalls that
long on some data processing apps. It isn't clear what the right action
is - perhaps making it yet another TCP socket option? (the upward propagation
of hard problem theory)
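
A minimal sketch of the kind of give-up policy discussed above, for
illustration only. It is not taken from any real stack: the struct, the
function name and the parameters are all hypothetical, and a give-up limit
of 0 is assumed to mean "probe forever", as the persist state is usually
described.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-connection persist state; names are illustrative. */
struct persist_state {
    unsigned interval_secs;  /* current gap between window probes */
    unsigned elapsed_secs;   /* time spent with the peer's window at zero */
};

enum persist_action { PERSIST_PROBE, PERSIST_ABORT };

/*
 * Called each time the persist timer fires while the window is still
 * closed.  give_up_secs == 0 means never give up; any other value is a
 * local policy choice (15 minutes and 2 hours are the values mentioned
 * in this thread).
 */
enum persist_action persist_timeout(struct persist_state *ps,
                                    unsigned max_interval_secs,
                                    unsigned give_up_secs)
{
    assert(ps != NULL && max_interval_secs > 0);

    ps->elapsed_secs += ps->interval_secs;
    if (give_up_secs != 0 && ps->elapsed_secs >= give_up_secs)
        return PERSIST_ABORT;

    /* Back off exponentially, capped at max_interval_secs. */
    if (ps->interval_secs < max_interval_secs) {
        ps->interval_secs *= 2;
        if (ps->interval_secs > max_interval_secs)
            ps->interval_secs = max_interval_secs;
    }
    return PERSIST_PROBE;
}
```

Exposing give_up_secs per connection is what "yet another socket option"
would amount to: the stack keeps the mechanism and the application picks
the value.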

Alan


From owner-tcp-impl  Wed Feb 12 23:04:04 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA16038 for tcp-impl-list; Wed, 12 Feb 1997 23:03:30 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA16005 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 15:03:23 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA18568 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 15:03:18 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5/8.7.1) with UUCP id WAA23838; Wed, 12 Feb 1997 22:55:05 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vunCj-0005FcC; Wed, 12 Feb 97 22:31 GMT
Message-Id: <m0vunCj-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: HTTP and RFC1122 half duplex close
To: ian@spider.com (Ian Heavens)
Date: Wed, 12 Feb 1997 22:31:05 +0000 (GMT)
Cc: rstevens@kohala.com, narten@raleigh.ibm.com, josh@birdcage.mcom.com,
        tcp-impl
In-Reply-To: <3301D4D3.5471@spider.com> from "Ian Heavens" at Feb 12, 97 02:33:55 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> > Absolutely.  And if stacks don't do this today, it's time to
> > name-that-vendor.
> Have we decided whether it is OK to name vendors, at least on this
> list?

I'd prefer we did, even if we agree not to name names off list. Certainly
if any tcp issues come up where the "vendor name" is Linux please do mention
it and cc me directly as well!

Alan


From owner-tcp-impl  Wed Feb 12 23:10:16 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA17785 for tcp-impl-list; Wed, 12 Feb 1997 23:09:26 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA17770 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 15:09:23 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA20761 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Feb 1997 15:08:59 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5/8.7.1) with UUCP id WAA23847; Wed, 12 Feb 1997 22:55:24 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vunc2-0005FcC; Wed, 12 Feb 97 22:57 GMT
Message-Id: <m0vunc2-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: HTTP and RFC1122 half duplex close
To: fenner@parc.xerox.com (Bill Fenner)
Date: Wed, 12 Feb 1997 22:57:13 +0000 (GMT)
Cc: josh@birdcage.mcom.com, raj@hpisrdq.cup.hp.com, ian@spider.com,
        narten@raleigh.ibm.com, tcp-impl
In-Reply-To: <97Feb12.125905pst.177476@crevenia.parc.xerox.com> from "Bill Fenner" at Feb 12, 97 12:59:04 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> >By this I mean that before every write, you must check the read 
> >status of the socket for EOF, and abort if so. 
> >( makes half-close a full close )
> 
> This would prevent migrating HTTP towards using T/TCP, since if a T/TCP client 
> sends to a non-T/TCP server it ends up looking like a half-closed connection.

It also has an implicit race since it can close during the write or between
the check and the write. Is T/TCP really an issue when so few stacks support
it and so many break it - at least not until IPv6 is generic and the bad
stacks have hopefully died in the transition ?
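
The check-before-write idea quoted above, and the race in it, can be
sketched roughly as follows. This is only an illustration, assuming a
BSD-style sockets API that provides MSG_PEEK and MSG_DONTWAIT; the helper
name peer_has_closed is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Returns 1 if the peer has sent its FIN (a read would return EOF),
 * 0 if the connection still looks open, -1 on any other error.
 * The peek does not consume data and does not block.
 */
int peer_has_closed(int fd)
{
    char c;
    ssize_t n;

    assert(fd >= 0);
    n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return 1;                 /* EOF: peer closed its sending side */
    if (n > 0)
        return 0;                 /* unread data is pending, so no EOF yet */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                 /* nothing to read, still open */
    return -1;
}
```

A caller would test peer_has_closed(fd) and then call write(). Nothing
stops the FIN from arriving between those two calls, or during the write
itself, so the check narrows the window without closing it.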

Alan


From owner-tcp-impl  Thu Feb 13 01:13:26 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA20439 for tcp-impl-list; Thu, 13 Feb 1997 01:12:50 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA20430 for <tcp-impl@relay.engr.sgi.com>; Wed, 12 Feb 1997 17:12:49 -0800
Received: from alpha.xerox.com (alpha.Xerox.COM [13.1.64.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id RAA18008 for <tcp-impl@relay.engr.sgi.com>; Wed, 12 Feb 1997 17:12:46 -0800
Received: from crevenia.parc.xerox.com ([13.2.116.11]) by alpha.xerox.com with SMTP id <19069(1)>; Wed, 12 Feb 1997 16:26:25 PST
Received: from localhost by crevenia.parc.xerox.com with SMTP id <177476>; Wed, 12 Feb 1997 16:24:04 -0800
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: fenner@parc.xerox.com (Bill Fenner), josh@birdcage.mcom.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
In-reply-to: Your message of "Wed, 12 Feb 97 14:57:13 PST."
             <m0vunc2-0005FcC@lightning.swansea.linux.org.uk> 
Date: Wed, 12 Feb 1997 16:23:52 PST
From: Bill Fenner <fenner@parc.xerox.com>
Message-Id: <97Feb12.162404pst.177476@crevenia.parc.xerox.com>
Sender: owner-tcp-impl
Precedence: bulk

In message <m0vunc2-0005FcC@lightning.swansea.linux.org.uk> Alan Cox wrote:
>Is T/TCP really an issue when so few stacks support it and so many break it

Well, hopefully the former quantity will increase and the latter
will decrease over time.  I don't think that IPv6 is an issue since
the transport protocols are unchanged other than the pseudo-header.

Even given the ratios the way they are, I would still strongly
recommend against making a protocol (or implementation) decision
that precludes the eventual deployment of T/TCP.

  Bill

From owner-tcp-impl  Thu Feb 13 08:08:03 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA18477 for tcp-impl-list; Thu, 13 Feb 1997 08:05:45 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA18461 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 00:05:42 -0800
Received: from malatesta. (malatesta.spider.com [194.217.109.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id AAA17851 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 00:05:40 -0800
Received: from malatesta by malatesta. (SMI-8.6/SMI-SVR4)
	id HAA00943; Thu, 13 Feb 1997 07:42:14 GMT
Message-ID: <3302C5D5.6FA@spider.com>
Date: Thu, 13 Feb 1997 07:42:13 +0000
From: Ian Heavens <ian@spider.com>
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
CC: rstevens@kohala.com, narten@raleigh.ibm.com, josh@birdcage.mcom.com,
        tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
References: <m0vunCj-0005FcC@lightning.swansea.linux.org.uk>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Alan Cox wrote:
> 
> > > Absolutely.  And if stacks don't do this today, it's time to
> > > name-that-vendor.
> > Have we decided whether it is OK to name vendors, at least on this
> > list?
> 
> I'd prefer we did, even if we agree not to name names off list. Certainly
> if any tcp issues come up where the "vendor name" is Linux please do mention
> it and cc me directly as well!
> 

OK.  I assume that unless there is agreement to change this, this is
restricted to the list.  The stack that misbehaves with RFC1122 half
duplex close (no RST) is the Netmanage stack 

ian

> Alan

-- 
Ian Heavens, Spider Software Ltd., http://www.spider.com/
4 John's Place, Leith, Edinburgh EH6 7EL. 
Tel +44 131 555 8448 fax. +44 131 555 8448.  ian@spider.com

From owner-tcp-impl  Thu Feb 13 11:09:10 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA03995 for tcp-impl-list; Thu, 13 Feb 1997 11:08:15 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA03988 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 03:08:12 -0800
Received: from c3po.mcom.com (h-205-217-237-46.netscape.com [205.217.237.46]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id DAA09614 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 03:08:10 -0800
Received: from birdcage.mcom.com (birdcage.mcom.com [205.217.250.80]) by c3po.mcom.com (8.7.5/8.7.3) with SMTP id CAA24986; Thu, 13 Feb 1997 02:58:14 -0800 (PST)
Received: by birdcage.mcom.com (SMI-8.6/SMI-SVR4)
	id CAA00961; Thu, 13 Feb 1997 02:58:12 -0800
From: josh@birdcage.mcom.com (Josh Cohen)
Message-Id: <199702131058.CAA00961@birdcage.mcom.com>
Subject: Re: HTTP and RFC1122 half duplex close
To: ian@spider.com (Ian Heavens)
Date: Thu, 13 Feb 1997 02:58:12 -0800 (PST)
Cc: alan@lxorguk.ukuu.org.uk, rstevens@kohala.com, narten@raleigh.ibm.com,
        tcp-impl
In-Reply-To: <3302C5D5.6FA@spider.com> from "Ian Heavens" at Feb 13, 97 07:42:13 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

> 
> Alan Cox wrote:
> > 
> > I'd prefer we did, even if we agree not to name names off list. Certainly
Let's please do that, keeping the vendor name to ourselves.

-- 
-----------------------------------------------------------------------------
Josh Cohen				        Netscape Communications Corp.
Netscape Fire Department	       
Server Engineering
josh@netscape.com                       http://home.netscape.com/people/josh/
-----------------------------------------------------------------------------

From owner-tcp-impl  Thu Feb 13 11:34:10 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA05756 for tcp-impl-list; Thu, 13 Feb 1997 11:33:19 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA05750 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 03:33:16 -0800
Received: from malatesta. (malatesta.spider.com [194.217.109.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id DAA14242 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 03:33:15 -0800
Received: from malatesta by malatesta. (SMI-8.6/SMI-SVR4)
	id LAA01434; Thu, 13 Feb 1997 11:09:14 GMT
Message-ID: <3302F65A.141@spider.com>
Date: Thu, 13 Feb 1997 11:09:14 +0000
From: Ian Heavens <ian@spider.com>
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Josh Cohen <josh@birdcage.mcom.com>
CC: alan@lxorguk.ukuu.org.uk, rstevens@kohala.com, narten@raleigh.ibm.com,
        tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
References: <199702131058.CAA00961@birdcage.mcom.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Josh Cohen wrote:
> 
> >
> > Alan Cox wrote:
> > >
> > > I'd prefer we did, even if we agree not to name names off list. Certainly
> Let's please do that, keeping the vendor name to ourselves.
> 
>

Josh points out that I had no right to name the vendor in question here.
I think this is an overwhelming argument for keeping vendor identities
confidential to the list.  Otherwise bug reports will dry up very
quickly.

*sigh*, actually there is a good argument for not even mentioning
them on the list, since it is archived.  But people are unlikely to
cotton on to that.  


ian

From owner-tcp-impl  Thu Feb 13 12:11:37 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA08520 for tcp-impl-list; Thu, 13 Feb 1997 12:10:51 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA08515 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 04:10:49 -0800
Received: from c3po.mcom.com (h-205-217-237-46.netscape.com [205.217.237.46]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id EAA18190 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 04:10:47 -0800
Received: from birdcage.mcom.com (birdcage.mcom.com [205.217.250.80]) by c3po.mcom.com (8.7.5/8.7.3) with SMTP id EAA25978; Thu, 13 Feb 1997 04:00:14 -0800 (PST)
Received: by birdcage.mcom.com (SMI-8.6/SMI-SVR4)
	id EAA01474; Thu, 13 Feb 1997 04:00:11 -0800
From: josh@birdcage.mcom.com (Josh Cohen)
Message-Id: <199702131200.EAA01474@birdcage.mcom.com>
Subject: Re: The trace ( HTTP/RFC1122 )
To: narten@raleigh.ibm.com (Thomas Narten)
Date: Thu, 13 Feb 1997 04:00:11 -0800 (PST)
Cc: tcp-impl
In-Reply-To: <9702121450.AA23128@ludwigia.raleigh.ibm.com> from "Thomas Narten" at Feb 12, 97 09:50:23 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

> Could you say a bit more about exactly how a client 'aborts' a
> connection, in terms of library/API calls? The application doesn't
I believe the client is doing a close() to send a FIN.

> send FINs, it tells TCP to do so via an API call. I suspect that you
> mean it issues a 'half-duplex close', which is *supposed* to send RSTs
> and cause subsequent received data to trigger a RST.
> 
Umm. I don't think that's right.  A half close does not cause a
RST to be sent.  Actually, I think that regardless of the
half or full close, what is sent on the wire is the same: a FIN.
Half close is just the time that elapses after the client sends
the FIN and before the server sends its own FIN.

The API stuff for half or full close just controls whether the
client's (in this case) stack keeps the control block around
or not.

In a half close situation, there should be no RSTs since
data is expected to continue to flow in the direction
left open.  An RST would prohibit that and tear down the
connection immediately on both sides.
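
As a rough illustration of the half close described above, here is a
sketch assuming the BSD sockets API: shutdown() with SHUT_WR sends the
FIN but leaves the receive direction open, so data can keep flowing the
other way until the peer closes too. The helper names half_close and
drain_until_eof are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close only our sending direction; the socket stays readable. */
int half_close(int fd)
{
    return shutdown(fd, SHUT_WR);
}

/*
 * Keep reading what the peer still sends until it closes its side (EOF)
 * or the buffer is full.  Returns the number of bytes read, or -1.
 */
ssize_t drain_until_eof(int fd, char *buf, size_t cap)
{
    size_t total = 0;

    assert(buf != NULL);
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n == 0)
            break;                /* peer's FIN: both directions now closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted, retry the read */
            return -1;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

A plain close() instead of shutdown() gives up the descriptor entirely,
so the application can no longer read whatever the peer sends afterwards.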

> 
> > I'm not suggesting any hacks to the protocols.
> 
> But it seems like you are. How is it that the client can send a RST
> (the suggested workaround) without modifying the stack to do so? That
> would be modifying the protocol implemented by the stack. But if you
> have to modify the stack to get the desired behavior anyway, why not
> just fix the actual problem, so the workaround isn't needed anymore?
No, a client application can force a RST to be sent by using
the SO_LINGER socket option, which is what Netscape Navigator does.
Unfortunately it isn't within our engineering power to change the
stack that we run on.  We can, however, 'coerce' it to send the
RST via SO_LINGER.
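
For illustration, a minimal sketch of the SO_LINGER technique described
above, assuming the BSD sockets API. The function names are hypothetical
and this is not Navigator's actual code. With l_onoff set and l_linger
zero, close() on a connected TCP socket is abortive on most stacks:
unsent data is discarded and a RST goes out instead of a FIN.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Arm the socket so that a later close() aborts the connection. */
int set_linger_zero(int fd)
{
    struct linger lg;

    assert(fd >= 0);
    lg.l_onoff = 1;    /* enable lingering ... */
    lg.l_linger = 0;   /* ... with a zero timeout, i.e. abort on close */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}

/* Abortive close: set the option, then close the descriptor. */
int abortive_close(int fd)
{
    if (set_linger_zero(fd) < 0)
        return -1;
    return close(fd);
}
```

Because the RST throws away anything still queued in either direction,
this is only reasonable once the application no longer cares about data
in flight.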

-- 
-----------------------------------------------------------------------------
Josh Cohen				        Netscape Communications Corp.
Netscape Fire Department	       
Server Engineering
josh@netscape.com                       http://home.netscape.com/people/josh/
-----------------------------------------------------------------------------

From owner-tcp-impl  Thu Feb 13 13:31:19 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA13008 for tcp-impl-list; Thu, 13 Feb 1997 13:30:24 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA12996 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 05:30:20 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id FAA27884 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 05:30:19 -0800
Received: from ftp.com by ftp.com  ; Thu, 13 Feb 1997 08:21:26 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Thu, 13 Feb 1997 08:21:26 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id IAA05201; Thu, 13 Feb 1997 08:21:31 -0500
Date: Thu, 13 Feb 1997 08:21:31 -0500
Message-Id: <199702131321.IAA05201@MAILSERV-2HIGH.FTP.COM>
To: josh@birdcage.mcom.com
Subject: Re: HTTP and RFC1122 half duplex close
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: ian@spider.com, alan@lxorguk.ukuu.org.uk, rstevens@kohala.com,
        narten@raleigh.ibm.com, tcp-impl
Repository: mailserv-2high.ftp.com, [message accepted at Thu Feb 13 08:21:26 1997]
Originating-Client: tunes.ftp.com
Sender: owner-tcp-impl
Precedence: bulk


||> Alan Cox wrote:
||> > 
||> > I'd prefer we did, even if we agree not to name names off list. Certainly
||Let's please do that, keeping the vendor name to ourselves.
||
This is a public mailing list.  You don't know who has joined and
what their motives are.  Don't fool yourselves into thinking that
"ourselves" is only good guys....

L.


From owner-tcp-impl  Thu Feb 13 14:47:03 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA18677 for tcp-impl-list; Thu, 13 Feb 1997 14:46:07 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA18663 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 06:46:04 -0800
Received: from c3po.mcom.com (h-205-217-237-46.netscape.com [205.217.237.46]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id GAA08846 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 06:46:02 -0800
Received: from birdcage.mcom.com (birdcage.mcom.com [205.217.250.80]) by c3po.mcom.com (8.7.5/8.7.3) with SMTP id GAA28466; Thu, 13 Feb 1997 06:33:45 -0800 (PST)
Received: by birdcage.mcom.com (SMI-8.6/SMI-SVR4)
	id GAA03515; Thu, 13 Feb 1997 06:33:42 -0800
From: josh@birdcage.mcom.com (Josh Cohen)
Message-Id: <199702131433.GAA03515@birdcage.mcom.com>
Subject: Re: HTTP and RFC1122 half duplex close
To: backman@ftp.com
Date: Thu, 13 Feb 1997 06:33:41 -0800 (PST)
Cc: ian@spider.com, alan@lxorguk.ukuu.org.uk, rstevens@kohala.com,
        narten@raleigh.ibm.com, tcp-impl
In-Reply-To: <199702131321.IAA05201@MAILSERV-2HIGH.FTP.COM> from "Larry Backman" at Feb 13, 97 08:21:31 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

> 
> 
> ||> Alan Cox wrote:
> ||> > 
> ||> > I'd prefer we did, even if we agree not to name names off list. Certainly
> ||Let's please do that, keeping the vendor name to ourselves.
> ||
> This is a public mailing list.  You don't know who has joined and
> what their motives are.  Don't fool yourselves into thinking that
> "ourselves" is only good guys....
Yes, yes, but at this point it is all we can ask/do.
The Netscape time machine is out for repair, so I can't undo what's happened :)


-- 
-----------------------------------------------------------------------------
Josh Cohen				        Netscape Communications Corp.
Netscape Fire Department	       
Server Engineering
josh@netscape.com                       http://home.netscape.com/people/josh/
-----------------------------------------------------------------------------

From owner-tcp-impl  Fri Feb 14 00:53:32 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA10170 for tcp-impl-list; Fri, 14 Feb 1997 00:51:51 GMT
Return-Path: <owner-tcp-impl>
Received: from odin.corp.sgi.com (odin.corp.sgi.com [192.26.51.194]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA10158 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 13 Feb 1997 16:51:49 -0800
Received: from sgi.sgi.com by odin.corp.sgi.com via ESMTP (951211.SGI.8.6.12.PATCH1502/951211.SGI)
	for <tcp-impl@relay.engr.SGI.COM> id PAA22561; Thu, 13 Feb 1997 15:22:01 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA11643 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Feb 1997 15:21:53 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5/8.7.1) with UUCP id XAA29782; Thu, 13 Feb 1997 23:36:15 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0vv9Dy-0005FcC; Thu, 13 Feb 97 22:01 GMT
Message-Id: <m0vv9Dy-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: HTTP and RFC1122 half duplex close
To: backman@ftp.com
Date: Thu, 13 Feb 1997 22:01:50 +0000 (GMT)
Cc: josh@birdcage.mcom.com, ian@spider.com, alan@lxorguk.ukuu.org.uk,
        rstevens@kohala.com, narten@raleigh.ibm.com, tcp-impl
In-Reply-To: <199702131321.IAA05201@MAILSERV-2HIGH.FTP.COM> from "Larry Backman" at Feb 13, 97 08:21:31 am
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> This is a public mailing list.  You don't know who has joined and
> what their motives are.  Don't fool yourselves into thinking that
> "ourselves" is only good guys....

Good point, though in my current experience I learn far far more from
the "bad guys" than anyone about network layer bugs.

Alan


From owner-tcp-impl  Fri Feb 14 01:07:20 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA17903 for tcp-impl-list; Fri, 14 Feb 1997 01:05:38 GMT
Return-Path: <owner-tcp-impl>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA17891 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 13 Feb 1997 17:05:36 -0800
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA02344; Thu, 13 Feb 1997 17:03:54 -0800
Message-Id: <199702140103.RAA02344@refugee.engr.sgi.com>
X-Mailer: exmh version 2.0gamma 1/27/96
To: Ian Heavens <ian@spider.com>
Cc: Josh Cohen <josh@birdcage.mcom.com>, alan@lxorguk.ukuu.org.uk,
        rstevens@kohala.com, narten@raleigh.ibm.com, tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close 
In-reply-to: Message from ian@spider.com of 13 Feb 1997 11:09:14 GMT
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 13 Feb 1997 17:03:54 -0800
From: Steve Alexander <sca@refugee>
Sender: owner-tcp-impl
Precedence: bulk

Ian Heavens <ian@spider.com> writes:
>*sigh*, actually there is a good argument for not even mentioning
>them on the list, since it is archived.  But people are unlikely to
>cotton on to that.  

I believe that we still have no clear position from the lawyers, so I'd at
least say that we should not start publicly mentioning vendors until we
have some guidelines, which will hopefully be Real Soon Now.

Speaking personally, I don't feel that knowing the vendor has contributed to
my understanding of the problem in any way.

However, we can't change the past, so let's just forget it, move on, and fix
TCP.

Thanks,
-- Steve



From owner-tcp-impl  Fri Feb 14 11:45:26 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA06148 for tcp-impl-list; Fri, 14 Feb 1997 11:44:21 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA06128 for <tcp-impl@relay.engr.SGI.COM>; Fri, 14 Feb 1997 03:44:08 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id DAA00696 for <tcp-impl@relay.engr.SGI.COM>; Fri, 14 Feb 1997 03:44:03 -0800
Received: from ftp.com by ftp.com  ; Fri, 14 Feb 1997 06:38:18 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Fri, 14 Feb 1997 06:38:18 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id GAA24911; Fri, 14 Feb 1997 06:38:25 -0500
Date: Fri, 14 Feb 1997 06:38:25 -0500
Message-Id: <199702141138.GAA24911@MAILSERV-2HIGH.FTP.COM>
To: sca@refugee
Subject: Re: HTTP and RFC1122 half duplex close 
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: ian@spider.com, josh@birdcage.mcom.com, alan@lxorguk.ukuu.org.uk,
        rstevens@kohala.com, narten@raleigh.ibm.com, tcp-impl
Repository: mailserv-2high.ftp.com, [message accepted at Fri Feb 14 06:38:19 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl
Precedence: bulk


||I believe that we still have no clear position from the lawyers, so I'd at
||least say that we should not start publicly mentioning vendors until we
||have some guidelines, which will hopefully be Real Soon Now.
||
||Speaking personally, I don't feel that knowing the vendor has contributed to
||my understanding of the problem in any way.
||
..
..
There's another complexity to consider here.  The vendor in question
has, as far as I can see, given up on TCP.  For the most part, the old
TCP-on-a-PC vendors have given up on the PC stack business and
are playing out the string w/ their existing customers, doing either
very little or nothing at all to their stacks.

Ignoring the business and ethical issues here, consider that if you
identify a complex TCP problem to some of these vendors who still have
significant installed bases, they may not have anyone capable of going
into TCP and fiddling correctly with something like connection setup
or teardown.

Those of us w/ somewhat long memories might recall the TCP market of the
late 80's, when inferior ports of BSD 4.2 abounded, with deficiencies which
we all coded around.  The BSD OOB confusion is a perfect example, as we
at least, and I'm sure most other stacks also, have code to do OOB either
BSD style or correctly.

My point here is that we can fix TCP for TCP stacks which are actively
under development but we can't ignore the issues of interoperating
with and tolerating legacy stacks with legacy deficiencies.

L.



From owner-tcp-impl  Fri Feb 14 15:34:03 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA25158 for tcp-impl-list; Fri, 14 Feb 1997 15:32:54 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA25141 for <tcp-impl@relay.engr.SGI.COM>; Fri, 14 Feb 1997 07:32:51 -0800
Received: from ohnasn07.houston.omnes.net (ohnasn07.houston.omnes.net [163.185.18.226]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA03029 for <tcp-impl@relay.engr.SGI.COM>; Fri, 14 Feb 1997 07:32:50 -0800
Received: from [163.185.164.57] by ohnasn07.houston.omnes.net
          (post.office MTA v1.9.3 ID# 0-12122) with ESMTP id AAA6237;
          Fri, 14 Feb 1997 15:22:36 +0000
X-Sender: lodge@sndsn1.sedalia.sinet.slb.com
Message-Id: <v03010d03af2a300b8692@[163.185.164.57]>
In-Reply-To: <199702140103.RAA02344@refugee.engr.sgi.com>
References: Message from ian@spider.com of 13 Feb 1997 11:09:14 GMT
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Fri, 14 Feb 1997 09:24:45 -0600
To: Steve Alexander <sca@refugee>
From: Mathew Lodge <lodge@houston.omnes.net>
Subject: RE: HTTP and RFC1122 half duplex close
Cc: tcp-impl
Sender: owner-tcp-impl
Precedence: bulk

At 19:03 -0600 2/13/1997, Steve Alexander wrote:
>Speaking personally, I don't feel that knowing the vendor has contributed to
>my understanding of the problem in any way.
>
>However, we can't change the past, so let's just forget it, move on, and fix
>TCP.

While we're on the subject of vendors... isn't the best way to ensure that
good TCP implementations are widely used to explicitly involve the vendors
of popular OSes?

For example, if you could get (say) Microsoft and Sun involved, you'd cover
most of the clients and many of the servers on the Internet. Yes, I know
there are other popular boxes out there and Bill Gates is beelzebub, but
let's face it -- Windows has a huge installed user base and lots of folks
are using that Windows TCP/IP stack, and Sun also has a huge installed
server base.

Otherwise, I feel that this team will do excellent work which could well be
ignored.  After all, the remit of the group is:

>Description of Working Group:
>
>The objective of this group is to decide how to best address known
>problems in existing implementations of the current TCP standard(s) and
>practices.  The overall goal is to improve conditions in the existing
>Internet by enhancing the quality of current TCP/IP implementations. It
>is hoped that both performance and correctness issues can be resolved
>by making implementors aware of the problems and their solutions.  In
>the long term, it is felt that this will provide a reduction in
>unnecessary traffic on the network, the rate of connection failures due
>to protocol errors, and load on network servers due to time spent
>processing both unsuccessful connections and retransmitted data.  This
>will help to ensure the stability of the global Internet.

I believe that the only way to effectively improve conditions in the
existing Internet is to ensure that good implementations are in widespread
use.

Regards,

Mathew

PS: Please, no flameage about why [insert OS of choice] is better than
Windows / Solaris / [other OS]

--
Mathew Lodge			| "I think animal testing is a
mjlodge@iee.org		| terrible idea. They get all nervous
Omnes, Houston, Texas, USA	| and give the wrong answers"
Phone: +1 281 285 8158	| -- A Bit of Fry and Laurie



From owner-tcp-impl  Fri Feb 14 16:28:31 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA02218 for tcp-impl-list; Fri, 14 Feb 1997 16:27:22 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA02211; Fri, 14 Feb 1997 08:27:15 -0800
Received: from c3po.mcom.com (h-205-217-237-46.netscape.com [205.217.237.46]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA15264; Fri, 14 Feb 1997 08:27:08 -0800
Received: from birdcage.mcom.com (birdcage.mcom.com [205.217.250.80]) by c3po.mcom.com (8.7.5/8.7.3) with SMTP id IAA04622; Fri, 14 Feb 1997 08:17:06 -0800 (PST)
Received: by birdcage.mcom.com (SMI-8.6/SMI-SVR4)
	id IAA22725; Fri, 14 Feb 1997 08:17:03 -0800
From: josh@birdcage.mcom.com (Josh Cohen)
Message-Id: <199702141617.IAA22725@birdcage.mcom.com>
Subject: Re: HTTP and RFC1122 half duplex close
To: backman@ftp.com
Date: Fri, 14 Feb 1997 08:17:03 -0800 (PST)
Cc: sca@refugee, ian@spider.com, alan@lxorguk.ukuu.org.uk, rstevens@kohala.com,
        narten@raleigh.ibm.com, tcp-impl
In-Reply-To: <199702141138.GAA24911@MAILSERV-2HIGH.FTP.COM> from "Larry Backman" at Feb 14, 97 06:38:25 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

> There's another complexity to consider here.  The vendor in question
> has, as far as I can see, given up on TCP.  For the most part, the old
> TCP-on-a-PC vendors have given up on the PC stack business and
> are playing out the string w/ their existing customers, doing either
> very little or nothing at all to their stacks.
> 
> Ignoring the business and ethical issues here, consider that if you
> identify a complex TCP problem to some of these vendors who still have
> significant installed bases, they may not have anyone capable of going
> into TCP and fiddling correctly with something like connection setup
> or teardown.

> My point here is that we can fix TCP for TCP stacks which are actively
> under development but we can't ignore the issues of interoperating
> with and tolerating legacy stacks with legacy deficiencies.
I wholeheartedly agree.  While one might assume that their product
is at the end of its useful lifetime, there are still a ton
of Win3.1 boxes running stacks like these.



-- 
-----------------------------------------------------------------------------
Josh Cohen				        Netscape Communications Corp.
Netscape Fire Department	       
Server Engineering
josh@netscape.com                       http://home.netscape.com/people/josh/
-----------------------------------------------------------------------------

From owner-tcp-impl  Fri Feb 14 16:33:13 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA03001 for tcp-impl-list; Fri, 14 Feb 1997 16:30:27 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA02926; Fri, 14 Feb 1997 08:30:18 -0800
Received: from c3po.mcom.com (h-205-217-237-46.netscape.com [205.217.237.46]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA15835; Fri, 14 Feb 1997 08:30:17 -0800
Received: from birdcage.mcom.com (birdcage.mcom.com [205.217.250.80]) by c3po.mcom.com (8.7.5/8.7.3) with SMTP id IAA04580; Fri, 14 Feb 1997 08:15:22 -0800 (PST)
Received: by birdcage.mcom.com (SMI-8.6/SMI-SVR4)
	id IAA22684; Fri, 14 Feb 1997 08:15:20 -0800
From: josh@birdcage.mcom.com (Josh Cohen)
Message-Id: <199702141615.IAA22684@birdcage.mcom.com>
Subject: Re: HTTP and RFC1122 half duplex close
To: lodge@houston.omnes.net (Mathew Lodge)
Date: Fri, 14 Feb 1997 08:15:20 -0800 (PST)
Cc: sca@refugee, tcp-impl
In-Reply-To: <v03010d03af2a300b8692@[163.185.164.57]> from "Mathew Lodge" at Feb 14, 97 09:24:45 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

> 
> At 19:03 -0600 2/13/1997, Steve Alexander wrote:
> >Speaking personally, I don't feel that knowing the vendor has contributed to
> >my understanding of the problem in any way.
> >
> >However, we can't change the past, so let's just forget it, move on, and fix
> >TCP.
> 
> While we're on the subject of vendors... isn't the best way to ensure that
> good TCP implementations are widely used to explicitly involve the vendors
> of popular OSes?
The vendor in question has been made aware of the problem.
However, a fix today will not help the great number of people who are
running the older, broken versions.
> 
> For example, if you could get (say) Microsoft and Sun involved, you'd cover
> most of the clients and many of the servers on the Internet. Yes, I know
> there are other popular boxes out there and Bill Gates is beelzebub, but
> let's face it -- Windows has a huge installed user base and lots of folks
> are using that Windows TCP/IP stack, and Sun also has a huge installed
> server base.
> 
I agree; however, in this case the broken software in question is a
PC client TCP/IP stack.  In the PC world, as compared to Unix, there
are a tremendous number of people who run stacks other than the one
supplied with the OS.

> 
> Mathew
> 
> PS: Please, no flameage about why [insert OS of choice] is better than
> Windows / Solaris / [other OS]
Obligatory: Unix roolz MaStEr 3l33t d00d! *duck*

-- 
-----------------------------------------------------------------------------
Josh Cohen				        Netscape Communications Corp.
Netscape Fire Department	       
Server Engineering
josh@netscape.com                       http://home.netscape.com/people/josh/
-----------------------------------------------------------------------------

From owner-tcp-impl  Sat Feb 15 04:25:02 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA03678 for tcp-impl-list; Sat, 15 Feb 1997 04:23:43 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA03546 for <tcp-impl@relay.engr.sgi.com>; Fri, 14 Feb 1997 20:22:44 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA13228 for <tcp-impl@relay.engr.sgi.com>; Fri, 14 Feb 1997 20:22:43 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id UAA03804; Fri, 14 Feb 1997 20:12:37 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id UAA09129; Fri, 14 Feb 1997 20:12:35 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id UAA08978; Fri, 14 Feb 1997 20:12:34 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id UAA04836; Fri, 14 Feb 1997 20:11:26 -0800
Date: Fri, 14 Feb 1997 20:11:26 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199702150411.UAA04836@taipei.eng.sun.com>
To: narten@raleigh.ibm.com, ian@spider.com, josh@birdcage.mcom.com,
        lodge@houston.omnes.net
Subject: RE: HTTP and RFC1122 half duplex close
Cc: tcp-impl
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

>From Thomas Narten:

>> 2. Client *stack* doesnt send RSTs to the servers data when its in
>> 	FIN_WAIT_X
>
>I actually think this is correct. The FIN_WAIT_x states imply that the
>application is still using the socket, waiting for the peer to finish
>sending data. If, on the other hand, the application had invoked
>'half-duplex close' (and has destroyed the socket), the client TCP
>should go into some state other than FIN_WAIT_x.

I don't think whether a TCP end-point still has a client associated
with it has anything to do with what TCP state the connection should
be in. The key is to send a RST to break a potential deadlock if data
still arrives for any connection whose client has already ditched it.

>> client closes with FIN, server ACKs: 
>> client in FIN-WAIT-2 advertising zero window (which makes a lot
>> of sense since it cannot read any more)
>
>Here is where we still seem to be having a disagreement. If the client
>has "destroyed" the socket and is no longer using the socket, then TCP
>will never be able to deliver additional received data to the client
>application. In this scenario, it is wrong for the client TCP to
>respond with an ACK and window advertisement of zero,

Again, a RST and nothing else should be sent whenever data arrives
destined for a detached connection.

>From Ian Heavens:

>I think the close() call, or the Winsock equivalent, sends a FIN
>as normal, unless there is pending data to be read, in which
>case a RST is sent.  If data arrives after the FIN has been sent,
>a RST is sent (RFC1122).

Solaris does just that.

>From Josh Cohen:

>> send FINs, it tells TCP to do so via an API call. I suspect that you
>> mean it issues a 'half-duplex close', which is *supposed* to send RSTs
>> and cause subsequent received data to trigger a RST.
>> 
>Umm. I dont think thats right.  A half close does not cause a 
>RST to be sent.  Actually, I think that regardless of the
>half or full close, what is sent on the wire is the same, a FIN.
>half close is just the time that elapses after the client sends
>the FIN and before the server sends its own FIN.

It looks like there is still some confusion between half-close and
'half-duplex close' here. The former is for a client to tell the other
end 'I have no more data to SEND', whereas the latter says
'I don't intend to RECEIVE any more data 'cause I'm going away.'
Unfortunately, the TCP protocol offers an explicit signal only for the
former, namely 'FIN'. Therefore RST has to be called upon for the
latter when there is any potential risk of deadlock, e.g. data
is destined for a detached connection. (Of course the timer should
always be able to break a deadlock like this. But it could take a
long time...)
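
To make the two wire signals concrete, here is a minimal sketch in
Python over a loopback connection.  It is an illustration only and does
not reproduce the behaviour of any stack discussed in this thread.  The
helper names (make_pair, half_close_demo, abortive_close_demo) are
hypothetical, and forcing a RST by setting SO_LINGER with a zero timeout
before close() is an assumption about the host's sockets API; the "ii"
struct layout matches Linux and BSD-derived systems, and Winsock differs.

```python
import socket
import struct


def make_pair():
    """Return a connected (client, server) pair of TCP sockets on loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick one
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    client.settimeout(5)                     # never hang the demo forever
    server.settimeout(5)
    return client, server


def half_close_demo():
    """Half-close: 'I have no more data to SEND' (a FIN on the wire)."""
    client, server = make_pair()
    client.sendall(b"request")
    client.shutdown(socket.SHUT_WR)          # sends FIN, socket stays readable
    received = server.recv(100)              # the request itself
    eof = server.recv(100)                   # b"" once the FIN is seen
    server.sendall(b"reply")                 # reverse direction is still open
    reply = client.recv(100)
    client.close()
    server.close()
    return received, eof, reply


def abortive_close_demo():
    """Abort: 'I am going away', signalled with RST instead of FIN."""
    client, server = make_pair()
    # Assumption: l_onoff=1, l_linger=0 makes close() reset the connection.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()
    try:
        server.recv(100)
        outcome = "eof"                      # peer saw an orderly FIN
    except ConnectionResetError:
        outcome = "reset"                    # peer saw a RST
    server.close()
    return outcome


if __name__ == "__main__":
    print(half_close_demo())
    print(abortive_close_demo())
```

The sketch only shows what the peer observes in each case.  Whether a
given stack sends the RST on its own when data arrives for a detached
connection is exactly the implementation question under discussion.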

>From Mathew Lodge:

>While we're on the subject of vendors... isn't the best way to ensure that
>good TCP implementations are widely used to explicitly involve the vendors
>of popular OSes?
>
>For example, if you could get (say) Microsoft and Sun involved, you'd cover
>most of the clients and many of the servers on the Internet. Yes, I know
>there are other popular boxes out there and Bill Gates is beelzebub, but
>let's face it -- Windows has a huge installed user base and lots of folks
>are using that Windows TCP/IP stack, and Sun also has a huge installed
>server base.

You can rest assured that we (Sun) are very involved :-).

Jerry Chu
Internet Engineering
SunSoft

From owner-tcp-impl  Tue Feb 18 21:54:14 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA15517 for tcp-impl-list; Tue, 18 Feb 1997 21:52:41 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA15504 for <tcp-impl@engr.sgi.com>; Tue, 18 Feb 1997 13:52:39 -0800
Received: from zorch.w3.org (zorch.w3.org [18.29.0.62]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA27876 for <tcp-impl@engr.sgi.com>; Tue, 18 Feb 1997 13:52:37 -0800
From: jg@zorch.w3.org
Received: by zorch.w3.org; id AA29022; Tue, 18 Feb 1997 16:48:44 -0500
Message-Id: <9702182148.AA29022@zorch.w3.org>
To: tcp-impl
Subject: Re: HTTP and RFC1122 half duplex close
Date: Tue, 18 Feb 97 16:48:44 -0500
X-Mts: smtp
Sender: owner-tcp-impl
Precedence: bulk

I'm just joining this mailing list, and have been scanning (as opposed
to carefully reading) the archive...  at 300K, it is a bit much
to catch up on.

When discussing HTTP behavior, I'd like to remind people that
HTTP behavior will be evolving greatly over the next year
or two, with the deployment of HTTP/1.1. How long it will take
to get rid of HTTP/1.0 and the problems it causes is the next
interesting question... 

HTTP/1.1, if correctly implemented, will dramatically
change HTTP's behavior and should end its abuse of TCP.  Any suggestions
about where to go from here with HTTP should be rethought in this light.
Extrapolating much from current data on HTTP should be looked at 
very carefully; your presumptions are likely all wrong...

We've recently put together a paper outlining our experiences.  It
includes quite a bit of data taken from running HTTP/1.1 implementations
(both Jigsaw and Apache), including large numbers of tcpdump traces
of our test site.  If you are on UNIX and install Tim Shepard's xplot,
you can go from the tabular results in the paper, to the summary of
all of our runs of data, to xplots of each run of data we took.

And certainly we saw bugs in TCP (in our particular case, Solaris
had problems; Sun has since been kind enough to get us patches...).

Thumbnail sketch: HTTP/1.1 over a single, buffered, pipelined
TCP connection outperformed HTTP/1.0, with or without keep-alives,
using N connections (N was 6 in our tests) under all the tests we ran...
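
As a rough illustration of what "buffered, pipelined" means on a single
connection, here is a minimal sketch in Python.  It is not taken from
the paper or from libwww.  The function names, the host "example.org"
and the paths are placeholders, and the response splitter assumes every
response carries a Content-Length header (no chunked encoding).

```python
def build_pipeline(host, paths):
    """Buffer several GET requests so they can go out in one write."""
    requests = []
    for path in paths:
        requests.append(
            "GET {} HTTP/1.1\r\nHost: {}\r\n\r\n".format(path, host)
        )
    return "".join(requests).encode("ascii")


def split_responses(stream):
    """Split a byte stream of back-to-back responses, in request order.

    Assumes each response has a Content-Length header.
    Returns a list of (status_line, body) tuples.
    """
    responses = []
    while stream:
        head, _, rest = stream.partition(b"\r\n\r\n")
        lines = head.split(b"\r\n")
        length = 0
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        responses.append((lines[0], rest[:length]))
        stream = rest[length:]               # next response starts here
    return responses


if __name__ == "__main__":
    wire = build_pipeline("example.org", ["/a.html", "/b.gif"])
    print(wire.decode("ascii"))
```

The point of the sketch is that the client does not wait for the first
response before writing the second request, so several requests can
share one segment and one connection.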

Note that we found that a server may need to close the incoming
side of its connection; details are in the paper...

We also believe that by judicious use of range requests in HTTP,
a browser really ought not to be throwing away connections
very much at all...

Also note that HTTP/1.0 PUT has had various problems in scenarios
where a reset gets sent and causes data to be discarded before it
is delivered to the application; hopefully HTTP/1.1 gets
these right....  Henrik Frystyk Nielsen spent quite a while
understanding the problem (and talked to Dave Clark to confirm
all that is going on).
			Jim Gettys
			Digital Equipment Corporation
			Visiting Scientist, W3C
			Editor of the HTTP/1.1 specification
			 for the HTTP working group...

Here's the abstract of the paper....
The paper can be found at:
http://www.w3.org/pub/WWW/Protocols/HTTP/Performance/Pipeline.html

Abstract 

We describe our investigation of the effect of persistent connections,
pipelining and link level document compression on our client and server
HTTP implementations. A simple test setup is used to verify HTTP/1.1's
design and understand HTTP/1.1 implementation strategies. We
present TCP and real time performance data between the libwww robot
and both the Jigsaw and Apache HTTP servers using HTTP/1.0,
HTTP/1.1 with persistent connections, HTTP/1.1 with pipelined
requests, and HTTP/1.1 with pipelined requests and deflate data
compression [22]. We also investigate whether the TCP Nagle algorithm
has an effect on HTTP/1.1 performance. While somewhat artificial and
possibly overstating the benefits of HTTP/1.1, we believe the tests and
results approximate some common behavior seen in browsers. The
results confirm that HTTP/1.1 is meeting its major design goals. Our
experience has been that implementation details are very important to
achieve all of the benefits of HTTP/1.1. 

For all our tests, a pipelined HTTP/1.1 implementation outperformed
HTTP/1.0, even when the HTTP/1.0 implementation used multiple
connections in parallel, under all network environments tested. The
savings were at least a factor of two, and sometimes as much as a
factor of ten, in terms of packets transmitted. Elapsed time
improvement is less dramatic, and strongly depends on your network
connection. 

Note that the savings in network traffic and performance shown in this
document are solely due to the effects of pipelining, persistent
connections and transport compression. Some data is presented showing
further savings possible by the use of CSS1 style sheets [10], and the
more compact PNG [20] image representation that are enabled by
recent recommendations at higher levels than the base protocol. Time
did not allow full end to end data collection on these cases. The results
show that HTTP/1.1 and changes in Web content will have dramatic
results in Internet and Web performance as HTTP/1.1 and related
technologies deploy over the near future. Universal use of style sheets,
even without deployment of HTTP/1.1, would cause a very significant
reduction in network traffic. 

This paper does not investigate further performance and network
savings enabled by the improved caching facilities provided by the
HTTP/1.1 protocol, or by sophisticated use of range requests. 

From owner-tcp-impl  Wed Feb 19 21:23:07 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA04789 for tcp-impl-list; Wed, 19 Feb 1997 21:21:43 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA04779 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Feb 1997 13:21:39 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA13864 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Feb 1997 13:21:33 -0800
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id QAA23602; Wed, 19 Feb 1997 16:17:34 -0500 (EST)
Date: Wed, 19 Feb 1997 16:17:34 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199702192117.QAA23602@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl
Subject: OOB [was Re: HTTP and RFC1122 half duplex close]
Sender: owner-tcp-impl
Precedence: bulk

To pick up on a side comment, tangential to the main point of the note
I'm replying to,

> [...late '80s...4.2...]  The Bsd OOB confusion is a perfect example
> as we at least, and I'm sure most other stacks also, have code to do
> OOB either Bsd style or correctly.

I don't think there _is_ any "correctly".  TCP does not have OOB.  What
it has is an urgent pointer.  Some grad student who must have been
either on drugs or on a minimal understanding of TCP thought it would
be useful to take the byte the urgent pointer points to and treat it as
a byte in an out-of-band channel.

This almost works.  If you send one byte at a time, with low data rates
compared to the network bandwidth*latency product, it will work fine.

But as soon as you try to send at high data rates, use it over a lossy
network, or hit certain other circumstances you can figure out by reading
the RFC specifying what TCP really has, you'll see that at a minimum
the urgent pointer can be advanced to a later "OOB" byte before any
packet is emitted with the urgent pointer pointing to the earlier "OOB"
byte, which results in merging the previous byte into the non-"OOB"
data stream.  It can probably fail in other ways too; I haven't thought
about it much beyond determining that it definitely is broken.
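
The merging failure can be shown with a toy model in Python.  This is a
sketch under stated assumptions, not real TCP: the Sender class and the
receive function are hypothetical, offsets are relative to a single
segment, and the "OOB" byte is taken to be the one just before the
urgent pointer (the BSD-style reading).  The only property modelled is
that a connection has one urgent pointer, so a later urgent write
replaces an earlier one that has not been transmitted yet.

```python
class Sender:
    """Toy sender with one send buffer and a single urgent pointer."""

    def __init__(self):
        self.buffer = bytearray()
        self.urgent = None                   # offset just past the "OOB" byte

    def send(self, data):
        self.buffer += data

    def send_oob(self, byte):
        self.buffer += byte
        self.urgent = len(self.buffer)       # overwrites any earlier pointer

    def emit_segment(self):
        """Hand everything buffered so far to the network as one segment."""
        segment = (bytes(self.buffer), self.urgent)
        self.buffer.clear()
        self.urgent = None
        return segment


def receive(segment):
    """Return (inline_data, oob_byte) as a BSD-style receiver would see it."""
    data, urgent = segment
    if urgent is None:
        return data, b""
    index = urgent - 1                       # byte just before the pointer
    return data[:index] + data[index + 1:], data[index:index + 1]


if __name__ == "__main__":
    s = Sender()
    s.send(b"ab")
    s.send_oob(b"X")                         # first "OOB" byte
    s.send(b"cd")
    s.send_oob(b"Y")                         # pointer advances before any send
    print(receive(s.emit_segment()))         # X ends up in the inline data
```

If a segment is emitted after each send_oob call, both bytes arrive as
"OOB"; when the two writes are coalesced into one segment, only the
last one does, and the earlier byte shows up in the normal stream.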

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl  Wed Feb 19 22:39:36 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA24317 for tcp-impl-list; Wed, 19 Feb 1997 22:37:50 GMT
Return-Path: <owner-tcp-impl>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA24299 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 19 Feb 1997 14:37:49 -0800
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA28187; Wed, 19 Feb 1997 14:37:24 -0800
Message-Id: <199702192237.OAA28187@refugee.engr.sgi.com>
X-Mailer: exmh version 2.0gamma 1/27/96
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
Cc: tcp-impl
Subject: Re: OOB [was Re: HTTP and RFC1122 half duplex close] 
In-reply-to: Message from mouse@Rodents.Montreal.QC.CA of 19 Feb 1997 16:17:34 
 EST
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 19 Feb 1997 14:37:23 -0800
From: Steve Alexander <sca@refugee>
Sender: owner-tcp-impl
Precedence: bulk

der Mouse <mouse@Rodents.Montreal.QC.CA> writes:
>I don't think there _is_ any "correctly".  TCP does not have OOB.  What
>it has is an urgent pointer.  Some grad student who must have been
>either on drugs or on a minimal understanding of TCP thought it would
>be useful to take the byte the urgent pointer points to and treat it as
>a byte in an out-of-band channel.

I think the issue probably has more to do with interpreting what the urgent
pointer means.  If I remember correctly, 793 was ambiguous (it said two
different things in two different places) and BSD picked the "wrong" one
(having just re-read it, I probably would have too).  If you follow 1122, then
you disagree with BSD by one byte, which is a real pain.  I don't know why the
authors of 1122 didn't just admit defeat and codify the BSD practice ;->.
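
The one-byte disagreement can be written down in a few lines of Python.
This is a sketch with a hypothetical function name, not code from any
stack.  It assumes the two readings are as follows: under RFC 1122 the
Urgent Pointer field is the offset of the last octet of urgent data,
while under the BSD reading it is the offset of the octet following the
urgent data.

```python
SEQ_MODULUS = 2 ** 32                        # TCP sequence numbers wrap


def last_urgent_seq(segment_seq, urgent_field, convention):
    """Sequence number of the last urgent octet under each reading.

    segment_seq   sequence number of the first octet in the segment
    urgent_field  value of the Urgent Pointer header field
    convention    "rfc1122" or "bsd"
    """
    if convention == "rfc1122":
        offset = urgent_field                # field points AT the last octet
    elif convention == "bsd":
        offset = urgent_field - 1            # field points one PAST it
    else:
        raise ValueError("unknown convention: " + repr(convention))
    return (segment_seq + offset) % SEQ_MODULUS


if __name__ == "__main__":
    for name in ("rfc1122", "bsd"):
        print(name, last_urgent_seq(1000, 5, name))
```

For the same header the two readings pick neighbouring octets, so a
sender and receiver that disagree will treat the wrong byte as urgent.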

-- Steve



From owner-tcp-impl@relay.engr.sgi.com  Fri Feb 21 08:49:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA23663 for tcp-impl-list; Fri, 21 Feb 1997 08:47:37 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA23637 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Feb 1997 08:47:34 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id IAA00752 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Feb 1997 08:47:31 -0800
Received: from ftp.com by ftp.com  ; Wed, 19 Feb 1997 18:57:42 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Wed, 19 Feb 1997 18:57:42 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id SAA27861; Wed, 19 Feb 1997 18:57:48 -0500
Date: Wed, 19 Feb 1997 18:57:48 -0500
Message-Id: <199702192357.SAA27861@MAILSERV-2HIGH.FTP.COM>
To: sca@refugee
Subject: Re: OOB [was Re: HTTP and RFC1122 half duplex close] 
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: mouse@Rodents.Montreal.QC.CA, tcp-impl
Repository: mailserv-2high.ftp.com, [message accepted at Wed Feb 19 18:57:47 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||>I don't think there _is_ any "correctly".  TCP does not have OOB.  What
||>it has is an urgent pointer.  Some grad student who must have been
||>either on drugs or on a minimal understanding of TCP thought it would
||>be useful to take the byte the urgent pointer points to and treat it as
||>a byte in an out-of-band channel.
||
||I think the issue probably has more to do with interpreting what the urgent
||pointer means.  If I remember correctly, 793 was ambiguous (it said two
||different things in two different places) and BSD picked the "wrong" one
||(having just re-read it, I probably would have too).  If you follow 1122, then
||you disagree with BSD by one byte, which is a real pain.  I don't know why the
||authors of 1122 didn't just admit defeat and codify the BSD practice ;->.
||
Ah, well, er.  I worked w/ James B. VanBokkelen for 3 years.  Religion was
far more important than practicality in those days...




From owner-tcp-impl  Tue Feb 25 22:15:28 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA21157 for tcp-impl-list; Tue, 25 Feb 1997 22:13:26 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA21149 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Feb 1997 14:13:20 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA18715 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Feb 1997 14:13:11 -0800
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id QAA13987; Tue, 25 Feb 1997 16:34:34 -0500 (EST)
Date: Tue, 25 Feb 1997 16:34:34 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199702252134.QAA13987@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl
Subject: Re: OOB [was Re: HTTP and RFC1122 half duplex close]
Sender: owner-tcp-impl
Precedence: bulk

>> I don't think there _is_ any "correctly".  TCP does not have OOB.
>> What it has is an urgent pointer.  [...]
> I think the issue probably has more to do with interpreting what the
> urgent pointer means.

The issue of incompatibility between two different ways of implementing
this pseudo-OOB, yes, you're right.  My point is, there's _no_ correct
way of implementing OOB over TCP; the underlying mechanisms available
simply cannot support the API the implementations purport to offer.
Discussing whether the byte before or after the urgent pointer is the
OOB byte is simply discussing whether to cover a car's seats with
upholstery or leather, while not noticing they haven't been bolted to
the floor.

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl  Thu Feb 27 01:09:55 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA27009 for tcp-impl-list; Thu, 27 Feb 1997 01:06:56 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA26969 for <TCP-IMPL@ENGR.SGI.COM>; Wed, 26 Feb 1997 17:06:53 -0800
Received: from TGV.COM (Dr-Seuss.cisco.com [161.44.128.70]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id RAA02648 for <TCP-IMPL@ENGR.SGI.COM>; Wed, 26 Feb 1997 17:06:52 -0800
Received: from HQ.Cisco.COM ([161.44.128.147]) by TGV.COM via INTERNET ;
          Wed, 26 Feb 1997 17:06:35 PST
Received: by HQ.Cisco.COM with VMSmail for TCP-IMPL@ENGR.SGI.COM; Wed, 26 Feb
 1997 17:06:34 -0800
Date: Wed, 26 Feb 1997 17:06:34 -0800
From: gkn@Cisco.COM (Gerard K. Newman)
To: TCP-IMPL
Message-Id: <970226170634.20202328@Cisco.COM>
Sender: owner-tcp-impl
Precedence: bulk

subscribe tcp-impl

From owner-tcp-impl  Fri Feb 28 19:39:03 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA00361 for tcp-impl-list; Fri, 28 Feb 1997 19:27:57 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA00333 for <tcp-impl@engr.sgi.com>; Fri, 28 Feb 1997 11:27:47 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id LAA03260 for <tcp-impl@engr.sgi.com>; Fri, 28 Feb 1997 11:27:43 -0800
Received: from ftp.com by ftp.com  ; Fri, 28 Feb 1997 14:23:57 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Fri, 28 Feb 1997 14:23:57 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id OAA15631; Fri, 28 Feb 1997 14:21:24 -0500
Date: Fri, 28 Feb 1997 14:21:24 -0500
Message-Id: <199702281921.OAA15631@MAILSERV-2HIGH.FTP.COM>
To: tcp-impl
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Repository: mailserv-2high.ftp.com, [message accepted at Fri Feb 28 14:20:18 1997]
Originating-Client: tunes.ftp.com
Sender: owner-tcp-impl
Precedence: bulk

who tcp-impl



From owner-tcp-impl  Mon Mar  3 12:16:53 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA19751 for tcp-impl-list; Mon, 3 Mar 1997 12:15:29 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA19743 for <tcp-impl@engr.sgi.com>; Mon, 3 Mar 1997 04:15:27 -0800
Received: from teil.soft.net (tata_elxsi.soft.net [164.164.10.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id EAA08113 for <tcp-impl@engr.sgi.com>; Mon, 3 Mar 1997 04:14:02 -0800
Received: by teil.soft.net (940816.SGI.8.6.9/940406.SGI)
	for tcp-impl@engr.sgi.com id RAA26665; Mon, 3 Mar 1997 17:40:20 -0530
From: pnv@teil.soft.net (PANKAJ N VYAS)
Message-Id: <199703032310.RAA26665@teil.soft.net>
Subject: subscribe
To: tcp-impl
Date: Mon, 3 Mar 1997 17:40:19 -0530 (IST)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 356       
Sender: owner-tcp-impl
Precedence: bulk

subscribe tcp-impl
-- 
-------------------------------------------------------------------------------
Pankaj N. Vyas				
Senior Engineer - D & D
TATA ELXSI(INDIA) LIMITED	
Tel: 8452016/ 8452017/ 8452185	
Fax: 91 80 8452019		
Tlx: 0845 8522 RISC IN
E.Mail: pnv@teil.soft.net	
-------------------------------------------------------------------------------

From owner-tcp-impl  Fri Mar  7 06:25:44 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA09378 for tcp-impl-list; Fri, 7 Mar 1997 06:24:27 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA09373 for <tcp-impl@relay.engr.SGI.COM>; Thu, 6 Mar 1997 22:24:24 -0800
Received: from netcom14.netcom.com (netcom14.netcom.com [192.100.81.126]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id WAA02683 for <tcp-impl@relay.engr.SGI.COM>; Thu, 6 Mar 1997 22:24:23 -0800
Received: (from kck@localhost) by netcom14.netcom.com (8.6.13/Netcom)
	id WAA22783; Thu, 6 Mar 1997 22:20:37 -0800
Date: Thu, 6 Mar 1997 22:20:37 -0800
From: kck@netcom.com (Richard Fox)
Message-Id: <199703070620.WAA22783@netcom14.netcom.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Keep-Alive size
Sender: owner-tcp-impl
Precedence: bulk

Here is a question that is likely to draw some philosophical
views.

Should a TCP stack ACK a zero length Keep-Alive packet?

rfc1122 is very ambiguous on this matter.

The rfc states that an implementation should not ACK an ACK only
packet, which is what a zero length Keep-Alive is.
It states that a Keep-Alive is not part of the standard but is
an engineering hack (or something to that effect). It then goes on
to say that a stack should ACK any form of Keep-Alive but that a
stack should have a tuneable parameter to be able to set the size
of the Keep-Alive in case a remote end does not support zero length
Keep-Alives.

This all seems to imply that Keep-Alive packets should contain at
least one byte of data. However, there is at least one stack out
there which only sends zero length Keep-Alives and there is a stack
out there which does not ACK them. Both stacks seem to be conformant
to rfc1122 but the behaviour is not the desired one at all.

Any opinions?

--thanks rich



From owner-tcp-impl  Sat Mar  8 14:53:55 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA13740 for tcp-impl-list; Sat, 8 Mar 1997 14:52:35 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA13734 for <tcp-impl@relay.engr.SGI.COM>; Sat, 8 Mar 1997 06:52:33 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id GAA04063 for <tcp-impl@relay.engr.SGI.COM>; Sat, 8 Mar 1997 06:52:29 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5/8.7.1) with UUCP id OAA08291; Sat, 8 Mar 1997 14:50:22 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0w35oX-0005FcC; Fri, 7 Mar 97 20:00 GMT
Message-Id: <m0w35oX-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Keep-Alive size
To: kck@netcom.com (Richard Fox)
Date: Fri, 7 Mar 1997 20:00:25 +0000 (GMT)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199703070620.WAA22783@netcom14.netcom.com> from "Richard Fox" at Mar 6, 97 10:20:37 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> This all seems to imply that Keep-Alive packets should contain at
> least one byte of data. However, there is at least one stack out
> there which only sends zero length Keep-Alives and there is a stack
> out there which does not ACK them. Both stacks seem to be conformant
> to rfc1122 but the behaviour is not the desired one at all.

Basically all bets are off. The best thing to do on the send side is clearly
to resend the last byte.


From owner-tcp-impl  Sat Mar  8 18:05:10 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA26705 for tcp-impl-list; Sat, 8 Mar 1997 18:03:49 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA26699 for <tcp-impl@relay.engr.SGI.COM>; Sat, 8 Mar 1997 10:03:46 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA28403 for <tcp-impl@relay.engr.SGI.COM>; Sat, 8 Mar 1997 10:03:42 -0800
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id MAA21194; Sat, 8 Mar 1997 12:59:48 -0500 (EST)
Date: Sat, 8 Mar 1997 12:59:48 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199703081759.MAA21194@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Keep-Alive size
Sender: owner-tcp-impl
Precedence: bulk

>> This all seems to imply that Keep-Alive packets should contain at
>> least one byte of data.  However, there is at least one stack out
>> there which only sends zero length Keep-Alives and there is a stack
>> out there which does not ACK them.  Both stacks seem to be
>> conformant to rfc1122 but the behaviour is not the desired one at
>> all.

> Basically all bets are off.  The best thing to do on the send side is
> clearly to resend the last byte.

What if there is no last byte (ie, if that direction has not sent any
data since the initial SYN/SYN-ACK/ACK exchange)?

Would it do any harm to invent a fictitious last byte and send it?  In
this condition it's guaranteed to be outside the receiver's idea of
valid sequence numbers, since it duplicates the SYN's sequence number.

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl  Sat Mar  8 19:28:51 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA03462 for tcp-impl-list; Sat, 8 Mar 1997 19:27:39 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA03452 for <tcp-impl@relay.engr.SGI.COM>; Sat, 8 Mar 1997 11:27:36 -0800
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA09920 for <tcp-impl@relay.engr.SGI.COM>; Sat, 8 Mar 1997 11:27:31 -0800
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id TAA22453; Sat, 8 Mar 1997 19:39:33 +0100
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199703081839.TAA22453@labinfo.iet.unipi.it>
Subject: Re: Keep-Alive size
To: mouse@Rodents.Montreal.QC.CA (der Mouse)
Date: Sat, 8 Mar 1997 19:39:32 +0100 (MET)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199703081759.MAA21194@Twig.Rodents.Montreal.QC.CA> from "der Mouse" at Mar 8, 97 12:59:29 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 1698      
Sender: owner-tcp-impl
Precedence: bulk

> >> This all seems to imply that Keep-Alive packets should contain at
> >> least one byte of data.  However, there is at least one stack out
> >> there which only sends zero length Keep-Alives and there is a stack
> >> out there which does not ACK them.  Both stacks seem to be
> >> conformant to rfc1122 but the behaviour is not the desired one at
> >> all.
> 
> > Basically all bets are off.  The best thing to do on the send side is
> > clearly to resend the last byte.
> 
> What if there is no last byte (ie, if that direction has not sent any
> data since the initial SYN/SYN-ACK/ACK exchange)?
> 
> Would it do any harm to invent a fictitious last byte and send it?  In
> this condition it's guaranteed to be outside the receiver's idea of
> valid sequence numbers, since it duplicates the SYN's sequence number.

Perhaps the correct wording should have been "resend the item with the
last acked sequence number -- be it a data byte or a SYN".

However another problem comes to mind -- the sender might not have the
"last byte" available anymore since, once acked, data can be flushed
(except in the case of a SYN of course). One can send a random byte of
course, but then the receiver might really become suspicious....

This suggests that acking zero-sized Keep-Alives should be mandatory.

	Luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________

From owner-tcp-impl  Sat Mar  8 19:45:00 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA05115 for tcp-impl-list; Sat, 8 Mar 1997 19:43:49 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA05106 for <tcp-impl@relay.engr.SGI.COM>; Sat, 8 Mar 1997 11:43:47 -0800
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA11985 for <tcp-impl@relay.engr.SGI.COM>; Sat, 8 Mar 1997 11:43:44 -0800
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id TAA22542 for tcp-impl@relay.engr.SGI.COM; Sat, 8 Mar 1997 19:56:07 +0100
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199703081856.TAA22542@labinfo.iet.unipi.it>
Subject: retransmit count ...
To: tcp-impl@relay.engr.SGI.COM (tcp_impl)
Date: Sat, 8 Mar 1997 19:56:07 +0100 (MET)
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 986       
Sender: owner-tcp-impl
Precedence: bulk

About one year ago, on some tcp-related list, it was mentioned that
some BSD implementation of TCP failed to reset the counter of the
number of retransmissions (t_rxtshift) in some circumstances.
I have lost the reference, though, and do not remember exactly the
details.

Does anyone remember the details of the problem (which versions suffer
from it, what conditions trigger it)?

I was particularly curious to know if typical HTTP transactions
(one data packet in one direction, followed by a response in the
other one) might trigger this bug more than other applications.

	Thanks
	Luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________

From owner-tcp-impl  Sat Mar  8 20:03:55 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA06126 for tcp-impl-list; Sat, 8 Mar 1997 19:57:30 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA06120 for <tcp-impl@relay.engr.SGI.COM>; Sat, 8 Mar 1997 11:57:28 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA13713 for <tcp-impl@relay.engr.SGI.COM>; Sat, 8 Mar 1997 11:57:24 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA17498; Sat, 8 Mar 1997 11:47:24 -0800 (PST)
Message-Id: <199703081947.LAA17498@daffy.ee.lbl.gov>
To: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Cc: mouse@Rodents.Montreal.QC.CA (der Mouse), tcp-impl@relay.engr.SGI.COM
Subject: Re: Keep-Alive size
In-reply-to: Your message of Sat, 08 Mar 1997 19:39:32 PST.
Date: Sat, 08 Mar 1997 11:47:23 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> This suggests that acking zero-sized Keep-Alives should be mandatory.

I'm confused about how to distinguish these from "pure" acks, which
shouldn't be acked.

		Vern

From owner-tcp-impl  Sun Mar  9 13:16:38 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA16372 for tcp-impl-list; Sun, 9 Mar 1997 13:15:00 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA16366 for <tcp-impl@relay.engr.SGI.COM>; Sun, 9 Mar 1997 05:14:57 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id FAA08336 for <tcp-impl@relay.engr.SGI.COM>; Sun, 9 Mar 1997 05:14:54 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5/8.7.1) with UUCP id NAA15102; Sun, 9 Mar 1997 13:12:41 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0w3Vab-0005FcC; Sat, 8 Mar 97 23:31 GMT
Message-Id: <m0w3Vab-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Keep-Alive size
To: mouse@Rodents.Montreal.QC.CA (der Mouse)
Date: Sat, 8 Mar 1997 23:31:44 +0000 (GMT)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199703081759.MAA21194@Twig.Rodents.Montreal.QC.CA> from "der Mouse" at Mar 8, 97 12:59:48 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> What if there is no last byte (ie, if that direction has not sent any
> data since the initial SYN/SYN-ACK/ACK exchange)?

That is an interesting question. 

> Would it do any harm to invent a fictitious last byte and send it?  In
> this condition it's guaranteed to be outside the receiver's idea of
> valid sequence numbers, since it duplicates the SYN's sequence number.

You couldn't do that if real data were to follow in time, nor could you send the
byte after the byte next in the case the other end had shut down the sending
side (it may well send you an RST).


From owner-tcp-impl  Sun Mar  9 13:16:40 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA16434 for tcp-impl-list; Sun, 9 Mar 1997 13:15:09 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA16407 for <tcp-impl@relay.engr.SGI.COM>; Sun, 9 Mar 1997 05:15:08 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id FAA08348 for <tcp-impl@relay.engr.SGI.COM>; Sun, 9 Mar 1997 05:15:05 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5/8.7.1) with UUCP id NAA15107; Sun, 9 Mar 1997 13:13:11 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0w3Vh8-0005FcC; Sat, 8 Mar 97 23:38 GMT
Message-Id: <m0w3Vh8-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Keep-Alive size
To: luigi@labinfo.iet.unipi.it (Luigi Rizzo)
Date: Sat, 8 Mar 1997 23:38:30 +0000 (GMT)
Cc: mouse@Rodents.Montreal.QC.CA, tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199703081839.TAA22453@labinfo.iet.unipi.it> from "Luigi Rizzo" at Mar 8, 97 07:39:32 pm
Content-Type: text
Sender: owner-tcp-impl
Precedence: bulk

> This suggests that acking zero-sized Keep-Alives should be mandatory.

Sure, so how do I tell a keepalive from an empty ack frame?  There are stacks
out there that go into tcp food fights if you are too cavalier about acking
stuff. Not many but enough.

Alan


From owner-tcp-impl  Mon Mar 10 10:49:28 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA00689 for tcp-impl-list; Mon, 10 Mar 1997 10:48:13 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA00682 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 02:48:11 -0800
Received: from fly.cnuce.cnr.it (fly.cnuce.cnr.it [131.114.192.86]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id CAA19642 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 02:48:07 -0800
Received: by fly.cnuce.cnr.it (Smail3.1.26.7 #3)
	id m0w42aD-00004tC; Mon, 10 Mar 97 11:45 MET
Message-Id: <m0w42aD-00004tC@fly.cnuce.cnr.it>
Date: Mon, 10 Mar 97 11:45 MET
From: Francesco Potorti` <F.Potorti@cnuce.cnr.it>
To: IETF TCP implementation <tcp-impl@relay.engr.SGI.COM>
In-reply-to: <199703081759.MAA21194@Twig.Rodents.Montreal.QC.CA> (mouse@Rodents.Montreal.QC.CA)
Subject: Re: Keep-Alive size
Organization: CNUCE-CNR, Via S.Maria 36, Pisa - Italy +39-50-593211
Sender: owner-tcp-impl
Precedence: bulk

Someone (sorry, I lost the original message) wrote:

   Basically all bets are off.  The best thing to do on the send side
   is clearly to resend the last byte.

Well, why?  RFC 1122 speaks about a "garbage octet", and that looks
fine to me.  The important thing is that, as stated by 1122, SEG.SEQ =
SND.NXT-1, which is always outside the receiver's window, so the data
will not even be looked at.
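
A minimal sketch of that sequence-number arithmetic, in Python for
brevity.  The function names are hypothetical and not taken from any
stack; the only rule borrowed from RFC 1122 is SEG.SEQ = SND.NXT-1.
It assumes an idle connection on which everything sent has been acked,
so the receiver's RCV.NXT equals the sender's SND.NXT.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits wide and wrap around


def keepalive_seq(snd_nxt: int) -> int:
    """Sequence number for a keep-alive probe: SND.NXT - 1, modulo 2**32."""
    return (snd_nxt - 1) % SEQ_MOD


def in_receive_window(seg_seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seg_seq lies in [RCV.NXT, RCV.NXT + RCV.WND), modulo 2**32."""
    # The modular distance from RCV.NXT is below RCV.WND exactly when the
    # sequence number falls inside the window, even across a wraparound.
    return (seg_seq - rcv_nxt) % SEQ_MOD < rcv_wnd


# Idle connection: the receiver's RCV.NXT matches the sender's SND.NXT.
snd_nxt = rcv_nxt = 1001
probe = keepalive_seq(snd_nxt)
print(probe, in_receive_window(probe, rcv_nxt, 65535))
```

If no data has been sent in that direction, SND.NXT is ISS+1, so the
probe carries the SYN's own sequence number, which is the case der
Mouse asked about.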

der Mouse <mouse@Rodents.Montreal.QC.CA> wrote:
   
   What if there is no last byte (ie, if that direction has not sent any
   data since the initial SYN/SYN-ACK/ACK exchange)?

From what I said above, it does not matter.
   
   Would it do any harm to invent a fictitious last byte and send it?
   In this condition it's guaranteed to be outside the receiver's idea
   of valid sequence numbers, since it duplicates the SYN's sequence
   number.

Exactly.

Then Luigi Rizzo <luigi@labinfo.iet.unipi.it> added:

   perhaps the correct wording should have been "resend the item with
   the last acked sequence number -- be it a data byte or a SYN.

   However another problem comes to mind -- the sender might not have
   the "last byte" available anymore since, once acked, data can be
   flushed (except in the case of a SYN of course). One can send a
   random byte of course, but then the receiver might really become
   suspicious....

Indeed, always sending a garbage byte of data should do no harm,
because it would be simply discarded.  So no problems exist about
the sender having no copy of the last byte sent, or the receiver
doing any checks and becoming suspicious.

   This suggests that acking zero-sized Keep-Alives should be
   mandatory.

This is what rfc 793 and rfc 1122 imply.  

However, rfc 1122 also states that some implementations fail to
acknowledge an out-of-window segment with no data.  In my opinion,
the obvious consequence is that, contrary to what rfc 1122 recommends,
keepalives should always contain data.

To this Vern Paxson <vern@ee.lbl.gov> responded:

   I'm confused about how to distinguish [zero-sized keepalives] from
   "pure" acks, which shouldn't be acked.

Normal acks with no data are inside the receive window, while
zero-sized keepalives are not.

-- 
Francesco Potorti` (researcher)        Voice:    +39-50-593203
Computer Network Division              Operator: +39-50-593211
CNUCE-CNR, Via Santa Maria 36          Fax:      +39-50-904052
56126 Pisa - Italy                     Email:    F.Potorti@cnuce.cnr.it

From owner-tcp-impl  Mon Mar 10 10:53:08 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA01135 for tcp-impl-list; Mon, 10 Mar 1997 10:52:15 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA01128 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 02:52:13 -0800
Received: from fly.cnuce.cnr.it (fly.cnuce.cnr.it [131.114.192.86]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id CAA20180 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 02:52:10 -0800
Received: by fly.cnuce.cnr.it (Smail3.1.26.7 #3)
	id m0w42gp-00005NC; Mon, 10 Mar 97 11:52 MET
Message-Id: <m0w42gp-00005NC@fly.cnuce.cnr.it>
Date: Mon, 10 Mar 97 11:52 MET
From: Francesco Potorti` <F.Potorti@cnuce.cnr.it>
To: IETF TCP implementation <tcp-impl@relay.engr.SGI.COM>
In-reply-to: <m0w3Vab-0005FcC@lightning.swansea.linux.org.uk> (alan@lxorguk.ukuu.org.uk)
Subject: Re: Keep-Alive size
Organization: CNUCE-CNR, Via S.Maria 36, Pisa - Italy +39-50-593211
Sender: owner-tcp-impl
Precedence: bulk

alan@lxorguk.ukuu.org.uk (Alan Cox) wrote:

   > Would it do any harm to invent a fictitious last byte and send
   > it?  In this condition it's guaranteed to be outside the
   > receiver's idea of valid sequence numbers, since it duplicates
   > the SYN's sequence number.
   
   You couldnt do that if real data was to follow in time, 

Why?
							   nor could
   you send the byte after the byte next in the case the other end had
   shut down the sending side (it may well send you an RST).
   
I don't follow you here.  Could you explain a bit?

Thanks

-- 
Francesco Potorti` (researcher)        Voice:    +39-50-593203
Computer Network Division              Operator: +39-50-593211
CNUCE-CNR, Via Santa Maria 36          Fax:      +39-50-904052
56126 Pisa - Italy                     Email:    F.Potorti@cnuce.cnr.it

From owner-tcp-impl  Mon Mar 10 14:16:36 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA18040 for tcp-impl-list; Mon, 10 Mar 1997 14:15:08 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA18024 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 06:15:06 -0800
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id GAA18304 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 06:15:03 -0800
Received: from Holland.Sun.COM ([129.159.201.1]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id GAA11553 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 06:05:01 -0800
Received: from albano by Holland.Sun.COM (SMI-8.6/SMI-SVR4-sd.fkk200)
	id PAA19603; Mon, 10 Mar 1997 15:05:50 +0100
Received: from holland by albano (SMI-8.6/SMI-SVR4-se.fkk201)
	id PAA07230; Mon, 10 Mar 1997 15:05:50 +0100
Message-Id: <199703101405.PAA07230@albano>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Keep-Alive size 
In-reply-to: Your message of "Sat, 08 Mar 1997 19:39:32 +0100."
             <199703081839.TAA22453@labinfo.iet.unipi.it> 
Date: Mon, 10 Mar 1997 15:05:01 +0100
From: Casper Dik <casper@holland.Sun.COM>
Sender: owner-tcp-impl
Precedence: bulk


>perhaps the correct wording should have been "resend the item with the
>last acked sequence number -- be it a data byte or a SYN.
>
>However another problem comes to mind -- the sender might not have the
>"last byte" available anymore since, once acked, data can be flushed
>(except in the case of a SYN of course). One can send a random byte of
>course, but then the receiver might really become suspicious....
>
>This suggests that acking zero-sized Keep-Alives should be mandatory.


Well, the RFC also says:

            It is extremely important to remember that ACK segments that
            contain no data are not reliably transmitted by TCP.
            Consequently, if a keep-alive mechanism is implemented it
            MUST NOT interpret failure to respond to any specific probe
            as a dead connection.

So it seems that while you're told to implement keepalives as
ack-only packets, you're *NOT* allowed to sever a connection if
you only send ack-only packets and don't get any responses.

When talking to a conforming TCP stack that starts sending keep-alives to
you, you can toss them until they contain one data byte.

Casper

From owner-tcp-impl  Mon Mar 10 15:13:06 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA23489 for tcp-impl-list; Mon, 10 Mar 1997 15:11:29 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA23483 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 07:11:27 -0800
Received: from external.BSDI.COM (external.BSDI.COM [205.230.225.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA28408 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 07:11:24 -0800
Received: from forge.BSDI.COM (dab@forge.BSDI.COM [205.230.224.24]) by external.BSDI.COM (8.8.5/8.8.2) with ESMTP id IAA23148 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 08:10:31 -0700 (MST)
Received: (from dab@localhost) by forge.BSDI.COM (8.8.2/8.7.3) id IAA19462 for tcp-impl@relay.engr.SGI.COM; Mon, 10 Mar 1997 08:10:31 -0700 (MST)
Date: Mon, 10 Mar 1997 08:10:31 -0700 (MST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199703101510.IAA19462@forge.BSDI.COM>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Keep-Alive size
Sender: owner-tcp-impl
Precedence: bulk


As to whether or not to send a garbage byte of data in the keep-alive,
there is an easy solution: punt and do both.

Start off by sending keep-alive probes without any data.  If you get
halfway through the keep-alives without any response, switch to
sending keep-alives with a garbage data byte.
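
A rough sketch of that schedule, in Python for brevity.  The probe
count of 8 and the function names are assumptions made for the
example, and rounding "halfway" down for odd counts is an arbitrary
choice.

```python
def probe_has_garbage_byte(probe_index: int, max_probes: int) -> bool:
    """Probes are numbered from 0.  The first half carry no data; the rest
    carry one garbage byte."""
    return probe_index >= max_probes // 2


def probe_payloads(max_probes: int) -> list[bytes]:
    """Payload of each keep-alive probe, in the order they would be sent."""
    return [
        b"\x00" if probe_has_garbage_byte(i, max_probes) else b""
        for i in range(max_probes)
    ]


# Hypothetical limit of 8 probes: 4 empty ones, then 4 with a garbage byte.
print(probe_payloads(8))
```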

Also, remember that keep-alives are *not* part of TCP (but many people
feel they are needed).  RFC 1122 doesn't mandate what a keep-alive packet
should look like, but indicates what most people use.  4.2BSD sent one
garbage byte in its keep-alive.  The Host Requirements WG generally
agreed that it would be cleaner to not send a garbage byte, but the
fact that some hosts didn't respond unless there was a garbage byte
(4.2BSD, I think???) forced the wording that is in 1122.

		-David Borman, dab@bsdi.com


From owner-tcp-impl  Mon Mar 10 15:18:04 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA24413 for tcp-impl-list; Mon, 10 Mar 1997 15:17:05 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA24407 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 07:17:02 -0800
Received: from eamail1.unisys.com (eamail1.unisys.com [192.61.103.80]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA29753 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 07:17:00 -0800
Received: from ih85.ea.unisys.com (ih85.ea.unisys.com [192.61.103.85]) by eamail1.unisys.com (8.7.3/8.6.12) with ESMTP id PAA20949 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 15:16:51 GMT
Received: from pl_exchange_1.pl.unisys.com ([192.62.193.232]) by ih85.ea.unisys.com (8.7.3/8.7.3) with SMTP id PAA29526 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 15:16:37 GMT
Received: by pl_exchange_1.pl.unisys.com with SMTP (Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63)
	id <01BC2D3C.5D1C6180@pl_exchange_1.pl.unisys.com>; Mon, 10 Mar 1997 10:18:14 -0500
Message-ID: <c=US%a=_ATTMAIL%p=UNISYS%l=RV-EXCHANGE-2-970310151538Z-5395@pl_exchange_1.pl.unisys.com>
From: "Smith, Allyn D" <Al.Smith@UNISYS.com>
To: "'tcp-impl@relay.engr.SGI.COM'" <tcp-impl@relay.engr.SGI.COM>
Subject: re: Keep Alive size
Date: Mon, 10 Mar 1997 10:15:38 -0500
X-Mailer:  Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

Francesco Potorti is correct in his interpretations. I think we need to
keep in mind what a keepalive really is. It may be either:
   1) an ACK segment with an OLD sequence number that does not contain
      data
or
   2) an ACK segment with an OLD sequence number that does contain data;
      however, the data is outside the window (OLD data).

RFC 793 page 69 clearly states that a TCP should first check the
sequence number of an arriving segment. If the segment is not acceptable
(an old sequence number with no data or data outside the window would
NOT be acceptable), an ACK must be sent after which the unacceptable
segment must be dropped.
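
The acceptability test from RFC 793 can be sketched as follows, in
Python for brevity.  The four cases follow the table in RFC 793; the
function names and the string return values are hypothetical, and the
rest of segment processing is left out.

```python
SEQ_MOD = 1 << 32  # sequence numbers are 32 bits wide and wrap around


def _in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seq lies in [RCV.NXT, RCV.NXT + RCV.WND), modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def segment_acceptable(seg_seq: int, seg_len: int,
                       rcv_nxt: int, rcv_wnd: int) -> bool:
    """The four-case acceptability test on an arriving segment."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt
        return _in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        return False  # data can never be accepted into a zero window
    last = (seg_seq + seg_len - 1) % SEQ_MOD
    # Acceptable if the first or the last octet falls inside the window.
    return (_in_window(seg_seq, rcv_nxt, rcv_wnd)
            or _in_window(last, rcv_nxt, rcv_wnd))


def on_segment(seg_seq: int, seg_len: int, rst: bool,
               rcv_nxt: int, rcv_wnd: int) -> str:
    """First step of segment processing: what to do with the segment."""
    if segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
        return "process"
    # An unacceptable segment is answered with an ACK, unless it carries RST.
    return "drop" if rst else "ack_and_drop"


# With RCV.NXT = 5000: a pure ACK, then both forms of keep-alive.
print(on_segment(5000, 0, False, 5000, 8192))
print(on_segment(4999, 0, False, 5000, 8192))
print(on_segment(4999, 1, False, 5000, 8192))
```

Under this reading both forms of keep-alive fail the test and draw an
ACK, while an ordinary empty ACK at RCV.NXT passes it and is simply
processed.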

The previous paragraph implies that a TCP that does not send an ACK in
response to an old sequence number (whether it contains data or not) is
in violation of RFC 793. Forcing every other implementation to send a
data byte to make a broken TCP behave properly is not acceptable.

Casper Dik writes:
>Well, the RFC also says:
>
>            It is extremely important to remember that ACK segments that
>            contain no data are not reliably transmitted by TCP.
>            Consequently, if a keep-alive mechanism is implemented it
>            MUST NOT interpret failure to respond to any specific probe
>            as a dead connection.
>
>So it seems that while you're told to implement keepalives as
>ack-only packets, you're *NOT* allowed to sever a connection if
>you only send ack-only packets and don't get any responses.
>
>When talking to a conforming TCP stack that starts sending keep-alives to
>you, you can toss them until they contain one data byte.

I think the interpretation of that RFC 1122 fragment is:

An ACK with no data is not transmitted reliably because ACKs are not
retransmitted. ACKs may be lost and will not be retransmitted,
therefore, the failure of a response to a single, SPECIFIC probe should
not cause the connection to be severed. This is why you should not rely
on 1 single keepalive segment to determine if the peer has gone
belly-up. A TCP must probe periodically a number of times before
determining that the peer has gone away. So, I think the interpretation
that you're not allowed to sever a connection if you send ACK-only
segments is not quite correct. I think RFC 1122 means that you should
not sever a connection if there is an occasional lack of response. You
SHOULD sever the connection if the peer never responds to an ACK-only
keep-alive segment. Again, RFC 793 does not say you can ignore segments
with old sequence numbers. RFC 1122 does not say that either. You must
respond to these old segments with an ACK.
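
The "probe a number of times" reading above could be sketched roughly as
follows. This is only an illustration: the structure, the function names,
and the limit of 8 probes are assumptions made for the example, not values
taken from RFC 793, RFC 1122, or any particular stack.

```c
#include <stdbool.h>

/* Assumed limit, chosen only for illustration. */
#define KEEPALIVE_MAX_PROBES 8

/* Hypothetical per-connection keep-alive bookkeeping. */
struct keepalive_state {
    int unanswered;   /* consecutive probes that drew no response */
};

/*
 * Called when a probe interval expires without any segment from the peer.
 * Returns true only once KEEPALIVE_MAX_PROBES consecutive probes have gone
 * unanswered, so a single lost probe or lost ACK never severs the connection.
 */
static bool keepalive_probe_timed_out(struct keepalive_state *ka)
{
    ka->unanswered++;
    return ka->unanswered >= KEEPALIVE_MAX_PROBES;
}

/* Called whenever any segment arrives from the peer: it is still alive. */
static void keepalive_peer_heard(struct keepalive_state *ka)
{
    ka->unanswered = 0;
}
```

With this bookkeeping an occasional missing response only bumps the counter,
while a peer that never answers is eventually declared dead.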

Regards,
Al Smith
UNISYS Corp.


From owner-tcp-impl  Mon Mar 10 15:31:09 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA25930 for tcp-impl-list; Mon, 10 Mar 1997 15:29:48 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA25924 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 07:29:47 -0800
Received: from lynx.europe.shiva.com (lynx.europe.shiva.com [134.191.64.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA02073 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 07:29:30 -0800
Received: from chryses.europe.shiva.com (chryses.europe.shiva.com [89.0.0.223])
	by lynx.europe.shiva.com (8.8.5/8.8.5) with ESMTP id PAA26848;
	Mon, 10 Mar 1997 15:18:41 GMT
Received: from black.europe.shiva.com (black.europe.shiva.com [134.191.8.140])
	by chryses.europe.shiva.com (8.8.5/8.8.5) with SMTP id PAA11014;
	Mon, 10 Mar 1997 15:18:40 GMT
Received: from localhost by black.europe.shiva.com (SMI-8.6/SMI-SVR4)
	id PAA16115; Mon, 10 Mar 1997 15:18:40 GMT
Date: Mon, 10 Mar 1997 15:18:40 +0000 (GMT)
From: Malcolm Campbell <malcolmc@europe.shiva.com>
To: David Borman <dab@bsdi.com>
cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: Keep-Alive size
In-Reply-To: <199703101510.IAA19462@forge.BSDI.COM>
Message-ID: <Pine.GSO.3.95.970310151616.15976F-100000@black.europe.shiva.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

On Mon, 10 Mar 1997, David Borman wrote:
> Also, remember that keep-alives are *not* part of TCP (but many people
> feel they are needed).  RFC 1122 doesn't specify what a keep-alive packet
> should look like, but indicates what most people use.  4.2BSD sent one
> garbage byte in its keep-alive.  The Host Requirements WG generally
> agreed that it would be cleaner to not send a garbage byte, but the
> fact that some hosts didn't respond unless there was a garbage byte
> (4.2BSD, I think???) forced the wording that is in 1122.

I'm seeing implementations which send the first byte of pending data in a
keepalive (and send them so fast that they almost always cross-over with
windows opening up). 

Sending a byte of data seems to make sense if you have one to send. Or
sending out-of-window data, so that it never gets accepted. A zero-byte
keepalive isn't distinguishable from a plain ACK, is it?

-- Malcolm



From owner-tcp-impl  Mon Mar 10 16:02:22 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA00125 for tcp-impl-list; Mon, 10 Mar 1997 16:01:03 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA00118 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 08:01:00 -0800
Received: from external.BSDI.COM (external.BSDI.COM [205.230.225.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA08684 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 08:00:57 -0800
Received: from forge.BSDI.COM (dab@forge.BSDI.COM [205.230.224.24]) by external.BSDI.COM (8.8.5/8.8.2) with ESMTP id JAA26553; Mon, 10 Mar 1997 09:00:13 -0700 (MST)
Received: (from dab@localhost) by forge.BSDI.COM (8.8.2/8.7.3) id JAA20290; Mon, 10 Mar 1997 09:00:13 -0700 (MST)
Date: Mon, 10 Mar 1997 09:00:13 -0700 (MST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199703101600.JAA20290@forge.BSDI.COM>
To: dab@BSDI.COM, malcolmc@europe.shiva.com
Subject: Re: Keep-Alive size
Cc: tcp-impl@relay.engr.SGI.COM
Sender: owner-tcp-impl
Precedence: bulk

> Date: Mon, 10 Mar 1997 15:18:40 +0000 (GMT)
> From: Malcolm Campbell <malcolmc@europe.shiva.com>
> Subject: Re: Keep-Alive size
> ...
> I'm seeing implementations which send the first byte of pending data in a
> keepalive (and send them so fast that they almost always cross-over with
> windows opening up). 

I'm not sure I understand.  This sounds more like a zero length window
probe (which is part of TCP), not a keep-alive.

> Sending a byte of data seems to make sense if you have one to send. Or
> sending out-of-window data, so that it never gets accepted. A zero-byte

Keep-alives are outside the window, (before it, to be precise), so
the one byte of data is never accepted.

> keepalive isn't distinguishable from a plain ACK, is it?

You can check "SEG.SEQ == RCV.NXT - 1" to identify it.
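
A rough C sketch of that check, assuming 32-bit sequence numbers held in
uint32_t (the function name is made up for illustration and is not from
any particular stack):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when the segment's sequence number is exactly
 * one octet below what the receiver expects next, which is the signature
 * of a keep-alive probe as described above.
 * Unsigned subtraction wraps modulo 2^32, matching TCP sequence space,
 * so the check also works when RCV.NXT is 0.
 */
static bool looks_like_keepalive(uint32_t seg_seq, uint32_t rcv_nxt)
{
    return seg_seq == (uint32_t)(rcv_nxt - 1u);
}
```

A plain in-sequence ACK has SEG.SEQ equal to RCV.NXT, so it does not match.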

			-David Borman, dab@bsdi.com


From owner-tcp-impl  Mon Mar 10 19:22:35 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA14884 for tcp-impl-list; Mon, 10 Mar 1997 19:20:09 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA14870 for <tcp-impl@relay.engr.sgi.com>; Mon, 10 Mar 1997 11:20:07 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA02248 for <tcp-impl@relay.engr.sgi.com>; Mon, 10 Mar 1997 11:20:06 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id LAA14600; Mon, 10 Mar 1997 11:10:02 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id LAA18422; Mon, 10 Mar 1997 11:10:00 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id LAA10233; Mon, 10 Mar 1997 11:09:58 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id LAA18607; Mon, 10 Mar 1997 11:08:31 -0800
Date: Mon, 10 Mar 1997 11:08:31 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199703101908.LAA18607@taipei.eng.sun.com>
To: vern@ee.lbl.gov
Subject: Re: Keep-Alive size
Cc: tcp-impl@relay.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

The key difference is that the keep-alive carries a bad sequence number
(SEG.SEQ = RCV.NXT -1). That's the trick to trigger an ack. And the
"acceptability test" in page 68 of RFC793 implies this kind of
keep-alive probe is unacceptable, and an ack should be generated.

As long as the stacks on both sides implement the same acceptability policy,
and as long as neither stack generates an "unacceptable" packet in response
to an incoming "unacceptable" packet, we should be immune from an ack war.
(We fixed a bug not long ago in Solaris where a bad sequence number was
used in response to a zero-window probe, which caused another probe, which
caused a packet w/ a bad sequence number in reply...)

Jerry

> From owner-tcp-impl@relay.engr.SGI.COM Sat Mar  8 12:09 PST 1997
> To: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
> Cc: mouse@Rodents.Montreal.QC.CA (der Mouse), tcp-impl@relay.engr.SGI.COM
> Subject: Re: Keep-Alive size
> Date: Sat, 08 Mar 1997 11:47:23 PST
> From: Vern Paxson <vern@ee.lbl.gov>
> 
> > This suggests that acking zero-sized Keep-Alives should be mandatory.
> 
> I'm confused about how to distinguish these from "pure" acks, which
> shouldn't be acked.
> 
> 		Vern
> 

From owner-tcp-impl  Mon Mar 10 20:28:01 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA01383 for tcp-impl-list; Mon, 10 Mar 1997 20:25:25 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA01376 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 12:25:23 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA20549 for <tcp-impl@relay.engr.SGI.COM>; Mon, 10 Mar 1997 12:25:21 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id MAA20843; Mon, 10 Mar 1997 12:15:22 -0800 (PST)
Message-Id: <199703102015.MAA20843@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Keep Alive size
In-reply-to: Your message of Mon, 10 Mar 1997 10:15:38 PST.
Date: Mon, 10 Mar 1997 12:15:21 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

Okay, it seems clear from the parts of RFC 793 and RFC 1122 that others pointed
to that a zero-data, below-sequence packet is supposed to elicit an ack.
In particular, the table on page 69 of RFC 793 says that a below-sequence
pure ack fails the acceptability test and should be acked, and section 4.2.3.6
of RFC 1122 is clear on saying that zero-length SHOULD be acceptable but
a garbage octet MAY be sent for compatibility with broken implementations.

I have traces of some Reno-derived TCPs that don't ack below-sequence
acks.  But these traces are for out-of-order acks that are more than one
octet below sequence, due to out-of-order delivery.  So that doesn't mean
zero-data keep-alives won't be acked, because there's special-case code in
(at least our) Reno source for packets that are exactly one octet below
sequence.

I'm now working on a draft doc cataloging implementation issues, which I'll
be sending to the list next week.  I plan to add this one as something like:

	1)  Some TCPs don't in general ack below-sequence acks.

	2)  Some TCPs don't ack zero-length data packets that are one
	    octet below-sequence, which breaks "keep-alive" strategies
	    (and is also non-conformant behavior, as in (1)).

	3)  A workaround for keep-alives is, as already stated in RFC 1122,
	    to send a single byte of data that's one octet below-sequence.

	    I don't see a problem with this being a "garbage" byte,
	    as noted in 1122, as we haven't heard of a TCP that actually
	    cares about the contents of below-sequence data.

	4)  end2end should consider whether the 1122 successor ("1122:NG")
	    should be more explicit about keep-alive; for example, whether
	    to change the SHOULD above to a MUST.

	    If end2end does this, then they might also consider whether
	    to continue the requirement that TCPs ack below-sequence acks
	    that are *not* possible keep-alives, because (1) de facto,
	    plenty of implementations don't do this, with no problems
	    reported [that I know of]; and (2) these will generate extraneous
	    traffic in the presence of reordering, which along some paths
	    is quite frequent.

Comments?  (Either to the list or privately ...)

		Vern

From owner-tcp-impl  Tue Mar 11 01:22:22 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA12933 for tcp-impl-list; Tue, 11 Mar 1997 01:20:26 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA12868 for <tcp-impl@relay.engr.sgi.com>; Mon, 10 Mar 1997 17:20:20 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA03260 for <tcp-impl@relay.engr.sgi.com>; Mon, 10 Mar 1997 17:20:18 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id RAA26278; Mon, 10 Mar 1997 17:10:10 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id RAA06180; Mon, 10 Mar 1997 17:10:09 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id RAA02551; Mon, 10 Mar 1997 17:10:08 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id RAA18727; Mon, 10 Mar 1997 17:08:40 -0800
Date: Mon, 10 Mar 1997 17:08:40 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199703110108.RAA18727@taipei.eng.sun.com>
To: vern@ee.lbl.gov
Subject: Re: Keep Alive size
Cc: tcp-impl@relay.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

>	    If end2end does this, then they might also consider whether
>	    to continue the requirement that TCPs ack below-sequence acks
>	    that are *not* possible keep-alives, because (1) de facto,
>	    plenty of implementations don't do this, with no problems
>	    reported [that I know of]; and (2) these will generate extraneous
>	    traffic in the presence of reordering, which along some paths
>	    is quite frequent.

Talking about reducing extraneous traffic, a nit in the acceptability
test:

        Segment Receive  Test
        Length  Window
        ------- -------  -------------------------------------------

           0       0     SEG.SEQ = RCV.NXT
 
           0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
					    ^
					should be <=

A perfectly good tcp stack on both sides can hit this case due to packet
reordering (an ack-only packet got ahead of a data packet transmitted
earlier).

Jerry

From owner-tcp-impl  Tue Mar 11 02:46:47 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA29321 for tcp-impl-list; Tue, 11 Mar 1997 02:45:20 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA29314 for <tcp-impl@relay.engr.sgi.com>; Mon, 10 Mar 1997 18:45:18 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA20808 for <tcp-impl@relay.engr.sgi.com>; Mon, 10 Mar 1997 18:45:16 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id SAA21896; Mon, 10 Mar 1997 18:35:10 -0800 (PST)
Message-Id: <199703110235.SAA21896@daffy.ee.lbl.gov>
To: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Cc: tcp-impl@relay.engr.sgi.com
Subject: Re: Keep Alive size
In-reply-to: Your message of Mon, 10 Mar 1997 17:08:40 PST.
Date: Mon, 10 Mar 1997 18:35:10 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

>            0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
> 					      ^
> 					should be <=
> 
> A perfectly good tcp stack on both sides can hit this case due to packet
> reordering (an ack-only packet got ahead of a data packet transmitted
> earlier).

Are you sure?  RCV.NXT+RCV.WND is one beyond the upper edge of the offered
window.  So even in the presence of reordering, I don't see a mechanism for 
a sender legitimately sending such a packet.

		Vern

From owner-tcp-impl  Tue Mar 11 10:46:18 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA17297 for tcp-impl-list; Tue, 11 Mar 1997 10:44:55 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA17291 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 02:44:52 -0800
Received: from fly.cnuce.cnr.it (fly.cnuce.cnr.it [131.114.192.86]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id CAA29852 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 02:44:49 -0800
Received: by fly.cnuce.cnr.it (Smail3.1.26.7 #3)
	id m0w4P0T-00005XC; Tue, 11 Mar 97 11:42 MET
Message-Id: <m0w4P0T-00005XC@fly.cnuce.cnr.it>
Date: Tue, 11 Mar 97 11:42 MET
From: Francesco Potorti` <F.Potorti@cnuce.cnr.it>
To: Vern Paxson <vern@ee.lbl.gov>
CC: tcp-impl@relay.engr.SGI.COM,
        hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
In-reply-to: <199703110235.SAA21896@daffy.ee.lbl.gov> (vern@ee.lbl.gov)
Subject: Re: Keep Alive size
Organization: CNUCE-CNR, Via S.Maria 36, Pisa - Italy +39-50-593211
Sender: owner-tcp-impl
Precedence: bulk

   >	    Segment Receive  Test
   >	    Length  Window
   >	    ------- -------  -------------------------------------------
   >	        0       0     SEG.SEQ = RCV.NXT
   >            0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
   > 					         ^
   >						should be <=
   > 
   > A perfectly good tcp stack on both sides can hit this case due to packet
   > reordering (an ack-only packet got ahead of a data packet transmitted
   > earlier).
   
   Are you sure?  RCV.NXT+RCV.WND is one beyond the upper edge of the
   offered window.  So even in the presence of reordering, I don't see
   a mechanism for a sender legitimately sending such a packet.

Yes, if the packet contains no data (we are speaking of segment length
= 0 here).  If the sender has sent all data it can send, filling the
receive window, but the receiver has not received anything, every pure
ack that the sender sends afterwards will have SEG.SEQ=RCV.NXT+RCV.WND.

Should this case really be treated specially?

-- 
Francesco Potorti` (researcher)        Voice:    +39-50-593203
Computer Network Division              Operator: +39-50-593211
CNUCE-CNR, Via Santa Maria 36          Fax:      +39-50-904052
56126 Pisa - Italy                     Email:    F.Potorti@cnuce.cnr.it

From owner-tcp-impl  Tue Mar 11 15:20:04 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA06293 for tcp-impl-list; Tue, 11 Mar 1997 15:18:44 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA06287 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 07:18:42 -0800
Received: from eamail1.unisys.com (eamail1.unisys.com [192.61.103.80]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA10869 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 07:18:40 -0800
Received: from ih85.ea.unisys.com (ih85.ea.unisys.com [192.61.103.85]) by eamail1.unisys.com (8.7.3/8.6.12) with ESMTP id PAA15908 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 15:18:24 GMT
Received: from ea_ihx102.ea.unisys.com (ihx102.ea.unisys.com [192.61.144.52]) by ih85.ea.unisys.com (8.7.3/8.7.3) with SMTP id PAA16401 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 15:18:17 GMT
Received: by ea_ihx102.ea.unisys.com with SMTP (Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63)
	id <01BC2E30.FC139160@ea_ihx102.ea.unisys.com>; Tue, 11 Mar 1997 15:29:18 -0000
Message-ID: <c=US%a=_ATTMAIL%p=UNISYS%l=RV-EXCHANGE-2-970311151706Z-7266@ea_ihx102.ea.unisys.com>
From: "Smith, Allyn D" <Al.Smith@UNISYS.com>
To: "'tcp-impl@relay.engr.SGI.COM'" <tcp-impl@relay.engr.SGI.COM>
Subject: Re: Keep Alive size
Date: Tue, 11 Mar 1997 15:17:06 -0000
X-Mailer:  Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl
Precedence: bulk

>>    Segment Receive  Test
>>    Length  Window
>>    ------- -------  -------------------------------------------
>>        0       0     SEG.SEQ = RCV.NXT
>>        0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
>>					    ^
>>					should be <=
>> 
>> A perfectly good tcp stack on both sides can hit this case due to packet
>> reordering (an ack-only packet got ahead of a data packet transmitted
>> earlier).
   
>   Are you sure?  RCV.NXT+RCV.WND is one beyond the upper edge of the
>   offered window.  So even in the presence of reordering, I don't see
>   a mechanism for a sender legitimately sending such a packet.

It appears to me that in the case of zero length segments, the test
should be <= but in the case of non-zero segments the test should be <.
Here's why.

Consider the case of a 2-way simultaneous data exchange between TCP A
and B with packet reordering occurring. 
1) A sends a window's worth of data to B
2) B is simultaneously sending some data to A
3) A ACKs B's data.
4) ACK packet passes data
5) B receives ACK packet (0 length, SEG.SEQ = RCV.NXT+ RCV.WND) 

The consequences of not processing the ACK are that B may have to
retransmit the data that A already received and ACKed. 

It seems to me that having the condition SEG.SEQ < RCV.NXT+RCV.WND for 0
length segments is at the expense of retransmissions.

Regards,
Al Smith
UNISYS Corp.

From owner-tcp-impl  Tue Mar 11 19:23:49 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA25718 for tcp-impl-list; Tue, 11 Mar 1997 19:22:14 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA25712 for <tcp-impl@relay.engr.sgi.com>; Tue, 11 Mar 1997 11:22:12 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA13799 for <tcp-impl@relay.engr.sgi.com>; Tue, 11 Mar 1997 11:22:11 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id LAA00450 for <tcp-impl@relay.engr.sgi.com>; Tue, 11 Mar 1997 11:12:07 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id LAA28186; Tue, 11 Mar 1997 11:12:06 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id LAA13475; Tue, 11 Mar 1997 11:12:02 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id LAA19164; Tue, 11 Mar 1997 11:10:34 -0800
Date: Tue, 11 Mar 1997 11:10:34 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199703111910.LAA19164@taipei.eng.sun.com>
To: tcp-impl@relay.engr.sgi.com
Subject: Re: Keep Alive size
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

This is my original reply to Vern and it looks like other people might
be interested. Vern also pointed out not only packet reordering, but
even just packet loss can cause it to happen.

The point is that I don't see any use of generating an ACK in response
to such a packet here other than a remote possibility of triggering
a fast retransmission...

Jerry

----- Begin Included Message -----

>From hkchu Mon Mar 10 20:12:55 1997
To: vern@ee.lbl.gov
Subject: Re: Keep Alive size

The key is ack-only packets. They can start on the left edge of the
window. But since they have zero-length, you can't say they have gone
over the limit and violated TCP window. Consider this:

receive side:
RCV.NXT = 10000
RCV.WND = 5

send side:
SND.NXT = 10000
SND.UNA = 10000

A packet w/ seq = 10000 length = 5 is sent from the SND side.
SND.NXT becomes 10005 and SND.UNA stays at 10000. Then something from
the RCV side causes the SND to ack with an ack-only packet of
seq = 10005 (use SND.NXT, not SND.UNA, right?) length = 0.

If the ack-only packet arrives before the first packet, it will
fail the old test.
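
The example above can be checked with a small C sketch of the zero-length
acceptability test for a non-zero receive window. The function names are
invented for illustration; the "relaxed" variant is the proposed <= form
from this thread, not text from RFC 793. The signed-difference comparison
is the usual idiom for modular sequence arithmetic and assumes
two's-complement conversion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequence-space comparisons that tolerate 32-bit wraparound. */
static bool seq_lt(uint32_t a, uint32_t b)  { return (int32_t)(a - b) < 0; }
static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/* RFC 793 test for a zero-length segment when RCV.WND > 0:
 * RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND */
static bool accept_zero_len_rfc793(uint32_t seg_seq, uint32_t rcv_nxt,
                                   uint32_t rcv_wnd)
{
    return seq_leq(rcv_nxt, seg_seq) && seq_lt(seg_seq, rcv_nxt + rcv_wnd);
}

/* Proposed variant from this thread:
 * RCV.NXT =< SEG.SEQ =< RCV.NXT+RCV.WND */
static bool accept_zero_len_relaxed(uint32_t seg_seq, uint32_t rcv_nxt,
                                    uint32_t rcv_wnd)
{
    return seq_leq(rcv_nxt, seg_seq) && seq_leq(seg_seq, rcv_nxt + rcv_wnd);
}
```

With RCV.NXT = 10000 and RCV.WND = 5, a zero-length segment with
seq = 10005 is rejected by the first function and accepted by the second.
A keep-alive with seq = 9999 is rejected by both.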

BTW, has the schedule for the tcp-impl WG at the upcoming IETF been
decided?

Jerry

> From vern@ee.lbl.gov  Mon Mar 10 18:35:14 1997
> To: hkchu@pacific-86.eng.sun.com (Hsiao-keng Jerry Chu)
> Cc: tcp-impl@relay.engr.sgi.com
> Subject: Re: Keep Alive size
> Date: Mon, 10 Mar 1997 18:35:10 PST
> From: Vern Paxson <vern@ee.lbl.gov>
> 
> >            0      >0     RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND
> > 					      ^
> > 					should be <=
> > 
> > A perfectly good tcp stack on both sides can hit this case due to packet
> > reordering (an ack-only packet got ahead of a data packet transmitted
> > earlier).
> 
> Are you sure?  RCV.NXT+RCV.WND is one beyond the upper edge of the offered
> window.  So even in the presence of reordering, I don't see a mechanism for 
> a sender legitimately sending such a packet.
> 
> 		Vern
> 


----- End Included Message -----


From owner-tcp-impl  Tue Mar 11 22:51:46 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA17540 for tcp-impl-list; Tue, 11 Mar 1997 22:50:01 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA17443 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 14:49:44 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA08006 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 14:49:42 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id OAA24302; Tue, 11 Mar 1997 14:39:27 -0800 (PST)
Message-Id: <199703112239.OAA24302@daffy.ee.lbl.gov>
To: Al.Smith@UNISYS.com
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: Keep Alive size
In-reply-to: Your message of Tue, 11 Mar 1997 15:17:06 PST.
Date: Tue, 11 Mar 1997 14:39:27 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> The consequences of not processing the ACK are that B may have to
> retransmit the data that A already received and ACKed. 
> 
> It seems to me that having the condition SEG.SEQ < RCV.NXT+RCV.WND for 0
> length segments is at the expense of retransmissions.

This sounds to me like a good argument for fixing the test.

Now, I wonder if there are implementations that actually exhibit this
problem ...

		Vern

From owner-tcp-impl  Tue Mar 11 23:55:05 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA05413 for tcp-impl-list; Tue, 11 Mar 1997 23:52:24 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA05403 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 15:52:22 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA23728 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 15:52:16 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id PAA26340 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 15:42:11 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id PAA19147; Tue, 11 Mar 1997 15:42:10 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id PAA26818; Tue, 11 Mar 1997 15:42:07 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id PAA19405; Tue, 11 Mar 1997 15:40:41 -0800
Date: Tue, 11 Mar 1997 15:40:41 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199703112340.PAA19405@taipei.eng.sun.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: window update algorithm
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

RFC793 P72 outlines a window update algorithm that seems more complex
than necessary.

	  If SND.UNA < SEG.ACK =< SND.NXT, the send window should be
          updated.  If (SND.WL1 < SEG.SEQ or (SND.WL1 = SEG.SEQ and
          SND.WL2 =< SEG.ACK)), set SND.WND <- SEG.WND, set
          SND.WL1 <- SEG.SEQ, and set SND.WL2 <- SEG.ACK.

          Note that SND.WND is an offset from SND.UNA, that SND.WL1
          records the sequence number of the last segment used to update
          SND.WND, and that SND.WL2 records the acknowledgment number of
          the last segment used to update SND.WND.  The check here
          prevents using old segments to update the window.

4.4BSD code modified it slightly:
        if ((tiflags & TH_ACK) &&
            (SEQ_LT(tp->snd_wl1, ti->ti_seq) || tp->snd_wl1 == ti->ti_seq &&
            (SEQ_LT(tp->snd_wl2, ti->ti_ack) ||
            tp->snd_wl2 == ti->ti_ack && tiwin > tp->snd_wnd))) {
 
The difference is that for pure window update packets, we only take those
which advertise larger windows, therefore preventing out-of-order window
update packets from causing hiccups if they shrink the window (moving the
right edge of the window leftwards).
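
To make the difference concrete, here is a standalone C sketch of the two
predicates. It is not the kernel code: the struct, field, and function
names are invented for illustration, and the TH_ACK flag test is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequence-space comparisons that tolerate 32-bit wraparound. */
static bool seq_lt(uint32_t a, uint32_t b)  { return (int32_t)(a - b) < 0; }
static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/* Hypothetical holder for SND.WND, SND.WL1 and SND.WL2. */
struct snd_win {
    uint32_t wnd;   /* SND.WND */
    uint32_t wl1;   /* SND.WL1: SEG.SEQ of last window update */
    uint32_t wl2;   /* SND.WL2: SEG.ACK of last window update */
};

/* RFC 793 rule: same SEG.SEQ and same-or-newer SEG.ACK always updates. */
static bool update_ok_rfc793(const struct snd_win *s,
                             uint32_t seg_seq, uint32_t seg_ack)
{
    return seq_lt(s->wl1, seg_seq) ||
           (s->wl1 == seg_seq && seq_leq(s->wl2, seg_ack));
}

/* 4.4BSD-style rule: with SEG.SEQ and SEG.ACK both unchanged (a pure
 * window update), accept only a larger advertised window. */
static bool update_ok_bsd(const struct snd_win *s, uint32_t seg_seq,
                          uint32_t seg_ack, uint32_t seg_wnd)
{
    return seq_lt(s->wl1, seg_seq) ||
           (s->wl1 == seg_seq &&
            (seq_lt(s->wl2, seg_ack) ||
             (s->wl2 == seg_ack && seg_wnd > s->wnd)));
}
```

For a segment that repeats the last SEG.SEQ and SEG.ACK but advertises a
smaller window, the first predicate accepts the update and the second
rejects it.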

If the purpose of all the checks here is to filter out old packets,
then SEG.ACK is a much better candidate than SEG.SEQ for the following
reasons: 

1. SEG.ACK more accurately reflects the time sequence because it never
goes backwards (except, of course, in a buggy implementation). On the other
hand, SEG.SEQ often goes backwards for retransmissions. Therefore, a packet
with a higher SEG.SEQ might carry an older window size than one with a
lower SEG.SEQ. Using the check "SND.WL1 < SEG.SEQ" to filter is just not
right.

2. SEG.ACK is directly tied to how the window size is calculated, but
SEG.SEQ is not. Packets with a bogus SEG.SEQ can easily pass the old test
and cause grief for the window size.

So my suggestion is to use

	SND.WL <= SEG.ACK and SND.WND < SEG.WND, where

	SND.WL records the SEG.ACK of the last update

instead. Of course SEG.ACK still has to be reasonable
(SND.UNA < SEG.ACK =< SND.NXT) to begin with.

It's not clear to me if the old check also serves some other window
management functions as described in RFC813. If not, then the new check
looks simpler and better. There is some subtle difference such as the
new one won't allow window to shrink even when SEG.SEQ increases, which I
think is a good thing to do.

Any comment?

Jerry


From owner-tcp-impl  Wed Mar 12 00:07:48 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA09084 for tcp-impl-list; Wed, 12 Mar 1997 00:04:41 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA09057 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 16:04:39 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA26470 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 16:04:37 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id PAA27526 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 15:54:32 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id PAA21700; Tue, 11 Mar 1997 15:54:31 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id PAA28237; Tue, 11 Mar 1997 15:54:26 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id PAA19414; Tue, 11 Mar 1997 15:52:59 -0800
Date: Tue, 11 Mar 1997 15:52:59 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199703112352.PAA19414@taipei.eng.sun.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: window update algorithm (oops!)
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

I just realized the check I suggested outlaws shrinking the window completely.
If that's considered too harsh, the following is what I had in mind
originally:

	SND.WL < SEG.ACK or
	SND.WL = SEG.ACK and SND.WND < SEG.WND, where
 
        SND.WL records the SEG.ACK of the last update
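
A minimal sketch of this check in Python, assuming 32-bit sequence
numbers compared modulo 2**32. The names seq_lt and should_update_window
are hypothetical, chosen only for illustration:

```python
MOD = 1 << 32  # TCP sequence numbers wrap modulo 2**32


def seq_lt(a, b):
    """True if sequence number a precedes b in modulo-2**32 arithmetic."""
    return a != b and ((b - a) % MOD) < (1 << 31)


def should_update_window(snd_wl, snd_wnd, seg_ack, seg_wnd):
    """Accept a newer ACK, or the same ACK carrying a larger window.

    snd_wl is the SEG.ACK recorded at the last window update.
    """
    return seq_lt(snd_wl, seg_ack) or (snd_wl == seg_ack and snd_wnd < seg_wnd)


# A newer ACK may carry a smaller window; an ACK that is not newer may not.
print(should_update_window(1000, 4096, 1500, 2048))  # True
print(should_update_window(1000, 4096, 1000, 2048))  # False
```

With this form, a segment that does not advance the ACK can only open
the window, while a segment that advances the ACK is always accepted.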

Jerry

From owner-tcp-impl  Wed Mar 12 00:22:32 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA12679 for tcp-impl-list; Wed, 12 Mar 1997 00:17:28 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA12611 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 16:17:19 -0800
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA29773 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 16:17:17 -0800
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id QAA11097; Tue, 11 Mar 1997 16:12:21 -0800 (PST)
Message-Id: <199703120012.QAA11097@aland.bbn.com>
To: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: window update algorithm 
In-reply-to: Your message of Tue, 11 Mar 97 15:40:41 -0800.
             <199703112340.PAA19405@taipei.eng.sun.com> 
Date: Tue, 11 Mar 97 16:12:20 -0800
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl
Precedence: bulk


    RFC793 P72 outlines a window update algorithm that seems more complex
    than necessary.

    ...

    4.4BSD code modified it slightly:

    ...
     
    The difference is that for pure window update packets, we only take those
    which advertise larger windows, therefore preventing out-of-order window
    update packets from causing hiccups if they shrink the window (moving the
    right edge of the window leftwards).

Actually, a change mandated by RFC 1122, p. 91.

    So my suggestion is to use

    	SND.WL <= SEG.ACK and SND.WND < SEG.WND, where

    	SND.WL records the SEG.ACK of the last update

    instead. Of course SEG.ACK still has to be reasonable
    (SND.UNA < SEG.ACK =< SND.NXT) to begin with.

    It's not clear to me if the old check also serves some other window
    management functions as described in RFC813. If not, then the new check
    looks simpler and better. There are some subtle differences, such as that
    the new one won't allow the window to shrink even when SEG.SEQ increases,
    which I think is a good thing.

Pardon me, I'm confused here.  I think what is meant is the window can't
shrink from the right.  Windows are always permitted to shrink from the left
(and must be able to do so).

Craig

From owner-tcp-impl  Wed Mar 12 03:04:14 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA22105 for tcp-impl-list; Wed, 12 Mar 1997 03:02:25 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA22051 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 19:02:20 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id TAA04320 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 19:02:11 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id SAA10888 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 18:52:02 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id SAA19439; Tue, 11 Mar 1997 18:52:00 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id SAA19000; Tue, 11 Mar 1997 18:52:00 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id SAA19474; Tue, 11 Mar 1997 18:50:32 -0800
Date: Tue, 11 Mar 1997 18:50:32 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199703120250.SAA19474@taipei.eng.sun.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: window update algorithm
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

>From talking to Craig Partridge I realized there might exist some
confusion regarding my use of "shrinking" the window. I'm referring to
the bad case described in RFC1122/4.2.2.16 where the right window edge
is being moved backwards:

        4.2.2.16  Managing the Window: RFC-793 Section 3.7, page 41
 
            A TCP receiver SHOULD NOT shrink the window, i.e., move the
            right window edge to the left.  However, a sending TCP MUST
            be robust against window shrinking, which may cause the
            "useable window" (see Section 4.2.3.4) to become negative.

But most people take it as "shrinking the window SIZE", which is not
what I meant.
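
To illustrate the "negative useable window" case, here is a small sketch
using the definition from RFC 1122 section 4.2.3.4,
U = SND.UNA + SND.WND - SND.NXT. The function names are hypothetical and
sequence wraparound is ignored for simplicity:

```python
def usable_window(snd_una, snd_wnd, snd_nxt):
    """Usable window U = SND.UNA + SND.WND - SND.NXT (may go negative)."""
    return snd_una + snd_wnd - snd_nxt


def sendable_bytes(snd_una, snd_wnd, snd_nxt):
    """A robust sender treats a negative usable window as zero."""
    return max(0, usable_window(snd_una, snd_wnd, snd_nxt))


# Right edge at 1000 + 4096 = 5096, with data sent up to 5000.
print(usable_window(1000, 4096, 5000))   # 96
# The peer moves the right edge back to 1000 + 2048 = 3048.
print(usable_window(1000, 2048, 5000))   # -1952
print(sendable_bytes(1000, 2048, 5000))  # 0
```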

Also I discovered that since "SND.UNA <= SEG.ACK =< SND.NXT" (notice the
change of "<" to "<=" from RFC1122/4.2.2.20.g) is the premise before the
window update is even considered, the new check can be simplified further to

	SND.WND < SEG.WND or SND.WL < SEG.ACK

If we want to outlaw shrunk window totally, the check becomes

	SND.WND < SEG.WND

and SND.WL is no longer needed.

To see how good info can be lost using the old check, consider a
retransmitted packet offering to open up the window. The offer won't be
accepted by the old check since SND.WL1 > SEG.SEQ.
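
The retransmission case can be sketched as follows. The old test is the
one from RFC 793 page 72; the function names and the sample numbers are
made up for illustration, and plain integer comparison is used on the
assumption that no sequence wraparound occurs:

```python
def old_check(snd_wl1, snd_wl2, seg_seq, seg_ack):
    """RFC 793 test: SND.WL1 < SEG.SEQ, or
    SND.WL1 = SEG.SEQ and SND.WL2 =< SEG.ACK."""
    return snd_wl1 < seg_seq or (snd_wl1 == seg_seq and snd_wl2 <= seg_ack)


def new_check(snd_wnd, snd_wl, seg_wnd, seg_ack):
    """Proposed test: SND.WND < SEG.WND or SND.WL < SEG.ACK."""
    return snd_wnd < seg_wnd or snd_wl < seg_ack


# Last update came from a segment with SEQ 5000, ACK 1000, window 2048.
# The peer then retransmits SEQ 4000 with the same ACK but window 4096.
print(old_check(5000, 1000, 4000, 1000))  # False: the wider window is ignored
print(new_check(2048, 1000, 4096, 1000))  # True: the wider window is accepted
```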

Jerry

From owner-tcp-impl  Wed Mar 12 07:04:12 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA21807 for tcp-impl-list; Wed, 12 Mar 1997 07:02:54 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA21800 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 23:02:51 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id XAA10007 for <tcp-impl@relay.engr.SGI.COM>; Tue, 11 Mar 1997 23:02:47 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id WAA25085; Tue, 11 Mar 1997 22:52:52 -0800 (PST)
Message-Id: <199703120652.WAA25085@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: "naming names" policy
Date: Tue, 11 Mar 1997 22:52:52 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

We've now settled on the policy we intend to use on the issue of "naming names"
(identifying specific implementations exhibiting particular behaviors):

	* Official products of the working group will not include specifics
	  of individual implementations unless the implementors so request
	  and the co-chairs find consensus that it's appropriate.

	  The official products are the documents produced by the working
	  group: RFCs, draft RFCs, recommendations, WG Web pages, and
	  traces illustrating implementation problems.  These last will
	  be anonymized.

	* The same policy applies to the public WG meetings at IETFs.  Our
	  experience has been that when a specific implementation is mentioned
	  in a talk, very often many in the audience lose the specifics and
	  generalize the statement to be about all versions of the
	  implementation - even when the speaker explicitly states otherwise.
	  This is just an unfortunate fact of how people process auditory
	  (and compressed) information.

	* Official WG products (and public meetings) are free to reference
	  other documents that identify implementations.

	* Individuals are free to post whatever they see fit to the mailing
	  list.  The exception is people speaking in an official IETF context:
	  this includes the WG co-chairs and the AD's, unless they explicitly
	  note that they're speaking in a non-IETF capacity.

The policy comes from several considerations:

First, it is easy to spread damaging misinformation by incorrect or incomplete
identifications, which we want to avoid for reasons of accuracy and possible
liability.

Second, we view the WG as focussed towards aiding implementors rather than
end-users.  In this context, we anticipate that naming names will often not
be particularly useful, compared to the utility of withholding names for
attracting broad participation from the community.

Third, we emphasize techniques for diagnosing problems, rather than lists
of specific implementations with specific problems, because identifying
techniques has much more general utility.  Interested parties are always
free to apply the techniques in an attempt to identify problems with
specific implementations.  Furthermore, we strongly encourage people to
develop diagnostic techniques wherever they see the need.  A catalog of
these will be part of the official WG product.

Finally, much of the de facto benefit of the WG will come from the informal
mailing list discussion.  This discussion is unencumbered, since messages
on the list do not carry official weight and are not published with the
same scope as RFCs.

- Vern & Steve

From owner-tcp-impl  Wed Mar 12 18:05:04 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA21341 for tcp-impl-list; Wed, 12 Mar 1997 18:03:05 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA21182 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Mar 1997 10:02:23 -0800
Received: from fly.cnuce.cnr.it (fly.cnuce.cnr.it [131.114.192.86]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id KAA07412 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Mar 1997 10:02:09 -0800
Received: by fly.cnuce.cnr.it (Smail3.1.26.7 #3)
	id m0w4sIz-0002XcC; Wed, 12 Mar 97 18:59 MET
Message-Id: <m0w4sIz-0002XcC@fly.cnuce.cnr.it>
Date: Wed, 12 Mar 97 18:59 MET
From: Francesco Potorti` <F.Potorti@cnuce.cnr.it>
To: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
CC: tcp-impl@relay.engr.SGI.COM
In-reply-to: <199703120250.SAA19474@taipei.eng.sun.com> (hkchu@pacific-86.Eng.Sun.COM)
Subject: Re: window update algorithm
Organization: CNUCE-CNR, Via S.Maria 36, Pisa - Italy +39-50-593211
Sender: owner-tcp-impl
Precedence: bulk

When updating the right edge of the send window, i.e., when

	SND.UNA <= SEG.ACK =< SND.NXT,

Jerry proposes to substitute the standard test (rfc793 page 72,
corrected by rfc1122-4.2.2.20.g) with the following one:
   
   	SND.WND < SEG.WND or SND.WL < SEG.ACK,

with the assumption that shrinking the window is legal (as it is now),
and that SND.WL contains the SEG.ACK of the last segment used to
update the send window.

Anyway, this test allows the peer to shrink the window only while
advancing SEG.ACK, thus preventing the peer from shrinking its
receive window when the connection is idle, which is allowed by the
test in rfc793.

If instead shrinking the window is allowed in any situation when an
ACK is considered for window update, the current behaviour is
preserved and the test is further simplified -- actually it
disappears.

In practice: the standard test on the SEG.SEQ makes no sense, as Jerry
argued previously, and the standard test on SEG.ACK is implied by the
first test on the window update, as Jerry noted, so no further test is
needed!

Too good to be true? :-)

-- 
Francesco Potorti` (researcher)        Voice:    +39-50-593203
Computer Network Division              Operator: +39-50-593211
CNUCE-CNR, Via Santa Maria 36          Fax:      +39-50-904052
56126 Pisa - Italy                     Email:    F.Potorti@cnuce.cnr.it

From owner-tcp-impl  Wed Mar 12 20:10:37 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA27983 for tcp-impl-list; Wed, 12 Mar 1997 20:08:37 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA27977 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Mar 1997 12:08:35 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA13372 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Mar 1997 12:08:33 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id LAA08525; Wed, 12 Mar 1997 11:58:28 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id LAA11365; Wed, 12 Mar 1997 11:58:25 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id LAA29347; Wed, 12 Mar 1997 11:58:26 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id LAA19826; Wed, 12 Mar 1997 11:56:58 -0800
Date: Wed, 12 Mar 1997 11:56:58 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199703121956.LAA19826@taipei.eng.sun.com>
To: F.Potorti@cnuce.cnr.it
Subject: Re: window update algorithm
Cc: tcp-impl@relay.engr.SGI.COM
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl
Precedence: bulk

>Too good to be true? :-)

Maybe :-{. I just discovered that the test

	SND.WND < SEG.WND or SND.WL < SEG.ACK

cannot be further simplified to

	SND.WND < SEG.WND

The latter doesn't just outlaw "shrunk window". The trouble is that SND.WND
is based on a left edge at the old SEG.ACK (recorded in SND.WL), whereas
SEG.WND's left edge is at the current SEG.ACK. If the two are different,
we are comparing apples with oranges.

                                     SND.WND
	old SEG.ACK <--------------------------------------------->

	                           new SEG.ACK <--------SEG.WND-------->

Obviously we have to take SEG.WND as the new SND.WND.
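
One way to see the point in the diagram is to compare right edges
(ACK + window) rather than window sizes. This is only a sketch with
hypothetical names and made-up numbers, ignoring sequence wraparound:

```python
def right_edge(ack, wnd):
    """Right window edge: left edge (the ACK) plus the offered window."""
    return ack + wnd


def moves_right_edge_left(snd_wl, snd_wnd, seg_ack, seg_wnd):
    """True only if the segment really pulls the right edge backwards."""
    return right_edge(seg_ack, seg_wnd) < right_edge(snd_wl, snd_wnd)


# SEG.WND (3072) is smaller than SND.WND (4096), yet the right edge
# advances from 5096 to 6072 because the left edge moved from 1000 to 3000.
print(3072 < 4096)                                    # True
print(moves_right_edge_left(1000, 4096, 3000, 3072))  # False
# Same left edge, smaller window: the right edge does move left.
print(moves_right_edge_left(1000, 4096, 1000, 2048))  # True
```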

It's possible to augment the first test to filter out a shrunk window,
but I don't think it's worth the trouble. We are not doing any worse
than the old test.

This stuff is more confusing than I thought :-(.

Jerry


From owner-tcp-impl  Wed Mar 12 20:40:36 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA06365 for tcp-impl-list; Wed, 12 Mar 1997 20:38:48 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA06347 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Mar 1997 12:38:45 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA20758 for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Mar 1997 12:38:44 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id MAA26510; Wed, 12 Mar 1997 12:28:38 -0800 (PST)
Message-Id: <199703122028.MAA26510@daffy.ee.lbl.gov>
To: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: window update algorithm
In-reply-to: Your message of Wed, 12 Mar 1997 11:56:58 PST.
Date: Wed, 12 Mar 1997 12:28:37 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

> This stuff is more confusing than I thought :-(.

I suggest we make this the official tcp-impl motto!

		Vern

From owner-tcp-impl  Thu Mar 13 20:52:58 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA03246 for tcp-impl-list; Thu, 13 Mar 1997 20:51:10 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA03219 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Mar 1997 12:51:08 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA16800 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Mar 1997 12:51:04 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id MAA03108; Thu, 13 Mar 1997 12:41:05 -0800 (PST)
Message-Id: <199703132041.MAA03108@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: IETF schedule for tcp-impl
Date: Thu, 13 Mar 1997 12:41:05 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl
Precedence: bulk

We've been given the following slot:

        Tuesday, April 8 at 1530-1730 (opposite grip, disman, drums, 
                spki, idmr,issll)

- Vern

From owner-tcp-impl  Thu Mar 13 22:58:00 1997
Received: (from daemon@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA00625 for tcp-impl-list; Thu, 13 Mar 1997 22:56:24 GMT
Return-Path: <owner-tcp-impl>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA00617 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Mar 1997 14:56:22 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id OAA19183 for <tcp-impl@relay.engr.SGI.COM>; Thu, 13 Mar 1997 14:56:19 -0800
Received: from ftp.com by ftp.com  ; Thu, 13 Mar 1997 17:52:28 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Thu, 13 Mar 1997 17:52:28 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id RAA16321; Thu, 13 Mar 1997 17:49:48 -0500
Date: Thu, 13 Mar 1997 17:49:48 -0500
Message-Id: <199703132249.RAA16321@MAILSERV-2HIGH.FTP.COM>
To: vern@ee.lbl.gov
Subject: Re: IETF schedule for tcp-impl
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Thu Mar 13 17:49:39 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl
Precedence: bulk


||We've been given the following slot:
||
||        Tuesday, April 8 at 1530-1730 (opposite grip, disman, drums, 
||                spki, idmr,issll)
||

Mbone coverage for those of us who don't want to see how lovely the
Memphis hotel situation is :-)?





From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 18 15:34:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA06408 for tcp-impl-list; Tue, 18 Mar 1997 15:31:12 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA06383 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:31:09 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA26489 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:31:07 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id PAA20933; Tue, 18 Mar 1997 15:21:12 -0800 (PST)
Message-Id: <199703182321.PAA20933@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: draft descriptions of TCP implementation problems
Date: Tue, 18 Mar 1997 15:21:12 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

The next four messages include draft descriptions of four different
TCP implementation problems.  I'm interested in (1) comments on the
descriptions, (2) comments on the format, and (3) volunteers to write
up other descriptions.

The goal is to put together a number of problem descriptions into an
Internet draft before the I-D deadline (mid next week).  It would probably
be best if volunteers coordinate with me to make sure we don't wind up
duplicating work.

	Thanks,

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 18 15:34:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA06576 for tcp-impl-list; Tue, 18 Mar 1997 15:31:42 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA06568 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:31:40 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA26826 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:31:39 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id PAA20942; Tue, 18 Mar 1997 15:21:44 -0800 (PST)
Message-Id: <199703182321.PAA20942@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: draft description of "No initial slow start"
Date: Tue, 18 Mar 1997 15:21:44 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Name of problem: No initial slow start

Category: Congestion control

Description:
	When a TCP begins transmitting data, it is required by RFC 1122,
	4.2.2.15, to engage in a "slow start" by initializing its congestion
	window, cwnd, to one packet (one segment of the maximum size).
	It subsequently increases cwnd by one packet for each ack it receives
	for new data.  A TCP that fails to do so exhibits "No initial slow
	start".

Significance:
	Serious.

Implications: 
	A TCP failing to slow start when beginning a connection results in
	traffic bursts that can stress the network, leading to excessive
	queueing delays and packet loss.

	Implementations exhibiting this problem might do so because
	they suffer from the general problem of not including the required
	congestion window.  These implementations will also suffer from
	"No slow start after timeout".

	There are different shades of "No initial slow start".  From
	the perspective of stressing the network, the worst is a connection
	that simply always sends based on the receiver's advertised window,
	with no notion of a separate congestion window.  Some other forms
	are described in "Uninitialized CWND" and "Initial CWND of 2 packets".

Relevant RFCs:
	RFC 1122 requires use of slow start.  RFC 2001 gives the
	specifics of slow start.

Trace file demonstrating it:
	[This will eventually be a URL to the trace file, probably
	in both ASCII and binary forms.]

	Made using tcpdump/BPF recording at the connection responder.
	No losses reported.

	10:40:42.244503 B > A: S 1168512000:1168512000(0) win 32768
				<mss 1460,nop,wscale 0> (DF) [tos 0x8]
	10:40:42.259908 A > B: S 3688169472:3688169472(0) 
				ack 1168512001 win 32768 <mss 1460>
	10:40:42.389992 B > A: . ack 1 win 33580 (DF) [tos 0x8] 
	10:40:42.664975 A > B: P 1:513(512) ack 1 win 32768
	10:40:42.700185 A > B: . 513:1973(1460) ack 1 win 32768
	10:40:42.718017 A > B: . 1973:3433(1460) ack 1 win 32768
	10:40:42.762945 A > B: . 3433:4893(1460) ack 1 win 32768
	10:40:42.811273 A > B: . 4893:6353(1460) ack 1 win 32768
	10:40:42.829149 A > B: . 6353:7813(1460) ack 1 win 32768
	10:40:42.853687 B > A: . ack 1973 win 33580 (DF) [tos 0x8] 
	10:40:42.864031 B > A: . ack 3433 win 33580 (DF) [tos 0x8] 

	After the third packet, the connection is established.  A, the
	connection responder, begins transmitting to B, the connection
	initiator.  A quickly sends 6 packets comprising 7812 bytes,
	even though the SYN exchange agreed upon an MSS of 1460 bytes
	and so A should have sent at most 1460 bytes.

	The acks sent by B to A in the last two lines indicate that this
	trace is not a measurement error (slow start really occurring but
	the corresponding acks having been dropped by the packet filter).

	A second trace confirmed that the problem is repeatable.
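
The burst size quoted above can be recomputed from the trace text. The
following is a rough sketch, not a general tcpdump parser: the regular
expressions and the initial_burst helper are assumptions that fit only
the line format shown here, and they rely on tcpdump printing relative
sequence numbers after the SYN exchange.

```python
import re

# Patterns for the tcpdump text lines shown in the trace.
DATA = re.compile(r"(\w+) > (\w+): \S+ (\d+):(\d+)\((\d+)\)")
ACK = re.compile(r"(\w+) > (\w+): \. ack (\d+)")


def initial_burst(lines, sender):
    """Sum the data bytes `sender` emits before the peer acks any of them."""
    total = 0
    for line in lines:
        data = DATA.search(line)
        if data and data.group(1) == sender:
            total += int(data.group(5))
            continue
        ack = ACK.search(line)
        # "ack 1" only completes the handshake; a larger value covers data.
        if ack and ack.group(2) == sender and int(ack.group(3)) > 1:
            break
    return total


TRACE = [
    "10:40:42.389992 B > A: . ack 1 win 33580 (DF) [tos 0x8]",
    "10:40:42.664975 A > B: P 1:513(512) ack 1 win 32768",
    "10:40:42.700185 A > B: . 513:1973(1460) ack 1 win 32768",
    "10:40:42.718017 A > B: . 1973:3433(1460) ack 1 win 32768",
    "10:40:42.762945 A > B: . 3433:4893(1460) ack 1 win 32768",
    "10:40:42.811273 A > B: . 4893:6353(1460) ack 1 win 32768",
    "10:40:42.829149 A > B: . 6353:7813(1460) ack 1 win 32768",
    "10:40:42.853687 B > A: . ack 1973 win 33580 (DF) [tos 0x8]",
]

MSS = 1460
burst = initial_burst(TRACE, "A")
print(burst, burst > MSS)  # 7812 True
```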

Trace file demonstrating correct behavior:

	Made using tcpdump/BPF recording at the connection originator.
	No losses reported.

	12:35:31.914050 C > D: S 1448571845:1448571845(0) win 4380 <mss 1460>
	12:35:32.068819 D > C: S 1755712000:1755712000(0) ack 1448571846 win 4096
	12:35:32.069341 C > D: . ack 1 win 4608
	12:35:32.075213 C > D: P 1:513(512) ack 1 win 4608
	12:35:32.286073 D > C: . ack 513 win 4096
	12:35:32.287032 C > D: . 513:1025(512) ack 1 win 4608
	12:35:32.287506 C > D: . 1025:1537(512) ack 1 win 4608
	12:35:32.432712 D > C: . ack 1537 win 4096
	12:35:32.433690 C > D: . 1537:2049(512) ack 1 win 4608
	12:35:32.434481 C > D: . 2049:2561(512) ack 1 win 4608
	12:35:32.435032 C > D: . 2561:3073(512) ack 1 win 4608
	12:35:32.594526 D > C: . ack 3073 win 4096
	12:35:32.595465 C > D: . 3073:3585(512) ack 1 win 4608
	12:35:32.595947 C > D: . 3585:4097(512) ack 1 win 4608
	12:35:32.596414 C > D: . 4097:4609(512) ack 1 win 4608
	12:35:32.596888 C > D: . 4609:5121(512) ack 1 win 4608
	12:35:32.733453 D > C: . ack 4097 win 4096

References
	V. Paxson, "Automated Packet Trace Analysis of TCP Implementations,"
	available in draft form from vern@ee.lbl.gov.

How to detect
	For implementations always manifesting this problem, it shows
	up immediately in a packet trace or a sequence plot, as illustrated
	above.

How to fix
	If the root problem is that the implementation lacks a notion
	of a congestion window, then unfortunately this requires significant
	work to fix.  However, doing so is vital, as such implementations
	exhibit "No slow start after timeout", which has a significance
	of "Vital".

Implementation specifics (if approved by implementor)
	(Implementor contact address)

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 18 15:34:06 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA06717 for tcp-impl-list; Tue, 18 Mar 1997 15:32:03 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA06700 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:32:01 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA26908 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:31:59 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id PAA20950; Tue, 18 Mar 1997 15:22:04 -0800 (PST)
Message-Id: <199703182322.PAA20950@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: draft description of "No slow start after timeout"
Date: Tue, 18 Mar 1997 15:22:04 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Name of problem: No slow start after timeout

Category: Congestion control

Description:
	When a TCP experiences a retransmission timeout, it is required by
	RFC 1122, 4.2.2.15, to engage in "slow start" by initializing its
	congestion window, cwnd, to one packet (one segment of the maximum
	size).  It subsequently increases cwnd by one packet for each ack
	it receives for new data until it reaches the "congestion avoidance"
	threshold, ssthresh, at which point the congestion avoidance
	algorithm for updating the window takes over.  A TCP that fails
	to enter slow start upon a timeout exhibits "No slow start after
	timeout".
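
As a rough illustration of the progression described above, the sketch
below grows cwnd by one segment per ack until it reaches ssthresh and
then switches to the congestion avoidance increment given in RFC 2001
(segsize*segsize/cwnd per ack). The function name, the exact boundary
comparison, and the sample values are assumptions for illustration only.

```python
def cwnd_after_timeout(mss, ssthresh, new_data_acks):
    """Return the cwnd values seen after a timeout, one per ack for new data."""
    cwnd = mss  # slow start restarts at one segment
    history = [cwnd]
    for _ in range(new_data_acks):
        if cwnd < ssthresh:
            cwnd += mss  # slow start: one segment per ack
        else:
            cwnd += max(1, mss * mss // cwnd)  # congestion avoidance
        history.append(cwnd)
    return history


print(cwnd_after_timeout(512, 2048, 5))
# [512, 1024, 1536, 2048, 2176, 2296]
```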

Significance:
	Vital.

Implications: 
	Entering slow start upon timeout forms one of the cornerstones
	of Internet congestion stability, as outlined in [Jacobson88].
	If TCPs fail to do so, the network becomes at risk of suffering
	"congestion collapse" [RFC896].

Relevant RFCs:
	RFC 1122 requires use of slow start after loss.  RFC 896 describes
	congestion collapse.  RFC 2001 describes the "fast recovery"
	mechanism that differs from the timeout retransmissions discussed
	here.

Trace file demonstrating it:
	[This will eventually be a URL to the trace file, probably
	in both ASCII and binary forms.]

	Made using tcpdump/BPF recording at the sending TCP (A).
	No losses reported.

	10:40:59.090612 B > A: . ack 357125 win 33580 (DF) [tos 0x8]
	10:40:59.222025 A > B: . 357125:358585(1460) ack 1 win 32768
	10:40:59.868871 A > B: . 357125:358585(1460) ack 1 win 32768
	10:41:00.016641 B > A: . ack 364425 win 33580 (DF) [tos 0x8]
	10:41:00.036709 A > B: . 364425:365885(1460) ack 1 win 32768
	10:41:00.045231 A > B: . 365885:367345(1460) ack 1 win 32768
	10:41:00.053785 A > B: . 367345:368805(1460) ack 1 win 32768
	10:41:00.062426 A > B: . 368805:370265(1460) ack 1 win 32768
	10:41:00.071074 A > B: . 370265:371725(1460) ack 1 win 32768
	10:41:00.079794 A > B: . 371725:373185(1460) ack 1 win 32768
	10:41:00.089304 A > B: . 373185:374645(1460) ack 1 win 32768
	10:41:00.097738 A > B: . 374645:376105(1460) ack 1 win 32768
	10:41:00.106409 A > B: . 376105:377565(1460) ack 1 win 32768
	10:41:00.115024 A > B: . 377565:379025(1460) ack 1 win 32768
	10:41:00.123576 A > B: . 379025:380485(1460) ack 1 win 32768
	10:41:00.132016 A > B: . 380485:381945(1460) ack 1 win 32768
	10:41:00.141635 A > B: . 381945:383405(1460) ack 1 win 32768
	10:41:00.150094 A > B: . 383405:384865(1460) ack 1 win 32768
	10:41:00.158552 A > B: . 384865:386325(1460) ack 1 win 32768
	10:41:00.167053 A > B: . 386325:387785(1460) ack 1 win 32768
	10:41:00.175518 A > B: . 387785:389245(1460) ack 1 win 32768
	10:41:00.210835 A > B: . 389245:390705(1460) ack 1 win 32768
	10:41:00.226108 A > B: . 390705:392165(1460) ack 1 win 32768
	10:41:00.241524 B > A: . ack 389245 win 8760 (DF) [tos 0x8]

	The first packet indicates the ack point is 357125.  130 msec
	after receiving the ack, A transmits the packet after the ack
	point, 357125:358585.  About 650 msec after this transmission,
	it retransmits 357125:358585, in an apparent retransmission
	timeout.  At this point, A's cwnd should be one MSS, or 1460 bytes,
	as A enters slow-start.  The trace is consistent with this
	possibility.

	B replies with an ack of 364425, indicating that A has filled a
	sequence hole.  At this point, A's cwnd should be 1460*2 = 2920
	bytes, since in slow start receiving an ack advances cwnd by MSS.
	However, A then launches 19 consecutive packets, which is
	inconsistent with slow start.

	A second trace confirmed that the problem is repeatable.

Trace file demonstrating correct behavior:

	Made using tcpdump/BPF recording at the sending TCP (C).
	No losses reported.

	12:35:48.442538 C > D: P 465409:465921(512) ack 1 win 4608
	12:35:48.544483 D > C: . ack 461825 win 4096
	12:35:48.703496 D > C: . ack 461825 win 4096
	12:35:49.044613 C > D: . 461825:462337(512) ack 1 win 4608
	12:35:49.192282 D > C: . ack 465921 win 2048
	12:35:49.192538 D > C: . ack 465921 win 4096
	12:35:49.193392 C > D: P 465921:466433(512) ack 1 win 4608
	12:35:49.194726 C > D: P 466433:466945(512) ack 1 win 4608
	12:35:49.350665 D > C: . ack 466945 win 4096 
	12:35:49.351694 C > D: . 466945:467457(512) ack 1 win 4608
	12:35:49.352168 C > D: . 467457:467969(512) ack 1 win 4608
	12:35:49.352643 C > D: . 467969:468481(512) ack 1 win 4608
	12:35:49.506000 D > C: . ack 467969 win 3584 

	After C transmits the first packet shown to D, it takes no action
	in response to D's acks for 461825, because the first packet
	already reached the advertised window limit of 4096 bytes above
	461825.  600 msec after transmitting the first packet, C retransmits
	461825:462337, presumably due to a timeout.  Its congestion window
	is now MSS (512 bytes).

	D acks 465921, indicating that C's retransmission filled
	a sequence hole.  This ack advances C's cwnd from 512 to 1024.
	Very shortly after, D acks 465921 again in order to update
	the offered window from 2048 to 4096.  This ack does not
	advance cwnd since it is not for new data.  Very shortly
	after, C responds to the newly enlarged window by transmitting
	two packets.  D acks both, advancing cwnd from 1024 to 1536.
	C in turn transmits three packets.
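
The cwnd values in this walk-through can be reproduced with a few lines
of Python. This is a sketch only: cwnd_progression is a hypothetical
helper, and it assumes the connection stays in slow start for the whole
excerpt.

```python
def cwnd_progression(mss, ack_point, acks):
    """Return cwnd after each ack; only acks for new data advance it."""
    cwnd = mss          # one segment after the timeout
    snd_una = ack_point
    history = []
    for ack in acks:
        if ack > snd_una:   # duplicate acks leave cwnd unchanged
            snd_una = ack
            cwnd += mss
        history.append(cwnd)
    return history


# Acks from D after the retransmission: 465921, 465921 (window update), 466945.
print(cwnd_progression(512, 461825, [465921, 465921, 466945]))
# [1024, 1024, 1536]
```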

References
	V. Jacobson, "Congestion Avoidance and Control," Proc. SIGCOMM '88.
	ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z

	V. Paxson, "Automated Packet Trace Analysis of TCP Implementations,"
	available in draft form from vern@ee.lbl.gov.

How to detect
	Packet loss is common enough in the Internet that generally it
	is not difficult to find an Internet path that will force
	retransmission due to packet loss.

	If the effective window prior to loss is large enough, however,
	then the TCP may retransmit using the "fast recovery" mechanism
	described in RFC 2001.  In a packet trace, the signature of fast
	recovery is that the packet retransmission occurs in response to
	the receipt of three duplicate acks, and subsequent duplicate acks
	may lead to the transmission of new data, above both the ack
	point and the highest sequence transmitted so far.  An absence
	of three duplicate acks prior to retransmission suffices to
	distinguish between timeout and fast recovery retransmissions.
	If only fast recovery retransmissions are observed,
	generally it is not difficult to repeat the data transfer until
	a timeout retransmission occurs.

	Once armed with a trace exhibiting a timeout retransmission,
	determining whether the TCP follows slow start is done by
	computing the correct progression of cwnd and comparing it
	to the amount of data transmitted by the TCP subsequent to
	the timeout retransmission.
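
	As a rough illustration of that comparison, the sketch below
	computes the cwnd a sender would be expected to hold after each
	ack following a timeout.  It is a minimal model, not taken from
	any implementation: the function name is made up, ssthresh and
	congestion avoidance are not modeled, sequence wraparound is
	ignored, and the third ack value in the example is assumed for
	illustration.

```python
def expected_cwnd_after_timeout(mss, snd_una, acks):
    """Return the cwnd expected after each ack, starting from a timeout.

    Assumes pure slow start: cwnd restarts at one MSS and grows by one
    MSS for every ack that advances the ack point (acks for new data).
    ssthresh and sequence-number wraparound are deliberately ignored.
    """
    cwnd = mss
    history = []
    for ack in acks:
        if ack > snd_una:        # ack for new data: open the window
            cwnd += mss
            snd_una = ack
        history.append(cwnd)     # duplicate acks leave cwnd unchanged
    return history


# A new ack, a window update repeating it, then one ack covering two
# further packets (466945 is an assumed value for illustration).
print(expected_cwnd_after_timeout(512, 461825, [465921, 465921, 466945]))
```

	The amount of data the TCP actually has in flight after each ack
	can then be checked against the corresponding entry.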

How to fix
	If the root problem is that the implementation lacks a notion
	of a congestion window, then unfortunately this requires significant
	work to fix.  However, doing so is vital, for reasons outlined above.

Implementation specifics (if approved by implementor)
	(Implementor contact address)

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 18 15:34:17 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA06894 for tcp-impl-list; Tue, 18 Mar 1997 15:32:41 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA06888 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:32:39 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA27038 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:32:38 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id PAA20959; Tue, 18 Mar 1997 15:22:31 -0800 (PST)
Message-Id: <199703182322.PAA20959@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: draft description of "Failure to retain above-sequence data"
Date: Tue, 18 Mar 1997 15:22:30 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Name of problem: Failure to retain above-sequence data

Category: Congestion control, performance

Description:
	When a TCP receives an "above sequence" segment, meaning one with a
	sequence number exceeding RCV.NXT but below RCV.NXT+RCV.WND, it
	SHOULD queue the segment for later delivery (RFC 1122, 4.2.2.20).
	A TCP that fails to do so is said to exhibit "Failure to retain
	above-sequence data".

	It may sometimes be appropriate for a TCP to discard above-sequence
	data to reclaim memory.  If a TCP does so only rarely, then we would
	not consider it to exhibit this problem.  Instead, the particular
	concern is with TCPs that always discard above-sequence data.

Significance:
	Serious.

Implications: 
	In times of congestion, a failure to retain above-sequence data
	will lead to numerous otherwise-unnecessary retransmissions,
	aggravating the congestion and potentially reducing performance
	by a large factor.

Relevant RFCs:
	RFC 1122 revises RFC 793 by upgrading the latter's MAY to a
	SHOULD on this issue.

Trace file demonstrating it:
	[This will eventually be a URL to the trace file, probably
	in both ASCII and binary forms.]

	Made using tcpdump/BPF recording at the receiving TCP.  No losses
	reported.

	B is the TCP sender, A the receiver.  A exhibits failure to retain
	above-sequence data:

	10:38:10.164860 B > A: . 221078:221614(536) ack 1 win 33232 [tos 0x8]
	10:38:10.170809 B > A: . 221614:222150(536) ack 1 win 33232 [tos 0x8]
	10:38:10.177183 B > A: . 222150:222686(536) ack 1 win 33232 [tos 0x8]
	10:38:10.225039 A > B: . ack 222686 win 25800

	Here B has sent up to (relative) sequence 222686 in-sequence, and
	A accordingly acknowledges.

	10:38:10.268131 B > A: . 223222:223758(536) ack 1 win 33232 [tos 0x8]
	10:38:10.337995 B > A: . 223758:224294(536) ack 1 win 33232 [tos 0x8]
	10:38:10.344065 B > A: . 224294:224830(536) ack 1 win 33232 [tos 0x8]
	10:38:10.350169 B > A: . 224830:225366(536) ack 1 win 33232 [tos 0x8]
	10:38:10.356362 B > A: . 225366:225902(536) ack 1 win 33232 [tos 0x8]
	10:38:10.362445 B > A: . 225902:226438(536) ack 1 win 33232 [tos 0x8]
	10:38:10.368579 B > A: . 226438:226974(536) ack 1 win 33232 [tos 0x8]
	10:38:10.374732 B > A: . 226974:227510(536) ack 1 win 33232 [tos 0x8]
	10:38:10.380825 B > A: . 227510:228046(536) ack 1 win 33232 [tos 0x8]
	10:38:10.387027 B > A: . 228046:228582(536) ack 1 win 33232 [tos 0x8]
	10:38:10.393053 B > A: . 228582:229118(536) ack 1 win 33232 [tos 0x8]
	10:38:10.399193 B > A: . 229118:229654(536) ack 1 win 33232 [tos 0x8]
	10:38:10.405356 B > A: . 229654:230190(536) ack 1 win 33232 [tos 0x8]

	A now receives 13 additional packets from B.  These are above-sequence
	because 222686:223222 was dropped.  The packets do however fit within
	the offered window of 25800.  A does not generate any duplicate acks
	for them.

	The trace contributor (V. Paxson) verified that these 13 packets
	had valid IP and TCP checksums.

	10:38:11.917728 B > A: . 222686:223222(536) ack 1 win 33232 [tos 0x8]
	10:38:11.930925 A > B: . ack 223222 win 32232

	B times out for 222686:223222 and retransmits it.  Upon receiving
	it, A only acknowledges 223222.  Had it retained the valid
	above-sequence packets, it would instead have ack'd 230190.
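
	The expected acknowledgement can be reproduced with a small
	sketch of a receiver that retains above-sequence segments.  This
	is an illustrative model only: the function name and data layout
	are invented here, partially overlapping segments are not
	handled, and sequence wraparound is ignored.

```python
def ack_after_arrival(rcv_nxt, rcv_wnd, queued, segment):
    """Return (new_rcv_nxt, queued) after one segment arrives.

    `queued` maps the starting sequence number of each retained
    above-sequence segment to its ending sequence number.
    """
    start, end = segment
    queued = dict(queued)
    if start == rcv_nxt:
        rcv_nxt = end
        # Deliver any retained segments that are now in sequence.
        while rcv_nxt in queued:
            rcv_nxt = queued.pop(rcv_nxt)
    elif rcv_nxt < start < rcv_nxt + rcv_wnd:
        queued[start] = end      # above-sequence but inside the window
    return rcv_nxt, queued


# The 13 above-sequence packets from the trace, then the retransmission.
rcv_nxt, queued = 222686, {}
for start in range(223222, 230190, 536):
    rcv_nxt, queued = ack_after_arrival(rcv_nxt, 25800, queued,
                                        (start, start + 536))
rcv_nxt, queued = ack_after_arrival(rcv_nxt, 25800, queued,
                                    (222686, 223222))
print(rcv_nxt)
```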

	10:38:12.048438 B > A: . 223222:223758(536) ack 1 win 33232 [tos 0x8]
	10:38:12.054397 B > A: . 223758:224294(536) ack 1 win 33232 [tos 0x8]
	10:38:12.068029 A > B: . ack 224294 win 31696

	B retransmits two more packets, and A only acknowledges them.
	This pattern continues as B retransmits the entire set of
	previously-received packets.

	A second trace confirmed that the problem is repeatable.

Trace file demonstrating correct behavior:

	Made using tcpdump/BPF recording at the receiving TCP (C).
	No losses reported.

	09:11:25.790417 D > C: . 33793:34305(512) ack 1 win 61440
	09:11:25.791393 D > C: . 34305:34817(512) ack 1 win 61440
	09:11:25.792369 D > C: . 34817:35329(512) ack 1 win 61440
	09:11:25.792369 D > C: . 35329:35841(512) ack 1 win 61440
	09:11:25.793345 D > C: . 36353:36865(512) ack 1 win 61440
	09:11:25.794321 C > D: . ack 35841 win 59904

	A sequence hole occurs because 35841:36353 has been dropped.

	09:11:25.794321 D > C: . 36865:37377(512) ack 1 win 61440
	09:11:25.794321 C > D: . ack 35841 win 59904
	09:11:25.795297 D > C: . 37377:37889(512) ack 1 win 61440
	09:11:25.795297 C > D: . ack 35841 win 59904
	09:11:25.796273 C > D: . ack 35841 win 61440
	09:11:25.798225 D > C: . 37889:38401(512) ack 1 win 61440
	09:11:25.799201 C > D: . ack 35841 win 61440
	09:11:25.807009 D > C: . 38401:38913(512) ack 1 win 61440
	09:11:25.807009 C > D: . ack 35841 win 61440
	...
	09:11:25.884113 D > C: . 52737:53249(512) ack 1 win 61440
	09:11:25.884113 C > D: . ack 35841 win 61440

	Each additional, above-sequence packet C receives from D elicits
	a duplicate ack for 35841.

	09:11:25.887041 D > C: . 35841:36353(512) ack 1 win 61440
	09:11:25.887041 C > D: . ack 53249 win 44032

	D retransmits 35841:36353 and C acknowledges receipt of
	data all the way up to 53249.

References
	V. Paxson, "Automated Packet Trace Analysis of TCP Implementations,"
	available in draft form from vern@ee.lbl.gov.

How to detect
	Packet loss is common enough in the Internet that generally it
	is not difficult to find an Internet path that will result
	in some above-sequence packets arriving.  A TCP that exhibits
	"Failure to retain ..." may not generate duplicate acks for
	these packets.  However, some TCPs that do retain above-sequence
	data also do not generate duplicate acks, so failure to do so
	does not definitively identify the problem.  Instead, the key
	observation is whether upon retransmission of the dropped packet,
	data that was previously above-sequence is acknowledged.

	Two considerations in detecting this problem using a packet trace
	are that it is easiest to do so with a trace made at the TCP receiver,
	in order to unambiguously determine which packets arrived successfully,
	and that such packets may still be correctly discarded if they
	arrive with checksum errors.  The latter can be tested by capturing
	the entire packet contents and performing the IP and TCP checksum
	algorithms to verify their integrity; or by confirming that the
	packets arrive with the same checksum and contents as that with
	which they were sent, with a presumption that the sending TCP
	correctly calculates checksums for the packets it transmits.

	It is considerably easier to verify that an implementation does
	*not* exhibit this problem.  This can be done by recording a trace
	at the data sender, and observing that sometimes after a retransmission
	the receiver acknowledges a higher sequence number than just that
	which was retransmitted.
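
	A sender-side check along these lines might look like the
	following sketch.  The event format and function name are
	assumptions made for illustration; the example events are
	abbreviated from the C/D trace above as they might appear at
	the sender.  An empty result is inconclusive on its own.

```python
def retention_evidence(events):
    """Scan a sender-side trace for acks that prove retention.

    `events` is a list of ("data", start, end) and ("ack", number)
    tuples in time order.  Returns the acks that are the first ack
    after a retransmission and cover more than the retransmitted
    segment.
    """
    highest_sent = 0
    pending_end = None           # end of the most recent retransmission
    evidence = []
    for event in events:
        if event[0] == "data":
            _, start, end = event
            if start < highest_sent:
                pending_end = end    # data below the send high-water mark
            highest_sent = max(highest_sent, end)
        elif event[0] == "ack" and pending_end is not None:
            if event[1] > pending_end:
                evidence.append(event[1])
            pending_end = None
    return evidence


events = [
    ("data", 35329, 35841),
    ("data", 35841, 36353),      # lost in the network
    ("data", 36353, 36865),
    ("ack", 35841),
    ("data", 52737, 53249),
    ("ack", 35841),
    ("data", 35841, 36353),      # retransmission
    ("ack", 53249),
]
print(retention_evidence(events))
```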

How to fix
	If the root problem is that the implementation lacks buffering for
	above-sequence data, then unfortunately this requires significant
	work to fix.  However, doing so is important, for reasons outlined
	above.

Implementation specifics (if approved by implementor)
	(Implementor contact address)

Input to IRTF
	The IRTF should consider whether to upgrade the need to retain
	above-sequence data from SHOULD to MUST, with an allowance for
	occasional failure to do so in order to reclaim memory.  This
	suggestion is motivated by the observation that failure to retain
	such data can significantly aggravate congestion.

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 18 15:34:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA06940 for tcp-impl-list; Tue, 18 Mar 1997 15:32:50 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA06919 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:32:47 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA27067 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:32:46 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id PAA20968; Tue, 18 Mar 1997 15:22:51 -0800 (PST)
Message-Id: <199703182322.PAA20968@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: draft description of "Inconsistent retransmission"
Date: Tue, 18 Mar 1997 15:22:50 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Name of problem: Inconsistent retransmission

Category: Reliability

Description:
	If, for a given sequence number, a sending TCP retransmits different
	data than previously sent for that sequence number, then a strong
	possibility arises that the receiving TCP will reconstruct a
	different byte stream than that sent by the sending application,
	depending on which instance of the sequence number it accepts.
	Such a sending TCP exhibits "Inconsistent retransmission".

Significance:
	Vital.

Implications: 
	Reliable delivery of data is a fundamental property of TCP.

Relevant RFCs:
	RFC 793, section 1.5, discusses the central role of reliability
	in TCP operation.

Trace file demonstrating it:
	[This will eventually be a URL to the trace file, probably
	in both ASCII and binary forms.]

	Made using tcpdump/BPF recording at the receiving TCP (B).  No losses
	reported.

	12:35:53.145503 A > B: FP 90048435:90048461(26) ack 393464682 win 4096
			                     4500 0042 9644 0000
			 3006 e4c2 86b1 0401 83f3 010a b2a4 0015
			 055e 07b3 1773 cb6a 5019 1000 68a9 0000
	data starts here>504f 5254 2031 3334 2c31 3737*2c34 2c31
			 2c31 3738 2c31 3635 0d0a
	12:35:53.146479 B > A: R 393464682:393464682(0) win 8192
	12:35:53.851714 A > B: FP 90048429:90048463(34) ack 393464682 win 4096
			                     4500 004a 965b 0000
			 3006 e4a3 86b1 0401 83f3 010a b2a4 0015
			 055e 07ad 1773 cb6a 5019 1000 8bd3 0000
	data starts here>5041 5356 0d0a 504f 5254 2031 3334 2c31
			 3737*2c31 3035 2c31 3431 2c34 2c31 3539
			 0d0a

	The sequence numbers shown in this trace are absolute and
	not adjusted to reflect the ISN.  The 4-digit hex values
	show a dump of the packet's IP and TCP headers, as well
	as payload.  A first sends to B data for 90048435:90048461.
	The corresponding data begins with hex words 504f, 5254, etc.

	B responds with a RST.  Since the recording location was local
	to B, it is unknown whether A received the RST.

	A then sends 90048429:90048463, which includes six sequence
	positions below the earlier transmission, all 26 positions
	of the earlier transmission, and two additional sequence positions.

	The retransmission disagrees starting just after sequence 90048447,
	annotated above with a leading '*'.  These two bytes were
	originally transmitted as hex 2c34 but retransmitted as hex 2c31.
	Subsequent positions disagree as well.
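
	The point of disagreement can be located mechanically by
	comparing the overlapping bytes of the two segments.  The sketch
	below is one way to do so; the function name is invented for
	illustration, the payload bytes are copied from the hex dumps
	above, and sequence wraparound is ignored.

```python
def first_mismatch(seq_a, data_a, seq_b, data_b):
    """Return the first sequence number at which two segments carry
    different bytes, or None if their overlap agrees (or is empty)."""
    lo = max(seq_a, seq_b)
    hi = min(seq_a + len(data_a), seq_b + len(data_b))
    for seq in range(lo, hi):
        if data_a[seq - seq_a] != data_b[seq - seq_b]:
            return seq
    return None


original = bytes.fromhex(
    "504f 5254 2031 3334 2c31 3737 2c34 2c31 2c31 3738 2c31 3635 0d0a")
retransmitted = bytes.fromhex(
    "5041 5356 0d0a 504f 5254 2031 3334 2c31 3737 2c31 3035 2c31 3431"
    " 2c34 2c31 3539 0d0a")
print(first_mismatch(90048435, original, 90048429, retransmitted))
```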

	This behavior has been observed in other traces involving
	different hosts.  It is unknown how to repeat it.

	In this instance, no corruption would occur, since B has
	already indicated it will not accept further packets from A.

	A second example illustrates a slightly different instance
	of the problem.  The tracing again was made with tcpdump/BPF
	at the receiving TCP (D).

	22:23:58.645829 C > D: P 185:212(27) ack 565 win 4096 
                                             4500 0043 90a3 0000
                         3306 0734 cbf1 9eef 83f3 010a 0525 0015
                         a3a2 faba 578c 70a4 5018 1000 9a53 0000
        data starts here>504f 5254 2032 3033 2c32 3431 2c31 3538
                         2c32 3339 2c35 2c34 330d 0a
	22:23:58.646805 D > C: . ack 184 win 8192 
                                             4500 0028 beeb 0000
                         3e06 ce06 83f3 010a cbf1 9eef 0015 0525
                         578c 70a4 a3a2 fab9 5010 2000 342f 0000
	22:31:36.532244 C > D: FP 186:213(27) ack 565 win 4096 
                                             4500 0043 9435 0000
                         3306 03a2 cbf1 9eef 83f3 010a 0525 0015
                         a3a2 fabb 578c 70a4 5019 1000 9a51 0000
        data starts here>504f 5254 2032 3033 2c32 3431 2c31 3538
                         2c32 3339 2c35 2c34 330d 0a

	In this trace, sequence numbers are relative.  C sends 185:212,
	but D only sends an ack for 184 (so sequence number 184 is
	missing).  C then sends 186:213.  The packet payload is identical
	to the previous payload, but the base sequence number is one
	higher, resulting in an inconsistent retransmission.

	Neither trace exhibits checksum errors.
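
	For readers who wish to repeat such a check, the sketch below
	applies the standard Internet checksum to the 20-byte IP header
	of the first packet in the first trace.  It is an illustration
	rather than the procedure actually used; the function name is
	invented, and the TCP checksum (which also covers a pseudo-header
	and the payload) is not shown.

```python
def internet_checksum_ok(header):
    """Return True if the 16-bit one's-complement sum over `header`
    (checksum field included) folds to 0xffff, as a valid header's does."""
    if len(header) % 2:
        header += b"\x00"        # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


ip_header = bytes.fromhex(
    "4500 0042 9644 0000 3006 e4c2 86b1 0401 83f3 010a")
print(internet_checksum_ok(ip_header))
```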

Trace file demonstrating correct behavior:
	(Omitted, as presumably correct behavior is obvious.)

References
	None known.

How to detect
	This problem unfortunately can be very difficult to detect,
	since available experience indicates it is quite rare that
	it is manifested.  No "trigger" has been identified that
	can be used to reproduce the problem.

How to fix
	In the absence of a known "trigger", we cannot assess how to
	fix the problem.

Implementation specifics (if approved by implementor)
	(Implementor contact address)

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 18 15:51:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA09889 for tcp-impl-list; Tue, 18 Mar 1997 15:46:09 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA09722 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:45:19 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id PAA00197 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 15:45:17 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA21900>; Tue, 18 Mar 1997 15:41:26 -0800
Date: Tue, 18 Mar 1997 15:41:25 -0800
Posted-Date: Tue, 18 Mar 1997 15:41:25 -0800
Message-Id: <199703182341.AA03261@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA03261>; Tue, 18 Mar 1997 15:41:25 -0800
To: tcp-impl@relay.engr.SGI.COM, vern@ee.lbl.gov
Subject: Re: draft description of "No slow start after timeout"
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Name of problem: No slow start after timeout

Needs to be more specific - this is talking about a retransmission
timeout. I don't know if that could be confused with "lack of things
to send" - or whether that's implemented as a timeout or not. 

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 18 17:38:45 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA05671 for tcp-impl-list; Tue, 18 Mar 1997 17:36:51 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA05660 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 17:36:49 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA26185 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 17:36:47 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id RAA19612; Tue, 18 Mar 1997 17:26:51 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id RAA00066; Tue, 18 Mar 1997 17:26:49 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id RAA07409; Tue, 18 Mar 1997 17:26:48 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id RAA23248; Tue, 18 Mar 1997 17:25:13 -0800
Date: Tue, 18 Mar 1997 17:25:13 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199703190125.RAA23248@taipei.eng.sun.com>
To: vern@ee.lbl.gov
Subject: Re: draft description of "No slow start after timeout"
Cc: tcp-impl@relay.engr.SGI.COM
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>Name of problem: No slow start after timeout

A more common one is "No slow start after idle" described in VJ's
revised '88 paper, Appendix C.

I have a potentially controversial one that falls into the other end of
the spectrum from the above. It's called

	"Slow start too slow on LFN with delayed ACK"

In a TCP implementation with a more aggressive delayed ACK algorithm,
the congestion window may open up linearly instead of exponentially.
For a large window this can take many round-trip times. A simple
fix (other than to reduce delayed ACK) is to count the # of bytes
ack'ed, instead of # of ack packets when growing the congestion window.
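
The difference between the two counting rules can be illustrated with
a toy model.  The sketch below is only that: it assumes no loss,
full-sized segments, and a receiver that returns at most a fixed
number of acks per round trip, and the function name and parameters
are invented for illustration.  It says nothing about burstiness.

```python
def rounds_to_open(target, acks_per_round, count_bytes):
    """Count round trips for cwnd (in segments) to grow from 1 to `target`.

    Simplified model: no loss, every segment is full-sized, and the
    receiver returns at most `acks_per_round` acks per round trip.
    """
    cwnd, rounds = 1, 0
    while cwnd < target:
        acks = min(cwnd, acks_per_round)
        if count_bytes:
            cwnd += cwnd         # grow by the amount of data acked
        else:
            cwnd += acks         # grow by one segment per ack packet
        rounds += 1
    return rounds


print(rounds_to_open(64, 1, count_bytes=False))   # one ack per round trip
print(rounds_to_open(64, 1, count_bytes=True))
```

With one ack per round trip, counting ack packets opens the window by
one segment per round, while counting bytes doubles it each round.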

Jerry

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 18 20:57:00 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA03867 for tcp-impl-list; Tue, 18 Mar 1997 20:55:20 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA03862 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 20:55:18 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA29858 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 20:55:16 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id UAA21841; Tue, 18 Mar 1997 20:45:21 -0800 (PST)
Message-Id: <199703190445.UAA21841@daffy.ee.lbl.gov>
To: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: draft description of "No slow start after timeout"
In-reply-to: Your message of Tue, 18 Mar 1997 17:25:13 PST.
Date: Tue, 18 Mar 1997 20:45:20 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> A more common one is "No slow start after idle" described in VJ's
> revised '88 paper, Appendix C.

This one to my knowledge hasn't been standardized (it's not mentioned
in RFC 2001), so if that's true, it's out of scope.

> In a TCP implementation with a more aggressive delayed ACK algorithm,
> the congestion window may open up linearly instead of exponentially.

If these "aggressive" delayed ACK algorithms are the same as I've observed
(discussed in my draft paper, by the way), then the fundamental problem is
instead that the algorithm violates RFC 1122, section 4.2.3.2, which limits
delayed acks to 500 msec or two packets (whichever comes first).

In addition to opening up the window slowly, there are other problems with
these acking policies, too.  They lead to brittle performance in the presence
of loss along the return path; they elicit very bursty traffic (even worse
if senders count bytes and not packets for opening cwnd during slow start,
as you suggest); and they can fail to "fill the pipe" if the acks are limited
to one per RTT, which at least one implementation does on initial slow start.

I've added this problem to the list of things to document ...

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 18 23:38:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA17933 for tcp-impl-list; Tue, 18 Mar 1997 23:32:30 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA17928 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 23:32:28 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id XAA21670 for <tcp-impl@relay.engr.SGI.COM>; Tue, 18 Mar 1997 23:32:26 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id XAA11556; Tue, 18 Mar 1997 23:22:31 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id XAA01709; Tue, 18 Mar 1997 23:22:29 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id XAA04145; Tue, 18 Mar 1997 23:22:29 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id XAA23474; Tue, 18 Mar 1997 23:20:56 -0800
Date: Tue, 18 Mar 1997 23:20:56 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199703190720.XAA23474@taipei.eng.sun.com>
To: vern@ee.lbl.gov
Subject: Re: draft description of "No slow start after timeout"
Cc: tcp-impl@relay.engr.SGI.COM
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>If these "aggressive" delayed ACK algorithms are the same as I've observed
>(discussed in my draft paper, by the way), then the fundamental problem is
>instead that the algorithm violates RFC 1122, section 4.2.3.2, which limits
>delayed acks to 500 msec or two packets (whichever comes first).

Right. But my suggestion aims at "working around" a performance
problem caused by a remote client delaying acks too aggressively.

>these acking policies, too. They lead to brittle performance in the presence
>of loss along the return path; they elicit very bursty traffic (even worse
>if senders count bytes and not packets for opening cwnd during slow start,
>as you suggest)

I don't agree that counting bytes can induce more burstiness. The
real culprit is excessive delayed acks. Either way (counting bytes or
acks) you'll experience bursty traffic with excessive delayed acks
once the congestion window opens up. By counting bytes at least you escape
one devil (only to embrace another one sooner :-)). When the window
size (or BxD) is sufficiently large, slow-start (linearly) can be a
real pain!

Jerry

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 02:31:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA07103 for tcp-impl-list; Wed, 19 Mar 1997 02:28:23 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA07098 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 02:28:20 -0800
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id CAA15996 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 02:27:31 -0800
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id KAA02363; Wed, 19 Mar 1997 10:29:32 +0100
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199703190929.KAA02363@labinfo.iet.unipi.it>
Subject: Re: draft description of "No initial slow start"
To: vern@ee.lbl.gov (Vern Paxson)
Date: Wed, 19 Mar 1997 10:29:31 +0100 (MET)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199703182321.PAA20942@daffy.ee.lbl.gov> from "Vern Paxson" at Mar 18, 97 03:21:25 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 1675      
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Name of problem: No initial slow start
> 
> Category: Congestion control
> 
> Description:
> 	When a TCP begins transmitting data, it is required by RFC 1122,
> 	4.2.2.15, to engage in a "slow start" by initializing its congestion
> 	window, cwnd, to one packet (one segment of the maximum size).
> 	It subsequently increases cwnd by one packet for each ack it receives
> 	for new data.  A TCP that fails to do so exhibits "No initial slow
> 	start".

FreeBSD 2.1 (and up to 2.1.7 at least) by default disables slow-start
on the "local" network. The relevant code is in tcp_input.c of FreeBSD 2.1.7

 *      $Id: tcp_input.c,v 1.25.4.7 1996/11/20 18:25:30 pst Exp $

around line 2111:

        /*
         * Don't force slow-start on local network.
         */
        if (!in_localaddr(inp->inp_faddr))
                tp->snd_cwnd = mss;
                
I don't know if this is an acceptable behaviour.

To make things worse, the default definition of "local", controlled
by the macro SUBNETSARELOCAL in file in.c, extends to the whole
CLASS_A, CLASS_B or CLASS_C network.

Although this can be overridden by redefining SUBNETSARELOCAL=0 in
the kernel config file, this is not the default in the kernels as
shipped.

The problem might be common to other *BSD releases.

	Cheers
	Luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 06:36:46 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA24680 for tcp-impl-list; Wed, 19 Mar 1997 06:35:22 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA24675 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 06:35:20 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id GAA22057 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 06:35:16 -0800
Received: from ftp.com by ftp.com  ; Wed, 19 Mar 1997 09:31:24 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Wed, 19 Mar 1997 09:31:24 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id JAA11995; Wed, 19 Mar 1997 09:28:40 -0500
Date: Wed, 19 Mar 1997 09:28:40 -0500
Message-Id: <199703191428.JAA11995@MAILSERV-2HIGH.FTP.COM>
To: vern@ee.lbl.gov
Subject: Re: draft description of "Failure to retain above-sequence data"
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Wed Mar 19 09:28:29 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||Name of problem: Failure to retain above-sequence data
||
||Category: Congestion control, performance
||
||Description:
||        When a TCP receives an "above sequence" segment, meaning one with a
||        sequence number exceeding RCV.NXT but below RCV.NXT+RCV.WND, it
||        SHOULD queue the segment for later delivery (RFC 1122, 4.2.2.20).
||        A TCP that fails to do so is said to exhibit "Failure to retain
||        above-sequence data".
||
||        It may sometimes be appropriate for a TCP to discard above-sequence
||        data to reclaim memory.  If they do so only rarely, then we would
||        not consider them to exhibit this problem.  Instead, the particular
||        concern is with TCPs that always discard above-sequence data.
||
Comment from the old PC/TCP for DOS days.  In a limited memory
environment such as DOS in days gone by, and very possibly
in various network computers, handheld widgets, etc. in days to
come, a stack has to balance very carefully network behavior with
system behavior.

The old DOS stack was bound by limitations of memory that caused its
default buffer management to have 3-5 MTU sized packets and 10-20
NFS RPC sized packets.  That <10K memory chunk was all that could
be taken from the system environment without affecting system behavior
and performance in other ways.

Our old out of sequence policy was to retain 1 and only one out of sequence
packet provided the SN of that packet was not more than 1 MTU above the
last received & ack'ed packet.  Combined with a carefully selected receive
window of not more than 3*MTU this policy balanced system memory,
window size and lost packets reasonably well.  However, if you upped
the window size to higher multiples of the MTU - say 5*MTU or 8K - all
bets were off.

Our algorithm worked because the window size was controlled to reflect the
out of sequence retention policy. It seems to me that your 2nd para.
on the SHOULD above could be modified slightly to indicate that, in
cases where a TCP host is memory limited, a combination of window size
management and limited out of sequence retention can be used to
balance memory usage and network behavior.

While it's easy to dismiss our old DOS memory issues as obsolete, the
underlying issues of low memory TCP implementations will undoubtedly
resurface from time to time as TCP is pushed onto smaller and cheaper
devices.

L.




From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 07:29:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA29627 for tcp-impl-list; Wed, 19 Mar 1997 07:27:18 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA29614 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 07:27:15 -0800
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA04728 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 07:21:09 -0800
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id PAA02925; Wed, 19 Mar 1997 15:22:58 +0100
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199703191422.PAA02925@labinfo.iet.unipi.it>
Subject: Re: draft description of "Failure to retain above-sequence data"
To: backman@ftp.com
Date: Wed, 19 Mar 1997 15:22:58 +0100 (MET)
Cc: vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199703191428.JAA11995@MAILSERV-2HIGH.FTP.COM> from "Larry Backman" at Mar 19, 97 09:28:21 am
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 996       
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Comment from the old PC/TCP for DOS days.  In a limited memory
> environment such as DOS in days gone by, and very possibly
> in various network computers, handheld widgets, etc. in days to
> come, a stack has to balance very carefully network behavior with
> system behavior.

Having hacked (in the good old days) a TCP for DOS myself, I fully
second this.

> While it's easy to dismiss our old DOS memory issues as obsolete, the
> underlying issues of low memory TCP implementations will undoubtedly
> resurface from time to time as TCP is pushed onto smaller and cheaper
> devices.

Agreed.

	Luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 08:25:09 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA06294 for tcp-impl-list; Wed, 19 Mar 1997 08:20:58 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA06206 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 08:20:37 -0800
Received: from kalae.kohala.com (kalae.kohala.com [206.62.226.35]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA15864 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 08:18:23 -0800
Received: from kohala.kohala.com (kohala.kohala.com [206.62.226.33]) by kalae.kohala.com (8.8.5/8.7.3) with ESMTP id JAA00384; Wed, 19 Mar 1997 09:14:35 -0700 (MST)
Received: (from rstevens@localhost) by kohala.kohala.com (8.8.5/8.8.3) id JAA18455; Wed, 19 Mar 1997 09:14:34 -0700 (MST)
Message-Id: <199703191614.JAA18455@kohala.kohala.com>
From: rstevens@kohala.com (W. Richard Stevens)
Date: Wed, 19 Mar 1997 09:14:34 -0700
Reply-To: "W. Richard Stevens" <rstevens@kohala.com>
X-Phone: +1 520 297 9416
X-Homepage: http://www.noao.edu/~rstevens
X-Mailer: Mail User's Shell (7.2.6 beta(3) 11/17/96)
To: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu), vern@ee.lbl.gov
Subject: Re: aggressive delayed ACKs
Cc: tcp-impl@relay.engr.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

When you say "aggressive delayed ACKs" I assume you mean the ACKer
is delaying them by large time frames (up to the 500 ms limit) and
not ACKing every other packet?  I normally think of "aggressive" as
something done too often, and just want to make sure I understand
the terminology.

	Rich Stevens

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 09:09:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA14740 for tcp-impl-list; Wed, 19 Mar 1997 09:08:26 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA14731 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 09:08:23 -0800
Received: from eamail1.unisys.com (eamail1.unisys.com [192.61.103.80]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA23628 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 09:08:16 -0800
Received: from ih85.ea.unisys.com (ih85.ea.unisys.com [192.61.103.85]) by eamail1.unisys.com (8.7.3/8.6.12) with ESMTP id QAA16291 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 16:59:32 GMT
Received: from pl_exchange_1.pl.unisys.com ([192.62.193.232]) by ih85.ea.unisys.com (8.7.3/8.7.3) with SMTP id QAA20688 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 16:59:31 GMT
Received: by pl_exchange_1.pl.unisys.com with SMTP (Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63)
	id <01BC345D.3F15D6D0@pl_exchange_1.pl.unisys.com>; Wed, 19 Mar 1997 12:01:15 -0500
Message-ID: <c=US%a=_ATTMAIL%p=UNISYS%l=RV-EXCHANGE--970319165815Z-19181@pl_exchange_1.pl.unisys.com>
From: "Smith, Allyn D" <Al.Smith@UNISYS.com>
To: "'Vern Paxson'" <vern@ee.lbl.gov>
Cc: "'tcp-impl@relay.engr.SGI.COM'" <tcp-impl@relay.engr.SGI.COM>
Subject: RE: draft description of "No initial slow start"
Date: Wed, 19 Mar 1997 11:58:15 -0500
X-Mailer:  Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


>>	After the third packet, the connection is established.  A, the
>>    connection responder, begins transmitting to B, the connection
>>	initiator.  A quickly sends 6 packets comprising 7812 bytes,
>>	even though the SYN exchange agreed upon an MSS of 1460 bytes
>>	and so A should have sent at most 1460 bytes.

I would like to see the last sentence in the previous paragraph
 clarified to something like this:

>A quickly sends 6 packets comprising 7812 bytes,
>even though the SYN exchange agreed upon an MSS of 1460 bytes
      (implying a congestion window of 1 segment or 1460 bytes)
>and so A should have sent at most 1460 bytes.
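
As a rough check of the arithmetic in that sentence, the sketch below
compares the bytes sent in the first flight against an initial
congestion window of one segment.  The function name and its
parameters are made up for illustration; only the 7812-byte and
1460-byte figures come from the trace description.

```python
def initial_flight_excess(bytes_sent, mss, initial_cwnd_segments=1):
    """Return how many bytes the first flight exceeds the initial cwnd by.

    Hypothetical helper: assumes the initial congestion window is
    `initial_cwnd_segments` segments of `mss` bytes each.
    """
    allowed = mss * initial_cwnd_segments
    return max(0, bytes_sent - allowed)


# Figures from the trace description: 7812 bytes sent, MSS of 1460 bytes.
excess = initial_flight_excess(7812, 1460)
print(excess)  # bytes sent beyond the one-segment initial window
```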

As an aside, I observed in your examples that the sending TCP never
sets the PSH bit except in the first segment. I hope the PSH bit is
eventually set and is not shown in your traces because these are trace
fragments and not because the sender is behaving badly. This relates to
an issue I had lately with a TCP implementor that did not set the PSH
bit correctly. The misbehaving TCP did not always set the PSH bit in
the last data segment but still expected that data to be delivered to
the application. Our TCP conforms to RFC 1122 section 4.2.2.2 and
stages data until either the PSH bit is set or the application's
receive buffer is filled. If the sender sends all of its data and does
not set the PSH bit, neither of these conditions is met and the
connection is effectively hung. They had tested their product with
some UNIX boxes that always pass received data to the application
regardless of whether the PSH bit is set.
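
The hang can be illustrated with a small model of a receiver that
stages data until it sees PSH or the application's buffer fills.  This
is only a sketch under stated assumptions, not our actual code: the
class name StagingReceiver, its methods and the 4096-byte buffer size
are all hypothetical.

```python
class StagingReceiver:
    """Hypothetical receiver that stages data until PSH or a full buffer."""

    def __init__(self, app_buffer_size):
        self.app_buffer_size = app_buffer_size  # application's receive buffer size
        self.staged = b""      # data held inside TCP, not yet visible to the app
        self.delivered = b""   # data already handed to the application

    def on_segment(self, data, psh=False):
        self.staged += data
        # Simplified policy: hand everything over when the sender pushed
        # or when enough data has accumulated to fill the app's buffer.
        if psh or len(self.staged) >= self.app_buffer_size:
            self.delivered += self.staged
            self.staged = b""


# Sender that never sets PSH on its last segment: the data stays staged.
rx = StagingReceiver(app_buffer_size=4096)
rx.on_segment(b"request part 1")
rx.on_segment(b", request part 2")
stuck = rx.delivered == b"" and rx.staged != b""

# Sender that sets PSH on its last segment: the data is delivered.
rx2 = StagingReceiver(app_buffer_size=4096)
rx2.on_segment(b"request part 1")
rx2.on_segment(b", request part 2", psh=True)
```

In the first case the application never sees the request, so it never
replies, while the remote application waits for that reply.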

I do not know how widespread this problem is, but it caused us
considerable grief.

Al Smith
UNISYS Corp.

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 11:03:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA21012 for tcp-impl-list; Wed, 19 Mar 1997 11:02:05 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA21004 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 11:02:02 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA28734 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 11:02:02 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id KAA26682; Wed, 19 Mar 1997 10:52:07 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id KAA19535; Wed, 19 Mar 1997 10:52:05 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id KAA12788; Wed, 19 Mar 1997 10:52:05 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id KAA23716; Wed, 19 Mar 1997 10:50:29 -0800
Date: Wed, 19 Mar 1997 10:50:29 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199703191850.KAA23716@taipei.eng.sun.com>
To: rstevens@kohala.com
Subject: Re: aggressive delayed ACKs
Cc: tcp-impl@relay.engr.SGI.COM
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>When you say "aggressive delayed ACKs" I assume you mean the ACKer
>is delaying them by large time frames (up to the 500 ms limit) and
>not ACKing every other packet?  I normally think of "aggressive" as
>something done too often, and just want to make sure I understand

Yes. Perhaps "aggressively (or excessively)-delayed ACKs" is a better term.
I wonder if this is a must for those newer link types that have highly
asymmetric upstream/downstream bandwidth.

Jerry

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 11:13:06 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA23654 for tcp-impl-list; Wed, 19 Mar 1997 11:12:01 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA23339 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 11:10:38 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA00865 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 11:10:38 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id LAA27779; Wed, 19 Mar 1997 11:00:11 -0800
Received: from skybolt.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id LAA21575; Wed, 19 Mar 1997 11:00:08 -0800
Received: by skybolt.eng.sun.com (SMI-8.6/SMI-SVR4)
	id KAA07663; Wed, 19 Mar 1997 10:56:48 -0800
Date: Wed, 19 Mar 1997 10:56:48 -0800
From: Richard.Fox@Eng.Sun.COM (Richard Fox)
Message-Id: <199703191856.KAA07663@skybolt.eng.sun.com>
To: vern@ee.lbl.gov, Al.Smith@UNISYS.com
Subject: RE: draft description of "No initial slow start"
Cc: tcp-impl@relay.engr.SGI.COM
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> As an aside, I observed in your examples that the sending TCP never
> sets the PSH bit except in the first segment. I hope the PSH bit is
> eventually set and is not shown in your traces because these are trace
> fragments and not because the sender is behaving badly. This relates to
> an issue I had lately with a TCP implementor that did not set the PSH
> bit correctly. The misbehaving TCP did not always set the PSH bit in
> the last data segment but still expected that data to be delivered to
> the application. Our TCP conforms to RFC 1122 section 4.2.2.2 and
> stages data until either the PSH bit is set or the application's
> receive buffer is filled. If the sender sends all of its data and does
> not set the PSH bit, neither of these conditions is met and the
> connection is effectively hung. They had tested their product with
> some UNIX boxes that always pass received data to the application
> regardless of whether the PSH bit is set.
> 
> I do not know how widespread this problem is, but it caused us
> considerable grief.


I have seen a number of implementations that seem to depend on the PSH
bit. I am not so sure one should really ever depend on the PSH bit being
set to deliver data to the app. I know the definition of the PSH bit
but I would like to see the RFC amended to say the PSH bit is advisory
on when to deliver data to the app but a stack should deliver data
regardless of the PSH bit.

--rich

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 11:38:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA01646 for tcp-impl-list; Wed, 19 Mar 1997 11:37:05 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA01615 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 11:37:02 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA09278 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 11:36:54 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA22874; Wed, 19 Mar 1997 11:26:58 -0800 (PST)
Message-Id: <199703191926.LAA22874@daffy.ee.lbl.gov>
To: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: draft description of "No slow start after timeout"
In-reply-to: Your message of Tue, 18 Mar 1997 23:20:56 PST.
Date: Wed, 19 Mar 1997 11:26:58 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Right. But my suggestion aims at "working around" a performance
> problem caused by a remote client delaying acks too aggressively.

I think working around performance problems due to non-compliant
implementations has considerably less appeal than working around
interoperability problems (such as keep-alives).  It'd really be much
better to fix the non-compliant implementation.

Furthermore, the proposed work-around won't fly: it violates RFC 2001,
which states that slow start is done by increasing cwnd by one MSS
per ack, and not by counting bytes ack'd.  This is not a minor difference,
and I disagree with:

> I don't agree that counting bytes can induce more burstiness.

Counting bytes can make a very large difference.  If the receiver does
ack-every-other, which many do, then the cwnd progression per RTT today
looks something like:

	RTT	acks received	cwnd	segments pending
	0	0		1	0
	1	1		2	0
	2	1		3	0
	3	1		4	1
	4	2		6	1
	5	3		9	1
	6	5		14	0
	7	7		21	0

where "acks received" is how many acks will arrive if the receiver
does ack-every-other and delays single-segment acks long enough that
they carry over to the next RTT ("segments pending").

But with counting bytes it looks like:

	RTT	segments ack'd	cwnd	acks pending
	0	0		1	0
	1	1		2	0
	2	2		4	0
	3	4		8	0
	4	8		16	0
	5	16		32	0
	6	32		64	0
	7	64		128	0

That's a lot burstier!
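
For anyone who wants to reproduce the two tables, here is a minimal
sketch of the model behind them.  It assumes the sender transmits a
full cwnd of new segments each RTT, that the receiver acks every other
segment and carries an odd leftover segment over to the next RTT, and
that the lone first segment is acked once the delayed-ack timer fires.
The function names are illustrative only.

```python
def packet_counting(rounds):
    """cwnd grows by one segment per ack; receiver acks every other segment."""
    cwnd, pending = 1, 0
    rows = [(0, 0, cwnd, pending)]  # (RTT, acks received, cwnd, segments pending)
    for rtt in range(1, rounds + 1):
        # Segments awaiting an ack: last RTT's flight plus any carried-over one.
        unacked = cwnd + pending
        acks, pending = divmod(unacked, 2)
        if acks == 0 and pending == 1:
            # Assumption: a lone first segment is acked when the delayed-ack
            # timer fires, so the connection can get started.
            acks, pending = 1, 0
        cwnd += acks
        rows.append((rtt, acks, cwnd, pending))
    return rows


def byte_counting(rounds):
    """cwnd grows by the number of segments ack'd, however few acks carry them."""
    cwnd = 1
    rows = [(0, 0, cwnd, 0)]  # (RTT, segments ack'd, cwnd, acks pending)
    for rtt in range(1, rounds + 1):
        acked = cwnd  # assumption: the whole previous flight is ack'd each RTT
        cwnd += acked
        rows.append((rtt, acked, cwnd, 0))
    return rows


packet_cwnd = [row[2] for row in packet_counting(7)]
byte_cwnd = [row[2] for row in byte_counting(7)]
print(packet_cwnd)
print(byte_cwnd)
```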

You can get the same rapid cwnd growth if the receiver acks every segment
instead of every-other, but with the significant benefit that the new data
is much more spread out over the RTT due to self-clocking, so a lot less
bursty.  That case worries me a lot less than the counting-bytes case
(and is standard-compliant).

Consequently, I think we need a go-ahead from the IRTF and/or RFC 2001
before advocating counting bytes instead of packets.

> real culprit is excessive delayed acks.

Agreed - need to fix these.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 11:47:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA06796 for tcp-impl-list; Wed, 19 Mar 1997 11:46:03 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA06763 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 11:45:55 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA13087 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 11:45:51 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA22921; Wed, 19 Mar 1997 11:35:50 -0800 (PST)
Message-Id: <199703191935.LAA22921@daffy.ee.lbl.gov>
To: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: draft description of "No initial slow start"
In-reply-to: Your message of Wed, 19 Mar 1997 10:29:31 PST.
Date: Wed, 19 Mar 1997 11:35:50 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> FreeBSD 2.1 (and up to 2.1.7 at least) by default disable slow-start
> on the "local" network ...  I don't know if this is an acceptable behaviour.

RFC 2001 mentions:

   Early implementations performed slow start only if the other end was
   on a different network.  Current implementations always perform slow
   start.

but it doesn't quite nail down whether implementations are required
to always perform slow start.  The discussion of mandatory slow start
in RFC 1122 doesn't mention any exceptions for LANs.

> To make things worse, the default definition of "local", controlled
> by the macro SUBNETSARELOCAL in file in.c,  extends to the whole
> CLASS_A, CLASS_B or CLASS_C network.

Oops!
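
To see how far such a definition of "local" reaches, here is a small
sketch of a classful same-network test.  It is not the FreeBSD code;
the function names are hypothetical and the second and third addresses
are made up for the example.

```python
import ipaddress


def classful_prefix(addr):
    """Return the classful network prefix length for an IPv4 address."""
    first_octet = int(addr) >> 24
    if first_octet < 128:
        return 8    # class A
    if first_octet < 192:
        return 16   # class B
    if first_octet < 224:
        return 24   # class C
    raise ValueError("class D/E addresses have no classful network")


def same_classful_network(a, b):
    """True if both addresses fall in the same class A, B or C network."""
    a = ipaddress.IPv4Address(a)
    b = ipaddress.IPv4Address(b)
    prefix = classful_prefix(a)
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    return (int(a) & mask) == (int(b) & mask)


# Two hosts on different subnets of one class B network still count as local.
print(same_classful_network("131.114.9.5", "131.114.200.1"))
# Hosts in different class B networks do not.
print(same_classful_network("131.114.9.5", "131.243.1.31"))
```

Under this test every host in a class A network would be "local",
however many routers sit in between.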

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 11:58:26 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA10412 for tcp-impl-list; Wed, 19 Mar 1997 11:57:04 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA10403 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 11:57:02 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA16727 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 11:56:58 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA23031; Wed, 19 Mar 1997 11:46:46 -0800 (PST)
Message-Id: <199703191946.LAA23031@daffy.ee.lbl.gov>
To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: draft description of "Failure to retain above-sequence data"
In-reply-to: Your message of Wed, 19 Mar 1997 09:28:40 PST.
Date: Wed, 19 Mar 1997 11:46:46 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Our algorithm worked because the window size was controlled to reflect the
> out of sequence retention policy. It seems to me that your 2nd para.
> in the SHOULD above could be modified slightly to indicate that, in
> cases where a TCP host is memory limited, a combination of window size
> management and limited out of sequence retention can be used to
> balance memory usage and network behavior.

Thanks.  While I appreciate this point, I'm still wondering why in
limited-memory situations one would advertise a window exceeding available
buffer.  Why not just advertise a window equal to the buffer to ensure that
above-sequence data can be retained?

A natural answer is: for added performance.  But the only time a bigger
window buys you performance is for a large bandwidth-delay path.  These are
generally WANs, which are quite prone to packet loss, for which failing to
retain the above-sequence data is potentially a serious congestion
problem.  If the main performance concern is for a LAN, then I'd think it
wouldn't be hard to have enough buffer to support a sufficiently large
window, because due to the smaller bandwidth-delay product the window
doesn't actually have to be very big.  Say it's an Ethernet with a 1 msec
RTT.  Then:

	10 Mbps / (8 b/B) * .001 sec = 1250 bytes is all you need!
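
The same bandwidth-delay calculation as a small sketch, using integer
arithmetic so the result is exact.  The function name is illustrative,
and the second example (a 1.5 Mbps link with a 100 msec RTT) uses
made-up figures just to show the contrast with a longer path.

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_ms):
    """Bytes needed in flight to fill a path: bandwidth (bits/s) times RTT (ms)."""
    # 8 bits per byte, 1000 ms per second.
    return bandwidth_bps * rtt_ms // (8 * 1000)


lan = bandwidth_delay_product_bytes(10_000_000, 1)    # the Ethernet case above
wan = bandwidth_delay_product_bytes(1_500_000, 100)   # hypothetical long path
print(lan, wan)
```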

- Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 13:38:36 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA03761 for tcp-impl-list; Wed, 19 Mar 1997 13:37:04 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA03721 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 13:37:02 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA12618 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 13:37:00 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id NAA23874; Wed, 19 Mar 1997 13:26:24 -0800 (PST)
Message-Id: <199703192126.NAA23874@daffy.ee.lbl.gov>
To: Al.Smith@UNISYS.com
Cc: tcp-impl@relay.engr.SGI.COM
Subject: PSH (was Re: draft description of "No initial slow start")
In-reply-to: Your message of Wed, 19 Mar 1997 11:58:15 PST.
Date: Wed, 19 Mar 1997 13:26:24 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I would like to see the last sentence in the previous paragraph
>  clarified to something like this:

Sounds good, I've added this.

> As an aside, I observed in your examples that the sending TCP never 
> sets the PSH bit except in the first segment. I hope the PSH bit is 
> eventually set and is not shown in your traces because these are trace 
> fragments and not because the sender is behaving badly.

Yes, the sending TCP in that trace does indeed set PSH now and then.
I just poked through a couple hundred traces I have lying around
(representing maybe six different TCPs) and all of them set PSH at
least once.

> The misbehaving TCP did not always set the PSH bit in the last data
> segment but still expected that data to be delivered to the application.

This looks like it's okay.  RFC 793 (section 3.7, page 41) says:

  The CLOSE user call implies a push function, as does the FIN control
  flag in an incoming segment.

and I didn't see wording in RFC 1122 overruling this, though maybe I missed it.

Rich Fox wrote:

> I have seen a number of implementations that seem to depend on the PSH
> bit.

If this means that they won't ultimately deliver all of the data if
the last segment doesn't include a PSH, then that definitely seems worth
documenting.  (A bit hard to show from a trace, I guess ...)

> ... I would like to see the RFC amended to say the PSH bit is advisory
> on when to deliver data to the app but a stack should deliver data
> regardless of the PSH bit.

Maybe this should instead be worded in terms of: stacks must ultimately
deliver data even if they don't ever receive a PSH?  (Because in 793,
beginning of section 2.8, it sounds like PSH is mandatory and not advisory.)

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 14:05:29 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA10527 for tcp-impl-list; Wed, 19 Mar 1997 14:01:41 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA10490 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 14:01:38 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA19161 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 14:01:37 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id NAA18482; Wed, 19 Mar 1997 13:51:00 -0800
Received: from skybolt.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id NAA25158; Wed, 19 Mar 1997 13:50:56 -0800
Received: by skybolt.eng.sun.com (SMI-8.6/SMI-SVR4)
	id NAA07933; Wed, 19 Mar 1997 13:47:34 -0800
Date: Wed, 19 Mar 1997 13:47:34 -0800
From: Richard.Fox@Eng.Sun.COM (Richard Fox)
Message-Id: <199703192147.NAA07933@skybolt.eng.sun.com>
To: Al.Smith@UNISYS.com, vern@ee.lbl.gov
Subject: Re: PSH (was Re: draft description of "No initial slow start")
Cc: tcp-impl@relay.engr.SGI.COM
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> > ... I would like to see the RFC amended to say the PSH bit is advisory
> > on when to deliver data to the app but a stack should deliver data
> > regardless of the PSH bit.
> 
> Maybe this should instead be worded in terms of: stacks must ultimately
> deliver data even if they don't ever receive a PSH?  (Because in 793,
> beginning of section 2.8, it sounds like PSH is mandatory and not advisory.)
> 

Yes this sounds very reasonable.

--rich

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 14:07:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA11841 for tcp-impl-list; Wed, 19 Mar 1997 14:06:00 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA11829 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 14:05:58 -0800
Received: from bbmail1.unisys.com (192-63-2005.unisys.com [192.63.200.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA20069 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 14:05:42 -0800
Received: from trsvr.tr.unisys.com (trsvr.tr.unisys.com [192.63.216.7]) by bbmail1.unisys.com (8.7.3/8.6.12) with ESMTP id WAA13408 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 22:04:37 GMT
Received: from tr-exchange-2.tr.unisys.com by trsvr.tr.unisys.com (8.6.12/8.6.12) id WAA05614 ; Wed, 19 Mar 1997 22:05:34 GMT
Received: by tr-exchange-2.tr.unisys.com with Microsoft Exchange (IMC 4.0.838.14)
	id <01BC3487.D2808A80@tr-exchange-2.tr.unisys.com>; Wed, 19 Mar 1997 17:06:01 -0500
Message-ID: <c=US%a=_ATTMAIL%p=UNISYS%l=RV-EXCHANGE--970319220308Z-20026@tr-exchange-2.tr.unisys.com>
From: "Smith, Allyn D" <Al.Smith@UNISYS.com>
To: "'Vern Paxson'" <vern@ee.lbl.gov>
Cc: "'tcp-impl@relay.engr.SGI.COM'" <tcp-impl@relay.engr.SGI.COM>
Subject: RE: PSH (was Re: draft description of "No initial slow start")
Date: Wed, 19 Mar 1997 17:03:08 -0500
X-Mailer:  Microsoft Exchange Server Internet Mail Connector Version 4.0.838.14
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


>>> The misbehaving TCP did not always set the PSH bit in the last data
>>> segment but still expected that data to be delivered to the application.
>
>>This looks like it's okay.  RFC 793 (section 3.7, page 41) says:
>
>> The CLOSE user call implies a push function, as does the FIN control
>> flag in an incoming segment.

Neither the local nor remote applications closed the connections. If
that had happened, everything would have worked out fine. This was an
interactive application data exchange. The remote application sent the
data without the PSH bit and was waiting for a reply from the local
application. However, the local TCP was waiting for the PSH bit (or
local CLOSE or FIN from peer) which never came. This caused a
connection deadlock.

>>and I didn't see wording in RFC 1122 overruling this, though maybe I
>>missed it.

RFC 1122, section 4.2.2.2, first paragraph:
"...when a series of segments is received without the PSH bit, a TCP may
queue the data internally without passing it to the receiving
application."

third paragraph:
"If PSH flags (on application send calls) are not implemented, then the
sending TCP: (1)....(2) MUST set the PSH bit in the last buffered
segment (i.e., when there is no more queued data to be sent)."

>>Rich Fox wrote:
>
>>> I have seen a number of implementations that seem to depend on the PSH
>>> bit.
>
>>If this means that they won't ultimately deliver all of the data if
>>the last segment doesn't include a PSH, then that definitely seems worth
>>documenting.  (A bit hard to show from a trace, I guess ...)

It seems to me that the sender MUST set the PSH bit in the last data
segment to be delivered to an application (unless a FIN bit is set).
If it's an interactive application, it's difficult to know when to give
the data to the application if the PSH bit is not set. Short of
inventing a timer, I don't know what the algorithm would be.

>>> ... I would like to see the RFC amended to say the PSH bit is advisory
>>> on when to deliver data to the app but a stack should deliver data
>>> regardless of the PSH bit.
>
>>Maybe this should instead be worded in terms of: stacks must ultimately
>>deliver data even if they don't ever receive a PSH?  (Because in 793,
>>beginning of section 2.8, it sounds like PSH is mandatory and not advisory.)

Again, if a receiving TCP can't depend on the sender to set the PSH bit,
what would the algorithm be on when to give data to the application?

Al Smith

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 14:09:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA12489 for tcp-impl-list; Wed, 19 Mar 1997 14:08:02 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA12469 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 14:08:00 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA20891 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 14:07:58 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id NAA24090; Wed, 19 Mar 1997 13:57:59 -0800 (PST)
Message-Id: <199703192157.NAA24090@daffy.ee.lbl.gov>
To: Richard.Fox@Eng.Sun.COM (Richard Fox)
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH (was Re: draft description of "No initial slow start")
In-reply-to: Your message of Wed, 19 Mar 1997 13:47:34 PST.
Date: Wed, 19 Mar 1997 13:57:59 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Yes this sounds very reasonable.

Okay, I've started a list of items for input to the IRTF, and put this on
it.  I'm going to wait on turning this into a draft document until more
items are suggested (in particular, I'm not going to try to put together a
draft before Memphis, unless it sounds like everyone's bursting with
pressing IRTF issues).

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 14:25:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA16287 for tcp-impl-list; Wed, 19 Mar 1997 14:23:52 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA16267 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 14:23:49 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA24745 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 14:23:46 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id OAA24193; Wed, 19 Mar 1997 14:13:17 -0800 (PST)
Message-Id: <199703192213.OAA24193@daffy.ee.lbl.gov>
To: "Smith, Allyn D" <Al.Smith@UNISYS.com>
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH (was Re: draft description of "No initial slow start")
In-reply-to: Your message of Wed, 19 Mar 1997 17:03:08 PST.
Date: Wed, 19 Mar 1997 14:13:17 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Neither the local nor remote applications closed the connections.

Ah, I didn't realize that - I agree, that's a definite implementation bug,
clearly the sender must set PSH.

Can you capture a trace illustrating the problem?

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 15:07:00 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA24803 for tcp-impl-list; Wed, 19 Mar 1997 15:02:13 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA24787 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 15:02:11 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA03867 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 15:02:06 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id OAA25475; Wed, 19 Mar 1997 14:52:10 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id OAA08043; Wed, 19 Mar 1997 14:52:07 -0800
Received: from fstop. by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id OAA16763; Wed, 19 Mar 1997 14:51:45 -0800
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id OAA13048; Wed, 19 Mar 1997 14:51:30 -0800
Message-Id: <199703192251.OAA13048@fstop.>
From: sparker@Eng.Sun.COM
To: "Smith, Allyn D" <Al.Smith@UNISYS.com>
cc: "'tcp-impl@relay.engr.SGI.COM'" <tcp-impl@relay.engr.SGI.COM>
Subject: Re: PSH (was Re: draft description of "No initial slow start") 
Date: Wed, 19 Mar 1997 14:51:30 -0800
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


- Neither the local nor remote applications closed the connections. If that had 
- happened, everything would have worked out fine. This was an interactive
- application data exchange. The remote application sent the data without 
- the PSH bit and was waiting for a reply from the local application.
- However, the local TCP was waiting for the PSH bit (or local CLOSE or FIN from
- peer) which never came. This caused a connection deadlock.

It seems to me such a receiver is failing to be liberal in what it
accepts.  It has data, and it refuses to deliver it because no PSH bit
is present, yet we know it is acceptable for a TCP to deliver data without
the PSH bit being set.

- Again, if a receiving TCP can't depend on the sender to set the PSH bit,
- what would the algorithm be on when to give data to the application?

How about everything it currently has, assuming the application still offers
receive socket buffer space?  On our systems, coalescing occurs at the
granularity of the read()'s done by the application, not in TCP.  Every
TCP segment which arrived since the last read, assuming it fits in the
user's buffer, is delivered.

Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 16:19:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA12208 for tcp-impl-list; Wed, 19 Mar 1997 16:16:33 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA12190 for <tcp-impl@relay.engr.sgi.com>; Wed, 19 Mar 1997 16:16:30 -0800
Received: from thoth.cs.ohiou.edu (thoth.cs.ohiou.edu [132.235.3.135]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA22870 for <tcp-impl@relay.engr.sgi.com>; Wed, 19 Mar 1997 16:16:28 -0800
Received: from thoth.cs.ohiou.edu by thoth.cs.ohiou.edu (8.6.11/1.930630)
	id AAA06226; Thu, 20 Mar 1997 00:00:41 GMT
Message-Id: <199703200000.AAA06226@thoth.cs.ohiou.edu>
To: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Cc: vern@ee.lbl.gov, tcp-impl@relay.engr.sgi.com,
        "Chris Hayes" <chayes@oucsace.cs.OhioU.Edu>,
        "Shawn Ostermann" <sdo@picard.cs.ohiou.edu>
From: Mark Allman <mallman@oucsace.cs.ohiou.edu>
Reply-To: mallman@oucsace.cs.ohiou.edu
Subject: Re: draft description of "No slow start after timeout" 
Date: Wed, 19 Mar 1997 19:00:40 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> In a TCP implementation with a more aggressive delayed ACK
> algorithm, the congestion window may open up linearly instead of
> exponentially.  For a large window this can take a lot of round
> trip time. A simple fix (other than to reduce delayed ACK) is to
> count the # of bytes ack'ed, instead of # of ack packets when
> growing the congestion window.

We tested this window increase algorithm.  The results are in a
draft available at:

    http://jarok.cs.ohiou.edu/papers
    (the slow start draft)

Essentially we found no harmful effects when using this window
increase algorithm.

We are considering a further restriction suggested by Sally Floyd:
the window increase can be no more than 2 MSS bytes.  This will not
hurt when the receiver is ACKing according to the standard, but it
will help limit the burstiness when the receiver is not.
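
A minimal sketch of the capped byte-counting increase discussed above,
assuming a slow-start sender that tracks cwnd in bytes.  The function and
parameter names are hypothetical, and applying the 2 MSS cap to each
individual ACK is an assumption about how the restriction would be used;
this is not the code evaluated in the draft.

```python
def slow_start_increase(cwnd, bytes_acked, mss, limit_segments=2):
    """Return the new cwnd (in bytes) after one ACK during slow start.

    Byte counting: grow by the number of newly acknowledged bytes,
    but never by more than limit_segments * mss for a single ACK.
    (limit_segments=2 mirrors the "2 MSS" restriction; hypothetical name.)
    """
    if bytes_acked <= 0:
        return cwnd  # duplicate ACK or no new data: no growth
    return cwnd + min(bytes_acked, limit_segments * mss)


if __name__ == "__main__":
    mss = 1460
    # Receiver ACKs every other segment: full credit for both segments.
    print(slow_start_increase(4 * mss, 2 * mss, mss))
    # Receiver ACKs every tenth segment: growth is capped at 2 * MSS.
    print(slow_start_increase(4 * mss, 10 * mss, mss))
```

With the cap, a receiver that ACKs every other segment sees the same
growth as with plain byte counting, while a stretch ACK covering many
segments cannot open the window by more than two segments at once.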

allman


 

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 16:19:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA12628 for tcp-impl-list; Wed, 19 Mar 1997 16:17:48 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA12537 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 16:17:38 -0800
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id QAA23142 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 16:17:34 -0800
Received: from rock.mentat.com ([192.88.122.136]) by mentat.com (4.1/SMI-4.1)
	id AA10152; Wed, 19 Mar 97 16:12:48 PST
Date: Wed, 19 Mar 97 16:12:48 PST
From: jt@mentat.com (Jerry Toporek)
Message-Id: <9703200012.AA10152@mentat.com>
To: Al.Smith@UNISYS.com, vern@ee.lbl.gov
Subject: Re: PSH (was Re: draft description of "No initial slow start")
Cc: tcp-impl@relay.engr.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> > Neither the local nor remote applications closed the connections.
> 
> Ah, I didn't realize that - I agree, that's a definite implementation bug,
> clearly the sender must set PSH.

So you are saying that the sender has to set the PSH bit just before the
receiver closes the window all the way, even if it has unsent data queued
and ready to go?  The "definite implementation bug" is on the receiver side.

jt


From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 16:40:05 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA17500 for tcp-impl-list; Wed, 19 Mar 1997 16:38:09 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA17466 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 16:38:07 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA27905 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 16:38:05 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id QAA24580; Wed, 19 Mar 1997 16:28:09 -0800 (PST)
Message-Id: <199703200028.QAA24580@daffy.ee.lbl.gov>
To: jt@mentat.com (Jerry Toporek)
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH (was Re: draft description of "No initial slow start")
In-reply-to: Your message of Wed, 19 Mar 1997 16:12:48 PST.
Date: Wed, 19 Mar 1997 16:28:09 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> So you are saying that the sender has to set the PSH bit just before the
> receiver closes the window all the way, even if it has unsent data queued
> and ready to go?  The "definite implementation bug" is on the receiver side.

No, I meant the opposite: if the sender doesn't have any more data to send,
it needs to set PSH.  The receiver's window doesn't enter into it, it's
just the question of whether the sender at the moment could not possibly
send any more data.

Where I'm coming from is RFC 1122, 4.2.2.2 p 82:

            A TCP MAY implement PUSH flags on SEND calls.  If PUSH flags
            are not implemented, then the sending TCP: (1) must not
            buffer data indefinitely, and (2) MUST set the PSH bit in
            the last buffered segment (i.e., when there is no more
            queued data to be sent).

In particular, item (2).  However, I'm confused as to whether all data
transmission is via "SEND calls", or if those are just one type of
transmission, and there are others.

So the problem as I see it is:

	Time	Dir.	What
	1.0	A > B	data 1:50
	1.1	B > A	ack 50
	...
	2 eons	A > B	data 51:100

In this example, the transmission of 1:50 should have included PSH,
since at that point A evidently didn't have any more data to send.

If A is not required to set PSH, then B must have a timer to force
delivery of 1:50 to the receiving application, or else interactive
use breaks.

I'm not sure that sparker's approach:

> How about everything it currently has, assuming the application still offers
> receive socket buffer space?  On our systems, coalescing occurs at the
> granularity of the read()'s done by the application, not in TCP.  Every
> TCP segment which arrived since the last read, assuming it fits in the
> user's buffer, is delivered.

helps in this case.  The receiving application on B could've made its
read() call before any data arrived, so it blocked.  The question now
is when to wake it up.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 18:01:30 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA04457 for tcp-impl-list; Wed, 19 Mar 1997 17:59:32 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA04448 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 17:59:30 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA14038 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 17:59:28 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id RAA16532; Wed, 19 Mar 1997 17:49:33 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id RAA13099; Wed, 19 Mar 1997 17:49:31 -0800
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id RAA19349; Wed, 19 Mar 1997 17:49:31 -0800
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id RAA23884; Wed, 19 Mar 1997 17:47:56 -0800
Date: Wed, 19 Mar 1997 17:47:56 -0800
From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199703200147.RAA23884@taipei.eng.sun.com>
To: vern@ee.lbl.gov
Subject: Re: draft description of "No slow start after timeout"
Cc: tcp-impl@relay.engr.SGI.COM
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>> I don't agree that counting bytes can induce more burstiness.
>
>Counting bytes can make a very large difference.  If the receiver does
>ack-every-other, which many do, then the cwnd progression per RTT today
>looks something like:

If you have a simple delayed ack algorithm of acking every N packets,
no matter which way you count (acks or bytes), you'll end up with a
burstiness of N once you've got out of the slow-start phase. But by
counting acks, slow start takes Rlog(W) with base (N+1)/N, where R is
the round-trip time and W is the window size in packets. N=1 gives
Rlog2(W), the one described in VJ's '88 paper. When W is large, with an
aggressive delayed ack of N >> 2, log(W) can be quite large.

E.g. W = 100 packets,	N=1 => log(W) = 6.6 (log base 2)
			N=2 => log(W) = 11.4 (log base 1.5)
			N=10 => log(W) = 48.3 (log base 1.1)

Counting bytes always gives Rlog2(W). It appears to be contributing
to the burstiness of the traffic because it helps to get out of
slow-start quicker, which should be considered a merit, not fault (IMHO).
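
A minimal sketch of the arithmetic above, assuming the idealized model in
which cwnd grows by a factor of (N+1)/N per round trip when counting acks
and doubles per round trip when counting bytes.  The function name is
hypothetical; the result is in units of R (round-trip times).

```python
import math


def slow_start_rtts(window_packets, acks_every_n, count_bytes=False):
    """Approximate number of RTTs for slow start to reach window_packets.

    Counting acks:  cwnd grows by a factor of (N + 1) / N per RTT.
    Counting bytes: cwnd doubles per RTT regardless of N.
    """
    if count_bytes:
        base = 2.0
    else:
        base = (acks_every_n + 1) / acks_every_n
    return math.log(window_packets) / math.log(base)


if __name__ == "__main__":
    for n in (1, 2, 10):
        by_acks = round(slow_start_rtts(100, n), 1)
        by_bytes = round(slow_start_rtts(100, n, count_bytes=True), 1)
        print(f"N={n}: counting acks {by_acks} RTTs, counting bytes {by_bytes} RTTs")
```

For W = 100 this reproduces the 6.6, 11.4 and 48.3 figures in the table
above, against 6.6 in every case when counting bytes.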

>Furthermore, the proposed work-around won't fly: it violates RFC 2001,
>which states that slow start is done by increasing cwnd by one MSS
>per ack, and not by counting bytes ack'd.  This is not a minor difference,
>and I disagree with:

I'm pretty much aware of what the various RFCs say. But I thought we're
also in the business of making recommendations for amendments if deemed
necessary. Are we not?

Jerry


From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 18:13:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA06879 for tcp-impl-list; Wed, 19 Mar 1997 18:12:02 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA06869 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 18:12:00 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA17568 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 18:11:55 -0800
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id VAA22086; Wed, 19 Mar 1997 21:07:57 -0500 (EST)
Date: Wed, 19 Mar 1997 21:07:57 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199703200207.VAA22086@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH (was Re: draft description of "No initial slow start")
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>>> Neither the local nor remote applications closed the connections.
>> Ah, I didn't realize that - I agree, that's a definite
>> implementation bug, clearly the sender must set PSH.
> So you are saying that the sender has to set the PSH bit just before
> the receiver closes the window all the way, even if it has unsent
> data queued and ready to go?  The "definite implementation bug" is on
> the receiver side.

That's not how I read it.  As I understood the situation, the
receiver application buffer is not full, even though all received data
have been copied into it (and the receiver is therefore advertising a
full-size window), but the sender has nothing more to send.

If the sender doesn't PSH the last segment it sends in this
circumstance, things sit in this state indefinitely (or at least until
one of the applications or humans running the applications gets fed up
with waiting).  Since the sender cannot know anything about what's
happening on the receiver end, it has to PSH any segment it sends that,
if ACKed immediately, would completely drain its queue of to-be-sent
data.

Which is more or less what RFC1122 says.  Funny, that. :-)
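
A minimal sketch of that sender-side rule, assuming a sender that does not
implement PUSH flags on SEND calls and simply cuts its queue into
MSS-sized segments.  The function name and the (payload, psh) tuple
representation are hypothetical, not taken from any real stack.

```python
def segmentize(send_queue, mss):
    """Split queued bytes into (payload, psh) pairs.

    PSH is set only on the segment that empties the queue, i.e. when
    there is no more queued data to be sent.
    """
    segments = []
    offset = 0
    while offset < len(send_queue):
        payload = send_queue[offset:offset + mss]
        offset += len(payload)
        psh = offset == len(send_queue)  # this segment drains the queue
        segments.append((payload, psh))
    return segments


if __name__ == "__main__":
    for payload, psh in segmentize(b"x" * 2500, 1000):
        print(len(payload), "PSH" if psh else "")
```

A sender that always has more data queued never reaches the draining
segment, so under this rule it would not set PSH until the queue finally
empties.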

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 18:40:21 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA12201 for tcp-impl-list; Wed, 19 Mar 1997 18:38:26 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA12194 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 18:38:24 -0800
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id SAA21983 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 18:38:20 -0800
Received: from rock.mentat.com ([192.88.122.136]) by mentat.com (4.1/SMI-4.1)
	id AA10929; Wed, 19 Mar 97 18:34:30 PST
Date: Wed, 19 Mar 97 18:34:30 PST
From: jt@mentat.com (Jerry Toporek)
Message-Id: <9703200234.AA10929@mentat.com>
To: vern@ee.lbl.gov
Subject: Re: PSH (was Re: draft description of "No initial slow start")
Cc: tcp-impl@relay.engr.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> > So you are saying that the sender has to set the PSH bit just before the
> > receiver closes the window all the way, even if it has unsent data queued
> > and ready to go?  The "definite implementation bug" is on the receiver side.
> 
> No, I meant the opposite: if the sender doesn't have any more data to send,
> it needs to set PSH.  The receiver's window doesn't enter into it, it's
> just the question of whether the sender at the moment could not possibly
> send any more data.

I certainly agree that if the sender has no more data to send, then it must
set the PSH bit.  If it isn't doing that then the sender is more broken than
the receiver.  I saw the term "deadlock" in the problem description, which
I thought implied that neither side could proceed.  If the sender has more
data to send, but is looking at a closed (or silly) window, then it can not
proceed, and is under no obligation to show a PSH bit.

What concerns me is that some of this discussion has seemed to imply that a
trace showing a sender which almost never turns on the PSH bit somehow implies
that there is something wrong with the sender implementation.  This is clearly
not true.  If the sender is faster than the receiver, meaning that the sender
always has unsent data ready to go, then there is no obligation to turn on the
PSH bit until the FIN goes out.  If there really is a receiver that will
refuse to send any data upstream until it sees a PSH bit, and will in fact
let its receive window shrink to zero as a result, then I hope we can all agree
that this is a receiver problem, not a sender problem.

As for the question of what the receiver should do that is better, well almost
anything, right?  Send data upstream immediately...  Count bytes and send data
upstream when a threshold is reached...  Send everything upstream if a timer
goes off...  Don't hold more than half the window...  All of the above!
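
A minimal sketch combining the receiver-side options listed above into a
single delivery check.  All names and the default threshold and timeout
values are hypothetical, chosen only for illustration; a real stack would
tie them to its own buffer and timer machinery.

```python
def should_deliver(buffered_bytes, psh_seen, window_bytes,
                   ms_since_first_byte,
                   threshold_bytes=4096, timeout_ms=200):
    """Decide whether buffered data should be passed to the application.

    Delivers if any one of the conditions holds: PSH seen, byte threshold
    reached, timer expired, or more than half the window is being held.
    """
    if buffered_bytes == 0:
        return False  # nothing to deliver
    if psh_seen:
        return True   # sender asked for a push
    if buffered_bytes >= threshold_bytes:
        return True   # byte-count threshold reached
    if ms_since_first_byte >= timeout_ms:
        return True   # delivery timer went off
    if 2 * buffered_bytes > window_bytes:
        return True   # holding more than half the window
    return False


if __name__ == "__main__":
    # 50 bytes, no PSH, 8 KB window: held at first, delivered on timeout.
    print(should_deliver(50, False, 8192, 10))
    print(should_deliver(50, False, 8192, 250))
```

Setting threshold_bytes to 1 gives the "send data upstream immediately"
behaviour, which never waits on PSH at all.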

jt


From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 19:39:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA20278 for tcp-impl-list; Wed, 19 Mar 1997 19:38:15 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA20273 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 19:38:13 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id TAA01704 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 19:38:11 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id TAA24981; Wed, 19 Mar 1997 19:28:09 -0800 (PST)
Message-Id: <199703200328.TAA24981@daffy.ee.lbl.gov>
To: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: draft description of "No slow start after timeout"
In-reply-to: Your message of Wed, 19 Mar 1997 17:47:56 PST.
Date: Wed, 19 Mar 1997 19:28:08 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Counting bytes always gives Rlog2(W). It appears to be contributing
> to the burstiness of the traffic because it helps to get out of
> slow-start quicker, which should be considered a merit, not fault (IMHO).

It is a merit from the perspective of performance.  It is a potential fault
from the perspective of burstiness.  There's a constant tension between
performance and congestion avoidance.  My observation is that if counting
bytes for slow start were deployed in today's Internet, traffic would get
significantly burstier - it's not clear that this is a good thing.  There
was recently a lot of haggling on end2end-interest over increasing the
initial value of cwnd, and that's just a one-time burst, rather than the
ongoing (for the duration of slow start) burstiness that counting bytes
would lead to.

> I'm pretty much aware of what the various RFCs say. But I thought we're
> also in the business of making recommendation for amendments if deemed
> necessary. Are we not?

Yes, but the "deemed necessary" is in terms of clarifying ambiguities and
fixing broken or problematic specs.  So far, from a TCP implementation
perspective, the only argument we've heard for counting bytes is to improve
performance when sending data to a non-conformant receiver.  That doesn't
seem too compelling.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 19 19:47:58 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA21462 for tcp-impl-list; Wed, 19 Mar 1997 19:46:28 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA21448 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 19:46:22 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id TAA02872 for <tcp-impl@relay.engr.SGI.COM>; Wed, 19 Mar 1997 19:46:14 -0800
Received: from ftp.com by ftp.com  ; Wed, 19 Mar 1997 22:42:21 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Wed, 19 Mar 1997 22:42:21 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id WAA10987; Wed, 19 Mar 1997 22:39:36 -0500
Date: Wed, 19 Mar 1997 22:39:36 -0500
Message-Id: <199703200339.WAA10987@MAILSERV-2HIGH.FTP.COM>
To: vern@ee.lbl.gov
Subject: Re: draft description of "Failure to retain above-sequence data"
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Wed Mar 19 22:39:35 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||Thanks.  While I appreciate this point, I'm still wondering why in
||limited-memory situations one would advertise a window exceeding available
||buffer.  Why not just advertise a window equal to the buffer to ensure that
||above-sequence data can be retained?
||
because these were younger and more innocent days prior to the
Internet being on the front page of Business Week :-).  You're absolutely
right, and it was an issue we discussed but never implemented, for as you
say:

||A natural answer is: for added performance.  
yes.
||window buys you performance is for a large bandwidth-delay path.  These are
||generally WANs, which are quite prone to packet loss, for which failing to
||retain the above-sequence data is potentially a serious congestion
||problem.  If the main performance concern is for a LAN, then I'd think it
||wouldn't be hard to have enough buffer to support a sufficiently large
||window, because due to the smaller bandwidth-delay product the window
||doesn't actually have to be very big.  Say it's an Ethernet with a 1 msec
||RTT.  Then:
Aha - in simple single-connection cases (FTP, as an example), no
problem - however, in complex multiconnection cases (FTP to an NFS
drive), there weren't enough buffers to go around.

Again - I think the point is important to reemphasize, as it's going to
resurface in network widgets - the stack model we had to deal
with in those days was a model that supported a default of 5-10 concurrent
connections (X ran on DOS back then..) with a buffer pool of 3-5 MTU-sized
packets and 10-ish RPC-sized packets.

The more I think through the issue, the more I realize that our receiving
one out-of-sequence packet was correct, if (which we didn't do..) we
managed the window size more aggressively, as someone already suggested.

Perhaps wording to the effect that in low memory cases where out
of sequence packet retention is limited, the window should be shrunk to
avoid dropping incoming packets.

L.



From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 20 07:14:16 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA22663 for tcp-impl-list; Thu, 20 Mar 1997 07:12:59 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA22658 for <tcp-impl@relay.engr.SGI.COM>; Thu, 20 Mar 1997 07:12:57 -0800
Received: from eamail1.unisys.com (eamail1.unisys.com [192.61.103.80]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA08677 for <tcp-impl@relay.engr.SGI.COM>; Thu, 20 Mar 1997 07:12:49 -0800
Received: from ih85.ea.unisys.com (ih85.ea.unisys.com [192.61.103.85]) by eamail1.unisys.com (8.7.3/8.6.12) with ESMTP id PAA10495 for <tcp-impl@relay.engr.SGI.COM>; Thu, 20 Mar 1997 15:12:35 GMT
Received: from ea_ihx102.ea.unisys.com (ihx102.ea.unisys.com [192.61.144.52]) by ih85.ea.unisys.com (8.7.3/8.7.3) with SMTP id PAA01695 for <tcp-impl@relay.engr.SGI.COM>; Thu, 20 Mar 1997 15:12:26 GMT
Received: by ea_ihx102.ea.unisys.com with SMTP (Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63)
	id <01BC3542.D66DE050@ea_ihx102.ea.unisys.com>; Thu, 20 Mar 1997 15:24:44 -0000
Message-ID: <c=US%a=_ATTMAIL%p=UNISYS%l=RV-EXCHANGE--970320151109Z-20822@ea_ihx102.ea.unisys.com>
From: "Smith, Allyn D" <Al.Smith@UNISYS.com>
To: "'tcp-impl@relay.engr.SGI.COM'" <tcp-impl@relay.engr.SGI.COM>
Subject: RE: PSH (was Re: draft description of "No initial slow start")
Date: Thu, 20 Mar 1997 15:11:09 -0000
X-Mailer:  Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Vern and der Mouse are correct in their understanding of this problem.
The sending TCP MUST set the PSH bit in the last segment that drains
the output queue. Apparently, Berkeley-derived TCPs always pass
received input immediately to an application regardless of the PSH
bit. TCPs that don't immediately pass the input must rely on the PSH
bit to deliver data to an application, or the connection is deadlocked.
Passing data immediately upon receipt, or inventing a timer to push
the data to the application, is not a good idea for large servers that
support tens of thousands of connections.

jt writes:
>What concerns me is that some of this discussion has seemed to imply that a
>trace showing a sender which almost never turns on the PSH bit somehow implies
>that there is something wrong with the sender implementation.  This is clearly
>not true.  If the sender is faster than the receiver, meaning that the sender
>always has unsent data ready to go, then there is no obligation to turn on the
>PSH bit until the FIN goes out.

You are correct. What I meant to imply was that the sending TCP must set
the PSH bit in the last segment that drains the output queue. In fact, in
the case you describe a sending TCP NEVER has to set the PSH bit, since
the FIN implies a push.

jt also writes:
>If there really is a receiver that will
>refuse to send any data upstream until it sees a PSH bit, and will in fact
>let its receive window shrink to zero as a result, then I hope we can all agree
>that this is a receiver problem, not a sender problem.

This was not the problem. The receiver's receive window never shrank. The
receiver had no data to send. The receiver was waiting for the PSH or enough
data to fill the receive buffer before giving it to the application. The
local application couldn't send data until it had received the data TCP was
sitting on.

Regards,

Al Smith

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 20 15:22:14 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA04912 for tcp-impl-list; Thu, 20 Mar 1997 15:20:47 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA04907 for <tcp-impl@relay.engr.SGI.COM>; Thu, 20 Mar 1997 15:20:46 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA18567 for <tcp-impl@relay.engr.SGI.COM>; Thu, 20 Mar 1997 15:20:35 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id XAA20112; Thu, 20 Mar 1997 23:14:17 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0w7qWL-0005FcC; Thu, 20 Mar 97 22:41 GMT
Message-Id: <m0w7qWL-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: draft description of "Failure to retain above-sequence data"
To: backman@ftp.com
Date: Thu, 20 Mar 1997 22:41:17 +0000 (GMT)
Cc: vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199703200339.WAA10987@MAILSERV-2HIGH.FTP.COM> from "Larry Backman" at Mar 19, 97 10:39:36 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Perhaps wording to the effect that in low memory cases where out
> of sequence packet retention is limited, the window should be shrunk to
> avoid dropping incoming packets.

Shrinking the window is frowned upon by RFC 1122, however, and most stacks
getting multiple window offers for the same sequence will assume the smaller
ones are delayed out-of-order updates and bin them.

The low-memory problem turns up in other ways too, through buffer overheads
in bigger stacks. Sending 20,000 1-byte TCP frames to most stacks tends to
increase the resources committed quite considerably over 15 1500-byte frames.
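
A rough sketch of that comparison, assuming each received frame is held in
its own buffer with a fixed bookkeeping overhead.  The 128-byte overhead
and the function name are hypothetical illustration values, not
measurements from any particular stack.

```python
def committed_bytes(frame_count, payload_bytes, per_frame_overhead=128):
    """Memory committed to hold frame_count frames of payload_bytes each.

    per_frame_overhead is an assumed fixed cost per buffered frame
    (headers, buffer descriptors); the value is purely illustrative.
    """
    return frame_count * (payload_bytes + per_frame_overhead)


if __name__ == "__main__":
    small = committed_bytes(20_000, 1)    # 20,000 one-byte frames
    large = committed_bytes(15, 1500)     # 15 full-sized frames
    print(small, large, round(small / large))
```

Both cases carry roughly the same amount of payload, so under this
assumption the difference in committed memory comes almost entirely from
the per-frame overhead.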

Alan


From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 03:51:59 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA15860 for tcp-impl-list; Fri, 21 Mar 1997 03:50:24 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA15850 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 03:50:21 -0800
Received: from fly.cnuce.cnr.it (fly.cnuce.cnr.it [131.114.192.86]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id DAA22905 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 03:50:09 -0800
Received: by fly.cnuce.cnr.it (Smail3.1.26.7 #3)
	id m0w82nR-0003YsC; Fri, 21 Mar 97 12:47 MET
Message-Id: <m0w82nR-0003YsC@fly.cnuce.cnr.it>
Date: Fri, 21 Mar 97 12:47 MET
From: Francesco Potorti` <F.Potorti@cnuce.cnr.it>
To: tcp-impl@relay.engr.SGI.COM
In-reply-to: <m0w7qWL-0005FcC@lightning.swansea.linux.org.uk> (alan@lxorguk.ukuu.org.uk)
Subject: Re: draft description of "Failure to retain above-sequence data"
Organization: CNUCE-CNR, Via S.Maria 36, Pisa - Italy +39-50-593211
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   > Perhaps wording to the effect that in low memory cases where out
   > of sequence packet retention is limited, the window should be
   > shrunk to avoid dropping incoming packets.
   
   Shrinking the window is frowned upon by 1122 however, and most
   stacks getting multiple window offers for the same sequence will
   assume the smaller ones are delayed out of order updates and bin
   them.

Is it reasonable for stacks to throw away smaller window offers for
the same sequence?  That's what BSD 4.4 does, but this goes against
rfc793, and frustrates a receiver's attempt to shrink the window when
the line is idle, which is probably the most sane case when a limited
memory receiver should shrink its receive window.
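
To make the difference concrete, here is a minimal sketch of the two
window-update tests as I read them: the one in rfc793 (SEGMENT ARRIVES
processing) and the one in the BSD 4.4 input code.  The function names
are hypothetical, and plain integer comparison is an assumption made
for brevity; a real stack compares sequence numbers modulo 2^32.

```python
def rfc793_accepts_update(snd_wl1, snd_wl2, seg_seq, seg_ack):
    """rfc793 test: take the window from any segment that is not older
    than the one used for the last window update."""
    return snd_wl1 < seg_seq or (snd_wl1 == seg_seq and snd_wl2 <= seg_ack)


def bsd44_accepts_update(snd_wl1, snd_wl2, snd_wnd, seg_seq, seg_ack, seg_wnd):
    """BSD 4.4-style test: with identical seq and ack, only a LARGER
    window is accepted, so a smaller offer is treated as a stale update."""
    return (snd_wl1 < seg_seq
            or (snd_wl1 == seg_seq
                and (snd_wl2 < seg_ack
                     or (snd_wl2 == seg_ack and seg_wnd > snd_wnd))))


# Idle line: the receiver re-advertises a smaller window with the same
# sequence and acknowledgment numbers as its previous update.
idle_shrink_rfc793 = rfc793_accepts_update(1000, 5000, 1000, 5000)
idle_shrink_bsd44 = bsd44_accepts_update(1000, 5000, 8192, 1000, 5000, 2048)
print(idle_shrink_rfc793, idle_shrink_bsd44)  # True False
```

On an idle line the sequence and acknowledgment numbers do not advance,
so under the second test the smaller offer is never seen by the sender.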

-- 
Francesco Potorti` (researcher)        Voice:    +39-50-593203
Computer Network Division              Operator: +39-50-593211
CNUCE-CNR, Via Santa Maria 36          Fax:      +39-50-904052
56126 Pisa - Italy                     Email:    F.Potorti@cnuce.cnr.it

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 08:01:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA09895 for tcp-impl-list; Fri, 21 Mar 1997 08:00:40 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA09819 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 08:00:32 -0800
Received: from mail1.digital.com (mail1.digital.com [204.123.2.50]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id IAA02470 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 08:00:30 -0800
Received: from pachyderm.pa.dec.com by mail1.digital.com (5.65 EXP 4/12/95 for V3.2/1.0/WV)
	id AA07042; Fri, 21 Mar 1997 07:51:45 -0800
Received: by pachyderm.pa.dec.com; id AA20564; Fri, 21 Mar 1997 07:51:53 -0800
Date: Fri, 21 Mar 1997 07:51:53 -0800
From: jg@pa.dec.com (Jim Gettys)
Message-Id: <9703211551.AA20564@pachyderm.pa.dec.com>
X-Mailer: Pachyderm (client tunsrv2-tunnel.imc.das.dec.com)
To: Francesco Potorti` <F.Potorti@cnuce.cnr.it>
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: draft description of "Failure to retain above-sequence data"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


There are times that applications want to reduce buffering, which, as 
I understand TCP (as a user and implementer of protocols based on TCP, 
rather than implementer of TCP), might result in smaller windows being 
advertized.  A situation where the buffering in TCP is not under my control, 
by the implementation ignoring my advice, is NOT a good situation, in 
my opinion. 
				- Jim Gettys







From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 09:28:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA28430 for tcp-impl-list; Fri, 21 Mar 1997 09:26:20 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA28426 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 09:26:18 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA25841 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 09:26:16 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA29601>; Fri, 21 Mar 1997 09:22:29 -0800
Date: Fri, 21 Mar 1997 09:22:28 -0800
Posted-Date: Fri, 21 Mar 1997 09:22:28 -0800
Message-Id: <199703211722.AA06131@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA06131>; Fri, 21 Mar 1997 09:22:28 -0800
To: F.Potorti@cnuce.cnr.it, jg@pa.dec.com
Subject: Re: draft description of "Failure to retain above-sequence data"
Cc: tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@relay.engr.SGI.COM Fri Mar 21 08:02:48 1997
> Date: Fri, 21 Mar 1997 07:51:53 -0800
> From: jg@pa.dec.com (Jim Gettys)
> To: Francesco Potorti` <F.Potorti@cnuce.cnr.it>
> Cc: tcp-impl@relay.engr.SGI.COM
> Subject: Re: draft description of "Failure to retain above-sequence data"
> 
> 
> There are times that applications want to reduce buffering, which, as 
> I understand TCP (as a user and implementer of protocols based on TCP, 
> rather than implementer of TCP), might result in smaller windows being 
> advertized.  A situation where the buffering in TCP is not under my control, 
> by the implementation ignoring my advice, is NOT a good situation, in 
> my opinion. 
> 				- Jim Gettys

Jim,

This sounds like a great idea, but it might be outside the 
scope of tcp-impl, which deals with errors in implementation,
rather than proposed changes to the specification.

PS - Since the window size is related to the round-trip time, 
this won't help the application increase responsiveness,
it only decreases the application throughput. Can you give an
example of its benefit?

It seems like fixing the PSH (push) bit is a more direct fix; it
is also not sufficiently specified right now. That bit is intended
to flush buffers out at the sender, and force received data up
to the application at the receiver. Unfortunately, the API doesn't
specify a preemptive callback or interrupt at the receiver when PSH
is received.

(tcp-impl's - that's my understanding of the status of that bit -
please emend if necessary)

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 10:14:10 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA11223 for tcp-impl-list; Fri, 21 Mar 1997 10:11:55 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA11205 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:11:53 -0800
Received: from mail1.digital.com (mail1.digital.com [204.123.2.50]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id KAA09919 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:11:52 -0800
Received: from pachyderm.pa.dec.com by mail1.digital.com (5.65 EXP 4/12/95 for V3.2/1.0/WV)
	id AA14739; Fri, 21 Mar 1997 10:01:39 -0800
Received: by pachyderm.pa.dec.com; id AA22563; Fri, 21 Mar 1997 10:01:37 -0800
Date: Fri, 21 Mar 1997 10:01:37 -0800
From: jg@pa.dec.com (Jim Gettys)
Message-Id: <9703211801.AA22563@pachyderm.pa.dec.com>
X-Mailer: Pachyderm (client tunsrv2-tunnel.imc.das.dec.com)
To: touch@ISI.EDU
Cc: F.Potorti@cnuce.cnr.it, tcp-impl@relay.engr.SGI.COM
Subject: Re: draft description of "Failure to retain above-sequence data"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I'll give an example....  (in the HTTP context)...

Say your TCP can have 32Kbytes in transit (is advertizing such a window), 
to a client on a slow line. I think, for example, my Alpha advertizes 
a window of that order. It may mean it takes 5-10 seconds to empty a buffer 
once the server has sent a window full of data (given current modems).

Similar situations (in fact worse) can occur with HTTP/1.0's multiple 
connection hackery; in it, I might have 4 (or more) windows worth of data 
in flight.

This means that not much is going to happen, from a client's point of view,
until that data trickles over the low speed (e.g. 28.8K or slower) modem.
It is sitting in memory on the router driving the PPP connection....
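
As a rough check on those numbers, here is a back-of-the-envelope sketch.
It assumes a 28.8 kbit/s line with no PPP framing, header overhead or
modem compression, and the function name is only illustrative.

```python
def drain_seconds(window_bytes, link_bps):
    """Time for a full window to trickle over the slow link."""
    return window_bytes * 8 / link_bps


one_window = drain_seconds(32 * 1024, 28_800)
four_windows = 4 * one_window  # HTTP/1.0-style parallel connections

print(f"one window: {one_window:.1f} s, four windows: {four_windows:.1f} s")
# one window: 9.1 s, four windows: 36.4 s
```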

Not a nice situation....

About the only control I have is to set the socket buffer size lower.
(if I want to lower latency).
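
For reference, this is roughly what that control looks like from the
application side, using the standard SO_RCVBUF socket option.  The
helper name and the 4096-byte figure are assumptions for illustration,
and the system is free to round or adjust the value it is given, so
the sketch reads it back rather than trusting it.

```python
import socket


def make_small_buffer_socket(rcvbuf_bytes=4096):
    """Create a TCP socket with a reduced receive buffer.

    The option is set before connect() so that it can influence the
    window offered from the start of the connection.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return s


sock = make_small_buffer_socket()
# What the system actually granted; it may differ from the request.
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
print("effective receive buffer:", effective)
```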

If that gets ignored (because the TCP throws away smaller (or larger)
window offers), I've lost what little control I have, which may need to
vary with time.

In any case, this is from the point of view of someone who is not
a true TCP guru; it is from my understanding of its behavior.

So, as usual, this is a latency versus performance problem.

		Hope this explanation helps,
				Jim

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 10:30:18 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA16272 for tcp-impl-list; Fri, 21 Mar 1997 10:27:59 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA16262 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:27:57 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id KAA14225 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:27:53 -0800
Received: from rtpdce02.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA18746; Fri, 21 Mar 1997 13:23:54 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce02.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id NAA53412; Fri, 21 Mar 1997 13:23:49 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA17712; Fri, 21 Mar 1997 13:23:24 -0500
Message-Id: <9703211823.AA17712@ludwigia.raleigh.ibm.com>
To: touch@ISI.EDU
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: draft description of "Failure to retain above-sequence data" 
In-Reply-To: Your message of "Fri, 21 Mar 1997 09:22:28 PST."
             <199703211722.AA06131@ash.isi.edu> 
Date: Fri, 21 Mar 1997 13:23:24 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> It seems like fixing the PSH (push) bit is a more direct fix, which
> too is not specified sufficiently right now. That bit is intended
> to flush buffers out at the sender, and force received data up
> to the application at the receiver. Unfortunately, the API doesn't
> specify a preemptive callback or interrupt at the receiver when PSH
> is received.

I think you read a bit much into PSH. PSH means tell the TCP
implementations that they are no longer allowed to continue holding
queued data (at the sender or receiver) in the hopes that by delaying,
more will come along shortly (making subsequent transfer - sender to
receiver, receiver to application - more efficient). I do not
interpret that as saying the OS is required to have TCP send an OOB
signal telling the application "data is here, read it now" (the Urgent
pointer does that, in the case that it is needed, but that is another
story). If the application isn't ready to read data, the PSH isn't a
factor.

My take on this discussion is that some (sending) implementation
doesn't set the PSH bit when it should, so the receiver buffers data
forever (expecting data or a PSH to come along later if needed). That
is a bug in the sender TCP; the receiver is behaving correctly.
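
A toy model of that behavior, purely as a sketch: the class, its
4096-byte batching threshold and the "readable" attribute are all
hypothetical, and many real stacks hand data to the socket as soon as
it arrives.

```python
class HoldingReceiver:
    """Toy receiving TCP that holds data in the hope of batching more."""

    def __init__(self, batch_bytes=4096):
        self.batch_bytes = batch_bytes
        self.held = b""      # queued inside TCP, not yet offered to the app
        self.readable = b""  # what a blocked read() could now return

    def segment(self, data, psh=False):
        self.held += data
        # PSH means "stop waiting for more"; a full batch also releases.
        if psh or len(self.held) >= self.batch_bytes:
            self.readable += self.held
            self.held = b""


rx = HoldingReceiver()
rx.segment(b"x")                 # sender forgot PSH: the byte just sits
stuck = rx.readable == b""
rx.segment(b"y", psh=True)       # PSH releases everything held so far
print(stuck, rx.readable)        # True b'xy'
```

Nothing here makes the application read; PSH only decides when the
data becomes available to a read that is already waiting.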

Thomas

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 10:43:39 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA20507 for tcp-impl-list; Fri, 21 Mar 1997 10:42:02 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA20502 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:42:01 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id KAA18148 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:41:59 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA04455>; Fri, 21 Mar 1997 10:38:12 -0800
Date: Fri, 21 Mar 1997 10:38:11 -0800
Posted-Date: Fri, 21 Mar 1997 10:38:11 -0800
Message-Id: <199703211838.AA07697@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07697>; Fri, 21 Mar 1997 10:38:11 -0800
To: touch@ISI.EDU, jg@pa.dec.com
Subject: TCP buffers
Cc: F.Potorti@cnuce.cnr.it, tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: jg@pa.dec.com (Jim Gettys)
> 
> I'll give an example....  (in the HTTP context)...
> 
> Say your TCP can have 32Kbytes in transit (is advertizing such a window), 
> to a client on a slow line. I think, for example, my Alpha advertizes 
> a window of that order. It may mean it takes 5-10 seconds to empty a buffer 
> once the server has sent a window full of data (given current modems).

There are two separable issues here. One is window size, the other
is socket buffer size. They are not explicitly related - changing
one has indirect effects on the other, but the socket is not
directly controlling the TCP window.

You can never have more than one window 'in flight' - in the 
network. The rest is stuck somewhere else - in the socket buffers
on the ends. There is no signalling protocol for clients to 
request servers to reduce their socket buffer sizes.

This does not appear to be a TCP issue at all, at that point.

The exception is the PSH - which is supposed to flush data
end-to-end all the way to the application, but sadly the API
doesn't really require this. This is a bug in the spec, not
the implementation.

(the indirect relationship is that the TCP send window never 
becomes larger than the socket buffer, but it can certainly
be smaller).

I agree this is an interesting issue, but probably can be taken 
to the end-2-end group or somewhere else, rather than tcp-impl
(can someone confirm?)

Joe


----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 10:46:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA21703 for tcp-impl-list; Fri, 21 Mar 1997 10:45:22 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA21682 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:45:19 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id KAA18845 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:45:14 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA04629>; Fri, 21 Mar 1997 10:41:19 -0800
Date: Fri, 21 Mar 1997 10:41:18 -0800
Posted-Date: Fri, 21 Mar 1997 10:41:18 -0800
Message-Id: <199703211841.AA07769@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07769>; Fri, 21 Mar 1997 10:41:18 -0800
To: narten@raleigh.ibm.com
Subject: more on TCP buffering
Cc: tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From narten@raleigh.ibm.com Fri Mar 21 10:23:56 1997
> To: touch@ISI.EDU
> Cc: tcp-impl@relay.engr.SGI.COM
> Subject: Re: draft description of "Failure to retain above-sequence data" 
> Date: Fri, 21 Mar 1997 13:23:24 -0400
> From: Thomas Narten <narten@raleigh.ibm.com>
> 
> > It seems like fixing the PSH (push) bit is a more direct fix, which
> > too is not specified sufficiently right now. That bit is intended
> > to flush buffers out at the sender, and force received data up
> > to the application at the receiver. Unfortunately, the API doesn't
> > specify a preemptive callback or interrupt at the receiver when PSH
> > is received.
> 
> I think you read a bit much into PSH. PSH means tell the TCP

I'm basing my statements on what was intended by PSH (from talking
with old-timers involved in the spec), rather than what
is currently implemented.

I agree that this is a spec issue, not an impl issue.

> My take on this discussion is that some (sending) implementation
> doesn't set the PSH bit when it should, so the receiver buffers data
> forever (expecting data or a PSH to come along later if needed). That
> is a bug in the sender TCP; the receiver is behaving correctly.

PSH wouldn't help in this case - the receiving TCP can dump
the data to the socket, but the app need not pick it up.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 10:47:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA22190 for tcp-impl-list; Fri, 21 Mar 1997 10:46:51 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA22183 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:46:49 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA19493 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:46:47 -0800
Received: from LapTop.Simpson.DialUp.Mich.Net (pm265-25.dialip.mich.net [198.110.68.228]) by merit.edu (8.8.5/merit-2.0) with SMTP id NAA26582 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 13:42:58 -0500 (EST)
Date: Fri, 21 Mar 97 15:25:33 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <2247.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Need slow start after idle
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Vern Paxson <vern@ee.lbl.gov>
> > A more common one is "No slow start after idle" described in VJ's
> > revised 88' paper, Appendix C.
>
> This one to my knowledge hasn't been standardized (it's not mentioned
> in RFC 2001), so if that's true, it's out of scope.
>
Since RFC 2001 is only a "proposed" standard, it should be revised to
include slow start after idle before publication as "draft" standard.

That is just as vital as initial slow start and congestive slow start,
for the same reasons.
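
A minimal sketch of the check, under stated assumptions: the idle
threshold of one retransmission timeout and the restart window of one
segment are modeled on the BSD behavior as I understand it, not taken
from RFC 2001, and the function and parameter names are hypothetical.

```python
def cwnd_for_next_send(cwnd, mss, idle_seconds, rto_seconds, bytes_outstanding):
    """Return the congestion window to use for the next transmission.

    If nothing is in flight and the connection has been quiet for at
    least one RTO, the ACK clock is gone, so fall back to slow start.
    """
    if bytes_outstanding == 0 and idle_seconds >= rto_seconds:
        return mss
    return cwnd


MSS = 1460
restarted = cwnd_for_next_send(16 * MSS, MSS, idle_seconds=30.0,
                               rto_seconds=1.5, bytes_outstanding=0)
kept = cwnd_for_next_send(16 * MSS, MSS, idle_seconds=0.2,
                          rto_seconds=1.5, bytes_outstanding=0)
print(restarted, kept)  # 1460 23360
```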

Since there is no other IETF WG considering these issues, this WG would
appear to be the one responsible for RFC 2001.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 10:49:27 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA22441 for tcp-impl-list; Fri, 21 Mar 1997 10:48:04 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA22422 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:48:02 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA19751 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:48:01 -0800
Received: from LapTop.Simpson.DialUp.Mich.Net (pm265-25.dialip.mich.net [198.110.68.228]) by merit.edu (8.8.5/merit-2.0) with SMTP id NAA26577 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 13:42:55 -0500 (EST)
Date: Fri, 21 Mar 97 15:04:18 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <2246.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: cwnd acks -> bytes
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu)
> Counting bytes always gives Rlog2(W). It appears to be contributing
> to the burstiness of the traffic because it helps to get out of
> slow-start quicker, which should be considered a merit, not fault (IMHO).
>
There is also a serious _bug_ when increasing by acks instead of bytes!

When the destination is behind a low bps link (modem), the initial slow
start for many applications is based on short messages (for example,
SMTP), and the next buffers are full sized.  So, the cwnd is increased
by thousands of bytes based on mere dozens of bytes.

The result is that cwnd is too large, and the larger packets are
retransmitted, because several large packets queue for very long times
(> 1 second).

I see this every day!


> I'm pretty much aware of what the various RFCs say. But I thought we're
> also in the business of making recommendation for amendments if deemed
> necessary. Are we not?
>
I thought so, too!

Put me down for a _documented_ change to counting ack'd bytes, with a
limit of 2 MSS per ack.
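
To put numbers on it, a small sketch comparing the two rules during
slow start.  The 1460-byte MSS, the 40-byte messages and the function
names are assumptions for illustration only.

```python
MSS = 1460  # assumed segment size in bytes


def grow_per_ack(cwnd, mss=MSS):
    """Slow start counting ACKs: one full MSS per ACK, whatever it covered."""
    return cwnd + mss


def grow_per_bytes(cwnd, acked_bytes, mss=MSS, limit_mss=2):
    """Slow start counting ack'd bytes, capped at limit_mss * MSS per ACK."""
    return cwnd + min(acked_bytes, limit_mss * mss)


# A dozen short SMTP-style exchanges, each ACK covering only 40 bytes.
by_acks = by_bytes = MSS
for _ in range(12):
    by_acks = grow_per_ack(by_acks)
    by_bytes = grow_per_bytes(by_bytes, acked_bytes=40)

print(by_acks, by_bytes)  # 18980 1940
```

Counting ACKs opens the window to 13 segments on the strength of 480
bytes of acknowledged data; counting bytes opens it by those 480 bytes.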

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 10:55:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA23833 for tcp-impl-list; Fri, 21 Mar 1997 10:54:01 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA23828 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:53:59 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id KAA21410 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:53:56 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA05128>; Fri, 21 Mar 1997 10:50:08 -0800
Date: Fri, 21 Mar 1997 10:49:59 -0800
Posted-Date: Fri, 21 Mar 1997 10:49:59 -0800
Message-Id: <199703211849.AA08077@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA08077>; Fri, 21 Mar 1997 10:49:59 -0800
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: Need slow start after idle
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@relay.engr.SGI.COM Fri Mar 21 10:48:24 1997
> Date: Fri, 21 Mar 97 15:25:33 GMT
> From: "William Allen Simpson" <wsimpson@greendragon.com>
> To: tcp-impl@relay.engr.SGI.COM
> Subject: Need slow start after idle
> 
> > From: Vern Paxson <vern@ee.lbl.gov>
> > > A more common one is "No slow start after idle" described in VJ's
> > > revised 88' paper, Appendix C.
> >
> > This one to my knowledge hasn't been standardized (it's not mentioned
> > in RFC 2001), so if that's true, it's out of scope.
> >
> Since RFC 2001 is only a "proposed" standard, it should be revised to
> include slow start after idle before publication as "draft" standard.
> 
> That is just as Vital as initial slow start and congestive slow start,
> for the same reasons.

On an existing connection that has gone idle, there may be options
other than forcing a new slow start, e.g., rate estimation
based on prior ACK pacing. It is not clear that slow
start after idle is the only sufficient solution; the revision
should indicate that.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 10:59:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA24648 for tcp-impl-list; Fri, 21 Mar 1997 10:57:41 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA24637 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:57:39 -0800
Received: from kalae.kohala.com (kalae.kohala.com [206.62.226.35]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA22436 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 10:57:36 -0800
Received: from kohala.kohala.com (kohala.kohala.com [206.62.226.33]) by kalae.kohala.com (8.8.5/8.7.3) with ESMTP id LAA03703 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:55:29 -0700 (MST)
Received: (from rstevens@localhost) by kohala.kohala.com (8.8.5/8.8.3) id LAA25887 for tcp-impl@relay.engr.SGI.COM; Fri, 21 Mar 1997 11:55:28 -0700 (MST)
Message-Id: <199703211855.LAA25887@kohala.kohala.com>
From: rstevens@kohala.com (W. Richard Stevens)
Date: Fri, 21 Mar 1997 11:55:28 -0700
Reply-To: "W. Richard Stevens" <rstevens@kohala.com>
X-Phone: +1 520 297 9416
X-Homepage: http://www.noao.edu/~rstevens
X-Mailer: Mail User's Shell (7.2.6 beta(3) 11/17/96)
To: tcp-impl@relay.engr.SGI.COM
Subject: UDP and path MTU discovery
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

(I apologize for sending this to a list with TCP in the name, but I think
I have a better chance of reaching the greatest number of TCP/IP stack
developers on this list, than anywhere else.  Simple question, and e-mail
response directly to me is fine.)

Is anyone aware of a stack that provides feedback to a *UDP* application
when an ICMP "fragmentation needed but DF bit set" is received?  The
few stacks that I have source code access to all process this for a TCP
endpoint (with the stack doing the right thing, as per path MTU discovery)
but it sure looks like a UDP application never receives the message, even
if the UDP socket is connected.  It appears the application must time out
and retransmit the UDP datagram when this happens, effectively ignoring
the information that IP/ICMP has received.  Am I missing anything?

	Rich Stevens

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 11:03:09 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA25830 for tcp-impl-list; Fri, 21 Mar 1997 11:02:08 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA25813 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:02:06 -0800
Received: from mail1.digital.com (mail1.digital.com [204.123.2.50]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id LAA23675 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:02:05 -0800
Received: from pachyderm.pa.dec.com by mail1.digital.com (5.65 EXP 4/12/95 for V3.2/1.0/WV)
	id AA22391; Fri, 21 Mar 1997 10:53:56 -0800
Received: by pachyderm.pa.dec.com; id AA26962; Fri, 21 Mar 1997 10:54:04 -0800
Date: Fri, 21 Mar 1997 10:54:04 -0800
From: jg@pa.dec.com (Jim Gettys)
Message-Id: <9703211854.AA26962@pachyderm.pa.dec.com>
X-Mailer: Pachyderm (client tunsrv2-tunnel.imc.das.dec.com)
To: touch@ISI.EDU
Cc: F.Potorti@cnuce.cnr.it, tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Yes, the control is indirect, at best, given the current interfaces
to operating systems....

But application programmers do what they gotta do...  And latency
control on data in flight is certainly a real issue for interactive
network use...  I'd like to control both, ideally...

I suppose I should wander upstairs and beat up Dave Clark on the topic
to raise it in the end-2-end group.
			- Jim

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 11:05:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA26423 for tcp-impl-list; Fri, 21 Mar 1997 11:04:12 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA26412 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:04:10 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id LAA23875 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:02:59 -0800
Received: from rtpdce01.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA57662; Fri, 21 Mar 1997 13:59:10 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce01.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id NAA24390; Fri, 21 Mar 1997 13:59:08 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA15536; Fri, 21 Mar 1997 13:58:41 -0500
Message-Id: <9703211858.AA15536@ludwigia.raleigh.ibm.com>
To: touch@ISI.EDU
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: more on TCP buffering 
In-Reply-To: Your message of "Fri, 21 Mar 1997 10:41:18 PST."
             <199703211841.AA07769@ash.isi.edu> 
Date: Fri, 21 Mar 1997 13:58:41 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > I think you read a bit much into PSH. PSH means tell the TCP

> I'm basing my statements on what was intended by PSH (from talking
> with old-timers involved in the spec), rather than what is currently
> implemented.

I won't claim to be an old timer, but my previous note is based on
what I've always understood PSH to be; this dates back to RFC 1122
days. I've seen nothing in the last week's discussion suggesting that
more than one implementation didn't get this right. I don't see a
problem with the spec. Are we missing each other's point?

> I agree that this is a spec issue, not an impl issue.

I don't see the issue at all. Where is ambiguity in the spec? Which
implementations have interpreted the spec differently? If all
implementations (save 1) have implemented the spec in the same way,
and it hasn't caused operational problems (for those that implemented
it that way), my take is the exception got it wrong, not that the spec
needs revising in a way that causes all the other implementations to
be non-conforming (which is what your suggested change appears to do).

> > My take on this discussion is that some (sending) implementation
> > doesn't set the PSH bit when it should, so the receiver buffers data
> > forever (expecting data or a PSH to come along later if needed). That
> > is a bug in the sender TCP; the receiver is behaving correctly.

> PSH wouldn't help in this case - the receiving TCP can dump
> the data to the socket, but the app need not pick it up.

And PSH won't fix the problem where the application refuses to read
data. You *can't* force an application to read data. PSH is designed
to deal with the case where the client has sent a 1-byte message (that
gets buffered by tcp), the sender has then issued a read, and there is true
deadlock (i.e., both sender and receiver are hung, waiting for tcp to
do something). If the read side of a tcp connection doesn't have an
application hanging on a read, there is no deadlock (that TCP can
solve).

Thomas


From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 11:14:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA29121 for tcp-impl-list; Fri, 21 Mar 1997 11:12:43 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA29116 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:12:42 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA26436 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:12:38 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA29226; Fri, 21 Mar 1997 11:02:41 -0800 (PST)
Message-Id: <199703211902.LAA29226@daffy.ee.lbl.gov>
To: Thomas Narten <narten@raleigh.ibm.com>
Cc: tcp-impl@relay.engr.SGI.COM
Subject: PSH / "Failure to retain above-sequence data" 
In-reply-to: Your message of Fri, 21 Mar 1997 13:23:24 PST.
Date: Fri, 21 Mar 1997 11:02:40 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

It looks like two separate threads have now merged into a hybrid.  Here's
my take on untangling it.

One issue is the problem of senders not setting PSH even when they have no
more data to send.  It appears the consensus is that that's simply broken,
and should be documented as an implementation problem.

A quite separate issue arose when I floated the idea of amending the TCP
spec so that receivers MUST NOT routinely discard above-sequence data,
rather than SHOULD NOT.  This led to interesting discussion of scenarios in
which, due to memory constraints, the receiver might need to discard
above-sequence data, and that has evolved into discussion of whether it's
okay to shrink the offered window, instead of advertising more window than
you actually have buffer.  This latter discussion looks like it may drift
into general window-shrinking issues; that's fine, but I don't want to lose
sight of the original question about failing to retain data.

So let me ask:

	1.  Do we have agreement over the sender-should-set-PSH issue,
	    that it's an implementation problem we should document?

	2.  Can we resolve the failure-to-retain by using wording that
	    receivers must not "routinely" fail to do so?

	    This comes about because for the implementation where I observed
	    this, as far as I could tell it simply never bothered retaining
	    above-sequence data, even though it had plenty of memory.
	    I suspect this was to simplify the implementation, but it seems
	    clear that because of the bad congestion properties of this
	    behavior, it should be fixed.  So I'm trying to separate 
	    "routinely" doing so from "occasionally".

	    Perhaps a different way to put it is that a TCP must have a
	    mechanism in place that allows it to retain a full window's
	    worth of above-sequence data, without delving into specifics
	    of when that mechanism might not be exercised.
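
As a rough illustration of what "a mechanism to retain a full window's
worth of above-sequence data" could look like, here is a minimal sketch
of a receive-side reassembly queue.  The class and method names are
hypothetical, and the sketch assumes no sequence-number wraparound, no
partially overlapping segments, and an application that reads delivered
data immediately (so the window slides with rcv_nxt).

```python
class ReassemblyQueue:
    """Hold above-sequence segments until the gap in front of them fills."""

    def __init__(self, rcv_nxt, window):
        self.rcv_nxt = rcv_nxt   # next in-order byte expected
        self.window = window     # bytes of buffer offered to the sender
        self.held = {}           # seq -> payload of out-of-order segments

    def receive(self, seq, data):
        """Accept one segment; return the bytes now deliverable in order."""
        end = seq + len(data)
        if end <= self.rcv_nxt or seq >= self.rcv_nxt + self.window:
            return b""           # old duplicate, or outside the offered window
        if seq < self.rcv_nxt:   # trim bytes that were already delivered
            data = data[self.rcv_nxt - seq:]
            seq = self.rcv_nxt
        # Trim anything beyond the right edge of the offered window.
        data = data[:self.rcv_nxt + self.window - seq]
        self.held.setdefault(seq, data)

        out = bytearray()
        while self.rcv_nxt in self.held:   # drain every segment now in order
            chunk = self.held.pop(self.rcv_nxt)
            out += chunk
            self.rcv_nxt += len(chunk)
        return bytes(out)
```

A receiver that never retains above-sequence data corresponds to
dropping the `held` dictionary entirely, which forces the sender to
retransmit everything after each hole.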

- Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 11:21:05 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA00682 for tcp-impl-list; Fri, 21 Mar 1997 11:17:56 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA00675; Fri, 21 Mar 1997 11:17:54 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA27857; Fri, 21 Mar 1997 11:17:49 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA29255; Fri, 21 Mar 1997 11:07:52 -0800 (PST)
Message-Id: <199703211907.LAA29255@daffy.ee.lbl.gov>
To: "William Allen Simpson" <wsimpson@greendragon.com>
Cc: tcp-impl@relay.engr.SGI.COM, sca@refugee.engr.sgi.com
Subject: Re: Need slow start after idle
In-reply-to: Your message of Fri, 21 Mar 1997 15:25:33 PST.
Date: Fri, 21 Mar 1997 11:07:51 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Since there is no other IETF WG considering these issues, this WG would
> appear to be the one responsible for RFC 2001.

Hmmmm, you may be right.  When the WG was formed, the input from the A-D's
was clear that we should focus on implementation problems and not research
issues.  Tweaking RFC 2001 strikes me as straddling the line.

Steve and I should take this up with the Powers That Be, I'll get rolling
on that.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 11:31:48 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA04551 for tcp-impl-list; Fri, 21 Mar 1997 11:30:32 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA04536 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:30:30 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id LAA01740 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:30:29 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA07968>; Fri, 21 Mar 1997 11:26:34 -0800
Date: Fri, 21 Mar 1997 11:26:32 -0800
Posted-Date: Fri, 21 Mar 1997 11:26:32 -0800
Message-Id: <199703211926.AA09050@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA09050>; Fri, 21 Mar 1997 11:26:32 -0800
To: touch@ISI.EDU, vern@ee.lbl.gov
Subject: Re: Need slow start after idle
Cc: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From vern@ee.lbl.gov Fri Mar 21 11:19:33 1997
> To: touch@ISI.EDU
> Cc: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
> Subject: Re: Need slow start after idle
> Date: Fri, 21 Mar 1997 11:19:31 PST
> From: Vern Paxson <vern@ee.lbl.gov>
> 
> > After an existing connection, there may be other options ...
> 
> Yes - and some of these remain research issues, which is not
> the role of tcp-impl.
> 
> 		Vern

Saying "implementations MUST NOT leave the window open
in a way that creates line-rate bursts" is much better
than saying "MUST use slow-start restart".

The former leaves room for variations in the implementation
that achieve the desired effect. The latter is too specific.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 11:31:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA04198 for tcp-impl-list; Fri, 21 Mar 1997 11:29:40 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA04185 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:29:38 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA01521 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:29:37 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA29355; Fri, 21 Mar 1997 11:19:31 -0800 (PST)
Message-Id: <199703211919.LAA29355@daffy.ee.lbl.gov>
To: touch@ISI.EDU
Cc: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: Need slow start after idle
In-reply-to: Your message of Fri, 21 Mar 1997 10:49:59 PST.
Date: Fri, 21 Mar 1997 11:19:31 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> After an existing connection, there may be other options ...

Yes - and some of these remain research issues, which is not
the role of tcp-impl.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 11:37:40 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA05711 for tcp-impl-list; Fri, 21 Mar 1997 11:36:29 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA05704 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:36:26 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA03427 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:36:23 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA29437; Fri, 21 Mar 1997 11:26:28 -0800 (PST)
Message-Id: <199703211926.LAA29437@daffy.ee.lbl.gov>
To: touch@ISI.EDU, tcp-impl@relay.engr.SGI.COM
Subject: Re: Need slow start after idle
In-reply-to: Your message of Fri, 21 Mar 1997 11:19:31 PST.
Date: Fri, 21 Mar 1997 11:26:28 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > After an existing connection, there may be other options ...
> 
> Yes - and some of these remain research issues, which is not
> the role of tcp-impl.

I should add: my point is that while modest revisions to RFC 2001 might
be within our charter, it gets slippery fast, as Joe illustrates by
showing that a seemingly straight-forward tweak actually is not a
"done deal" from a research perspective.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 11:50:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA08294 for tcp-impl-list; Fri, 21 Mar 1997 11:47:53 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA08258 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:47:47 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA06045 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 11:47:45 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA29483; Fri, 21 Mar 1997 11:37:47 -0800 (PST)
Message-Id: <199703211937.LAA29483@daffy.ee.lbl.gov>
To: touch@ISI.EDU
Cc: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: Need slow start after idle
In-reply-to: Your message of Fri, 21 Mar 1997 11:26:32 PST.
Date: Fri, 21 Mar 1997 11:37:46 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Saying "implementations MUST NOT leave the window open
> in a way that creates line-rate bursts" is much better
> than saying "MUST use slow-start restart".

I like this, and it keeps us out of the research business.  "line-rate
bursts" needs some tweaking, though, as it's not clear exactly what it
means.  The idea is "something less than cwnd all at once", but how to
phrase it so as to draw that line in a meaningful way is tricky.
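
To make the two wordings concrete, here is a small sketch of two ways
an implementation might avoid sending a full cwnd all at once after an
idle period.  Both functions, their names, and their parameters
(including the restart window and the burst cap) are assumptions made
for illustration; neither is proposed text.

```python
def slow_start_restart(cwnd, idle_time, rto, restart_window):
    """Variant 1: collapse cwnd to a small restart window after a long idle.

    The "idle longer than one RTO" trigger is an assumption of this sketch.
    """
    if idle_time > rto:
        return min(cwnd, restart_window)
    return cwnd


def burst_limited_send(cwnd, in_flight, max_burst):
    """Variant 2: keep cwnd, but cap how many bytes go out back-to-back."""
    return max(0, min(cwnd - in_flight, max_burst))
```

The first variant is the specific "slow-start restart" behavior; the
second achieves "something less than cwnd all at once" without
touching cwnd, which is the kind of variation the looser wording
leaves room for.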

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 12:25:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA16070 for tcp-impl-list; Fri, 21 Mar 1997 12:22:50 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA16064 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 12:22:48 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA15834 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 12:22:46 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id MAA00358; Fri, 21 Mar 1997 12:06:06 -0800
Received: from skybolt.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id MAA10846; Fri, 21 Mar 1997 12:06:04 -0800
Received: by skybolt.eng.sun.com (SMI-8.6/SMI-SVR4)
	id MAA10040; Fri, 21 Mar 1997 12:02:42 -0800
Date: Fri, 21 Mar 1997 12:02:42 -0800
From: Richard.Fox@Eng.Sun.COM (Richard Fox)
Message-Id: <199703212002.MAA10040@skybolt.eng.sun.com>
To: narten@raleigh.ibm.com, vern@ee.lbl.gov
Subject: Re: PSH / "Failure to retain above-sequence data"
Cc: tcp-impl@relay.engr.SGI.COM
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> One issue is the problem of senders not setting PSH even when they have no
> more data to send.  It appears the consensus is that that's simply broken,
> and should be documented as an implementation problem.
> 

....

> So let me ask:
> 
> 	1.  Do we have agreement over the sender-should-set-PSH issue,
> 	    that it's an implementation problem we should document?
> 

I think there is still a train of thought that says a receiver which
will not deliver data until it sees the PSH bit set is broken. The above
consensus does not reflect this. So I see 2 issues:
	1. issues regarding senders setting this bit
	2. receivers sending data up to the app in the absence of
		a PSH bit set.

--rich

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 12:56:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA26957 for tcp-impl-list; Fri, 21 Mar 1997 12:54:52 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA26912 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 12:54:50 -0800
Received: from netcom20.netcom.com (netcom20.netcom.com [192.100.81.133]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA23380 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 12:54:50 -0800
Received: (from kck@localhost) by netcom20.netcom.com (8.6.13/Netcom)
	id MAA23623; Fri, 21 Mar 1997 12:49:40 -0800
Date: Fri, 21 Mar 1997 12:49:40 -0800
From: kck@netcom.com (Richard Fox)
Message-Id: <199703212049.MAA23623@netcom20.netcom.com>
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re:  cwnd acks -> bytes
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> There is also a serious _bug_ when increasing by acks instead of bytes!
> 
> When the destination is behind a low bps link (modem), the initial slow
> start for many applications is based on short messages (for example,
> SMTP), and the next buffers are full sized.  So, the cwnd is increased
> by thousands of bytes based on mere dozens of bytes.

This is an excellent point. The problem I have seen is that the RTT
estimates do not always work cleanly, because a small packet has a much
smaller RTT than a larger packet would. When cwnd is increased by packets
rather than by bytes, the RTT does not move smoothly, and this results in
increased retransmits while the RTT estimate adjusts to the sudden change
caused by queueing in combination with the network latency.

> 
> The result is that cwnd is too large, and the larger packets are
> retransmitted, because several large packets queue for very long times
> (> 1 second).
> 
> I see this every day!

I have seen this all too often as well.
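
The effect described above can be shown with a little arithmetic.  The
sketch below compares growing cwnd by one MSS per ACK with growing it
by the number of bytes each ACK covers.  The function names, the MSS
of 1460, and the ten 30-byte messages are assumptions chosen only to
illustrate the point.

```python
MSS = 1460  # assumed maximum segment size, in bytes


def grow_per_ack(cwnd, acked_bytes, mss=MSS):
    """Slow start counted in ACKs: every ACK adds a full MSS."""
    return cwnd + mss


def grow_per_byte(cwnd, acked_bytes, mss=MSS):
    """Slow start counted in bytes: an ACK adds only what it covered."""
    return cwnd + min(acked_bytes, mss)


def run(grow, acked_sizes, cwnd=MSS):
    """Apply one growth rule to a sequence of ACKs; return the final cwnd."""
    for n in acked_sizes:
        cwnd = grow(cwnd, n)
    return cwnd


# Ten short command/response exchanges of about 30 bytes each.
short_messages = [30] * 10
print(run(grow_per_ack, short_messages))   # 16060
print(run(grow_per_byte, short_messages))  # 1760
```

After only 300 bytes of acknowledged data, the per-ACK rule has opened
the window to about 16 KB, which is what then queues behind the modem.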

--rich


From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 13:55:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA13479 for tcp-impl-list; Fri, 21 Mar 1997 13:53:54 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA13473 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 13:53:52 -0800
Received: from internet-mail2.ford.com (internet-mail2.ford.com [198.111.80.24]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA06403 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 13:53:49 -0800
Received: by internet-mail2.ford.com id AA27645
  (InterLock SMTP Gateway 3.0 for tcp-impl@relay.engr.SGI.COM);
  Fri, 21 Mar 1997 16:40:48 -0500
Message-Id: <199703212140.AA27645@internet-mail2.ford.com>
Received: by internet-mail2.ford.com (Protected-side Proxy Mail Agent-1);
  Fri, 21 Mar 1997 16:40:48 -0500
From: "Krishnan Subramaniam" <ksubram1@ford.com>
Date: Fri, 21 Mar 1997 16:43:08 -0500
X-Mailer: Z-Mail (3.2.1 15feb95)
To: tcp-impl@relay.engr.SGI.COM
Subject: IP TOS ...
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

A few questions:

- Do any implementations, other than Cray, let the application set the
  IP TOS bit?
- Are there any routers that do priority queuing based on the TOS value?
- Should RED or any other queue management algorithms avoid dropping packets
  with TOS = low delay?

Regards

ks

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 14:22:32 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA19759 for tcp-impl-list; Fri, 21 Mar 1997 14:17:14 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA19743 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 14:17:12 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA12609 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 14:17:09 -0800
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id RAA15243; Fri, 21 Mar 1997 17:11:40 -0500 (EST)
Message-Id: <199703212211.RAA15243@brookfield.ans.net>
To: backman@ftp.com
cc: vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
Reply-To: curtis@ans.net
Subject: Re: draft description of "Failure to retain above-sequence data" 
In-reply-to: Your message of "Wed, 19 Mar 1997 09:28:40 EST."
             <199703191428.JAA11995@MAILSERV-2HIGH.FTP.COM> 
Date: Fri, 21 Mar 1997 17:11:36 -0500
From: Curtis Villamizar <curtis@ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199703191428.JAA11995@MAILSERV-2HIGH.FTP.COM>, Larry Backman writes:
> 
> Comment from the old PC/TCP for DOS days.  In a limited memory
> environment such as DOS in days gone by, and very possibly
> in various network computers, handheld widgets, etc. in  days to
> come, a stack has to balance very carefully network behavior with
> system behavior.
> 
> The old DOS stack was bound by limitations of memory that caused its
> default buffer management to have 3-5 MTU sized packets and 10-20
> NFS RPC sized packets.  That <10K memory chunk was all that could
> be taken from the system environment without affecting system behavior
> and performance in other ways.

An implementation should not advertise an available window larger than
what it can actually buffer.

Severely memory limited IP aware toaster ovens and other anemic
devices need not implement RFC1323 window scaling.  :-)

> While it's easy to dismiss our old DOS memory issues as obsolete, the
> underlying issues of low memory TCP implementations will undoubtedly
> resurface from time to time as TCP is pushed onto smaller and cheaper
> devices.

These machines often had 2-4 MB of memory even in the old days but
needed to stuff everything into 640KB for purely stupid reasons.

It doesn't matter why these implementations are broken, the fact
remains that they are broken and their performance will be poor.  If
this leads to the conclusion that it is not possible to implement TCP
in a way that TCP will not perform poorly on some hardware platform
with too little memory or software platform that can't allocate memory
for some reason, then it must simply be accepted that these
implementations cannot conform to the RFC and will perform poorly.

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 14:24:05 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA21111 for tcp-impl-list; Fri, 21 Mar 1997 14:21:34 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA21092 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 14:21:32 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA13559 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 14:21:29 -0800
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id RAA27386; Fri, 21 Mar 1997 17:17:36 -0500 (EST)
Date: Fri, 21 Mar 1997 17:17:36 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199703212217.RAA27386@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: IP TOS ...
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> - Do any implementations, other than Cray, let the application set
>   the IP TOS bit?

NetBSD, which I think is largely 4.4 in such matters, documents and
appears to implement (based on a quick glance at the code) a
setsockopt() option to set it.

> - Are there any routers that do priority queuing based on the TOS
>   value?

I don't know.

> - Should RED or any other queue management algorithms avoid dropping
>   packets with TOS = low delay?

Seems to me that the packets you should avoid dropping are the ones
with the high-reliability bit set, not the ones with the low-delay bit
set.  To me, low delay means "send me very soon or not at all" (or
perhaps even more "use a faster-than-default line for me"), not "drop
me only if you really really must".

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 15:30:23 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA08219 for tcp-impl-list; Fri, 21 Mar 1997 15:28:58 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA08205 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 15:28:52 -0800
Received: from mailhost.yahoo.com (mailhost.yahoo.com [205.216.162.34]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA01769 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 15:28:50 -0800
Received: from borogove.yahoo.com (borogove.yahoo.com [205.216.162.65]) by mailhost.yahoo.com (8.8.5/8.6.12) with ESMTP id PAA29125; Fri, 21 Mar 1997 15:25:01 -0800 (PST)
Received: from localhost (localhost [127.0.0.1]) by borogove.yahoo.com (8.8.5/8.6.12) with SMTP id PAA01080; Fri, 21 Mar 1997 15:25:01 -0800 (PST)
Message-Id: <199703212325.PAA01080@borogove.yahoo.com>
X-Authentication-Warning: borogove.yahoo.com: localhost [127.0.0.1] didn't use HELO protocol
To: "W. Richard Stevens" <rstevens@kohala.com>
cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: UDP and path MTU discovery 
In-reply-to: Your message of "Fri, 21 Mar 1997 11:55:28 MST."
             <199703211855.LAA25887@kohala.kohala.com> 
Date: Fri, 21 Mar 1997 15:25:01 -0800
From: John Hanley <jh@yahoo.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Simple.

Run suid-root (uggh!), and listen on a raw ICMP socket.
Once you have settled on a good MTU, close the ICMP socket.
If you ever go into a couple packets worth of timeout, go back to
listening on the ICMP socket for a while, as the cloud may have
changed the path-MTU on you.

Non-unix platforms will have different means of getting ICMPs.


	Cheers,
	JH


From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 15:47:34 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA11665 for tcp-impl-list; Fri, 21 Mar 1997 15:45:54 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA11639 for <tcp-impl@relay.engr.sgi.com>; Fri, 21 Mar 1997 15:45:51 -0800
Received: from alpha.xerox.com (alpha.Xerox.COM [13.1.64.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id PAA05395 for <tcp-impl@relay.engr.sgi.com>; Fri, 21 Mar 1997 15:45:49 -0800
Received: from crevenia.parc.xerox.com ([13.2.116.11]) by alpha.xerox.com with SMTP id <17628(3)>; Fri, 21 Mar 1997 15:39:16 PST
Received: from localhost by crevenia.parc.xerox.com with SMTP id <177486>; Fri, 21 Mar 1997 15:39:08 -0800
To: "Krishnan Subramaniam" <ksubram1@ford.com>
cc: tcp-impl@relay.engr.sgi.com
Subject: Re: IP TOS ... 
In-reply-to: Your message of "Fri, 21 Mar 97 13:43:08 PST."
             <199703212140.AA27645@internet-mail2.ford.com> 
Date: Fri, 21 Mar 1997 15:39:00 PST
From: Bill Fenner <fenner@parc.xerox.com>
Message-Id: <97Mar21.153908pst.177486@crevenia.parc.xerox.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

"Krishnan Subramaniam" <ksubram1@ford.com> wrote:
>- Do any implementations, other than Cray, let the application set the
>  IP TOS bit?

A quick test program shows that:
- IRIX 6.3
- DEC OSF/1 V3.2
- FreeBSD (and other 4.4-derived systems)
- Solaris

implement at least setsockopt(...,IPPROTO_IP, IP_TOS, &IPTOS_THROUGHPUT, ...).
Presumably values other than THROUGHPUT work but I didn't test them.

SunOS 4.1.3 and BSD4.3 (well, NeXTStep 3.3) are the only two OS's I have
access to that don't implement IP_TOS.
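
For reference, a test along these lines can be sketched in a few
lines of Python.  The helper name `set_tos` is hypothetical; the three
constants are the TOS bit values as defined in BSD's <netinet/ip.h>,
spelled out here rather than taken from the socket module.  Whether
the value is honored on the wire is up to the platform, so treat this
as a sketch.

```python
import socket

# TOS bit values as defined in BSD's <netinet/ip.h>.
IPTOS_LOWDELAY = 0x10
IPTOS_THROUGHPUT = 0x08
IPTOS_RELIABILITY = 0x04


def set_tos(sock, tos):
    """Set the IP TOS byte on a socket and return the value read back."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        print(hex(set_tos(s, IPTOS_THROUGHPUT)))
```

On a system without IP_TOS support the setsockopt() call is expected
to fail with an error rather than be silently ignored.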

  Bill

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 15:48:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA11923 for tcp-impl-list; Fri, 21 Mar 1997 15:46:50 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA11911 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 15:46:48 -0800
Received: from darkside.rutgers.edu (darkside.rutgers.edu [128.6.111.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA05845 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 15:46:46 -0800
Received: (from davem@localhost)
	by darkside.rutgers.edu (8.8.5/8.8.5) id SAA08606;
	Fri, 21 Mar 1997 18:45:50 -0500
Date: Fri, 21 Mar 1997 18:45:50 -0500
Message-Id: <199703212345.SAA08606@darkside.rutgers.edu>
From: "David S. Miller" <davem@darkside.rutgers.edu>
To: jh@yahoo.com
CC: rstevens@kohala.com, tcp-impl@relay.engr.SGI.COM
In-reply-to: <199703212325.PAA01080@borogove.yahoo.com> (message from John
	Hanley on Fri, 21 Mar 1997 15:25:01 -0800)
Subject: Re: UDP and path MTU discovery
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Fri, 21 Mar 1997 15:25:01 -0800
   From: John Hanley <jh@yahoo.com>

   Simple.

Perhaps, but it's the wrong approach.  The kernel should return
EMSGSIZE to the best matching UDP socket when such an ICMP message
is received.  The kernel is in the best position to do this, and this
is in fact what Linux is doing in the 2.1.x kernels, for both IPv4 and
IPv6.  This has the added benefit that you don't need to run as root
to acquire this information; any user can get at it and it's clean.
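
From the application side, reacting to EMSGSIZE might look like the
following sketch.  The function name, the halving policy, and the
512-byte floor are assumptions for illustration only.  Exactly when
the kernel reports the error (on the oversized send itself, or on a
later call after the ICMP arrives) is platform-specific and is not
modelled here, nor is recovery of a datagram already lost in the
network.

```python
import errno


def send_with_pmtu_backoff(sock, payload, addr, size, floor=512):
    """Send payload as datagrams of at most `size` bytes.

    When the socket reports EMSGSIZE, halve the datagram size (down to
    `floor`) and retry.  Returns the datagram size that finally worked.
    """
    offset = 0
    while offset < len(payload):
        chunk = payload[offset:offset + size]
        try:
            sock.sendto(chunk, addr)
        except OSError as e:
            if e.errno != errno.EMSGSIZE or size <= floor:
                raise                    # a different error, or cannot shrink
            size = max(floor, size // 2)
            continue                     # retry the same offset, smaller chunk
        offset += len(chunk)
    return size
```

Note that no raw socket and no root privilege is involved; the only
input is the error code returned on the ordinary UDP socket.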

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 15:50:48 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA12500 for tcp-impl-list; Fri, 21 Mar 1997 15:49:28 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA12481 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 15:49:26 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA06269 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 15:49:20 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id XAA31934; Fri, 21 Mar 1997 23:41:18 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0w8Dqn-0005FcC; Fri, 21 Mar 97 23:35 GMT
Message-Id: <m0w8Dqn-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: draft description of "Failure to retain above-sequence data"
To: F.Potorti@cnuce.cnr.it (Francesco Potorti`)
Date: Fri, 21 Mar 1997 23:35:56 +0000 (GMT)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <m0w82nR-0003YsC@fly.cnuce.cnr.it> from "Francesco Potorti`" at Mar 21, 97 12:47:00 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Is it reasonable for stacks to throw away smaller window offers for
> the same sequence?  That's what BSD 4.4 does, but this goes against

It's what Linux does too, and seems to be what everyone does.

> rfc793, and frustrates a receiver's attempt to shrink the window when
> the line is idle, which is probably the most sane case when a limited

Well RFC1122 says they aren't supposed to shrink it. From a practical
point of view we are stuck with that behaviour forever now.


From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 17:07:06 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA00782 for tcp-impl-list; Fri, 21 Mar 1997 17:04:31 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA00772 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 17:04:29 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id RAA21920 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 17:04:27 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA27852>; Fri, 21 Mar 1997 17:00:40 -0800
Date: Fri, 21 Mar 1997 17:00:39 -0800
Posted-Date: Fri, 21 Mar 1997 17:00:39 -0800
Message-Id: <199703220100.AA17678@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA17678>; Fri, 21 Mar 1997 17:00:39 -0800
To: touch@ISI.EDU, narten@raleigh.ibm.com
Subject: Re: more on TCP buffering
Cc: tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Thomas Narten <narten@raleigh.ibm.com>
> 
>>>> It seems like fixing the PSH (push) bit is a more direct fix, which
>>>> too is not specified sufficiently right now. That bit is intended
>>>> to flush buffers out at the sender, and force received data up
>>>> to the application at the receiver. Unfortunately, the API doesn't
>>>> specify a preemptive callback or interrupt at the receiver when PSH
>>>> is received.

>>>I think you read a bit much into PSH. PSH means tell the TCP
>>>implementations that they are no longer allowed to continue holding
>>>queued data (at the sender or receiver) in the hopes that by delaying,
>>>more will come along shortly (making subsequent transfer - sender to
>>>receiver, receiver to application - more efficient). I do not

RFC1122 says that

            A TCP MAY implement PUSH flags on SEND calls.  If PUSH flags
            are not implemented, then the sending TCP: (1) must not
            buffer data indefinitely, and (2) MUST set the PSH bit in
            the last buffered segment (i.e., when there is no more
            queued data to be sent).

BSD and NET/3 implementations do the latter ONLY.
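
Item (2) of the quoted text, setting PSH on the segment that empties
the send queue, can be sketched as below.  The function name and the
(payload, psh) tuple representation are hypothetical; this is not
taken from any particular stack.

```python
def segmentize(buffer, mss):
    """Split queued bytes into (payload, psh) pairs.

    PSH is set only on the segment that empties the queue, i.e. when
    there is no more queued data to be sent.
    """
    segments = []
    for off in range(0, len(buffer), mss):
        chunk = buffer[off:off + mss]
        psh = off + mss >= len(buffer)   # true only for the last chunk
        segments.append((chunk, psh))
    return segments
```

With this rule the application has no way to request PSH on an
earlier segment, which is the missing SEND-side interface discussed
in the rest of this message.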

> I don't see the issue at all. Where is ambiguity in the spec? Which

The "old timers" I'm talking about would claim that section (page 83)
of RFC1122 talking about sending PSH up to the application:

            The discussion in RFC-793 on pages 48, 50, and 74
            erroneously implies that a received PSH flag must be passed
            to the application layer.  Passing a received PSH flag to
            the application layer is now OPTIONAL.

Granted, whether this is by call-back or as a read-only bit, it's not
quite what 'urgent' pointers do. I want PSH to be an in-band version of
URG, which would allow signalling for client/server applications. URG
is useful more for OOB interrupts, like 'cancel'.

(in fact, persistent HTTP/TCP would do better to use
URG for cancel than to RST the connection).

The ambiguity is that RFC1122 states (page 83) that
PSH may or may not be implemented at the SEND interface, and
that it also states that (in DISCUSSION)

                 Generally, an interactive application protocol must set
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 the PUSH flag at least in the last SEND call in each
                 ^^^^^^^^^^^^^
                 command or response sequence.  A bulk transfer protocol
                 like FTP should set the PUSH flag on the last segment
                 of a file or when necessary to prevent buffer deadlock.

The problem is that current implementations lack this interface, but
do not preclude their use for interactive applications.

Any implementation that is intended to support interactive use 
(it appears) _must_ implement an API access to the PSH bit at 
the SEND.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 17:07:39 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA01160 for tcp-impl-list; Fri, 21 Mar 1997 17:05:44 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA01094 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 17:05:29 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA22096 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 17:05:25 -0800
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id TAA16073; Fri, 21 Mar 1997 19:59:55 -0500 (EST)
Message-Id: <199703220059.TAA16073@brookfield.ans.net>
To: Vern Paxson <vern@ee.lbl.gov>
cc: hkchu@pacific-86.Eng.Sun.COM (Hsiao-keng Jerry Chu),
        tcp-impl@relay.engr.SGI.COM
Reply-To: curtis@ans.net
Subject: Re: draft description of "No slow start after timeout" 
In-reply-to: Your message of "Wed, 19 Mar 1997 19:28:08 PST."
             <199703200328.TAA24981@daffy.ee.lbl.gov> 
Date: Fri, 21 Mar 1997 19:59:53 -0500
From: Curtis Villamizar <curtis@ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199703200328.TAA24981@daffy.ee.lbl.gov>, Vern Paxson writes:
> > Counting bytes always gives Rlog2(W). It appears to be contributing
> > to the burstiness of the traffic because it helps to get out of
> > slow-start quicker, which should be considered a merit, not fault (IMHO).
> 
> It is a merit from the perspective of performance.  It is a potential fault
> from the perspective of burstiness.  There's a constant tension between
> performance and congestion avoidance.  My observation is that if counting
> bytes for slow start were deployed in today's Internet, traffic would get
> significantly burstier - it's not clear that this is a good thing.  There
> was recently a lot of haggling on end2end-interest over increasing the
> initial value of cwnd, and that's just a one-time burst, rather than the
> ongoing (for the duration of slow start) burstiness that counting bytes
> would lead to.


With delayed ACK you get 1 ACK for every 2 segments sent.  If you
count ACKs, you pace packets conservatively:

  RTT1		send  -->
				<--  ack	(first one is not delayed)
  RTT2		send  -->
		send  -->			(adv one seg, add one to cwnd)
				 --  delay
				<--  ack
  RTT3		send  -->
		send  -->
		send  -->			(adv 2 seg, add one to cwnd)
				 --  delay
				<--  ack
				 --  delay
  RTT4		send -->			(adv 2 seg, add one)
		send -->
		send -->
				<--  ack
				 --  delay
				<--  ack
  RTT5		send -->			(adv 2 seg, add one)
		send -->
		send -->
		send -->			(adv 2 seg, add one)
		send -->
		send -->
				<--  etc

By not counting the two segments separately, TCP will just take more
time to reach the initial drop, which will put it in congestion
avoidance and will tend to start pacing data.  It's only in congestion
avoidance that packets can begin to spread out in time, with only one
segment being added per RTT, rather than one or two per ACK.  On the
positive side, the chances of a multiple drop when the first drop
occurs would probably be reduced.

Counting ACKs is exactly equivalent to counting segments and adding
1/2 segment per ACKed segment during slow start.  This would mean
spending a lot more time in slow start for bulk transfers over LFNs.
It now takes log_2(N) RTTs to reach a cwnd of N segments.  With this
change it would take log_1.5(N).  For a segment size of 1024 and a T1
satellite link with an RTT of 250msec, you'd want to reach a cwnd of
(1.54/8)*(.250)*1000000/1024 or about 47 packets.  You now need 6
RTTs, or 1.5 seconds to come to full speed.  This would be increased
to 10 RTTs or 2.5 seconds.  This isn't so bad.  A DS3 speed connection
US over satellite or US to Europe (a DS3 speed single TCP flow) would
need to reach over 1200 segments, but if FDDI MTU was used (almost a
requirement) this would be only about 300, reached in about 15 RTTs.
Reaching full window of 1.5MB in under 4 seconds isn't too bad.
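
The arithmetic above can be checked with a few lines of Python. This is only a sketch under an idealized model: cwnd starts at one segment and grows by a fixed factor per RTT (2 when every segment is counted, 1.5 when ACKs are counted with delayed ACK), with no losses. The function names are made up for the example.

```python
import math


def window_segments(bits_per_sec, rtt_sec, seg_bytes):
    """Bandwidth*delay product expressed in segments."""
    return bits_per_sec / 8 * rtt_sec / seg_bytes


def rtts_to_reach(cwnd_target, growth_per_rtt):
    """RTTs of slow start needed to grow cwnd from 1 to cwnd_target."""
    return math.ceil(math.log(cwnd_target) / math.log(growth_per_rtt))


# T1 satellite link: 1.54 Mbit/s, 250 msec RTT, 1024-byte segments.
t1 = window_segments(1.54e6, 0.250, 1024)

print(round(t1))                  # target cwnd in segments
print(rtts_to_reach(t1, 2))       # every segment counted
print(rtts_to_reach(t1, 1.5))     # ACKs counted, delayed ACK
print(rtts_to_reach(300, 1.5))    # DS3 case with about 300 FDDI-MTU segments
```

Under these assumptions the T1 case needs 6 RTTs at a doubling rate and 10 RTTs at a factor of 1.5, and the 300-segment case needs 15 RTTs, matching the figures in the text.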

While the idea of making slow start is certainly interesting and might
even be a good idea, isn't it a bit out of scope for the current
discussion?

Curtis


From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 17:08:55 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA01676 for tcp-impl-list; Fri, 21 Mar 1997 17:07:47 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA01670 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 17:07:45 -0800
Received: from mailbag.jf.intel.com (mailbag.jf.intel.com [134.134.248.4]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA22656 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 17:07:42 -0800
Received: from ideal.jf.intel.com (ideal.jf.intel.com [134.134.130.5]) by mailbag.jf.intel.com (8.8.5/8.7.3) with ESMTP id RAA27684; Fri, 21 Mar 1997 17:08:12 -0800 (PST)
Received: from raj2 (raj2.jf.intel.com [134.134.12.225])
          by ideal.jf.intel.com (8.8.5/8.8.4) with SMTP
	  id RAA08460; Fri, 21 Mar 1997 17:03:26 -0800 (PST)
Date: Fri, 21 Mar 1997 17:03:26 -0800 (PST)
Message-Id: <199703220103.RAA08460@ideal.jf.intel.com>
X-Sender: yavatkar@ibeam.jf.intel.com
X-Mailer: Windows Eudora Pro Version 2.1.2
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: "Krishnan Subramaniam" <ksubram1@ford.com>, tcp-impl@relay.engr.SGI.COM
From: Raj Yavatkar <yavatkar@ideal.jf.intel.com>
Subject: Re: IP TOS ...
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I believe the latest version of the TCP/IP stack from Microsoft supports (and
implements) the corresponding setsockopt. I also believe that Cisco routers
support priority queuing (or bandwidth allocation?) based on the TOS values.
It would be nice to get direct confirmation from Cisco and MS.

Raj

 At 04:43 PM 3/21/97 -0500, Krishnan Subramaniam wrote:
>Few questions :
>
>- Does any implementations, other than Cray, let the application set the
>  IP TOS bit?
>- Are there any routers that does priority queuing based on the TOS value?
>- Should RED or any other queue management algorithms avoid dropping packets
>  with TOS = low delay?
>
>Regards
>
>ks
>

-----------------------------------------
   Raj Yavatkar                                       
   Communication Architecture Lab           yavatkar@ibeam.intel.com
   Intel Corporation, JF3-206                     voice -- 503-264-9077
   2111 NE 25th Avenue,  Hillsboro, OR 97124                              


From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 17:35:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA08299 for tcp-impl-list; Fri, 21 Mar 1997 17:31:22 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA08280 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 17:31:20 -0800
Received: from lint.cisco.com (lint.cisco.com [171.68.223.44]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA26868 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 17:31:18 -0800
Received: from big-dogs.cisco.com (herndon-dhcp-92.cisco.com [171.68.53.92]) by lint.cisco.com (8.8.5/CISCO.SERVER.1.2) with SMTP id RAA28974; Fri, 21 Mar 1997 17:27:30 -0800 (PST)
Message-Id: <3.0.32.19970321202726.006d8318@lint.cisco.com>
X-Sender: pferguso@lint.cisco.com
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Fri, 21 Mar 1997 20:27:30 -0500
To: Raj Yavatkar <yavatkar@ideal.jf.intel.com>
From: Paul Ferguson <pferguso@cisco.com>
Subject: Re: IP TOS ...
Cc: tcp-impl@relay.engr.SGI.COM
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

At 05:03 PM 3/21/97 -0800, Raj Yavatkar wrote:

>I believe the latest version of TCP/IP stack from Microsoft supports (and
>implements) the corresponding setsockopt. I also believe that Cisco routers
>support priority queuing (or bandwidth allocation?) based on the TOS values.
>Will be nice to get direct confirmation from Cisco and MS.
>

Actually, we currently do use the precedence sub-field in the TOS field
with WFQ; as the precedence value increases, the algorithm allocates
more bandwidth to that conversation which allows it to transmit more
frequently.
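
To make the idea concrete, here is a small Python sketch. Extracting precedence as the top three bits of the TOS byte follows the IP header layout. The weighting rule (share proportional to precedence + 1) is purely an assumption chosen for illustration and is not a description of the actual WFQ algorithm; `precedence` and `bandwidth_shares` are hypothetical names.

```python
def precedence(tos_byte):
    """Return the 3-bit precedence sub-field (top bits of the TOS byte)."""
    return (tos_byte >> 5) & 0x7


def bandwidth_shares(tos_bytes):
    """Split a link among conversations, one TOS byte per conversation.

    ASSUMPTION: weight = precedence + 1, so that precedence 0 still gets
    a non-zero share.  A real scheduler may weight quite differently.
    """
    weights = [precedence(t) + 1 for t in tos_bytes]
    total = sum(weights)
    return [w / total for w in weights]
```

For three conversations with precedence 0, 1 and 2 this rule gives shares of 1/6, 1/3 and 1/2 of the link.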

- paul


From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 18:40:45 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA21240 for tcp-impl-list; Fri, 21 Mar 1997 18:39:10 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA21213 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 18:39:08 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id SAA09103 for <tcp-impl@relay.engr.SGI.COM>; Fri, 21 Mar 1997 18:39:05 -0800
Received: from ftp.com by ftp.com  ; Fri, 21 Mar 1997 21:35:13 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Fri, 21 Mar 1997 21:35:13 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id VAA19762; Fri, 21 Mar 1997 21:32:27 -0500
Date: Fri, 21 Mar 1997 21:32:27 -0500
Message-Id: <199703220232.VAA19762@MAILSERV-2HIGH.FTP.COM>
To: ksubram1@ford.com
Subject: Re: IP TOS ...
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Fri Mar 21 21:32:22 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||- Does any implementations, other than Cray, let the application set the
||  IP TOS bit?

FTP has for time immemorial done this in its stacks.  I know we are
not alone as we have seen other stacks do this also.



From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 18:40:46 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA21152 for tcp-impl-list; Fri, 21 Mar 1997 18:40:59 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA21132 for <tcp-impl@engr.SGI.COM>; Fri, 21 Mar 1997 18:40:52 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id SAA09080 for <tcp-impl@engr.SGI.COM>; Fri, 21 Mar 1997 18:38:40 -0800
Received: from ftp.com by ftp.com  ; Fri, 21 Mar 1997 21:34:52 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Fri, 21 Mar 1997 21:34:52 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id VAA19757; Fri, 21 Mar 1997 21:32:07 -0500
Date: Fri, 21 Mar 1997 21:32:07 -0500
Message-Id: <199703220232.VAA19757@MAILSERV-2HIGH.FTP.COM>
To: vern@ee.lbl.gov
Subject: Re: PSH / "Failure to retain above-sequence data" 
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: narten@raleigh.ibm.com, tcp-impl@engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Fri Mar 21 21:32:02 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||So let me ask:
Since I started the 2nd thread..
||
||        1.  Do we have agreement over the sender-should-set-PSH issue,
||            that it's an implementation problem we should document?
||
yes.
||        2.  Can we resolve the failure-to-retain by using wording that
||            receivers must not "routinely" fail to do so?
||
yes.
||            This came about because for the implementation where I observed
||            this, as far as it could tell it simply never bothered retaining
||            above-sequence data, even though it had plenty of memory.
||            I suspect this was to simplify the implementation, but it seems
||            clear that because of the bad congestion properties of this
||            behavior, it should be fixed.  So I'm trying to separate 
||            "routinely" doing so from "occasionally".
||
however..
||            Perhaps a different way to put it is that a TCP must have
||            mechanism in place that allows it to retain a full window's
||            worth of above-sequence data, without delving into specifics
||            of when that mechanism might not be exercised.

I'd like to see words to this effect (above) put into the wording to
clarify the issue we have all thrashed thru.
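
For what it's worth, the kind of mechanism being asked for can be sketched in a few lines of Python. This is a toy model only: the `Receiver` class and its fields are hypothetical, the window is a fixed size measured from rcv_nxt, the application is assumed to read delivered data immediately, and segments that straddle the right window edge are not trimmed.

```python
class Receiver:
    def __init__(self, rcv_nxt=0, window=4096):
        self.rcv_nxt = rcv_nxt        # next in-sequence byte expected
        self.window = window          # advertised window, in bytes
        self.held = {}                # seq -> payload, above-sequence data
        self.delivered = bytearray()  # data handed to the application

    def on_segment(self, seq, payload):
        """Process one segment and return the cumulative ACK number."""
        end = seq + len(payload)
        if end <= self.rcv_nxt or seq >= self.rcv_nxt + self.window:
            return self.rcv_nxt       # old duplicate, or outside the window
        if seq > self.rcv_nxt:
            self.held[seq] = payload  # retain it instead of discarding it
            return self.rcv_nxt
        # In-sequence data (possibly overlapping what we already have).
        self.delivered += payload[self.rcv_nxt - seq:]
        self.rcv_nxt = end
        # Drain any held segments that are now contiguous.
        while True:
            ready = sorted(s for s in self.held if s <= self.rcv_nxt)
            if not ready:
                break
            for s in ready:
                p = self.held.pop(s)
                if s + len(p) > self.rcv_nxt:
                    self.delivered += p[self.rcv_nxt - s:]
                    self.rcv_nxt = s + len(p)
        return self.rcv_nxt
```

If segments 1000-1999 and 2000-2999 arrive before segment 0-999, the first two are held and ACKed with 0; when the missing segment arrives the ACK jumps straight to 3000, so the sender has only one segment to retransmit.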





From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 21 18:40:43 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA21137 for tcp-impl-list; Fri, 21 Mar 1997 18:40:55 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA21123 for <tcp-impl@engr.SGI.COM>; Fri, 21 Mar 1997 18:40:48 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id SAA09071 for <tcp-impl@engr.SGI.COM>; Fri, 21 Mar 1997 18:38:35 -0800
Received: from ftp.com by ftp.com  ; Fri, 21 Mar 1997 21:34:42 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Fri, 21 Mar 1997 21:34:42 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id VAA19755; Fri, 21 Mar 1997 21:31:57 -0500
Date: Fri, 21 Mar 1997 21:31:57 -0500
Message-Id: <199703220231.VAA19755@MAILSERV-2HIGH.FTP.COM>
To: curtis@ans.net
Subject: Re: draft description of "Failure to retain above-sequence data" 
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: vern@ee.lbl.gov, tcp-impl@engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Fri Mar 21 21:31:53 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||
||These machines often had 2-4 MB of memory even in the old days but
||needed to stuff everything into 640KB for purely stupid reasons.
||
||It doesn't matter why these implementations are broken, the fact
||remains that they are broken and their performance will be poor.  If
||this leads to the conclusion that it is not possible to implement TCP
||in a way that TCP will not perform poorly on some hardware platform
||with too little memory or software platform that can't allocate memory
||for some reason, then it must simply be accepted that these
||implementations cannot conform to the RFC and will perform poorly.
||
I'll do my best not to resent this and take it personally.

Can we refrain from emotionally charged words like "broken" and "stupid"
and try and work with each other to clarify things and improve the TCP space?

All protocol implementations have corners into which they don't want to
go for a variety of platform specific reasons.  The mark, in my
mind, of a good protocol implementation is how said implementation goes
into those corners and behaves under adverse circumstances in those
corners.  My intent in exposing my dirty laundry was to share hard
won knowledge we at FTP found in one of those corners and to seek
advice, which I got, in how to smooth and improve our behavior
in those corners.

Please feel free to suggest how I can improve my product, but please don't
tell me my product is broken for doing the best balancing act it
could under the circumstances for which it was aimed.

L.





From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 22 04:33:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA09046 for tcp-impl-list; Sat, 22 Mar 1997 04:31:57 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA09041 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 04:31:55 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id EAA15839 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 04:31:48 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id MAA21101; Sat, 22 Mar 1997 12:29:12 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0w8EXl-0005FcC; Sat, 22 Mar 97 00:20 GMT
Message-Id: <m0w8EXl-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: IP TOS ...
To: ksubram1@ford.com (Krishnan Subramaniam)
Date: Sat, 22 Mar 1997 00:20:21 +0000 (GMT)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199703212140.AA27645@internet-mail2.ford.com> from "Krishnan Subramaniam" at Mar 21, 97 04:43:08 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> - Does any implementations, other than Cray, let the application set the
>   IP TOS bit?

Linux. The packet forwarder can also alter other people's TOS bits as packets
pass through - e.g. to tweak outgoing ftp data to the right level when it comes
off a poorer stack. The Linux code doesn't do strict priority; it merely favours
higher TOS values.
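
For reference, setting TOS from an application through the sockets API can be sketched like this in Python. It assumes a platform whose `socket` module exposes `IP_TOS` (Linux and the BSDs do); `make_low_delay_socket` is a made-up helper name, and 0x10 is the RFC 1349 "minimize delay" value of the TOS byte.

```python
import socket

IPTOS_LOWDELAY = 0x10  # RFC 1349 "minimize delay" bit in the TOS byte


def make_low_delay_socket():
    """Create a TCP socket whose outgoing packets request low delay."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, IPTOS_LOWDELAY)
    return s
```

Whether anything along the path honours the value is a separate question, as the rest of this thread shows.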

> - Are there any routers that does priority queuing based on the TOS value?

Linux does this. It seems to help a bit in some cases. It's no help on a modem
due to the amount of buffering you get in the modems. Some 4BSDs do it for
PPP at least even if not for other ports.

> - Should RED or any other queue management algorithms avoid dropping packets
>   with TOS = low delay?

That depends on whether you argue that low delay also implies high
reliability - which is questionable.

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 22 04:33:47 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA09160 for tcp-impl-list; Sat, 22 Mar 1997 04:32:25 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA09151 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 04:32:23 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id EAA15858 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 04:32:20 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id MAA21095; Sat, 22 Mar 1997 12:28:56 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0w8EPQ-0005FcC; Sat, 22 Mar 97 00:11 GMT
Message-Id: <m0w8EPQ-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: UDP and path MTU discovery
To: rstevens@kohala.com
Date: Sat, 22 Mar 1997 00:11:44 +0000 (GMT)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199703211855.LAA25887@kohala.kohala.com> from "W. Richard Stevens" at Mar 21, 97 11:55:28 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> endpoint (with the stack doing the right thing, as per path MTU discovery)
> but it sure looks like a UDP application never receives the message, even
> if the UDP socket is connected.  It appears the application must timeout
> and retransmit the UDP datagram when this happens, effectively ignoring
> the information that IP/ICMP has received.  Am I missing anything?

That stacks don't send UDP DF frames except via raw sockets, and then you can
listen to ICMP messages via a raw socket too. I agree this is a relevant issue
to stuff - multicasting notably.

Alan

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 22 16:04:21 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA26403 for tcp-impl-list; Sat, 22 Mar 1997 15:57:40 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA26398 for <tcp-impl@engr.SGI.COM>; Sat, 22 Mar 1997 15:57:34 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id PAA11031 for <tcp-impl@engr.SGI.COM>; Sat, 22 Mar 1997 15:57:30 -0800
Received: from ftp.com by ftp.com  ; Sat, 22 Mar 1997 18:53:44 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Sat, 22 Mar 1997 18:53:44 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id SAA02576; Sat, 22 Mar 1997 18:50:57 -0500
Date: Sat, 22 Mar 1997 18:50:57 -0500
Message-Id: <199703222350.SAA02576@MAILSERV-2HIGH.FTP.COM>
To: kck@netcom.com
Subject: Re:  cwnd acks -> bytes
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Sat Mar 22 18:50:54 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||
||This is an excellent point. The problem that I have seen with this is the
||RTT estimates do not always work cleanly because the small packet has a much
||smaller RTT than a larger packet would. By increasing the cwnd by packets
||and not by bytes the RTT will not move smoothly and this will result in
||increased retransmits while the RTT must take into account a sudden change
||due to queueing in combination with the network latency.
||
||> 
||> The result is that cwnd is too large, and the larger packets are
||> retransmitted, because several large packets queue for very long times
||> (> 1 second).
||> 
||> I see this every day!
||
||I have seen this all too often as well.
||
We live with this constantly on wireless links where slowness is only
one issue, but where lossiness is another issue, and an even worse
issue is wildly varying RTT.

We have two configuration parameters we recommend for wireless customers
who operate in these conditions:

"no-slow-start", which does what it says, and
"RTT-multiplier", which multiplies the calculated RTT to smooth the
cumulative effects of slowness, lossiness, and variability in both
media speed and packet size.  19.2K CDPD works well with no slow
start and an RTT multiplier factor of 4.
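
A rough Python sketch of how such a multiplier might enter the retransmit timer follows. This is a guess at the general shape, not a description of the FTP stack: it assumes the usual smoothed-RTT estimator with gains of 1/8 and 1/4 and a timeout of srtt + 4*rttvar, and it assumes the multiplier simply scales the result. `RttEstimator` and its fields are hypothetical names.

```python
class RttEstimator:
    def __init__(self, multiplier=1.0):
        self.multiplier = multiplier  # ASSUMED to scale the final timeout
        self.srtt = None              # smoothed RTT, seconds
        self.rttvar = None            # smoothed mean deviation, seconds

    def sample(self, rtt):
        """Fold one RTT measurement (seconds) into the estimate."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            err = rtt - self.srtt
            self.srtt += err / 8
            self.rttvar += (abs(err) - self.rttvar) / 4

    def rto(self):
        """Retransmit timeout in seconds, scaled by the multiplier."""
        return self.multiplier * (self.srtt + 4 * self.rttvar)
```

After a single 1-second sample the unscaled timeout is 3 seconds; with a multiplier of 4 it becomes 12 seconds, which shows how much slack such a setting gives a link with wildly varying RTT.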

In a perfect world, with perfect API's and a perfect operating system
we would be able to sense all this and adjust accordingly.


L.





From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 22 21:32:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA23689 for tcp-impl-list; Sat, 22 Mar 1997 21:28:06 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA23680 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 21:28:04 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id VAA15752 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 21:28:02 -0800
Received: from LapTop.Simpson.DialUp.Mich.Net (pm088-25.dialip.mich.net [198.110.68.160]) by merit.edu (8.8.5/merit-2.0) with SMTP id AAA14717 for <tcp-impl@relay.engr.SGI.COM>; Sun, 23 Mar 1997 00:24:09 -0500 (EST)
Date: Sun, 23 Mar 97 04:53:21 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <2254.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: IP TOS ...
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: "Krishnan Subramaniam" <ksubram1@ford.com>
> - Are there any routers that does priority queuing based on the TOS value?

Yes.  NetBlazer, NetHopper, DRBOND, Lan's End, On Ramp, ....

The new (RFC-1349) TOS, not the old ones.


> - Should RED or any other queue management algorithms avoid dropping packets
>   with TOS = low delay?
>
Probably MaximizeThroughPut rather than MinimizeDelay.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 22 21:32:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA23669 for tcp-impl-list; Sat, 22 Mar 1997 21:28:01 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA23664 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 21:27:58 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id VAA15748 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 21:27:55 -0800
Received: from LapTop.Simpson.DialUp.Mich.Net (pm088-25.dialip.mich.net [198.110.68.160]) by merit.edu (8.8.5/merit-2.0) with SMTP id AAA14712 for <tcp-impl@relay.engr.SGI.COM>; Sun, 23 Mar 1997 00:24:01 -0500 (EST)
Date: Sun, 23 Mar 97 04:38:47 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <2253.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH / "Failure to retain above-sequence data"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Vern Paxson <vern@ee.lbl.gov>
> One issue is the problem of senders not setting PSH even when they have no
> more data to send.  It appears the consensus is that that's simply broken,
> and should be documented as an implementation problem.
>
> 	1.  Do we have agreement over the sender-should-set-PSH issue,
> 	    that it's an implementation problem we should document?
>
Yes.  Sender MUST set PSH when it has no more data to send (and any
other time that it wants), and we should _reinforce_ that with more
documentation (since it is already clearly written in the specs).

I would also prefer that the API be _required_ to allow PSH to be set by
the application, although we don't document particular APIs.

It may be a serious problem with Mac Open Transport, which is just being
widely deployed with MacOS 7.6.  That's Mentat based.


> A quite separate issue arose when I floated the idea of amending the TCP
> spec so that receivers MUST NOT routinely discard above-sequence data,
> rather than SHOULD NOT. ...
>
> 	2.  Can we resolve the failure-to-retain by using wording that
> 	    receivers must not "routinely" fail to do so?
>
I think it needs to stay a SHOULD NOT, but the advertisement of too
large a window is the _real_ problem.  I think we should document
problems of _never_ retaining.  And separately document advertisement of
windows that are too large.

> 	    Perhaps a different way to put it is that a TCP must have
> 	    mechanism in place that allows it to retain a full window's
> 	    worth of above-sequence data, without delving into specifics
> 	    of when that mechanism might not be exercised.
>
I don't mind having a discussion section (like 1122) of such times.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 22 23:23:55 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA01040 for tcp-impl-list; Sat, 22 Mar 1997 23:20:03 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA01028 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 23:20:00 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id XAA25115 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 23:19:58 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA20493>; Sat, 22 Mar 1997 23:16:09 -0800
Date: Sat, 22 Mar 1997 23:16:07 -0800
Posted-Date: Sat, 22 Mar 1997 23:16:07 -0800
Message-Id: <199703230716.AA01907@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA01907>; Sat, 22 Mar 1997 23:16:07 -0800
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: PSH / "Failure to retain above-sequence data"
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: "William Allen Simpson" <wsimpson@greendragon.com>
> 
> I would also prefer that the API be _required_ to allow PSH to be set by
> the application, although we don't document particular APIs.

This is an issue that changes explicit indications in the 
host requirements RFC. 

It appears to be outside the scope of this group, which 
(I thought) was limited to errors in the implementations
and deficiencies in specifications.

> I don't mind having a discussion section (like 1122) of such times.

Provided tcp-impl isn't trying to usurp the host-requirements
RFC mechanism. :-)

Joe

----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 22 23:25:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA01272 for tcp-impl-list; Sat, 22 Mar 1997 23:22:52 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA01267 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 23:22:50 -0800
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id XAA25492 for <tcp-impl@relay.engr.SGI.COM>; Sat, 22 Mar 1997 23:22:49 -0800
Received: from feller.mentat.com ([192.88.122.144]) by mentat.com (4.1/SMI-4.1)
	id AA03282; Sat, 22 Mar 97 23:18:38 PST
Received: by feller.mentat.com (SMI-8.6/SMI-SVR4)
	id XAA16563; Sat, 22 Mar 1997 23:19:08 -0800
Date: Sat, 22 Mar 1997 23:19:08 -0800
From: jt@mentat.com (Jerry Toporek)
Message-Id: <199703230719.XAA16563@feller.mentat.com>
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: PSH
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> I would also prefer that the API be _required_ to allow PSH to be set by
> the application, although we don't document particular APIs.
> 
> It may be a serious problem with Mac Open Transport, which is just being
> widely deployed with MacOS 7.6.  That's Mentat based.
> 

What problem would that be?  The only one I am aware of is a bug which
resulted in the PSH bit not getting set when the URG bit was on.  That has
been fixed.

jt


From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 01:10:35 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA08866 for tcp-impl-list; Mon, 24 Mar 1997 01:08:55 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA08858 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 01:08:53 -0800
Received: from glacier.wise.edt.ericsson.se (glacier-ext.wise.edt.ericsson.se [193.180.251.38]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id BAA11532 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 01:08:42 -0800
Received: from aristotel.eth.ericsson.se (aristotel.eth.ericsson.se [164.48.158.205]) by glacier.wise.edt.ericsson.se (8.7.5/8.7.3/glacier-0.9) with SMTP id JAA18854 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 09:58:36 +0100 (MET)
Received: from demokritos.eth.ericsson.se by aristotel.eth.ericsson.se (SMI-8.6/SMI-SVR4)
	id JAA07706; Mon, 24 Mar 1997 09:57:40 +0100
Received: from demokritos by demokritos.eth.ericsson.se (SMI-8.6/SMI-SVR4)
	id JAA08282; Mon, 24 Mar 1997 09:57:52 +0100
Message-ID: <33364210.4070@aristotel.eth.ericsson.se>
Date: Mon, 24 Mar 1997 09:57:52 +0100
From: Andras Olah <Andras.Olah@aristotel.eth.ericsson.se>
Organization: BP/ETH/LT Traffic Lab, Ericsson Kft., Hungary
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.5 sun4u)
MIME-Version: 1.0
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: draft description of "No initial slow start"
References: <199703191935.LAA22921@daffy.ee.lbl.gov>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Vern Paxson wrote:
> 
> > FreeBSD 2.1 (and up to 2.1.7 at least) by default disable slow-start
> > on the "local" network ...  I don't know if this is an acceptable behaviour.
> 
> RFC 2001 mentions:
> 
>    Early implementations performed slow start only if the other end was
>    on a different network.  Current implementations always perform slow
>    start.
> 
> but it doesn't quite nail down whether implementations are required
> to always perform slow start.  The discussion of mandatory slow start
> in RFC 1122 doesn't mention any exceptions for LANs.
> 
> > To make things worse, the default definition of "local", controlled
> > by the macro SUBNETSARELOCAL in file in.c,  extends to the whole
> > CLASS_A, CLASS_B or CLASS_C network.
> 
> Oops!
> 
>                 Vern

I was the one who ported Bob Braden's T/TCP into the FreeBSD code.  The
disabling of slow-start for local connections came with T/TCP as it is
documented by Rich Stevens in TCP/IP Illustrated vol.3.  I agree that
the SUBNETSARELOCAL macro should not consider the whole class A/B net as
local.
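
A minimal sketch of the difference, assuming IPv4 and the pre-CIDR class
boundaries.  The function names are hypothetical and this is not the
FreeBSD in.c code; it only contrasts treating the whole classful network
as "local" with testing against the interface's actual subnet mask.

```python
import ipaddress

def classful_prefix(addr):
    """Return the prefix length of the class A/B/C network holding addr."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    if first_octet < 128:
        return 8     # class A
    if first_octet < 192:
        return 16    # class B
    if first_octet < 224:
        return 24    # class C
    raise ValueError("class D/E addresses have no classful network")

def is_local_classful(local, peer):
    """'Local' means anywhere in the same class A/B/C network."""
    net = ipaddress.IPv4Network(f"{local}/{classful_prefix(local)}", strict=False)
    return ipaddress.IPv4Address(peer) in net

def is_local_subnet(local, mask, peer):
    """'Local' means on the same subnet, as given by the interface mask."""
    net = ipaddress.IPv4Network(f"{local}/{mask}", strict=False)
    return ipaddress.IPv4Address(peer) in net
```

With a local address of 10.1.2.3 and mask 255.255.255.0, the classful
test calls 10.200.0.1 local while the subnet test does not.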

Regards,
--
	András Oláh		Andras.Olah@aristotel.eth.ericsson.se
	Traffic Lab (ETH/LT)	Tel: +36-1-2657100/x781
	Ericsson Kft.		Fax: +36-1-2627861
	P.O.B. 154, 1475 Budapest, Hungary

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 11:18:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA04372 for tcp-impl-list; Mon, 24 Mar 1997 11:14:43 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA04339 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 11:14:40 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA01808 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 11:14:38 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA02381; Mon, 24 Mar 1997 11:04:34 -0800 (PST)
Message-Id: <199703241904.LAA02381@daffy.ee.lbl.gov>
To: touch@ISI.EDU
Cc: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: PSH / "Failure to retain above-sequence data"
In-reply-to: Your message of Sat, 22 Mar 1997 23:16:07 PST.
Date: Mon, 24 Mar 1997 11:04:34 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> This is an issue that changes explicit indications in the 
> host requirements RFC. 
> 
> It appears to be outside the scope of this group, which 
> (I thought) was limited to errors in the implementations
> and deficiencies in specifications.

That's how I see it, too.  API changes are definitely out of scope.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 11:36:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA11839 for tcp-impl-list; Mon, 24 Mar 1997 11:33:50 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA11816 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 11:33:46 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA07456 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 11:33:44 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA02569; Mon, 24 Mar 1997 11:23:34 -0800 (PST)
Message-Id: <199703241923.LAA02569@daffy.ee.lbl.gov>
To: Richard.Fox@Eng.Sun.COM (Richard Fox)
Cc: narten@raleigh.ibm.com, tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH / "Failure to retain above-sequence data"
In-reply-to: Your message of Fri, 21 Mar 1997 12:02:42 PST.
Date: Mon, 24 Mar 1997 11:23:34 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I think there is still a train of thought that says a receiver which
> will not deliver data until a PSH bit set is broken. The above consensus
> does not reflect this. So I see 2 issues:
> 	1. issues in regards to senders setting this bit
> 	2. receivers sending data up to the app in the absence of seeing
> 		a PSH bit set.

While I agree that item (2) would make receivers more robust, I'm not
persuaded that it's something we should actively advocate, for two
reasons.  First, it's potentially not a trivial implementation change,
since it may involve introducing a new timer.  Second, so far it sounds
like the problem of the sender not setting PSH is not widespread, so this
would be changing receivers for compatibility with a fairly rare,
out-of-spec TCP.  As with the slow-start acking problem we discussed
earlier, this strikes me as not a compelling enough combination.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 11:59:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA18596 for tcp-impl-list; Mon, 24 Mar 1997 11:56:18 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA18590 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 11:56:16 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA13412 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 11:56:02 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm002-10.dialip.mich.net [141.211.7.146]) by merit.edu (8.8.5/merit-2.0) with SMTP id OAA04839 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 14:52:00 -0500 (EST)
Date: Mon, 24 Mar 97 19:40:44 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5710.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH / "Failure to retain above-sequence data"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: touch@ISI.EDU
> Date: Sat, 22 Mar 1997 23:16:07 -0800
> > From: "William Allen Simpson" <wsimpson@greendragon.com>
> >
> > I would also prefer that the API be _required_ to allow PSH to be set by
> > the application, although we don't document particular APIs.
>
> This is an issue that changes explicit indications in the
> host requirements RFC.
>
> It appears to be outside the scope of this group, which
> (I thought) was limited to errors in the implementations
> and deficiencies in specifications.
>
I expect that we've learned a _few_ things since 1989.

And yet, clearly documented stuff such as slow start is not properly
implemented, let alone "optional" stuff such as PSH.

Based on problems already mentioned here (and personal experience), not
having the PSH bit settable has led to the aforementioned hangs (errors
in implementation).

Therefore, the change is due to a deficiency in the RFC-1122
specification.  PSH support needs to be mandatory.


> Provided tcp-impl isn't trying to usurp the host-requirements
> RFC mechanism. :-)
>
What mechanism?  Is there an extant host-requirements WG?

Sure, everyone would agree that ARP and IP (and SMTP et alia) are not in
our purview.  But TCP requirements certainly are....

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 11:59:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA18503 for tcp-impl-list; Mon, 24 Mar 1997 11:55:54 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA18485 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 11:55:51 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA13364 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 11:55:49 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm002-10.dialip.mich.net [141.211.7.146]) by merit.edu (8.8.5/merit-2.0) with SMTP id OAA04836 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 14:51:57 -0500 (EST)
Date: Mon, 24 Mar 97 19:16:54 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5709.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I had written:
> > I would also prefer that the API be _required_ to allow PSH to be set by
> > the application, although we don't document particular APIs.
> >
> > It may be a serious problem with Mac Open Transport, which is just being
> > widely deployed with MacOS 7.6.  That's Mentat based.
> >

> Date: Sat, 22 Mar 1997 23:19:08 -0800
> From: jt@mentat.com (Jerry Toporek)
> What problem would that be?  The only one I am aware of is a bug which
> resulted in the PSH bit not getting set when the URG bit was on.  That has
> been fixed.
>
Great!  What version of OT?  OT 1.1.1 is shipping in MacOS 7.6, but I'm
still seeing the problem of improperly missing PSH in 1.1.2 (net update
distribution).

How do you set the PSH bit in OT?

According to Vinnie Moscaritolo:
#   Date: 12 Jun 96 00:59 GMT
#   The push flag feature of MacTCP is not functional in the OT MacTCP
#   implementation. Setting or resetting it does not ripple down to the OT TCP.
#   Further, there is no way to programmatically set the push flag (PSH bit in the
#   header).

We had to revert our Mac web servers to MacTCP, where PSH works.
Netscape hangs periodically when both ends of the connection are using
OT, but works better when the server is using MacTCP.

FYI:

I also see no slow start when in the same /24, even tho not on the same
subnet (physically divided /27).

I also see no slow start after idle.

I also see no Nagle algorithm.

I also see Ack-only packets immediately followed by data packets.

I also see about 10% retransmissions, when no packets were lost due to
congestion.  RTT algorithm needs work.

I expect that this group will document all these behaviours as
"non-conforming".

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 12:25:22 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA27266 for tcp-impl-list; Mon, 24 Mar 1997 12:22:56 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA26954 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 12:21:45 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA21230 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 12:21:42 -0800
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id PAA03567; Mon, 24 Mar 1997 15:15:30 -0500 (EST)
Message-Id: <199703242015.PAA03567@brookfield.ans.net>
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: backman@ftp.com, vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
Reply-To: curtis@ans.net
Subject: Re: draft description of "Failure to retain above-sequence data" 
In-reply-to: Your message of "Thu, 20 Mar 1997 22:41:17 GMT."
             <m0w7qWL-0005FcC@lightning.swansea.linux.org.uk> 
Date: Mon, 24 Mar 1997 15:15:28 -0500
From: Curtis Villamizar <curtis@ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <m0w7qWL-0005FcC@lightning.swansea.linux.org.uk>, Alan Cox writes:
> > Perhaps wording to the effect that in low memory cases where out
> > of sequence packet retention is limited, the window should be shrunk to
> > avoid dropping incoming packets.
> 
> Shrinking the window is frowned upon by 1122 however, and most stacks getting
> multiple window offers for the same sequence will assume the smaller ones are
> delayed out of order updates and bin them. 


I think the intent was to advertise a smaller window initially rather
than advertise a large one and shrink it as memory is used up by other
TCP flows.
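
A rough sketch of why the initial offer matters, assuming a receiver
that never pulls back the right edge of a window it has already
advertised.  The function and parameter names are hypothetical, and
sequence-number wraparound is ignored for simplicity.

```python
def next_advertised_window(free_buffer, rcv_nxt, rcv_adv, max_window=65535):
    """Window to put in the next ACK.

    free_buffer: bytes of receive buffer actually available now
    rcv_nxt:     next sequence number expected
    rcv_adv:     highest right edge already advertised to the peer
    """
    candidate = min(free_buffer, max_window)
    # What the peer has already been promised beyond rcv_nxt.
    still_offered = max(rcv_adv - rcv_nxt, 0)
    # Never retract a promise: the right edge may stay put, not move left.
    return max(candidate, still_offered)
```

If 4000 bytes were offered earlier and only 1000 are free now, the
receiver is stuck advertising 4000.  Had it offered 500 to begin with,
it could now advertise the 1000 it really has.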

Curtis


From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 12:25:17 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA27408 for tcp-impl-list; Mon, 24 Mar 1997 12:23:07 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA27330 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 12:23:01 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA21428 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 12:22:45 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA23354>; Mon, 24 Mar 1997 12:18:20 -0800
Date: Mon, 24 Mar 1997 12:18:16 -0800
Posted-Date: Mon, 24 Mar 1997 12:18:16 -0800
Message-Id: <199703242018.AA25681@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA25681>; Mon, 24 Mar 1997 12:18:16 -0800
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: PSH / "Failure to retain above-sequence data"
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > > From: "William Allen Simpson" <wsimpson@greendragon.com>
> > > I would also prefer that the API be _required_ to allow PSH to be set by
> > > the application, although we don't document particular APIs.

> > From: touch@ISI.EDU
> > This is an issue that changes explicit indications in the
> > host requirements RFC.

> From: "William Allen Simpson" <wsimpson@greendragon.com>
> Therefore, the change is due to a deficiency in the RFC-1122
> specification.  PSH support needs to be mandatory.

Certainly - but this isn't a deficiency in RFC1122,
it's a disagreement with it.

There are other channels by which to revise that RFC, 
and this group isn't one of them. We _could_ make a list of
things to reconsider, and use it to form a separate
WG, but detailed considerations of those changes here
are a diversion from the real work of this WG.

> Sure, everyone would agree that ARP and IP (and SMTP et alia) are not in
> our purview.  But TCP requirements certainly are....

I completely disagree - this is an --implementation-- WG, by
definition. Underspecification is an implementation issue,
re-specification is not.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 13:15:49 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA08509 for tcp-impl-list; Mon, 24 Mar 1997 13:13:26 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA08501 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 13:13:23 -0800
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA03413 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 13:13:21 -0800
Received: from rock.mentat.com ([192.88.122.136]) by mentat.com (4.1/SMI-4.1)
	id AA00643; Mon, 24 Mar 97 12:55:42 PST
Date: Mon, 24 Mar 97 12:55:42 PST
From: jt@mentat.com (Jerry Toporek)
Message-Id: <9703242055.AA00643@mentat.com>
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: PSH
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > What problem would that be?  The only one I am aware of is a bug which
> > resulted in the PSH bit not getting set when the URG bit was on.  That has
> > been fixed.
> >
> Great!  What version of OT?  OT 1.1.1 is shipping in MacOS 7.6, but I'm
> still seeing the problem of improperly missing PSH in 1.1.2 (net update
> distribution).

I don't directly control our customers' releases.  I would like to say that
the fix for the URG-PSH problem should have been in 1.1.2, but I don't know for sure.  I'll
try to find out.  Meanwhile, I get the feeling that you are complaining about
something more than not setting the PSH bit when the URG bit is on, and I need
to understand more specifically what else you are seeing.

> 
> How do you set the PSH bit in OT?
> 

In our implementation, on which OT TCP/IP is based, the PSH bit is turned on
whenever we are sending the last piece of data that TCP has seen.  If we
already have additional unsent data queued at TCP, the PSH bit is not set.
End of story.
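
As a minimal sketch of that rule (not our actual code; the function name
and the (length, psh) tuple layout are made up for illustration), PSH
goes only on the segment that empties what TCP currently has queued:

```python
def segments_with_psh(queued, mss):
    """Split `queued` unsent bytes into segments of at most `mss` bytes.

    Returns a list of (length, psh) pairs.  PSH is set only on the
    segment that carries the last byte TCP currently holds.
    """
    segments = []
    while queued > 0:
        length = min(queued, mss)
        queued -= length
        # More unsent data still queued behind this segment: no PSH.
        segments.append((length, queued == 0))
    return segments
```

For 2500 queued bytes and an MSS of 1000 this yields two full segments
without PSH, followed by a 500-byte segment with PSH set.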

> According to Vinnie Moscaritolo:
> #   Date: 12 Jun 96 00:59 GMT
> #   The push flag feature of MacTCP is not functional in the OT MacTCP
> #   implementation. Setting or resetting it does not ripple down to the OT TCP.
> #   Further, there is no way to programmatically set the push flag (PSH bit in the
> #   header).
> 
> We had to revert our Mac web servers to MacTCP, where PSH works.
> Netscape hangs periodically when both ends of the connection are using
> OT, but works better when the server is using MacTCP.

The API associated with MacTCP included a mechanism for specifying that the
PSH bit be set with a particular buffer.  As you know, RFC 1122 allows for,
but does not require, such a facility.  I would be very happy to see this
be explicitly discouraged.  The biggest problem with this capability is that
application writers seem to think that by manipulating the PSH bit they can
turn TCP into a low-rent messaging protocol.  This is an interoperability
nightmare.  What is the upside?  If the sending and receiving TCPs are doing
their job properly, then I don't see a good reason to allow the applications
to get into the act.  Additionally, we, as protocol implementors, are often
not in a position to reinvent existing APIs regardless of what is "required".

If your web servers don't respond properly given the simple PSH bit strategy
described above, then I would very much like to get some actual details of the
problem.

> 
> FYI:
> 
> I also see no slow start when in the same /24, even tho not on the same
> subnet (physically divided /27).

Huh?  If you are saying that we don't do slow-start when connected to a machine
on the same subnet, that is not true.  We do slow-start on every connection.
If you have evidence to the contrary, let's see it.

> 
> I also see no slow start after idle.

I understand the discussion on this point, and consider it fair game for
recommendations by this group, but I need to understand why you are using it
as criticism while this is not a requirement.

> 
> I also see no Nagle algorithm.

This is an almost slanderous misrepresentation.

> 
> I also see Ack-only packets immediately followed by data packets.

More of the same.  ACKs should be deferred only so long, and may occur just
before new data is sent.  If you have something to show me that looks like
a systematic problem, please send it along and I will be happy to look at it.

> 
> I also see about 10% retransmissions, when no packets were lost due to
> congestion.  RTT algorithm needs work.

OT 1.1.1 did include some serious RTT problems, which I take full
responsibility for, that were very noticeable on slow links.  These have been
resolved for some time, but did not make it into OT 1.1.2.  Apple has been
promising to get these fixes out, but you will have to talk to them about
when you will see them.

> 
> I expect that this group will document all these behaviours as
> "non-conforming".

I'm happy to discuss and examine any problems reported with our implementation.
I certainly hope that this group will document only complaints supported by
facts.

jt


From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 14:06:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA20413 for tcp-impl-list; Mon, 24 Mar 1997 14:03:51 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA20391 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 14:03:48 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA15318 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 14:03:45 -0800
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id QAA04244; Mon, 24 Mar 1997 16:56:50 -0500 (EST)
Message-Id: <199703242156.QAA04244@brookfield.ans.net>
To: jg@pa.dec.com (Jim Gettys)
cc: touch@ISI.EDU, F.Potorti@cnuce.cnr.it, tcp-impl@relay.engr.SGI.COM
Reply-To: curtis@ans.net
Subject: Re: TCP buffers 
In-reply-to: Your message of "Fri, 21 Mar 1997 10:54:04 PST."
             <9703211854.AA26962@pachyderm.pa.dec.com> 
Date: Mon, 24 Mar 1997 16:56:49 -0500
From: Curtis Villamizar <curtis@ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <9703211854.AA26962@pachyderm.pa.dec.com>, Jim Gettys writes:
> Yes, the control is indirect, at best, given the current interfaces
> to operating systems....
> 
> But application programmers do what they gotta do...  And latency
> control on data in flight is certainly a real issue for interactive
> network use...  I'd like to control both, ideally...
> 
> I suppose I should wander upstairs and beat up Dave Clark on the topic
> to raise it in the end-2-end group.
> 			- Jim


Jim,

The server end should not attempt to reduce the window.  The side
sitting behind a slow link should start out with a small send and
receive buffer size and in doing so reduce the amount of data in
flight.  Failing to do so, the client screws themself.  The router or
whatever, on the other side of the slow link can help the
misconfigured client by implementing RED and using a strategically timed
packet toss (when average queue size dictates) to limit the amount of
data in flight.  In the single bottleneck case the single packet toss
still results in no retransmit through the bottleneck.  As far as
impacting delay on other (interactive or real time) applications, some
form of fair queueing would help.  With FIFO and tail drop, the client
side can successfully screw themselves.  A proxy on the other side can
also help the misconfigured client.
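
A simplified sketch of the RED decision, assuming the usual
exponentially weighted average of queue length and a drop probability
that rises linearly between two thresholds.  The weight, the thresholds
and max_p below are illustrative values only, and the refinement that
spaces drops by packet count is left out.

```python
def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg + weight * queue_len

def drop_probability(avg, min_th, max_th, max_p=0.1):
    """Probability of tossing an arriving packet given the average queue."""
    if avg < min_th:
        return 0.0      # short average queue: never drop
    if avg >= max_th:
        return 1.0      # persistent long queue: drop everything
    # In between, ramp linearly from 0 up to max_p.
    return max_p * (avg - min_th) / (max_th - min_th)
```

Because the decision follows the average rather than the instantaneous
queue, a short burst passes untouched while a standing queue draws an
early toss.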

The server side application is probably better off not changing the
send buffer size at all after the open.

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 15:50:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA14747 for tcp-impl-list; Mon, 24 Mar 1997 15:47:56 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA14741 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 15:47:54 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA11004 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 15:47:49 -0800
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id SAA04944; Mon, 24 Mar 1997 18:42:22 -0500 (EST)
Message-Id: <199703242342.SAA04944@brookfield.ans.net>
To: backman@ftp.com
cc: curtis@ans.net, vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
Reply-To: curtis@ans.net
Subject: Re: draft description of "Failure to retain above-sequence data" 
In-reply-to: Your message of "Fri, 21 Mar 1997 20:53:44 EST."
             <199703220153.UAA18921@MAILSERV-2HIGH.FTP.COM> 
Date: Mon, 24 Mar 1997 18:42:22 -0500
From: Curtis Villamizar <curtis@ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199703220153.UAA18921@MAILSERV-2HIGH.FTP.COM>, Larry Backman writes
:
> 
> ||
> ||These machines often had 2-4 MB of memory even in the old days but
> ||needed to stuff everything into 640KB for purely stupid reasons.
> ||
> ||It doesn't matter why these implementations are broken, the fact
> ||remains that they are broken and their performance will be poor.  If
> ||this leads to the conclusion that it is not possible to implement TCP
> ||in a way that TCP will not perform poorly on some hardware platform
> ||with too little memory or software platform that can't allocate memory
> ||for some reason, then it must simply be accepted that these
> ||implementations cannot conform to the RFC and will perform poorly.
> ||
> I'll do my best not to resent this and take it personally.
> 
> Can we refrain from emotionally charged words like "broken" and "stupid"
> and try and work with each other to clarify things and improve the TCP space?
> 
> All protocol implementations have corners into which they don't want to
> go for a variety of platform specific reasons.  The mark in my
> mind, of a good protocol implementation is how said implementation goes
> into those corners and behaves under adverse circumstances in those
> corners.  My intent in exposing my dirty laundry was to share hard
> won knowledge we at FTP found in one of those corners and to seek
> advice, which I got, in how to smooth and improve our behavior
> in those corners.
> 
> Please feel free to suggest how I can improve my product, please don't
> tell me my product is broken for doing the best balancing act it
> could under the circumstances for which it was aimed.
> 
> L.


What is considered the right thing for a TCP implementation to do
should not be constrained or in any way influenced by what is best to
accommodate a software platform design decision based on backward
compatibility to the Intel 8088.  Today about the only thing with that
little addressable memory will be a small appliance which is hardly a
candidate for high speed TCP applications.  Any software platform
design in which the memory is physically available but inaccessible
(ie: 640KB limits) has serious shortcomings.  I apologize for using
such emotionally charged words to describe it.

Your point is taken.

My point is that very small amounts of available buffer space must
imply very low end-to-end performance.  The practice of advertising
buffering that you don't have should be strongly discouraged.  
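
A back-of-the-envelope sketch of the first point: a sender can have at
most one window of data outstanding per round trip, so the window
divided by the RTT bounds throughput.  The function name and the sample
numbers are only illustrative.

```python
def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput in bits/second for a given window and RTT."""
    return window_bytes * 8 * 1000 / rtt_ms
```

With 4096 bytes of buffer and a 100 ms round trip the ceiling is
327,680 bits/second, however fast the path is.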

Curtis

btw- I meant to imply that DOS was broken, not someone's DOS
TCP stack, if that's any consolation.

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 16:15:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA21534 for tcp-impl-list; Mon, 24 Mar 1997 16:12:47 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA21486 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 16:12:38 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id QAA16489 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 16:12:30 -0800
Received: from ftp.com by ftp.com  ; Mon, 24 Mar 1997 19:08:38 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Mon, 24 Mar 1997 19:08:38 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id TAA08790; Mon, 24 Mar 1997 19:05:49 -0500
Date: Mon, 24 Mar 1997 19:05:49 -0500
Message-Id: <199703250005.TAA08790@MAILSERV-2HIGH.FTP.COM>
To: curtis@ans.net
Subject: Re: draft description of "Failure to retain above-sequence data" 
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: curtis@ans.net, vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Mon Mar 24 19:05:47 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||Your point is taken.
||
||My point is that very small amounts of available buffer space must
||imply very low end-to-end performance.  The practice of advertising
||buffering that you don't have should be strongly discouraged.  
||
||Curtis
||
||btw- I meant to imply that DOS was broken, not someone's DOS
||TCP stack, if that's any consolation.

Peace on the word war, but I still feel that there's a point to perhaps
be beat into the ground.  Low end, high end machine doesn't matter -
a TCP stack in such a machine operates in a competitive environment against
a complex set of marketing points which can cause the TCP stack's behavior
to be less optimal than other behavior.

In our days of yore, we lost more sales due to:
	* memory footprint
	* pure optimal LAN test performance
than we did due to TCP's lack of congestion control.  I can perhaps
dig through dozens of cases where we had to argue with customers that
alternative DOS stacks with no slow start and no congestion window,
while faster over the LAN, were doomed to lose in the real world.  Or
that small-footprint stacks that whizzed through a PC Week performance
test didn't necessarily behave well in a test of 4 concurrent apps across
the Internet.  And we also got beaten up because a sophisticated user could
tune our stack to run fast, but the out-of-the-box configuration was not
optimally tuned for LAN performance.

As the NC's and Wintel NC's start to come out we will revisit those
days once again.  "Buying decisions" are made far too often on somewhat
meaningless performance benchmarks.

But back to your point; had I to do it over again, you're absolutely
right, I would not have advertised a window greater than the amount of
available buffer space and would have had people implement the window
shrinking algorithm we talked about but never did.
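
A minimal sketch of that rule, assuming a receiver that tracks its total
buffer size and the bytes still queued for the application.  The function
and parameter names are hypothetical, not taken from any of the stacks
discussed here:

```python
def advertised_window(buffer_size: int, bytes_queued: int) -> int:
    """Return a receive window no larger than the buffer space actually free.

    buffer_size and bytes_queued are hypothetical bookkeeping values for
    this sketch.  A real stack would also take care not to pull back the
    right edge of a window it has already advertised.
    """
    free = max(buffer_size - bytes_queued, 0)
    # The TCP window field is 16 bits wide (no window scaling assumed).
    return min(free, 0xFFFF)
```

With a 4096-byte buffer holding 1024 unread bytes, this would advertise
3072 bytes rather than the full 4096.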


L.



From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 17:37:14 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA12334 for tcp-impl-list; Mon, 24 Mar 1997 17:34:33 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA12310 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 17:34:31 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA03676 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 17:34:28 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm047-14.dialip.mich.net [141.211.6.56]) by merit.edu (8.8.5/merit-2.0) with SMTP id UAA10094 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 20:30:39 -0500 (EST)
Date: Tue, 25 Mar 97 01:01:06 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5715.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Date: Mon, 24 Mar 97 12:55:42 PST
> From: jt@mentat.com (Jerry Toporek)
> Huh?  If you are saying that we don't do slow-start when connected to a machine
> on the same subnet, that is not true.  We do slow-start on every connection.
> If you have evidence to the contrary, let's see it.
>
Very well, in the next message.  It will take a bit of time to compose,
as the traces are sitting on my laptop.  I just happened to make them a
couple of weeks ago, when trying to figure out why the throughput was so
bad after I upgraded to MacOS 7.6.


> I understand the discussion on this point, and consider it fair game for
> recommendations by this group, but I need to understand why you are using it
> as criticism while this is not a requirement.
>
Because it causes link congestion, and massive retransmission.  In your
stack, as in others.

I'm getting pretty tired of the plaint "this is not a requirement."

Once upon a time, requirements were the "minimal set".  You seem to
interpret them as the "complete set".

"Do the right thing" is the only real requirement!  Do we have to spell
out every niggling detail?


> > I also see no Nagle algorithm.
>
> This is an almost slanderous misrepresentation.
>
A misrepresentation of what "I see"?

You can get my postal address from any IETF Proceedings, or whois was4,
and I look forward to receiving your lawyers' communication.

How wonderful -- we were worried that this list would be populated by PR
management flacks instead of technical folks; and sure enough, we have
Mentat.


> > I also see Ack-only packets immediately followed by data packets.
>
> More of the same.  ACKs should be deferred only so long, and may occur just
> before new data is sent.  If you have something to show me that looks like
> a systematic problem, please send it along and I will be happy to look at it.
>
I'll send the list a short excerpt.  I'd send you personally the entire
2.5 megabyte run, but I'm not at all certain I'd like a potential
litigator to read even one day of my email.


> OT 1.1.1 did include some serious RTT problems, which I take full
> responsibility for, that were very noticeable on slow links.  These have been
> resolved for some time, but did not make it into OT 1.1.2.  Apple has been
> promising to get these fixes out, but you will have to talk to them about
> when you will see them.
>
The first time I've seen 10 Mbps ethernet characterized as a slow link.

Maybe when I have time, I'll run a slow (28.8 Kbps) PPP trace to compare.

Apple has cancelled further OT development.  And shipments of OT 1.1.1
are running to the millions of units.

Meanwhile, I will note that these "requirements" are long standing, and
OT was "tested" for several years.  And doesn't a major workstation
vendor ship a Mentat stack, too?

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 17:57:29 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA16121 for tcp-impl-list; Mon, 24 Mar 1997 17:54:08 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA16107 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 17:54:01 -0800
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA07480 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 17:53:58 -0800
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id UAA27626;
	Mon, 24 Mar 1997 20:50:01 -0500 (EST)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id UAA00323; Mon, 24 Mar 1997 20:49:29 -0500
Date: Mon, 24 Mar 1997 20:49:29 -0500
Message-Id: <199703250149.UAA00323@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: wsimpson@greendragon.com
CC: tcp-impl@relay.engr.SGI.COM
In-reply-to: <5715.wsimpson@greendragon.com>
Subject: Re: PSH
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Tue, 25 Mar 97 01:01:06 GMT
   From: "William Allen Simpson" <wsimpson@greendragon.com>

   And doesn't a major workstation vendor ship a Mentat stack, too?

From what I've heard, Sun's code base has been so heavily hacked since
the Mentat code originally went in that it is mostly their own code at
this point.

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 18:54:14 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA27169 for tcp-impl-list; Mon, 24 Mar 1997 18:50:24 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA27160 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 18:50:21 -0800
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id SAA17897 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 18:50:17 -0800
Received: from rock.mentat.com ([192.88.122.136]) by mentat.com (4.1/SMI-4.1)
	id AA02656; Mon, 24 Mar 97 18:46:25 PST
Date: Mon, 24 Mar 97 18:46:25 PST
From: jt@mentat.com (Jerry Toporek)
Message-Id: <9703250246.AA02656@mentat.com>
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: PSH
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Bill:

As someone who has frequently tried to convince people that you have useful
things to say, I find much of your response personally offensive, but not
in the least surprising...

Just a few comments...

- I'm a bit disturbed at your reaction to "not a requirement".  I thought I made
it quite clear that I fully expect that part of the work product of this group
will be to try and clarify items that are not currently written down but which
ought to be "requirements".  It seems that there are two classes of items
discussed here.  Some proposals are rejected out of hand because they are not
part of the requirements, and other things are believed to be required even
though they are not written down.  Isn't straightening this out a major part
of what we are trying to do?

- Apple has announced that development of OT has been discontinued.  They will
have to speak for themselves, but my understanding is that OT maintenance has
not been discontinued, and you should continue to expect that problems will
be fixed.

- A simple question on the original topic:  The PSH bit...  Assuming a non-
broken sender TCP and a non-broken receiver TCP, then what good can be done
by an application API to set the PSH bit?  Do I understand correctly that you
believe that such an API is required in order to get around problems in
broken implementations, or is there something more that you are trying to
accomplish?

jt


From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 22:46:53 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA27997 for tcp-impl-list; Mon, 24 Mar 1997 22:40:33 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA27993 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 22:40:31 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id WAA20513 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 22:40:29 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id WAA04596; Mon, 24 Mar 1997 22:30:33 -0800 (PST)
Message-Id: <199703250630.WAA04596@daffy.ee.lbl.gov>
To: jt@mentat.com (Jerry Toporek)
Cc: tcp-impl@relay.engr.SGI.COM
Subject: WG Scope [was Re: PSH]
In-reply-to: Your message of Mon, 24 Mar 1997 18:46:25 PST.
Date: Mon, 24 Mar 1997 22:30:33 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> It seems that there are two classes of items
> discussed here.  Some proposals are rejected out of hand because they are not
> part of the requirements, and other things are believed to be required even
> though they are not written down.  Isn't straightening this out a major part
> of what we are trying to do?

Yes.  Along with cataloging implementation problems and diagnostics,
we are chartered to identify ambiguities in the TCP spec that lead to
implementation problems.  We are not chartered to add new things to the
spec, though we might be able to do so in limited cases, where it is clear
that the addition does not constitute a research issue.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 22:51:18 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA28883 for tcp-impl-list; Mon, 24 Mar 1997 22:50:07 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA28871 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 22:50:04 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id WAA21768 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 22:49:58 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm012-28.dialip.mich.net [141.211.7.196]) by merit.edu (8.8.5/merit-2.0) with SMTP id BAA12701 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 01:45:52 -0500 (EST)
Date: Tue, 25 Mar 97 04:12:34 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <2255.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: OT 1.1.2 trace
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Here's the trace I made, showing a number of problems.  The network
(all 10Base2) looks like:

    .21 -------------- .18/.65 --------------- .78
    PowerMac 7100-66   486-20                  486-16
    MacOS 7.6          NetBlazer               KA9Q
    OT 1.1.2                                   (my version)

Recording done at KA9Q ethernet interface.


*** We start with an ARP exchange, since everything has expired:

Fri Feb 28 22:00:24 1997 - e0 sent:
Ether: len 42 00:80:c7:5b:e8:a8->ff:ff:ff:ff:ff:ff type ARP
ARP: len 28 hwtype 10 Mb Ethernet prot IP op REQUEST
sender IPaddr 206.31.151.78 hwaddr 00:80:c7:5b:e8:a8
target IPaddr 206.31.151.65 hwaddr 00:00:00:00:00:00
0000  ff ff ff ff ff ff 00 80 c7 5b e8 a8 08 06 00 01  ........G[h(....
0010  08 00 06 04 00 01 00 80 c7 5b e8 a8 ce 1f 97 4e  ........G[h(N..N
0020  00 00 00 00 00 00 ce 1f 97 41                    ......N..A

Fri Feb 28 22:00:24 1997 - e0 recv:
Ether: len 60 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type ARP
ARP: len 46 hwtype 10 Mb Ethernet prot IP op REPLY
sender IPaddr 206.31.151.65 hwaddr 00:00:c0:74:36:20
target IPaddr 206.31.151.78 hwaddr 00:80:c7:5b:e8:a8
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 06 00 01  ..G[h(..@t6 ....
0010  08 00 06 04 00 02 00 00 c0 74 36 20 ce 1f 97 41  ........@t6 N..A
0020  00 80 c7 5b e8 a8 ce 1f 97 4e 00 09 82 00 01 80  ..G[h(N..N......
0030  00 09 82 00 0c 80 00 0f 82 75 6e 74              .........unt


*** Syn exchange:

Fri Feb 28 22:00:24 1997 - e0 sent:
Ether: len 58 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 44 206.31.151.78->206.31.151.21 ihl 20 ttl 58 prot TCP
TCP: 1024->110 Seq xcae7000 SYN Wnd 5840 MSS 1460
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 2c 00 00 00 00 3a 06 b6 29 ce 1f 97 4e ce 1f  .,....:.6)N..NN.
0020  97 15 04 00 00 6e 0c ae 70 00 00 00 00 00 60 02  .....n..p.....`.
0030  16 d0 35 97 00 00 02 04 05 b4                    .P5......4

Fri Feb 28 22:00:24 1997 - e0 recv:
Ether: len 60 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 44 206.31.151.21->206.31.151.78 ihl 20 ttl 253 DF prot TCP
TCP: 110->1024 Seq x8c84b400 Ack xcae7001 ACK SYN Wnd 17520 MSS 1460
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 2c b7 5c 40 00 fd 06 fb cb ce 1f 97 15 ce 1f  .,7\@.}.{KN...N.
0020  97 4e 00 6e 04 00 8c 84 b4 00 0c ae 70 01 60 12  .N.n....4...p.`.
0030  44 70 c7 60 00 00 02 04 05 b4 0a 73              DpG`.....4.s

Fri Feb 28 22:00:24 1997 - e0 sent:
Ether: len 54 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 40 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae7001 Ack x8c84b401 ACK Wnd 5840
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 28 00 01 00 00 3b 06 b5 2c ce 1f 97 4e ce 1f  .(....;.5,N..NN.
0020  97 15 04 00 00 6e 0c ae 70 01 8c 84 b4 01 50 10  .....n..p...4.P.
0030  16 d0 0c be 00 00                                .P.>..


*** POP3 server:

Fri Feb 28 22:00:24 1997 - e0 recv:
Ether: len 154 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 139 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84b401 Ack xcae7001 ACK PSH Wnd 17520 Data 99
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 8b b8 5d 40 00 fe 06 f9 6b ce 1f 97 15 ce 1f  ..8]@.~.ykN...N.
0020  97 4e 00 6e 04 00 8c 84 b4 01 0c ae 70 01 50 18  .N.n....4...p.P.
0030  44 70 f9 21 00 00 2b 4f 4b 20 67 72 65 65 6e 64  Dpy!..+OK greend
0040  72 61 67 6f 6e 2e 63 6f 6d 20 72 75 6e 6e 69 6e  ragon.com runnin
0050  67 20 41 70 70 6c 65 20 49 6e 74 65 72 6e 65 74  g Apple Internet
0060  20 4d 61 69 6c 20 53 65 72 76 65 72 20 31 2e 31   Mail Server 1.1
0070  2e 31 20 3c 31 33 35 34 39 37 36 36 37 35 2d 32  .1 <1354976675-2
0080  34 38 33 33 32 32 40 67 72 65 65 6e 64 72 61 67  483322@greendrag
0090  6f 6e 2e 63 6f 6d 3e 0d 0a 00                    on.com>...

Fri Feb 28 22:00:24 1997 - e0 sent:
Ether: len 69 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 55 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae7001 Ack x8c84b464 ACK PSH Wnd 5840 Data 15
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 37 00 02 00 00 3b 06 b5 1c ce 1f 97 4e ce 1f  .7....;.5.N..NN.
0020  97 15 04 00 00 6e 0c ae 70 01 8c 84 b4 64 50 18  .....n..p...4dP.
0030  16 d0 84 d0 00 00 55 53 45 52 20 77 73 69 6d 70  .P.P..USER wsimp
0040  73 6f 6e 0d 0a                                   son..

Fri Feb 28 22:00:24 1997 - e0 recv:
Ether: len 60 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 40 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84b464 Ack xcae7010 ACK Wnd 17520
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 28 b8 5e 40 00 fe 06 f9 cd ce 1f 97 15 ce 1f  .(8^@.~.yMN...N.
0020  97 4e 00 6e 04 00 8c 84 b4 64 0c ae 70 10 50 10  .N.n....4d..p.P.
0030  44 70 de ab 00 00 2b 4f 4b 20 67 72              Dp^+..+OK gr


*** Note that the Ack (above) with no data was immediately followed by
    data (below).  Must not be using delayed Ack, or delay too short.
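
*** For comparison, a rough sketch of the delayed Ack behaviour that
    RFC 1122 describes: hold the Ack for under 0.5 seconds, Ack at least
    every second full-sized segment, and piggyback on outgoing data when
    possible.  The class and function names are hypothetical, and this
    is not OT's code.

```python
class AckState:
    """Hypothetical per-connection bookkeeping for this sketch."""

    def __init__(self):
        self.ack_pending = False   # an Ack is owed to the peer
        self.full_segments = 0     # full-sized segments not yet acked


def on_segment_received(state, is_full_sized):
    """Return True if an Ack-only packet should be sent immediately."""
    state.ack_pending = True
    if is_full_sized:
        state.full_segments += 1
    if state.full_segments >= 2:
        # Ack at least every second full-sized segment.
        state.ack_pending = False
        state.full_segments = 0
        return True
    # Otherwise hold the Ack until data goes out or the delay timer
    # (which must be under 0.5 seconds) expires.
    return False


def on_data_sent(state):
    """Outgoing data carries the Ack field; return True if one was owed."""
    piggybacked = state.ack_pending
    state.ack_pending = False
    state.full_segments = 0
    return piggybacked
```

    Under this logic the Ack for a short command would ride on the
    server's reply, provided the reply is ready before the timer fires.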

*** BTW, the tail of the minimum ethernet packet wasn't zeroed, leaving
    data from the previously sent buffer.  Security problem rather than
    TCP, but shows lack of attention to detail.
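
*** A minimal sketch of the usual fix, assuming the driver builds the
    frame in memory before handing it to the hardware: pad short frames
    to the 60-byte Ethernet minimum (excluding the FCS) with explicit
    zeros.  The function name is hypothetical.

```python
ETH_MIN_FRAME = 60  # minimum Ethernet frame length, not counting the 4-byte FCS


def pad_frame(frame: bytes) -> bytes:
    """Pad a short frame to the Ethernet minimum using zero bytes only."""
    if len(frame) >= ETH_MIN_FRAME:
        return frame
    # bytes(n) yields n zero bytes, so no stale buffer contents leak out.
    return frame + bytes(ETH_MIN_FRAME - len(frame))
```

    A 54-byte Ack-only frame like the one above would then carry six
    zero bytes instead of "+OK gr".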

Fri Feb 28 22:00:24 1997 - e0 recv:
Ether: len 70 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 56 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84b464 Ack xcae7010 ACK PSH Wnd 17520 Data 16
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 38 b8 5f 40 00 fe 06 f9 bc ce 1f 97 15 ce 1f  .88_@.~.y<N...N.
0020  97 4e 00 6e 04 00 8c 84 b4 64 0c ae 70 10 50 18  .N.n....4d..p.P.
0030  44 70 79 eb 00 00 2b 4f 4b 20 75 73 65 72 20 6b  Dpyk..+OK user k
0040  6e 6f 77 6e 0d 0a                                nown..

Fri Feb 28 22:00:24 1997 - e0 sent:
Ether: len 68 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 54 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae7010 Ack x8c84b474 ACK PSH Wnd 5840 Data 14
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 36 00 03 00 00 3b 06 b5 1c ce 1f 97 4e ce 1f  .6....;.5.N..NN.
0020  97 15 04 00 00 6e 0c ae 70 10 8c 84 b4 74 50 18  .....n..p...4tP.
0030  16 d0 25 41 00 00 50 41 53 53 20 ** ** ** ** **  .P%A..PASS *****
(password deleted)

Fri Feb 28 22:00:24 1997 - e0 recv:
Ether: len 60 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 40 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84b474 Ack xcae701e ACK Wnd 17520
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 28 b8 60 40 00 fe 06 f9 cb ce 1f 97 15 ce 1f  .(8`@.~.yKN...N.
0020  97 4e 00 6e 04 00 8c 84 b4 74 0c ae 70 1e 50 10  .N.n....4t..p.P.
0030  44 70 de 8d 00 00 2b 4f 4b 20 75 73              Dp^...+OK us

*** Note that the Ack (above) with no data was immediately followed by
    data (below).  Must not be using delayed Ack, or delay too short.

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 70 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 55 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84b474 Ack xcae701e ACK PSH Wnd 17520 Data 15
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 37 b8 61 40 00 fe 06 f9 bb ce 1f 97 15 ce 1f  .78a@.~.y;N...N.
0020  97 4e 00 6e 04 00 8c 84 b4 74 0c ae 70 1e 50 18  .N.n....4t..p.P.
0030  44 70 96 55 00 00 2b 4f 4b 20 6c 6f 67 67 65 64  Dp.U..+OK logged
0040  20 69 6e 0d 0a 2e                                 in...

Fri Feb 28 22:00:25 1997 - e0 sent:
Ether: len 60 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 46 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae701e Ack x8c84b483 ACK PSH Wnd 5840 Data 6
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 2e 00 04 00 00 3b 06 b5 23 ce 1f 97 4e ce 1f  ......;.5#N..NN.
0020  97 15 04 00 00 6e 0c ae 70 1e 8c 84 b4 83 50 18  .....n..p...4.P.
0030  16 d0 6a 5e 00 00 53 54 41 54 0d 0a              .Pj^..STAT..

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 60 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 40 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84b483 Ack xcae7024 ACK Wnd 17520
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 28 b8 62 40 00 fe 06 f9 c9 ce 1f 97 15 ce 1f  .(8b@.~.yIN...N.
0020  97 4e 00 6e 04 00 8c 84 b4 83 0c ae 70 24 50 10  .N.n....4...p$P.
0030  44 70 de 78 00 00 2b 4f 4b 20 6c 6f              Dp^x..+OK lo


*** Note that the Ack (above) with no data was immediately followed by
    data (below).  Must not be using delayed Ack, or delay too short.

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 70 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 55 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84b483 Ack xcae7024 ACK PSH Wnd 17520 Data 15
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 37 b8 63 40 00 fe 06 f9 b9 ce 1f 97 15 ce 1f  .78c@.~.y9N...N.
0020  97 4e 00 6e 04 00 8c 84 b4 83 0c ae 70 24 50 18  .N.n....4...p$P.
0030  44 70 6a 0e 00 00 2b 4f 4b 20 39 34 20 34 30 39  Dpj...+OK 94 409
0040  34 35 36 0d 0a 0a                                456...

Fri Feb 28 22:00:25 1997 - e0 sent:
Ether: len 62 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 48 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae7024 Ack x8c84b492 ACK PSH Wnd 5840 Data 8
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 30 00 05 00 00 3b 06 b5 20 ce 1f 97 4e ce 1f  .0....;.5 N..NN.
0020  97 15 04 00 00 6e 0c ae 70 24 8c 84 b4 92 50 18  .....n..p$..4.P.
0030  16 d0 38 27 00 00 52 45 54 52 20 31 0d 0a        .P8'..RETR 1..

*** data packet #1

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 86 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 71 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84b492 Ack xcae702c ACK PSH Wnd 17520 Data 31
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 47 b8 64 40 00 fe 06 f9 a8 ce 1f 97 15 ce 1f  .G8d@.~.y(N...N.
0020  97 4e 00 6e 04 00 8c 84 b4 92 0c ae 70 2c 50 18  .N.n....4...p,P.
0030  44 70 9a a7 00 00 2b 4f 4b 20 36 38 31 33 20 62  Dp.'..+OK 6813 b
0040  79 74 65 20 6d 65 73 73 61 67 65 20 66 6f 6c 6c  yte message foll
0050  6f 77 73 0d 0a 6c                                ows..l


*** No separate Ack here, but we see a PSH (above) on a short data
    packet, inexplicably followed by a full data packet (below).
    That PSH proves that the buffer was idle.  AIMS (Apple Internet
    Mail Server) must be calling OT separately, but there is no Nagle
    algorithm!

*** BTW, spurious byte is added to odd length ethernet packets by
    intervening NetBlazer.

*** data packet #2

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 1514 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 1500 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84b4b1 Ack xcae702c ACK Wnd 17520 Data 1460
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  05 dc b8 65 40 00 fe 06 f4 12 ce 1f 97 15 ce 1f  .\8e@.~.t.N...N.
0020  97 4e 00 6e 04 00 8c 84 b4 b1 0c ae 70 2c 50 10  .N.n....41..p,P.
0030  44 70 f7 88 00 00 52 65 63 65 69 76 65 64 3a 20  Dpw...Received:
(long data deleted)


*** At this point, there should be a pause awaiting my Ack during slow
    start.  Instead, it has clocked up cwnd to the full snd.wnd.
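
*** A rough sketch of the expected slow start arithmetic, assuming an
    initial cwnd of one segment and growth of one segment per Ack of new
    data.  The function names are hypothetical; MSS and the send window
    are the values seen in the SYN exchange of this trace.

```python
MSS = 1460        # segment size from the SYN exchange in this trace
SND_WND = 5840    # window advertised by the KA9Q receiver


def usable_window(cwnd, snd_wnd, in_flight):
    """Bytes the sender may still transmit without a new Ack."""
    return max(min(cwnd, snd_wnd) - in_flight, 0)


def on_new_ack(cwnd):
    """In slow start, each Ack of new data opens cwnd by one segment."""
    return cwnd + MSS


cwnd = MSS                                  # assumed initial window: 1 segment
blocked = usable_window(cwnd, SND_WND, in_flight=MSS)
cwnd = on_new_ack(cwnd)                     # first Ack arrives
after_ack = usable_window(cwnd, SND_WND, in_flight=0)
print(blocked, after_ack)                   # 0 2920
```

    With one full segment outstanding and no Ack, nothing more may be
    sent; only after the first Ack does the window open to two segments.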

*** data packet #3

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 354 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 340 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84ba65 Ack xcae702c ACK PSH Wnd 17520 Data 300
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  01 54 b8 66 40 00 fe 06 f8 99 ce 1f 97 15 ce 1f  .T8f@.~.x.N...N.
0020  97 4e 00 6e 04 00 8c 84 ba 65 0c ae 70 2c 50 18  .N.n....:e..p,P.
0030  44 70 4c 1f 00 00 69 64 20 57 41 41 30 37 33 35  DpL...id WAA0735
(long data deleted)
0140  6d 3e 0d 0a 54 6f 3a 20 77 73 69 6d 70 73 6f 6e  m>..To: wsimpson
0150  40 67 72 65 65 6e 64 72 61 67 6f 6e 2e 63 6f 6d  @greendragon.com
0160  0d 0a                                            ..


*** Something odd has happened here.  The previous short packet has a PSH,
    and is immediately followed by another full-sized packet.  The data
    shows this is the blank message line between the SMTP headers and
    body.  That PSH proves that the buffer was idle.  AIMS must call OT
    separately, but no Nagle algorithm!
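
*** For reference, a simplified sketch of the Nagle rule (RFC 896, as
    restated in RFC 1122).  It ignores window limits, and the function
    and parameter names are hypothetical.

```python
def nagle_allows_send(queued: int, mss: int, unacked: int) -> bool:
    """Decide whether queued data may be sent now under the Nagle rule.

    queued  - bytes waiting in the send buffer
    mss     - maximum segment size for the connection
    unacked - bytes already sent but not yet acknowledged
    """
    if queued >= mss:
        return True        # a full-sized segment may always go out
    return unacked == 0    # a short segment only when nothing is outstanding
```

    In this form, a 300-byte segment would be held back while the
    31 + 1460 = 1491 bytes of packets #1 and #2 were still
    unacknowledged.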

*** data packet #4

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 1514 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 1500 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84bb91 Ack xcae702c ACK Wnd 17520 Data 1460
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  05 dc b8 67 40 00 fe 06 f4 10 ce 1f 97 15 ce 1f  .\8g@.~.t.N...N.
0020  97 4e 00 6e 04 00 8c 84 bb 91 0c ae 70 2c 50 10  .N.n....;...p,P.
0030  44 70 da c1 00 00 0d 0a 46 72 6f 6d 20 4d 41 49  DpZA....From MAI
(long data deleted)

*** data packet #5

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 1514 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 1500 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84c145 Ack xcae702c ACK Wnd 17520 Data 1460
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  05 dc b8 68 40 00 fe 06 f4 0f ce 1f 97 15 ce 1f  .\8h@.~.t.N...N.
0020  97 4e 00 6e 04 00 8c 84 c1 45 0c ae 70 2c 50 10  .N.n....AE..p,P.
0030  44 70 e5 19 00 00 09 69 64 20 41 41 31 33 30 38  Dpe....id AA1308
(long data deleted)


*** Not having received any Acks yet, OT retransmits the first PSH'd
    short packet #1.  Its RTT estimate, and hence its retransmit
    timeout, is much too short.
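
*** A sketch of the standard Jacobson estimator for the retransmit
    timeout, for comparison.  The class name, the initialisation of the
    variance to half the first sample, and the 1 second floor are
    assumptions of this sketch, not OT's actual values.

```python
class RttEstimator:
    """Smoothed RTT and mean deviation, giving RTO = srtt + 4 * rttvar."""

    def __init__(self, min_rto=1.0):
        self.srtt = None          # smoothed round-trip time, seconds
        self.rttvar = None        # smoothed mean deviation, seconds
        self.min_rto = min_rto    # assumed lower bound on the timeout

    def sample(self, rtt):
        """Fold in one RTT measurement and return the new timeout."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2          # assumed initialisation
        else:
            err = rtt - self.srtt
            self.srtt += err / 8           # gain of 1/8 on the mean
            self.rttvar += (abs(err) - self.rttvar) / 4   # gain of 1/4
        return self.rto()

    def rto(self):
        return max(self.srtt + 4 * self.rttvar, self.min_rto)
```

    With a floor like this, a LAN round trip of a few milliseconds
    could not by itself produce a timeout short enough to fire before
    the first Ack has had a chance to arrive.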

*** The retransmitted message parts are not recombined, wasting
    bandwidth and packet processing time.

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 86 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 71 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84b492 Ack xcae702c ACK PSH Wnd 17520 Data 31
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 47 b8 69 40 00 fe 06 f9 a3 ce 1f 97 15 ce 1f  .G8i@.~.y#N...N.
0020  97 4e 00 6e 04 00 8c 84 b4 92 0c ae 70 2c 50 18  .N.n....4...p,P.
0030  44 70 9a a7 00 00 2b 4f 4b 20 36 38 31 33 20 62  Dp.'..+OK 6813 b
0040  79 74 65 20 6d 65 73 73 61 67 65 20 66 6f 6c 6c  yte message foll
0050  6f 77 73 0d 0a 37                                ows..7


*** Finally, the flood subsides, and we get a chance to send our
    Ack for #1 and #2 packets:

Fri Feb 28 22:00:25 1997 - e0 sent:
Ether: len 54 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 40 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae702c Ack x8c84ba65 ACK Wnd 5840
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 28 00 06 00 00 3b 06 b5 27 ce 1f 97 4e ce 1f  .(....;.5'N..NN.
0020  97 15 04 00 00 6e 0c ae 70 2c 8c 84 ba 65 50 10  .....n..p,..:eP.
0030  16 d0 06 2f 00 00                                .P./..
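
*** The Ack value above can be checked against the sequence numbers
    and lengths of data packets #1 and #2.  A small sketch, with a
    hypothetical helper name:

```python
def next_seq(seq: int, length: int) -> int:
    """Sequence number following a segment, modulo 2**32."""
    return (seq + length) & 0xFFFFFFFF


seq_1 = 0x8C84B492              # data packet #1, 31 bytes
seq_2 = next_seq(seq_1, 31)     # start of data packet #2
ack = next_seq(seq_2, 1460)     # first byte not yet received
print(hex(ack))                 # 0x8c84ba65, the Ack value sent above
```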


*** And, our Ack of #3 and #4 packets:

Fri Feb 28 22:00:25 1997 - e0 sent:
Ether: len 54 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 40 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae702c Ack x8c84c145 ACK Wnd 5840
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 28 00 07 00 00 3b 06 b5 26 ce 1f 97 4e ce 1f  .(....;.5&N..NN.
0020  97 15 04 00 00 6e 0c ae 70 2c 8c 84 c1 45 50 10  .....n..p,..AEP.
0030  16 d0 ff 4e 00 00                                .P.N..


*** And now, the Ack of #5 (caused by the retransmission):

Fri Feb 28 22:00:25 1997 - e0 sent:
Ether: len 54 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 40 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae702c Ack x8c84c6f9 ACK Wnd 5840
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 28 00 08 00 00 3b 06 b5 25 ce 1f 97 4e ce 1f  .(....;.5%N..NN.
0020  97 15 04 00 00 6e 0c ae 70 2c 8c 84 c6 f9 50 10  .....n..p,..FyP.
0030  16 d0 f9 9a 00 00                                .Py...


*** Retransmission of #3, probably in response to our 1st Ack.

*** The retransmitted message parts are not recombined, wasting
    bandwidth and packet processing time.

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 354 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 340 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84ba65 Ack xcae702c ACK PSH Wnd 17520 Data 300
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  01 54 b8 6a 40 00 fe 06 f8 95 ce 1f 97 15 ce 1f  .T8j@.~.x.N...N.
0020  97 4e 00 6e 04 00 8c 84 ba 65 0c ae 70 2c 50 18  .N.n....:e..p,P.
0030  44 70 4c 1f 00 00 69 64 20 57 41 41 30 37 33 35  DpL...id WAA0735
(long data deleted)
0140  6d 3e 0d 0a 54 6f 3a 20 77 73 69 6d 70 73 6f 6e  m>..To: wsimpson
0150  40 67 72 65 65 6e 64 72 61 67 6f 6e 2e 63 6f 6d  @greendragon.com
0160  0d 0a                                            ..


*** Retransmission of #5, probably in response to our 2nd Ack.

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 1514 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 1500 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84c145 Ack xcae702c ACK PSH Wnd 17520 Data 1460
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  05 dc b8 6b 40 00 fe 06 f4 0c ce 1f 97 15 ce 1f  .\8k@.~.t.N...N.
0020  97 4e 00 6e 04 00 8c 84 c1 45 0c ae 70 2c 50 18  .N.n....AE..p,P.
0030  44 70 e5 11 00 00 09 69 64 20 41 41 31 33 30 38  Dpe....id AA1308
(long data deleted)


*** cwnd must be opened to 2 MSS, but there isn't enough send window
    left, so OT sends a partial MSS.  Silly window!

*** data packet #6

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 1230 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 1216 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84c6f9 Ack xcae702c ACK Wnd 17520 Data 1176
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  04 c0 b8 6c 40 00 fe 06 f5 27 ce 1f 97 15 ce 1f  .@8l@.~.u'N...N.
0020  97 4e 00 6e 04 00 8c 84 c6 f9 0c ae 70 2c 50 10  .N.n....Fy..p,P.
0030  44 70 f7 54 00 00 2e 36 34 2f 54 65 6e 6f 6e 2d  DpwT...64/Tenon-
(long data deleted)


*** Finally, having received our 3rd Ack, the silly window opens again,
    and sends the final partial MSS of data.

*** data packet #7

Fri Feb 28 22:00:25 1997 - e0 recv:
Ether: len 1014 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 1000 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84cb91 Ack xcae702c ACK PSH Wnd 17520 Data 960
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  03 e8 b8 6d 40 00 fe 06 f5 fe ce 1f 97 15 ce 1f  .h8m@.~.u~N...N.
0020  97 4e 00 6e 04 00 8c 84 cb 91 0c ae 70 2c 50 18  .N.n....K...p,P.
0030  44 70 b8 64 00 00 41 74 74 72 69 62 75 74 65 20  Dp8d..Attribute
(long data deleted)


*** duplicate Ack, caused by retransmission of #3.

Fri Feb 28 22:00:26 1997 - e0 sent:
Ether: len 54 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 40 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae702c Ack x8c84c6f9 ACK Wnd 5840
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 28 00 09 00 00 3b 06 b5 24 ce 1f 97 4e ce 1f  .(....;.5$N..NN.
0020  97 15 04 00 00 6e 0c ae 70 2c 8c 84 c6 f9 50 10  .....n..p,..FyP.
0030  16 d0 f9 9a 00 00                                .Py...


*** duplicate Ack, caused by retransmission of #5.

Fri Feb 28 22:00:26 1997 - e0 sent:
Ether: len 54 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 40 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae702c Ack x8c84c6f9 ACK Wnd 5840
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 28 00 0a 00 00 3b 06 b5 23 ce 1f 97 4e ce 1f  .(....;.5#N..NN.
0020  97 15 04 00 00 6e 0c ae 70 2c 8c 84 c6 f9 50 10  .....n..p,..FyP.
0030  16 d0 f9 9a 00 00                                .Py...


*** Ack of #7, combined with new command sequence.

Fri Feb 28 22:00:26 1997 - e0 sent:
Ether: len 62 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 48 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae702c Ack x8c84cf51 ACK PSH Wnd 5840 Data 8
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 30 00 0b 00 00 3b 06 b5 1a ce 1f 97 4e ce 1f  .0....;.5.N..NN.
0020  97 15 04 00 00 6e 0c ae 70 2c 8c 84 cf 51 50 18  .....n..p,..OQP.
0030  16 d0 33 6d 00 00 44 45 4c 45 20 31 0d 0a        .P3m..DELE 1..

Fri Feb 28 22:00:26 1997 - e0 recv:
Ether: len 84 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 69 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84cf51 Ack xcae7034 ACK PSH Wnd 17520 Data 29
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 45 b8 6e 40 00 fe 06 f9 a0 ce 1f 97 15 ce 1f  .E8n@.~.y N...N.
0020  97 4e 00 6e 04 00 8c 84 cf 51 0c ae 70 34 50 18  .N.n....OQ..p4P.
0030  44 70 89 0f 00 00 2b 4f 4b 20 6d 65 73 73 61 67  Dp....+OK messag
0040  65 20 77 69 6c 6c 20 62 65 20 64 65 6c 65 74 65  e will be delete
0050  64 0d 0a 20                                      d..

Fri Feb 28 22:00:26 1997 - e0 sent:
Ether: len 62 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 48 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae7034 Ack x8c84cf6e ACK PSH Wnd 5840 Data 8
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 30 00 0c 00 00 3b 06 b5 19 ce 1f 97 4e ce 1f  .0....;.5.N..NN.
0020  97 15 04 00 00 6e 0c ae 70 34 8c 84 cf 6e 50 18  .....n..p4..OnP.
0030  16 d0 1d 3a 00 00 52 45 54 52 20 32 0d 0a        .P.:..RETR 2..


*** And we get the same treatment again....

*** data packet #1

Fri Feb 28 22:00:26 1997 - e0 recv:
Ether: len 86 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 71 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84cf6e Ack xcae703c ACK PSH Wnd 17520 Data 31
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 47 b8 6f 40 00 fe 06 f9 9d ce 1f 97 15 ce 1f  .G8o@.~.y.N...N.
0020  97 4e 00 6e 04 00 8c 84 cf 6e 0c ae 70 3c 50 18  .N.n....On..p<P.
0030  44 70 81 bb 00 00 2b 4f 4b 20 32 32 33 39 20 62  Dp.;..+OK 2239 b
0040  79 74 65 20 6d 65 73 73 61 67 65 20 66 6f 6c 6c  yte message foll
0050  6f 77 73 0d 0a 3b                                ows..;

*** data packet #2, no Nagle

Fri Feb 28 22:00:26 1997 - e0 recv:
Ether: len 1444 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 1430 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84cf8d Ack xcae703c ACK Wnd 17520 Data 1390
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  05 96 b8 70 40 00 fe 06 f4 4d ce 1f 97 15 ce 1f  ..8p@.~.tMN...N.
0020  97 4e 00 6e 04 00 8c 84 cf 8d 0c ae 70 3c 50 10  .N.n....O...p<P.
0030  44 70 6d 79 00 00 52 65 63 65 69 76 65 64 3a 20  Dpmy..Received:
(long data deleted)
05a0  6c 6b 0d 0a                                      lk..

*** This is even odder than before.  At line break between message
    headers and body (above), short MSS; but unlike before, no PSH.

*** data packet #3, short MSS, silly window (below)

Fri Feb 28 22:00:26 1997 - e0 recv:
Ether: len 904 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 889 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84d4fb Ack xcae703c ACK PSH Wnd 17520 Data 849
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  03 79 b8 71 40 00 fe 06 f6 69 ce 1f 97 15 ce 1f  .y8q@.~.viN...N.
0020  97 4e 00 6e 04 00 8c 84 d4 fb 0c ae 70 3c 50 18  .N.n....T{..p<P.
0030  44 70 4b 7c 00 00 0d 0a 4f 6e 20 54 68 75 2c 20  DpK|....On Thu,
(long data deleted)
0380  63 6f 6d 0d 0a 0d 0a 65                          com....e

*** Ack of #1 and #2

Fri Feb 28 22:00:26 1997 - e0 sent:
Ether: len 54 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 40 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae703c Ack x8c84d4fb ACK Wnd 5840
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 28 00 0d 00 00 3b 06 b5 20 ce 1f 97 4e ce 1f  .(....;.5 N..NN.
0020  97 15 04 00 00 6e 0c ae 70 3c 8c 84 d4 fb 50 10  .....n..p<..T{P.
0030  16 d0 eb 88 00 00                                .Pk...

*** retransmission of #3

Fri Feb 28 22:00:26 1997 - e0 recv:
Ether: len 904 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 889 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84d4fb Ack xcae703c ACK PSH Wnd 17520 Data 849
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  03 79 b8 72 40 00 fe 06 f6 68 ce 1f 97 15 ce 1f  .y8r@.~.vhN...N.
0020  97 4e 00 6e 04 00 8c 84 d4 fb 0c ae 70 3c 50 18  .N.n....T{..p<P.
0030  44 70 4b 7c 00 00 0d 0a 4f 6e 20 54 68 75 2c 20  DpK|....On Thu,
(long data deleted)
0380  63 6f 6d 0d 0a 0d 0a 38                          com....8

*** Ack of #3 (caused by the retransmission)

Fri Feb 28 22:00:26 1997 - e0 sent:
Ether: len 54 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 40 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae703c Ack x8c84d84c ACK Wnd 5840
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 28 00 0e 00 00 3b 06 b5 1f ce 1f 97 4e ce 1f  .(....;.5.N..NN.
0020  97 15 04 00 00 6e 0c ae 70 3c 8c 84 d8 4c 50 10  .....n..p<..XLP.
0030  16 d0 e8 37 00 00                                .Ph7..

*** data packet #4, end silly window.  My favorite, as the silly window
    had only 3 more bytes to finish, when 591 was left in the #3 MSS!

Fri Feb 28 22:00:26 1997 - e0 recv:
Ether: len 60 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 43 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84d84c Ack xcae703c ACK PSH Wnd 17520 Data 3
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 2b b8 73 40 00 fe 06 f9 b5 ce 1f 97 15 ce 1f  .+8s@.~.y5N...N.
0020  97 4e 00 6e 04 00 8c 84 d8 4c 0c ae 70 3c 50 18  .N.n....XL..p<P.
0030  44 70 82 7f 00 00 2e 0d 0a 6e 20 54              Dp.......n T

*** Ack plus new command sequence

Fri Feb 28 22:00:26 1997 - e0 sent:
Ether: len 62 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 48 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae703c Ack x8c84d84f ACK PSH Wnd 5840 Data 8
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 30 00 0f 00 00 3b 06 b5 16 ce 1f 97 4e ce 1f  .0....;.5.N..NN.
0020  97 15 04 00 00 6e 0c ae 70 3c 8c 84 d8 4f 50 18  .....n..p<..XOP.
0030  16 d0 2a 5e 00 00 44 45 4c 45 20 32 0d 0a        .P*^..DELE 2..

Fri Feb 28 22:00:26 1997 - e0 recv:
Ether: len 84 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
IP: len 69 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
TCP: 110->1024 Seq x8c84d84f Ack xcae7044 ACK PSH Wnd 17520 Data 29
0000  00 80 c7 5b e8 a8 00 00 c0 74 36 20 08 00 45 00  ..G[h(..@t6 ..E.
0010  00 45 b8 74 40 00 fe 06 f9 9a ce 1f 97 15 ce 1f  .E8t@.~.y.N...N.
0020  97 4e 00 6e 04 00 8c 84 d8 4f 0c ae 70 44 50 18  .N.n....XO..pDP.
0030  44 70 80 01 00 00 2b 4f 4b 20 6d 65 73 73 61 67  Dp....+OK messag
0040  65 20 77 69 6c 6c 20 62 65 20 64 65 6c 65 74 65  e will be delete
0050  64 0d 0a 65                                      d..e

Fri Feb 28 22:00:26 1997 - e0 sent:
Ether: len 62 00:80:c7:5b:e8:a8->00:00:c0:74:36:20 type IP
IP: len 48 206.31.151.78->206.31.151.21 ihl 20 ttl 59 prot TCP
TCP: 1024->110 Seq xcae7044 Ack x8c84d86c ACK PSH Wnd 5840 Data 8
0000  00 00 c0 74 36 20 00 80 c7 5b e8 a8 08 00 45 00  ..@t6 ..G[h(..E.
0010  00 30 00 10 00 00 3b 06 b5 15 ce 1f 97 4e ce 1f  .0....;.5.N..NN.
0020  97 15 04 00 00 6e 0c ae 70 44 8c 84 d8 6c 50 18  .....n..pD..XlP.
0030  16 d0 14 2b 00 00 52 45 54 52 20 33 0d 0a        .P.+..RETR 3..

*** etc.
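
The sequence numbers in the second retrieval can be checked with a
little arithmetic.  What follows is a minimal sketch in Python, not part
of the original trace tooling; the names SEGMENTS, next_seq,
is_contiguous and final_ack are made up for illustration, and the values
are copied from the TCP lines above.

```python
# (sequence number, data length) for the second RETR, copied from the trace.
SEGMENTS = [
    (0x8C84CF6E, 31),    # data packet #1: "+OK 2239 byte message follows"
    (0x8C84CF8D, 1390),  # data packet #2
    (0x8C84D4FB, 849),   # data packet #3
    (0x8C84D84C, 3),     # data packet #4
]


def next_seq(seq, length):
    """Sequence number expected after this segment (32-bit wraparound)."""
    return (seq + length) & 0xFFFFFFFF


def is_contiguous(segments):
    """True when every segment starts exactly where the previous one ended."""
    pairs = zip(segments, segments[1:])
    return all(next_seq(seq, n) == following for (seq, n), (following, _) in pairs)


def final_ack(segments):
    """Ack value that acknowledges everything in the list."""
    seq, length = segments[-1]
    return next_seq(seq, length)
```

The four segments are contiguous and end at x8c84d84f, which is the Ack
value in the segment carrying DELE 2.  The last three carry
1390 + 849 + 3 = 2242 bytes: the announced 2239-byte message plus the
3-byte terminator seen in data packet #4.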

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 23:13:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA00848 for tcp-impl-list; Mon, 24 Mar 1997 23:09:51 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA00840 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 23:09:48 -0800
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id XAA24319 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 23:09:44 -0800
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id CAA05766;
	Tue, 25 Mar 1997 02:05:57 -0500 (EST)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id CAA00843; Tue, 25 Mar 1997 02:05:27 -0500
Date: Tue, 25 Mar 1997 02:05:27 -0500
Message-Id: <199703250705.CAA00843@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: wsimpson@greendragon.com
CC: tcp-impl@relay.engr.SGI.COM
In-reply-to: <2255.wsimpson@greendragon.com>
Subject: Re: OT 1.1.2 trace
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Tue, 25 Mar 97 04:12:34 GMT
   From: "William Allen Simpson" <wsimpson@greendragon.com>

   Must not be using delayed Ack, or delay too short.

And none of us are going to be able to figure out easily which one it
is unless the time stamps in your traces have a higher resolution than
one second...

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 24 23:31:59 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA03467 for tcp-impl-list; Mon, 24 Mar 1997 23:26:18 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA03461 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 23:26:15 -0800
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id XAA26361 for <tcp-impl@relay.engr.SGI.COM>; Mon, 24 Mar 1997 23:26:12 -0800
Received: from feller.mentat.com ([192.88.122.144]) by mentat.com (4.1/SMI-4.1)
	id AA03677; Mon, 24 Mar 97 23:22:14 PST
Received: by feller.mentat.com (SMI-8.6/SMI-SVR4)
	id XAA19533; Mon, 24 Mar 1997 23:22:46 -0800
Date: Mon, 24 Mar 1997 23:22:46 -0800
From: jt@mentat.com (Jerry Toporek)
Message-Id: <199703250722.XAA19533@feller.mentat.com>
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: OT 1.1.2 trace
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Bill:

> Fri Feb 28 22:00:24 1997 - e0 recv:
> Ether: len 60 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
> IP: len 40 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
> TCP: 110->1024 Seq x8c84b464 Ack xcae7010 ACK Wnd 17520
...
> 
> *** Note that the Ack (above) with no data was immediately followed by
>     data (below).  Must not be using delayed Ack, or delay too short.
> 
> Fri Feb 28 22:00:24 1997 - e0 recv:
> Ether: len 70 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
> IP: len 56 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
> TCP: 110->1024 Seq x8c84b464 Ack xcae7010 ACK PSH Wnd 17520 Data 16

If I understand your timestamp correctly, the data followed the ACK in the
same day, date, and year.  I'm going to go out on a limb and further guess
that 22:00:24 is hr:mm:ss, and that you are concluding that because the data
follows the ACK within a second that this is indicative of insufficient
delayed ACK.  I have to say that this is the first time a stack based on our
implementation has been accused of *not* delaying ACKs.

...
> 
> Fri Feb 28 22:00:25 1997 - e0 recv:
> Ether: len 86 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
> IP: len 71 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
> TCP: 110->1024 Seq x8c84b492 Ack xcae702c ACK PSH Wnd 17520 Data 31
...
> 
> 
> *** No separate Ack here, but we see a PSH (above) on a short data
>     packet, inexplicably followed by a full data packet (below).
>     That PSH proves that the buffer was idle.  AIMS must call OT
>     separately, but no Nagle algorithm!
> 
> Fri Feb 28 22:00:25 1997 - e0 recv:
> Ether: len 1514 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
> IP: len 1500 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
> TCP: 110->1024 Seq x8c84b4b1 Ack xcae702c ACK Wnd 17520 Data 1460

The first segment is short, but the PSH bit indicates that it is all we
have seen.  There is no un-ACKed data, so out it goes.  The second segment
is a full MSS.  You seem to have misinterpreted the Nagle algorithm.  Please
go read it again.

This stuff is worse than I expected.  I'm going to bed.

jt


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 06:56:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA17085 for tcp-impl-list; Tue, 25 Mar 1997 06:54:08 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA17066 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 06:54:05 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id GAA00430 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 06:54:04 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm012-24.dialip.mich.net [141.211.7.192]) by merit.edu (8.8.5/merit-2.0) with SMTP id JAA16960 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 09:49:35 -0500 (EST)
Date: Tue, 25 Mar 97 13:51:52 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5718.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: jt@mentat.com (Jerry Toporek)
> As someone who has frequently tried to convince people that you have useful
> things to say, I find much of your response personally offensive, but not
> the least surprising...
>
To my memory, we have never met.  Your personal attack affected me in a
like manner.  If you want to express umbrage and bandy legalisms, make
sure it's backed up by your lawyer.


> - A simple question on the original topic:  The PSH bit...  Assuming a non-
> broken sender TCP and a non-broken receiver TCP, then what good can be done
> by an application API to set the PSH bit?  Do I understand correctly that you
> believe that such an API is required in order to get around problems in
> broken implementations, or is there something more that you are trying to
> accomplish?
>
Looking at the trace I sent, you will note that message 1 packet #3 is
shorter than MSS and has the PSH set, while message 2 packet #2 is
shorter than MSS and does _not_ have the PSH set.

In both cases, this occurs at the CRLF CRLF between the SMTP headers and
the body.  Presumably, the application makes multiple calls to the stack.

In either case, had the API allowed the application to set PSH, the PSH
could be set at the end of the complete message following the body:
 1) full-size MSS packets could be sent;
 2) network performance would be better, with fewer packets;
 3) receiver performance would be better, with fewer task switches.

This is a documented case where the lack of a PSH (whether or not the
sender TCP implementation is also faulty) caused additional network
packet load and concomitant additional retransmissions.


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 06:56:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA17087 for tcp-impl-list; Tue, 25 Mar 1997 06:54:09 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA17078 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 06:54:07 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id GAA00440 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 06:54:05 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm012-24.dialip.mich.net [141.211.7.192]) by merit.edu (8.8.5/merit-2.0) with SMTP id JAA16963 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 09:49:37 -0500 (EST)
Date: Tue, 25 Mar 97 14:21:53 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5719.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: OT 1.1.2 trace
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: "David S. Miller" <davem@jenolan.rutgers.edu>
>    From: "William Allen Simpson" <wsimpson@greendragon.com>
>
>    Must not be using delayed Ack, or delay too short.
>
> And none of us are going to be able to figure out easily which one it
> is unless the time stamps in your traces have a higher resolution than
> one second...
>
The platform won't give better resolution than 55 milliseconds.  Not
stellar.  I'll try to hack up the code to give that much for future
traces, but I doubt that it would resolve this question....

Anyway, where alternative explanations for the behaviour are possible,
in fairness I tried to identify them all.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 08:15:34 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA29613 for tcp-impl-list; Tue, 25 Mar 1997 08:13:07 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA29594 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 08:13:04 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA16896 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 08:13:03 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm012-07.dialip.mich.net [141.211.7.175]) by merit.edu (8.8.5/merit-2.0) with SMTP id LAA19107 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 11:09:11 -0500 (EST)
Date: Tue, 25 Mar 97 14:57:35 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5720.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: OT 1.1.2 trace -- delayed Ack
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

The purpose of this implementation detail is already covered verbosely
in RFC-813 (July 1982):

    "Measurement of TCP implementations, especially on large operating
    systems, indicate that most of the overhead of dealing with a
    segment is not in the processing at the TCP or IP level, but simply
    in the scheduling of the handler which is required to deal with the
    segment.  A steady dribble of acknowledgements causes a high
    overhead in scheduling, with very little to show for it.  This waste
    is to be avoided if possible.

    "... the receiver of data need not, and for efficiency reasons
    should not, acknowledge the data unless either the acknowledgement
    is intended to produce an increased useable window, is necessary in
    order to prevent retransmission or is being sent as part of a
    reverse direction segment being sent for some other reason."

More terse explanations (with cross references) are given in RFC-1122.

                                ----

> From: jt@mentat.com (Jerry Toporek)
> > Fri Feb 28 22:00:24 1997 - e0 recv:
> > Ether: len 60 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
> > IP: len 40 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
> > TCP: 110->1024 Seq x8c84b464 Ack xcae7010 ACK Wnd 17520
> ...
> >
> > *** Note that the Ack (above) with no data was immediately followed by
> >     data (below).  Must not be using delayed Ack, or delay too short.
> >
> > Fri Feb 28 22:00:24 1997 - e0 recv:
> > Ether: len 70 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
> > IP: len 56 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
> > TCP: 110->1024 Seq x8c84b464 Ack xcae7010 ACK PSH Wnd 17520 Data 16
>
> you are concluding that because the data
> follows the ACK within a second that this is indicative of insufficient
> delayed ACK.  I have to say that this is the first time a stack based on our
> implementation has been accused of *not* delaying ACKs.
>
My text concluded that, since your Ack is immediately followed by your
_own_ application data (no matter how long or short the time), that
either you are not using delayed Ack _or_ the delay is "too short".

In this case, you now indicate that you have implemented delayed Ack.
Therefore, my evaluation holds that the delay _is_ "too short".

Do you have another explanation?

Can you provide a trace with a better time resolution?

Does the time resolution matter in the face of clear evidence?

                                ----

It's been a long time since I looked at Clark's RFC-813, but it pretty
clearly says:

   "This low roundtrip situation can be covered very simply by including
   a minimum value below which the roundtrip estimate is not permitted
   to drop."

Clark recommended (in 1982) 200 to 300 milliseconds.  Remember, the key
expectation is that the time be long enough "that the timer, although
set, is seldom used."

RFC-1122 4.2.3.2 sets an _upper_ bound of 500 milliseconds.  One of the
recommendations we as a WG could make is an expected _lower_ bound,
which has to be long enough to reflect both inter-packet arrival time up
to 1/2 RTT _and_ anticipated processing time in the application.

I use 100 milliseconds, which is guaranteed to be substantially longer
than the PC tick of 55 milliseconds.

What lower bound do you use?

Is it guaranteed to be longer than a Mac tick?

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 09:15:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA12006 for tcp-impl-list; Tue, 25 Mar 1997 09:10:57 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA11995 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 09:10:51 -0800
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA00781 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 09:10:45 -0800
Received: from feller.mentat.com ([192.88.122.144]) by mentat.com (4.1/SMI-4.1)
	id AA06195; Tue, 25 Mar 97 09:06:52 PST
Received: by feller.mentat.com (SMI-8.6/SMI-SVR4)
	id JAA20095; Tue, 25 Mar 1997 09:07:23 -0800
Date: Tue, 25 Mar 1997 09:07:23 -0800
From: jt@mentat.com (Jerry Toporek)
Message-Id: <199703251707.JAA20095@feller.mentat.com>
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: OT 1.1.2 trace -- delayed Ack
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Bill:

The easiest way to discover whether or not an implementation uses delayed
ACKs is to exchange one byte messages between applications.  If you see
two packets per exchange, then ACKs are being deferred.  If you see four
packets per exchange, then either ACKs are not being deferred, or possibly
the deferred ACK interval is so small that the applications can not turn
around a single byte in the interval.  Had you tried this, you would have seen
that we do indeed defer ACKs.  When you look at a single data packet that
follows an ACK, all you have "discovered" is that the application got the data
out too late to piggyback on the ACK.  This can obviously happen regardless
of the deferred ACK interval.
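
The one-byte test described above can be sketched as a small classifier.
This is only an illustration in Python: it assumes the packets of one
steady-state request/reply exchange have already been captured as
(sender, data_length) pairs, and the name classify_exchange is
hypothetical.

```python
def classify_exchange(packets):
    """Classify one request/reply exchange of one-byte messages.

    packets: list of (sender, data_length) tuples seen on the wire for a
    single exchange; a data_length of 0 means a pure ACK.
    """
    data_packets = sum(1 for _, length in packets if length > 0)
    pure_acks = sum(1 for _, length in packets if length == 0)

    if data_packets == 2 and pure_acks == 0:
        # Two packets per exchange: each ACK rode along with the data.
        return "deferred"
    if data_packets == 2 and pure_acks == 2:
        # Four packets per exchange: every byte drew a separate ACK.
        return "not deferred, or interval too short"
    return "inconclusive"
```

For example, classify_exchange([("A", 1), ("B", 1)]) reports "deferred",
while a single data packet following a single pure ACK falls into
neither case and is reported as "inconclusive".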

The default deferred ACK interval in our implementation is 50ms.  The value is
tunable by vendors, and, in most cases, by the end user.  Where 50ms is less
than a tick (which is not the case in any of our production versions), you
are correct that the default should be larger.  In the case of the Mac, I
believe that a tick is 10ms, but I know that it is substantially smaller than
50ms.  Apple does not provide the ability for tuning by end-users in OT 1.1.1.

I'm more than happy to listen to comments from the group on what the default
deferred ACK interval ought to be.

jt


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 09:34:20 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA16167 for tcp-impl-list; Tue, 25 Mar 1997 09:31:39 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA16152 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 09:31:35 -0800
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA06067 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 09:31:34 -0800
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id MAA23521; Tue, 25 Mar 1997 12:25:56 -0500 (EST)
Message-Id: <199703251725.MAA23521@grinch.eecs.umich.edu>
To: "William Allen Simpson" <wsimpson@greendragon.com>
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: OT 1.1.2 trace -- delayed Ack
References: <5720.wsimpson@greendragon.com>
From: Scott Dawson <sdawson@eecs.umich.edu>
In-Reply-To: "William Allen Simpson"'s message of Tue, 25 Mar 97 14:57:35 GMT
Lines: 83
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 25 Mar 1997 12:25:55 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> The purpose of this implementation detail is already covered verbosely
> in RFC-813 (July 1982):
> 
>     "Measurement of TCP implementations, especially on large operating
>     systems, indicate that most of the overhead of dealing with a
>     segment is not in the processing at the TCP or IP level, but simply
>     in the scheduling of the handler which is required to deal with the
>     segment.  A steady dribble of acknowledgements causes a high
>     overhead in scheduling, with very little to show for it.  This waste
>     is to be avoided if possible.
> 
>     "... the receiver of data need not, and for efficiency reasons
>     should not, acknowledge the data unless either the acknowledgement
>     is intended to produce an increased useable window, is necessary in
>     order to prevent retransmission or is being sent as part of a
>     reverse direction segment being sent for some other reason."
> 
> More terse explanations (with cross references) are given in RFC-1122.

Here is something from RFC-1122 which talks about actual times:

      4.2.3.2  When to Send an ACK Segment

      A host that is receiving a stream of TCP data segments can
      increase efficiency in both the Internet and the hosts by
      sending fewer than one ACK segment per data segment received;
      this is known as a "delayed ACK" [TCP:5].

      A TCP SHOULD implement a delayed ACK, but an ACK should not be
      excessively delayed; in particular, the delay MUST be less than
      .5 seconds, and in a stream of full-sized segments there SHOULD
      be an ACK for at least every second segment.
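
The rule quoted above can be sketched as a receiver-side decision
function.  This is a minimal Python illustration, not any vendor's
implementation; should_ack_now and its parameters are hypothetical
names, and the piggyback case follows the RFC-813 text quoted earlier.

```python
MAX_ACK_DELAY_MS = 500  # RFC-1122 4.2.3.2: the delay MUST be less than .5 seconds


def should_ack_now(unacked_full_segments, ms_since_first_unacked,
                   ack_delay_ms, has_reverse_data=False):
    """Decide whether the receiver should send an ACK right now."""
    if ack_delay_ms >= MAX_ACK_DELAY_MS:
        raise ValueError("delayed-ACK interval must be less than 500 ms")
    if unacked_full_segments <= 0:
        return False  # nothing new to acknowledge
    if has_reverse_data:
        return True   # the ACK rides on a segment that is going out anyway
    if unacked_full_segments >= 2:
        return True   # an ACK for at least every second full-sized segment
    # Otherwise wait until the delayed-ACK timer has run out.
    return ms_since_first_unacked >= ack_delay_ms
```

With a 50 ms interval, should_ack_now(1, 10, 50) holds the ACK back and
should_ack_now(2, 0, 50) releases it immediately.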

> > you are concluding that because the data follows the ACK within a
> > second that this is indicative of insufficient delayed ACK.  I
> > have to say that this is the first time a stack based on our
> > implementation has been accused of *not* delaying ACKs.
> >
>
> My text concluded that, since your Ack is immediately followed by your
> _own_ application data (no matter how long or short the time), that
> either you are not using delayed Ack _or_ the delay is "too short".

Using the 1 second granularity given in your trace, how do you decide
how much time passed before the ACK was sent?  Since the upper bound
on the delay (as given in RFC-1122) is .5 seconds, if the
implementation is using delayed ACKs, the delay will have to be .5 sec
or less.  But since your trace granularity is 1 second, you will get a
reading of either 0 or 1 second, one of which is too short, one of
which is too long.  This could lead you to believe that either:

   1.  The implementation doesn't do delayed ACKs, or
   2.  The implementation uses delayed ACKs, but violates the spec
       which states that the delay should be .5 sec or less.

However, you can't be sure of either of these without a finer
granularity trace.
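
To make the granularity point concrete, here is a small Python sketch.
It assumes the trace tool truncates its clock to whole seconds (it might
round instead), and stamp, observed_gap and possible_readings are
made-up names.

```python
def stamp(ms):
    """Truncate a millisecond clock reading to whole seconds."""
    return ms // 1000


def observed_gap(first_ms, second_ms):
    """Gap between two events as a one-second clock would report it."""
    return stamp(second_ms) - stamp(first_ms)


def possible_readings(true_delay_ms):
    """Every gap the one-second clock can report for one true delay.

    The first event is tried at each millisecond offset within a second.
    """
    return {observed_gap(start, start + true_delay_ms) for start in range(1000)}
```

Under that assumption, a true delay of 50 ms and a true delay of 499 ms
both yield the same set of possible readings, {0, 1}, so the reading
alone cannot distinguish them.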

> In this case, you now indicate that you have implemented delayed Ack.
> Therefore, my evaluation holds that the delay _is_ "too short".

This is probably not true.  Again, you need a better granularity trace
in order to tell.

> Do you have another explanation?

The ACK is indeed delayed, but since it was delayed by less than 1
second, you read the delay as 0.

> Can you provide a trace with a better time resolution?

Can you?  It seems like you are the one saying that there's a problem.
The burden of proof ought to be on you to try to produce evidence of
the problem.  It's not fair for you to say, "X's implementation is
broken unless someone from X proves that it's not."

> Does the time resolution matter in the face of clear evidence?

It sure does.  The evidence is by no means clear without better time
resolution on the trace data.

-Scott

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 09:38:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA17356 for tcp-impl-list; Tue, 25 Mar 1997 09:36:11 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA17313 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 09:36:09 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA07025 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 09:36:04 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm035-24.dialip.mich.net [141.211.7.35]) by merit.edu (8.8.5/merit-2.0) with SMTP id MAA20835 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 12:32:10 -0500 (EST)
Date: Tue, 25 Mar 97 16:19:59 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5721.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: OT 1.1.2 trace -- Nagle
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

The use of this implementation detail is already covered verbosely
in RFC-896 (January 1984):

    "The solution is to inhibit the sending of new TCP segments when
    new outgoing data arrives from the user if any previously
    transmitted data on the connection remains unacknowledged.  This
    inhibition is to be unconditional; no timers, tests for size of
    data received, or other conditions are required."

And in RFC-1122:

     4.2.3.4  When to Send Data

        A TCP MUST include a SWS avoidance algorithm in the sender.

        A TCP SHOULD implement the Nagle Algorithm [TCP:9] to
        coalesce short segments.  However, there MUST be a way for
        an application to disable the Nagle algorithm on an
        individual connection.  In all cases, sending data is also
        subject to the limitation imposed by the Slow Start
        algorithm (Section 4.2.2.15).

            The Nagle algorithm is generally as follows:

                If there is unacknowledged data (i.e., SND.NXT >
                SND.UNA), then the sending TCP buffers all user
                data (regardless of the PSH bit), until the
                outstanding data has been acknowledged or until
                the TCP can send a full-sized segment (Eff.snd.MSS
                bytes; see Section 4.2.2.6).
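
As an illustration only, the rule quoted above can be written as a small
send/hold decision.  This is a sketch and not any vendor's code: the
function name and parameters are made up for the example, sequence-number
wraparound is ignored, and the send window and Slow Start limits that
RFC-1122 also imposes are left out.

```python
def nagle_may_send(snd_nxt: int, snd_una: int, pending_bytes: int,
                   eff_snd_mss: int, nodelay: bool = False) -> bool:
    """Decide whether buffered user data may go out now.

    Hypothetical helper following the RFC-1122 wording quoted above.
    Assumes sequence numbers do not wrap and ignores window limits.
    """
    if pending_bytes <= 0:
        return False          # nothing to send
    if nodelay:
        return True           # application disabled Nagle on this connection
    if pending_bytes >= eff_snd_mss:
        return True           # a full-sized segment can always be sent
    # Short segment: allowed only when no data is outstanding
    # (SND.NXT == SND.UNA).  The PSH bit plays no part in the decision.
    return snd_nxt == snd_una
```

For example, with 31 bytes outstanding and 300 bytes waiting,
nagle_may_send(snd_nxt=131, snd_una=100, pending_bytes=300,
eff_snd_mss=1460) holds the data (returns False) until the ACK arrives.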

                                ----

> From: jt@mentat.com (Jerry Toporek)
> > Fri Feb 28 22:00:25 1997 - e0 recv:
> > Ether: len 86 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
> > IP: len 71 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
> > TCP: 110->1024 Seq x8c84b492 Ack xcae702c ACK PSH Wnd 17520 Data 31
> ...
> > *** No separate Ack here, but we see a PSH (above) on a short data
> >     packet, inexplicably followed by a full data packet (below).
> >     That PSH proves that the buffer was idle.  AIMS must call OT
> >     separately, but no Nagle algorithm!
> >
> > Fri Feb 28 22:00:25 1997 - e0 recv:
> > Ether: len 1514 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
> > IP: len 1500 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
> > TCP: 110->1024 Seq x8c84b4b1 Ack xcae702c ACK Wnd 17520 Data 1460
>
> The first segment is short, but the PSH bit indicates that it is all we
> have seen.  There is no un-ACKed data, so out it goes.  The second segment
> is a full MSS.  You seem to have misinterpreted the Nagle algorithm.  Please
> go read it again.
>
It's been a long time since I looked at Nagle's RFC-896, but it solved a
pretty serious problem when I worked on my first router implementation
back in '86-87, and I've been beating on vendors to implement Nagle ever
since!  I think that I understand it rather well!!!

I apologize that my comment is insufficiently clear.  See instead the
next section that appears to be the end of this OT send (labelled #3),
where #1 and #2 remain unacknowledged, yet a short MSS is sent:

  *** data packet #3

  Fri Feb 28 22:00:25 1997 - e0 recv:
  Ether: len 354 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
  IP: len 340 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
  TCP: 110->1024 Seq x8c84ba65 Ack xcae702c ACK PSH Wnd 17520 Data 300

Please remember that you have already admitted (and I already knew) that
you have no API for the application to add a PSH.  Therefore, that is
merely the end of a particular OT send.

Anyway, RFC-1122 amends RFC-896: "regardless of the PSH bit".

This is immediately followed by a new OT send of another full MSS that
could have been combined with the previous 300 bytes:

  *** data packet #4

  Fri Feb 28 22:00:25 1997 - e0 recv:
  Ether: len 1514 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
  IP: len 1500 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
  TCP: 110->1024 Seq x8c84bb91 Ack xcae702c ACK Wnd 17520 Data 1460

                                ----

Actually, your argument is disproved in the next trace section:

  *** data packet #1

  Fri Feb 28 22:00:26 1997 - e0 recv:
  Ether: len 86 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
  IP: len 71 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
  TCP: 110->1024 Seq x8c84cf6e Ack xcae703c ACK PSH Wnd 17520 Data 31

  *** data packet #2, no Nagle

  Fri Feb 28 22:00:26 1997 - e0 recv:
  Ether: len 1444 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
  IP: len 1430 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
  TCP: 110->1024 Seq x8c84cf8d Ack xcae703c ACK Wnd 17520 Data 1390

  *** data packet #3, short MSS

  Fri Feb 28 22:00:26 1997 - e0 recv:
  Ether: len 904 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
  IP: len 889 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
  TCP: 110->1024 Seq x8c84d4fb Ack xcae703c ACK PSH Wnd 17520 Data 849

There _is_ outstanding unacknowledged data (#1), and the next segment is
_not_ a full MSS (none of them are).  This happens repeatedly throughout
the trace.

I stand by my assertion that you have not correctly implemented Nagle.

                                ----

Finally, just to hit my point home, what is your required API mechanism
to turn off Nagle?

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 10:06:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA25869 for tcp-impl-list; Tue, 25 Mar 1997 10:04:18 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA25856 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 10:04:14 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA14877 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 10:04:11 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm036-06.dialip.mich.net [141.211.7.48]) by merit.edu (8.8.5/merit-2.0) with SMTP id NAA21363 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 13:00:23 -0500 (EST)
Date: Tue, 25 Mar 97 17:35:57 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5722.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: OT 1.1.2 trace -- delayed Ack
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Scott Dawson <sdawson@eecs.umich.edu>
>    1.  The implementation doesn't do delayed ACKs, or
>    2.  The implementation uses delayed ACKs, but violates the spec
>        which states that the delay should be .5 sec or less.
>
> The ACK is indeed delayed, but since it was delayed by less than 1
> second, you read the delay as 0.
>
I made no reading or prediction of the size of the delay, or whether a
delay occurred.  I offered an explanation of the two reasons that these
particular packets could be seen.  I make the judgment, based on actual
data, that the delay (if present) is "too short".  That is a relative,
not an absolute, judgment based on clear repeatable data (dozens of
times in this trace).


> > Can you provide a trace with a better time resolution?
>
> Can you?  It seems like you are the one saying that there's a problem.

What a silly question.  Did you read any of the substance of my postings
or just skim?  The answer is: I cannot.


> The burden of proof ought to be on you to try to produce evidence of
> the problem.  It's not fair for you to say, "X's implementation is
> broken unless someone from X proves that it's not."
>
The burden of proof has already been established, that on 2 unloaded
machines, the delay is not long enough, since the packets _do_ occur.

Meanwhile, the issue for this WG is what the minimum SHOULD be, and what
good implementations SHOULD do.

I was asked to provide traces.  The traces are sufficient to show the
packets occurring.

If you need prettier traces, please supply them.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 10:34:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA04150 for tcp-impl-list; Tue, 25 Mar 1997 10:32:49 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA04137 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 10:32:46 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA22390 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 10:32:45 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id KAA05215; Tue, 25 Mar 1997 10:22:47 -0800 (PST)
Message-Id: <199703251822.KAA05215@daffy.ee.lbl.gov>
To: "William Allen Simpson" <wsimpson@greendragon.com>
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH
In-reply-to: Your message of Tue, 25 Mar 1997 13:51:52 PST.
Date: Tue, 25 Mar 1997 10:22:47 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> If you want to express umbrage and bandy legalisms, make
> sure it's backed up by your lawyer.

How about: make sure it's done by private email.

I would make a general plea to temper the tone as expressed in some of
the recent discussions.  From my perspective, all it does is cloud the
underlying issues.

> This is a documented case where the lack of a PSH (whether or not the
> sender TCP implementation is also faulty) caused additional network
> packet load and concomitant additional retransmissions.

I agree that a mechanism to explicitly set PSH might streamline network
traffic.  However, API changes are out of our scope.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 11:31:34 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA12006 for tcp-impl-list; Tue, 25 Mar 1997 09:10:57 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA11995 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 09:10:51 -0800
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA00781 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 09:10:45 -0800
Received: from feller.mentat.com ([192.88.122.144]) by mentat.com (4.1/SMI-4.1)
	id AA06195; Tue, 25 Mar 97 09:06:52 PST
Received: by feller.mentat.com (SMI-8.6/SMI-SVR4)
	id JAA20095; Tue, 25 Mar 1997 09:07:23 -0800
Date: Tue, 25 Mar 1997 09:07:23 -0800
From: jt@mentat.com (Jerry Toporek)
Message-Id: <199703251707.JAA20095@feller.mentat.com>
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: OT 1.1.2 trace -- delayed Ack
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Bill:

The easiest way to discover whether or not an implementation uses delayed
ACKs is to exchange one byte messages between applications.  If you see
two packets per exchange, then ACKs are being deferred.  If you see four
packets per exchange, then either ACKs are not being deferred, or possibly
the deferred ACK interval is so small that the applications can not turn
around a single byte in the interval.  Had you tried this, you would have seen
that we do indeed defer ACKs.  When you look at a single data packet that
follows an ACK, all you have "discovered" is that the application got the data
out too late to piggyback on the ACK.  This can obviously happen regardless
of the deferred ACK interval.

The default deferred ACK interval in our implementation is 50ms.  The value is
tunable by vendors, and, in most cases, by the end user.  Where 50ms is less
than a tick (which is not the case in any of our production versions), you
are correct that the default should be larger.  In the case of the Mac, I
believe that a tick is 10ms, but I know that it is substantially smaller than
50ms.  Apple does not provide the ability for tuning by end-users in OT 1.1.1.

I'm more than happy to listen to comments from the group on what the default
deferred ACK interval ought to be.

jt


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 12:58:05 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA09481 for tcp-impl-list; Tue, 25 Mar 1997 12:51:17 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA09353 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 12:51:09 -0800
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA27649 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 12:50:52 -0800
Received: from feller.mentat.com ([192.88.122.144]) by mentat.com (4.1/SMI-4.1)
	id AA07710; Tue, 25 Mar 97 12:46:53 PST
Received: by feller.mentat.com (SMI-8.6/SMI-SVR4)
	id MAA20318; Tue, 25 Mar 1997 12:47:25 -0800
Date: Tue, 25 Mar 1997 12:47:25 -0800
From: jt@mentat.com (Jerry Toporek)
Message-Id: <199703252047.MAA20318@feller.mentat.com>
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: OT 1.1.2 trace -- Nagle
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> I apologize that my comment is insufficiently clear.  See instead the
> next section that appears to be the end of this OT send (labelled #3),
> where #1 and #2 remain unacknowledged, yet a short MSS is sent:

Fine...  Your statement was that a short packet was "inexplicably followed
by a full data packet" and your conclusion was "no Nagle algorithm!".  My
apologies for not reading beyond this.  I'm sure this was not what you
intended to say...

Your methodology for determining whether a TCP has implemented the Nagle
algorithm is again suboptimal.  Run a simple application that does one byte
writes.  If the TCP does not implement Nagle, you will see a series of packets
with one byte in each.  If it does implement Nagle, then you will see the
first byte go out, and then nothing until an ACK comes back, at which time
you will see one or more aggregated packets come out.  If you do this with
our implementation, you will see that we do implement the Nagle algorithm.  You
will still want answers to your questions about your trace.  You don't need
to remind me.  I will post a discussion.  The bottom line will be that you
will claim that we have not implemented the algorithm "correctly", and I will
claim that the intent of the algorithm has not been violated.  I may very well
change some code as a result of the argument we haven't yet had ;-)
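
The one-byte-write test described above can be mimicked with a toy
simulation.  This is only a sketch under simplifying assumptions: time
advances in abstract steps, the application writes one byte per step, an
ACK covering everything outstanding arrives ack_delay steps after the
most recent send, and the function name is invented for the example.

```python
def simulate_one_byte_writes(n_writes: int, ack_delay: int,
                             nagle: bool = True, mss: int = 1460) -> list:
    """Return the sizes of the segments a sender would emit.

    Toy model: one 1-byte write per step; all outstanding data is
    acknowledged ack_delay steps after the latest transmission.
    """
    segments = []
    buffered = 0
    ack_due = None            # step at which outstanding data gets ACKed
    t = 0
    while t < n_writes or buffered > 0:
        if ack_due is not None and t >= ack_due:
            ack_due = None    # the ACK came back; nothing is outstanding
        if t < n_writes:
            buffered += 1     # the application writes one byte
        # Send when Nagle is off, nothing is outstanding, or a full
        # segment has accumulated; otherwise keep buffering.
        while buffered > 0 and (not nagle or ack_due is None
                                or buffered >= mss):
            size = min(buffered, mss)
            segments.append(size)
            buffered -= size
            ack_due = t + ack_delay
        t += 1
    return segments
```

With ten writes and ack_delay=4, the Nagle case yields [1, 4, 4, 1]
(first byte out, then aggregated segments as each ACK returns), while
nagle=False yields ten one-byte segments.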

 
> Finally, just to hit my point home, what is your required API mechanism
> to turn off Nagle?

Same as most...  Turn on TCP_NODELAY.  Your one-byte writes will all appear
in single packets.  How does that relate to your point?
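
For readers on a BSD-sockets style API, a minimal sketch of turning the
option on looks like the following.  It uses Python's standard socket
module purely for illustration and says nothing about how Open Transport
exposes the option; the helper name is made up.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 asks the stack to send small writes immediately
    # instead of coalescing them while earlier data is unacknowledged.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```

The setting can be read back with
s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY), which is nonzero
when the option is on.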

jt


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 12:58:09 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA09573 for tcp-impl-list; Tue, 25 Mar 1997 12:51:40 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA09556 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 12:51:37 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA27958 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 12:51:36 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm012-17.dialip.mich.net [141.211.7.185]) by merit.edu (8.8.5/merit-2.0) with SMTP id PAA28426 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 15:47:47 -0500 (EST)
Date: Tue, 25 Mar 97 20:11:27 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5723.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: OT 1.1.2 trace -- delayed Ack
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: jt@mentat.com (Jerry Toporek)
> The easiest way to discover whether or not an implementation uses delayed
> ACKs is to exchange one byte messages between applications.  ...
>
> The default deferred ACK interval in our implementation is 50ms.  The value is
> tunable by vendors, and, in most cases, by the end user.  Where 50ms is less
> than a tick (which is not the case in any of our production versions), you
> are correct that the default should be larger.  In the case of the Mac, I
> believe that a tick is 10ms, but I know that it is substantially smaller than
> 50ms.  Apple does not provide the ability for tuning by end-users in OT 1.1.1.
>
Yes, the Mac tick is 1/60 second, about 17 ms.  50 ms should be plenty
of time for a single application on a dedicated machine to turn around.
Perhaps Apple lowered the limit to "improve" the code?

Anyway, I didn't set out to test for delayed Ack.  I just was trying to
find out why my 600 user ISP (watervalley.net) went into congestive
collapse when I installed MacOS 7.6 on the POP3 mail server.  The reason
is the number of packets and retransmissions went through the roof.

The application not turning around fast enough to hit the delayed Ack is
only one of the problems I found.  But it does add quite a bit to the
overhead.

RFC-813 talks about making the delayed Ack timeout dynamically tunable.
Mine is srtt / 2 + mdev, with a min of 100 ms and max of 500 ms.
Doesn't anyone other than me actually implement a stack that tunes the
Ack timeout?
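
Written out, the formula above is a clamp.  The sketch below assumes
srtt and mdev are already maintained in milliseconds by the RTT
estimator; the function and parameter names are illustrative only.

```python
def delayed_ack_timeout_ms(srtt_ms: float, mdev_ms: float,
                           floor_ms: float = 100.0,
                           ceil_ms: float = 500.0) -> float:
    """Delayed-ACK timeout of srtt / 2 + mdev, kept within [floor, ceil]."""
    raw = srtt_ms / 2.0 + mdev_ms
    return max(floor_ms, min(ceil_ms, raw))
```

A LAN-like srtt of 10 ms lands on the 100 ms floor, an srtt of 400 ms
with mdev of 50 ms gives 250 ms, and very long paths are capped at the
500 ms maximum.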

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 13:44:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA20369 for tcp-impl-list; Tue, 25 Mar 1997 13:42:34 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA20358 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 13:42:31 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA10482 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 13:42:29 -0800
Received: from ftp.com by ftp.com  ; Tue, 25 Mar 1997 16:38:42 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Tue, 25 Mar 1997 16:38:42 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id QAA28771; Tue, 25 Mar 1997 16:35:52 -0500
Date: Tue, 25 Mar 1997 16:35:52 -0500
Message-Id: <199703252135.QAA28771@MAILSERV-2HIGH.FTP.COM>
To: jt@mentat.com
Subject: Re: OT 1.1.2 trace -- delayed Ack
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Repository: mailserv-2high.ftp.com, [message accepted at Tue Mar 25 16:35:45 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||The default deferred ACK interval in our implementation is 50ms.  The value is
||tunable by vendors, and, in most cases, by the end user.  Where 50ms is less
||than a tick (which is not the case in any of our production versions), you
||are correct that the default should be larger.  In the case of the Mac, I
||believe that a tick is 10ms, but I know that it is substantially smaller than
||50ms.  Apple does not provide the ability for tuning by end-users in OT 1.1.1.
||
||I'm more than happy to listen to comments from the group on what the default
||deferred ACK interval ought to be.
||
We use 55 msec. in our old stack and 200 msec. in our newer one.
Given the speed of modern CPUs, the 200 msec. is way too long
and the 55 msec. delayed ack is also too long.

Even slow, slow, slow PC clients in the old 286..386 days could turn 
100 app->stack io's per second.  Seems to me that dinosaurs like the
BSD 200 msec. fast timer and the PC 55 msec. clock should be left behind
and we should go to a much smaller delayed ACK.

I'll argue for 10 msec as a stake in the ground.

And am preparing to be buried by the response...


L>


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 14:10:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA25888 for tcp-impl-list; Tue, 25 Mar 1997 14:08:16 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA25873 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 14:08:10 -0800
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id OAA17490 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 14:08:01 -0800
Received: from rock.mentat.com ([192.88.122.136]) by mentat.com (4.1/SMI-4.1)
	id AA08249; Tue, 25 Mar 97 14:04:02 PST
Date: Tue, 25 Mar 97 14:04:02 PST
From: jt@mentat.com (Jerry Toporek)
Message-Id: <9703252204.AA08249@mentat.com>
To: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Subject: Re: OT 1.1.2 trace -- delayed Ack
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Yes, the Mac tick is 1/60 second, about 17 ms.  50 ms should be plenty
> of time for a single application on a dedicated machine to turn around.
> Perhaps Apple lowered the limit to "improve" the code?

No, I'm pretty sure that it is at 50ms for OT 1.1.1 and 1.1.2.  I don't know
what kind of Mac tick is at 1/60 sec.  As far as OT TCP is concerned, ticks
are 1ms.  I know for sure that timing is accurate on PPC Macs.  I don't
have a 68K Mac that I can easily check, but I am not aware of any timing
problems.

> 
> Anyway, I didn't set out to test for delayed Ack.  I just was trying to
> find out why my 600 user ISP (watervalley.net) went into congestive
> collapse when I installed MacOS 7.6 on the POP3 mail server.  The reason
> is the number of packets and retransmissions went through the roof.

This is almost surely due to the RTT problems.  Given the excellent reports
of improvement coming from the few sites that have received fixes...  I'm
on real thin ice discussing customer product plans, so I can't.  I do hope
you get something soon.

> 
> RFC-813 talks about making the delayed Ack timeout dynamically tunable.
> Mine is srtt / 2 + mdev, with a min of 100 ms and max of 500 ms.
> Doesn't anyone other than me actually implement a stack that tunes the
> Ack timeout?

I have to admit that I missed this...  But I like it.  On the ride into
work I was thinking about how the fixed deferred ACK interval screws up
best case performance.  Sooner or later, someone is going to twist my arm
to get me to justify the fact that we "clearly violate the spec" by blatantly
not ACKing every other packet...  That will be fun...

jt


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 14:46:35 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA06111 for tcp-impl-list; Tue, 25 Mar 1997 14:44:46 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA06105 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 14:44:44 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA26360 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 14:44:42 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id OAA06113; Tue, 25 Mar 1997 14:34:27 -0800 (PST)
Message-Id: <199703252234.OAA06113@daffy.ee.lbl.gov>
To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: OT 1.1.2 trace -- delayed Ack
In-reply-to: Your message of Tue, 25 Mar 1997 16:35:52 PST.
Date: Tue, 25 Mar 1997 14:34:27 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Even slow, slow, slow PC clients in the old 286..386 days could turn 
> 100 app->stack io's per second.

But the delayed-ack timer is not in lieu of acking-every-other.  So these
will (should) generate the necessary acks, and the delay won't come into play.

Also, the smaller the delay, the more likely on slow links that it will
rule out any possibility of acking more than one packet at a time.  For
example, for 512 byte packets and a 50 msec timer, it takes ~64 msec for
each packet to cross a 64 kbps link.  So a 50 msec timer winds up acking
each packet separately, rather than coalescing.  A 10 msec timer is just
barely long enough to coalesce the ack for back-to-back 1460 byte packets
arriving over a T1.
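
The arithmetic behind these figures can be checked with a short sketch.
It assumes a T1 line rate of 1.544 Mbps, counts only the bytes given
(no link-layer framing), and uses invented function names.

```python
def serialization_ms(packet_bytes: int, link_bps: float) -> float:
    """Time in milliseconds to clock one packet onto a link."""
    return packet_bytes * 8 * 1000.0 / link_bps


def timer_can_coalesce(timer_ms: float, packet_bytes: int,
                       link_bps: float) -> bool:
    """True if a second back-to-back packet can arrive before the
    delayed-ACK timer started by the first one expires."""
    return serialization_ms(packet_bytes, link_bps) <= timer_ms
```

serialization_ms(512, 64_000) is 64 ms, so a 50 msec timer fires before
the next packet arrives; serialization_ms(1460, 1_544_000) is roughly
7.6 ms, which a 10 msec timer only just covers.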

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 16:48:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA03211 for tcp-impl-list; Tue, 25 Mar 1997 16:39:03 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA03170 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 16:38:52 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id QAA25462 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 16:38:49 -0800
Received: from ftp.com by ftp.com  ; Tue, 25 Mar 1997 19:35:03 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Tue, 25 Mar 1997 19:35:03 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id TAA11232; Tue, 25 Mar 1997 19:32:14 -0500
Date: Tue, 25 Mar 1997 19:32:14 -0500
Message-Id: <199703260032.TAA11232@MAILSERV-2HIGH.FTP.COM>
To: vern@ee.lbl.gov
Subject: Re: OT 1.1.2 trace -- delayed Ack
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Tue Mar 25 19:32:11 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||But the delayed-ack timer is not in lieu of acking-every-other.  So these
||will (should) generate the necessary acks, and the delay won't come into play.
||
||Also, the smaller the delay, the more likely on slow links that it will
||rule out any possibility of acking more than one packet at a time.  For
||example, for 512 byte packets and a 50 msec timer, it takes ~64 msec for
||each packet to cross a 64 kbps link.  So a 50 msec timer winds up acking
||each packet separately, rather than coalescing.  A 10 msec timer is just
||barely long enough to coalesce the ack for back-to-back 1460 byte packets
||arriving over a T1.
||
Thanks for interjecting reality into my rather aggressive balloon.
However, you bring up an interesting point with your analysis of 64K vs. T1.

Many of the timing issues here are related to RTT as opposed to a fixed
timer.  Should delayed ACK timers be based on an RTT multiplier instead
of a constant?   My 10 msec. value is perfectly reasonable for a 10Mbit
ethernet, but horrible for a serial link.

L>

L.



From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 25 16:48:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA03210 for tcp-impl-list; Tue, 25 Mar 1997 16:39:03 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA03137 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 16:38:42 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id QAA25429 for <tcp-impl@relay.engr.SGI.COM>; Tue, 25 Mar 1997 16:38:40 -0800
Received: from ftp.com by ftp.com  ; Tue, 25 Mar 1997 19:34:53 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Tue, 25 Mar 1997 19:34:53 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id TAA11220; Tue, 25 Mar 1997 19:32:04 -0500
Date: Tue, 25 Mar 1997 19:32:04 -0500
Message-Id: <199703260032.TAA11220@MAILSERV-2HIGH.FTP.COM>
To: wsimpson@greendragon.com
Subject: Re: OT 1.1.2 trace -- delayed Ack
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Tue Mar 25 19:32:03 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||The application not turning around fast enough to hit the delayed Ack is
||only one of the problems I found.  But it does add quite a bit to the
||overhead.
||
||RFC-813 talks about making the delayed Ack timeout dynamically tunable.
||Mine is srtt / 2 + mdev, with a min of 100 ms and max of 500 ms.
||Doesn't anyone other than me actually implement a stack that tunes the
||Ack timeout?
||
yes, no one but us geeks wanted it though :-). 

Seriously we put more knobs and dials on PCTCP/Onnet than anyone but a
TCP expert could understand.  

Another interesting discussion we had, but never implemented, was how to
create heuristics that could sense an environment (slow serial/fast LAN,
single client connect/multiple server listen) and tune those stack
knobs for the user.

L.



From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 04:37:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA27015 for tcp-impl-list; Wed, 26 Mar 1997 04:35:50 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA27008 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 04:35:46 -0800
Received: from regin.dna.lth.se (regin.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id EAA11224 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 04:35:21 -0800
Received: from regin.dna.lth.se (localhost [127.0.0.1])
	by regin.dna.lth.se (8.8.5/8.8.5) with ESMTP id MAA04665;
	Wed, 26 Mar 1997 12:53:46 +0100
Message-Id: <199703261153.MAA04665@regin.dna.lth.se>
To: backman@ftp.com
cc: Eric.Schenk@dna.lth.se, wsimpson@greendragon.com,
        tcp-impl@relay.engr.SGI.COM
From: Eric.Schenk@dna.lth.se
Subject: Re: OT 1.1.2 trace -- delayed Ack 
In-reply-to: Your message of "Tue, 25 Mar 1997 19:32:04 EST."
             <199703260032.TAA11220@MAILSERV-2HIGH.FTP.COM> 
Date: Wed, 26 Mar 1997 12:53:45 +0100
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Larry Backman <backman@ftp.com> writes:
>
>||The application not turning around fast enough to hit the delayed Ack is
>||only one of the problems I found.  But it does add quite a bit to the
>||overhead.
>||
>||RFC-813 talks about making the delayed Ack timeout dynamically tunable.
>||Mine is srtt / 2 + mdev, with a min of 100 ms and max of 500 ms.
>||Doesn't anyone other than me actually implement a stack that tunes the
>||Ack timeout?
>||
>yes, no one but us geeks wanted it though :-). 
>
>Seriously we put more knobs and dials on PCTCP/Onnet than anyone but a
>TCP expert could understand.  
>
>Another interesting  discussion we had but never implemented were how to
>create heuristics that could sense an environment (slow serial/fast LAN,
>single client connect/multiple server listen) and tune those stack
>knobs for the user.

We've had adaptive delayed ACK timeouts in the Linux code since late
in the 1.3.X development cycle. The calculation is based on the
expected interarrival time between packets. It adapts in the same
way as the RTO timers, but is calculated independently. This is
necessary because a pure receiver will not have any RTO measurements
(it never sends any data packets).
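
A minimal sketch of the idea, not the Linux code itself: it keeps a
smoothed interarrival time and a mean deviation with the usual RTO-style
gains of 1/8 and 1/4.  The class name, the "mean + 2 * deviation"
formula, and the floor and ceiling values are all assumptions made for
illustration.

```python
class InterarrivalEstimator:
    """Hypothetical delayed-ACK timer driven by data packet interarrival times."""

    def __init__(self, floor_ms=10.0, ceiling_ms=500.0):
        self.smoothed = None      # smoothed interarrival time, ms
        self.deviation = 0.0      # smoothed mean deviation, ms
        self.last_arrival = None  # timestamp of the previous data packet, ms
        self.floor_ms = floor_ms
        self.ceiling_ms = ceiling_ms

    def on_data_packet(self, now_ms):
        """Feed the arrival time of each incoming data packet."""
        if self.last_arrival is not None:
            sample = now_ms - self.last_arrival
            if self.smoothed is None:
                # First sample seeds the estimator, as RTO estimators do.
                self.smoothed = float(sample)
                self.deviation = sample / 2.0
            else:
                err = sample - self.smoothed
                self.smoothed += err / 8.0
                self.deviation += (abs(err) - self.deviation) / 4.0
        self.last_arrival = now_ms

    def ack_delay_ms(self):
        """Delay to wait for a second packet before sending a lone ACK."""
        if self.smoothed is None:
            return self.floor_ms  # no measurement yet (assumed policy)
        raw = self.smoothed + 2.0 * self.deviation
        return max(self.floor_ms, min(self.ceiling_ms, raw))
```

Because the samples come from arriving data rather than from ACKs of
our own data, this works on a receiver that never transmits anything
but ACKs.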

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 05:57:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA02729 for tcp-impl-list; Wed, 26 Mar 1997 05:55:18 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA02721 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 05:55:15 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id FAA22061 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 05:55:12 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm035-24.dialip.mich.net [141.211.7.35]) by merit.edu (8.8.5/merit-2.0) with SMTP id IAA07586 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 08:51:20 -0500 (EST)
Date: Wed, 26 Mar 97 02:12:11 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5724.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: OT 1.1.2 trace -- delayed Ack
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: backman@ftp.com (Larry Backman)
> ||Mine is srtt / 2 + mdev, with a min of 100 ms and max of 500 ms.
> ||Doesn't anyone other than me actually implement a stack that tunes the
> ||Ack timeout?
> ||
> yes, no one but us geeks wanted it though :-).
>
> Seriously we put more knobs and dials on PCTCP/Onnet than anyone but a
> TCP expert could understand.
>
Oh, I don't let a _user_ set it!  It's just done automatically on a per
connection basis.  I'm a firm believer in automation.

Besides, one of my implementations was Qualcomm's cellular phones (with
Karn), and there was no place for any user knobs.... :-)


> Another interesting  discussion we had but never implemented were how to
> create heuristics that could sense an environment (slow serial/fast LAN,
> single client connect/multiple server listen) and tune those stack
> knobs for the user.
>
It's not too hard, really.  The interface speed is generally available,
the number of current TCBs, etc.

Of course, you folks didn't design the packet driver interface to
return the speed, so that has been a problem sometimes....

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 07:46:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA13910 for tcp-impl-list; Wed, 26 Mar 1997 07:43:43 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA13902 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 07:43:40 -0800
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA10491 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 07:43:35 -0800
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id JAA25053; Wed, 26 Mar 1997 09:47:52 -0500 (EST)
Message-Id: <199703261447.JAA25053@grinch.eecs.umich.edu>
To: "William Allen Simpson" <wsimpson@greendragon.com>
Cc: tcp-impl@relay.engr.SGI.COM, sdawson@eecs.umich.edu
Subject: Re: OT 1.1.2 trace -- delayed Ack
References: <5722.wsimpson@greendragon.com>
From: Scott Dawson <sdawson@eecs.umich.edu>
In-Reply-To: "William Allen Simpson"'s message of Tue, 25 Mar 97 17:35:57 GMT
Lines: 14
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Wed, 26 Mar 1997 09:47:51 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> What a silly question.  Did you read any of the substance of my postings
> or just skim?  The answer is: I cannot.

I did.  I figured maybe you could reproduce the behavior after hacking
the code as you mentioned in another message:

> The platform won't give better resolution than 55 milliseconds.  Not
> stellar.  I'll try to hack up the code to give that much for future
> traces, but I doubt that it would resolve this question....

so that we could get a better idea of delays we're dealing with here.

-Scott

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 12:16:58 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA16206 for tcp-impl-list; Wed, 26 Mar 1997 12:15:12 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA16196 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 12:15:10 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA21712 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 12:15:08 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id MAA08872; Wed, 26 Mar 1997 12:05:06 -0800 (PST)
Message-Id: <199703262005.MAA08872@daffy.ee.lbl.gov>
To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: OT 1.1.2 trace -- delayed Ack
In-reply-to: Your message of Tue, 25 Mar 1997 19:32:14 PST.
Date: Wed, 26 Mar 1997 12:05:06 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Much of the timing issues here are related to RTT as opposed to a fixed
> timer.  Should delayed ACK timers be based on a RTT-multiplier instead
> of a constant?

Since the goal is to coalesce acks, the key parameter of interest is
the interarrival time between data packets, as Eric Schenk mentioned
in previous mail.  So that argues it's orthogonal to RTT.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 12:43:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA20664 for tcp-impl-list; Wed, 26 Mar 1997 12:40:31 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA20648 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 12:40:27 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA27956 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 12:40:22 -0800
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id PAA05404; Wed, 26 Mar 1997 15:24:52 -0500 (EST)
Message-Id: <199703262024.PAA05404@brookfield.ans.net>
To: backman@ftp.com
cc: jt@mentat.com, tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
Reply-To: curtis@ans.net
Subject: Re: OT 1.1.2 trace -- delayed Ack 
In-reply-to: Your message of "Tue, 25 Mar 1997 16:35:52 EST."
             <199703252135.QAA28771@MAILSERV-2HIGH.FTP.COM> 
Date: Wed, 26 Mar 1997 15:24:52 -0500
From: Curtis Villamizar <curtis@ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199703252135.QAA28771@MAILSERV-2HIGH.FTP.COM>, Larry Backman writes:
> 
> ||The default deferred ACK interval in our implementation is 50ms.  The value is
> ||tunable by vendors, and, in most cases, by the end user.  Where 50ms is less
> ||than a tick (which is not the case in any of our production versions), you
> ||are correct that the default should be larger.  In the case of the Mac, I
> ||believe that a tick is 10ms, but I know that it is substantially smaller than
> ||50ms.  Apple does not provide the ability for tuning by end-users in OT 1.1.1.
> ||
> ||I'm more than happy to listen to comments from the group on what the default
> ||deferred ACK interval ought to be.
> ||
> ||
> we use 55 msec. in our old stack and 200 msec. in our newer one.
> Given the speed of modern CPUs, the 200 msec. is way too long
> and the 55 msec. delayed ack is also too long.
> 
> Even slow, slow, slow PC clients in the old 286..386 days could turn 
> 100 app->stack io's per second.  Seems to me that dinosaurs like the
> Bsd 200 msec. fast timer and the PC 55 Msec clock should be left behind
> and we should go to a much smaller delayed ACK.
> 
> I'll argue for 10 msec as a stake in the ground.
> 
> And am preparing to be buried by the response...
> 
> L>
> 


If the RTT is much larger than 10 msec, there is no delayed ACK in
that case.  RTT in the US (coast to coast) is roughly 70 msec with no
queueing delay.  If there is congestion, it is often longer.
Satellite links can have very long RTTs (250msec).

Delayed ACK is useful in bulk transfer.  It is a problem when the
window is 1.  Delayed ACK is always disabled after idle and the link
is always considered initially idle.  10 msec is probably too short
for a WAN.

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 13:00:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA23856 for tcp-impl-list; Wed, 26 Mar 1997 12:58:11 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA23826 for <tcp-impl@relay.engr.sgi.com>; Wed, 26 Mar 1997 12:58:09 -0800
Received: from databus.databus.com (databus.databus.com [198.186.154.34]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA02065 for <tcp-impl@relay.engr.sgi.com>; Wed, 26 Mar 1997 12:57:43 -0800
From: Barney Wolff <barney@databus.com>
To: tcp-impl@relay.engr.sgi.com
Date: Wed, 26 Mar 1997 15:48 EST
Subject: Re: delayed Ack
Content-Length: 277
Content-Type: text/plain
Message-ID: <33398ce00.6100@databus.databus.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Could some kind soul educate me (or provide a pointer) on how one avoids
a pathological interaction between the delayed-ack logic and the Nagle
algorithm on the other side?  Or does one just accept the extra latency
as unavoidable?

Thanks,

Barney Wolff  <barney@databus.com>

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 13:03:05 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA24530 for tcp-impl-list; Wed, 26 Mar 1997 13:01:32 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA24525 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 13:01:29 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA02929 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 13:01:24 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id UAA11747; Wed, 26 Mar 1997 20:58:06 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0w9eGL-0005FcC; Tue, 25 Mar 97 22:00 GMT
Message-Id: <m0w9eGL-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: OT 1.1.2 trace
To: wsimpson@greendragon.com (William Allen Simpson)
Date: Tue, 25 Mar 1997 22:00:13 +0000 (GMT)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <5719.wsimpson@greendragon.com> from "William Allen Simpson" at Mar 25, 97 02:21:53 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The platform won't give better resolution than 55 milliseconds.  Not
> stellar.  I'll try to hack up the code to give that much for future
> traces, but I doubt that it would resolve this question....

If it's a PC it'll give you sub microsecond accuracy if you learn to use
the hardware right. Read the remaining countdown time from the timer chips
and do some maths.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 13:14:27 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA26375 for tcp-impl-list; Wed, 26 Mar 1997 13:12:13 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA26369 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 13:12:11 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA05626 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 13:12:09 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm012-18.dialip.mich.net [141.211.7.186]) by merit.edu (8.8.5/merit-2.0) with SMTP id QAA00599 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 16:08:21 -0500 (EST)
Date: Wed, 26 Mar 97 18:42:45 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5732.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: OT 1.1.2 trace -- silly windows
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

One of the items listed in my trace does not make complete sense on
careful examination.  What at first inspection appears to be Silly
Window Syndrome (SWS) doesn't add up exactly.

For inexplicable reasons, OT sends partial MSS packets even when it has
not run out of window.  Yet, it is not due to a PSH.  I conclude this is
another form of SWS.

packet  size

#1 P      31 +              -                -
#2      1460 +              -                -

#3 P     300 +              +                +
#4      1460 +              +                +
               (5840)          (5840)
#5      1460 +  4711        +   3220         +
                ----            ----
               (1129)          (2620)          (5840)
#6      1176                                 +  4396
                                                ----
                                               (1444)
#7 P     960

If I assume that no Acks have been received, then the remaining window
at #6 would be 1129 bytes.  Yet, we received 1176.  So, an Ack must have
been received.

But, the Ack was for #1 and #2 together.  So, 2620 would remain, and we
only received 1176.  2620 - 1176 = 1444, more than enough for the 960 to
complete the message (end indicated by PSH).
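
The window bookkeeping above can be replayed mechanically.  This is only
a check of the arithmetic in the table; the helper name is hypothetical
and the 5840 byte window is the figure assumed in the table.

```python
WINDOW = 5840  # bytes, the window assumed in the table above

# Data packet sizes from the trace, keyed by packet number.
sizes = {1: 31, 2: 1460, 3: 300, 4: 1460, 5: 1460, 6: 1176, 7: 960}


def remaining(window, unacked):
    """Window left after the listed packets are outstanding."""
    return window - sum(sizes[n] for n in unacked)


no_ack = remaining(WINDOW, [1, 2, 3, 4, 5])     # nothing acked before #6
acked_1_2 = remaining(WINDOW, [3, 4, 5])        # #1 and #2 acked before #6
after_6 = remaining(WINDOW, [3, 4, 5, 6])       # then #6 is sent

fits_without_ack = sizes[6] <= no_ack           # False: 1176 > 1129
could_have_sent_more = sizes[6] < acked_1_2     # True: 1176 < 2620
```

Neither case explains a 1176 byte segment: without the Ack it is too
large for the window, and with the Ack it is smaller than both the
remaining window and a full MSS.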


> *** data packet #6
>
> Fri Feb 28 22:00:25 1997 - e0 recv:
> Ether: len 1230 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
> IP: len 1216 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
> TCP: 110->1024 Seq x8c84c6f9 Ack xcae702c ACK Wnd 17520 Data 1176
>
> *** data packet #7
>
> Fri Feb 28 22:00:25 1997 - e0 recv:
> Ether: len 1014 00:00:c0:74:36:20->00:80:c7:5b:e8:a8 type IP
> IP: len 1000 206.31.151.21->206.31.151.78 ihl 20 ttl 254 DF prot TCP
> TCP: 110->1024 Seq x8c84cb91 Ack xcae702c ACK PSH Wnd 17520 Data 960
>


#1 P      31
#2      1390    1421, but 1 MSS = 1460

#3 P     849    2270, but 2 MSS = 2920

#4 P       3

Again.  Of course, it may be related to the Nagle failure noted before.
Nagle also is a fix for SWS.

Thus, the windows are "silly", but not for the usual reasons expected;
snd.wnd would be more than enough.  So, perhaps it is an unusual
technique of incrementing cwnd, plus SWS.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 13:57:17 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA04972 for tcp-impl-list; Wed, 26 Mar 1997 13:55:35 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA04966 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 13:55:33 -0800
Received: from external.BSDI.COM (external.BSDI.COM [205.230.225.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA16739 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 13:55:30 -0800
Received: from forge.BSDI.COM (dab@forge.BSDI.COM [205.230.224.24]) by external.BSDI.COM (8.8.5/8.8.2) with ESMTP id OAA11512; Wed, 26 Mar 1997 14:54:13 -0700 (MST)
Received: (from dab@localhost) by forge.BSDI.COM (8.8.5/8.7.3) id OAA10261; Wed, 26 Mar 1997 14:54:12 -0700 (MST)
Date: Wed, 26 Mar 1997 14:54:12 -0700 (MST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199703262154.OAA10261@forge.BSDI.COM>
To: barney@databus.com, tcp-impl@relay.engr.SGI.COM
Subject: Re: delayed Ack
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Barney,

> Could some kind soul educate me (or provide a pointer) on how one avoids
> a pathological interaction between the delayed-ack logic and the Nagle
> algorithm on the other side?  Or does one just accept the extra latency
> as unavoidable?

Usually, this is only an issue when one side has 2 or more small
chunks of data to send at the application layer, and the other side
doesn't generate any response until it gets all the pieces.

In a BSD based environment, the key is to not issue multiple system
calls to send a chunk of "atomic" data.  Typically, this means either
getting all the data into a single buffer, or using sendmsg() to send
discontiguous data.  Or taking the drastic measure of turning off the
Nagle algorithm via the TCP_NODELAY socket option.
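
As an illustration, here is a small sketch using Python's socket module,
which wraps the same sendmsg() and TCP_NODELAY facilities on Unix-like
systems.  The helper names are hypothetical, and a local socketpair()
stands in for a real TCP connection so the example is self-contained.

```python
import socket


def send_gathered(sock, chunks):
    """Hand all chunks to the stack in a single sendmsg() call."""
    total = sum(len(c) for c in chunks)
    sent = sock.sendmsg(chunks)
    if sent < total:
        # Short write: push out whatever the first call did not take.
        sock.sendall(b"".join(chunks)[sent:])
    return total


def disable_nagle(tcp_sock):
    """The drastic alternative: turn off the Nagle algorithm."""
    tcp_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


# Demonstration over a local stream socket pair (not TCP).
a, b = socket.socketpair()
send_gathered(a, [b"HEADER:", b"payload"])
received = b.recv(64)
a.close()
b.close()
```

The header and payload live in separate buffers, but the stack sees one
write, so there is no small trailing segment left waiting on the peer's
delayed Ack.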

Actually, this is one area where I've often felt that adding a PUSH
flag to the BSD send*() routines would be beneficial.  Then, even
if you have multiple small send() calls, the application could set the
MSG_PUSH flag on the last send(), and get all the data out, without
having to turn off the Nagle algorithm.  Of course, that still means
that you are sending at least 2 TCP packets, when if the application
had coalesced the data into a single send() call, only 1 TCP packet
would have gone out.

			-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 15:29:43 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA22170 for tcp-impl-list; Wed, 26 Mar 1997 15:28:07 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA22153 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 15:28:03 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA09822 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 15:27:38 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id XAA15805; Wed, 26 Mar 1997 23:20:58 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wA1Vs-0005FcC; Wed, 26 Mar 97 22:49 GMT
Message-Id: <m0wA1Vs-0005FcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: OT 1.1.2 trace -- delayed Ack
To: jt@mentat.com (Jerry Toporek)
Date: Wed, 26 Mar 1997 22:49:48 +0000 (GMT)
Cc: tcp-impl@relay.engr.SGI.COM, wsimpson@greendragon.com
In-Reply-To: <9703252204.AA08249@mentat.com> from "Jerry Toporek" at Mar 25, 97 02:04:02 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> what kind of Mac tick is at 1/60 sec.  As far as OT TCP is concerned, ticks
> are 1ms.  I know for sure that timing is accurate on PPC Macs.  I don't
> have a 68K Mac that I can easily check, but I am not aware of any timing
> problems.

Bizarrely enough I've just been taking the innards of a Mac apart to port
another OS to it. The timer at boot seems to be loaded for about 1/50th-1/60th
of a second, but the 6522 VIA that controls it can happily do 1ms or whatever
you like if the MacOS is setting it adaptively.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 16:31:33 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA03115 for tcp-impl-list; Wed, 26 Mar 1997 16:29:32 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA03106 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 16:29:30 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA24076 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 16:29:27 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm002-09.dialip.mich.net [141.211.7.145]) by merit.edu (8.8.5/merit-2.0) with SMTP id TAA09272 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 19:25:39 -0500 (EST)
Date: Wed, 26 Mar 97 23:08:52 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5736.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: OT 1.1.2 trace -- delayed Ack
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Vern Paxson <vern@ee.lbl.gov>
> > Much of the timing issues here are related to RTT as opposed to a fixed
> > timer.  Should delayed ACK timers be based on a RTT-multiplier instead
> > of a constant?
>
> Since the goal is to coalesce acks, the key parameter of interest is
> the interarrival time between data packets, as Eric Schenk mentioned
> in previous mail.  So that argues it's orthogonal to RTT.
>
Actually, 3 key parameters, as already written in RFC-813:

 1) Should not be less than the time the application takes to process
    the data ("holding" or "processing" time).  When RTT dominated by
    processing time (low delay networks), need floor ("minimum value").

    No floor specified, but the example is 200 to 300 milliseconds.  I
    always thought of that as a floor of 200 and a ceiling of 300.

 2) Should not be more than measured 1/2 RTT, as otherwise might cause
    retransmission.  RFC-1122 specifies a ceiling of 500 milliseconds.

 3) Should be related to inter-arrival time of packet _bursts_ (back to
    back packets).  This was for _high_ delay "very slow" links, trying
    to measure how long it takes a packet to transmit down the slowest
    link.  Unfortunately, a packet train over ethernet/FDDI/OC-3 will
    have virtually zero inter-arrival time, while a 1500 byte packet
    at 9600 bps takes about 1.25 seconds (roughly 1.5 with async
    start/stop bits).

    BTW, this was for future research, in the appendix.

As I mentioned, I used "srtt/2 + mdev" quite successfully, with a floor
of 100 ms and a ceiling of 500 ms.
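
As a sketch, that rule is a one-line clamp.  The function and parameter
names are made up for illustration; only the formula and the two bounds
come from the text above.

```python
def delayed_ack_timeout_ms(srtt_ms, mdev_ms, floor_ms=100.0, ceiling_ms=500.0):
    """srtt/2 + mdev, clamped to the floor and ceiling."""
    return max(floor_ms, min(ceiling_ms, srtt_ms / 2.0 + mdev_ms))


lan = delayed_ack_timeout_ms(2, 1)           # floor applies: 100.0
wan = delayed_ack_timeout_ms(400, 50)        # 250.0
very_slow = delayed_ack_timeout_ms(2000, 100)  # ceiling applies: 500.0
```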

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 26 16:31:39 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA03114 for tcp-impl-list; Wed, 26 Mar 1997 16:29:32 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA03104 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 16:29:30 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA24081 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 16:29:28 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm002-09.dialip.mich.net [141.211.7.145]) by merit.edu (8.8.5/merit-2.0) with SMTP id TAA09276 for <tcp-impl@relay.engr.SGI.COM>; Wed, 26 Mar 1997 19:25:41 -0500 (EST)
Date: Thu, 27 Mar 97 00:14:24 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5737.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: OT 1.1.2 trace
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: alan@lxorguk.ukuu.org.uk (Alan Cox)
> If it's a PC it'll give you sub microsecond accuracy if you learn to use
> the hardware right. Read the remaining countdown time from the timer chips
> and do some maths.
>
Yeah, I found it.  About 838 ns.  Doesn't divide to microseconds very
well.  I'll see what I can do with it.  May just use Karn's code to do
milliseconds.  Thanks.
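
For what it's worth, the 838 ns figure is consistent with the nominal 1.193182 MHz input clock of the PC's 8253/8254 timer chip. The sketch below assumes that is the chip in question; the constant and function names are illustrative only and are not taken from Karn's code.

```python
PIT_HZ = 1_193_182  # assumed nominal 8253/8254 input clock, in Hz

def tick_period_ns():
    """Length of one timer tick in nanoseconds."""
    return 1_000_000_000 / PIT_HZ

def ticks_to_usec(ticks):
    """Convert a tick count to whole microseconds using integer arithmetic."""
    return (ticks * 1_000_000) // PIT_HZ

print(round(tick_period_ns(), 1))  # 838.1
print(ticks_to_usec(PIT_HZ))       # one second's worth of ticks -> 1000000
```

Multiplying before dividing keeps the conversion in integers, which sidesteps the awkward non-integral tick length.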

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 27 00:43:33 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA06363 for tcp-impl-list; Thu, 27 Mar 1997 00:42:04 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA06358 for <tcp-impl@relay.engr.SGI.COM>; Thu, 27 Mar 1997 00:42:00 -0800
Received: from regin.dna.lth.se (regin.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id AAA08263 for <tcp-impl@relay.engr.SGI.COM>; Thu, 27 Mar 1997 00:41:58 -0800
Received: from regin.dna.lth.se (localhost [127.0.0.1])
	by regin.dna.lth.se (8.8.5/8.8.5) with ESMTP id JAA32168;
	Thu, 27 Mar 1997 09:01:59 +0100
Message-Id: <199703270801.JAA32168@regin.dna.lth.se>
To: "William Allen Simpson" <wsimpson@greendragon.com>
cc: Eric.Schenk@dna.lth.se, tcp-impl@relay.engr.SGI.COM
From: Eric.Schenk@dna.lth.se
Subject: Re: OT 1.1.2 trace -- delayed Ack 
In-reply-to: Your message of "Wed, 26 Mar 1997 23:08:52 GMT."
             <5736.wsimpson@greendragon.com> 
Date: Thu, 27 Mar 1997 09:01:58 +0100
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


"William Allen Simpson" <wsimpson@greendragon.com> writes:
>Actually, 3 key parameters, as already written in RFC-813:
>
> 1) Should not be less than the time the application takes to process
>    the data ("holding" or "processing" time).  When RTT dominated by
>    processing time (low delay networks), need floor ("minimum value").

I don't see anything in the discussion of the holding time that
implies a minimum value for the ACK timeout (ATO). The discussion talks
about a minimum value on the RTT calculated at the sender to prevent
spurious retransmission due to the fact that the receiver is delaying an ACK.
We see this problem in Linux due to the fact that we use 10 millisecond
or 1 millisecond granularity timers (depending on the hardware).
If we don't place a 200 millisecond floor on the RTT then we do
spurious retransmission against BSD derived stacks which have a fixed 200
millisecond delayed ACK policy.

>    No floor specified, but the example is 200 to 300 milliseconds.  I
>    always thought of that as a floor of 200 and a ceiling of 300.

The only place I find these numbers mentioned in the RFC is on the
discussion of the expected interarrival time on the Arpanet.
Have I missed something?

> 2) Should not be more than measured 1/2 RTT, as otherwise might cause
>    retransmission.  RFC-1122 specifies a ceiling of 500 milliseconds.

I can't find any discussion specific to this point in the RFC.
What have I missed?

As to the 500 millisecond ceiling, this seems to derive from the
500 millisecond granularity in the RTT calculation on BSD derived stacks.
Since we always have a 500 millisecond "over estimate" (roughly
speaking) in the RTT a 500 millisecond ceiling on the delay prevents
a spurious retransmission due to a delayed ack. Also note that
if you are calculating the RTT with higher resolution timers you
need to consider this fact. 

> 3) Should be related to inter-arrival time of packet _bursts_ (back to
>    back packets).  This was for _high_ delay "very slow" links, trying
>    to measure how long it takes a packet to transmit down the slowest
>    link.  Unfortunately, a packet train over ethernet/FDDI/OC-3 will
>    have virtually zero inter-arrival time, while a 1500 byte packet
>    at 9600 bps is 1.5 seconds.

Yes, we filter the ATO measure we do against large changes in the
interarrival time for just this reason. As you observe the 500 millisecond
ceiling kicks in on really slow links, and on really fast links
we end up just ACKing every second packet as required by RFC-1122.
(Although I can still measure the ATO on 10 base-T ethernet, it's
just very small. I'm not so sure about 100 base-T or FDDI, maybe
when we have 1 millisecond clocks.  Also, at this speed delay between
packets might be dominated by the time the sending machine takes to
stuff the next packet into the network and not on the transmission
delay in the network.)

>    BTW, this was for future research, in the appendix.

Quite right. Perhaps this discussion is straying from the charter
of this group at this point.

>As I mentioned, I used "srtt/2 + mdev" quite successfully, with a floor
>of 100 ms and a ceiling of 500 ms.

Note that on a pure receiving link you have no RTT measure.
If you use the initial srtt and mdev settings on your link you are
essentially setting the ATO to a constant on a pure receiver.

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 27 10:12:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA08007 for tcp-impl-list; Thu, 27 Mar 1997 10:10:42 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA07987 for <tcp-impl@engr.sgi.com>; Thu, 27 Mar 1997 10:10:40 -0800
Received: from ietf.org (ietf.org [132.151.1.19]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id KAA18343 for <tcp-impl@engr.sgi.com>; Thu, 27 Mar 1997 10:10:28 -0800
Received: from ietf.ietf.org by ietf.org id aa18741; 27 Mar 97 9:57 EST
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce:;@ietf.org
cc: tcp-impl@engr.sgi.com
From: Internet-Drafts@ietf.org
Reply-to: Internet-Drafts@ietf.org
Subject: I-D ACTION:draft-ietf-tcpimp-prob-00.txt
Date: Thu, 27 Mar 1997 09:57:46 -0500
Message-ID:  <9703270957.aa18741@ietf.org>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

--NextPart

 A New Internet-Draft is available from the on-line Internet-Drafts 
 directories. This draft is a work item of the TCP Implementation Working 
 Group of the IETF.                                                        

       Title     : Known TCP Implementation Problems                       
       Author(s) : V. Paxson
       Filename  : draft-ietf-tcpimp-prob-00.txt
       Pages     : 9
       Date      : 03/26/1997

This memo catalogs a number of known TCP implementation problems.  The goal
in doing so is to improve conditions in the existing Internet by enhancing 
the quality of current TCP/IP implementations.                             

Internet-Drafts are available by anonymous FTP.  Login with the username
"anonymous" and a password of your e-mail address.  After logging in,
type "cd internet-drafts" and then
     "get draft-ietf-tcpimp-prob-00.txt".
A URL for the Internet-Draft is:
ftp://ds.internic.net/internet-drafts/draft-ietf-tcpimp-prob-00.txt
 
Internet-Drafts directories are located at:	
	                                                
     o  Africa:  ftp.is.co.za                    
	                                                
     o  Europe:  ftp.nordu.net            	
                 ftp.nis.garr.it                 
	                                                
     o  Pacific Rim: munnari.oz.au               
	                                                
     o  US East Coast: ds.internic.net           
	                                                
     o  US West Coast: ftp.isi.edu               
	                                                
Internet-Drafts are also available by mail.	
	                                                
Send a message to:  mailserv@ds.internic.net. In the body type: 
     "FILE /internet-drafts/draft-ietf-tcpimp-prob-00.txt".
							
NOTE: The mail server at ds.internic.net can return the document in
      MIME-encoded form by using the "mpack" utility.  To use this
      feature, insert the command "ENCODING mime" before the "FILE"
      command.  To decode the response(s), you will need "munpack" or
      a MIME-compliant mail reader.  Different MIME-compliant mail readers
      exhibit different behavior, especially when dealing with
      "multipart" MIME messages (i.e., documents which have been split
      up into multiple messages), so check your local documentation on
      how to manipulate these messages.
							
							

Below is the data which will enable a MIME compliant mail reader 
implementation to automatically retrieve the ASCII version
of the Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type:  Message/External-body;
        access-type="mail-server";
        server="mailserv@ds.internic.net"

Content-Type: text/plain
Content-ID: <19970326160504.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimp-prob-00.txt

--OtherAccess
Content-Type:   Message/External-body;
        name="draft-ietf-tcpimp-prob-00.txt";
        site="ds.internic.net";
        access-type="anon-ftp";
        directory="internet-drafts"

Content-Type: text/plain
Content-ID: <19970326160504.I-D@ietf.org>

--OtherAccess--

--NextPart--

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 27 10:57:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA20829 for tcp-impl-list; Thu, 27 Mar 1997 10:55:46 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA20822 for <tcp-impl@relay.engr.SGI.COM>; Thu, 27 Mar 1997 10:55:43 -0800
Received: from merit.edu (merit.edu [35.1.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA01252 for <tcp-impl@relay.engr.SGI.COM>; Thu, 27 Mar 1997 10:55:40 -0800
Received: from Bill.Simpson.DialUp.Mich.Net (pm035-05.dialip.mich.net [141.211.7.16]) by merit.edu (8.8.5/merit-2.0) with SMTP id NAA21419 for <tcp-impl@relay.engr.SGI.COM>; Thu, 27 Mar 1997 13:51:51 -0500 (EST)
Date: Thu, 27 Mar 97 18:36:21 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <5743.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: OT 1.1.2 trace -- delayed Ack
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Eric.Schenk@dna.lth.se
> Have I missed something?
>
In regards to both queries, you seem to have missed that these
discussions are all in the sections on delayed Ack.  So, why did they
discuss holding time, RTT, inter-packet arrival time?  All are applied
to delayed Ack.


> As to the 500 millisecond ceiling, this seems to derive from the
> 500 millisecond granularity in the RTT calculation on BSD derived stacks.

Maybe, although RFC-1122 doesn't mention it.

> if you are calculating the RTT with higher resolution timers you
> need to consider this fact.
>
I just follow the rule, because it works and provides consistency.


> >As I mentioned, I used "srtt/2 + mdev" quite successfully, with a floor
> >of 100 ms and a ceiling of 500 ms.
>
> Note that on a pure receiving link you have no RTT measure.
> If you use the initial srtt and mdev settings on your link you are
> essentially setting the ATO to a constant on a pure receiver.
>
Yes (yielding the ceiling value), and the problem is?

And "pure receiver" happens in which protocol?  TCP always provides at
least one round trip estimate.

Remember, it is always OK to have a larger value, as that is the intent
-- the timer should rarely be used, other events should happen sooner.

All these strategies are to find an acceptable _lower_ bound for more
interactive traffic.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 27 11:28:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA01042 for tcp-impl-list; Thu, 27 Mar 1997 11:26:35 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA01036 for <tcp-impl@relay.engr.SGI.COM>; Thu, 27 Mar 1997 11:26:33 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA10413 for <tcp-impl@relay.engr.SGI.COM>; Thu, 27 Mar 1997 11:26:27 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA11430; Thu, 27 Mar 1997 11:16:32 -0800 (PST)
Message-Id: <199703271916.LAA11430@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: Internet Draft on Known TCP Implementation Problems
Date: Thu, 27 Mar 1997 11:16:32 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Here's the first draft of a document describing TCP implementation problems.
It includes the four I sent earlier to the list, with feedback incorporated.
I picture this turning into an extensive document in the future and encourage
volunteers to write up descriptions of other problems.  I'll post the list
I've been keeping of problems to document in a few days.

So far my leaning is to include traces directly in the document, rather
than using URLs to on-line versions.  This is because they so far haven't
been big enough to bloat the doc, and it's handy having them right there
in the middle of the discussion.

		Vern


--------------------------------------------------------------------


Network Working Group                                  V. Paxson, Editor
Internet Draft
Expiration Date: September 1997                               March 1997


                   Known TCP Implementation Problems
                    <draft-ietf-tcpimp-prob-00.txt>


1. Status of this Memo

   This document is an Internet  Draft.   Internet  Drafts  are  working
   documents  of  the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may  also  distribute
   working documents as Internet Drafts.

   Internet Drafts are draft  documents  valid  for  a  maximum  of  six
   months, and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet Drafts as reference
   material or to cite them other than as ``work in progress''.

   To learn the current status of any Internet Draft, please  check  the
   ``1id-abstracts.txt'' listing contained in the Internet Drafts shadow
   directories  on  ftp.is.co.za   (Africa),   nic.nordu.net   (Europe),
   munnari.oz.au  (Pacific  Rim),  ds.internic.net  (US  East Coast), or
   ftp.isi.edu (US West Coast).

   This memo provides information for the Internet community.  This memo
   does  not  specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.


2. Introduction

   This memo catalogs a number of  known  TCP  implementation  problems.
   The  goal  in  doing  so  is  to  improve  conditions in the existing
   Internet by enhancing the quality of current TCP/IP  implementations.
   It  is  hoped  that  both  performance  and correctness issues can be
   resolved by making implementors  aware  of  the  problems  and  their
   solutions.   In  the  long term, it is hoped that this will provide a
   reduction  in  unnecessary  traffic  on  the  network,  the  rate  of
   connection  failures  due  to  protocol  errors,  and load on network
   servers due to time spent processing  both  unsuccessful  connections
   and  retransmitted  data.   This will help to ensure the stability of
   the global Internet.

   Each problem is defined as follows:




Paxson, Editor                                                  [Page 1]





ID                 Known TCP Implementation Problems          March 1997


   Name The name associated with the problem.  In this memo, the name is
        given as a subsection heading.

   Category
        One or more problem categories for which the problem is  classi-
        fied.   Categories  used  so far: "congestion control", "perfor-
        mance", "reliability".  Others anticipated: "security", "intero-
        perability", "configuration".

   Description
        A definition of the problem, succinct  but  including  necessary
        background material.

   Significance
        A quantification as to how serious the problem  is  considered.
        Categories are "Non-critical", "Serious", and "Critical".

   Implications
        Why the problem is viewed as a problem.

   Relevant RFCs
        Brief discussion of the RFCs with respect to which  the  problem
        is viewed as an implementation error.

   Trace file demonstrating the problem
        One or more ASCII trace  files  demonstrating  the  problem,  if
        applicable.   These  may  in the future be replaced with URLs to
        on-line traces.

   Trace file demonstrating correct behavior
        One or more examples of how correct behavior appears in a trace,
        if applicable.  These may in the future be replaced with URLs to
        on-line traces.

   References
        References that further discuss the problem.

   How to detect
        How to test an implementation to see if it exhibits the problem.
        This  discussion may include difficulties and subtleties associ-
        ated with causing the  problem  to  manifest  itself,  and  with
        interpreting  traces  to  detect the presence of the problem (if
        applicable).  In the future, this may include URLs for  diagnos-
        tic tools.

   How to fix
        For known causes of the problem, how to correct the  implementa-
        tion.

   Implementation specifics
        If it is viewed as beneficial to document particular implementa-
        tions exhibiting the problem, and if the corresponding implemen-
        tors approve, then this section gives  the  specifics  of  those
        implementations,  along with a contact address for the implemen-
        tors.


3. Known implementation problems


3.1. No initial slow start

Category
     Congestion control

Description
     When a TCP begins transmitting data, it is required  by  RFC  1122,
     4.2.2.15,  to  engage in a "slow start" by initializing its conges-
     tion window, cwnd, to one packet (one segment of the maximum size).
     It  subsequently  increases  cwnd  by  one  packet  for each ack it
     receives for new data.  The minimum  of  cwnd  and  the  receiver's
     advertised  window  bounds  the highest sequence number the TCP can
     transmit.  A TCP that fails to initialize  and  increment  cwnd  in
     this fashion exhibits "No initial slow start".
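
     The window growth described above can be sketched as follows.  This
     is an illustration only: the function name is hypothetical, and the
     sketch ignores ssthresh, losses and delayed acks.

```python
def slow_start_windows(mss, acks_for_new_data, rwnd):
    """Usable window (bytes) at start and after each ack for new data."""
    cwnd = mss                      # cwnd starts at one full-sized segment
    windows = [min(cwnd, rwnd)]     # sender is bounded by min(cwnd, rwnd)
    for _ in range(acks_for_new_data):
        cwnd += mss                 # one more segment per ack for new data
        windows.append(min(cwnd, rwnd))
    return windows

print(slow_start_windows(1460, 3, 32768))  # [1460, 2920, 4380, 5840]
```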

Significance
     Serious.

Implications
     A TCP failing to slow start when beginning a connection results  in
     traffic  bursts  that  can stress the network, leading to excessive
     queueing delays and packet loss.

     Implementations exhibiting this problem might do  so  because  they
     suffer  from  the  general  problem  of  not including the required
     congestion window.  These implementations will also suffer from "No
     slow start after retransmission timeout".

     There are different shades of "No initial slow  start".   From  the
     perspective  of  stressing  the  network, the worst is a connection
     that simply always sends based on the receiver's advertised window,
     with  no  notion of a separate congestion window.  Some other forms
     are described in "Uninitialized CWND" and "Initial CWND of 2  pack-
     ets".


Relevant RFCs
     RFC 1122 requires use of slow start.  RFC 2001 gives the  specifics
     of slow start.

Trace file demonstrating it
     Made using tcpdump/BPF recording at the connection  responder.   No
     losses reported.

     10:40:42.244503 B > A: S 1168512000:1168512000(0) win 32768
                             <mss 1460,nop,wscale 0> (DF) [tos 0x8]
     10:40:42.259908 A > B: S 3688169472:3688169472(0)
                             ack 1168512001 win 32768 <mss 1460>
     10:40:42.389992 B > A: . ack 1 win 33580 (DF) [tos 0x8]
     10:40:42.664975 A > B: P 1:513(512) ack 1 win 32768
     10:40:42.700185 A > B: . 513:1973(1460) ack 1 win 32768
     10:40:42.718017 A > B: . 1973:3433(1460) ack 1 win 32768
     10:40:42.762945 A > B: . 3433:4893(1460) ack 1 win 32768
     10:40:42.811273 A > B: . 4893:6353(1460) ack 1 win 32768
     10:40:42.829149 A > B: . 6353:7813(1460) ack 1 win 32768
     10:40:42.853687 B > A: . ack 1973 win 33580 (DF) [tos 0x8]
     10:40:42.864031 B > A: . ack 3433 win 33580 (DF) [tos 0x8]

     After the third packet, the connection is established.  A, the con-
     nection responder, begins transmitting to B, the connection initia-
     tor.  Host A quickly sends 6 packets comprising  7812  bytes,  even
     though  the SYN exchange agreed upon an MSS of 1460 bytes (implying
     an initial congestion window  of  1  segment  corresponds  to  1460
     bytes), and so A should have sent at most 1460 bytes.

     The acks sent by B to A in the last two lines  indicate  that  this
     trace  is  not a measurement error (slow start really occurring but
     the corresponding acks having been dropped by the packet filter).

     A second trace confirmed that the problem is repeatable.
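
     The check applied to this trace can be sketched as below.  The
     sequence ranges are transcribed from the data packets A sent before
     B's first ack for data; the helper names are hypothetical.

```python
MSS = 1460  # bytes, from the SYN exchange

# (start, end) sequence ranges of A's data packets before the first ack for data
sent_before_first_ack = [
    (1, 513), (513, 1973), (1973, 3433),
    (3433, 4893), (4893, 6353), (6353, 7813),
]

def bytes_sent(segments):
    """Total span of sequence space covered by the listed segments."""
    return max(end for _, end in segments) - min(start for start, _ in segments)

def exceeds_initial_cwnd(segments, mss):
    """True if more than one MSS was sent before any data was acked."""
    return bytes_sent(segments) > mss

print(bytes_sent(sent_before_first_ack))                 # 7812
print(exceeds_initial_cwnd(sent_before_first_ack, MSS))  # True
```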


Trace file demonstrating correct behavior

     Made using tcpdump/BPF recording at the connection originator.   No
     losses reported.

     12:35:31.914050 C > D: S 1448571845:1448571845(0) win 4380 <mss 1460>
     12:35:32.068819 D > C: S 1755712000:1755712000(0) ack 1448571846 win 4096
     12:35:32.069341 C > D: . ack 1 win 4608
     12:35:32.075213 C > D: P 1:513(512) ack 1 win 4608
     12:35:32.286073 D > C: . ack 513 win 4096
     12:35:32.287032 C > D: . 513:1025(512) ack 1 win 4608
     12:35:32.287506 C > D: . 1025:1537(512) ack 1 win 4608
     12:35:32.432712 D > C: . ack 1537 win 4096
     12:35:32.433690 C > D: . 1537:2049(512) ack 1 win 4608
     12:35:32.434481 C > D: . 2049:2561(512) ack 1 win 4608
     12:35:32.435032 C > D: . 2561:3073(512) ack 1 win 4608
     12:35:32.594526 D > C: . ack 3073 win 4096
     12:35:32.595465 C > D: . 3073:3585(512) ack 1 win 4608
     12:35:32.595947 C > D: . 3585:4097(512) ack 1 win 4608
     12:35:32.596414 C > D: . 4097:4609(512) ack 1 win 4608
     12:35:32.596888 C > D: . 4609:5121(512) ack 1 win 4608
     12:35:32.733453 D > C: . ack 4097 win 4096


References
     This problem is documented in [Paxson97].

How to detect
     For implementations always manifesting this problem,  it  shows  up
     immediately  in  a  packet trace or a sequence plot, as illustrated
     above.

How to fix
     If the root problem is that the implementation lacks a notion of  a
     congestion  window,  then  unfortunately  this requires significant
     work to fix.  However, doing so is critical,  as  such  implementa-
     tions  exhibit  "No slow start after retransmission timeout", which
     has a significance of "Critical".


3.2. No slow start after retransmission timeout

Category
     Congestion control

Description
     When a TCP experiences a retransmission timeout, it is required  by
     RFC  1122,  4.2.2.15, to engage in "slow start" by initializing its
     congestion window, cwnd, to one packet (one segment of the  maximum
     size).   It  subsequently increases cwnd by one packet for each ack
     it  receives  for  new  data  until  it  reaches  the   "congestion
     avoidance"  threshold,  ssthresh,  at  which  point  the congestion
     avoidance algorithm for updating the window takes over.  A TCP that
     fails  to  enter  slow start upon a timeout exhibits "No slow start
     after retransmission timeout".

Significance
     Critical.

Implications
     Entering slow start upon timeout forms one of the  cornerstones  of



Paxson, Editor                                                  [Page 5]





ID                 Known TCP Implementation Problems          March 1997


     Internet  congestion  stability,  as  outlined in [Jacobson88].  If
     TCPs fail to do so,  the  network  becomes  at  risk  of  suffering
     "congestion collapse" [RFC896].

Relevant RFCs
     RFC 1122 requires use of slow start after loss.  RFC 2001 gives the
     specifics  of  how  to  implement  slow  start.   RFC 896 describes
     congestion collapse.

     The retransmission timeout discussed here should  not  be  confused
     with  the  separate  "fast  recovery" retransmission mechanism dis-
     cussed in RFC 2001.


Trace file demonstrating it
     Made using tcpdump/BPF recording at the sending TCP (A).  No losses
     reported.

     10:40:59.090612 B > A: . ack 357125 win 33580 (DF) [tos 0x8]
     10:40:59.222025 A > B: . 357125:358585(1460) ack 1 win 32768
     10:40:59.868871 A > B: . 357125:358585(1460) ack 1 win 32768
     10:41:00.016641 B > A: . ack 364425 win 33580 (DF) [tos 0x8]
     10:41:00.036709 A > B: . 364425:365885(1460) ack 1 win 32768
     10:41:00.045231 A > B: . 365885:367345(1460) ack 1 win 32768
     10:41:00.053785 A > B: . 367345:368805(1460) ack 1 win 32768
     10:41:00.062426 A > B: . 368805:370265(1460) ack 1 win 32768
     10:41:00.071074 A > B: . 370265:371725(1460) ack 1 win 32768
     10:41:00.079794 A > B: . 371725:373185(1460) ack 1 win 32768
     10:41:00.089304 A > B: . 373185:374645(1460) ack 1 win 32768
     10:41:00.097738 A > B: . 374645:376105(1460) ack 1 win 32768
     10:41:00.106409 A > B: . 376105:377565(1460) ack 1 win 32768
     10:41:00.115024 A > B: . 377565:379025(1460) ack 1 win 32768
     10:41:00.123576 A > B: . 379025:380485(1460) ack 1 win 32768
     10:41:00.132016 A > B: . 380485:381945(1460) ack 1 win 32768
     10:41:00.141635 A > B: . 381945:383405(1460) ack 1 win 32768
     10:41:00.150094 A > B: . 383405:384865(1460) ack 1 win 32768
     10:41:00.158552 A > B: . 384865:386325(1460) ack 1 win 32768
     10:41:00.167053 A > B: . 386325:387785(1460) ack 1 win 32768
     10:41:00.175518 A > B: . 387785:389245(1460) ack 1 win 32768
     10:41:00.210835 A > B: . 389245:390705(1460) ack 1 win 32768
     10:41:00.226108 A > B: . 390705:392165(1460) ack 1 win 32768
     10:41:00.241524 B > A: . ack 389245 win 8760 (DF) [tos 0x8]

     The first packet indicates the ack point is 357125.  130 msec after
     receiving  the  ack,  A  transmits  the packet after the ack point,
     357125:358585.  640 msec after this  transmission,  it  retransmits
     357125:358585,  in  an  apparent  retransmission  timeout.  At this
     point, A's cwnd should be one MSS,  or  1460  bytes,  as  A  enters
     slow-start.  The trace is consistent with this possibility.

     B replies with an ack of 364425, indicating that  A  has  filled  a
     sequence  hole.   At  this  point, A's cwnd should be 1460*2 = 2920
     bytes, since in slow start receiving an ack advances cwnd  by  MSS.
     However,  A  then  launches 19 consecutive packets, which is incon-
     sistent with slow start.

     A second trace confirmed that the problem is repeatable.
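
     The comparison made above can be sketched numerically.  The  burst
     boundaries are transcribed from the trace; the function name is
     hypothetical, and the sketch assumes cwnd stays below ssthresh.

```python
MSS = 1460  # bytes

def cwnd_after_timeout(mss, acks_for_new_data):
    """cwnd (bytes) after a timeout followed by that many acks for new data."""
    return mss * (1 + acks_for_new_data)

# Burst sent by A after B's ack of 364425, before any further ack arrives
burst_start, burst_end = 364425, 392165
burst_bytes = burst_end - burst_start
burst_packets = burst_bytes // MSS

print(cwnd_after_timeout(MSS, 1))   # 2920
print(burst_bytes, burst_packets)   # 27740 19
```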


Trace file demonstrating correct behavior
     Made using tcpdump/BPF recording at the sending TCP (C).  No losses
     reported.

     12:35:48.442538 C > D: P 465409:465921(512) ack 1 win 4608
     12:35:48.544483 D > C: . ack 461825 win 4096
     12:35:48.703496 D > C: . ack 461825 win 4096
     12:35:49.044613 C > D: . 461825:462337(512) ack 1 win 4608
     12:35:49.192282 D > C: . ack 465921 win 2048
     12:35:49.192538 D > C: . ack 465921 win 4096
     12:35:49.193392 C > D: P 465921:466433(512) ack 1 win 4608
     12:35:49.194726 C > D: P 466433:466945(512) ack 1 win 4608
     12:35:49.350665 D > C: . ack 466945 win 4096
     12:35:49.351694 C > D: . 466945:467457(512) ack 1 win 4608
     12:35:49.352168 C > D: . 467457:467969(512) ack 1 win 4608
     12:35:49.352643 C > D: . 467969:468481(512) ack 1 win 4608
     12:35:49.506000 D > C: . ack 467969 win 3584

     After C transmits the first packet shown to D, it takes  no  action
     in  response  to  D's  acks  for  461825,  because the first packet
     already reached the advertised window limit  of  4096  bytes  above
     461825.    600   msec   after  transmitting  the  first  packet,  C
     retransmits  461825:462337,  presumably  due  to  a  timeout.   Its
     congestion window is now MSS (512 bytes).

     D acks 465921, indicating that C's retransmission filled a sequence
     hole.   This  ack advances C's cwnd from 512 to 1024.  Very shortly
     after, D acks 465921 again in order to update  the  offered  window
     from  2048 to 4096.  This ack does not advance cwnd since it is not
     for new data.  Very shortly after, C responds to the newly enlarged
     window  by  transmitting  two packets.  D acks both, advancing cwnd
     from 1024 to 1536.  C in turn transmits three packets.
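
     The cwnd progression just described can be reproduced with a small
     sketch.  The function name is hypothetical; the sketch assumes cwnd
     stays below ssthresh and ignores sequence number wraparound.

```python
def cwnd_history(mss, snd_una, acks):
    """cwnd after a timeout and after each ack; only acks for new data count."""
    cwnd = mss               # a timeout resets cwnd to one segment
    history = [cwnd]
    for ack in acks:
        if ack > snd_una:    # ack covers new data: grow by one MSS
            cwnd += mss
            snd_una = ack
        history.append(cwnd) # duplicate or window-update acks leave cwnd alone
    return history

# Ack point 461825 at the timeout, then D's three acks from the trace
print(cwnd_history(512, 461825, [465921, 465921, 466945]))
# [512, 1024, 1024, 1536]
```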


References
     This problem is documented in [Paxson97].

How to detect
     Packet loss is common enough in the Internet that generally  it  is
     not  difficult to find an Internet path that will force retransmis-
     sion due to packet loss.

     If the effective window prior to loss  is  large  enough,  however,
     then  the  TCP  may  retransmit using the "fast recovery" mechanism
     described in RFC 2001.  In a packet trace, the  signature  of  fast
     recovery  is  that  the packet retransmission occurs in response to
     the receipt of three duplicate acks, and subsequent duplicate  acks
     may  lead to the transmission of new data, above both the ack point
     and the highest sequence transmitted so far.  An absence  of  three
     duplicate  acks  prior  to  retransmission  suffices to distinguish
     between timeout and fast recovery retransmissions.  In the face  of
     only  observing  fast recovery retransmissions, generally it is not
     difficult to repeat the data transfer  until  observing  a  timeout
     retransmission.
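
     The distinction drawn above can be sketched in a few lines.  This is
     only an illustration: the function name and the input format (the
     list of ack numbers the sender received before retransmitting) are
     assumptions, and the sketch does not filter out acks that repeat an
     ack number solely to update the offered window.

```python
def classify_retransmission(acks_before):
    """Classify a retransmission from the acks seen just before it.

    acks_before: ack numbers received by the sender, oldest first
    (hypothetical input format).  Three or more duplicates of the most
    recent ack suggest fast retransmit; fewer suggests a timeout.
    """
    last = None
    dups = 0
    for ack in acks_before:
        if ack == last:
            dups += 1          # same ack number again: a duplicate
        else:
            last, dups = ack, 0  # new ack number resets the count
    return "fast retransmit" if dups >= 3 else "timeout"
```

     For example, four identical acks (one original plus three
     duplicates) classify as "fast retransmit", while two identical acks
     classify as "timeout".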

     Once armed with a trace exhibiting a timeout retransmission, deter-
     mining  whether the TCP follows slow start is done by computing the
     correct progression of cwnd and comparing it to the amount of  data
     transmitted by the TCP subsequent to the timeout retransmission.
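
     A minimal sketch of that computation follows, assuming pure slow
     start (no ssthresh or congestion avoidance phase) and one MSS of
     growth per ack for new data.  The function name is hypothetical, and
     the third ack value in the usage note is inferred from the segment
     boundaries rather than shown in the trace.

```python
MSS = 512  # segment size used in the trace above


def slow_start_cwnd(acks, snd_una, mss=MSS):
    """Return cwnd after each ack, starting from cwnd = 1 MSS.

    acks: ack numbers in arrival order after the timeout retransmission.
    snd_una: highest sequence acknowledged before the first ack arrives.
    Only acks for new data advance cwnd; duplicates leave it unchanged.
    """
    cwnd = mss
    history = []
    for ack in acks:
        if ack > snd_una:
            cwnd += mss
            snd_una = ack
        history.append(cwnd)
    return history
```

     With snd_una = 461825 and acks 465921, 465921, 466945, the sketch
     yields 1024, 1024, 1536, matching the progression described for the
     correct trace.  A sender whose transmissions exceed these values
     right after a timeout is not following slow start.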


How to fix
     If the root problem is that the implementation lacks a notion of  a
     congestion  window,  then  unfortunately  this requires significant
     work to fix.  However, doing so is critical, for  reasons  outlined
     above.


3.3. Inconsistent retransmission

Category
     Reliability

Description
     If, for a given sequence number, a  sending  TCP  retransmits  dif-
     ferent  data  than previously sent for that sequence number, then a
     strong possibility arises that the receiving TCP will reconstruct a
     different  byte  stream  than that sent by the sending application,
     depending on which instance of  the  sequence  number  it  accepts.
     Such a sending TCP exhibits "Inconsistent retransmission".

Significance
     Critical.

Implications
     Reliable delivery of data is a fundamental property of TCP.

Relevant RFCs
     RFC 793, section 1.5, discusses the central role of reliability  in
     TCP operation.

Trace file demonstrating it
     Made using tcpdump/BPF recording at  the  receiving  TCP  (B).   No
     losses reported.

     12:35:53.145503 A > B: FP 90048435:90048461(26) ack 393464682 win 4096
                                          4500 0042 9644 0000
                      3006 e4c2 86b1 0401 83f3 010a b2a4 0015
                      055e 07b3 1773 cb6a 5019 1000 68a9 0000
     data starts here>504f 5254 2031 3334 2c31 3737*2c34 2c31
                      2c31 3738 2c31 3635 0d0a
     12:35:53.146479 B > A: R 393464682:393464682(0) win 8192
     12:35:53.851714 A > B: FP 90048429:90048463(34) ack 393464682 win 4096
                                          4500 004a 965b 0000
                      3006 e4a3 86b1 0401 83f3 010a b2a4 0015
                      055e 07ad 1773 cb6a 5019 1000 8bd3 0000
     data starts here>5041 5356 0d0a 504f 5254 2031 3334 2c31
                      3737*2c31 3035 2c31 3431 2c34 2c31 3539
                      0d0a

     The sequence numbers shown in  this  trace  are  absolute  and  not
     adjusted to reflect the ISN.  The 4-digit hex values show a dump of
     the packet's IP and TCP headers, as well as payload.  A first sends
     to  B  data  for  90048435:90048461.  The corresponding data begins
     with hex words 504f, 5254, etc.

     B responds with a RST.  Since the recording location was  local  to
     B, it is unknown whether A received the RST.

     A then sends 90048429:90048463, which includes six  sequence  posi-
     tions  below the earlier transmission, all 26 positions of the ear-
     lier transmission, and two additional sequence positions.

     The retransmission disagrees starting just after sequence 90048447,
     annotated  above  with  a leading '*'.  These two bytes were origi-
     nally transmitted as hex 2c34 but retransmitted as hex 2c31.   Sub-
     sequent positions disagree as well.
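
     The comparison can be reproduced mechanically.  The sketch below is
     an illustration only (the function name is an assumption); it takes
     the two payloads exactly as dumped in the trace and reports the
     first sequence number at which the overlapping bytes differ.

```python
def first_mismatch(seq_a, data_a, seq_b, data_b):
    """Return the first sequence number where two segments disagree.

    Only the overlapping sequence range is compared.  Returns None if
    the overlap is consistent (or empty).
    """
    start = max(seq_a, seq_b)
    end = min(seq_a + len(data_a), seq_b + len(data_b))
    for seq in range(start, end):
        if data_a[seq - seq_a] != data_b[seq - seq_b]:
            return seq
    return None


# Payloads copied from the first trace above.
FIRST = bytes.fromhex(
    "504f 5254 2031 3334 2c31 3737 2c34 2c31 2c31 3738 2c31 3635 0d0a")
RETRANS = bytes.fromhex(
    "5041 5356 0d0a 504f 5254 2031 3334 2c31 3737 2c31 3035 2c31 3431 "
    "2c34 2c31 3539 0d0a")

mismatch = first_mismatch(90048435, FIRST, 90048429, RETRANS)  # 90048448
```

     The result, 90048448, is the position just after sequence 90048447,
     in agreement with the annotation in the trace.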

     This behavior has been observed in other traces involving different
     hosts.  It is unknown how to repeat it.

     In this instance, no corruption would occur, since  B  has  already
     indicated it will not accept further packets from A.

     A second example illustrates a slightly different instance  of  the
     problem.   The  tracing  again  was  made  with  tcpdump/BPF at the
     receiving TCP (D).

     22:23:58.645829 C > D: P 185:212(27) ack 565 win 4096
                                          4500 0043 90a3 0000
                      3306 0734 cbf1 9eef 83f3 010a 0525 0015
                      a3a2 faba 578c 70a4 5018 1000 9a53 0000
     data starts here>504f 5254 2032 3033 2c32 3431 2c31 3538
                      2c32 3339 2c35 2c34 330d 0a
     22:23:58.646805 D > C: . ack 184 win 8192
                                          4500 0028 beeb 0000
                      3e06 ce06 83f3 010a cbf1 9eef 0015 0525
                      578c 70a4 a3a2 fab9 5010 2000 342f 0000
     22:31:36.532244 C > D: FP 186:213(27) ack 565 win 4096
                                          4500 0043 9435 0000
                      3306 03a2 cbf1 9eef 83f3 010a 0525 0015
                      a3a2 fabb 578c 70a4 5019 1000 9a51 0000
     data starts here>504f 5254 2032 3033 2c32 3431 2c31 3538
                      2c32 3339 2c35 2c34 330d 0a

     In this trace, sequence numbers are relative.  C sends 185:212, but
     D only sends an ack for 184 (so sequence number 184 is missing).  C
     then sends 186:213.  The packet payload is identical to the  previ-
     ous  payload, but the base sequence number is one higher, resulting
     in an inconsistent retransmission.

     Neither trace exhibits checksum errors.


Trace file demonstrating correct behavior
     (Omitted, as presumably correct behavior is obvious.)

References
     None known.

How to detect
     This problem unfortunately can be very difficult to  detect,  since
     available  experience  indicates  it is quite rare that it is mani-
     fested.  No "trigger" has been  identified  that  can  be  used  to
     reproduce the problem.

How to fix
     In the absence of a known "trigger", we cannot always assess how to
     fix the problem.

     In one implementation (not the one illustrated above), the  problem
     manifested  itself  when  (1) the sender received a zero window and
     stalled; (2) eventually an ack arrived that offered a window larger
     than  that  in  effect  at  the  time  of the stall; (3) the sender
     transmitted out of the buffer of data it held at the  time  of  the
     stall,  but (4) failed to limit this transfer to the buffer length,
     instead using the newly advertised  (and  larger)  offered  window.
     Consequently,  in  addition  to  the valid buffer contents, it sent
     whatever garbage values followed the end of the buffer.  If it then
     retransmitted  the corresponding sequence numbers, at that point it
     sent the correct data, resulting in an inconsistent retransmission.
     Note  that  this  instance  of  the problem reflects a more general
     problem, that of initially transmitting incorrect data.
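
     The missing bound in step (4) amounts to taking a minimum.  A
     minimal sketch, with a hypothetical function name, of the length
     computation that implementation should have performed:

```python
def send_length(buffered_len, offered_window):
    """Number of bytes that may be sent from the stalled buffer.

    The transfer is limited by BOTH the data actually held in the
    buffer and the window the peer now offers, never by the window
    alone.
    """
    return max(0, min(buffered_len, offered_window))
```

     With 1024 bytes buffered and a newly offered window of 4096, the
     sender may transmit only 1024 bytes; using 4096 would send whatever
     follows the buffer in memory.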


3.4. Failure to retain above-sequence data

Category
     Congestion control, performance

Description
     When a TCP receives an "above sequence" segment, meaning one with a
     sequence  number  exceeding  RCV.NXT  but below RCV.NXT+RCV.WND, it
     SHOULD queue the segment for later delivery (RFC  1122,  4.2.2.20).
     A  TCP  that  fails  to do so is said to exhibit "Failure to retain
     above-sequence data".

     It may sometimes be appropriate for a TCP to discard above-sequence
     data  to  reclaim memory.  If it does so only rarely, then we would
     not consider it to exhibit this problem.  Instead,  the  particular
     concern is with TCPs that always discard above-sequence data.


Significance
     Serious.

Implications
     In times of congestion, a failure  to  retain  above-sequence  data
     will lead to numerous otherwise-unnecessary retransmissions, aggra-
     vating the congestion and potentially  reducing  performance  by  a
     large factor.

Relevant RFCs
     RFC 1122 revises RFC 793 by upgrading the latter's MAY to a  SHOULD
     on this issue.

Trace file demonstrating it
     Made using tcpdump/BPF recording at the receiving TCP.   No  losses
     reported.

     B is the TCP sender, A the receiver.  A exhibits failure to  retain
     above sequence data:

     10:38:10.164860 B > A: . 221078:221614(536) ack 1 win 33232 [tos 0x8]
     10:38:10.170809 B > A: . 221614:222150(536) ack 1 win 33232 [tos 0x8]
     10:38:10.177183 B > A: . 222150:222686(536) ack 1 win 33232 [tos 0x8]
     10:38:10.225039 A > B: . ack 222686 win 25800

     Here B has sent up to (relative) sequence 222686 in-sequence, and A
     accordingly acknowledges.

     10:38:10.268131 B > A: . 223222:223758(536) ack 1 win 33232 [tos 0x8]
     10:38:10.337995 B > A: . 223758:224294(536) ack 1 win 33232 [tos 0x8]
     10:38:10.344065 B > A: . 224294:224830(536) ack 1 win 33232 [tos 0x8]
     10:38:10.350169 B > A: . 224830:225366(536) ack 1 win 33232 [tos 0x8]
     10:38:10.356362 B > A: . 225366:225902(536) ack 1 win 33232 [tos 0x8]
     10:38:10.362445 B > A: . 225902:226438(536) ack 1 win 33232 [tos 0x8]
     10:38:10.368579 B > A: . 226438:226974(536) ack 1 win 33232 [tos 0x8]
     10:38:10.374732 B > A: . 226974:227510(536) ack 1 win 33232 [tos 0x8]
     10:38:10.380825 B > A: . 227510:228046(536) ack 1 win 33232 [tos 0x8]
     10:38:10.387027 B > A: . 228046:228582(536) ack 1 win 33232 [tos 0x8]
     10:38:10.393053 B > A: . 228582:229118(536) ack 1 win 33232 [tos 0x8]
     10:38:10.399193 B > A: . 229118:229654(536) ack 1 win 33232 [tos 0x8]
     10:38:10.405356 B > A: . 229654:230190(536) ack 1 win 33232 [tos 0x8]

     A now receives 13 additional packets  from  B.   These  are  above-
     sequence because 222686:223222 was dropped.  The packets do however
     fit within the offered window of 25800.  A does  not  generate  any
     duplicate acks for them.

     The trace contributor (V. Paxson) verified that  these  13  packets
     had valid IP and TCP checksums.

     10:38:11.917728 B > A: . 222686:223222(536) ack 1 win 33232 [tos 0x8]
     10:38:11.930925 A > B: . ack 223222 win 32232

     B times out for 222686:223222 and retransmits it.   Upon  receiving
     it,  A  only acknowledges 223222.  Had it retained the valid above-
     sequence packets, it would instead have ack'd 230190.
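
     The difference between the two behaviors can be sketched with a toy
     receiver.  This is an illustration under simplifying assumptions,
     not any implementation's actual code: segments are assumed to align
     exactly on one another's boundaries, the window is held fixed, and
     the class and parameter names are hypothetical.

```python
class Receiver:
    """Toy TCP receiver tracking RCV.NXT and queued above-sequence data."""

    def __init__(self, rcv_nxt, rcv_wnd, retain=True):
        self.rcv_nxt = rcv_nxt
        self.rcv_wnd = rcv_wnd
        self.retain = retain   # False models the problem described here
        self.queued = {}       # start sequence -> end sequence

    def on_segment(self, seq, length):
        """Process one segment and return the resulting ack number."""
        end = seq + length
        if seq == self.rcv_nxt:
            # In-sequence: deliver it, then drain any queued segments
            # that have become contiguous.
            self.rcv_nxt = end
            while self.rcv_nxt in self.queued:
                self.rcv_nxt = self.queued.pop(self.rcv_nxt)
        elif self.retain and self.rcv_nxt < seq < self.rcv_nxt + self.rcv_wnd:
            # Above-sequence but within the window: queue for later.
            self.queued[seq] = end
        return self.rcv_nxt
```

     Feeding a retaining receiver (RCV.NXT 222686, window 25800) the 13
     above-sequence segments from the trace and then the retransmitted
     222686:223222 yields an ack of 230190.  With retain=False the same
     input yields only 223222, as A does in the trace.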

     10:38:12.048438 B > A: . 223222:223758(536) ack 1 win 33232 [tos 0x8]
     10:38:12.054397 B > A: . 223758:224294(536) ack 1 win 33232 [tos 0x8]
     10:38:12.068029 A > B: . ack 224294 win 31696

     B retransmits two more packets, and A only acknowledges them.  This
     pattern  continues  as  B retransmits the entire set of previously-
     received packets.

     A second trace confirmed that the problem is repeatable.


Trace file demonstrating correct behavior
     Made using tcpdump/BPF recording at  the  receiving  TCP  (C).   No
     losses reported.

     09:11:25.790417 D > C: . 33793:34305(512) ack 1 win 61440
     09:11:25.791393 D > C: . 34305:34817(512) ack 1 win 61440
     09:11:25.792369 D > C: . 34817:35329(512) ack 1 win 61440
     09:11:25.792369 D > C: . 35329:35841(512) ack 1 win 61440
     09:11:25.793345 D > C: . 36353:36865(512) ack 1 win 61440
     09:11:25.794321 C > D: . ack 35841 win 59904

     A sequence hole occurs because 35841:36353 has been dropped.

     09:11:25.794321 D > C: . 36865:37377(512) ack 1 win 61440
     09:11:25.794321 C > D: . ack 35841 win 59904
     09:11:25.795297 D > C: . 37377:37889(512) ack 1 win 61440
     09:11:25.795297 C > D: . ack 35841 win 59904
     09:11:25.796273 C > D: . ack 35841 win 61440
     09:11:25.798225 D > C: . 37889:38401(512) ack 1 win 61440
     09:11:25.799201 C > D: . ack 35841 win 61440
     09:11:25.807009 D > C: . 38401:38913(512) ack 1 win 61440
     09:11:25.807009 C > D: . ack 35841 win 61440
     09:11:25.884113 D > C: . 52737:53249(512) ack 1 win 61440
     09:11:25.884113 C > D: . ack 35841 win 61440

     Each additional, above-sequence packet C receives from D elicits  a
     duplicate ack for 35841.

     09:11:25.887041 D > C: . 35841:36353(512) ack 1 win 61440
     09:11:25.887041 C > D: . ack 53249 win 44032

     D retransmits 35841:36353 and C acknowledges receipt  of  data  all
     the way up to 53249.


References
     This problem is documented in [Paxson97].


How to detect
     Packet loss is common enough in the Internet that generally  it  is
     not  difficult  to  find  an Internet path that will result in some
     above-sequence packets arriving.  A TCP that exhibits  "Failure  to
     retain  ..."  may  not  generate  duplicate acks for these packets.
     However, some TCPs that do retain above-sequence data also  do  not
     generate  duplicate acks, so failure to do so does not definitively
     identify the problem.  Instead, the key observation is whether upon
     retransmission  of  the  dropped  packet,  data that was previously
     above-sequence is acknowledged.
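
     That observation reduces to a single comparison.  The helper below
     is a sketch with a hypothetical name; a True result is evidence
     that data was retained, while a False result for one retransmission
     is not by itself proof of the problem.

```python
def ack_covers_more_than_retransmission(retransmitted_end, next_ack):
    """True if the ack following a retransmission goes beyond it.

    retransmitted_end: sequence number just past the retransmitted
    segment.  next_ack: the ack the receiver sends in response.
    """
    return next_ack > retransmitted_end
```

     For the correct trace above (retransmission of 35841:36353 followed
     by ack 53249) the result is True; for the faulty trace
     (222686:223222 followed by ack 223222) it is False.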

     Two considerations in detecting this problem using a  packet  trace
     are  that  it  is  easiest  to  do  so with a trace made at the TCP
     receiver, in order to unambiguously determine which packets arrived
     successfully,  and  that  such  packets may still be correctly dis-
     carded if they arrive with checksum  errors.   The  latter  can  be
     tested  by  capturing the entire packet contents and performing the
     IP and TCP checksum algorithms to verify  their  integrity;  or  by
     confirming  that the packets arrive with the same checksum and con-
     tents as that with which they were sent, with  a  presumption  that
     the  sending  TCP correctly calculates checksums for the packets it
     transmits.
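
     As an illustration of the first approach, the sketch below applies
     the standard Internet checksum (ones-complement sum of 16-bit words)
     to an IP header.  The function name is an assumption, and only the
     IP header check is shown; verifying the TCP checksum additionally
     requires the pseudo-header and the full payload.

```python
def internet_checksum_ok(data):
    """Verify a block that includes its own Internet checksum field.

    Sums the data as big-endian 16-bit words with end-around carry.
    A valid block sums to 0xFFFF.
    """
    if len(data) % 2:
        data += b"\x00"            # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:             # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


# IP header of the first packet in the trace for section 3.3.
header = bytes.fromhex("4500 0042 9644 0000 3006 e4c2 86b1 0401 83f3 010a")
valid = internet_checksum_ok(header)  # True
```

     Altering any single byte of the header makes the check fail, which
     is how a corrupted but otherwise plausible packet can be told apart
     from one the receiver should have retained.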

     It is considerably easier to verify that an implementation does NOT
     exhibit this problem.  This can be done by recording a trace at the
     data sender, and observing that sometimes  after  a  retransmission
     the  receiver  acknowledges a higher sequence number than just that
     which was retransmitted.


How to fix
     If the root problem is that the implementation lacks  a buffer  for
     above-sequence  data,  then  unfortunately this requires significant
     work to fix.  However, doing so is important, for reasons  outlined
     above.


4. Security Considerations

   This version of this  memo  does  not  discuss  any  security-related
   implementation problems.  Future versions most likely will, so  secu-
   rity considerations will require revisiting.


5. Acknowledgements

   Thanks to numerous correspondents on the tcp-impl  mailing  list  for
   their  input: Steve Alexander, Mark Allman, Larry Backman, Jerry Chu,
   Alan Cox, Kevin Fall, Richard Fox, Jim Gettys,  Rick  Jones,  Allison
   Mankin,  Perry  Metzger, der Mouse, Thomas Narten, Andras Olah, Steve
   Parker, Francesco Potorti`, Luigi Rizzo,  Allyn  Romanow,  Al  Smith,
   Jerry Toporek, Joe Touch, and Curtis Villamizar.


6. References


[Jacobson88]
     V. Jacobson, "Congestion Avoidance and Control," Proc. SIGCOMM '88.
     ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z

[Paxson97]
     V. Paxson, "Automated Packet  Trace  Analysis  of  TCP  Implementa-
     tions," available in draft form from vern@ee.lbl.gov, Feb. 1997.

[RFC896]
     J. Nagle, "Congestion Control in IP/TCP Internetworks," Jan. 1984.

[RFC1122]
     R. Braden, Editor, "Requirements for Internet Hosts  --  Communica-
     tion Layers," Oct. 1989.

[RFC2001]
     W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast Retransmit,
     and Fast Recovery Algorithms," Jan. 1997.


7. Author's Address

   Vern Paxson <vern@ee.lbl.gov>
   Network Research Group
   Lawrence Berkeley National Laboratory
   Berkeley, CA 94720
   USA
   Phone: +1 510/486-7504

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 27 12:57:20 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA23783 for tcp-impl-list; Thu, 27 Mar 1997 12:54:59 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA23778 for <tcp-impl@relay.engr.SGI.COM>; Thu, 27 Mar 1997 12:54:57 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA02677 for <tcp-impl@relay.engr.SGI.COM>; Thu, 27 Mar 1997 12:54:56 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA23383>; Thu, 27 Mar 1997 12:51:02 -0800
Date: Thu, 27 Mar 1997 12:51:01 -0800
Posted-Date: Thu, 27 Mar 1997 12:51:01 -0800
Message-Id: <199703272051.AA14215@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA14215>; Thu, 27 Mar 1997 12:51:01 -0800
To: tcp-impl@relay.engr.SGI.COM, vern@ee.lbl.gov
Subject: Re: Internet Draft on Known TCP Implementation Problems
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi,

Some suggestions (minor):

> From: Vern Paxson <vern@ee.lbl.gov>
> 
> So far my leaning is to include traces directly in the document, rather
> than using URLs to on-line versions.  This is because they so far haven't
> been big enough to bloat the doc, and it's handy having them right there
> in the middle of the discussion.

URLs are much more transient than RFCs; they can be used to
augment the discussion, but should not be used to replace
critical information, because they are not archival-quality
references.

(this has been the general consensus I've seen in the RFC/IDs,
perhaps enforced by the editors, but it seems reasonable to me)

>    Category
     ^^^^^^^^ - change to "Class?"
>         One or more problem categories for which the problem is  classi-
>         fied.   Categories  used  so far: "congestion control", "perfor-
>         mance", "reliability".  Others anticipated: "security", "intero-
>         perability", "configuration".

>    Significance
>         A quanitification as to how serious the problem  is  considered.
>         Categories are "Non-critical", "Serious", and "Critical".
          ^^^^^^^^^^

Both descriptions should include brief definitions - 
	for "Category," to aid in understanding

	for Significance, to describe the semantics
		i.e., Critical = REQUIRED (?)
		Serious = RECOMMENDED (?)
		Non-critical = OPTIONAL (?)

		ps - the REQ/RECC/OPT have an RFC/ID history
		and common use - is there a benefit to new
		names?

Finally, the use of the term "category" is overloaded - it
appears to mean both "problem group" and "significance semantics".
Maybe "Class" or "Type" is better for problem group??

PS - nice work...

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 27 14:19:39 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA11388 for tcp-impl-list; Thu, 27 Mar 1997 14:15:35 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA11377 for <tcp-impl@relay.engr.SGI.COM>; Thu, 27 Mar 1997 14:15:32 -0800
Received: from regin.dna.lth.se (regin.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA21843 for <tcp-impl@relay.engr.SGI.COM>; Thu, 27 Mar 1997 14:15:26 -0800
Received: from regin.dna.lth.se (localhost [127.0.0.1])
	by regin.dna.lth.se (8.8.5/8.8.5) with ESMTP id WAA03127;
	Thu, 27 Mar 1997 22:35:35 +0100
Message-Id: <199703272135.WAA03127@regin.dna.lth.se>
To: tcp-impl@relay.engr.SGI.COM
cc: Eric.Schenk@dna.lth.se
From: Eric.Schenk@dna.lth.se
In-reply-to: Your message of "Thu, 27 Mar 1997 18:36:21 GMT."
             <5743.wsimpson@greendragon.com> 
Subject: Re: OT 1.1.2 trace -- delayed Ack 
Date: Thu, 27 Mar 1997 22:35:35 +0100
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

"William Allen Simpson" <wsimpson@greendragon.com> writes:
>In regards to both queries, you seem to have missed that these
>discussions are all in the sections on delayed Ack.  So, why did they
>discuss holding time, RTT, inter-packet arrival time?  All are applied
>to delayed Ack.

Yes, it is in the section on delayed ACK, but in that section it
is located in the discussion that starts out:

         We will assume that sender of the data uses the optional  algorithm
    described  in  the  TCP  specification,  in which the roundtrip delay is
    measured using an exponential decay smoothing algorithm.  Retransmission
    of a segment occurs if the measured delay for that segment  exceeds  the
    smoothed  average  by  some  factor.  To see how retransmission might be
    triggered, one must consider the pattern  of  segment  arrivals  at  the
    receiver.

The remainder of the section goes on to discuss the conditions under which
a retransmission is triggered. This discusses how the RTO is computed
on the sender side, and what factors should be taken into account,
on the sender side, so that the RTO is not too short; this includes
taking into account how much the receiver delays the ACK and what
the holding time is. This discussion extends until the end of the
section.
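
For reference, the sender-side computation under discussion can be
sketched as follows.  The formulas are the smoothing and timeout
expressions from RFC 793; the constants are example values chosen from
the ranges that RFC suggests, and the function names are hypothetical.

```python
ALPHA = 0.875   # smoothing factor (RFC 793 suggests 0.8 to 0.9)
BETA = 2.0      # delay variance factor (RFC 793 suggests 1.3 to 2.0)
LBOUND = 1.0    # lower bound on the timeout, in seconds (example value)
UBOUND = 60.0   # upper bound on the timeout, in seconds (example value)


def update_srtt(srtt, rtt_sample):
    """SRTT = ALPHA * SRTT + (1 - ALPHA) * RTT."""
    return ALPHA * srtt + (1 - ALPHA) * rtt_sample


def rto(srtt):
    """RTO = min(UBOUND, max(LBOUND, BETA * SRTT))."""
    return min(UBOUND, max(LBOUND, BETA * srtt))
```

With a smoothed RTT of 1.0 s and a new sample of 2.0 s (for instance, a
sample inflated by a delayed ack), the smoothed value becomes 1.125 s
and the timeout 2.25 s.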

Packet interarrival time is discussed somewhat earlier in that section:

         This algorithm will insure that the timer, although set, is  seldom
    used.    The  interval  of  the  timer is related to the expected inter-
    segment delay, which is in turn a function  of  the  particular  network
    through  which  the  data  is  flowing.    For the Arpanet, a reasonable
    interval seems to be 200 to 300 milliseconds.  Appendix A  describes  an
    adaptive algorithm for measuring this delay.

I can find no support in the RFC for placing a lower bound on the timeout,
beyond the discussion that notes that we prefer it if the timeout for
a delayed ack never gets fired. On the other hand, we should also not
delay an ack excessively, since that can cause an unnecessary retransmission
by the sender. I'll come back to this below.

>> Note that you on a pure receiving link you have no RTT measure.
>> If you use the initial srtt and mdev settings on your link you are
>> essentially setting the ATO to a constant on a pure receiver.
>>
>Yes (yielding the ceiling value), and the problem is?

I'll address this below.

>And "pure receiver" happens in which protocol?  TCP always provides at
>least one round trip estimate.

This is not always true. If you have to retransmit your SYN packet you will
have no valid estimate (unless you are using timestamps). Also, if the
function describing your transmission time is dominated by
the length of the packet you are sending (for example if you are
sending over a PPP link), then even if you do get a valid RTT measure
on your SYN handshake it will be much lower than the actual RTT on the
link. On a link with an MTU of 1500 this will be out by a factor
of around 37. This is less of an issue if you are using low granularity
timers, but I think it would be wise to avoid setting any new
requirements that assume that timers only have a resolution of 500 ms.
Now, at this point one might be tempted to argue that this supports placing
a lower bound on the ATO (Ack Timeout), say of 100ms. I think this is
a mistake.
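
The factor of around 37 can be reproduced with a quick calculation.
This sketch assumes the SYN is a 40-byte packet (20-byte IP header plus
20-byte TCP header, no options) and that transmission time is
proportional to packet length; both are assumptions for illustration.

```python
def serialization_ratio(mtu=1500, syn_bytes=40):
    """Ratio of full-size packet transmission time to SYN transmission time.

    On a link where transmission time is dominated by packet length,
    an RTT measured on the SYN handshake underestimates the RTT of
    full-size segments by roughly this factor.
    """
    return mtu / syn_bytes


ratio = serialization_ratio()  # 37.5
```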

>Remember, it is always OK to have a larger value, as that is the intent
>-- the timer should rarely be used, other events should happen sooner.

If the timer will never get used, don't bother setting it. The reality
is that the timer will get fired, and in fact there are circumstances
in which we want it to get fired.

>All these strategies are to find an acceptable _lower_ bound for more
>interactive traffic.

I disagree, I feel there is no acceptable lower bound. I have two
reasons.

First, delaying ACK's distorts the RTT calculation on the
sender side. If the delay is less than the granularity with which
the sender measures RTT, then the distortion is minimal, although
we can still get some distortion when a delay causes the RTT
measure to increase by 1. If the delay is larger than the
granularity with which the sender measures the RTT, the distortion
is more significant. This leads to an inflated estimate of the RTT
on the sender. On a very high speed network this can be many times
larger than the real round trip time, leading to long fallow periods (relative
to the speed of the transmission medium) during which we are
simply waiting for a retransmission timeout to occur.

Second, any time we actually fire the delayed ack timer, due to a
pause in the stream of packets in the sender, the sooner we fire it,
the better. This is because if we wait too long to fire it, the sender
might conclude that a packet has been lost, and it should now resend,
when in fact no packet has been lost. The problem is worse if
you happen to lose some of the ACKs preceding the one you are currently
delaying. Again, this is mostly a non-issue if you happen to measure RTT
with a granularity of 500 ms. In that case you never retransmit before 500 ms.
If the ATO happens to hit the ceiling of 500 ms then you will still get this
spurious retransmission, but not otherwise. Note that this problem will
almost never happen if you fix the ATO to 200 ms as done in BSD (it can
only happen if some preceding ACKs got lost).

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 28 05:11:53 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA18315 for tcp-impl-list; Fri, 28 Mar 1997 05:10:31 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA18310 for <tcp-impl@relay.engr.SGI.COM>; Fri, 28 Mar 1997 05:10:28 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id FAA07576 for <tcp-impl@relay.engr.SGI.COM>; Fri, 28 Mar 1997 05:10:23 -0800
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id IAA13523; Fri, 28 Mar 1997 08:06:32 -0500 (EST)
Date: Fri, 28 Mar 1997 08:06:32 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199703281306.IAA13523@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: PSH
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Looking at the trace I sent, you will note that message 1 packet #3
> is shorter than MSS and has the PSH set, while message 2 packet #2 is
> shorter than MSS and does _not_ have the PSH set.

> In both cases, this occurs at the CRLF CRLF between the SMTP headers
> and the body.  Presumably, multiple application calls to the stack.

> In either case, had the API allowed the application to set PSH,

...nothing in particular would have happened, even if the application
had chosen to take advantage of it.  To cure this particular lossage,
you need to (a) have the stack _not_ set PSH at the end of every
application write, and (b) if the stack has a timeout that would make
it send non-PSHed data anyway, ensure that the application does the
first body write soon enough that this timeout doesn't trip.

The case of message 2 packet #2 is stranger because of the missing PSH;
if the stack sends a less-than-MSS packet without PSH, then either it's
sending into a restricted window[%] or it's broken because it drained
its available data without setting PSH...assuming of course that
something (such as path MTU discovery) hasn't caused it to lower its
effective MSS for the connection.  I'm considering only the case of
message 1 packet #3 here, and I don't have the trace itself at hand.

[%] Even if the sending stack does silly window avoidance, such packets
    can be generated; if the peer insists on advertising a tiny window,
    you've got to eventually decide it's not being silly and send a
    tiny segment into it.

I will note that if the stack _does_ time out and send the last header
packet on its own initiative, this can cause the time between that and
the first body packet to be arbitrarily short, because the application
can write the first body data an arbitrarily short time after this
timeout occurs.

> This is a documented case where the lack of a PSH ([...]) caused
> additional network packet load and concomitant additional
> retransmissions.

More likely it's the stack forcing a PSH at the end of each application
write; less likely than that, it could have been the application being
slow.  Allowing the application layer to set PSH wouldn't help unless
accompanied by having the stack not supply a PSH at the end of every
application data write.

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 28 05:29:14 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA19519 for tcp-impl-list; Fri, 28 Mar 1997 05:27:52 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA19511 for <tcp-impl@relay.engr.SGI.COM>; Fri, 28 Mar 1997 05:27:50 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id FAA09926 for <tcp-impl@relay.engr.SGI.COM>; Fri, 28 Mar 1997 05:27:44 -0800
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id IAA13551; Fri, 28 Mar 1997 08:23:54 -0500 (EST)
Date: Fri, 28 Mar 1997 08:23:54 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199703281323.IAA13551@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: OT 1.1.2 trace -- delayed Ack
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

One thing that everyone seems to be missing about this discussion:

>>> Fri Feb 28 22:00:24 1997 - e0 recv:
>>> [...ACK-only packet...]
>>> *** Note that the Ack (above) with no data was immediately followed by
>>>     data (below).  Must not be using delayed Ack, or delay too short.
>>> Fri Feb 28 22:00:24 1997 - e0 recv:
>>> [...ACK PSH data packet...]

>> you are concluding that because the data follows the ACK within a
>> second that this is indicative of insufficient delayed ACK.

> My text concluded that, since your Ack is immediately followed by
> your _own_ application data (no matter how long or short the time),
> that either you are not using delayed Ack _or_ the delay is "too
> short".

("[N]o matter how long or short the time" is clearly ridiculous; if the
delta between those two timestamps had been an hour and a half, nobody
(hopefully not even you) would have taken that as evidence of
insufficiently delayed ACKs.)

More importantly, _that time difference is irrelevant_.  The time delta
that matters is the delta between the ACK-only packet and some
_previous_ event.  No matter what the delayed-ACK delay, it is possible
for the delayed ACK to be finally sent and then some other event (eg,
application write) to happen that causes another packet to be generated
right on the heels of the (delayed but finally sent) ACK-only packet.
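
A minimal sketch of this point, assuming a hypothetical receiver with a
fixed delayed-ACK timer.  The 200 ms value and all function names are
illustrative assumptions, not taken from the trace or from any stack:

```c
#include <assert.h>

/* Hypothetical delayed-ACK timer value in milliseconds (an assumption). */
#define DELACK_MS 200L

/* Time at which a pure ACK for data received at data_rx_ms is sent,
 * assuming nothing else is transmitted before the timer fires. */
long delayed_ack_tx_ms(long data_rx_ms)
{
    return data_rx_ms + DELACK_MS;
}

/* How long the ACK was held back: measured from the *previous* event
 * (the data arrival), not from whatever packet follows the ACK. */
long ack_delay_ms(long data_rx_ms, long ack_tx_ms)
{
    assert(ack_tx_ms >= data_rx_ms);
    return ack_tx_ms - data_rx_ms;
}

/* Gap between the ACK-only packet and the next data packet.  It depends
 * only on when the application happens to write. */
long gap_after_ack_ms(long ack_tx_ms, long app_write_ms)
{
    return app_write_ms - ack_tx_ms;
}
```

With data arriving at t=1000 the ACK leaves at t=1200; an application
write at t=1201 puts a data packet 1 ms behind the ACK even though the
ACK was delayed the full 200 ms.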

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 29 22:40:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA00873 for tcp-impl-list; Sat, 29 Mar 1997 21:12:02 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA00796 for <tcp-impl@relay.engr.SGI.COM>; Sat, 29 Mar 1997 21:10:14 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id VAA24415 for <tcp-impl@relay.engr.SGI.COM>; Sat, 29 Mar 1997 21:10:12 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id VAA16073; Sat, 29 Mar 1997 21:00:15 -0800 (PST)
Message-Id: <199703300500.VAA16073@daffy.ee.lbl.gov>
To: touch@ISI.EDU
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: Internet Draft on Known TCP Implementation Problems
In-reply-to: Your message of Thu, 27 Mar 1997 12:51:01 PST.
Date: Sat, 29 Mar 1997 21:00:15 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Some suggestions (minor):

Thanks ...

> URLs are much more transient than RFCs ...

Yes, good point.  There's been some discussion (not on tcp-impl) of having
an "official" IETF Web page for WG documents, so that's where these would
go.  Presumably those would be archival-quality references.  I think at
least so far that just including the traces in the RFC draft itself works
well, so I plan to stick with that if possible.

> >    Category
>      ^^^^^^^^ - change to "Class?"

How about "Classification"?

> Both descriptions should include brief definitions - 

Will do.

> 	for Significance, to describe the semantics
> 		i.e., Critical = REQUIRED (?)
> 		Serious = RECOMMENDED (?)
> 		Non-critical = OPTIONAL (?)

Critical/Serious/Non-critical was suggested as reflecting categories used
by some bug-tracking systems.  I view a problem's significance as different
from required/recommended/optional, as most problems we document are
already specified in these terms in existing RFCs.  Instead, I was thinking
that the significance given in this RFC means: "here's the rough consensus
as to how serious this problem is", as guidance to implementors as to where
they should concentrate their efforts in testing and fixing their TCPs.

In that light,

	Critical = this is a major problem, very high priority to fix
	Serious = do not ignore this problem, fix it as soon as you've
		taken care of any Critical problems
	Non-critical = while a definite problem, this one is not a big
		deal - fix when convenient

> Finally, the use of the term "category" is overloaded - it
> appears to mean both "problem group" and "significance semantics".
> Maybe "Class" or "Type" is better for problem group??

I take it this is now fixed by using "Classification" instead of "Category"
above, let me know if that's not right.

Thanks again for the feedback, much appreciated!

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 31 09:54:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA27795 for tcp-impl-list; Mon, 31 Mar 1997 09:52:07 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA27784 for <tcp-impl@relay.engr.SGI.COM>; Mon, 31 Mar 1997 09:52:06 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA01818 for <tcp-impl@relay.engr.SGI.COM>; Mon, 31 Mar 1997 09:52:04 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA19125>; Mon, 31 Mar 1997 09:48:08 -0800
Date: Mon, 31 Mar 1997 09:48:07 -0800
Posted-Date: Mon, 31 Mar 1997 09:48:07 -0800
Message-Id: <199703311748.AA01901@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA01901>; Mon, 31 Mar 1997 09:48:07 -0800
To: touch@ISI.EDU, vern@ee.lbl.gov
Subject: Re: Internet Draft on Known TCP Implementation Problems
Cc: tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Vern Paxson <vern@ee.lbl.gov>
> 
> > >    Category
> >      ^^^^^^^^ - change to "Class?"
> 
> How about "Classification"?

Sure.

> > Both descriptions should include brief definitions - 
> 
> Will do.
> 
> > 	for Significance, to describe the semantics
> > 		i.e., Critical = REQUIRED (?)
> > 		Serious = RECOMMENDED (?)
> > 		Non-critical = OPTIONAL (?)
> 
> 	Critical = this is a major problem, very high priority to fix
> 	Serious = do not ignore this problem, fix it as soon as you've
> 		taken care of any Critical problems
> 	Non-critical = while a definite problem, this one is not a big
> 		deal - fix when convenient

This presumes that all bugs MUST be fixed. Given that these
are recommendations, I'd like some hint at:

	MUST fix this to be correct

	SHOULD fix this for performance reasons or to avoid *your* lockup

	MAY fix this to help everyone out and help yourself in the process
		but we really can't say definitively that it's the
		only way to solve the problem.

I don't know how to effect this, but would prefer
some "strength" categories, to tell implementers when
they hit the knee in the curve...

Joe
 
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 31 11:42:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA03773 for tcp-impl-list; Mon, 31 Mar 1997 11:40:05 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA03753 for <tcp-impl@relay.engr.SGI.COM>; Mon, 31 Mar 1997 11:40:02 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA00037 for <tcp-impl@relay.engr.SGI.COM>; Mon, 31 Mar 1997 11:39:58 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA17233; Mon, 31 Mar 1997 11:29:31 -0800 (PST)
Message-Id: <199703311929.LAA17233@daffy.ee.lbl.gov>
To: touch@ISI.EDU
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: Internet Draft on Known TCP Implementation Problems
In-reply-to: Your message of Mon, 31 Mar 1997 09:48:07 PST.
Date: Mon, 31 Mar 1997 11:29:31 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> This presumes that all bugs MUST be fixed.

Not "MUST" in the RFC sense - I view this document as turning into a BCP,
not a standard.  Many of the problems already are MUSTs in the standard
sense because of existing RFCs.  But some of these are less serious than
others.  The idea here is to convey a sense of urgency and to aid in
prioritizing efforts.

> 	MUST fix this to be correct

My thinking is instead: "Critical that you fix this: you are hammering the
Internet with avoidable congestion; or you are breaking TCP reliability; or
you often won't interoperate; or you are open to a massive security hole,
or ..."

> 	MAY fix this to help everyone out and help yourself in the process
> 		but we really can't say definitively that it's the
> 		only way to solve the problem.

This document is primarily about identifying problems, with some discussion
of how to solve problems when there's collective wisdom/experience from
the WG.  So I don't see the above, with an emphasis on solving the problem,
as fitting with that notion.  It would though be a natural caveat to go with
the section in the description discussing how to fix the problem.

Similarly, I don't see us documenting problems for which we don't find
rough consensus that the behavior is indeed a problem.

> I don't know how to effect this, but would prefer
> some "strength" categories, to tell implementers when
> they hit the knee in the curve...

Right, that's what I'm striving for with critical/serious/non-critical ...

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 31 13:22:23 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA00910 for tcp-impl-list; Mon, 31 Mar 1997 13:20:53 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA00900 for <tcp-impl@relay.engr.SGI.COM>; Mon, 31 Mar 1997 13:20:51 -0800
Received: from mail1.digital.com (mail1.digital.com [204.123.2.50]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA27400 for <tcp-impl@relay.engr.SGI.COM>; Mon, 31 Mar 1997 13:20:50 -0800
Received: from pachyderm.pa.dec.com by mail1.digital.com (5.65 EXP 4/12/95 for V3.2/1.0/WV)
	id AA08090; Mon, 31 Mar 1997 13:13:09 -0800
Received: by pachyderm.pa.dec.com; id AA08907; Mon, 31 Mar 1997 13:13:22 -0800
Date: Mon, 31 Mar 1997 13:13:22 -0800
From: jg@pa.dec.com (Jim Gettys)
Message-Id: <9703312113.AA08907@pachyderm.pa.dec.com>
X-Mailer: Pachyderm (client tunsrv2-tunnel.imc.das.dec.com)
To: Curtis Villamizar <curtis@ans.net>
Cc: touch@ISI.EDU, F.Potorti@cnuce.cnr.it, tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers 
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

It is the client that needs control here, not the server. It is the
client that needs control over latency, which is application dependent
(e.g. a browser versus a web crawler).

I'm explicitly asking for clients to be able to control the window size, 
only possible to do indirectly today by messing with the socket buffer 
size, which isn't the same thing as being able to control the window size.
				- Jim
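
For concreteness, the indirect approach looks roughly like the sketch
below on a BSD-style sockets API.  The helper name
tcp_socket_with_rcvbuf is hypothetical.  The buffer size only bounds
the window the stack can advertise, and some kernels (Linux, for one)
report back a larger value than was requested:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Ask for a receive buffer of `bytes` on a new TCP socket and report
 * what the kernel actually granted via *granted.  Returns the socket
 * descriptor, or -1 on error.  This is not a direct window control. */
int tcp_socket_with_rcvbuf(int bytes, int *granted)
{
    int s;
    socklen_t len = sizeof(*granted);

    assert(granted != NULL);
    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    /* Typically set before connect(), so that the value can influence
     * the window offered when the connection is set up. */
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0 ||
        getsockopt(s, SOL_SOCKET, SO_RCVBUF, granted, &len) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```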



From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 31 20:35:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA23300 for tcp-impl-list; Mon, 31 Mar 1997 20:33:16 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA23295 for <tcp-impl@relay.engr.SGI.COM>; Mon, 31 Mar 1997 20:33:14 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA28621 for <tcp-impl@relay.engr.SGI.COM>; Mon, 31 Mar 1997 20:33:08 -0800
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id XAA26760; Mon, 31 Mar 1997 23:27:20 -0500 (EST)
Message-Id: <199704010427.XAA26760@brookfield.ans.net>
To: jg@pa.dec.com (Jim Gettys)
cc: Curtis Villamizar <curtis@ans.net>, touch@ISI.EDU, F.Potorti@cnuce.cnr.it,
        tcp-impl@relay.engr.SGI.COM
Reply-To: curtis@ans.net
Subject: Re: TCP buffers 
In-reply-to: Your message of "Mon, 31 Mar 1997 13:13:22 PST."
             <9703312113.AA08907@pachyderm.pa.dec.com> 
Date: Mon, 31 Mar 1997 23:27:20 -0500
From: Curtis Villamizar <curtis@ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <9703312113.AA08907@pachyderm.pa.dec.com>, Jim Gettys writes:
> It is the client that needs control here, not the server. It is the
> client that needs control over latency, which is application dependent
> (i.e. a browser versus a web crawler).
> 
> I'm explicitly asking for clients to be able to control the window size, 
> only possible to do indirectly today by messing with the socket buffer 
> size, which isn't the same thing as being able to control the window size.
> 				- Jim


Jim,

HTTP is a bulk transfer and what you are asking for seems to be an
artificially slowed bulk transfer but with no sound reasoning for
wanting it.  If the reason is so as not to affect more interactive
activity, then there are much better ways of accomplishing that goal
that involve no changes to either the client or server.

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 31 21:57:29 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA01865 for tcp-impl-list; Mon, 31 Mar 1997 21:55:57 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA01861 for <tcp-impl@relay.engr.SGI.COM>; Mon, 31 Mar 1997 21:55:54 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id VAA09512 for <tcp-impl@relay.engr.SGI.COM>; Mon, 31 Mar 1997 21:55:51 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA02532>; Mon, 31 Mar 1997 21:52:04 -0800
Date: Mon, 31 Mar 1997 21:52:03 -0800
Posted-Date: Mon, 31 Mar 1997 21:52:03 -0800
Message-Id: <199704010552.AA20163@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA20163>; Mon, 31 Mar 1997 21:52:03 -0800
To: curtis@ans.net, jg@pa.dec.com
Subject: Re: TCP buffers
Cc: F.Potorti@cnuce.cnr.it, tcp-impl@relay.engr.SGI.COM, touch@ISI.EDU
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Date: Mon, 31 Mar 1997 23:27:20 -0500
> From: Curtis Villamizar <curtis@ans.net>
> 
> In message <9703312113.AA08907@pachyderm.pa.dec.com>, Jim Gettys writes:
> > I'm explicitly asking for clients to be able to control the window size, 
> > only possible to do indirectly today by messing with the socket buffer 
> > size, which isn't the same thing as being able to control the window size.

> HTTP is a bulk transfer and what you are asking for seems to be an

HTTP is an application layer protocol, and
indicates the request/response only; it presumes
an underlying reliable transport protocol.

TCP, the transport protocol that HTTP runs over, is
stream oriented (not quite bulk - it's designed for
both interactive and bulk transfer).

Joe


----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr  1 08:04:14 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA03136 for tcp-impl-list; Tue, 1 Apr 1997 08:02:31 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA03127 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 08:02:26 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA12517 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 08:02:19 -0800
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id KAA23362; Tue, 1 Apr 1997 10:58:21 -0500 (EST)
Date: Tue, 1 Apr 1997 10:58:21 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199704011558.KAA23362@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Keep-Alive size
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>> perhaps the correct wording should have been "resend the item with
>> the last acked sequence number -- be it a data byte or a SYN.

>> However another problem comes to mind -- the sender might not have
>> the "last byte" available anymore since, once acked, data can be
>> flushed  [...]

Gee, I guess a stack that wants to use this technique for keepalives
will have to keep it around, eh?

Or, since such a keepalive will always be outside the window (since it
duplicates a sequence number that's already been acked - if you have
unacked data outstanding, keepalives aren't even an issue), you can use
any value you please for the single data byte, as you suggest:

>> One can send a random byte of course, but then the receiver might
>> really become suspicious....

Suspicious?  I hardly think that, if we recommend doing this, being
suspicious of seeing such packets is justified. :-)
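
A rough sketch of such a probe, assuming the sender tracks snd_una (the
oldest unacknowledged sequence number).  The struct and function names
are hypothetical and describe the idea, not a wire format:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative description of a keepalive probe. */
struct ka_probe {
    uint32_t seq;      /* sequence number carried by the probe */
    uint32_t len;      /* payload length: one byte */
    uint8_t  payload;  /* the single data byte; its value is arbitrary */
};

/* Build a probe that re-sends the sequence number just below snd_una.
 * Unsigned arithmetic wraps modulo 2^32, as TCP sequence space does. */
struct ka_probe make_keepalive(uint32_t snd_una)
{
    struct ka_probe p;

    p.seq = snd_una - 1u;
    p.len = 1;
    p.payload = 0;
    return p;
}

/* True if every byte of the probe lies below rcv_nxt, i.e. the
 * receiver has already accepted all of it. */
int probe_is_old_data(const struct ka_probe *p, uint32_t rcv_nxt)
{
    assert(p != NULL);
    return (int32_t)(p->seq + p->len - rcv_nxt) <= 0;
}
```

On an idle connection rcv_nxt at the peer equals snd_una here, so the
probe is wholly old data and the payload value never reaches the
application.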

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr  1 08:09:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA04494 for tcp-impl-list; Tue, 1 Apr 1997 08:08:45 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA04477 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 08:08:43 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA14705 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 08:08:37 -0800
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id LAA23394; Tue, 1 Apr 1997 11:04:48 -0500 (EST)
Date: Tue, 1 Apr 1997 11:04:48 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199704011604.LAA23394@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Keep-Alive size
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I'm seeing implementations which send the first byte of pending data
> in a keepalive [...]

Perhaps I've misunderstood.  I've been operating on the assumption that
"keepalive" refers to something akin to the BSD SO_KEEPALIVE option, a
way for one TCP stack to ensure that if the peer crashes (or the
network goes away, which is indistinguishable), it notices, even in the
absence of any traffic to send.  As such, keepalives are not an issue
if there is any pending data.
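
For reference, turning that option on looks roughly like this.  The two
helper names are hypothetical; setsockopt() and getsockopt() with
SO_KEEPALIVE are the standard calls.  Probe timing is left to the stack
and is not set here:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Turn on BSD-style keepalives for socket s.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int s)
{
    int on = 1;

    assert(s >= 0);
    return setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Read the option back: 1 if enabled, 0 if disabled, -1 on error. */
int keepalive_enabled(int s)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &val, &len) < 0)
        return -1;
    return val != 0;
}
```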

The above quote, though, certainly makes it appear that not everyone
shares this interpretation of the term.  So, it looks as though we
first need to settle the question:

	What's a keepalive?

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr  1 10:59:22 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA12944 for tcp-impl-list; Tue, 1 Apr 1997 10:56:26 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA12922 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 10:56:23 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA25651 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 10:56:13 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id TAA04551; Tue, 1 Apr 1997 19:51:27 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wC7BZ-0005FHC; Tue, 1 Apr 97 18:17 BST
Message-Id: <m0wC7BZ-0005FHC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Internet Draft on Known TCP Implementation Problems
To: vern@ee.lbl.gov (Vern Paxson)
Date: Tue, 1 Apr 1997 18:17:29 +0100 (BST)
Cc: touch@ISI.EDU, tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199703311929.LAA17233@daffy.ee.lbl.gov> from "Vern Paxson" at Mar 31, 97 11:29:31 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > This presumes that all bugs MUST be fixed.
> Not "MUST" in the RFC sense - I view this document as turning into a BCP,
> not a standard.  Many of the problems already are MUSTs in the standard
> sense because of existing RFCs.  But some of these are less serious than
> others.  The idea here is to convey a sense of urgency and to aid in
> prioritizing efforts.

For the document to have the desired effect it needs the clear MUST/SHALL
notion so that end users can ask vendors "Are you XYZ compliant", and write
working TCP into their tenders.

Alan

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr  1 10:59:25 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA13346 for tcp-impl-list; Tue, 1 Apr 1997 10:57:27 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA13324 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 10:57:24 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA25993 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 10:56:58 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id TAA04553; Tue, 1 Apr 1997 19:51:44 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wC7F2-0005FHC; Tue, 1 Apr 97 18:21 BST
Message-Id: <m0wC7F2-0005FHC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP buffers
To: jg@pa.dec.com (Jim Gettys)
Date: Tue, 1 Apr 1997 18:21:03 +0100 (BST)
Cc: curtis@ans.net, touch@ISI.EDU, F.Potorti@cnuce.cnr.it,
        tcp-impl@relay.engr.SGI.COM
In-Reply-To: <9703312113.AA08907@pachyderm.pa.dec.com> from "Jim Gettys" at Mar 31, 97 01:13:22 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I'm explicitly asking for clients to be able to control the window size, 
> only possible to do indirectly today by messing with the socket buffer 
> size, which isn't the same thing as being able to control the window size.

Linux doesn't implement this for processes - partly on the belief that giving
users the ability to break things like window sizing is a bad move. However,
window sizes are important, so the OS allows you to set max window sizes on
routes. Thus for AX.25 the max window is normally set to 2*MTU to cope with
the link.


From owner-tcp-impl@relay.engr.sgi.com  Tue Apr  1 10:59:25 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA12896 for tcp-impl-list; Tue, 1 Apr 1997 10:56:20 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA12843 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 10:56:13 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA25271 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 10:54:53 -0800
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id TAA04559; Tue, 1 Apr 1997 19:52:09 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wC7cN-0005FHC; Tue, 1 Apr 97 18:45 BST
Message-Id: <m0wC7cN-0005FHC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Keep-Alive size
To: mouse@Rodents.Montreal.QC.CA (der Mouse)
Date: Tue, 1 Apr 1997 18:45:11 +0100 (BST)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199704011558.KAA23362@Twig.Rodents.Montreal.QC.CA> from "der Mouse" at Apr 1, 97 10:58:21 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Or, since such a keepalive will always be outside the window (since it
> duplicates a sequence number that's already been acked - if you have
> unacked data outstanding, keepalives aren't even an issue), you can use
> any value you please for the single data byte, as you suggest:

Nothing says a stack may not use duplicate data over the original. There are
some very simple embedded stacks that do this to save code. They just do
something akin to

	offset=diff_seq(buff_start, tcp->seq);
	len=tcp->len;
	len=min(len, buffer_size-offset);
	memcpy(...)
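
One way to flesh out the fragment above into something compilable.
struct rx_buf, rx_store and BUF_SIZE are hypothetical names, and the
range check on the offset is an addition that the fragment does not
show:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16u   /* hypothetical receive buffer size */

struct rx_buf {
    uint32_t start_seq;        /* sequence number of data[0] */
    uint8_t  data[BUF_SIZE];
};

/* Copy a segment into the buffer at the position its sequence number
 * dictates, overwriting whatever was there.  Returns the number of
 * bytes stored, or 0 if the segment starts outside the buffer. */
size_t rx_store(struct rx_buf *b, uint32_t seq,
                const uint8_t *payload, size_t len)
{
    uint32_t offset;

    assert(b != NULL && payload != NULL);
    offset = seq - b->start_seq;       /* modulo-2^32 difference */
    if (offset >= BUF_SIZE)            /* also catches seq < start_seq */
        return 0;
    if (len > BUF_SIZE - offset)
        len = BUF_SIZE - offset;       /* clip to the end of the buffer */
    memcpy(b->data + offset, payload, len);
    return len;
}
```

A retransmission that overlaps data already in the buffer simply
replaces it; nothing compares the old and new bytes.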

Alan


From owner-tcp-impl@relay.engr.sgi.com  Tue Apr  1 13:42:14 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA00951 for tcp-impl-list; Tue, 1 Apr 1997 13:40:11 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA00914 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 13:40:08 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA14697 for <tcp-impl@relay.engr.SGI.COM>; Tue, 1 Apr 1997 13:40:03 -0800
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id QAA24050; Tue, 1 Apr 1997 16:36:08 -0500 (EST)
Date: Tue, 1 Apr 1997 16:36:08 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199704012136.QAA24050@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Keep-Alive size
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>> Or, since such a keepalive will always be outside the window (since
>> it duplicates a sequence number that's already been acked [...]),
>> you can use any value you please for the single data byte, [...]

> Nothing says a stack may not use duplicate data over the original.

Even for a completely out-of-window segment?

> There are some very simple embedded stacks that do this to save code.
> They just do something akin to

> 	offset=diff_seq(buff_start, tcp->seq);
> 	len=tcp->len;
> 	len=min(len, buffer_size-offset);
> 	memcpy(...)

They shouldn't do this for segments that are entirely outside the
window.  (The discussion at the top of page 69 of RFC 793 specifies
that unacceptable segments like this are to do nothing more than
possibly generate an ACK.  While this may have been updated in various
ways, I have trouble imagining any update specifying that anything but
drop-and-return be done with completely-out-of-window segments.  Has
any update affected this?)
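
For reference, the RFC 793 acceptability test can be sketched as below.
The function names are hypothetical; the four cases follow the RFC's
table for zero and non-zero segment length and receive window:

```c
#include <assert.h>
#include <stdint.h>

/* a <= b < c in modulo-2^32 sequence space. */
static int seq_in_range(uint32_t a, uint32_t b, uint32_t c)
{
    return (uint32_t)(b - a) < (uint32_t)(c - a);
}

/* RFC 793 segment acceptability test.  Returns 1 if the segment is
 * acceptable, 0 if it should be dropped (after possibly sending an ACK). */
int seg_acceptable(uint32_t seg_seq, uint32_t seg_len,
                   uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    uint32_t wnd_end = rcv_nxt + rcv_wnd;

    if (seg_len == 0) {
        if (rcv_wnd == 0)
            return seg_seq == rcv_nxt;
        return seq_in_range(rcv_nxt, seg_seq, wnd_end);
    }
    if (rcv_wnd == 0)
        return 0;                      /* data, but no window to take it */
    /* Acceptable if either the first or the last byte is in the window. */
    return seq_in_range(rcv_nxt, seg_seq, wnd_end) ||
           seq_in_range(rcv_nxt, seg_seq + seg_len - 1u, wnd_end);
}
```

A one-byte keepalive at rcv_nxt - 1 fails both range checks, so under
this test its data byte is never copied anywhere.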

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 00:18:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA15411 for tcp-impl-list; Wed, 2 Apr 1997 00:16:14 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA15406 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 00:16:11 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id AAA10222 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 00:16:09 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id AAA22100; Wed, 2 Apr 1997 00:06:14 -0800 (PST)
Message-Id: <199704020806.AAA22100@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: scheduling and agenda for Memphis
Date: Wed, 02 Apr 1997 00:06:14 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

It looks like our final slot is 1:30PM-3PM on *Monday*, April 7.  Note that
this is different than I earlier indicated - sorry about that.  Also note that
the slot in the schedule is shown as starting at 1PM, but we will instead be
starting at 1:30PM, and going for 90 minutes instead of 2 hours.

I've been told that the room is one of the MBone rooms.

I've appended the proposed agenda.  One of the items is to briefly discuss
implementation problems that need documenting.  This list is for sure
incomplete, so bring suggestions or float them by the mailing list between
now and Monday.  Note that I'll be out of town between now and the meeting,
and I've already made up viewgraphs based on the stuff below - this doesn't
mean we can't accommodate agenda changes, just that I might not be able to
comment on them on the list or alter my viewgraphs to fit them.  Steve will
be reading the list and can make up viewgraph tweaks through Friday, I believe.

See you next Monday ...

		Vern


Charter, scope & milestones [15 min]
"Naming names" policy [5 min]
Problem description format [20 min]
Problems documented in the I-D [10 min]
Problems awaiting description (and email haggling) [20 min]
	Initial RTO too low
	CWND uninitialized
	Initial slow-start with 2 packets (minor, maybe soon obsolete)
	Delayed ack violations (esp., > 2 MSS)
	Failure to set PSH when send buffer drains
	Brakmo/Peterson header prediction bug
	Brakmo/Peterson deflation fencepost bug
	Fast retransmit w/ timestamp options sends 2 packets
	Number of keep-alives sent (Dawson et al paper)
	Whether keep-alive is 0 bytes or 1 byte
		(acking below-sequence pure acks?)
	Failing to ack above-sequence data
	Predictable initial sequence number
	Ameliorating SYN flooding
	Nagle algorithm
	Performance: implementing fast retransmit & recovery
	Replies to random ack frames (stealth scanning)
	RTO estimation on slow links
	ICMP handling
	Half-duplex close, ignoring subsequent traffic?
	Urgent pointer confusion

Calling all testing tools [10 min]
	Need to document
	Encourage development of new tools
		maybe simple raw socket interface for testing
			particular problems?
		(but how do you get the host TCP to shut up?)

Calling all volunteers ... [10 min]

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 00:31:22 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA17101 for tcp-impl-list; Wed, 2 Apr 1997 00:29:50 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA17082 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 00:29:36 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id AAA11909 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 00:29:34 -0800
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id AAA22200; Wed, 2 Apr 1997 00:19:22 -0800 (PST)
Message-Id: <199704020819.AAA22200@daffy.ee.lbl.gov>
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: Internet Draft on Known TCP Implementation Problems
In-reply-to: Your message of Tue, 01 Apr 1997 18:17:29 PST.
Date: Wed, 02 Apr 1997 00:19:22 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> For the document to have the desired effect it needs the clear MUST/SHALL
> notion so that end users can ask vendors "Are you XYZ compliant", and write
> working TCP into their tenders.

I think that's already taken care of by the existing RFCs that specify
TCP.  They're full of MUSTs etc.  The goal for this I-D is to document ways
in which implementations can fall short *in practice*, as an aid for
implementors to think about how to improve their implementations.  It can
also be used as an end user's checklist of vendor requirements, since it
highlights areas where TCPs are known to often have problems.  But this
document doesn't need the MUST/SHOULD/etc weight; that's already provided
by other documents.

I.e., I see this I-D as evolving into a BCP RFC, and not a standards RFC.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 04:03:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA04380 for tcp-impl-list; Wed, 2 Apr 1997 04:01:47 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA04375 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 04:01:45 -0800
Received: from kalae.kohala.com (kalae.kohala.com [206.62.226.35]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id EAA10724 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 04:01:44 -0800
Received: from kohala.kohala.com (kohala.kohala.com [206.62.226.33]) by kalae.kohala.com (8.8.5/8.7.3) with ESMTP id EAA09697 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 04:57:05 -0700 (MST)
Received: (from rstevens@localhost) by kohala.kohala.com (8.8.5/8.8.3) id EAA19350 for tcp-impl@relay.engr.SGI.COM; Wed, 2 Apr 1997 04:57:04 -0700 (MST)
Message-Id: <199704021157.EAA19350@kohala.kohala.com>
From: rstevens@kohala.com (W. Richard Stevens)
Date: Wed, 2 Apr 1997 04:57:04 -0700
Reply-To: "W. Richard Stevens" <rstevens@kohala.com>
X-Phone: +1 520 297 9416
X-Homepage: http://www.noao.edu/~rstevens
X-Mailer: Mail User's Shell (7.2.6 beta(3) 11/17/96)
To: tcp-impl@relay.engr.SGI.COM
Subject: testing tools
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

[In Vern's message of Apr  2, 12:06am he writes:]
> 
> Calling all testing tools [10 min]
> 	Need to document
> 	Encourage development of new tools
> 		maybe simple raw socket interface for testing
> 			particular problems?
> 		(but how do you get the host TCP to shut up?)

I have such a test program and I use a raw socket to write my own TCP
segments, and then libpcap (e.g., BPF on a BSD/OS system) to read back
the replies.  The way I shut up my TCP, to keep it from sending back
RSTs to all the replies to all the segments that I generated, is with
the following kernel hack:

        /*
         * Locate pcb for segment.
         */
findpcb:
        /* Following hack to let me read and write my own TCP segments
           using BPF, without confusing kernel.  Just patch tcp_ignport
           (at beginning of this file) to desired value. */
        if (htons(tcp_ignport) &&
            (htons(tcp_ignport) == ti->ti_dport ||
             htons(tcp_ignport) == ti->ti_sport))
                goto drop;

I could never figure out another way to do this.
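
The test in the hack above can be tried outside the kernel.  What follows
is only a user-space sketch: tcp_port_ignored() is a hypothetical name, and
it assumes the configured port is kept in host byte order while the header
ports arrive in network byte order, as ti_sport and ti_dport do.

```c
#include <arpa/inet.h>   /* htons */
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel test above.
 * ignport is in host byte order (0 disables the check); sport and dport
 * are in network byte order, as they appear in the TCP header.
 * Returns 1 if the segment should be dropped silently, 0 otherwise. */
int tcp_port_ignored(uint16_t ignport, uint16_t sport, uint16_t dport)
{
    uint16_t p = htons(ignport);

    if (p == 0)
        return 0;
    return p == sport || p == dport;
}
```

A segment matching in either direction is dropped, so the host TCP never
gets the chance to answer it with a RST.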

	Rich Stevens

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 04:21:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA05572 for tcp-impl-list; Wed, 2 Apr 1997 04:20:07 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA05555 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 04:20:04 -0800
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id EAA12517 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 04:20:01 -0800
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id HAA15138;
	Wed, 2 Apr 1997 07:16:08 -0500 (EST)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id HAA00347; Wed, 2 Apr 1997 07:15:11 -0500
Date: Wed, 2 Apr 1997 07:15:11 -0500
Message-Id: <199704021215.HAA00347@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: rstevens@kohala.com
CC: tcp-impl@relay.engr.SGI.COM
In-reply-to: <199704021157.EAA19350@kohala.kohala.com> (rstevens@kohala.com)
Subject: Re: testing tools
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   From: rstevens@kohala.com (W. Richard Stevens)
   Date: Wed, 2 Apr 1997 04:57:04 -0700

   I could never figure out another way to do this.

I would not be surprised if some trick could be played with IP
masquerading to forward a specific port (or set of ports) to
/dev/null... someone else probably knows better than I about this
though...

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 04:45:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA07229 for tcp-impl-list; Wed, 2 Apr 1997 04:44:24 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA07221 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 04:44:21 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id EAA15412 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 04:44:17 -0800
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id HAA25950; Wed, 2 Apr 1997 07:40:26 -0500 (EST)
Date: Wed, 2 Apr 1997 07:40:26 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199704021240.HAA25950@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: testing tools
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>> [...]
> I have such a test program and I use a raw socket to write my own TCP
> segments, and then libpcap (e.g., BPF on a BSD/OS system) to read
> back the replies.  The way I shut up my TCP, to keep it from sending
> back RSTs to all the replies to all the segments that I generated, is
> with the following kernel hack:  [...ignore a particular port...]

> I could never figure out another way to do this.

Why not just use a different IP address?  Allocate another address on
the same subnet, then either go promiscuous and handle arps yourself or
stick in an arp entry for that address with your local MAC address (or
equivalent for non-ethernet).  Since the kernel doesn't know anything
about that address, it will toss those packets early; if you sysctl
net.inet.ip.forwarding=0, it won't try to forward them.  Have libpcap
filter out anything not to that IP address....
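
The last step can be sketched as building a filter expression for libpcap.
This is an illustration only: build_test_filter() is a hypothetical helper,
and 192.0.2.99 in the usage note is a placeholder for whatever spare
address is borrowed.  The resulting string is the sort of expression one
would hand to pcap_compile() and then pcap_setfilter().

```c
#include <stdio.h>

/* Build a pcap filter expression that keeps only TCP traffic addressed to
 * the borrowed test address.  Returns 0 on success, -1 if buf is too
 * small to hold the whole expression. */
int build_test_filter(char *buf, size_t len, const char *test_addr)
{
    int n = snprintf(buf, len, "tcp and dst host %s", test_addr);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the placeholder address this yields "tcp and dst host 192.0.2.99".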

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 09:32:46 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA15162 for tcp-impl-list; Wed, 2 Apr 1997 09:30:24 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA15154 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 09:30:22 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA09738 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 09:30:20 -0800
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id JAA16952; Wed, 2 Apr 1997 09:20:25 -0800
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id JAA15487; Wed, 2 Apr 1997 09:20:22 -0800
Received: from fstop. by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id JAA13418; Wed, 2 Apr 1997 09:20:21 -0800
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id JAA15914; Wed, 2 Apr 1997 09:19:34 -0800
Message-Id: <199704021719.JAA15914@fstop.>
From: Steve Parker <sparker@Eng.Sun.COM>
To: Vern Paxson <vern@ee.lbl.gov>
cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: scheduling and agenda for Memphis 
Date: Wed, 02 Apr 1997 09:19:33 -0800
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


- Calling all testing tools [10 min]
- 	Need to document
- 	Encourage development of new tools
- 		maybe simple raw socket interface for testing
- 			particular problems?
- 		(but how do you get the host TCP to shut up?)

My packet shell tool on SunOS 5.x takes advantage of an undocumented way
to accomplish this on our stack.  (It's a side-effect of the STREAMS
architecture and the decision to keep IP as a separate module from the
transports that makes this fall out.)

In fact, I consider it a problem right now for my packet shell tool that
I can't as easily provide a BSD-stack equivalent method short of der Mouse's
suggestion to use a separate IP address, provide ARP, etc.  I can't expect
everyone to have a SunOS system to run this stuff with...

I think TCP testing in general would advance if all of us host vendors
could agree on and provide a method under sockets that would allow
sending and receiving "raw" TCP packets.  If others are interested in
pursuing this, please contact me via e-mail or at Memphis.

Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 11:12:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA11477 for tcp-impl-list; Wed, 2 Apr 1997 11:09:22 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA11467 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 11:09:20 -0800
Received: from mail2.digital.com (mail2.digital.com [204.123.2.56]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id LAA07087 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 11:09:19 -0800
Received: from pachyderm.pa.dec.com by mail2.digital.com (5.65 EXP 4/12/95 for V3.2/1.0/WV)
	id AA11141; Wed, 2 Apr 1997 08:37:23 -0800
Received: by pachyderm.pa.dec.com; id AA03925; Wed, 2 Apr 1997 08:37:38 -0800
Date: Wed, 2 Apr 1997 08:37:38 -0800
From: jg@pa.dec.com (Jim Gettys)
Message-Id: <9704021637.AA03925@pachyderm.pa.dec.com>
X-Mailer: Pachyderm (client akocstdhcp88-115.ako.dec.com)
To: Curtis Villamizar <curtis@ans.net>
Cc: Curtis Villamizar <curtis@ans.net>, touch@ISI.EDU, F.Potorti@cnuce.cnr.it,
        tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers 
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

The reality of HTTP is that it is an interactive bulk transport
protocol; i.e. there are users, who often change their minds as a result
of what they see on the screen, and then surf on to another page
(or further down the screen, with a different set of embedded graphics
to fetch).

Control of latency by a browser is therefore important.  This isn't just
file transfer...

Now if you'd like the Web to go off and invent its own transport protocol
rather than TCP, I'm sure there are people in the Web community who'd be
happy to do so.  Just don't count on them understanding congestion and
flow control issues very well. :-(.  And maybe this is the right thing
to do ultimately.  But there be dragons there, as this mailing list certainly
knows.

				- Jim

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 11:36:35 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA18760 for tcp-impl-list; Wed, 2 Apr 1997 11:34:37 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA18750 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 11:34:35 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id LAA13713 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 11:34:33 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA19735>; Wed, 2 Apr 1997 11:30:47 -0800
Date: Wed, 2 Apr 1997 11:30:41 -0800
Posted-Date: Wed, 2 Apr 1997 11:30:41 -0800
Message-Id: <199704021930.AA15701@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA15701>; Wed, 2 Apr 1997 11:30:41 -0800
To: curtis@ans.net, jg@pa.dec.com
Subject: Re: TCP buffers
Cc: touch@ISI.EDU, F.Potorti@cnuce.cnr.it, tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: jg@pa.dec.com (Jim Gettys)
> Subject: Re: TCP buffers 
> 
> The reality of HTTP is that it is an interactive bulk transport
> protocol; i.e. there are users, who often change their minds as a result
> of what they see on the screen, and then surf on to another page
> (or further down the screen, with a different set of embedded graphics
> to fetch).
> 
> Control of latency by a browser is therefore important.  This isn't just
> file transfer...

TCP isn't just file transfer either. The Nagle optimizations, for
example, are aimed at interactive traffic, such as Telnet. Not for
bulk.

> Now if you'd like the Web to go off and invent its own transport protocol
> rather than TCP, I'm sure there are people in the Web community who'd be
> happy to do so.  Just don't count on them understanding congestion and
> flow control issues very well. :-(.  And maybe this is the right thing
> to do ultimately.  But there be dragons there, as this mailing list certainly
> knows.

Why the Web community? Isn't that how we got "http is better than FTP,
because we usually only ever get one file and don't want to take
the 2-rtt hit for opening the FTP connection"  (see the original
Berners-Lee papers) - then started doing 'persistent connections',
essentially re-inventing much of what FTP did right in the first place?

Maybe the Web community should give requirements to the transport
community, that does understand the issues of congestion
and flow control, and needs to understand the issues of a
new class of applications with new transport requirements...

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 11:41:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA20262 for tcp-impl-list; Wed, 2 Apr 1997 11:39:49 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA20248 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 11:39:47 -0800
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA15021 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 11:39:45 -0800
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id LAA16021; Wed, 2 Apr 1997 11:33:18 -0800 (PST)
Message-Id: <199704021933.LAA16021@aland.bbn.com>
To: jg@pa.dec.com (Jim Gettys)
cc: Curtis Villamizar <curtis@ans.net>, touch@ISI.EDU, F.Potorti@cnuce.cnr.it,
        tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers 
In-reply-to: Your message of Wed, 02 Apr 97 08:37:38 -0800.
             <9704021637.AA03925@pachyderm.pa.dec.com> 
Date: Wed, 02 Apr 97 11:33:18 -0800
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    The reality of HTTP is that it is an interactive bulk transport
    protocol; i.e. there are users, who often change their minds as a result
    of what they see on the screen, and then surf on to another page
    (or further down the screen, with a different set of embedded graphics
    to fetch).

    Control of latency by a browser is therefore important.  This isn't just
    file transfer...

Jim:
    
    Sanity check for me here.  I don't see how this affects window size
    management by the client unless the client is severely CPU limited.

    Let me go through the separate problems.  I assume in all cases the
    window is equal to the delay*bandwidth product (if it is smaller, the
    problems get less bad, if it is bigger, it is never filled, so this
    seems the right spot to pick)

    1. Client goes to another page on same server

	- their request gets sent to remote server immediately
	(the reverse traffic is small and completely under user control,
	so failure to achieve immediate transmission is a client application
	programming error if Nagle is turned off, nothing to do with window
	size)

	- server gets request in 1/2 RTT and starts sending

	- meanwhile client discards all data up to new page

	- result is client sees new page in 1 RTT which is the goal.
	Furthermore since client has been draining the window assiduously,
	the page comes at full bandwidth.

    2. Client goes to another page on another server

	- TCP connection to new server established
	- request gets sent to new server
	- replies start back, subject to slow start
	- bandwidth less than full until slow start over

    3. Client is going through proxy (makes case 2 into case 1)

    4. Client goes further down page, another version of case 1.
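
The extra cost in case 2 can be put in rough numbers.  The sketch below
counts the round trips slow start needs to open the window, under the
simplifying assumptions (not from the cases above) that the congestion
window starts at one segment, doubles every RTT, and nothing is lost;
delayed ACKs would make the real figure larger.  slow_start_rtts() is a
hypothetical name.

```c
/* Round trips of slow start needed before the congestion window reaches
 * target_segments, starting from one segment and doubling per RTT
 * (assumes an ACK for every segment and no loss). */
unsigned slow_start_rtts(unsigned target_segments)
{
    unsigned long long cwnd = 1;   /* wide type so doubling cannot wrap */
    unsigned rtts = 0;

    while (cwnd < target_segments) {
        cwnd *= 2;
        rtts++;
    }
    return rtts;
}
```

For a window of 8 segments this gives 3 round trips, on top of the round
trip spent setting up the connection.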

What am I missing?

Thanks!

Craig

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 12:44:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA06166 for tcp-impl-list; Wed, 2 Apr 1997 12:42:15 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA06135 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 12:42:11 -0800
Received: from mail2.digital.com (mail2.digital.com [204.123.2.56]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA00913 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 12:41:28 -0800
Received: from pachyderm.pa.dec.com by mail2.digital.com (5.65 EXP 4/12/95 for V3.2/1.0/WV)
	id AA04981; Wed, 2 Apr 1997 12:25:35 -0800
Received: by pachyderm.pa.dec.com; id AA09973; Wed, 2 Apr 1997 12:25:50 -0800
Date: Wed, 2 Apr 1997 12:25:50 -0800
From: jg@pa.dec.com (Jim Gettys)
Message-Id: <9704022025.AA09973@pachyderm.pa.dec.com>
X-Mailer: Pachyderm (client akocstdhcp88-115.ako.dec.com)
To: Craig Partridge <craig@aland.bbn.com>
Cc: Curtis Villamizar <curtis@ans.net>, touch@ISI.EDU, F.Potorti@cnuce.cnr.it,
        tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers 
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Bear with me here, folks: I'm not a true TCP guru...  I may be confused
(probably am).  You may be missing nothing, and I may be confused on how
TCP works.

Craig says:

>      The reality of HTTP is that it is an interactive bulk transport
>      protocol; i.e. there are users, who often change their minds as a result
>      of what they see on the screen, and then surf on to another page
>      (or further down the screen, with a different set of embedded graphics
>      to fetch).
>  
>      Control of latency by a browser is therefore important.  This isn't just
>      file transfer...
>  
>  Jim:
>  
>      Sanity check for me here.  I don't see how this affects window size
>      management by the client unless the client is severely CPU limited.
>  
>      Let me go through the separate problems.  I assume in all cases the
>      window is equal to the delay*bandwidth product (if it is smaller, the
>      problems get less bad, if it is bigger, it is never filled, so this
>      seems the right spot to pick)
>  
>      1. Client goes to another page on same server
>  
>  	- their request gets sent to remote server immediately
>  	(the reverse traffic is small and completely under user control,
>  	so failure to achieve immediate transmission is a client application
>  	programming error if Nagle is turned off, nothing to do with window
>  	size)
>  
>  	- server gets request in 1/2 RTT and starts sending
>  
>  	- meanwhile client discards all data up to new page
>  
>  	- result is client sees new page in 1 RTT which is the goal.
>  	Furthermore since client has been draining the window assiduously,
>  	the page comes at full bandwidth.

The question is how much data gets queued in the final router driving 
the PPP link to the end user.  This is where most of the delay occurs, 
and where the packets congregate.  The question is: how many packets can 
congregate in this final router?  There is no way for a client to discard 
them, once they have been sent (and the client wouldn't want to in any 
case; he'd just as soon put the data in its cache).  Any packets sent 
will accumulate there.

Now, the question is therefore how many packets might get queued in this
router.  With current browser behavior, a client might have 4 times as much
data in flight (over 4 separate connections; that is how the current browsers
work).  With HTTP/1.1, things might be better, as a good client might be
using just one connection.
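
To put a number on it, here is a rough worst-case sketch: if every
connection's full window ends up queued in the last router, the time to
drain it over the PPP link is just bits divided by link speed.  The
function name and the window and link figures in the note below are
assumptions for illustration, not measurements.

```c
/* Seconds needed to drain the data queued at the last-hop router when
 * each of `connections` TCPs has a full window of window_bytes sitting
 * in that queue and the link runs at link_bps bits per second. */
double queue_drain_seconds(unsigned connections, unsigned window_bytes,
                           double link_bps)
{
    return (double)connections * window_bytes * 8.0 / link_bps;
}
```

With 4 connections, hypothetical 4096-byte windows and a 28.8 kb/s line,
that is about 4.6 seconds of queued data; with one connection it is a bit
over 1.1 seconds.
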
>  
>      2. Client goes to another page on another server
>  
>  	- TCP connection to new server established
>  	- request gets sent to new server
>  	- replies start back, subject to slow start
>  	- bandwidth less than full until slow start over
>  
>      3. Client is going through proxy (makes case 2 into case 1)
>  
>      4. Client goes further down page, another version of case 1.
>  
>  What am I missing?
>  
>  Thanks!
>  
>  Craig

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 12:45:53 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA06819 for tcp-impl-list; Wed, 2 Apr 1997 12:44:49 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA06801 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 12:44:47 -0800
Received: from mail2.digital.com (mail2.digital.com [204.123.2.56]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA01578 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 12:44:45 -0800
Received: from pachyderm.pa.dec.com by mail2.digital.com (5.65 EXP 4/12/95 for V3.2/1.0/WV)
	id AA05214; Wed, 2 Apr 1997 12:31:23 -0800
Received: by pachyderm.pa.dec.com; id AA15584; Wed, 2 Apr 1997 12:31:39 -0800
Date: Wed, 2 Apr 1997 12:31:39 -0800
From: jg@pa.dec.com (Jim Gettys)
Message-Id: <9704022031.AA15584@pachyderm.pa.dec.com>
X-Mailer: Pachyderm (client akocstdhcp88-115.ako.dec.com)
To: touch@ISI.EDU
Cc: curtis@ans.net, jg@pa.dec.com, touch@ISI.EDU, F.Potorti@cnuce.cnr.it,
        tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Please don't accuse me of having anything to do with HTTP/1.0.  It isn't 
what I would have built...  And yes, Tim et al. now understand the mistakes
they made (part of which was that people are using the Web in a very different
way than people first envisioned; the idea that people would be embedding
20 or more small GIFs in a page wasn't something that was obvious
4 years or more ago).  With 20-20 hindsight, everything is obvious. So please
give them a break.

The reality of the Web community is that it doesn't understand TCP. (as 
a community, I mean; there are clear exceptions, e.g. Jeff Mogul, who 
has been doing great service to try to help straighten out HTTP. ) A slightly 
larger minority now understand enough to at least worry about and care 
about some of the issues (folks like me, and most other active members of the
HTTP working group).

Of course you could generalize this non-understanding to most people now 
building Internet applications, which is a frightening thought.  I had 
dinner with the manager of a significant software company in the 
games business.  She does not know what congestion is, and had not heard 
of RED.  Even scarier: they use UDP in their games.

As to what a protocol for web transport should do, it is something
I've been thinking about for a while, and I need to somehow make some time
to write down the requirements sometime.  
			- Jim

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 13:32:10 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA15161 for tcp-impl-list; Wed, 2 Apr 1997 13:29:18 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA15152 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 13:29:16 -0800
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA11802 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 13:29:13 -0800
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id NAA00287; Wed, 2 Apr 1997 13:22:45 -0800 (PST)
Message-Id: <199704022122.NAA00287@aland.bbn.com>
To: jg@pa.dec.com (Jim Gettys)
cc: Craig Partridge <craig@aland.bbn.com>, tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers 
In-reply-to: Your message of Wed, 02 Apr 97 12:25:50 -0800.
             <9704022025.AA09973@pachyderm.pa.dec.com> 
Date: Wed, 02 Apr 97 13:22:44 -0800
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    Jim asks:

    The question is how much data gets queued in the final router driving 
    the PPP link to the end user.  This is where most of the delay occurs, 
    and where the packets congregate.  The question is: how many packets can 
    congregate in this final router?  There is no way for a client to discard 
    them, once they have been sent (and the client wouldn't want to in any 
    case; he'd just as soon put the data in its cache).  Any packets sent 
    will accumulate there.

    Now, the question is therefore how many packets might get queued in this
    router.

The complex answer is that it depends, and mostly it depends on the router.

Let me try walking through the pieces.

TCP tries to find the correct window size given the bandwidth available.
Assuming the TCP connection has been going for a while (and without long
pauses that may throw it back into slow start), this means the sender is
sending *in theory* only at the rate the client can absorb and should
not be creating an appreciable queue at any router.

NOTE: There's one exception to this rule.  TCP never shrinks its window
smaller than one MSS.  So if you've got a very small pipe, where the
available delay bandwidth product is less than 1 MSS, you will have
a queue.  That's one reason why parallel TCP is evil.  If you have 4
TCPs running, each with MSS of 1 KB, the slowest they can transmit is
(4*1KB * 8 b/B)/RTT = 32Kb/RTT.  If your RTT is low, that will back up
your dial-up PPP line pretty easily.  (Since this situation isn't stable,
what happens is the sending TCPs see an unstable RTT and increase their
RTT estimate, but regardless you'll likely get very variable performance).
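
The arithmetic in the note can be sketched as a one-line function.
min_send_rate_bps() is a hypothetical name, and the 200 ms RTT and
28.8 kb/s line in the usage note are assumed figures.

```c
/* Lowest aggregate sending rate, in bits per second, when each of
 * `connections` TCPs keeps one full segment of mss_bytes in flight per
 * round trip and cannot shrink its window any further. */
double min_send_rate_bps(unsigned connections, unsigned mss_bytes,
                         double rtt_seconds)
{
    return (double)connections * mss_bytes * 8.0 / rtt_seconds;
}
```

Four connections with a 1024-byte MSS give 32768 bits per RTT.  At an
assumed RTT of 200 ms that is a floor of roughly 164 kb/s, far more than
a 28.8 kb/s line can carry, so the excess has to queue.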

OK, but let's assume we're running one TCP and move from theory to practice.
In practice, TCP figures it is overdriving a link when it sees loss.  What
that means is that slow start loads up the queue at the last hop router and
only slows down when it sees something thrown away.  So, for practical
purposes, the queue you get is equal to the min of (a) the buffer space
in the router; or (b) the receiver window size, UNLESS you run something
like RED in your routers which strives to keep the queues short.  If you
run RED, you're closer to theory again.  One thing this also says is that
there's no point in advertising a big window if you're behind a small
pipe.  Better to advertise a small window and smaller TCP MSS.
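
A small sketch of the two quantities in play, assuming no RED in the
router.  Both function names are hypothetical, and the link speed and RTT
in the usage note are illustrative assumptions.

```c
/* Practical bound on the last-hop queue without RED: the smaller of the
 * router's buffer space and the receiver's advertised window. */
unsigned long standing_queue_bound(unsigned long router_buffer_bytes,
                                   unsigned long rwnd_bytes)
{
    return router_buffer_bytes < rwnd_bytes ? router_buffer_bytes
                                            : rwnd_bytes;
}

/* Delay*bandwidth product in bytes: the window that just fills the pipe. */
unsigned long bdp_bytes(double link_bps, double rtt_seconds)
{
    return (unsigned long)(link_bps * rtt_seconds / 8.0);
}
```

For a 28.8 kb/s line and an assumed 250 ms RTT the pipe holds only 900
bytes, so a 16 KB advertised window mostly buys queue rather than
throughput.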

[Folks -- I think I've done this all right -- beat me up if I missed some
subtle interaction -- thanks!]

Craig

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 13:43:51 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA18793 for tcp-impl-list; Wed, 2 Apr 1997 13:41:47 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA18788 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 13:41:45 -0800
Received: from parmesan.cs.wisc.edu (parmesan.cs.wisc.edu [128.105.77.34]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA15082 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 13:41:36 -0800
Received: (from poon@localhost) by parmesan.cs.wisc.edu (8.7.6/8.7.3) id PAA01961; Wed, 2 Apr 1997 15:39:39 -0600 (CST)
Date: Wed, 2 Apr 1997 15:39:39 -0600 (CST)
From: Kacheong Poon <poon@cs.wisc.edu>
Message-Id: <199704022139.PAA01961@parmesan.cs.wisc.edu>
To: tcp-impl@relay.engr.SGI.COM, vern@ee.lbl.gov
Subject: Re: scheduling and agenda for Memphis
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Included message from Vern Paxson <vern@ee.lbl.gov>:

>----
>Problems awaiting description (and email haggling) [20 min]
>	Initial RTO too low
>	RTO estimation on slow links
>----

I have a question about first few RTO estimations on slow links, sort of
combining the 2 problems above.  As I understand, in BSD, the smoothed RTT
(t_srtt) is initialized to 0 and RTT variance (t_rttvar) is initialized so
that RTO is 3 seconds, as required by RFC 1122.  Because t_srtt is 0, the
first RTT measurement will override the t_rttvar and t_srtt to get a RTO of
3 times the measured RTT.  On slow links, as various people on the list have
pointed out, the first measured RTT is usually an underestimate of RTT.  The
two sides may first exchange a few small segments, and then later send full
segments, which can be as large as 30 times the small segments.  Isn't this
typical for HTTP?  And for slow links, the transmission time is significantly
larger.
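
A floating-point sketch of the behaviour described above, under stated
assumptions: the first sample replaces the initial state with srtt = rtt
and rttvar = rtt/2, and RTO = srtt + 4*rttvar, which gives the factor of 3.
It ignores the coarse timer granularity and any minimum-RTO clamp a real
stack applies.  The function names and the link figures in the note are
hypothetical.

```c
/* RTO after the first RTT sample when that sample overrides the initial
 * state: srtt = rtt, rttvar = rtt / 2, rto = srtt + 4 * rttvar = 3 * rtt. */
double first_sample_rto(double rtt_seconds)
{
    double srtt = rtt_seconds;
    double rttvar = rtt_seconds / 2.0;

    return srtt + 4.0 * rttvar;
}

/* Time to clock one segment onto the link, in seconds. */
double serialization_seconds(unsigned segment_bytes, double link_bps)
{
    return segment_bytes * 8.0 / link_bps;
}
```

On an assumed 28.8 kb/s link a 40-byte segment takes about 11 ms to send
and a 1200-byte segment about 333 ms.  If the first measured RTT were
100 ms, the resulting RTO of 300 ms would already be shorter than the
transmission time of one full-sized segment.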

My question is: would it be better to initialize t_srtt not to 0 but to, say,
a small value?  Then the first few RTO estimations will have a bigger
variance part and will converge (much?) slower to the "real" RTT.  Actually,
by just reading the RFC, it seems that this is what is intended, instead of
BSD's first RTT measurement overriding the initial t_srtt and t_rttvar
values.  Correct me if I am wrong.  The RFC also mentions that TCP should
be reasonably insensitive to those initial values because of Karn and 
Jacobson's algorithms.  I guess this is only true if the TCP connection is
not short lived, like the current HTTP.
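
The alternative can be sketched with the usual smoothed update (gain 1/8
on srtt, 1/4 on rttvar, RTO = srtt + 4*rttvar), in floating point rather
than BSD's scaled integers.  The struct, the function names and the seed
values in the note are assumptions chosen only so that the initial RTO
is 3 seconds.

```c
/* Hypothetical estimator state, in seconds. */
struct rtt_est {
    double srtt;
    double rttvar;
};

/* Fold one RTT sample into the estimate; the first sample is treated like
 * any other instead of overriding the seeded values. */
void rtt_update(struct rtt_est *e, double sample)
{
    double err = sample - e->srtt;
    double abs_err = err < 0 ? -err : err;

    e->srtt += err / 8.0;
    e->rttvar += (abs_err - e->rttvar) / 4.0;
}

/* Retransmission timeout implied by the current estimate. */
double rtt_rto(const struct rtt_est *e)
{
    return e->srtt + 4.0 * e->rttvar;
}
```

Seeding srtt = 0.01 and rttvar = 0.7475 gives an initial RTO of 3 seconds.
After one 100 ms sample the RTO is still about 2.35 seconds, against 0.3
seconds when the first sample overrides everything.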

							Poon.
							poon@cs.wisc.edu


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 15:00:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA07543 for tcp-impl-list; Wed, 2 Apr 1997 14:58:07 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA07532 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 14:58:05 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id OAA03830 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 14:58:04 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA04674>; Wed, 2 Apr 1997 14:54:14 -0800
Date: Wed, 2 Apr 1997 14:54:12 -0800
Posted-Date: Wed, 2 Apr 1997 14:54:12 -0800
Message-Id: <199704022254.AA19357@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA19357>; Wed, 2 Apr 1997 14:54:12 -0800
To: craig@aland.bbn.com, jg@pa.dec.com
Subject: Re: TCP buffers
Cc: curtis@ans.net, touch@ISI.EDU, F.Potorti@cnuce.cnr.it,
        tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The question is how much data gets queued in the final router driving 
> the PPP link to the end user.  This is where most of the delay occurs, 
> and where the packets congregate.  The question is: how many packets can 
> congregate in this final router?  There is no way for a client to discard 
> them, once they have been sent (and the client wouldn't want to in any 
> case; he'd just as soon put the data in its cache).  Any packets sent 
> will accumulate there.

What would the benefit be if they were discarded?
They represent pending info that has to go end-to-end anyway.
It doesn't matter if it's buffered at the source or in the net
in that case.

What does matter is that they clog the pipe for other
requests or actions, because TCP does in-order with
strict total ordering on the byte stream.

I think you want the queues low so you can have in-band, in-order
signalling. If the signalling is moved out, it doesn't matter
how the queues run - i.e., send requests with RPC. This is
the advantage to 'one connection/association per request'.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 15:08:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA10376 for tcp-impl-list; Wed, 2 Apr 1997 15:07:02 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA10346 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 15:06:58 -0800
Received: from parmesan.cs.wisc.edu (parmesan.cs.wisc.edu [128.105.77.34]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA06813 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 15:06:57 -0800
Received: (from poon@localhost) by parmesan.cs.wisc.edu (8.7.6/8.7.3) id RAA02123; Wed, 2 Apr 1997 17:04:52 -0600 (CST)
Date: Wed, 2 Apr 1997 17:04:52 -0600 (CST)
From: Kacheong Poon <poon@cs.wisc.edu>
Message-Id: <199704022304.RAA02123@parmesan.cs.wisc.edu>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers
Cc: craig@aland.bbn.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Included message from Craig Partridge <craig@aland.bbn.com>:

>----
>run RED, you're closer to theory again.  One thing this also says is that
>there's no point in advertising a big window if you're behind a small
>pipe.  Better to advertise a small window and smaller TCP MSS.
>----

I agree.  But the problem is how does a TCP stack know it is behind a small
pipe?  I use a PPP dialup link with Linux at home.  When I ftp, the receive
window is 24K (I don't remember the exact figure.)  The other side's RTO can
be as high as 20 seconds.  I am lucky that the other end of PPP seems to
have a very large buffer and it can hold a full window.  But I have a slow
machine and sometimes while doing other things, a few segments are dropped.
And it will take "forever" to recover.  Imagine that it is not ftp but Web
browsing and many of the pictures are just in the "right" size...

A few questions.  Suppose a TCP stack knows, from some methods, that it is
behind a small pipe.  An application does a setsockopt(SO_RCVBUF) to a large
value and then establishes a connection.  Can TCP ignore the setsockopt()
and just advertise a small window?  I guess this is an API question: what
should setsockopt(SO_RCVBUF) really mean to TCP?
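
For readers unfamiliar with the call in question, a small sketch using
Python's standard socket module follows.  The helper name request_rcvbuf is
made up for illustration.  The sketch only shows that an application asks
for a buffer size before connecting and can read back what the stack
actually granted; how that size maps onto the advertised window is up to the
stack, and the granted value may differ from the requested one.

```python
import socket


def request_rcvbuf(requested_bytes):
    """Ask for a receive buffer size before connecting; return (sock, granted)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The request is made before connect(), as an ftp client would do.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    # The stack is free to round, double, or cap the value it was given.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted
```

Comparing the requested and granted values is the only portable way for the
application to learn whether the stack honored the request.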

Suppose a TCP stack knows, from some methods, that the other end of a
connection is behind a small pipe, can it "ignore" the other side's
advertized window and restrict itself from sending too much?  One will argue
that this is not needed because of TCP's congestion control.  But consider this:
BSD initializes the congestion window (snd_cwnd) and send threshold
(snd_ssthresh) to their maximum.  So in my PPP case, the queue can quickly
grow very long before a loss.  And the timeout is then "forever" and it
can take very long to recover.  By the time it stabilizes, it can well be the
end of a (HTTP) connection.

I guess the research community can suggest many "some methods" for detecting
small pipes.  But are there any algorithms that are as universally accepted
as Jacobson's original one?

							Poon.
							poon@cs.wisc.edu


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 17:26:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA12869 for tcp-impl-list; Wed, 2 Apr 1997 17:24:36 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA12863 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 17:24:34 -0800
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA07251 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 17:24:31 -0800
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id RAA00767; Wed, 2 Apr 1997 17:18:51 -0800 (PST)
Message-Id: <199704030118.RAA00767@aland.bbn.com>
To: Kacheong Poon <poon@cs.wisc.edu>
cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers 
In-reply-to: Your message of Wed, 02 Apr 97 17:04:52 -0600.
             <199704022304.RAA02123@parmesan.cs.wisc.edu> 
Date: Wed, 02 Apr 97 17:18:50 -0800
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    Included message from Craig Partridge <craig@aland.bbn.com>:

    >----
    >run RED, you're closer to theory again.  One thing this also says is that
    >there's no point in advertising a big window if you're behind a small
    >pipe.  Better to advertise a small window and smaller TCP MSS.
    >----

    I agree.  But the problem is how does a TCP stack know it is behind a small
    pipe?  I use PPP dialup link with Linux at home.  When I ftp, the receive
    window is 24K (I don't remember the exact figure.)  The other side's RTO can
    be as high as 20 seconds.

In fact, I'd wager that the high RTO is due in part to the window size.

At the end of a 28.8 Kbps PPP link, your window size should be something
like 4KB, not 24 KB.

At 24KB you're building up queues about 6 seconds long (with high variance)
and so the RTO goes out to large values accordingly.

Try a smaller window size (and set your MSS to 1KB or less) and see if
performance doesn't improve.  (It did for me over a dial-up link a few
years ago).

Note too, in this case your machine knows it is connected to a dial-up
line and the link layer could therefore tell the IP and TCP layers about
the available path bandwidth and allow your TCP to set a rational window
size.
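
The "about 6 seconds" figure above can be checked with a short calculation.
This is a back-of-the-envelope sketch: the function name is made up, and it
assumes the whole window sits in the queue ahead of the modem and ignores
PPP framing, header overhead, and modem compression.

```python
def drain_time_seconds(window_bytes, link_bps):
    """Time for a link of link_bps bits/second to drain window_bytes of queue."""
    return window_bytes * 8 / link_bps


# A full 24 KB window queued in front of a 28.8 Kbps link:
big = drain_time_seconds(24 * 1024, 28800)    # roughly 6.8 seconds
# The suggested 4 KB window on the same link:
small = drain_time_seconds(4 * 1024, 28800)   # roughly 1.1 seconds
```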

Craig

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 18:06:47 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA19682 for tcp-impl-list; Wed, 2 Apr 1997 18:03:46 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA19660 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 18:03:43 -0800
Received: from parmesan.cs.wisc.edu (parmesan.cs.wisc.edu [128.105.77.34]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA14736 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 18:03:40 -0800
Received: (from poon@localhost) by parmesan.cs.wisc.edu (8.7.6/8.7.3) id UAA02712; Wed, 2 Apr 1997 20:01:36 -0600 (CST)
Date: Wed, 2 Apr 1997 20:01:36 -0600 (CST)
From: Kacheong Poon <poon@cs.wisc.edu>
Message-Id: <199704030201.UAA02712@parmesan.cs.wisc.edu>
To: craig@aland.bbn.com
Subject: Re: TCP buffers
Cc: tcp-impl@relay.engr.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Included message from Craig Partridge <craig@aland.bbn.com>:

>----
>In fact, I'd wager that the high RTO is due in part to the window size.

Yes.

>Try a smaller window size (and set your MSS to 1KB or less) and see if
>performance doesn't improve.  (It did for me over a dial-up link a few
>years ago).

A small window helps.  But ...

>Note too, in this case your machine knows it is connected to a dial-up
>line and the link layer could therefore tell the IP and TCP layers about
>the available path bandwidth and allow your TCP to set a rational window
>size.

As I mentioned in my mail, applications, like ftp, use setsockopt() to set
the buffer size to a large value.  That is why I asked the question "what
should setsockopt(SO_RCVBUF) really mean to TCP?"  If TCP knows that the
link is bandwidth dominated, can it just ignore the setsockopt()?  Here I
assume that setsockopt() affects the advertized window size, as in BSD.  One
can say that it is the application problem.  But, say, in a LAN environment,
a larger window is really needed.  And the application, like ftp, can
improve performance by calling setsockopt() instead of using the default
window size.  An application does not know the actual environment it runs
on.

Does everyone agree that setsockopt() should not affect TCP's advertized
window size?  The window size should be determined by TCP "intelligently"
depending on the path and available buffer space instead of just using some
default value.  If this is the right thing to do, I guess people should start
thinking about how to do it.

In my PPP case, the link layer can know the link bandwidth.  But how about a
machine connected to an Ethernet but the real link to the outside is a low
speed PPP link?  I guess my questions in previous mail are still unanswered.

>----


							Poon.
							poon@cs.wisc.edu


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 19:44:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA03466 for tcp-impl-list; Wed, 2 Apr 1997 19:42:36 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA03457 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 19:42:34 -0800
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id TAA01519 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 19:42:32 -0800
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id TAA01455; Wed, 2 Apr 1997 19:36:53 -0800 (PST)
Message-Id: <199704030336.TAA01455@aland.bbn.com>
To: Kacheong Poon <poon@cs.wisc.edu>
cc: craig@aland.bbn.com, tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers 
In-reply-to: Your message of Wed, 02 Apr 97 20:01:36 -0600.
             <199704030201.UAA02712@parmesan.cs.wisc.edu> 
Date: Wed, 02 Apr 97 19:36:52 -0800
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    >Note too, in this case your machine knows it is connected to a dial-up
    >line and the link layer could therefore tell the IP and TCP layers about
    >the available path bandwidth and allow your TCP to set a rational window
    >size.

    As I mentioned in my mail, applications, like ftp, use setsockopt() to set
    the buffer size to a large value.  That is why I asked the question "what
    should setsockopt(SO_RCVBUF) really mean to TCP?"  If TCP knows that the
    link is bandwidth dominated, can it just ignore the setsockopt()?  Here I
    assume that setsockopt() affects the advertized window size, as in BSD.

OK -- sorry your note raised two questions and I didn't understand which
was intended.

Yes, you've got a point.  The API should allow you to increase window size,
but only so large as makes sense from the link layer.  Currently BSD doesn't
do that.

Craig

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  2 20:10:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA06272 for tcp-impl-list; Wed, 2 Apr 1997 20:08:29 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA06264 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 20:08:27 -0800
Received: from palrel3.hp.com (palrel3.hp.com [15.253.88.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA05217 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 20:08:24 -0800
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185]) by palrel3.hp.com with SMTP (8.7.5/8.7.3) id UAA28389 for <tcp-impl@relay.engr.SGI.COM>; Wed, 2 Apr 1997 20:04:45 -0800 (PST)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA23929; Wed, 2 Apr 1997 19:55:56 -0800
Message-Id: <33432A4B.638C@cup.hp.com>
Date: Wed, 02 Apr 1997 19:55:55 -0800
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: Craig Partridge <craig@aland.bbn.com>
Cc: Kacheong Poon <poon@cs.wisc.edu>, tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers
References: <199704030336.TAA01455@aland.bbn.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>     As I mentioned in my mail, applications, like ftp, use setsockopt() to set
>     the buffer size to a large value.  That is why I asked the question "what
>     should setsockopt(SO_RCVBUF) really mean to TCP?"  If TCP knows that the
> ...
> Yes, you've got a point.  The API should allow you to increase window size,
> but only so large as makes sense from the link layer.  Currently BSD doesn't
> do that.

So is this taking us to that place where a sending TCP does not increase
cwnd if it does not increase throughput? 

rick jones

PS - folks interested in running tcptrace on HP-UX can find a quick port
(no xplot) on ftp.cup.hp.com under dist/networking/tools...

From owner-tcp-impl@relay.engr.sgi.com  Thu Apr  3 03:08:25 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA16097 for tcp-impl-list; Thu, 3 Apr 1997 03:06:50 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA16092 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 03:06:47 -0800
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id DAA01054 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 03:06:44 -0800
Received: from ftp.com by ftp.com  ; Thu, 3 Apr 1997 06:02:25 -0500
Received: from mailserv-2high.ftp.com by ftp.com  ; Thu, 3 Apr 1997 06:02:25 -0500
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id FAA01721; Thu, 3 Apr 1997 05:59:29 -0500
Date: Thu, 3 Apr 1997 05:59:29 -0500
Message-Id: <199704031059.FAA01721@MAILSERV-2HIGH.FTP.COM>
To: poon@cs.wisc.edu
Subject: Re: TCP buffers
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: craig@aland.bbn.com, tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Thu Apr  3 05:59:23 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


First - I agree w/ Craig that the stack should try to be aware of what
type of link it is running over and set its default window size
accordingly.  A 2K to 4K window is optimal for a 9.6..28.8Baud link.
That solves 75% of the problems being discussed as most slow link scenarios
these days are a dialup client connected to a T1 or faster network; ie
the slow link is at the end of the complete path as opposed to in the
middle.

||As I mentioned in my mail, applications, like ftp, use setsockopt() to set
||the buffer size to a large value.  That is why I asked the question "what
||should setsockopt(SO_RCVBUF) really mean to TCP?"  If TCP knows that the
||link is bandwidth dominated, can it just ignore the setsockopt()?  Here I
||assume that setsockopt() affects the advertized window size, as in BSD.  One
||can say that it is the application problem.  But, say, in a LAN environment,
||a larger window is really needed.  And the application, like ftp, can
||improve performance by calling setsockopt() instead of using the default
||window size.  An application does not know the actual environment it runs
||on.
||
||Does everyone agree that setsockopt() should not affect TCP's advertized
||window size?  The window size should be determined by TCP "intelligently"
||depending on the path and available buffer space instead of just using some
||default value.  If this is the right thing to do, I guess people should start
||thinking about how to do it.
||
I strongly disagree with this even though it makes sense TCP-wise.
We've argued this a bunch and believe that if an app sets a window
size within the proper range, the application knows what it's doing
and is responsible for its behavior and performance. Silently changing
window size on an app may affect that app's behavior in unforeseen ways.

In Windows/Winsock land some of us have been arguing for years that a
stack should reflect driver characteristics upwards to an application
such that an application can make intelligent decisions at its level.
There was for one brief and shining moment a NetDev annex to Winsock-2,
designed at that time for wireless connections, that was to have allowed
an app to get link speed, link latency, link signal strength, link
packet loss, etc. so the app could control its behavior.

My belief is that this type of API, driver -> stack -> app is needed
to completely solve the slow or slow+lossy or wildly variable+lossy
link problem.


||In my PPP case, the link layer can know the link bandwidth.  But how about a
||machine connected to an Ethernet but the real link to the outside is a low
||speed PPP link?  I guess my questions in previous mail are still unanswered.
||
As Craig also said, using RTT as an estimate is your best bet.

L.



From owner-tcp-impl@relay.engr.sgi.com  Thu Apr  3 06:53:21 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA02918 for tcp-impl-list; Thu, 3 Apr 1997 06:51:47 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA02914 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 06:51:45 -0800
Received: from socks2.raleigh.ibm.com (socks2.raleigh.ibm.com [204.146.167.123]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id GAA02413 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 06:51:39 -0800
Received: from rtpdce02.raleigh.ibm.com by socks2.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA36046; Thu, 3 Apr 1997 09:47:20 -0500
Received: from ludwigia.raleigh.ibm.com (ludwigia.raleigh.ibm.com [9.37.83.125]) by rtpdce02.raleigh.ibm.com (8.7.3/8.7.3/RTP-ral-1.0) with SMTP id JAA32702; Thu, 3 Apr 1997 09:47:18 -0500
Received: from localhost.raleigh.ibm.com by ludwigia.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA16468; Thu, 3 Apr 1997 09:46:49 -0500
Message-Id: <9704031446.AA16468@ludwigia.raleigh.ibm.com>
To: Craig Partridge <craig@aland.bbn.com>
Cc: Kacheong Poon <poon@cs.wisc.edu>, tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers 
In-Reply-To: Your message of "Wed, 02 Apr 1997 19:36:52 PST."
             <199704030336.TAA01455@aland.bbn.com> 
Date: Thu, 03 Apr 1997 09:46:49 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Craig Partridge <craig@aland.bbn.com> writes:

> Yes, you've got a point.  The API should allow you to increase window size,
> but only so large as makes sense from the link layer.  Currently BSD doesn't
> do that.

IMO, letting the API adjust the window size is a hack. What if you
have a LAN at home with a dedicated router that connects to the
internet via a 28.8 modem? 

You really want the window size to match bandwidth X delay of the
*path* being used, and you want that to be done automajically by
TCP. The fundamental problem is that by dumping packets onto the path,
you *increase* the delay artificially, as the sender's packets start
sitting in queues. The actual max delay you get then is often *much*
larger than what anybody wants. As you point out, the actual max delay
is generally a function of the router.

> TCP tries to find the correct window size given the bandwidth
> available.

I don't think this is quite true. It tries to find the correct window
size for the bandwidth X delay.  That is, it comes pretty close to
finding the right sending *rate* (to match bandwidth) but it asks the
network to buffer more packets than is needed to maintain that sending
rate. It is this unfortunate "over buffering" that TCP does that is I
think the cause of Jim's concern.

This (obviously) goes beyond the scope of this WG, but it seems like
some of the results from TCP Vegas could be of use here. Vegas seems
to do a better job of keeping extra (i.e., queued) packets out of the
network. That is, if instantaneous throughput doesn't increase when
the send window is increased, the send window is made a bit smaller
(i.e., perhaps one segment smaller). In contrast, current TCPs simply
increase the windows until there is a loss (at this point there is too
much buffered in the network), and then decrease the window size by a
*large* amount (i.e., to 1/2 its previous value).
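
The contrast drawn above can be sketched as two window-update rules.  This
is only an illustration of the behavior as described in this message, not
the actual Vegas algorithm (which compares expected and actual sending
rates); the function names are made up and windows are counted in segments.

```python
def loss_driven_step(cwnd, loss_seen, mss=1):
    """Current TCPs: grow until a loss, then cut the window in half."""
    if loss_seen:
        return max(mss, cwnd / 2)
    return cwnd + mss


def throughput_driven_step(cwnd, prev_throughput, throughput, mss=1):
    """Vegas-like rule as described above: back off one segment when a
    larger window did not buy any more throughput."""
    if throughput > prev_throughput:
        return cwnd + mss
    return max(mss, cwnd - mss)
```

The first rule needs a queue to overflow before it backs off; the second
backs off as soon as extra window stops paying for itself, at the cost of
depending on throughput measurements being trustworthy.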

Thomas

From owner-tcp-impl@relay.engr.sgi.com  Thu Apr  3 07:21:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA06210 for tcp-impl-list; Thu, 3 Apr 1997 07:19:10 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA06178 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 07:19:07 -0800
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA07761 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 07:19:06 -0800
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id HAA01962; Thu, 3 Apr 1997 07:13:57 -0800 (PST)
Message-Id: <199704031513.HAA01962@aland.bbn.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: re: TCP buffers
From: Craig Partridge <craig@aland.bbn.com>
Date: Thu, 03 Apr 97 07:13:57 -0800
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hi folks:

A night's reflection made me realize I said something stupid, namely:

> Yes, you've got a point.  The API should allow you to increase window size,
> but only so large as makes sense from the link layer.

Putting the link layer in control of the window size is a botch, since
the link layer doesn't know the end-to-end RTT.

I think we need a better answer here.

Craig

From owner-tcp-impl@relay.engr.sgi.com  Thu Apr  3 08:02:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA11628 for tcp-impl-list; Thu, 3 Apr 1997 08:00:21 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA11612 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 08:00:19 -0800
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA16438 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 08:00:02 -0800
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id RAA00177; Thu, 3 Apr 1997 17:00:28 +0200
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199704031500.RAA00177@labinfo.iet.unipi.it>
Subject: Re: TCP buffers
To: narten@raleigh.ibm.com (Thomas Narten)
Date: Thu, 3 Apr 1997 17:00:28 +0200 (MET DST)
Cc: craig@aland.bbn.com, poon@cs.wisc.edu, tcp-impl@relay.engr.SGI.COM
In-Reply-To: <9704031446.AA16468@ludwigia.raleigh.ibm.com> from "Thomas Narten" at Apr 3, 97 09:46:30 am
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 1008      
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> IMO, letting the API adjust the window size is a hack. What if you
> have a LAN at home with a dedicated router that connects to the
> internet via a 28.8 modem? 

given that, as someone already observed, a single, low capacity bottleneck
(often near one end of the path) is quite common these days, what if
the bottleneck router, such as the one above, patches window
advertisements in TCP headers so as to avoid exceedingly large values
(e.g. sets an upper bound on windows) ?

Apart from being a hack (but so is Network Address Translation),
is there any obvious reason why the above might not work ?
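
As a rough sketch of what such a patching router would have to do: it must
lower the 16-bit window field and also fix up the TCP checksum, which can be
done incrementally as in RFC 1624.  The function names and the cap are made
up for illustration, and the sketch ignores window scaling (with a
negotiated scale factor the raw field no longer equals the window in bytes).

```python
def ones_complement_sum16(words):
    """One's-complement sum of 16-bit words, with end-around carry."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum16(words):
    """Internet checksum over a list of 16-bit words."""
    return ~ones_complement_sum16(words) & 0xFFFF


def clamp_window(window, checksum, cap):
    """Return (new_window, new_checksum) with the window limited to cap."""
    new_window = min(window, cap)
    if new_window == window:
        return window, checksum
    # Incremental update: HC' = ~(~HC + ~m + m'), all in one's complement.
    total = ones_complement_sum16(
        [~checksum & 0xFFFF, ~window & 0xFFFF, new_window])
    return new_window, ~total & 0xFFFF
```

One design point worth noting: the router may only ever lower the value.
Raising it would let the sender overrun the receiver's real buffer.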

	Luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________

From owner-tcp-impl@relay.engr.sgi.com  Thu Apr  3 09:00:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA04099 for tcp-impl-list; Thu, 3 Apr 1997 08:58:23 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from odin.corp.sgi.com (odin.corp.sgi.com [192.26.51.194]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA04059 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 3 Apr 1997 08:58:20 -0800
Received: from sgi.sgi.com by odin.corp.sgi.com via ESMTP (951211.SGI.8.6.12.PATCH1502/951211.SGI)
	for <tcp-impl@relay.engr.SGI.COM> id IAA06914; Thu, 3 Apr 1997 08:35:46 -0800
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA25660 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 08:34:35 -0800
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id RAA00270; Thu, 3 Apr 1997 17:37:49 +0200
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199704031537.RAA00270@labinfo.iet.unipi.it>
Subject: Re: TCP buffers
To: narten@raleigh.ibm.com (Thomas Narten)
Date: Thu, 3 Apr 1997 17:37:49 +0200 (MET DST)
Cc: craig@aland.bbn.com, poon@cs.wisc.edu, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <9704031446.AA16468@ludwigia.raleigh.ibm.com> from "Thomas Narten" at Apr 3, 97 09:46:30 am
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 1863      
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> This (obviously) goes beyond the scope of this WG, but it seems like
> some of the results from TCP Vegas could be of use here. Vegas seems
> to do a better job of keeping extra (i.e., queued) packets out of the
> network. That is, if instantaneous throughput doesn't increase when
> the send window is increased, the send window is made a bit smaller
> (i.e., perhaps one segment smaller). In contrast, current TCPs simply
> increase the windows until there is a loss (at this point there is too
> much buffered in the network), and then decrease the window size by a
> *large* amount (i.e., to 1/2 its previous value).

My apologies if this is outside the scope of the group, but timings
are extremely noisy on slow links and with few packets in transit.
RTTs are dominated by the transmission times, and packets from
competing flows might cause variations of up to 100% in your RTT
samples.

To compensate for this you need averaging on many samples, but
since you want small windows (otherwise your control does not work
well) you also need many RTT's before being able to take a decision,
and in the meantime...  competing flows will have already eaten up
your share of bandwidth, causing queue overflows (one of which
might hit you) and causing you to timeout.

Waiting for a loss to occur may be not the most efficient way to
operate but at least it gives more uniform results in presence of
widely different environments and "background noise".

	Cheers
	Luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________

From owner-tcp-impl@relay.engr.sgi.com  Thu Apr  3 10:25:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA00169 for tcp-impl-list; Thu, 3 Apr 1997 10:22:47 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA00156 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 10:22:44 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA23982 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 10:22:30 -0800
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id NAA11816; Thu, 3 Apr 1997 13:15:54 -0500 (EST)
Message-Id: <199704031815.NAA11816@brookfield.ans.net>
To: jg@pa.dec.com (Jim Gettys)
cc: Curtis Villamizar <curtis@ans.net>, touch@ISI.EDU, F.Potorti@cnuce.cnr.it,
        tcp-impl@relay.engr.SGI.COM
Reply-To: curtis@ans.net
Subject: Re: TCP buffers 
In-reply-to: Your message of "Wed, 02 Apr 1997 08:37:38 PST."
             <9704021637.AA03925@pachyderm.pa.dec.com> 
Date: Thu, 03 Apr 1997 13:15:52 -0500
From: Curtis Villamizar <curtis@ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <9704021637.AA03925@pachyderm.pa.dec.com>, Jim Gettys writes:
> The reality of HTTP is that it is an interactive bulk transport
> protocol; i.e. there are users, who often change their minds as a result
> of what they see on the screen, and then surf on to another page
> (or further down the screen, with a different set of embedded graphics
> to fetch).
> 
> Control of latency by a browser is therefore important.  This isn't just
> file transfer...

This is something that should be fixed within HTTP using a single TCP
flow.

  Major HTTP/1.1 goals include: 

      improving performance for end users 
      lower HTTP's load on the Internet for the same amount of "real work" 
      make HTTP a good "network citizen" 
      enable applications to work reliably even with caching 

  HTTP/1.1 includes a number of new elements that together should have a
  major effect on Internet traffic. These include:

      Transport improvements, consisting of persistent connections,
        additions to allow pipelining, transport data compression,
        and range requests. All of these improvements are optional
        parts of HTTP/1.1. 
      Caching extensions, to allow applications to work reliably in
        the face of caching, and to allow applications to mark more
        content cacheable, including the results of searches.

That is from:

  http://www.w3.org/pub/WWW/Protocols/HTTP/Performance/Pipeline.html

which you no doubt know about since your name is on it.

The client can make limited range requests, overlapped to keep data in
flight (for example: request two large ranges and when the first is
complete request a third).  When you have to switch streams because a
different inline is in view, request ranges from that one.
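
A minimal sketch of the overlapped range requests described above, in
Python.  The names range_headers and RangePipeline, the chunk size, and
the depth of two are illustrative assumptions; only the
"bytes=start-end" Range syntax comes from HTTP itself.  It shows the
bookkeeping only and does not open a connection.

```python
from collections import deque


def range_headers(total_size, chunk_size):
    """Yield HTTP Range header values that cover total_size bytes."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # Range ends are inclusive
        yield f"bytes={start}-{end}"


class RangePipeline:
    """Keeps up to `depth` range requests outstanding on one connection."""

    def __init__(self, total_size, chunk_size, depth=2):
        self._pending = range_headers(total_size, chunk_size)
        self.in_flight = deque()
        self.depth = depth

    def fill(self):
        """Return the new Range values to send so `depth` are in flight."""
        issued = []
        while len(self.in_flight) < self.depth:
            nxt = next(self._pending, None)
            if nxt is None:  # nothing left to request
                break
            self.in_flight.append(nxt)
            issued.append(nxt)
        return issued

    def complete_oldest(self):
        """Mark the oldest request finished and top the pipeline back up."""
        self.in_flight.popleft()
        return self.fill()
```

Switching to a different inline would amount to dropping the pipeline
for the old object and starting a new one; the ranges already in flight
still have to drain first.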

I don't think HTTP supports it yet, but it would help to have the
ability to say "clear my list of requests that you haven't already
started sending and send this instead".  It might also be useful to be
able to make one request to send in chunks and then use this "cancel
and reshuffle" request if the sending order needs to change.

Any WAN transport is going to have to keep data in flight to keep
performance up, and to do so it is going to have to live with a delay
in switching from one stream to another, including some data in the
wire and in the router queue in front of the stream you just switched
to.  The only optimization you can do is to try to reduce the amount
of data in flight so that it is not excessive and does not slow the
switch.  Reducing data in flight too much would reduce performance in
the normal streaming case.

One way to reduce the cutover time is to provide an indication in the
initial HTTP request of the client's receive buffer size.  The server
can set the send buffer size to match (accommodating faster links as
well as slower links and allowing the default to be set small) before
ever writing to the socket.  This way the amount of data in the send
buffer on the server host prior to switchover is at most one send
buffer's worth (which should be only a few RTTs if the client hasn't
set the receive buffer way too big).
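
A rough sketch of the server side of that idea, in Python.  HTTP
defines no such header, so "x-receive-buffer" is a hypothetical name,
and the floor, ceiling, and default are arbitrary assumptions.  The
only real API used is setsockopt() with SO_SNDBUF, called before the
first write.

```python
import socket

MIN_SNDBUF = 2 * 1024    # assumed floor, bytes
MAX_SNDBUF = 64 * 1024   # assumed ceiling, bytes


def choose_send_buffer(headers, default=4 * 1024):
    """Pick a send buffer size from the client's advertised receive buffer."""
    raw = headers.get("x-receive-buffer")  # hypothetical request header
    try:
        wanted = int(raw)
    except (TypeError, ValueError):
        return default  # header missing or malformed: keep the small default
    return max(MIN_SNDBUF, min(wanted, MAX_SNDBUF))


def apply_send_buffer(conn, headers):
    """Set SO_SNDBUF on an accepted socket before any response is written."""
    size = choose_send_buffer(headers)
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    return size
```

Clamping the request keeps a misconfigured client from forcing a huge
send buffer, which is the case the paragraph above warns about.  The
stack may round or adjust the value it actually uses.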

> Now if you'd like the Web to go off and invent its own transport protocol
> rather than TCP, I'm sure there are people in the Web community who'd be
> happy to do so.  Just don't count on them understanding congestion and
> flow control issues very well. :-(.  And maybe this is the right thing
> to do ultimately.  But there be dragons there, as this mailing list certainly
> knows.
> 
> 				- Jim

I don't follow the HTTP WG but I thought the more clueful among them
are trying to consolidate the HTTP multiple flow mess into one TCP
flow as the HTTP 1.1 changes do.

Curtis

ps- This is no longer about TCP implementation, so if we want to talk
further about possible ways to tune HTTP, can we take this offline?

From owner-tcp-impl@relay.engr.sgi.com  Thu Apr  3 10:33:53 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA03391 for tcp-impl-list; Thu, 3 Apr 1997 10:32:08 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA03376 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 10:32:06 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id KAA26494 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 10:32:03 -0800
From: touch@ISI.EDU
Received: from ash.isi.edu by zephyr.isi.edu (5.65c/5.61+local-23)
	id <AA21025>; Thu, 3 Apr 1997 10:28:09 -0800
Date: Thu, 3 Apr 1997 10:28:08 -0800
Posted-Date: Thu, 3 Apr 1997 10:28:08 -0800
Message-Id: <199704031828.AA19283@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA19283>; Thu, 3 Apr 1997 10:28:08 -0800
To: jg@pa.dec.com, curtis@ans.net
Subject: Re: TCP buffers
Cc: touch@ISI.EDU, F.Potorti@cnuce.cnr.it, tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From curtis@brookfield.ans.net Thu Apr  3 10:18:05 1997
> To: jg@pa.dec.com (Jim Gettys)
> Cc: Curtis Villamizar <curtis@ans.net>, touch@ISI.EDU, F.Potorti@cnuce.cnr.it,
>         tcp-impl@relay.engr.SGI.COM
> Subject: Re: TCP buffers 
> Date: Thu, 03 Apr 1997 13:15:52 -0500
> From: Curtis Villamizar <curtis@ans.net>
> 
> 
> In message <9704021637.AA03925@pachyderm.pa.dec.com>, Jim Gettys writes:
> > The reality of HTTP is that it is an interactive bulk transport
> > protocol; i.e. there are users, who often changes their minds as a result
> > of what they see on the screen, and then surf on to another page
> > (or further down the screen, with a different set of embedded graphics
> > to fetch).
> > 
> > Control of latency by a browser is therefore important.  This isn't just
> > file transfer...
> 
> This is something that should be fixed within HTTP using a single TCP
> flow.
> 
>   Major HTTP/1.1 goals include: 
> 
>       improving performance for end users 
>       lower HTTP's load on the Internet for the same amount of "real work" 
>       make HTTP a good "network citizen" 
>       enable applications to work reliably even with caching 
> 
>   HTTP/1.1 includes a number of new elements that together should have a
>   major effect on Internet traffic. These include:
> 
>       Transport improvements, consisting of persistent connections,
>         additions to allow pipelining,
>       transport data compression, and range requests. All of these
>         improvements are optional parts of HTTP/1.1. 
>       Caching extensions, to allow applications to work reliably in
>         the face of caching, and to allow applications to mark more
>         content cacheable, including the results of searches.
> 
> That is from:
> 
>   http://www.w3.org/pub/WWW/Protocols/HTTP/Performance/Pipeline.html
> 
> which you no doubt know about since your name is on it.
> 
> The client can make limited range requests, overlapped to keep data in
> flight (for example: request two large ranges and when the first is
> complete request a third).  When you have to switch streams because a
> different inline is in view, request ranges from that one.
> 
> I don't think HTTP supports it yet but the ability to say "clear my
> list of requests that you haven't already started sending and send
> this instead".  It might also be useful to be able to make one request
> to send in chunks and then use this "cancel and reshuffle" request if
> the sending order needs to change.
> 
> Any WAN transport is going to have to keep data in flight to keep
> performance up and to do so is going to have to live with a delay in
> switching from one stream to another including living with some data
> in the wire and in the router queue in front of the stream you just
> switched to.  The only optimization you can do is to try to reduce the
> amount of data in flight so as not to be excessive and slow the
> switch.  Reducing data in flight too much would reduce performance in
> the normal streaming case.
> 
> One way to reduce the cutover time is to provide an indication in the
> initial HTTP request of the clients recieve buffer size.  The server
> can set the send buffer size to match (accomodating faster links as
> well as slower links and allowing the default to be set small) before

> I don't follow the HTTP WG but I thought the more clueful among them
> are trying to consolidate the HTTP multiple flow mess into one TCP
> flow as the HTTP 1.1 changes do.
> 
> Curtis
> 
> ps- This is no longer about TCP implementation so if we want to talk
> further about possible ways to tune HTTP can we take this offline.

Whoa!!!

Just because HTTP/1.1 is trying to do this at the HTTP level
by using a single connection does NOT mean there is agreement
that this is best solved there.

First, 'one flow' won't work - you have to have one flow per
transport class in order to support QoS. And it's very hard
to know what flow to put a query in when you don't know what
the response will be - postscript, audio, html, or video.

Second, there are lots of good reasons to have the transport
level know about the transactions, for exactly the reasons
this stuff is getting crufty. Like aborting pending
transactions, establishing dependencies (B iff A succeeds), 
and allowing reordering when it doesn't break things.

Maybe it would be more productive to decouple the discussion,
i.e., to have the HTTP WG set requirements for the transport API
and services it would _like_ and see what the transport community
has to say about it.

(i.e., tell them you want low latency or preemption, but let
___them___ decide -how- to provide that service)

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Apr  3 12:47:45 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA13538 for tcp-impl-list; Thu, 3 Apr 1997 12:46:24 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA13517 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 12:46:20 -0800
Received: from parmesan.cs.wisc.edu (parmesan.cs.wisc.edu [128.105.77.34]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA01425 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 12:46:15 -0800
Received: (from poon@localhost) by parmesan.cs.wisc.edu (8.7.6/8.7.3) id OAA06115; Thu, 3 Apr 1997 14:43:37 -0600 (CST)
Date: Thu, 3 Apr 1997 14:43:37 -0600 (CST)
From: Kacheong Poon <poon@cs.wisc.edu>
Message-Id: <199704032043.OAA06115@parmesan.cs.wisc.edu>
To: backman@ftp.com
Subject: Re: TCP buffers
Cc: craig@aland.bbn.com, tcp-impl@relay.engr.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Included message from backman@ftp.com (Larry Backman):

>----
>I strongly disagree with this even though it makes sense TCP wise.
>We've argued this a bunch and believe that if an app sets a window
>size within the proper range, the application knows what its doing
>and is responsible for its behavior and performance. Silently changing
>window size on an app may affect thats app's behavior in unforseen ways.

I agree that sometimes this may not be desirable.  But I guess there are
many apps out there which do not know what they are doing and just set the
window size to a large value hoping to get better throughput.  Suppose there
is another app sharing the same slow link which uses a 2K window.  Should it
be penalized by the clueless app?  A 24K window on a slow link is certainly
not appropriate.  Should TCP change the size then?

>My belief is that this type of API, driver -> stack -> app is needed
>to completely solve the slow or slow+lossy or wildly variable+lossy
>link problem.

Another question, is API implementation also within the scope of this WG?
Suppose someone claims that a stack does not do keepalive.  And it turns out
that one needs to call setsockopt(SO_KEEPALIVE) twice to enable it.  Is this
a problem within the scope of this WG?  And how about a stack which does not
allow setsockopt(SO_RCVBUF) to change the advertized window size?
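
One way to check such claims from user space is to set the options and
read them back.  This is only a sketch in Python; the function name
probe_socket_options is made up for illustration.  Note that
getsockopt() reports the buffer size the stack accepted, not the
window it advertises on the wire, so a packet trace is still needed
for the second question.

```python
import socket


def probe_socket_options(rcvbuf_request):
    """Set SO_KEEPALIVE and SO_RCVBUF once each and report what stuck."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # A single call should be enough to turn keepalive on.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        keepalive = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0

        # The stack is free to round, double, or clamp the request.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_request)
        effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

    return {
        "keepalive": keepalive,
        "rcvbuf_requested": rcvbuf_request,
        "rcvbuf_effective": effective,
    }
```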

>----


							Poon.
							poon@cs.wisc.edu


From owner-tcp-impl@relay.engr.sgi.com  Thu Apr  3 14:11:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA03812 for tcp-impl-list; Thu, 3 Apr 1997 14:10:34 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA03799 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 14:10:32 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA22010 for <tcp-impl@relay.engr.SGI.COM>; Thu, 3 Apr 1997 14:10:30 -0800
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id RAA12759; Thu, 3 Apr 1997 17:04:14 -0500 (EST)
Message-Id: <199704032204.RAA12759@brookfield.ans.net>
To: jg@pa.dec.com (Jim Gettys)
cc: Craig Partridge <craig@aland.bbn.com>, Curtis Villamizar <curtis@ans.net>,
        touch@ISI.EDU, F.Potorti@cnuce.cnr.it, tcp-impl@relay.engr.SGI.COM
Reply-To: curtis@ans.net
Subject: Re: TCP buffers 
In-reply-to: Your message of "Wed, 02 Apr 1997 12:25:50 PST."
             <9704022025.AA09973@pachyderm.pa.dec.com> 
Date: Thu, 03 Apr 1997 17:04:14 -0500
From: Curtis Villamizar <curtis@ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Long tutorial followed by something that might be a TCP issue, but
still out of scope for tcp-impl.

Apologies in advance for the out-of-scope message.  Delete now if you're
busy.

In message <9704022025.AA09973@pachyderm.pa.dec.com>, Jim Gettys writes:
> 
> The question is how much data gets queued in the final router driving 
> the PPP link to the end user.  This is where most of the delay occurs, 
> and where the packets congregate.  The question is: how many packets can 
> congregate in this final router?  There is no way for a client to discard 
> them, once they have been sent (and the client wouldn't want to in any 
> case; he'd just as soon put the data in its cache).  Any packets sent 
> will accumulate there.
> 
> Now, the question is therefore how many packets might get queued in this
> router.  With current browser behavior, a client might have 4 times as much
> data in flight (over 4 separate connections; that is how the current browsers
> work).  With HTTP/1.1, things might be better, as a good client might be
> using just one connection.


You need to keep data in flight.  Think of the data as being "in the
wire" (or fiber) not in the queue if things are working right.

Clairvoyant TCP (doesn't exist) might be able to guess the perfect
window size for a connection.  That window size would be the effective
bandwidth that could be achieved by that flow times the round trip
delay.  The packets would go through the bottleneck at a fixed pace
and, neglecting ACK compression, the ACKs would come back at a
relatively constant pace.

The speed of light is finite so from the time a packet is sent to when
the ACK comes back is the round trip time (RTT).  If you keep a full
window in flight, the bottleneck won't run dry.  If the pacing isn't
impacted by ACK compression, no queue will form.  The packets in flight
are literally "in the wire".

In practice, ACKs are delayed by getting stuck in a small standing
queue in the reverse direction, the size of which is highly variable.
If you really wanted full utilization you'd have to compensate by
making the window slightly larger and forming a small queue behind the
bottleneck.

The problem is that it is really not possible to set the window to an
ideal value.  If you err on the side of too large, a queue somewhere
will overflow and TCP will automatically find a near optimal window
(getting the details right is the topic of this WG).

If you have a good idea what an upper bound would be, you can improve
performance somewhat and reduce delay by not setting the window way
too large.  The classic example is if you have a 28.8 modem, you are
never going to get a connection to go faster than 28.8.  It takes
almost 150 msec to get one 512 byte packet through the 28.8 link, so
you will never see an RTT lower than 50 msec with compression.  The
bandwidth delay product for 200 msec is 720 bytes, so making the
window very small is a good idea.  In this case you have no choice but
to make the window larger than optimal and force a queue to form.
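
The arithmetic behind those figures, as a small Python sketch.  The
function names are illustrative, and the link is treated as a plain
28,800 bit/s pipe with no compression or framing overhead.

```python
MODEM_BPS = 28_800  # nominal 28.8 modem rate, bits per second


def serialization_delay(packet_bytes, link_bps):
    """Seconds needed to clock one packet onto the link."""
    return packet_bytes * 8 / link_bps


def bandwidth_delay_product(link_bps, rtt_seconds):
    """Bytes that fit in the wire for a given round trip time."""
    return link_bps * rtt_seconds / 8


# One 512-byte packet takes a little over 140 msec on the modem.
print(round(serialization_delay(512, MODEM_BPS) * 1000), "msec")

# At a 200 msec RTT the pipe holds about 720 bytes.
print(round(bandwidth_delay_product(MODEM_BPS, 0.200)), "bytes")
```

Since 720 bytes is barely more than one 512-byte packet, any window
large enough to hold several packets is already larger than the pipe.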

It helps to keep more than 4 packets in flight to allow fast
retransmit to work, so I usually suggest 3KB, 4KB at the most.  3KB
would give you a delay of about 850 msec.  Four flows would make it
3.4 seconds and overflow the queue on the other side and reduce
performance significantly.

The most common error is to set a window size to 32KB on a modem and
then also use multiple flows.  A lot of packets (about 10-15%) get
sent across the WAN only to overflow the queue in front of the modem
and get dropped.  Fixing this is just a matter of setting the recv
buffer to a reasonable size if you are behind a modem.

It takes only 170 msec to move 32KB through a T1, so that window size
is fine for T1.
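
The delay figures above can be reproduced with a short Python sketch.
It assumes that every flow keeps a full window queued at the
bottleneck, that 1 KB is 1024 bytes, and that a T1 carries 1.544
Mbit/s; the function name drain_time is illustrative.

```python
KB = 1024
MODEM_BPS = 28_800
T1_BPS = 1_544_000


def drain_time(window_bytes, link_bps, flows=1):
    """Seconds for `flows` full windows to pass through the bottleneck."""
    return flows * window_bytes * 8 / link_bps


print(f"3KB window, modem:     {drain_time(3 * KB, MODEM_BPS):.2f} s")
print(f"4 x 3KB windows, modem: {drain_time(3 * KB, MODEM_BPS, flows=4):.2f} s")
print(f"32KB window, T1:       {drain_time(32 * KB, T1_BPS):.2f} s")
```

The same function applied to a 32KB window on the modem gives roughly
nine seconds per flow, which is why that setting overflows the queue.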

Here's the TCP relevant part:

The only thing that TCP could do in the case where the window is
radically too large is for the sender to keep track of the approximate
throughput and the shortest delay recently experienced and not set the
window greater than twice that.  The throughput can be estimated by
cwnd/rtt.  The window would never be increased beyond
cwnd*rtt_min/rtt.  Initially rtt_min is approximately equal to rtt, so
this wouldn't affect initial slow start.  If the queue had drained it
also wouldn't affect a subsequent slow start, but it would tend to cap the
window without incurring loss.  [a trivial filter would be needed to
keep the rtt_min estimate.]
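
A possible reading of that cap, sketched in Python.  This is not taken
from any existing TCP; the class name WindowCap, the 16-sample
minimum used as the "trivial filter", and the headroom factor of 2
(one interpretation of "twice that") are all assumptions.

```python
from collections import deque


class WindowCap:
    """Caps window growth at headroom * cwnd * rtt_min / rtt."""

    def __init__(self, history=16, headroom=2.0):
        self.samples = deque(maxlen=history)  # recent RTT samples, seconds
        self.headroom = headroom              # 2.0 reflects "twice that"

    def observe(self, rtt):
        self.samples.append(rtt)

    def rtt_min(self):
        # Trivial filter: minimum over the most recent samples, so an
        # old minimum ages out if the path changes.
        return min(self.samples)

    def limit(self, cwnd, rtt):
        # cwnd / rtt estimates throughput; multiplying by rtt_min gives
        # the data that fits in the wire without a standing queue.
        return self.headroom * cwnd * self.rtt_min() / rtt

    def grow(self, cwnd, rtt, increment):
        """Apply an increase, but never push the window past the limit."""
        self.observe(rtt)
        capped = min(cwnd + increment, self.limit(cwnd, rtt))
        return max(cwnd, capped)  # the cap stops growth, it never shrinks
```

With rtt close to rtt_min the limit is about twice cwnd, so slow start
can still double the window each round trip.  Once a queue builds and
rtt rises well above rtt_min, the limit falls below cwnd and growth
stops without waiting for a loss.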

This is more relevant to end2end-interest than tcp-impl.

Again, I apologize for an out-of-scope message.

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Mon Apr  7 10:13:55 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA22984 for tcp-impl-list; Mon, 7 Apr 1997 10:11:52 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA22960 for <tcp-impl@relay.engr.SGI.COM>; Mon, 7 Apr 1997 10:11:48 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA12387 for <tcp-impl@relay.engr.SGI.COM>; Mon, 7 Apr 1997 10:11:47 -0700
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id KAA04122; Mon, 7 Apr 1997 10:06:01 -0700 (PDT)
Message-Id: <199704071706.KAA04122@aland.bbn.com>
To: Kacheong Poon <poon@cs.wisc.edu>
cc: backman@ftp.com, craig@aland.bbn.com, tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers 
In-reply-to: Your message of Thu, 03 Apr 97 14:43:37 -0600.
             <199704032043.OAA06115@parmesan.cs.wisc.edu> 
Date: Mon, 07 Apr 97 10:06:00 -0700
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    Included message from backman@ftp.com (Larry Backman):

    >----
    >I strongly disagree with this even though it makes sense TCP wise.
    >We've argued this a bunch and believe that if an app sets a window
    >size within the proper range, the application knows what its doing
    >and is responsible for its behavior and performance. Silently changing
    >window size on an app may affect thats app's behavior in unforseen ways.

    I agree that sometimes this may not be desirable.  But I guess there are
    many apps out there which does not know what they are doing and just set
    window size to large value hoping to get better throughput.  Suppose there
    is another app sharing the same slow link which uses a 2K window.  Should it
    be penalized by the clueless app?  A 24K window on a slow link is certainly
    not appropriate.  Should TCP change the size then?

In a proper environment, the clueless app gets hammered by having more packets
dropped.  (See the RED Manifesto).

Craig

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  9 10:18:46 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA11575 for tcp-impl-list; Wed, 9 Apr 1997 10:16:31 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA11532 for <tcp-impl@relay.engr.SGI.COM>; Wed, 9 Apr 1997 10:16:27 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA02494 for <tcp-impl@relay.engr.SGI.COM>; Wed, 9 Apr 1997 10:16:23 -0700
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id SAA22657; Wed, 9 Apr 1997 18:08:01 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wEf7b-0005FHC; Tue, 8 Apr 97 18:55 BST
Message-Id: <m0wEf7b-0005FHC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP buffers
To: poon@cs.wisc.edu (Kacheong Poon)
Date: Tue, 8 Apr 1997 18:55:55 +0100 (BST)
Cc: tcp-impl@relay.engr.SGI.COM, craig@aland.bbn.com
In-Reply-To: <199704022304.RAA02123@parmesan.cs.wisc.edu> from "Kacheong Poon" at Apr 2, 97 05:04:52 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I agree.  But the problem is how does a TCP stack know it is behind a small
> pipe?  I use PPP dialup link with Linux at home.  When I ftp, the receive
> window is 24K (I don't remember the exact figure.)  The other side's RTO can
> be as high as 20 seconds.  I am lucky that the other end of PPP seems to
> have a very large buffer and it can hold a full window.  But I have a slow

Not really. If the TCP stack started seeing that a window full of data kept
causing drops, the congestion window would shrink. Slow start ensures we don't
just let rip down the link.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  9 10:18:48 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA11611 for tcp-impl-list; Wed, 9 Apr 1997 10:16:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA11579 for <tcp-impl@relay.engr.SGI.COM>; Wed, 9 Apr 1997 10:16:31 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA02585 for <tcp-impl@relay.engr.SGI.COM>; Wed, 9 Apr 1997 10:16:28 -0700
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id SAA22655; Wed, 9 Apr 1997 18:07:57 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wEevK-0005FHC; Tue, 8 Apr 97 18:43 BST
Message-Id: <m0wEevK-0005FHC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: testing tools
To: rstevens@kohala.com
Date: Tue, 8 Apr 1997 18:43:13 +0100 (BST)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199704021157.EAA19350@kohala.kohala.com> from "W. Richard Stevens" at Apr 2, 97 04:57:04 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I could never figure out another way to do this.

For the Linux case ipfwadm will let you block incoming TCP packets from the
tcp stack but you can still listen in on them via SOCK_PACKET and libpcap
so tcpdump is happy.


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  9 16:47:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA15108 for tcp-impl-list; Wed, 9 Apr 1997 16:45:50 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA15096 for <tcp-impl@relay.engr.SGI.COM>; Wed, 9 Apr 1997 16:45:47 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA20599 for <tcp-impl@relay.engr.SGI.COM>; Wed, 9 Apr 1997 16:45:37 -0700
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id AAA01920; Thu, 10 Apr 1997 00:37:21 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wF6GP-0005FHC; Wed, 9 Apr 97 23:54 BST
Message-Id: <m0wF6GP-0005FHC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP buffers
To: craig@aland.bbn.com (Craig Partridge)
Date: Wed, 9 Apr 1997 23:54:49 +0100 (BST)
Cc: poon@cs.wisc.edu, backman@ftp.com, craig@aland.bbn.com,
        tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199704071706.KAA04122@aland.bbn.com> from "Craig Partridge" at Apr 7, 97 10:06:00 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>     be penalized by the clueless app?  A 24K window on a slow link is certainly
>     not appropriate.  Should TCP change the size then?
> 
> In a proper environment, the clueless app gets hammered by having more packets
> dropped.  (See the RED Manifesto).

I've just done a pile of measurements from a Linux 2.0.pre30 box out across
a backbone to a slow distant target. The 24K window was, if anything, too
small: with a long link, high latency and low bandwidth but plenty of
queueing, it was happily clunking away at about 99% packet arrival.

Equally, measuring a similar distance across a fast backbone link through a
really buggered uunet router, the 24K window was way excessive, and the
cwnd kept it at about 4K.

My network interfaces don't support telepathy and the VJ congestion schemes
can't help here.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr  9 17:13:13 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA21441 for tcp-impl-list; Wed, 9 Apr 1997 17:10:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA21394 for <tcp-impl@relay.engr.SGI.COM>; Wed, 9 Apr 1997 17:10:53 -0700
Received: from parmesan.cs.wisc.edu (parmesan.cs.wisc.edu [128.105.77.34]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA25767 for <tcp-impl@relay.engr.SGI.COM>; Wed, 9 Apr 1997 17:10:47 -0700
Received: (from poon@localhost) by parmesan.cs.wisc.edu (8.7.6/8.7.3) id TAA27776; Wed, 9 Apr 1997 19:08:46 -0500 (CDT)
Date: Wed, 9 Apr 1997 19:08:46 -0500 (CDT)
From: Kacheong Poon <poon@cs.wisc.edu>
Message-Id: <199704100008.TAA27776@parmesan.cs.wisc.edu>
To: alan@lxorguk.ukuu.org.uk
Subject: Re: TCP buffers
Cc: tcp-impl@relay.engr.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Included message from alan@lxorguk.ukuu.org.uk (Alan Cox):

>----
>Not really. If the TCP stack started seeing that a window full of data kept
>causing drops the congestion window would shrink. Slow start ensures we don't
>just let rip down the link
>----

No, I was not saying that the congestion control algorithm did not kick in
and decrease the cwnd appropriately.  My original complaint is that with
such a high RTO value, when multiple segments are dropped, waiting for a
timeout to recover is painful.  That is why I suggested that, in the case I
described, we should make the window smaller.  To do this, we may need to
ignore the application's setsockopt().  TCP should know better what the
appropriate window size is, thus preventing the "painful" timeout.

							Poon.
							poon@cs.wisc.edu


From owner-tcp-impl@relay.engr.sgi.com  Thu Apr 10 09:43:05 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA29287 for tcp-impl-list; Thu, 10 Apr 1997 09:40:06 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA29253 for <tcp-impl@relay.engr.SGI.COM>; Thu, 10 Apr 1997 09:40:02 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA01409 for <tcp-impl@relay.engr.SGI.COM>; Thu, 10 Apr 1997 09:40:01 -0700
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id JAA07388; Thu, 10 Apr 1997 09:34:37 -0700 (PDT)
Message-Id: <199704101634.JAA07388@aland.bbn.com>
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers 
In-reply-to: Your message of Wed, 09 Apr 97 23:54:49 +0100.
             <m0wF6GP-0005FHC@lightning.swansea.linux.org.uk> 
Date: Thu, 10 Apr 97 09:34:36 -0700
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    >     be penalized by the clueless app?  A 24K window on a slow link is certainly
    >     not appropriate.  Should TCP change the size then?
    > 
    > In a proper environment, the clueless app gets hammered by having more packets
    > dropped.  (See the RED Manifesto).

    I've just done a pile of measurements from a Linux 2.0.pre30 box out across
    a backbone to a slow distant target. The 24K window was if anything too small,
    with a long link, high latency and low bandwidth but plenty of queueing it
    was happily clunking away at about 99% packet arrival.

Interesting -- so you had a 6 second delay?  (I'm assuming 28.8 Kbps link).
It would help here if you gave all three key measurements: window size, link
bandwidth, and delay.

    Equally measuring a similar distance across a fast backbone link through a
    really buggered uunet router the 24K window was way excessive, equally the
    cwnd kept it at about 4K.

That's perfectly plausible if the delay is shorter (don't mistake distance
for delay).

    My network interfaces don't support telepathy and the VJ congestion schemes
    can't help here.

I don't understand this comment fully.  Clearly, in the absence of delay
measurements, one has to guess at window size.  But VJ congestion was working
just fine -- it kept you at 4KB despite a window of 24 KB.

Craig

From owner-tcp-impl@relay.engr.sgi.com  Thu Apr 10 09:43:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA00053 for tcp-impl-list; Thu, 10 Apr 1997 09:42:23 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA00036 for <tcp-impl@relay.engr.SGI.COM>; Thu, 10 Apr 1997 09:42:19 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA01926 for <tcp-impl@relay.engr.SGI.COM>; Thu, 10 Apr 1997 09:42:18 -0700
Received: from brookfield.ans.net (brookfield.ans.net [204.148.1.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id MAA21079; Thu, 10 Apr 1997 12:37:04 -0400 (EDT)
Message-Id: <199704101637.MAA21079@brookfield.ans.net>
To: Kacheong Poon <poon@cs.wisc.edu>
cc: alan@lxorguk.ukuu.org.uk, tcp-impl@relay.engr.SGI.COM
Reply-To: curtis@ans.net
Subject: Re: TCP buffers 
In-reply-to: Your message of "Wed, 09 Apr 1997 19:08:46 CDT."
             <199704100008.TAA27776@parmesan.cs.wisc.edu> 
Date: Thu, 10 Apr 1997 12:37:03 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199704100008.TAA27776@parmesan.cs.wisc.edu>, Kacheong Poon writes:
> Included message from alan@lxorguk.ukuu.org.uk (Alan Cox):
> 
> >----
> >Not really. If the TCP stack started seeing that a window full of data kept
> >causing drops the congestion window would shrink. Slow start ensures we don't
> >just let rip down the link
> >----
> 
> No, I was not saying that the congestion control algorithm did not kick in
> and decrease the cwnd appropriately.  My original "complain" is that with
> such a high RTO value, when multiple segments are dropped, waiting for a
> timeout to recover is painful.  That is why I suggested in the case I
> described, we should make the window smaller.  To do this, we may need to
> ignore application's setsockopt().  TCP should know better what the
> appropriate window size to use, thus preventing the "painful" timeout.
> 
> 							Poon.
> 							poon@cs.wisc.edu


The multiple drop problem has a fix for which patches are available.
Like SACK, it isn't in mainline BSD code but if you are going to
change the source, this is the change you want.  With a window that
large you will always trigger fast retransmit so who cares about RTO
once your TCP is fixed?

Curtis


From owner-tcp-impl@relay.engr.sgi.com  Thu Apr 10 15:13:26 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA03026 for tcp-impl-list; Thu, 10 Apr 1997 15:11:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA03018 for <tcp-impl@relay.engr.SGI.COM>; Thu, 10 Apr 1997 15:11:32 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA25951 for <tcp-impl@relay.engr.SGI.COM>; Thu, 10 Apr 1997 15:11:28 -0700
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id XAA11054; Thu, 10 Apr 1997 23:11:13 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wFQ1P-0005FHC; Thu, 10 Apr 97 21:00 BST
Message-Id: <m0wFQ1P-0005FHC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP buffers
To: poon@cs.wisc.edu (Kacheong Poon)
Date: Thu, 10 Apr 1997 21:00:39 +0100 (BST)
Cc: alan@lxorguk.ukuu.org.uk, tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199704100008.TAA27776@parmesan.cs.wisc.edu> from "Kacheong Poon" at Apr 9, 97 07:08:46 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> timeout to recover is painful.  That is why I suggested in the case I
> described, we should make the window smaller.  To do this, we may need to
> ignore application's setsockopt().  TCP should know better what the
> appropriate window size to use, thus preventing the "painful" timeout.

A high RTT requires a high window size to avoid pipeline stalls. It sounds
like you are talking more about some of the cases SACK handles. SACK I think
is beyond this list


From owner-tcp-impl@relay.engr.sgi.com  Thu Apr 10 16:12:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA18068 for tcp-impl-list; Thu, 10 Apr 1997 16:09:50 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA18057 for <tcp-impl@relay.engr.SGI.COM>; Thu, 10 Apr 1997 16:09:43 -0700
Received: from parmesan.cs.wisc.edu (parmesan.cs.wisc.edu [128.105.77.34]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA08720 for <tcp-impl@relay.engr.SGI.COM>; Thu, 10 Apr 1997 16:09:42 -0700
Received: (from poon@localhost) by parmesan.cs.wisc.edu (8.7.6/8.7.3) id SAA00822; Thu, 10 Apr 1997 18:07:42 -0500 (CDT)
Date: Thu, 10 Apr 1997 18:07:42 -0500 (CDT)
From: Kacheong Poon <poon@cs.wisc.edu>
Message-Id: <199704102307.SAA00822@parmesan.cs.wisc.edu>
To: alan@lxorguk.ukuu.org.uk
Subject: Re: TCP buffers
Cc: tcp-impl@relay.engr.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Included message from alan@lxorguk.ukuu.org.uk (Alan Cox):

>----
>A high RTT requires a high window size to avoid pipeline stalls. It sounds
>like you are talking more about some of the cases SACK handles. SACK I think
>is beyond this list
>----

A high RTO may be due to a long queue on a slow link.  For example, a student
dials in to school from home to access the school's web pages.  The link is low
latency (~5ms) and low bandwidth (maybe 14.4Kbps).  The RTO can still reach a
high value because a large window builds a long queue in the dial-in
server.  I guess the list may not be interested in this situation
because it does not affect the Internet at large.  But I think if TCP used a
good estimate of the bandwidth-delay product as the window size instead of
an arbitrarily big window, the problem would be solved for this case and for
other cases like high latency, low bandwidth...

And I am waiting to see SACK widely deployed...

							Poon.
							poon@cs.wisc.edu


From owner-tcp-impl@relay.engr.sgi.com  Sat Apr 12 15:12:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA00442 for tcp-impl-list; Sat, 12 Apr 1997 15:05:48 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA00437 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 12 Apr 1997 15:05:45 -0700
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA18292 for <tcp-impl@engr.sgi.com>; Sat, 12 Apr 1997 15:05:39 -0700
Message-Id: <199704122205.PAA18292@refugee.engr.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Memphis Minutes
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0"
Content-ID: <18290.860882698.0@refugee.engr.sgi.com>
Date: Sat, 12 Apr 1997 15:05:39 -0700
From: Steve Alexander <sca@refugee.engr.sgi.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <18290.860882698.1@refugee.engr.sgi.com>

If you see things that need clarification or outright correction, please
let me know by 4/18.

Thanks,
-- Steve


------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <18290.860882698.2@refugee.engr.sgi.com>
Content-Description: Memphis TCP-IMPL Minutes

The first formal meeting of the TCP Implementor's Working Group was held at
1:30 PM on Monday, April 7th.  The chairs of the group are Steve Alexander
(sca@sgi.com) of Silicon Graphics and Vern Paxson (vern@ee.lbl.gov) of
Lawrence Berkeley Labs.

Vern started off the meeting by presenting the agenda.  In order to ensure that
the attendees were up-to-date, Vern then presented a brief overview of the
scope, charter, and deliverables of the group.

A brief overview of the policy on mentioning vendor names followed.  The policy
is currently that it is acceptable to mention vendor names explicitly on the
mailing list, or in informal discussions, but not in official products of the
group, e.g. RFCs.  In general, this is motivated by the belief that information
about bugs in vendor releases tends to become out-of-date rather quickly, and
that embedding such information in documents with a long life-time is
problematic.

Vern then presented an overview of the current format of the I-D on known
TCP problems.  This document is an enumeration of problem descriptions, with
each description containing:
	- Name
	- Classification
	- Description
	- Significance (this is an impact, not a MUST/SHOULD/MAY)
	- Implications
	- Traces (if available)
	- Information about Detection
	- Suggested Fix
	- Any Specific Information pertaining to the problem

Vern gave an overview of the current I-D status.  At present, it has
information about four problems:
	- no slow start
	- no slow start after retransmit
	- retransmission of different data
	- failure to retain above-sequence data (violates SHOULD, not MUST)

The problems remaining to be documented include:
	- initial RTO too low
	- uninitialized congestion window
	- slow-start with two segments
	- violations of delayed ACK rules
	- failure to correctly set PSH
	- Brakmo/Peterson header prediction bug
	- Brakmo/Peterson deflation bug
	- fast retransmit with timestamp sends two segments
	- Dawson keepalive problem
	- failure to correctly implement Nagle's algorithm
	- keepalive behavior (0-byte/1-byte)
	- failure to ACK above-sequence data
	- predictable ISN security problem
	- SYN-flood security problem
	- failure to implement fast retransmit/recovery
	- RTO estimation on slow links
	- replies to random ACKs
	- ICMP error handling
	- half-duplex close ignores subsequent data
	- urgent pointer confusion

At this point, Vern briefly enumerated issues surrounding test tools.  The
chairs are soliciting a list for compilation into an I-D.  The current goal
is to have a first draft of this I-D available by the Munich meeting in August.
There are a few technical issues related to some of the existing tools, such
as Steve Parker's Packet Shell.  These are:
	- a portable "raw socket" interface for such tools to use
	- how to suppress responses from the host TCP

These need to be discussed, either directly between the interested parties, or
on the group mailing list.

Vern closed with a request for volunteers to help with the following areas:
	- reporting any problems not on the current list
	- working on detailed descriptions of known problems for inclusion in
	  the I-D
	- developing new testing tools

This concluded the formal agenda and open discussion began.  The following
issues were raised:
	
	Is the group formally attempting to survey all available
	implementations, either for the known problems or to find
	new issues?  Answer: not really.

	Bob Braden asked what percentage of the attendees are or have been
	implementors.  Relatively few of the attendees appeared to be.

	It was suggested that perhaps bake-offs should be started up again.
	This was generally considered to be a good idea; it is not clear that
	this is the responsibility of the group, and this should be discussed
	on the list and with the IESG.

	Matt Mathis raised the issue of backward compatibility and suggested
	that workarounds for defective implementations should be well-known,
	easy to test for, and easy to disable in the event that they cause
	performance overhead when not actually needed.

	Jim Gettys brought up an issue with connections remaining in
	FIN-WAIT-2 indefinitely.  This appears to be most common when
	using the Apache web server, but may have other causes as well.
	This should be added to the list of known issues.

	Perry Metzger brought up the issue of minimizing memory usage in
	stacks which fail to retain state after the application has closed.
	This probably requires further discussion on the mailing list.

	Bob Briscoe suggested that the specification is ambiguous about
	slow-start after an idle period.  This needs to be reviewed.

	Matt Mathis suggested that part of the above problem is due to failure
	to agree on how long a time value is used to determine whether
	or not the connection is "idle."  This borders on a research issue;
	more discussion seems to be needed.

	Perry Metzger brought up the fact that RFC 1948 (Defending Against
	Sequence Number Attacks) is not currently a standards-track document.
	Vern suggested that perhaps the group should recommend it in one of
	the new documents.  We should probably also investigate whether the
	status of the document could (or should) be changed, possibly to
	BCP.

	Ian Heavens raised the issue of problem taxonomy.  This may just be an
	issue of how the current I-D is organized.  We should discuss this
	on the list.

The meeting concluded at this point.

Steve Alexander

------- =_aaaaaaaaaa0--

From owner-tcp-impl@relay.engr.sgi.com  Sat Apr 12 15:32:58 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA02006 for tcp-impl-list; Sat, 12 Apr 1997 15:24:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA02002 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 12 Apr 1997 15:23:59 -0700
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA18730 for <tcp-impl@engr.sgi.com>; Sat, 12 Apr 1997 15:23:52 -0700
Message-Id: <199704122223.PAA18730@refugee.engr.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Memphis slides online
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <18721.860883832.1@refugee.engr.sgi.com>
Date: Sat, 12 Apr 1997 15:23:52 -0700
From: Steve Alexander <sca@refugee.engr.sgi.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

http://reality.sgi.com/sca/tcp-impl

-- Steve

From owner-tcp-impl@relay.engr.sgi.com  Mon Apr 14 09:37:05 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA17894 for tcp-impl-list; Mon, 14 Apr 1997 09:33:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA17879 for <tcp-impl@relay.engr.SGI.COM>; Mon, 14 Apr 1997 09:33:39 -0700
Received: from zippy.psc.edu (zippy.psc.edu [128.182.61.149]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA27675 for <tcp-impl@relay.engr.SGI.COM>; Mon, 14 Apr 1997 09:33:38 -0700
Received: (from mathis@localhost) by zippy.psc.edu (8.8.5/8.8.2) id MAA26749; Mon, 14 Apr 1997 12:29:30 -0400 (EDT)
Date: Mon, 14 Apr 1997 12:29:30 -0400 (EDT)
Message-Id: <199704141629.MAA26749@zippy.psc.edu>
From: Matt Mathis <mathis@psc.edu>
To: tcp-impl@relay.engr.SGI.COM
In-reply-to: alan@lxorguk.ukuu.org.uk's message of Tue, 1 Apr 1997 18:45:11
	+0100 (BST)
Subject: Re: Keep-Alive size
Reply-to: mathis@psc.edu
References: <199704011558.KAA23362@Twig.Rodents.Montreal.QC.CA>
	<m0wC7cN-0005FHC@lightning.swansea.linux.org.uk>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Sorry to go back on an old thread, but "sending a garbage byte as a
keep-alive" makes testing more complicated.  I would like to
discourage the practice (garbage bytes, not 1 byte keep-alives).

One useful tester would be a sniffer that ubiquitously validates that
the association between <connection 4 tuple>.<sequence number> and
<payload data> is always constant.  Such a sniffer could be run
continuously in all TCP development labs and field test sites etc,
raising alarms on violations.  However, garbage-byte keep-alives make the
tester more complicated because it has to identify keep-alives and
allow them as exceptions.

The exception logic makes the tester vastly more complicated,
particularly because it needs to sniff both sides of the connection to
monitor the advancing ACKs.  If you can't sniff both sides for some
reason, there are heuristics that might work.  In any case the
exception logic has the potential to mask bugs in the TCP
implementations under test.

It would be much better if the keep-alive was always the correct valid
data for the sequence in question (define pre-SYN data to be zero).
Thus the tester could enforce a strong invariant on *ALL* TCP
connections at all times: that the payload data associated with each
sequence number (extended with a wrap count) is forever constant.
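
The invariant can be sketched in a few lines of C.  This is only an
illustration for a single connection, not an existing tool: the names
payload_checker, checker_init, checker_observe and the TRACK_BYTES limit
are all made up here, and "offset" is assumed to be the sequence number
already extended with the wrap count and made relative to the start of
the stream.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRACK_BYTES 4096  /* hypothetical limit on how much stream we remember */

/* Remembers the first payload byte seen at each stream offset. */
struct payload_checker {
    uint8_t data[TRACK_BYTES];
    bool    seen[TRACK_BYTES];
};

void checker_init(struct payload_checker *c)
{
    memset(c, 0, sizeof *c);
}

/*
 * Record one observed segment and return the number of bytes that
 * differ from what was previously seen at the same offsets.
 * Bytes beyond TRACK_BYTES are ignored in this sketch.
 */
size_t checker_observe(struct payload_checker *c, uint64_t offset,
                       const uint8_t *payload, size_t len)
{
    size_t violations = 0;

    for (size_t i = 0; i < len; i++) {
        uint64_t pos = offset + i;

        if (pos >= TRACK_BYTES)
            break;                      /* outside the tracked range */
        if (c->seen[pos]) {
            if (c->data[pos] != payload[i])
                violations++;           /* same offset, different byte */
        } else {
            c->seen[pos] = true;        /* first sighting: remember it */
            c->data[pos] = payload[i];
        }
    }
    return violations;
}
```

A retransmission or a keep-alive that carries the true data byte returns
0 from checker_observe; a garbage-byte keep-alive shows up as a
violation, which is exactly the exception the tester would otherwise
have to special-case.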

This requires changing "MAY use a garbage byte" to "SHOULD use the
correct data byte", even though the restriction is not needed for
protocol correctness.

Now the meta-question: Is this in scope for TCP-Impl?  It seems to me
that there could be a separate "TCP Implementation guidelines for
improved testability", which need not be as strong as a protocol
standard.  Perhaps a BCP?

Thoughts?

--MM--


> > Or, since such a keepalive will always be outside the window (since it
> > duplicates a sequence number that's already been acked - if you have
> > unacked data outstanding, keepalives aren't even an issue), you can use
> > any value you please for the single data byte, as you suggest:
> 
> Nothing says a stack may not use duplicate data over the original. There are
> some very simple embedded stacks that do this to save code. They just do
> something akin to
> 
> 	offset=diff_seq(buff_start, tcp->seq);
> 	len=tcp->len;
> 	len=min(len, buffer_size-offset);
> 	memcpy(...)
> 
> Alan

From owner-tcp-impl@relay.engr.sgi.com  Mon Apr 14 09:41:20 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA19575 for tcp-impl-list; Mon, 14 Apr 1997 09:39:52 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA19566 for <tcp-impl@relay.engr.SGI.COM>; Mon, 14 Apr 1997 09:39:49 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA29186 for <tcp-impl@relay.engr.SGI.COM>; Mon, 14 Apr 1997 09:39:45 -0700
Received: from brookfield.ans.net (brookfield.ans.net [204.148.1.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id MAA02296; Mon, 14 Apr 1997 12:34:08 -0400 (EDT)
Message-Id: <199704141634.MAA02296@brookfield.ans.net>
To: Kacheong Poon <poon@cs.wisc.edu>
cc: alan@lxorguk.ukuu.org.uk, tcp-impl@relay.engr.SGI.COM
Reply-To: curtis@ans.net
Subject: Re: TCP buffers 
In-reply-to: Your message of "Thu, 10 Apr 1997 18:07:42 CDT."
             <199704102307.SAA00822@parmesan.cs.wisc.edu> 
Date: Mon, 14 Apr 1997 12:34:07 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199704102307.SAA00822@parmesan.cs.wisc.edu>, Kacheong Poon writes:
> Included message from alan@lxorguk.ukuu.org.uk (Alan Cox):
> 
> >----
> >A high RTT requires a high window size to avoid pipeline stalls. It sounds
> >like you are talking more about some of the cases SACK handles. SACK I think
> >is beyond this list
> >----
> 
> A high RTO may be due to long queue in a slow link.  For example, a student
> dials in to school from home to access school's web pages.  The link is low
> latency (~5ms) and low bandwidth (maybe 14.4Kbps).  And the RTO can get to a
> high value because of long queue, in turn because of large window, in the
> dial in server.  I guess the list may not be interested in this situation
> because it does not affect the Internet at large.  But I think if TCP uses a
> good estimate of the bandwidth delay product as the window size instead of
> an arbitrary big window, the problem will be solved for this case and other
> cases like high latency low bandwidth...
> 
> And I am waiting to see SACK to be widely deployed...
> 
> 							Poon.
> 							poon@cs.wisc.edu

At 14.4 kbps / 8 b/B you get about 1.8KB/sec through the pipe, and with
0.005 sec of latency the pipe itself holds only about 9 bytes.  One 552
byte packet takes about 300 msec.  If you keep 8 in flight (buffer size
of 4KB) you get a delay of about 2.4 sec.  If you try to keep
32KB in flight, either you get a long delay or at some point the queue
overflows.  If your queue holds the whole 32KB, you get a long delay
but no loss and TCP goes as fast as it possibly can (exactly link
speed).  If you take an isolated loss and your sender implements fast
retransmit and fast recovery correctly, you halve the amount of
packets in flight from the point of loss.  If you take multiple
losses, you can halve the window size more than once and also end up
with a timeout.  That is where the fix to fast recovery is needed.
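
For anyone who wants to redo this arithmetic for other link speeds, here
is a minimal sketch in C.  The function names are made up for this
example, everything is integer arithmetic that rounds down, and the
delay passed to bdp_bytes is taken to be whatever latency figure you
want to multiply by (one-way or round trip).

```c
#include <stdint.h>

/* Payload bytes per second carried by a link of the given bit rate. */
uint32_t link_bytes_per_sec(uint32_t bits_per_sec)
{
    return bits_per_sec / 8u;
}

/*
 * Milliseconds needed to clock `bytes` onto the link (rounded down).
 * This is also the queueing delay seen behind `bytes` of backlog.
 * bits_per_sec must be nonzero.
 */
uint32_t serialization_ms(uint32_t bytes, uint32_t bits_per_sec)
{
    return (uint32_t)(((uint64_t)bytes * 8u * 1000u) / bits_per_sec);
}

/* Delay-bandwidth product in bytes for a delay given in microseconds. */
uint32_t bdp_bytes(uint32_t bits_per_sec, uint32_t delay_us)
{
    return (uint32_t)(((uint64_t)bits_per_sec * delay_us) / (8u * 1000000u));
}
```

With a 14400 bit/sec link, bdp_bytes(14400, 5000) gives 9 bytes and
serialization_ms(552, 14400) gives 306 msec, so nearly everything
"in flight" on such a link is really sitting in the dial-in server's
queue.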

The fix, proposed by Sally Floyd in the appendix of one of her papers,
is to remain in fast recovery until the ACK progression has advanced
all the way to the last packet transmitted when the duplicate ACKs
were noticed.  Luigi Rizzo has implemented this for FreeBSD TCP in
his SACK and TSACK code.  Luigi gives credit to J. Hoe but doesn't
give a reference, so maybe it was independently discovered.  See the
description of Newreno at http://www.iet.unipi.it/~luigi/sack.html.
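
Roughly, the idea looks like the sketch below.  This is not Luigi's code
or the text of the paper; it is a stripped-down illustration that tracks
only when to retransmit and when to leave fast recovery, and leaves out
all window inflation and deflation.  The struct, the enum, and the
function names are invented for the example, and the threshold of three
duplicate ACKs is the usual fast retransmit trigger, assumed here.

```c
#include <stdbool.h>
#include <stdint.h>

struct fr_state {
    uint32_t snd_una;      /* oldest unacknowledged sequence number */
    uint32_t snd_max;      /* highest sequence number sent so far */
    uint32_t recover;      /* snd_max when the duplicate ACKs were noticed */
    int      dupacks;      /* consecutive duplicate ACKs seen */
    bool     in_recovery;  /* currently in fast recovery? */
};

enum fr_action { FR_NONE, FR_RETRANSMIT, FR_EXIT_RECOVERY };

/* Wrap-safe "a is before b" for 32-bit sequence numbers. */
static bool seq_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Process one incoming cumulative ACK and say what the sender should do. */
enum fr_action fr_on_ack(struct fr_state *s, uint32_t ack)
{
    if (ack == s->snd_una) {                    /* duplicate ACK */
        if (!s->in_recovery && ++s->dupacks == 3) {
            s->in_recovery = true;
            s->recover = s->snd_max;            /* remember the high-water mark */
            return FR_RETRANSMIT;               /* resend segment at snd_una */
        }
        return FR_NONE;
    }
    if (!seq_lt(s->snd_una, ack))
        return FR_NONE;                         /* stale ACK, ignore */

    s->snd_una = ack;                           /* new data acknowledged */
    s->dupacks = 0;
    if (s->in_recovery) {
        if (seq_lt(ack, s->recover))
            return FR_RETRANSMIT;               /* partial ACK: next hole */
        s->in_recovery = false;                 /* everything up to recover acked */
        return FR_EXIT_RECOVERY;
    }
    return FR_NONE;
}
```

The point is the partial-ACK branch: an ACK that advances but stops short
of "recover" means another segment from the same window was lost, so the
sender retransmits it right away and stays in recovery instead of
halving again or waiting for the timer.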

The scope of this WG is to identify errors in implementing TCP.  I'm
not sure what error if any you are suggesting the WG document since
TCP makes no estimate of the delay-bandwidth product other than
backing off when a loss occurs.  If you think it should, then that is
a matter for the research community who would be happy to review a
well thought out proposal to fix TCP when you have one.

The sort of fix for fast recovery described above might be slightly
out of scope for this WG since it is just emerging from the research
community in BSD TCPs.  Or it might not be.  Vern?

Curtis

ps- If you are having the problems with a particular host on the
sender end, there is a good chance it doesn't do fast retransmit
and/or it retransmits more than it has to after the drop occurs.
These are implementation errors already documented by this WG.

From owner-tcp-impl@relay.engr.sgi.com  Mon Apr 14 09:57:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA24313 for tcp-impl-list; Mon, 14 Apr 1997 09:54:40 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA24302 for <tcp-impl@relay.engr.SGI.COM>; Mon, 14 Apr 1997 09:54:37 -0700
Received: from fab.md.interlink.com (fab.md.interlink.com [138.42.32.80]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA03175 for <tcp-impl@relay.engr.SGI.COM>; Mon, 14 Apr 1997 09:54:34 -0700
Received: by fab.md.interlink.com (5.0/SMI-SVR4)
	id AA02117; Mon, 14 Apr 1997 12:49:20 +0500
Date: Mon, 14 Apr 1997 12:49:20 +0500
From: fab@fab.md.interlink.com (Fred Bohle)
Message-Id: <9704141649.AA02117@fab.md.interlink.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers
X-Sun-Charset: US-ASCII
content-length: 1095
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> Vegas seems
> to do a better job of keeping extra (i.e., queued) packets out of the
> network. That is, if instantaneous throughput doesn't increase when
> the send window is increased, the send window is made a bit smaller
> (i.e., perhaps one segment smaller).

Several years ago a paper was published (around the time the original
Van Jacobson paper was published) which described monitoring a 'power
function', or the data delivered over a connection as a function of time.
If the 'power' did not increase when the cwind was increased, the cwind
was decreased again.  Perhaps someone can dig up this paper (SIGCOMM, I
think),  and consider its impact on congestion.  

It seemed to do what you describe Vegas doing.  It lowers the buffer usage
in the net, where VJ fills the buffers to the max.
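
As a rough sketch of that rule (not the algorithm from the paper, which I
don't have in front of me, and not Vegas either), something like the
following C would do it.  The names probe_state and probe_update, the
one-segment step, and the idea of feeding in one throughput sample per
measurement interval are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct probe_state {
    uint32_t cwnd;       /* current window in bytes */
    uint32_t mss;        /* step size: one segment */
    uint32_t last_rate;  /* throughput measured in the previous interval */
    bool     grew;       /* did we enlarge cwnd in the previous interval? */
};

/*
 * Call once per measurement interval with the delivered throughput
 * (any consistent unit).  If the last increase bought no extra
 * throughput, take it back; otherwise probe one segment higher.
 */
uint32_t probe_update(struct probe_state *p, uint32_t rate)
{
    if (p->grew && rate <= p->last_rate) {
        p->cwnd -= p->mss;   /* increase did not help: undo it */
        p->grew = false;
    } else {
        p->cwnd += p->mss;   /* try one more segment */
        p->grew = true;
    }
    p->last_rate = rate;
    return p->cwnd;
}
```

Once throughput stops rising the window just oscillates by one segment
around the knee, rather than growing until a queue overflows.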

Fred

------------------------------------------------------------------------
Fred Bohle			EMAIL: fab@interlink.com
Interlink Computer Sciences	AT&T : 410-992-7750 
9250 Rumsey Road, Suite 200
Columbia, MD 21045-1946
------------------------------------------------------------------------


From owner-tcp-impl@relay.engr.sgi.com  Mon Apr 14 21:56:59 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA04504 for tcp-impl-list; Mon, 14 Apr 1997 21:54:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA04475 for <tcp-impl@relay.engr.SGI.COM>; Mon, 14 Apr 1997 21:54:38 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id VAA09025 for <tcp-impl@relay.engr.SGI.COM>; Mon, 14 Apr 1997 21:54:36 -0700
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id VAA01600; Mon, 14 Apr 1997 21:44:34 -0700 (PDT)
Message-Id: <199704150444.VAA01600@daffy.ee.lbl.gov>
To: curtis@ans.net
Cc: Kacheong Poon <poon@cs.wisc.edu>, alan@lxorguk.ukuu.org.uk,
        tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP buffers 
In-reply-to: Your message of Mon, 14 Apr 1997 12:34:07 PDT.
Date: Mon, 14 Apr 1997 21:44:34 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The scope of this WG is to identify errors in implementing TCP.  I'm
> not sure what error if any you are suggesting the WG document since
> TCP makes no estimate of the delay-bandwidth product other than
> backing off when a loss occurs.  If you think it should, then that is
> a matter for the research community who would be happy to review a
> well thought out proposal to fix TCP when you have one.

Exactly my thoughts, too - interesting area of research, yes; within
tcp-impl scope, no.  So it would probably be more productive to move
the discussion off of tcp-impl and onto end2end-interest.

> The sort of fix for fast recovery described above might be slightly
> out of scope for this WG since it is just emerging from the research
> community in BSD TCPs.  Or it might not be.  Vern?

My hit is that it's out of scope.  With different SACK algorithms in the
works, it's not clear how tweaks to retransmission are going to shake out.
This might be a good item for us to revisit in six months - I've made a
note to do so.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Apr 14 23:33:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA02383 for tcp-impl-list; Mon, 14 Apr 1997 23:31:09 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA02364 for <tcp-impl@relay.engr.SGI.COM>; Mon, 14 Apr 1997 23:31:05 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id XAA22731 for <tcp-impl@relay.engr.SGI.COM>; Mon, 14 Apr 1997 23:31:03 -0700
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id XAA09829; Mon, 14 Apr 1997 23:21:07 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id XAA11192; Mon, 14 Apr 1997 23:21:06 -0700
Received: from fstop. by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id XAA01417; Mon, 14 Apr 1997 23:21:05 -0700
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id XAA16026; Mon, 14 Apr 1997 23:19:49 -0700
Message-Id: <199704150619.XAA16026@fstop.>
From: sparker@Eng.Sun.COM
To: mathis@psc.edu
cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: Keep-Alive size 
Date: Mon, 14 Apr 1997 23:19:49 -0700
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


- Sorry to go back on an old thread, but "sending a garbage byte as a
- keep-alive" makes testing more complicated.  I would like to
- discourage the practice (garbage bytes, not 1 byte keep-alives).

You could, but until 99.99% of the TCPs out there were updated to
reflect this, I don't think you'd gain much.  If I sit down to
write such a tool today, I would feel compelled for the foreseeable
future to cope with keepalives having garbage bytes even if an
updated standard was approved tomorrow with your suggestion as
a "MUST".  The point of such a tool, I think, is to cope with real
life, and in real life keepalives have garbage.

Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 15 11:37:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA24699 for tcp-impl-list; Tue, 15 Apr 1997 11:34:25 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA24691 for <tcp-impl@relay.engr.SGI.COM>; Tue, 15 Apr 1997 11:34:22 -0700
Received: from zippy.psc.edu (zippy.psc.edu [128.182.61.149]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA00501 for <tcp-impl@relay.engr.SGI.COM>; Tue, 15 Apr 1997 11:34:21 -0700
Received: (from mathis@localhost) by zippy.psc.edu (8.8.5/8.8.2) id OAA29372; Tue, 15 Apr 1997 14:30:10 -0400 (EDT)
Date: Tue, 15 Apr 1997 14:30:10 -0400 (EDT)
Message-Id: <199704151830.OAA29372@zippy.psc.edu>
From: Matt Mathis <mathis@psc.edu>
To: sparker@Eng.Sun.COM
cc: tcp-impl@relay.engr.SGI.COM
In-reply-to: sparker@Eng.Sun.COM's message of Mon, 14 Apr 1997 23:19:49 -0700
Subject: Re: Keep-Alive size
Reply-to: mathis@psc.edu
References: <199704150619.XAA16026@fstop.>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> You could, but until 99.99% of the TCPs out there were updated to
> reflect this, I don't think you'd gain much.  If I sit down to
> write such a tool today, I would feel compelled for the foreseeable
> future to cope with keepalives having garbage bytes even if an
> updated standard was approved tomorrow with your suggestion as
> a "MUST".  The point of such a tool, I think, is to cope with real
> life, and in real life keepalives have garbage.

True for wide area testing of other people's implementations, but this
isn't the goal.  I don't really care about their bugs (although they
might be fun to look for ;-) .

In a TCP development lab/beta test environment I can use address range
checks such that my validation sniffing is focused on my own code
only.  Then I can test TCP under live fire against other older buggy
versions.   If I use true data rather than a garbage byte, I don't
even have to look at packets from foreign TCP's to strongly test
that I always send the correct bytes.

Gradually the dinosaurs will thin out, and then perhaps we can use
wide area testing to speed the demise of the remainder.


--MM--

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 15 13:48:03 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA02157 for tcp-impl-list; Tue, 15 Apr 1997 13:44:20 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA02144; Tue, 15 Apr 1997 13:44:18 -0700
Received: from zippy.psc.edu (zippy.psc.edu [128.182.61.149]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA02184; Tue, 15 Apr 1997 13:44:17 -0700
Received: (from mathis@localhost) by zippy.psc.edu (8.8.5/8.8.2) id QAA29607; Tue, 15 Apr 1997 16:40:09 -0400 (EDT)
Date: Tue, 15 Apr 1997 16:40:09 -0400 (EDT)
Message-Id: <199704152040.QAA29607@zippy.psc.edu>
From: Matt Mathis <mathis@psc.edu>
To: Steve Alexander <sca@refugee.engr.sgi.com>
Cc: tcp-impl@relay.engr.SGI.COM
In-reply-to: Steve Alexander's message of Sat, 12 Apr 1997 15:05:39 -0700
Subject: Re: Memphis Minutes
Reply-to: mathis@psc.edu
References: <199704122205.PAA18292@refugee.engr.sgi.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Steve,

I have some comments about the minutes:

        Bob Braden asked what percentage of the attendees are or have been
        implementors.  Relatively few of the attendees appeared to be.

He asked two very different questions, and got different responses.
1) Who is currently responsible for supporting a TCP implementation.
2) Who has supported a TCP implementation in the past.

Question 1 got nil response, but the word "responsible" means
management in many contexts (E.g. a coding wizard is never
"responsible" for anything).  Also I wondered if some TCP shops didn't
send non-responsible ringers to collect info and avoid any contamination
issues.
Question 2 got non-nil response, but could mean many things....
Since the interpretations are ambiguous, I would consider dropping the
item, or at least use more neutral language.  Was "about half" correct?

----------
        Matt Mathis raised the issue of backward compatibility and suggested
        that workarounds for defective implementations should be well-known,
        easy to test for, and easy to disable in the event that they cause
        performance overhead when not actually needed.

I think I said something stronger:

"In cases where workarounds interfere with new features there is a
strong market disincentive for the new features because they often
will be the only implementations that fail to interoperate with the
dinosaurs [older buggy versions].  .... tcpimpl should document these
cases and consider recommending against the workarounds, such that both
the dinosaurs and the workarounds become non-compliant.  Vendors could then
market backward compatibility [as a separate product in an alternate
release] while moving to the new [incompatible] features."

Unfortunately I repeated slightly different versions of the above words in
different contexts, and don't remember precisely which were in front of
the tcp-impl mic.

----------
        Matt Mathis suggested that part of the above problem is due to failure
        to agree on what how long of a time value is used to determine whether
        or not the connection is "idle."  This borders on a research issue;
        more discussion seems to be needed.

No, I suggested:

Amending "no slow-start after idle" by inserting the word "long"
before idle.  [I observed] that correct behavior after "short idle" is
an open research question, as are the precise definitions of "long"
and "short".  However there are TCP implementations that do not
slow-start after an hour of idle, and this is unambiguously incorrect,
and not a research question.

----------

Thanks,  It was fun and productive!
--MM--

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 15 13:55:53 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA05392 for tcp-impl-list; Tue, 15 Apr 1997 13:53:48 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA05380 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 15 Apr 1997 13:53:46 -0700
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA16228; Tue, 15 Apr 1997 13:53:39 -0700
Message-Id: <199704152053.NAA16228@refugee.engr.sgi.com>
X-Mailer: exmh version 2.0gamma 1/27/96
To: mathis@psc.edu
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Memphis Minutes 
In-reply-to: Message from mathis@psc.edu of 15 Apr 1997 16:40:09 EDT
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 15 Apr 1997 13:53:39 -0700
From: Steve Alexander <sca@refugee.engr.sgi.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Matt Mathis <mathis@psc.edu> writes:
>I have some comments about the minutes:

Thanks, I'll revise them prior to sending them in.  If anybody else has
anything, now would be an excellent time to let me know...

-- Steve



From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 15 14:29:00 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA15079 for tcp-impl-list; Tue, 15 Apr 1997 14:24:46 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA14880; Tue, 15 Apr 1997 14:24:03 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA12517; Tue, 15 Apr 1997 14:24:01 -0700
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id OAA03224; Tue, 15 Apr 1997 14:14:05 -0700 (PDT)
Message-Id: <199704152114.OAA03224@daffy.ee.lbl.gov>
To: mathis@psc.edu
Cc: Steve Alexander <sca@refugee.engr.sgi.com>, tcp-impl@relay.engr.SGI.COM
Subject: Re: Memphis Minutes
In-reply-to: Your message of Tue, 15 Apr 1997 16:40:09 PDT.
Date: Tue, 15 Apr 1997 14:14:05 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

(I haven't yet looked over the minutes, being positively buried this week.)

>         Bob Braden asked what percentage of the attendees are or have been
>         implementors.  Relatively few of the attendees appeared to be.
> 
> He asked two very different questions, and got different responses.
> 1) Who is currently responsible for supporting a TCP implementation.
> 2) Who has supported a TCP implementation in the past.
> 
> Question 1 got nil response ...

I remember this differently.  I asked for a show of hands for those who
were current maintainers (not sure what phrasing I used), and reported the
count as something like 30-35.  (I remember it was a range of n,n+5 and
think 30 was the low end - Steve, isn't this in the notes?)  I then asked for
who had maintained a TCP in the past but "now knows better", and that was
either 10-15 or 15-20, I think the former.

> ... Also I wondered if some TCP shops didn't
> send non-responsible ringers to collect info and avoid any contamination
> issues.

Gee, I'd hope that discussing problems in generic terms would mean vendors
don't have to resort to these sorts of steps.  (If any list readers want to
send me comments about this via private, confidential email, I'll summarize
the responses in anonymized form to the list.)

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 15 15:12:49 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA29215 for tcp-impl-list; Tue, 15 Apr 1997 15:08:31 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA29195 for <tcp-impl@relay.engr.SGI.COM>; Tue, 15 Apr 1997 15:08:27 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id PAA24206 for <tcp-impl@relay.engr.SGI.COM>; Tue, 15 Apr 1997 15:08:25 -0700
Received: from ftp.com by ftp.com  ; Tue, 15 Apr 1997 18:00:51 -0400
Received: from mailserv-2high.ftp.com by ftp.com  ; Tue, 15 Apr 1997 18:00:51 -0400
Received: by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id RAA14471; Tue, 15 Apr 1997 17:57:47 -0400
Date: Tue, 15 Apr 1997 17:57:47 -0400
Message-Id: <199704152157.RAA14471@MAILSERV-2HIGH.FTP.COM>
To: mathis@psc.edu
Subject: Re: Memphis Minutes
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: sca@refugee.engr.sgi.com, tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high.ftp.com, [message accepted at Tue Apr 15 17:57:46 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||        Bob Braden asked what percentage of the attendees are or have been
||        implementors.  Relatively few of the attendees appeared to be.
||
||He asked two very different questions, and got different responses.
||1) Who is currently responsible for supporting a TCP implementation.
||2) Who has supported a TCP implementation in the past.

I was listening over the Mbone, and that was one of the few relatively
clear sections; that's what I heard also.  From my perspective, having
done a couple of TCPs in my past but now being a responsible management
type (not a coder) with a TCP stack or two in my chain of command, you
want to be careful about pushing management "watchers" away.  It's us
who get to fund the developers who actually fix the bugs :-)




From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 16 11:11:39 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA11544 for tcp-impl-list; Wed, 16 Apr 1997 11:09:21 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA11505 for <tcp-impl@relay.engr.SGI.COM>; Wed, 16 Apr 1997 11:09:18 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id LAA23433 for <tcp-impl@relay.engr.SGI.COM>; Wed, 16 Apr 1997 11:09:14 -0700
Received: from ftp.com by ftp.com  ; Wed, 16 Apr 1997 14:05:28 -0400
Received: from mailserv-2high.ftp.com by ftp.com  ; Wed, 16 Apr 1997 14:05:28 -0400
Received: from fenway.ftp.com by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id OAA23945; Wed, 16 Apr 1997 14:02:22 -0400
Message-Id: <199704161802.OAA23945@MAILSERV-2HIGH.FTP.COM>
X-Mapi-Messageclass: IPM
To: vern@ee.lbl.gov
Cc: tcp-impl@relay.engr.SGI.COM
X-Mailer: FTP Software Internet Mail 2.0
Mime-Version: 1.0
From: Frank T Solensky <solensky@ftp.com>
Subject: RE: Memphis Minutes
Date: Wed, 16 Apr 1997 14:06:17 -0400
Content-Type: text/plain; charset=US-ASCII; X-MAPIextension=".TXT"
Content-Transfer-Encoding: quoted-printable
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>>Reply to your message of 4/15/97 5:39 PM
>
>I remember this differently.  I asked for a show of hands for those who
>were current maintainers (not sure what phrasing I used), and reported the
>count as something like 30-35.  (I remember it was a range of n,n+5 and
>think 30 was the low end - Steve, isn't this in the notes?)  I then asked =
for
>who had maintained a TCP in the past but "now knows better", and that was
>either 10-15 or 15-20, I think the former.

This is pretty much the way I remember it too: I was in the back of the room
and wasn't sure I saw even 30 hands out of, what, 300 people?

>> ... Also I wondered if some TCP shops didn't
>> send non-responsible ringers to collect info and avoid any contamination
>> issues.
>
>Gee, I'd hope that discussing problems in generic terms would mean vendors
>don't have to resort to these sorts of steps...

The only motivation for a non-developer to show up that I could think of
was the expectation that the meeting could be used as a tutorial.
							-- Frank


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 16 12:13:30 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA14576 for tcp-impl-list; Wed, 16 Apr 1997 12:11:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA14549 for <tcp-impl@relay.engr.SGI.COM>; Wed, 16 Apr 1997 12:11:36 -0700
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA11279 for <tcp-impl@relay.engr.SGI.COM>; Wed, 16 Apr 1997 12:11:25 -0700
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id OAA27720; Wed, 16 Apr 1997 14:47:11 -0400 (EDT)
Date: Wed, 16 Apr 1997 14:47:11 -0400 (EDT)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199704161847.OAA27720@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Memphis Minutes
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The only motivation for a non-developer to show up that I could think
> of was the expectation that the meeting could be used as a tutorial.

Speaking personally...I have never written a TCP stack and do not
expect to, which as I see it makes me a non-developer.

However (absent something more important to me playing opposite it), I
would have shown up in person had I been at Memphis at all.

And no, I don't expect - wouldn't have expected - tcp-impl to be a
tutorial.

(So why am I on this list, why would I have attended in person?
Because I believe - perhaps arrogantly - that I understand TCP and have
thoughts on right and wrong ways to do it.  Notice I haven't spoken up
on the list much; this is because I know I don't have the practical
experience of someone who _has_ been deep inside the code.  I speak up
only when I'm reasonably sure I'm not out of my depth.  As now. :-)

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 16 12:53:30 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA03374 for tcp-impl-list; Wed, 16 Apr 1997 12:49:24 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA03288 for <tcp-impl@relay.engr.SGI.COM>; Wed, 16 Apr 1997 12:49:18 -0700
Received: from motgate.mot.com (motgate.mot.com [129.188.136.100]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA20052 for <tcp-impl@relay.engr.SGI.COM>; Wed, 16 Apr 1997 12:49:12 -0700
Received: from pobox.mot.com (pobox.mot.com [129.188.137.100]) by motgate.mot.com (8.7.6/8.6.10/MOT-3.8) with ESMTP id OAA16543 for <tcp-impl@relay.engr.SGI.COM>; Wed, 16 Apr 1997 14:39:06 -0500 (CDT)
Received: from il02dns1.comm.mot.com (il02dns1.comm.mot.com [145.1.3.2]) by pobox.mot.com (8.7.6/8.6.10/MOT-3.8) with ESMTP id OAA26297 for <tcp-impl@relay.engr.SGI.COM>; Wed, 16 Apr 1997 14:38:53 -0500 (CDT)
Received: from magoo.comm.mot.com (magoo.comm.mot.com [145.1.80.34]) by il02dns1.comm.mot.com (8.7.5/8.7.3) with SMTP id OAA05789 for <tcp-impl@relay.engr.SGI.COM>; Wed, 16 Apr 1997 14:03:15 -0500 (CDT)
Received: by magoo.comm.mot.com (4.1/SMI-4.1)
	id AA04917; Wed, 16 Apr 97 14:03:07 CDT
Date: Wed, 16 Apr 97 14:03:07 CDT
From: romano@magoo.comm.mot.com (Guy Romano)
Message-Id: <9704161903.AA04917@magoo.comm.mot.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Header compression and packet IDs
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Van Jacobson header compression works best when the IP packet ID is 
incremented by 1 for each packet sent.  In other words, the difference
between the current and previous packet ID is equal to 1.  It is my
understanding that the difference between the current and previous 
packet ID is included in the compressed header only when it is not equal
to 1.  If the difference is less than 256 but greater than 1 then it 
can be represented in a one byte field.  If the difference is greater 
than or equal to 256 then the difference requires 2 bytes.  I have 
included a section from RFC1144 that supports my point.

   "Finally, the TCP/IP header on the outgoing packet is replaced with a
   compressed header:

     - The change in the packet ID is computed and, if not one,/21/ the
       difference is encoded (note that it may be zero or negative) and the
       I bit is set in the change mask."
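
To make the size rule above concrete, here is a minimal Python sketch of
the delta encoding.  It assumes the variable-length scheme used in RFC
1144's sample code (worth checking against the RFC itself): a delta of 1
through 255 takes one byte, while zero or anything from 256 up is sent
as a zero marker byte followed by the 16-bit value, so it occupies three
bytes on the wire, two of them carrying the value.  The names
encode_delta and ip_id_cost are hypothetical, chosen only for this
illustration.

```python
def encode_delta(delta: int) -> bytes:
    """Encode a 16-bit header-field delta (assumed RFC 1144 scheme)."""
    delta &= 0xFFFF  # deltas are computed modulo 2**16
    if 1 <= delta <= 255:
        return bytes([delta])  # common case: a single byte
    # Zero and values >= 256: zero marker byte, then the value MSB first.
    return bytes([0, delta >> 8, delta & 0xFF])


def ip_id_cost(prev_id: int, cur_id: int) -> int:
    """Bytes the packet ID adds to a compressed header.

    Nothing is sent when the ID went up by exactly 1.
    """
    delta = (cur_id - prev_id) & 0xFFFF
    return 0 if delta == 1 else len(encode_delta(delta))
```

Under these assumptions a stack that steps the ID by 256 pays the
multi-byte encoding on every packet, while a stack that steps it by 1
pays nothing.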

At least two different but related stacks increment the packet ID by 256
(in most cases).  In fact I believe that the packet ID is being sent
in non-network byte order.  The tcpdump trace below shows the first packet
from mip247.139 having a packet ID of 65313.  The next packet sent has a
packet ID of 34.  If the packet ID were in network byte order and the sender
were incrementing the packet ID by 256, then the second packet ID should have
been 33 (65313 + 256 - 65536), or in hex (0xff21 + 0x100 - 0x10000 = 0x21).
If we instead byte-swap the ID and add 1, the math is 0x21ff + 0x0001 = 0x2200.
The computer then sent that value in non-network byte order, so it appears
on the wire as 0x0022 (34 decimal).
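
The byte-order arithmetic can be checked with a short Python sketch.
It assumes the sender keeps a counter that it increments by 1 and then
transmits with its two bytes swapped; swap16 and next_wire_id are
hypothetical helper names, and the ID values are the ones tcpdump
printed for mip247.139 in the first trace below.

```python
def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def next_wire_id(wire_id: int) -> int:
    """ID the next packet would show if the sender counts by 1
    in byte-swapped order (an assumption about the sender)."""
    counter = swap16(wire_id)
    return swap16((counter + 1) & 0xFFFF)


# IDs printed by tcpdump for packets sent by mip247.139.
observed = [65313, 34, 290, 546]

predicted = [observed[0]]
for _ in observed[1:]:
    predicted.append(next_wire_id(predicted[-1]))

print(predicted == observed)
```

The same helper also reproduces the other host's sequence (19459 is
followed by 19715), which is consistent with both stacks counting by 1
in swapped byte order rather than by 256.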


ip 60: mip247.139 > mip248.1027: . ack 2369 win 8576 (DF) (ttl 128, id 65313)
ip 590: mip248.1027 > mip247.139: . 2369:2905(536) ack 109 win 8326 (DF) (ttl 31, id 19459)
ip 590: mip248.1027 > mip247.139: . 2905:3441(536) ack 109 win 8326 (DF) (ttl 31, id 19715)
ip 60: mip247.139 > mip248.1027: . ack 3441 win 8576 (DF) (ttl 128, id 34)
ip 590: mip248.1027 > mip247.139: . 3441:3977(536) ack 109 win 8326 (DF) (ttl 31, id 19971)
ip 450: mip248.1027 > mip247.139: P 3977:4373(396) ack 109 win 8326 (DF) (ttl 31, id 20227)
ip 60: mip247.139 > mip248.1027: . ack 4373 win 8576 (DF) (ttl 128, id 290)
ip 95: mip247.139 > mip248.1027: P 109:150(41) ack 4373 win 8576 (DF) (ttl 128, id 546)


An example of a different case:  

ip 450: mip248.1027 > mip247.139: P 16421:16817(396) ack 232 win 8203 (DF) (ttl 31, id 26371)
ip 60: mip247.139 > mip248.1027: . ack 16817 win 8576 (DF) (ttl 128, id 5922)
ip 95: mip247.139 > mip248.1027: P 232:273(41) ack 16817 win 8576 (DF) (ttl 128, id 6178)
ip 590: mip248.1027 > mip247.139: . 16817:17353(536) ack 273 win 8162 (DF) (ttl 31, id 26627)
ip 60: mip247.139 > mip248.1027: . ack 17353 win 8576 (DF) (ttl 128, id 6434)
ip 590: mip248.1027 > mip247.139: . 17353:17889(536) ack 273 win 8162 (DF) (ttl 31, id 26883)
ip 60: mip247.139 > mip248.1027: . ack 17889 win 8576 (DF) (ttl 128, id 6690)
ip 590: mip248.1027 > mip247.139: . 17889:18425(536) ack 273 win 8162 (DF) (ttl 31, id 27139)
ip 60: mip247.139 > mip248.1027: . ack 18425 win 8576 (DF) (ttl 128, id 6946)
ip 590: mip248.1027 > mip247.139: . 18425:18961(536) ack 273 win 8162 (DF) (ttl 31, id 27395)


While I have not found the RFC myself, I am told that this behavior does not
violate any rules.

This implementation issue is not directly related to TCP since it is an
IP header issue.  But it will affect header compressibility and performance
of TCP packets when header compression is used.  I have checked with Vern
Paxson and this issue is within the scope of this working group.



Guy

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 16 19:28:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA18656 for tcp-impl-list; Wed, 16 Apr 1997 19:27:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA18646 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 16 Apr 1997 19:27:09 -0700
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id TAA03462; Wed, 16 Apr 1997 19:26:48 -0700
Message-Id: <199704170226.TAA03462@refugee.engr.sgi.com>
X-Mailer: exmh version 2.0gamma 1/27/96
To: Frank T Solensky <solensky@ftp.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Memphis Minutes 
In-reply-to: Message from solensky@ftp.com of 16 Apr 1997 14:06:17 EDT
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 16 Apr 1997 19:26:47 -0700
From: Steve Alexander <sca@refugee.engr.sgi.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Frank T Solensky <solensky@ftp.com> writes:
>The only motivation for a non-developer to show up that I could think of
>was the expectation that the meeting could be used as a tutorial.

I was thinking that perhaps many administrators came in the hope of getting G-2
on bogus implementations; too bad we didn't ask how many SAs there were...

-- Steve



From owner-tcp-impl@relay.engr.sgi.com  Thu Apr 17 02:05:32 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA04882 for tcp-impl-list; Thu, 17 Apr 1997 02:03:28 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA04867 for <tcp-impl@relay.engr.sgi.com>; Thu, 17 Apr 1997 02:03:24 -0700
Received: from scol.sco.com (scol.london.sco.COM [150.126.1.48]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id CAA12777 for <tcp-impl@relay.engr.sgi.com>; Thu, 17 Apr 1997 02:03:19 -0700
Received: from tyne.london.sco.com by scol.sco.COM id aa26618;
          17 Apr 97 9:48 BST
From: Jonathan Webb <jonwe@sco.COM>
To: solensky@ftp.com, vern@ee.lbl.gov
Subject: RE: Memphis Minutes
Cc: tcp-impl@relay.engr.sgi.com
X-Mailer: ScoMail 3.0.Ca
MIME-Version: 1.0
Date: Thu, 17 Apr 1997 9:44:38 +0100 (BST)
Message-ID:  <9704170949.aa06394@tyne.sco.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

There were probably quite a few people like myself in the audience - 
representing the tcp engineering group but not personally hands on 
engineering it.

- Jonathan

_________________________________________________________

Jonathan Webb
Internet Engineering Group

SCO, Croxley Business Park,	Phone: +44 (0)1923 813658
Hatters Lane, Watford		Fax:   +44 (0)1923 813804
WD1 8YN, UK		Email: jonwe@sco.com
_________________________________________________________

From owner-tcp-impl@relay.engr.sgi.com  Mon Apr 21 04:27:59 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA13306 for tcp-impl-list; Sun, 20 Apr 1997 23:34:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA13294 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 20 Apr 1997 23:34:05 -0700
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id XAA23701 for <tcp-impl@engr.sgi.com>; Sun, 20 Apr 1997 23:34:04 -0700
Message-Id: <199704210634.XAA23701@refugee.engr.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Web page moved
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <23294.861604444.1@refugee.engr.sgi.com>
Date: Sun, 20 Apr 1997 23:34:04 -0700
From: Steve Alexander <sca@refugee.engr.sgi.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

The old URL will still work for a while, but the "officially approved" URL is
now:

	http://reality.sgi.com/csp/tcp-impl

-- Steve
"Please make a note of it."

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 11:56:16 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA08378 for tcp-impl-list; Tue, 29 Apr 1997 11:49:06 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA08351 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:49:01 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA04558
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:48:59 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id OAA05416; Tue, 29 Apr 1997 14:42:47 -0400 (EDT)
Message-Id: <199704291842.OAA05416@grinch.eecs.umich.edu>
To: tcp-impl@relay.engr.SGI.COM
Cc: sdawson@eecs.umich.edu
Subject: Comment request for a couple "Known TCP Implementation Problems"
From: Scott Dawson <sdawson@eecs.umich.edu>
In-Reply-To: der Mouse's message of Wed, 16 Apr 1997 14:47:11 -0400 (EDT)
Lines: 12
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 29 Apr 1997 14:42:46 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


I've just finished making a first pass at documenting a couple of
problems with keepalive that occur in some implementations.  I want to
run them by the group and solicit comments on them.  These are from
the list of problems remaining to be documented in the minutes of the
April meeting.

One thing I should mention is that I believe that the "significance"
category is still going to be re-visited at some point.  For now, I've
just used non-critical.

-Scott

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 11:56:25 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA09148 for tcp-impl-list; Tue, 29 Apr 1997 11:51:13 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA09113 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:51:11 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA04940
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:50:55 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id OAA05450; Tue, 29 Apr 1997 14:44:39 -0400 (EDT)
Message-Id: <199704291844.OAA05450@grinch.eecs.umich.edu>
To: tcp-impl@relay.engr.SGI.COM
Cc: sdawson@eecs.umich.edu
Subject: draft description of "Excessively short keepalive connection timeout"
From: Scott Dawson <sdawson@eecs.umich.edu>
Lines: 136
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 29 Apr 1997 14:44:39 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Name of Problem
     Excessively short keepalive connection timeout

Category
     Reliability

Description
     Keep-alive is a mechanism for checking whether an idle connection
     is still alive.  According to RFC-1122, keepalive should only be
     invoked in server applications that might otherwise hang
     indefinitely and consume resources unnecessarily if a client
     crashes or aborts a connection during a network failure.

     RFC-1122 also specifies that if a keep-alive mechanism is
     implemented it MUST NOT interpret failure to respond to any
     specific probe as a dead connection.  The RFC does not specify a
     particular mechanism for timing out a connection when no response
     is received for keepalive probes.  However, if the mechanism does
     not allow ample time for recovery from network congestion or delay,
     connections may be timed out unnecessarily.

Significance
     Non-critical

Implications
     It is possible for the network connection between two peer machines
     to become congested or to exhibit packet loss at the time that a
     keep-alive probe is sent on a connection.  If the keep-alive
     mechanism does not allow sufficient time before dropping
     connections in the face of unacknowledged probes, connections may
     be dropped even when both peers of a connection are still alive.

Relevant RFCs
     RFC 1122 specifies that the keep-alive mechanism may be provided.
     It does not specify a mechanism for determining dead connections
     when keepalive probes are not acknowledged.

Trace file demonstrating it
     Made using the Orchestra tool at the peer of the machine using
     keep-alive.  After connection establishment, incoming keep-alives
     were dropped by Orchestra to simulate a dead connection.

     22:11:12.040000 A > B: 22666019:0 win 8192 datasz 4 SYN
     22:11:12.060000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK
     22:11:12.130000 A > B: 22666020:2496002 win 8760 datasz 0 ACK
     22:11:12.150000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK
     22:11:12.180000 A > B: 22666020:2496002 win 8760 datasz 0 ACK

     00:23:00.680000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:01.770000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:02.870000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:03.970000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:05.070000 A > B: 22666019:2496002 win 8760 datasz 1 ACK

     The initial five packets are the SYN exchange for connection setup.
     About two hours later, the keepalive timer fires because the
     connection has been idle.  Keepalive probes are transmitted a total
     of 5 times, with a 1 second spacing between probes, after which the
     connection is dropped.  This is problematic because a 5 second
     network outage at the time of the first probe results in the
     connection being killed.
     
Trace file demonstrating correct behavior
     Made using the Orchestra tool at the peer of the machine using
     keep-alive.  After connection establishment, incoming keep-alives
     were dropped by Orchestra to simulate a dead connection.

     16:01:52.130000 A > B: 1804412929:0 win 4096 datasz 4 SYN
     16:01:52.360000 B > A: 16512001:1804412930 win 4096 datasz 4 SYN ACK
     16:01:52.410000 A > B: 1804412930:16512002 win 4096 datasz 0 ACK
     16:01:52.440000 B > A: 17612001:1804412930 win 4096 datasz 4 SYN ACK
     16:01:52.520000 A > B: 1804412930:16512002 win 4096 datasz 0 ACK

     18:01:57.170000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:03:12.220000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:04:27.270000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:05:42.320000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:06:57.370000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:08:12.420000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:09:27.480000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:10:43.290000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:11:57.580000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:13:12.630000 A > B: 1804412929:16512002 win 4096 datasz 0 RST ACK

     In this trace, when the keep-alive timer expires, 9 keepalive
     probes are sent at 75 second intervals.  75 seconds after the last
     probe is sent, a final RST segment is sent indicating that the
     connection has been closed.  This implementation waits about 11
     minutes before timing out the connection, while the first
     implementation shown allows only 5 seconds.
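
The two figures can be recomputed from the trace timestamps with a
short Python sketch.  The helper name elapsed_seconds is hypothetical,
and the timestamps are taken from the traces above, written in
HH:MM:SS.ffffff form.  Note that the first trace ends with the last
probe, while the second ends with the RST.

```python
from datetime import datetime, timedelta


def elapsed_seconds(first: str, last: str) -> float:
    """Seconds between two HH:MM:SS.ffffff trace timestamps."""
    fmt = "%H:%M:%S.%f"
    start = datetime.strptime(first, fmt)
    end = datetime.strptime(last, fmt)
    if end < start:  # the trace crossed midnight
        end += timedelta(days=1)
    return (end - start).total_seconds()


# First probe to last probe in the problematic trace.
short_window = elapsed_seconds("00:23:00.680000", "00:23:05.070000")

# First probe to the final RST in the well-behaved trace.
long_window = elapsed_seconds("18:01:57.170000", "18:13:12.630000")

print(f"short: {short_window:.2f} s")
print(f"long:  {long_window:.2f} s ({long_window / 60:.1f} min)")
```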

References
     This problem is documented in [Dawson97].

How to detect
     For implementations manifesting this problem, it shows up on a
     packet trace after the keepalive timer fires if the peer machine
     receiving the keepalive does not respond.  Usually the keepalive
     timer will fire at least two hours after keepalive is turned on,
     but it may be sooner if the timer value has been configured lower,
     or if the keepalive mechanism violates the specification (see the
     "Insufficient interval between keepalives" problem).  In this
     example, suppressing the response of the peer to keepalive probes
     was accomplished using the Orchestra toolkit, which can be
     configured to drop packets.  It could also have been done by
     creating a connection, turning on keepalive, and disconnecting the
     network connection at the receiver machine.

How to fix
     This problem can be fixed by using a different method for timing
     out keepalives that allows a longer period of time to elapse before
     dropping the connection.  For example, the algorithm for timing out
     on dropped data could be used.  Another possibility is an algorithm
     such as the one shown in the trace above, which sends 9 probes at
     75 second intervals and then waits an additional 75 seconds for a
     response before closing the connection.
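
     The second possibility might look roughly like the sketch below.
     It is not taken from any of the implementations tested; the class
     and method names are hypothetical, and the defaults of 9 probes and
     75 seconds are simply the values seen in the trace above.

```python
from dataclasses import dataclass


@dataclass
class KeepaliveProber:
    """Counts unanswered probes; drops only after max_probes go unanswered."""

    max_probes: int = 9            # value observed in the trace above
    probe_interval_s: float = 75.0  # the caller re-arms its timer with this
    unanswered: int = 0
    dropped: bool = False

    def on_probe_timer(self) -> str:
        """Called when the keepalive timer fires, then once per interval."""
        if self.dropped:
            return "closed"
        if self.unanswered >= self.max_probes:
            # A full interval has passed since the last probe with no reply.
            self.dropped = True
            return "send RST and drop"
        self.unanswered += 1
        return "send probe"

    def on_segment_received(self) -> None:
        """Any segment from the peer shows it is alive, so start over."""
        if not self.dropped:
            self.unanswered = 0


prober = KeepaliveProber()
actions = [prober.on_probe_timer() for _ in range(10)]
print(actions[-2:])
```

     Driving the timer ten times with no reply yields nine probes and
     then the RST, matching the shape of the trace.  A single lost probe
     never closes the connection, which is what RFC-1122 requires.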

6. References

[Dawson97]
     S. Dawson, F. Jahanian, and T. Mitton, "Experiments on Six
     Commercial TCP Implementations Using a Software Fault Injection
     Tool," to appear in Software Practice & Experience, 1997.  A
     technical report version of this paper can be obtained at
     ftp://rtcl.eecs.umich.edu/outgoing/sdawson/CSE-TR-298-96.ps.gz.

7. Author's Address

   Scott Dawson <sdawson@eecs.umich.edu>
   Real-Time Computing Laboratory
   EECS Building
   University of Michigan
   Ann Arbor, MI  48109-2122
   USA
   Phone: +1 313/763-5363

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 11:56:26 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA09332 for tcp-impl-list; Tue, 29 Apr 1997 11:51:51 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA09315 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:51:48 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA05379
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:51:47 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id OAA05470; Tue, 29 Apr 1997 14:45:52 -0400 (EDT)
Message-Id: <199704291845.OAA05470@grinch.eecs.umich.edu>
To: tcp-impl@relay.engr.SGI.COM
Cc: sdawson@eecs.umich.edu
Subject: draft description of "Insufficient interval between keepalives"
From: Scott Dawson <sdawson@eecs.umich.edu>
Lines: 124
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 29 Apr 1997 14:45:52 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Name of Problem
     Insufficient interval between keepalives

Category
     Reliability

Description
     Keep-alive is a mechanism for checking whether an idle connection
     is still alive.  According to RFC-1122, keep-alive may be included
     in an implementation.  If it is included, the interval between
     keep-alive packets MUST be configurable, and MUST default to no
     less than two hours.

Significance
     Non-critical

Implications
     According to RFC-1122, keep-alive is not required of
     implementations because it could: (1) cause perfectly good
     connections to break during transient Internet failures; (2)
     consume unnecessary bandwidth ("if no one is using the connection,
     who cares if it is still good?"); and (3) cost money for an
     Internet path that charges for packets.  If keepalive is provided,
     the RFC states that the interval between keepalive probes MUST
     default to no less than two hours.  If it does not, the probability
     of connections breaking increases, the bandwidth used by
     keepalives increases, and cost increases over paths that charge
     per packet.

Relevant RFCs
     RFC 1122 specifies that the keep-alive mechanism may be provided.
     It also specifies the two hour minimum for the default interval
     between keepalive probes.

Trace file demonstrating it
     Made using the Orchestra tool at the peer of the machine using
     keep-alive.

     11:36:32.910000 A > B: 3288354305:0      win 28672 datasz 4 SYN
     11:36:32.930000 B > A: 896001:3288354306 win 4096  datasz 4 SYN ACK
     11:36:32.950000 A > B: 3288354306:896002 win 28672 datasz 0 ACK
     11:36:32.970000 B > A: 896001:3288354306 win 4096  datasz 4 SYN ACK
     11:36:33.000000 A > B: 3288354306:896002 win 28672 datasz 0 ACK

     11:50:01.190000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     11:50:01.210000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:03:29.410000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:03:29.430000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:16:57.630000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:16:57.650000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:30:25.850000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:30:25.870000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:43:54.070000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:43:54.090000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     The initial five packets are the SYN exchange for connection setup.
     About 13 minutes later, the keepalive timer fires because the
     connection is idle.  The keepalive is acknowledged, and the timer
     fires again in about 13 more minutes.  This behavior continues
     indefinitely until the connection is closed, and is a violation of
     the two hour minimum required by the specification.
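
     The "about 13 minutes" figure can be checked directly from the
     timestamps of the probes sent by A.  The snippet below is only a
     sketch; gaps_seconds is a hypothetical helper, and it assumes all
     timestamps fall on the same day, which holds for this trace.

```python
from datetime import datetime

# Timestamps of the five keepalive probes sent by A in the trace above.
probe_times = [
    "11:50:01.190000",
    "12:03:29.410000",
    "12:16:57.630000",
    "12:30:25.850000",
    "12:43:54.070000",
]


def gaps_seconds(stamps):
    """Seconds between consecutive same-day HH:MM:SS.ffffff timestamps."""
    parsed = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


gaps = gaps_seconds(probe_times)
for g in gaps:
    # Each gap is a little under 13.5 minutes, far short of 7200 seconds.
    print(f"{g:.2f} s = {g / 60:.2f} min")
```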

Trace file demonstrating correct behavior
     Made using the Orchestra tool at the peer of the machine using
     keep-alive.

     17:37:20.500000 A > B: 34155521:0       win 4096 datasz 4 SYN
     17:37:20.520000 B > A: 6272001:34155522 win 4096 datasz 4 SYN ACK
     17:37:20.540000 A > B: 34155522:6272002 win 4096 datasz 0 ACK
     17:37:20.560000 B > A: 6272001:34155522 win 4096 datasz 4 SYN ACK
     17:37:20.580000 A > B: 34155522:6272002 win 4096 datasz 0 ACK

     19:37:25.430000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     19:37:25.450000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     21:37:30.560000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     21:37:30.570000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     23:37:35.580000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     23:37:35.600000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     01:37:40.620000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     01:37:40.640000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     03:37:45.590000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     03:37:45.610000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     The initial five packets are the SYN exchange for connection setup.
     Just over two hours later, the keepalive timer fires because the
     connection is idle.  The keepalive is acknowledged, and the timer
     fires again just over two hours later.  This behavior continues
     indefinitely until the connection is closed.

References
     This problem is documented in [Dawson97].

How to detect
     For implementations manifesting this problem, it shows up on a
     packet trace.  If the connection is left idle, the keepalive probes
     will arrive closer together than the two hour minimum.
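
     A trace in the format used above could be checked mechanically
     along the following lines.  This is a sketch under assumptions, not
     an Orchestra feature: the regular expression is written to fit the
     trace lines shown in this description, the function names are
     hypothetical, and a probe is taken to be any non-SYN segment from
     the sender that reuses its initial sequence number, as the probes
     in both traces do.  Because the timestamps carry no date, a
     decrease is treated as the clock passing midnight.

```python
import re

MIN_DEFAULT_INTERVAL_S = 2 * 60 * 60  # two hour minimum default (RFC 1122)

# Fits lines such as:
# 11:50:01.190000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
LINE = re.compile(
    r"(\d+):(\d+):(\d+\.\d+)\s+(\w)\s+>\s+(\w):\s+(\d+):(\d+)"
    r"\s+win\s+\d+\s+datasz\s+(\d+)\s+(.+)"
)


def probe_intervals(trace_lines, sender="A"):
    """Return the gaps, in seconds, between keepalive probes from sender."""
    isn = None
    times = []
    for line in trace_lines:
        m = LINE.match(line.strip())
        if not m or m.group(4) != sender:
            continue
        hours, minutes, seconds, _, _, seq, _, _, flags = m.groups()
        t = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        if "SYN" in flags.split():
            isn = seq              # remember the initial sequence number
        elif seq == isn:
            times.append(t)        # segment reusing the ISN: a probe
    # A negative difference means the trace crossed midnight.
    return [(b - a) % 86400 for a, b in zip(times, times[1:])]


def violates_minimum(trace_lines, sender="A"):
    """True if any two probes are closer together than two hours."""
    gaps = probe_intervals(trace_lines, sender)
    return any(g < MIN_DEFAULT_INTERVAL_S for g in gaps)
```

     A result of True indicates a violation only if the interval was
     left at its default; an interval deliberately configured below two
     hours is permitted.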

6. References

[Dawson97]
     S. Dawson, F. Jahanian, and T. Mitton, "Experiments on Six
     Commercial TCP Implementations Using a Software Fault Injection
     Tool," to appear in Software Practice & Experience, 1997.  A
     technical report version of this paper can be obtained at
     ftp://rtcl.eecs.umich.edu/outgoing/sdawson/CSE-TR-298-96.ps.gz.

7. Author's Address

   Scott Dawson <sdawson@eecs.umich.edu>
   Real-Time Computing Laboratory
   EECS Building
   University of Michigan
   Ann Arbor, MI  48109-2122
   USA
   Phone: +1 313/763-5363

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 12:17:06 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA16997 for tcp-impl-list; Tue, 29 Apr 1997 12:15:27 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA16979 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:15:23 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA11620
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:15:15 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA15985; Wed, 30 Apr 97 00:45:00+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22097; Wed, 30 Apr 1997 00:37:36 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA08588; Tue, 29 Apr 1997 12:04:22 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA09148 for tcp-impl-list; Tue, 29 Apr 1997 11:51:13 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA09113 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:51:11 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA04940
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:50:55 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id OAA05450; Tue, 29 Apr 1997 14:44:39 -0400 (EDT)
Message-Id: <199704291844.OAA05450@grinch.eecs.umich.edu>
To: tcp-impl@relay.engr.SGI.COM
Cc: sdawson@eecs.umich.edu
Subject: draft description of "Excessively short keepalive connection timeout"
From: Scott Dawson <sdawson@eecs.umich.edu>
Lines: 136
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 29 Apr 1997 14:44:39 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Name of Problem
     Excessively short keepalive connection timeout

Category
     Reliability

Description
     Keep-alive is a mechanism for checking whether an idle connection
     is still alive.  According to RFC-1122, keepalive should only be
     invoked in server applications that might otherwise hang
     indefinitely and consume resources unnecessarily if a client
     crashes or aborts a connection during a network failure.

     RFC-1122 also specifies that if a keep-alive mechanism is
     implemented it MUST NOT interpret failure to respond to any
     specific probe as a dead connection.  The RFC does not specify a
     particular mechanism for timing out a connection when no response
     is received for keepalive probes.  However, if the mechanism does
     not allow ample time for recovery from network congestion or delay,
     connections may be timed out unnecessarily.

Significance
     Non-critical

Implications
     It is possible for the network connection between two peer machines
     to become congested or to exhibit packet loss at the time that a
     keep-alive probe is sent on a connection.  If the keep-alive
     mechanism does not allow sufficient time before dropping
     connections in the face of unacknowledged probes, connections may
     be dropped even when both peers of a connection are still alive.

Relevant RFCs
     RFC 1122 specifies that the keep-alive mechanism may be provided.
     It does not specify a mechanism for determining dead connections
     when keepalive probes are not acknowledged.

Trace file demonstrating it
     Made using the Orchestra tool at the peer of the machine using
     keep-alive.  After connection establishment, incoming keep-alives
     were dropped by Orchestra to simulate a dead connection.

     22:11:12.040000 A > B: 22666019:0 win 8192 datasz 4 SYN
     22:11:12.060000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK
     22:11:12.130000 A > B: 22666020:2496002 win 8760 datasz 0 ACK
     22:11:12.150000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK
     22:11:12.180000 A > B: 22666020:2496002 win 8760 datasz 0 ACK

     00:23:00.680000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:01.770000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:02.870000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:03.970000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:05.070000 A > B: 22666019:2496002 win 8760 datasz 1 ACK

     The initial five packets are the SYN exchange for connection setup.
     About two hours later, the keepalive timer fires because the
     connection has been idle.  Keepalive probes are transmitted a total
     of 5 times, with a 1 second spacing between probes, after which the
     connection is dropped.  This is problematic because a 5 second
     network outage at the time of the first probe results in the
     connection being killed.
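
     The effect of a short outage on a given probe schedule can be
     sketched as below.  The model is deliberately crude and its
     assumptions are mine, not the trace's: the outage starts just as
     the first probe is sent, every probe sent during the outage is
     lost, and any probe sent once the outage is over is answered.
     survives_outage is a hypothetical name.

```python
def survives_outage(outage_s: float, probe_count: int,
                    probe_interval_s: float) -> bool:
    """True if at least one probe is sent after the outage has ended."""
    send_times = [i * probe_interval_s for i in range(probe_count)]
    return any(t >= outage_s for t in send_times)


# 5 probes about 1 second apart, as in the trace above: all are sent
# within the 5 second outage, so the connection is dropped.
print(survives_outage(5.0, probe_count=5, probe_interval_s=1.0))

# 9 probes 75 seconds apart: the second probe already falls after the
# outage, so the connection survives.
print(survives_outage(5.0, probe_count=9, probe_interval_s=75.0))
```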
     
Trace file demonstrating correct behavior
     Made using the Orchestra tool at the peer of the machine using
     keep-alive.  After connection establishment, incoming keep-alives
     were dropped by Orchestra to simulate a dead connection.

     16:01:52.130000 A > B: 1804412929:0 win 4096 datasz 4 SYN
     16:01:52.360000 B > A: 16512001:1804412930 win 4096 datasz 4 SYN ACK
     16:01:52.410000 A > B: 1804412930:16512002 win 4096 datasz 0 ACK
     16:01:52.440000 B > A: 16512001:1804412930 win 4096 datasz 4 SYN ACK
     16:01:52.520000 A > B: 1804412930:16512002 win 4096 datasz 0 ACK

     18:01:57.170000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:03:12.220000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:04:27.270000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:05:42.320000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:06:57.370000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:08:12.420000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:09:27.480000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:10:43.290000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:11:57.580000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:13:12.630000 A > B: 1804412929:16512002 win 4096 datasz 0 RST ACK

     In this trace, when the keep-alive timer expires, 9 keepalive
     probes are sent at 75 second intervals.  75 seconds after the last
     probe is sent, a final RST segment is sent indicating that the
     connection has been closed.  This implementation waits about 11
     minutes before timing out the connection, while the first
     implementation shown allows only 5 seconds.

References
     This problem is documented in [Dawson97].

How to detect
     For implementations manifesting this problem, it shows up on a
     packet trace after the keepalive timer fires if the peer machine
     receiving the keepalive does not respond.  Usually the keepalive
     timer will fire at least two hours after keepalive is turned on,
     but it may be sooner if the timer value has been configured lower,
     or if the keepalive mechanism violates the specification (see the
     "Insufficient interval between keepalives" problem).  In this
     example, suppressing the response of the peer to keepalive probes
     was accomplished using the Orchestra toolkit, which can be
     configured to drop packets.  It could also have been done by
     creating a connection, turning on keepalive, and disconnecting the
     network connection at the receiver machine.

How to fix
     This problem can be fixed by using a different method for timing
     out keepalives that allows a longer period of time to elapse before
     dropping the connection.  For example, the algorithm for timing out
     on dropped data could be used.  Another possibility is an algorithm
     such as the one shown in the trace above, which sends 9 probes at
     75 second intervals and then waits an additional 75 seconds for a
     response before closing the connection.

6. References

[Dawson97]
     S. Dawson, F. Jahanian, and T. Mitton, "Experiments on Six
     Commercial TCP Implementations Using a Software Fault Injection
     Tool," to appear in Software Practice & Experience, 1997.  A
     technical report version of this paper can be obtained at
     ftp://rtcl.eecs.umich.edu/outgoing/sdawson/CSE-TR-298-96.ps.gz.

7. Author's Address

   Scott Dawson <sdawson@eecs.umich.edu>
   Real-Time Computing Laboratory
   EECS Building
   University of Michigan
   Ann Arbor, MI  48109-2122
   USA
   Phone: +1 313/763-5363

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 12:17:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA16694 for tcp-impl-list; Tue, 29 Apr 1997 12:14:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA16684 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:14:39 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA11494
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:14:32 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA15981; Wed, 30 Apr 97 00:44:50+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22098; Wed, 30 Apr 1997 00:38:03 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA08610; Tue, 29 Apr 1997 12:04:27 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA08378 for tcp-impl-list; Tue, 29 Apr 1997 11:49:06 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA08351 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:49:01 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA04558
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:48:59 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id OAA05416; Tue, 29 Apr 1997 14:42:47 -0400 (EDT)
Message-Id: <199704291842.OAA05416@grinch.eecs.umich.edu>
To: tcp-impl@relay.engr.SGI.COM
Cc: sdawson@eecs.umich.edu
Subject: Comment request for a couple "Known TCP Implementation Problems"
From: Scott Dawson <sdawson@eecs.umich.edu>
In-Reply-To: der Mouse's message of Wed, 16 Apr 1997 14:47:11 -0400 (EDT)
Lines: 12
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 29 Apr 1997 14:42:46 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


I've just finished making a first pass at documenting a couple of
problems with keepalive that occur in some implementations.  I want to
run them by the group and solicit comments on them.  These are from
the list of problems remaining to be documented in the minutes of the
April meeting.

One thing I should mention is that I believe that the "significance"
category is still going to be re-visited at some point.  For now, I've
just used non-critical.

-Scott

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 12:35:30 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA22246 for tcp-impl-list; Tue, 29 Apr 1997 12:31:26 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA22238 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:31:24 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA15233
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:31:16 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA16148; Wed, 30 Apr 97 01:00:35+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22124; Wed, 30 Apr 1997 00:53:10 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA12594; Tue, 29 Apr 1997 12:19:35 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA16997 for tcp-impl-list; Tue, 29 Apr 1997 12:15:27 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA16979 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:15:23 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA11620
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:15:15 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA15985; Wed, 30 Apr 97 00:45:00+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22097; Wed, 30 Apr 1997 00:37:36 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA08588; Tue, 29 Apr 1997 12:04:22 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA09148 for tcp-impl-list; Tue, 29 Apr 1997 11:51:13 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA09113 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:51:11 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA04940
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:50:55 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id OAA05450; Tue, 29 Apr 1997 14:44:39 -0400 (EDT)
Message-Id: <199704291844.OAA05450@grinch.eecs.umich.edu>
To: tcp-impl@relay.engr.SGI.COM
Cc: sdawson@eecs.umich.edu
Subject: draft description of "Excessively short keepalive connection timeout"
From: Scott Dawson <sdawson@eecs.umich.edu>
Lines: 136
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 29 Apr 1997 14:44:39 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Name of Problem
     Excessively short keepalive connection timeout

Category
     Reliability

Description
     Keep-alive is a mechanism  for checking whether an idle  connection
     is  still alive.  According  to  RFC-1122, keepalive should only be
     invoked  in server    applications    that might otherwise     hang
     indefinitely  and   consume resources   unnecessarily if  a  client
     crashes or aborts a connection during a network failure.

     RFC-1122   also specifies  that   if  a  keep-alive mechanism    is
     implemented   it  MUST NOT  interpret  failure   to respond to  any
     specific probe as  a dead connection.  The RFC  does not  specify a
     particular mechanism for timing  out a connection when  no response
     is received  for keepalive probes.   However, if the mechanism does
     not allow ample time for recovery from network congestion or delay,
     connections may be timed out unnecessarily.

Significance
     Non-critical

Implications
     It is possible for the network connection between two peer machines
     to become congested  or to exhibit packet  loss at the time  that a
     keep-alive  probe  is  sent on  a  connection.   If  the keep-alive
     mechanism   does    not allow    sufficient time   before  dropping
     connections in  the face of  unacknowledged probes, connections may
     be dropped even when both peers of a connection are still alive.

Relevant RFCs
     RFC 1122 specifies  that the keep-alive  mechanism may be provided.
     It does not specify a   mechanism for determining dead  connections
     when keepalive probes are not acknowledged.

Trace file demonstrating it
     Made using the   Orchestra tool at the  peer  of the machine  using
     keep-alive.   After connection  establishment, incoming keep-alives
     were dropped by Orchestra to simulate a dead connection.

     22:11:12.040000 A > B: 22666019:0 win 8192 datasz 4 SYN
     22:11:12.060000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK
     22:11:12.130000 A > B: 22666020:2496002 win 8760 datasz 0 ACK
     22:11:12.150000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK
     22:11:12.180000 A > B: 22666020:2496002 win 8760 datasz 0 ACK

     00:23:00.680000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:01.770000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:02.870000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:03.970000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:05.070000 A > B: 22666019:2496002 win 8760 datasz 1 ACK

     The initial five packets are the SYN exchange for connection setup.
     About two hours later, the keepalive timer fires because the
     connection has been idle.  Five keepalive probes are transmitted,
     about 1.1 seconds apart, after which the connection is dropped.
     This is problematic because a 5 second network outage at the time
     of the first probe results in the connection being killed.
     
Trace file demonstrating correct behavior
     Made using  the  Orchestra tool at  the peer  of the  machine using
     keep-alive.  After  connection establishment,  incoming keep-alives
     were dropped by Orchestra to simulate a dead connection.

     16:01:52.130000 A > B: 1804412929:0 win 4096 datasz 4 SYN
     16:01:52.360000 B > A: 16512001:1804412930 win 4096 datasz 4 SYN ACK
     16:01:52.410000 A > B: 1804412930:16512002 win 4096 datasz 0 ACK
     16:01:52.440000 B > A: 16512001:1804412930 win 4096 datasz 4 SYN ACK
     16:01:52.520000 A > B: 1804412930:16512002 win 4096 datasz 0 ACK

     18:01:57.170000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:03:12.220000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:04:27.270000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:05:42.320000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:06:57.370000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:08:12.420000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:09:27.480000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:10:43.290000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:11:57.580000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:13:12.630000 A > B: 1804412929:16512002 win 4096 datasz 0 RST ACK

     In  this trace, when    the keep-alive timer expires,  9  keepalive
     probes are sent at 75 second intervals.  75  seconds after the last
     probe  is sent,  a final RST  segment is  sent indicating that  the
     connection has been closed.  This   implementation waits about   11
     minutes  before  timing   out  the connection,   while    the first
     implementation shown allows only 5 seconds.
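
     As a rough check on these figures, the time each implementation
     allows before giving up can be computed from the probe count and
     the probe spacing.  This is only a sketch: the function name is
     hypothetical, and it assumes that one further interval elapses
     after the last probe before the connection is dropped, which the
     second trace shows directly and the first trace does not.

```python
def keepalive_timeout(probes, interval_s):
    """Seconds from the first unanswered probe until the connection is dropped.

    Assumption: the stack waits one more interval after the last probe,
    so the total is simply probes * interval_s.
    """
    return probes * interval_s


# Probe counts and spacings read off the two traces above.
short = keepalive_timeout(5, 1.1)   # first trace: about 1.1 s between probes
longer = keepalive_timeout(9, 75)   # second trace: 75 s between probes

print(f"first implementation:  {short:.1f} s")
print(f"second implementation: {longer} s ({longer / 60:.2f} min)")
```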

References
     This problem is documented in [Dawson97].

How to detect
     In implementations manifesting this problem, it shows up on a
     packet trace once the keepalive timer fires and the peer receiving
     the keepalive does not respond.  Usually the keepalive timer fires
     at least two hours after keepalive is turned on, but it may fire
     sooner if the timer value has been configured lower, or if the
     keepalive mechanism violates the specification (see the
     "Insufficient interval between keepalives" problem).  In this
     example, the peer's responses to keepalive probes were suppressed
     using the Orchestra toolkit, which can be configured to drop
     packets.  The same effect could be had by creating a connection,
     turning on keepalive, and disconnecting the network at the
     receiving machine.
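
     The check can also be automated over a saved trace.  The sketch
     below parses lines in the format used in this description and
     reports how many probes were sent and over what span.  The function
     names are hypothetical.  The probe test is a heuristic that is an
     assumption, not part of any specification: a non-SYN, non-RST
     segment whose sequence number is below one the sender has already
     used is treated as a keepalive probe.  Sequence number wraparound
     is ignored.

```python
import re

SEGMENT = re.compile(
    r"(\d+):(\d+):(\d+\.\d+) (\w+) > (\w+): (\d+):(\d+) win \d+ datasz \d+ (.+)"
)


def parse_trace(text):
    """Return a list of (time_s, src, seq, flags) tuples, one per segment."""
    segments = []
    day = 0.0      # seconds added for each midnight already crossed
    last = None
    for line in text.splitlines():
        m = SEGMENT.match(line.strip())
        if not m:
            continue
        h, mnt, s, src, _dst, seq, _ack, flags = m.groups()
        t = int(h) * 3600 + int(mnt) * 60 + float(s) + day
        if last is not None and t < last:   # clock wrapped past midnight
            day += 86400.0
            t += 86400.0
        last = t
        segments.append((t, src, int(seq), flags.split()))
    return segments


def keepalive_probes(segments, sender="A"):
    """Times of segments from `sender` that look like keepalive probes."""
    highest = -1
    times = []
    for t, src, seq, flags in segments:
        if src != sender:
            continue
        if "SYN" not in flags and "RST" not in flags and seq < highest:
            times.append(t)
        highest = max(highest, seq)
    return times


# The first trace above, with timestamps normalized to HH:MM:SS.ffffff.
TRACE = """
22:11:12.040000 A > B: 22666019:0 win 8192 datasz 4 SYN
22:11:12.060000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK
22:11:12.130000 A > B: 22666020:2496002 win 8760 datasz 0 ACK
22:11:12.150000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK
22:11:12.180000 A > B: 22666020:2496002 win 8760 datasz 0 ACK
00:23:00.680000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
00:23:01.770000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
00:23:02.870000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
00:23:03.970000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
00:23:05.070000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
"""

probes = keepalive_probes(parse_trace(TRACE))
span = probes[-1] - probes[0]
print(f"{len(probes)} probes over {span:.2f} s")
```

     A span of a few seconds between the first and last probe, as here,
     indicates the problem; a span of several minutes does not.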

How to fix
     This  problem can be fixed  by using a  different method for timing
     out keepalives that allows a longer period of time to elapse before
     dropping the connection.  For example, the algorithm for timing out
     on dropped data could be used.  Another possibility is an algorithm
     such as the  one shown in the trace  above, which sends 9 probes at
     75 second  intervals and then waits an  additional 75 seconds for a
     response before closing the connection.
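
     The second possibility can be sketched as a small timer state
     machine.  This is an illustration under stated assumptions, not any
     particular stack's code: the class and method names are
     hypothetical, the 75 second interval and 9 probes are taken from
     the trace above, and the two hour idle time is the RFC 1122 minimum
     default.  Any segment from the peer resets the probe count, so no
     single lost probe is treated as a dead connection.

```python
class KeepaliveTimer:
    """Toy keepalive state machine; all times are in seconds."""

    def __init__(self, idle=7200, interval=75, max_probes=9):
        self.idle = idle
        self.interval = interval
        self.max_probes = max_probes
        self.probes_sent = 0
        self.deadline = idle      # next time the timer fires
        self.dropped = False

    def segment_received(self, now):
        """Any segment from the peer shows it is alive: start over."""
        self.probes_sent = 0
        self.deadline = now + self.idle

    def timer_fired(self, now):
        """Return 'probe' or 'drop' when the deadline is reached."""
        if self.probes_sent >= self.max_probes:
            self.dropped = True
            return "drop"
        self.probes_sent += 1
        self.deadline = now + self.interval
        return "probe"


# Simulate a peer that never answers.
timer = KeepaliveTimer()
events = []
while not timer.dropped:
    now = timer.deadline
    events.append((now, timer.timer_fired(now)))

first_probe = events[0][0]
drop_time = events[-1][0]
print(f"{len(events) - 1} probes, dropped {drop_time - first_probe} s "
      f"after the first probe")
```

     With these values the connection is dropped 675 seconds after the
     first probe, which matches the 11 minutes seen in the trace.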

6. References

[Dawson97]
     S.  Dawson,  F.   Jahanian, and  T.   Mitton, "Experiments   on Six
     Commercial  TCP Implementations  Using  a Software  Fault Injection
     Tool," to  appear  in  Software Practice &    Experience,  1997.  A
     technical  report version of   this   paper  can be  obtained    at
     ftp://rtcl.eecs.umich.edu/outgoing/sdawson/CSE-TR-298-96.ps.gz.

7. Author's Address

   Scott Dawson <sdawson@eecs.umich.edu>
   Real-Time Computing Laboratory
   EECS Building
   University of Michigan
   Ann Arbor, MI  48109-2122
   USA
   Phone: +1 313/763-5363

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 12:35:32 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA22081 for tcp-impl-list; Tue, 29 Apr 1997 12:30:50 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA22067 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:30:48 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA15018
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:30:43 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA16149; Wed, 30 Apr 97 01:00:56+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22125; Wed, 30 Apr 1997 00:53:20 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA12607; Tue, 29 Apr 1997 12:19:38 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA16694 for tcp-impl-list; Tue, 29 Apr 1997 12:14:41 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA16684 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:14:39 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA11494
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:14:32 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA15981; Wed, 30 Apr 97 00:44:50+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22098; Wed, 30 Apr 1997 00:38:03 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA08610; Tue, 29 Apr 1997 12:04:27 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA08378 for tcp-impl-list; Tue, 29 Apr 1997 11:49:06 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA08351 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:49:01 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA04558
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:48:59 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id OAA05416; Tue, 29 Apr 1997 14:42:47 -0400 (EDT)
Message-Id: <199704291842.OAA05416@grinch.eecs.umich.edu>
To: tcp-impl@relay.engr.SGI.COM
Cc: sdawson@eecs.umich.edu
Subject: Comment request for a couple "Known TCP Implementation Problems"
From: Scott Dawson <sdawson@eecs.umich.edu>
In-Reply-To: der Mouse's message of Wed, 16 Apr 1997 14:47:11 -0400 (EDT)
Lines: 12
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 29 Apr 1997 14:42:46 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


I've just finished making a first pass at documenting a couple of
problems with keepalive that occur in some implementations.  I want to
run them by the group and solicit comments on them.  These are from
the list of problems remaining to be documented in the minutes of the
April meeting.

One thing I should mention is that I believe that the "significance"
category is still going to be re-visited at some point.  For now, I've
just used non-critical.

-Scott

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 12:35:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA22871 for tcp-impl-list; Tue, 29 Apr 1997 12:33:33 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA22866 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:33:32 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA15811
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:33:24 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA16183; Wed, 30 Apr 97 01:03:16+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22138; Wed, 30 Apr 1997 00:55:35 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA12609; Tue, 29 Apr 1997 12:19:38 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA16668 for tcp-impl-list; Tue, 29 Apr 1997 12:14:36 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA16656 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:14:34 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA11487
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:14:29 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA15978; Wed, 30 Apr 97 00:44:23+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22096; Wed, 30 Apr 1997 00:37:26 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA08585; Tue, 29 Apr 1997 12:04:22 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA09332 for tcp-impl-list; Tue, 29 Apr 1997 11:51:51 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA09315 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:51:48 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA05379
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:51:47 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id OAA05470; Tue, 29 Apr 1997 14:45:52 -0400 (EDT)
Message-Id: <199704291845.OAA05470@grinch.eecs.umich.edu>
To: tcp-impl@relay.engr.SGI.COM
Cc: sdawson@eecs.umich.edu
Subject: draft description of "Insufficient interval between keepalives"
From: Scott Dawson <sdawson@eecs.umich.edu>
Lines: 124
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 29 Apr 1997 14:45:52 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Name of Problem
     Insufficient interval between keepalives

Category
     Reliability

Description
     Keep-alive is a mechanism  for checking whether an idle  connection
     is still alive.  According to RFC-1122,  keep-alive may be included
     in an implementation.  If   it  is included, the interval   between
     keep-alive packets  MUST be configurable,   and MUST default to  no
     less than two hours.

Significance
     Non-critical

Implications
     According    to   RFC-1122,    keep-alive   is not    required   of
     implementations   because it    could:  (1)  cause  perfectly  good
     connections   to break  during   transient Internet  failures;  (2)
     consume unnecessary bandwidth ("if  no one is using the connection,
     who  cares if   it is  still good?"); and    (3) cost money  for an
     Internet path that charges for packets.  If keepalive is provided,
     the RFC states that the interval between keepalives MUST default
     to no less than two hours.  If the default is shorter, the
     probability of connections breaking increases, the bandwidth used
     by keepalives increases, and cost increases over paths which
     charge per packet.
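
     To put numbers on the bandwidth and cost concern, the probe rate
     for an idle connection can be compared at the two intervals seen in
     the traces in this description.  This is a back-of-the-envelope
     sketch only; the function name is hypothetical and the 808 second
     figure is read off the first trace.

```python
def probes_per_day(interval_s):
    """Keepalive probes sent per day on a connection that stays idle."""
    return 86400 / interval_s


observed = 13 * 60 + 28     # about 13.5 minutes, as in the first trace
required = 2 * 60 * 60      # RFC 1122 minimum default of two hours

print(f"observed: {probes_per_day(observed):.1f} probes/day")
print(f"required: {probes_per_day(required):.1f} probes/day at most")
print(f"ratio:    {required / observed:.1f}x")
```

     Each probe also draws an acknowledgement from a live peer, so the
     packet count on the path is twice the probe count.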

Relevant RFCs
     RFC 1122  specifies that the  keep-alive mechanism may be provided.
     It also  specifies the two   hour minimum for the  default interval
     between keepalive probes.

Trace file demonstrating it
     Made  using the Orchestra tool  at  the peer  of the machine  using
     keep-alive.

     11:36:32.910000 A > B: 3288354305:0      win 28672 datasz 4 SYN
     11:36:32.930000 B > A: 896001:3288354306 win 4096  datasz 4 SYN ACK
     11:36:32.950000 A > B: 3288354306:896002 win 28672 datasz 0 ACK
     11:36:32.970000 B > A: 896001:3288354306 win 4096  datasz 4 SYN ACK
     11:36:33.000000 A > B: 3288354306:896002 win 28672 datasz 0 ACK

     11:50:01.190000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     11:50:01.210000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:03:29.410000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:03:29.430000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:16:57.630000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:16:57.650000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:30:25.850000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:30:25.870000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:43:54.070000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:43:54.090000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     The initial five packets are the SYN exchange for connection setup.
     About 13 minutes later, the keepalive timer fires because the
     connection is idle.  The keepalive is acknowledged, and the timer
     fires again in about 13 more minutes.  This behavior continues
     until the connection is closed, and violates the two hour minimum
     that the specification sets for the default interval.

Trace file demonstrating correct behavior
     Made using  the  Orchestra tool at  the peer  of the  machine using
     keep-alive.

     17:37:20.500000 A > B: 34155521:0       win 4096 datasz 4 SYN
     17:37:20.520000 B > A: 6272001:34155522 win 4096 datasz 4 SYN ACK
     17:37:20.540000 A > B: 34155522:6272002 win 4096 datasz 0 ACK
     17:37:20.560000 B > A: 6272001:34155522 win 4096 datasz 4 SYN ACK
     17:37:20.580000 A > B: 34155522:6272002 win 4096 datasz 0 ACK

     19:37:25.430000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     19:37:25.450000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     21:37:30.560000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     21:37:30.570000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     23:37:35.580000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     23:37:35.600000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     01:37:40.620000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     01:37:40.640000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     03:37:45.590000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     03:37:45.610000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     The initial five packets are the SYN exchange for connection setup.
     Just  over two hours later, the  keepalive timer  fires because the
     connection is idle.   The keepalive is  acknowledged, and the timer
     fires   again just over two hours   later.  This behavior continues
     indefinitely until the connection is closed.

References
     This problem is documented in [Dawson97].

How to detect
     For   implementations manifesting this  problem, it   shows up on a
     packet trace.  If the connection is left idle, the keepalive probes
     will arrive closer together than the two hour minimum.
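
     The comparison can be done mechanically from the probe timestamps.
     The sketch below uses the probe times from the two traces above and
     checks each gap against the two hour minimum.  The function names
     are hypothetical.  It assumes consecutive probes are less than a
     day apart, and it measures probe-to-probe gaps, which differ from
     the true idle time only by the delay of the peer's acknowledgement.

```python
MIN_DEFAULT = 7200   # seconds; RFC 1122 minimum default interval


def to_seconds(stamp):
    """Convert 'HH:MM:SS.ffffff' to seconds since midnight."""
    h, m, s = stamp.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


def gaps(stamps):
    """Gaps between consecutive timestamps, allowing a midnight wrap."""
    secs = [to_seconds(s) for s in stamps]
    return [(b - a) % 86400 for a, b in zip(secs, secs[1:])]


# Times of the probes sent by A in the two traces above.
bad = ["11:50:01.190000", "12:03:29.410000", "12:16:57.630000",
       "12:30:25.850000", "12:43:54.070000"]
good = ["19:37:25.430000", "21:37:30.560000", "23:37:35.580000",
        "01:37:40.620000", "03:37:45.590000"]

for name, stamps in (("first trace", bad), ("second trace", good)):
    g = gaps(stamps)
    ok = all(x >= MIN_DEFAULT for x in g)
    print(name, [round(x, 2) for x in g], "ok" if ok else "too short")
```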

6. References

[Dawson97]
     S.  Dawson,  F.   Jahanian, and  T.   Mitton, "Experiments   on Six
     Commercial  TCP Implementations  Using  a Software  Fault Injection
     Tool," to  appear  in  Software Practice &    Experience,  1997.  A
     technical  report version of   this   paper  can be  obtained    at
     ftp://rtcl.eecs.umich.edu/outgoing/sdawson/CSE-TR-298-96.ps.gz.

7. Author's Address

   Scott Dawson <sdawson@eecs.umich.edu>
   Real-Time Computing Laboratory
   EECS Building
   University of Michigan
   Ann Arbor, MI  48109-2122
   USA
   Phone: +1 313/763-5363

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 12:50:33 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA28420 for tcp-impl-list; Tue, 29 Apr 1997 12:49:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA28409 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:49:10 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA20287
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:49:02 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA16379; Wed, 30 Apr 97 01:17:52+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22179; Wed, 30 Apr 1997 01:09:57 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA17087; Tue, 29 Apr 1997 12:37:53 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA22871 for tcp-impl-list; Tue, 29 Apr 1997 12:33:33 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA22866 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:33:32 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA15811
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:33:24 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA16183; Wed, 30 Apr 97 01:03:16+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22138; Wed, 30 Apr 1997 00:55:35 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA12609; Tue, 29 Apr 1997 12:19:38 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA16668 for tcp-impl-list; Tue, 29 Apr 1997 12:14:36 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA16656 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:14:34 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA11487
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:14:29 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA15978; Wed, 30 Apr 97 00:44:23+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22096; Wed, 30 Apr 1997 00:37:26 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA08585; Tue, 29 Apr 1997 12:04:22 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA09332 for tcp-impl-list; Tue, 29 Apr 1997 11:51:51 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA09315 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:51:48 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA05379
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:51:47 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id OAA05470; Tue, 29 Apr 1997 14:45:52 -0400 (EDT)
Message-Id: <199704291845.OAA05470@grinch.eecs.umich.edu>
To: tcp-impl@relay.engr.SGI.COM
Cc: sdawson@eecs.umich.edu
Subject: draft description of "Insufficient interval between keepalives"
From: Scott Dawson <sdawson@eecs.umich.edu>
Lines: 124
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 29 Apr 1997 14:45:52 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Name of Problem
     Insufficient interval between keepalives

Category
     Reliability

Description
     Keep-alive is a mechanism  for checking whether an idle  connection
     is still alive.  According to RFC-1122,  keep-alive may be included
     in an implementation.  If   it  is included, the interval   between
     keep-alive packets  MUST be configurable,   and MUST default to  no
     less than two hours.

Significance
     Non-critical

Implications
     According    to   RFC-1122,    keep-alive   is not    required   of
     implementations   because it    could:  (1)  cause  perfectly  good
     connections   to break  during   transient Internet  failures;  (2)
     consume unnecessary bandwidth ("if  no one is using the connection,
     who  cares if   it is  still good?"); and    (3) cost money  for an
     Internet path that charges  for packets.  If keepalive  is provided
     the RFC states  that  the  required inter-keepalive distance   MUST
     default to no less than two hours.  If it does not, the probability
     of connections  breaking    increases, the bandwidth used   due  to
     keepalives  increases, and cost  increases over  paths which charge
     per packet.

Relevant RFCs'
     RFC 1122  specifies that the  keep-alive mechanism may be provided.
     It also  specifies the two   hour minimum for the  default interval
     between keepalive probes.

Trace file demonstrating it
     Made  using the Orchestra tool  at  the peer  of the machine  using
     keep-alive.

     11:36:32.910000 A > B: 3288354305:0      win 28672 datasz 4 SYN
     11:36:32.930000 B > A: 896001:3288354306 win 4096  datasz 4 SYN ACK
     11:36:32.950000 A > B: 3288354306:896002 win 28672 datasz 0 ACK
     11:36:32.970000 B > A: 896001:3288354306 win 4096  datasz 4 SYN ACK
     11:36:33.000000 A > B: 3288354306:896002 win 28672 datasz 0 ACK

     11:50:01.190000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     11:50:01.210000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:03:29.410000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:03:29.430000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:16:57.630000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:16:57.650000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:30:25.850000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:30:25.870000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     12:43:54.070000 A > B: 3288354305:896002 win 28672 datasz 0 ACK
     12:43:54.090000 B > A: 896002:3288354306 win 4096  datasz 0 ACK

     The initial five packets are the SYN exchange for connection setup.
     About 13  minutes  later, the  keepalive  timer fires  because  the
     connection is  idle.  The keepalive  is acknowledged, and the timer
     fires  again in about 13   more  minutes.  This behavior  continues
     indefinitely until the connection is closed, and  is a violation of
     the specification.

Trace file demonstrating correct behavior
     Made using  the  Orchestra tool at  the peer  of the  machine using
     keep-alive.

     17:37:20.500000 A > B: 34155521:0       win 4096 datasz 4 SYN
     17:37:20.520000 B > A: 6272001:34155522 win 4096 datasz 4 SYN ACK
     17:37:20.540000 A > B: 34155522:6272002 win 4096 datasz 0 ACK
     17:37:20.560000 B > A: 6272001:34155522 win 4096 datasz 4 SYN ACK
     17:37:20.580000 A > B: 34155522:6272002 win 4096 datasz 0 ACK

     19:37:25.430000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     19:37:25.450000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     21:37:30.560000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     21:37:30.570000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     23:37:35.580000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     23:37:35.600000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     01:37:40.620000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     01:37:40.640000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     03:37:45.590000 A > B: 34155521:6272002 win 4096 datasz 0 ACK
     03:37:45.610000 B > A: 6272002:34155522 win 4096 datasz 0 ACK

     The initial five packets are the SYN exchange for connection setup.
     Just  over two hours later, the  keepalive timer  fires because the
     connection is idle.   The keepalive is  acknowledged, and the timer
     fires   again just over two hours   later.  This behavior continues
     indefinitely until the connection is closed.

References
     This problem is documented in [Dawson97].

How to detect
     This problem shows up on a packet trace of an affected
     implementation.  If the connection is left idle, the keepalive
     probes will arrive closer together than the two hour minimum.

6. References

[Dawson97]
     S.  Dawson,  F.   Jahanian, and  T.   Mitton, "Experiments   on Six
     Commercial  TCP Implementations  Using  a Software  Fault Injection
     Tool," to  appear  in  Software Practice &    Experience,  1997.  A
     technical  report version of   this   paper  can be  obtained    at
     ftp://rtcl.eecs.umich.edu/outgoing/sdawson/CSE-TR-298-96.ps.gz.

7. Author's Address

   Scott Dawson <sdawson@eecs.umich.edu>
   Real-Time Computing Laboratory
   EECS Building
   University of Michigan
   Ann Arbor, MI  48109-2122
   USA
   Phone: +1 313/763-5363

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 12:50:41 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA28474 for tcp-impl-list; Tue, 29 Apr 1997 12:49:21 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA28463 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:49:19 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA20314
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:49:10 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA16381; Wed, 30 Apr 97 01:17:53+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22181; Wed, 30 Apr 1997 01:10:41 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA17107; Tue, 29 Apr 1997 12:37:57 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA22246 for tcp-impl-list; Tue, 29 Apr 1997 12:31:26 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA22238 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:31:24 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA15233
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:31:16 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA16148; Wed, 30 Apr 97 01:00:35+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22124; Wed, 30 Apr 1997 00:53:10 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA12594; Tue, 29 Apr 1997 12:19:35 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA16997 for tcp-impl-list; Tue, 29 Apr 1997 12:15:27 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA16979 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:15:23 -0700
Received: from milan.doe.ernet.in ([202.41.99.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA11620
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 12:15:15 -0700
	env-from (owner-tcp-impl@relay.engr.SGI.COM)
Received: from cdacb.ernet.in by milan.doe.ernet.in (4.1/SMI-4.1)
	id AA15985; Wed, 30 Apr 97 00:45:00+050
Received: from sgi.sgi.com (SGI.COM) by cdacb.ernet.in (5.x/SMI-SVR4)
	id AA22097; Wed, 30 Apr 1997 00:37:36 -0500
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA08588; Tue, 29 Apr 1997 12:04:22 -0700
	env-from (owner-tcp-impl@cthulhu.engr.sgi.com)
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA09148 for tcp-impl-list; Tue, 29 Apr 1997 11:51:13 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA09113 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:51:11 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA04940
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 11:50:55 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id OAA05450; Tue, 29 Apr 1997 14:44:39 -0400 (EDT)
Message-Id: <199704291844.OAA05450@grinch.eecs.umich.edu>
To: tcp-impl@relay.engr.SGI.COM
Cc: sdawson@eecs.umich.edu
Subject: draft description of "Excessively short keepalive connection timeout"
From: Scott Dawson <sdawson@eecs.umich.edu>
Lines: 136
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 29 Apr 1997 14:44:39 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Name of Problem
     Excessively short keepalive connection timeout

Category
     Reliability

Description
     Keep-alive is a mechanism  for checking whether an idle  connection
     is  still alive.  According  to  RFC-1122, keepalive should only be
     invoked  in server    applications    that might otherwise     hang
     indefinitely  and   consume resources   unnecessarily if  a  client
     crashes or aborts a connection during a network failure.

     RFC-1122   also specifies  that   if  a  keep-alive mechanism    is
     implemented   it  MUST NOT  interpret  failure   to respond to  any
     specific probe as  a dead connection.  The RFC  does not  specify a
     particular mechanism for timing  out a connection when  no response
     is received  for keepalive probes.   However, if the mechanism does
     not allow ample time for recovery from network congestion or delay,
     connections may be timed out unnecessarily.

Significance
     Non-critical

Implications
     It is possible for the network connection between two peer machines
     to become congested  or to exhibit packet  loss at the time  that a
     keep-alive  probe  is  sent on  a  connection.   If  the keep-alive
     mechanism   does    not allow    sufficient time   before  dropping
     connections in  the face of  unacknowledged probes, connections may
     be dropped even when both peers of a connection are still alive.

Relevant RFCs
     RFC 1122 specifies  that the keep-alive  mechanism may be provided.
     It does not specify a   mechanism for determining dead  connections
     when keepalive probes are not acknowledged.

Trace file demonstrating it
     Made using the   Orchestra tool at the  peer  of the machine  using
     keep-alive.   After connection  establishment, incoming keep-alives
     were dropped by Orchestra to simulate a dead connection.

     22:11:12.040000 A > B: 22666019:0 win 8192 datasz 4 SYN
     22:11:12.060000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK
     22:11:12.130000 A > B: 22666020:2496002 win 8760 datasz 0 ACK
     22:11:12.150000 B > A: 2496001:22666020 win 4096 datasz 4 SYN ACK
     22:11:12.180000 A > B: 22666020:2496002 win 8760 datasz 0 ACK

     00:23:00.680000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:01.770000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:02.870000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:03.970000 A > B: 22666019:2496002 win 8760 datasz 1 ACK
     00:23:05.070000 A > B: 22666019:2496002 win 8760 datasz 1 ACK

     The initial five packets are the SYN exchange for connection setup.
     About two hours later, the keepalive timer fires because the
     connection has been idle.  Keepalive probes are transmitted a total
     of 5 times, roughly 1 second apart, after which the connection is
     dropped.  This is problematic because a network outage of as little
     as 5 seconds at the time of the first probe results in the
     connection being killed.
     
Trace file demonstrating correct behavior
     Made using  the  Orchestra tool at  the peer  of the  machine using
     keep-alive.  After  connection establishment,  incoming keep-alives
     were dropped by Orchestra to simulate a dead connection.

     16:01:52.130000 A > B: 1804412929:0 win 4096 datasz 4 SYN
     16:01:52.360000 B > A: 16512001:1804412930 win 4096 datasz 4 SYN ACK
     16:01:52.410000 A > B: 1804412930:16512002 win 4096 datasz 0 ACK
     16:01:52.440000 B > A: 16512001:1804412930 win 4096 datasz 4 SYN ACK
     16:01:52.520000 A > B: 1804412930:16512002 win 4096 datasz 0 ACK

     18:01:57.170000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:03:12.220000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:04:27.270000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:05:42.320000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:06:57.370000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:08:12.420000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:09:27.480000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:10:43.290000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:11:57.580000 A > B: 1804412929:16512002 win 4096 datasz 0 ACK
     18:13:12.630000 A > B: 1804412929:16512002 win 4096 datasz 0 RST ACK

     In  this trace, when    the keep-alive timer expires,  9  keepalive
     probes are sent at 75 second intervals.  75  seconds after the last
     probe  is sent,  a final RST  segment is  sent indicating that  the
     connection has been closed.  This   implementation waits about   11
     minutes  before  timing   out  the connection,   while    the first
     implementation shown allows only 5 seconds.
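
     The two figures compared above can be checked against the traces.
     A minimal sketch follows; the helper name elapsed is hypothetical,
     and the timestamps are those of the first keepalive probe and the
     last segment sent in each trace.

```python
def elapsed(start, end):
    """Seconds from start to end, both 'HH:MM:SS.ffffff' on the same day."""
    def to_seconds(stamp):
        hours, minutes, seconds = stamp.split(":")
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return to_seconds(end) - to_seconds(start)


# First implementation: first probe to fifth (last) probe.
short_window = elapsed("00:23:00.680000", "00:23:05.070000")

# Second implementation: first probe to the closing RST.
long_window = elapsed("18:01:57.170000", "18:13:12.630000")

print(f"first trace:  {short_window:.2f} s")
print(f"second trace: {long_window:.2f} s ({long_window / 60:.1f} min)")
```

     The first window is a little under 4.4 seconds and the second is
     about 675 seconds, which matches the "about 11 minutes" quoted
     above.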

References
     This problem is documented in [Dawson97].

How to detect
     This problem shows up on a packet trace of an affected
     implementation after the keepalive timer fires, if the peer machine
     receiving the keepalives does not respond.  Usually the keepalive
     timer will fire at least two hours after keepalive is turned on,
     but it may be sooner if the timer value has been configured lower,
     or if the keepalive mechanism violates the specification (see the
     "Insufficient interval between keepalives" problem).  In this
     example,  suppressing the response of  the peer to keepalive probes
     was    accomplished using the   Orchestra   toolkit,  which  can be
     configured  to drop packets.   It   could also  have  been done  by
     creating a connection, turning  on keepalive, and disconnecting the
     network connection at the receiver machine.
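
     The second approach needs a connection with keepalive turned on.
     A minimal sketch using the sockets option SO_KEEPALIVE from Python
     follows.  The function name enable_keepalive is hypothetical, and
     the idle period before the first probe is whatever the system
     under test is configured with.

```python
import socket


def enable_keepalive(sock):
    """Ask the TCP stack to probe this connection when it is idle.

    Only the on/off switch is set here; probe timing is left at the
    system's configured values.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock
```

     After connecting with such a socket and disconnecting the network
     at the receiver, the probes seen in a trace taken at the sender
     should go unanswered.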

How to fix
     This  problem can be fixed  by using a  different method for timing
     out keepalives that allows a longer period of time to elapse before
     dropping the connection.  For example, the algorithm for timing out
     on dropped data could be used.  Another possibility is an algorithm
     such as the  one shown in the trace  above, which sends 9 probes at
     75 second  intervals and then waits an  additional 75 seconds for a
     response before closing the connection.
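
     The second possibility can be sketched as a simple schedule.  This
     is an illustration under stated assumptions rather than any
     implementation's actual code: the function name keepalive_schedule
     and its parameter names are hypothetical, and the defaults are the
     two hour idle period, 75 second interval, and 9 probes described
     above.

```python
def keepalive_schedule(idle=7200.0, interval=75.0, probes=9):
    """Return (probe_times, drop_time) in seconds after the connection
    goes idle, assuming no probe is ever acknowledged."""
    probe_times = [idle + i * interval for i in range(probes)]
    # Wait one more interval after the last probe before giving up.
    drop_time = probe_times[-1] + interval
    return probe_times, drop_time


probe_times, drop_time = keepalive_schedule()
grace = drop_time - probe_times[0]  # time allowed after the first probe
print(f"{len(probe_times)} probes, connection dropped {grace:.0f} s "
      f"after the first probe")
```

     With the defaults, the peer has 675 seconds from the first probe
     to respond before the connection is closed.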

6. References

[Dawson97]
     S.  Dawson,  F.   Jahanian, and  T.   Mitton, "Experiments   on Six
     Commercial  TCP Implementations  Using  a Software  Fault Injection
     Tool," to  appear  in  Software Practice &    Experience,  1997.  A
     technical  report version of   this   paper  can be  obtained    at
     ftp://rtcl.eecs.umich.edu/outgoing/sdawson/CSE-TR-298-96.ps.gz.

7. Author's Address

   Scott Dawson <sdawson@eecs.umich.edu>
   Real-Time Computing Laboratory
   EECS Building
   University of Michigan
   Ann Arbor, MI  48109-2122
   USA
   Phone: +1 313/763-5363

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 13:02:23 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA02372 for tcp-impl-list; Tue, 29 Apr 1997 12:59:51 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA02355; Tue, 29 Apr 1997 12:59:48 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA23344; Tue, 29 Apr 1997 12:59:47 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id PAA06203; Tue, 29 Apr 1997 15:53:53 -0400 (EDT)
Message-Id: <199704291953.PAA06203@grinch.eecs.umich.edu>
To: Steve Alexander <sca@refugee.engr.sgi.com>
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: Memphis Minutes
References: <199704122205.PAA18292@refugee.engr.sgi.com>
From: Scott Dawson <sdawson@eecs.umich.edu>
In-Reply-To: Steve Alexander's message of Sat, 12 Apr 1997 15:05:39 -0700
Lines: 9
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Tue, 29 Apr 1997 15:53:53 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Steve,

As you've probably noticed by now, my posts to the list have gotten
duplicated quite a few times.  Any ideas how to fix it?

Thanks,
-Scott


From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 13:14:09 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA06757 for tcp-impl-list; Tue, 29 Apr 1997 13:12:15 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA06746 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 29 Apr 1997 13:12:13 -0700
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (970321.SGI.8.8.5/950213.SGI.AUTOCF) via ESMTP id NAA05837; Tue, 29 Apr 1997 13:12:11 -0700 (PDT)
Message-Id: <199704292012.NAA05837@refugee.engr.sgi.com>
X-Mailer: exmh version 2.0gamma 1/27/96
To: Scott Dawson <sdawson@eecs.umich.edu>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Memphis Minutes 
In-reply-to: Message from sdawson@eecs.umich.edu of 29 Apr 1997 15:53:53 EDT
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 29 Apr 1997 13:12:11 -0700
From: Steve Alexander <sca@refugee.engr.sgi.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Scott Dawson <sdawson@eecs.umich.edu> writes:
>As you've probably noticed by now, my posts to the list have gotten
>duplicated quite a few times.  Any ideas how to fix it?

Yes, I noticed ;->

Analysis of the Received: lines leads me to believe that somebody on the
list was reflecting mail back to the list.  The duplicate messages all seem to
go through the same site.

The address in question has been removed; hopefully that will correct the
problem, but we'll keep an eye on it.

Thanks,
-- Steve



From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 15:03:25 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA09205 for tcp-impl-list; Tue, 29 Apr 1997 14:58:33 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA09170 for <tcp-impl@relay.engr.sgi.com>; Tue, 29 Apr 1997 14:58:27 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id OAA22564
	for <tcp-impl@relay.engr.sgi.com>; Tue, 29 Apr 1997 14:58:23 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-24)
	id <AA25014>; Tue, 29 Apr 1997 14:54:35 -0700
Date: Tue, 29 Apr 97 14:56:07 PDT
From: braden@ISI.EDU
Posted-Date: Tue, 29 Apr 97 14:56:07 PDT
Message-Id: <9704292156.AA08133@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA08133>; Tue, 29 Apr 97 14:56:07 PDT
To: tcp-impl@relay.engr.sgi.com, sdawson@eecs.umich.edu
Subject: TCP keep-alives
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


BTW, The Host Requirements treatment of TCP keepalives was basically a
compromise.  As usual for the topic of transport-level keep-alives,
there were vocal minorities for and against, plus a centrist majority
that did not especially like keep-alives but was unwilling to prevent
their implementation.  So the compromise was to go as far as we could
to make keep-alives disappear without actually banning them...

Bob Braden


From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 15:54:53 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA21981 for tcp-impl-list; Tue, 29 Apr 1997 15:46:28 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA21955 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 15:46:25 -0700
Received: from databus.databus.com (databus.databus.com [198.186.154.34]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id PAA03763
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 15:46:20 -0700
	env-from (barney@databus.databus.com)
From: Barney Wolff <barney@databus.com>
To: tcp-impl@relay.engr.SGI.COM
Date: Tue, 29 Apr 1997 18:28 EDT
Subject: Re: draft description of "Insufficient interval between keepalives"
Content-Length: 1077
Content-Type: text/plain
Message-ID: <3366799c0.4393@databus.databus.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Scott Dawson <sdawson@eecs.umich.edu>
> Date: Tue, 29 Apr 1997 14:45:52 -0400

I don't understand what specifically you are complaining about.

While the *default* keepalive interval MUST be no less than 2 hours,
system A may have been configured with a shorter interval, which is
specifically allowed.  Only if system A's default configuration results
in keepalives of less than 2 hours is there ground for complaint.  I
take "configurable" here to mean by, for example, kernel reconfiguration.
So the application itself may not be able to control the keepalive
interval, only whether keepalives are used or not.

As a practical matter, I have often needed the ability to configure
a keepalive interval of a few minutes, and considered the bandwidth
used well worth the assurance of peer survival, in cases where the
higher-level protocols had no easy keepalive mechanism of their own.
I would prefer the ability to set the keepalive interval from an
application on a specific connection, but alas that is not required
by RFC 1122.
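
For illustration, some systems do expose per-connection keepalive
timing as socket options.  The sketch below assumes the option names
TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT as provided by Python's
socket module on platforms that have them.  They are platform-specific
extensions, not something RFC 1122 requires, and the function name
set_keepalive_timing is hypothetical.

```python
import socket


def set_keepalive_timing(sock, idle, interval, count):
    """Turn on keepalive for one socket and, where the platform exposes
    the options, set its timing.  Returns the names of the timing
    options that were actually applied."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = []
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)  # absent on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied.append(name)
    return applied
```

On a platform without these options the function still enables
keepalive and returns an empty list, so the system-wide interval
applies.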

Barney Wolff  <barney@databus.com>

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 16:07:22 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA26751 for tcp-impl-list; Tue, 29 Apr 1997 16:02:44 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA26715 for <tcp-impl@engr.sgi.com>; Tue, 29 Apr 1997 16:02:34 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA07366
	for <tcp-impl@engr.sgi.com>; Tue, 29 Apr 1997 16:02:33 -0700
	env-from (sparker@fstop.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id PAA01292 for <tcp-impl@engr.sgi.com>; Tue, 29 Apr 1997 15:52:39 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id PAA04315; Tue, 29 Apr 1997 15:52:37 -0700
Received: from fstop. by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id PAA08755; Tue, 29 Apr 1997 15:52:37 -0700
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id PAA19551; Tue, 29 Apr 1997 15:50:44 -0700
Message-Id: <199704292250.PAA19551@fstop.>
From: sparker@Eng.Sun.COM
To: tcp-impl@engr.sgi.com
cc: cschmec@Eng.Sun.COM
Subject: Packet Shell 4.0 release now available
Date: Tue, 29 Apr 1997 15:50:44 -0700
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


At the initial TCP implementer's BOF I described a TCL/Tk-based protocol
testing tool which we use for, among other things, TCP protocol testing.
A prior release was available publicly, but had some rough edges.  :)

A new release of the Packet Shell is now available.  This new release
includes:

	Updates to TCL 7.6/Tk 4.2
	Addition of BLT 2.1 extension (bit map graphics support)
	Improved test harness
	Complete man pages
	Improved protocol support  (TLI and IP6 extension protocols added)
	Improved built-in help
	GUI-based "packet builder" - to assist script writing
	Reads tcpdump capture files
	Additional tests

Also, this release contains test scripts for two of the TCP problems
outlined in Vern's first internet draft for this group.  I will write
more about these soon.

Pre-compiled, SVR4 packages are available for SunOS 5.x in:

	ftp://playground.sun.com/pub/psh

for both sparc and x86.  To install, fetch those binaries and:

	# pkgadd -d tcltk-sparc
	# pkgadd -d psh-sparc-40

For non-SunOS systems, please fetch psh-src.tar.Z and consult the README.

See the web page:

	http://playground.sun.com/psh

for more information.

Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 16:09:00 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA28516 for tcp-impl-list; Tue, 29 Apr 1997 16:07:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA28489 for <tcp-impl@relay.engr.sgi.com>; Tue, 29 Apr 1997 16:07:38 -0700
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA08463
	for <tcp-impl@relay.engr.sgi.com>; Tue, 29 Apr 1997 16:07:33 -0700
	env-from (mouse@Twig.Rodents.Montreal.QC.CA)
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id TAA03619; Tue, 29 Apr 1997 19:03:41 -0400 (EDT)
Date: Tue, 29 Apr 1997 19:03:41 -0400 (EDT)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199704292303.TAA03619@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.sgi.com
Subject: Re: TCP keep-alives
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The Host Requirements treatment of TCP keepalives was basically a
> compromise.
(Gee, what a surprise. :-)
> So the compromise was to go as far as we could to make keep-alives
> disappear without actually banning them...

Well, I'm glad you didn't ban them, because I've run into a situation
where they are essential.

At a certain company whom I'll call XYZ, there is a firewall box that,
among other things, maintains state for every TCP connection that's
open through it.

This is relevant because the firewall box has a timeout on that
per-connection state.  A connection idle for longer than that timeout
will have its firewall state dropped, and thereafter its packets won't
get through when it de-idles.  Keepalives fix this.  (You don't want to
know how long it took me to figure out what was killing my connections.)

What's the opinion of the list here?  If keepalives are to be
condemned, it seems to me we will also have to condemn anything else
which causes an idle connection to be destroyed, such as a stateful
router that drops packets for which it has no state (if nothing else,
the box could crash).  (Yes, I realize RFC1122 _didn't_ condemn
keepalives, but it sounds as though it tried to come as close as it
could.)

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 16:37:27 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA07381 for tcp-impl-list; Tue, 29 Apr 1997 16:31:42 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA07335 for <tcp-impl@relay.engr.sgi.com>; Tue, 29 Apr 1997 16:31:34 -0700
Received: from all-purpose-gunk.near.net (all-purpose-gunk.near.net [199.94.220.184]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA13908
	for <tcp-impl@relay.engr.sgi.com>; Tue, 29 Apr 1997 16:31:33 -0700
	env-from (jhawk@bbnplanet.com)
Received: (from jhawk@localhost)
	by all-purpose-gunk.near.net (8.8.5/8.8.5) id TAA09236;
	Tue, 29 Apr 1997 19:27:39 -0400 (EDT)
From: John Hawkinson <jhawk@bbnplanet.com>
Message-Id: <199704292327.TAA09236@all-purpose-gunk.near.net>
Subject: Re: TCP keep-alives
To: mouse@rodents.montreal.qc.ca (der Mouse)
Date: Tue, 29 Apr 1997 19:27:39 -0400 (EDT)
Cc: tcp-impl@relay.engr.sgi.com
In-Reply-To: <199704292303.TAA03619@Twig.Rodents.Montreal.QC.CA> from "der Mouse" at Apr 29, 97 07:03:41 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Well, I'm glad you didn't ban them, because I've run into a situation
> where they are essential.

A question of definitions, surely :-)

> At a certain company whom I'll call XYZ, there is a firewall box that,
> among other things, maintains state for every TCP connection that's
> open through it.
> 
> This is relevant because the firewall box has a timeout on that
> per-connection state.  A connection idle for longer than that timeout
> will have its firewall state dropped, and thereafter its packets won't
> get through when it de-idles.  Keepalives fix this.  (You don't want to
> know how long it took me to figure out what was killing my connections.)

Obviously such a firewall is not expected to interoperate with
all specification-compliant uses of the TCP; some parts of BBN
are behind such a firewall -- when it was installed, people
stuck behind were able to get the vendor to produce a modification
such that the timeout was increased to something on the order of days.

> What's the opinion of the list here?  If keepalives are to be
> condemned, it seems to me we will also have to condemn anything else
> which causes an idle connection to be destroyed, such as a stateful
> router that drops packets for which it has no state (if nothing else,
> the box could crash).

Perhaps such routers should in fact be condemned. Or at least,
encouraged to use sensible state aging policies (i.e. age the state
when there is too much of it, not after an arbitrary period).
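
The aging policy in the parenthesis above can be sketched as follows.
This is only an illustration, not a description of how any particular
router or firewall works: the class name FlowTable, its capacity
limit, and the flow keys are all hypothetical.  State is discarded
only when the table is full, least recently active first, rather than
after a fixed idle period.

```python
from collections import OrderedDict


class FlowTable:
    """Per-connection state that is aged out only under memory pressure."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.flows = OrderedDict()  # least recently active flow first

    def packet_seen(self, flow):
        """Record activity on a flow, evicting the least recently active
        flow only if a new entry would exceed the capacity."""
        if flow in self.flows:
            self.flows.move_to_end(flow)
        else:
            if len(self.flows) >= self.capacity:
                self.flows.popitem(last=False)
            self.flows[flow] = True

    def has_state(self, flow):
        return flow in self.flows
```

With this policy an idle connection keeps its state indefinitely as
long as the table has room, however long it stays quiet.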

--jhawk

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 16:37:29 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA08098 for tcp-impl-list; Tue, 29 Apr 1997 16:34:08 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA08077 for <tcp-impl@relay.engr.sgi.com>; Tue, 29 Apr 1997 16:34:05 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id QAA14290
	for <tcp-impl@relay.engr.sgi.com>; Tue, 29 Apr 1997 16:33:54 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-24)
	id <AA08583>; Tue, 29 Apr 1997 16:25:29 -0700
Date: Tue, 29 Apr 97 16:26:53 PDT
From: braden@ISI.EDU
Posted-Date: Tue, 29 Apr 97 16:26:53 PDT
Message-Id: <9704292326.AA08253@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA08253>; Tue, 29 Apr 97 16:26:53 PDT
To: tcp-impl@relay.engr.sgi.com, mouse@rodents.montreal.qc.ca
Subject: Re: TCP keep-alives
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



  *> 
  *> What's the opinion of the list here?  If keepalives are to be
  *> condemned, it seems to me we will also have to condemn anything else
  *> which causes an idle connection to be destroyed, such as a stateful
  *> router that drops packets for which it has no state (if nothing else,
  *> the box could crash).

I have a feeling we are getting outside the legitimate area for this
list, but...

Stateful router?  That is precisely why Internet routers do not have
per-flow state [[and why making RSVP-type reservations for TCP
connections is a bad idea]].  An idle TCP connection should tie up no
resources except at the end systems.  (This is one reason why I think
IPSEC is architecturally superior to application relays for firewalls.)

If your idle connection through a firewall is a Telnet connection,
for example, it would be reasonable for your Telnet client to send
Telnet-level keep-alives... some null negotiation, for example.
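
One possible form of such an application-level keep-alive is the
Telnet NOP command (IAC NOP, bytes 255 and 241 in RFC 854), which is
a no-op rather than an option negotiation.  A minimal sketch, assuming
an already connected socket; the function name send_telnet_nop is
hypothetical.

```python
import socket

IAC = 255  # Telnet "Interpret As Command"
NOP = 241  # Telnet "No Operation"

TELNET_NOP = bytes([IAC, NOP])


def send_telnet_nop(sock):
    """Send a Telnet no-op so the connection carries traffic without
    changing the state of the session."""
    sock.sendall(TELNET_NOP)
```

A client could call this from a timer whenever the session has been
idle for longer than the firewall's timeout would tolerate.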

If the router crashes, presumably the Internet routing protocols find a
working route, if possible.

I guess you can tell my own prejudice on this subject.

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 16:43:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA10054 for tcp-impl-list; Tue, 29 Apr 1997 16:41:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA10035 for <tcp-impl@relay.engr.sgi.com>; Tue, 29 Apr 1997 16:41:32 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id QAA16051
	for <tcp-impl@relay.engr.sgi.com>; Tue, 29 Apr 1997 16:41:31 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-24)
	id <AA09309>; Tue, 29 Apr 1997 16:37:40 -0700
Date: Tue, 29 Apr 97 16:39:10 PDT
From: braden@ISI.EDU
Posted-Date: Tue, 29 Apr 97 16:39:10 PDT
Message-Id: <9704292339.AA08275@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA08275>; Tue, 29 Apr 97 16:39:10 PDT
To: tcp-impl@relay.engr.sgi.com
Subject: From the historical record... Keepalive messages
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



I just delved into my Host Requirements RFC directory, and found the
following sampling of opinions about TCP keepalives, from various
Internet luminaries (including Vint Cerf Himself!).  For your
entertainment...

Bob Braden

----- Begin Included Message -----

>From @A.ISI.EDU:tcp-ip-RELAY@SRI-NIC.ARPA Fri Mar 18 06:44:43 1988
Date: 15 Mar 88 08:58:00 GMT
From: thumper!karn@faline.bellcore.com  (Phil R. Karn)
Organization: Bell Communications Research, Inc
Subject: Re: TCP Keep-alives, also push bit
References: <900@rlgvax.UUCP>
Sender: tcp-ip-request@sri-nic.arpa
To: tcp-ip@sri-nic.arpa
Content-Length: 2175
X-Lines: 39

In BSD at least, keepalives are implemented by sending a TCP segment
containing a single byte of "garbage". However, the SEQ field is one
less than the receiver is expecting, so it is not accepted as real data.

When the receiver sees an "old" data packet (i.e., a packet containing
data that has already been acked, i.e., the sequence number in the
header plus the length of its data is no greater than the receiver's RCV.NXT)
it is required by the spec to send a segment with its next expected
sequence number, i.e., RCV.NXT, in the ACK field.  (This is primarily
intended to prevent deadlock in normal data transfer should an
acknowledgment packet be lost.) The "polling" TCP uses this "do-nothing
ACK" as the indication that the remote host is still there. So hosts
that don't respond properly to BSD-style keepalives are most likely not
following the spec.
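
The receiver-side rule described above can be sketched in a few lines.
This is only an illustrative model, not BSD code: the function names and
the dictionary layout are made up for the example, and sequence-number
wraparound is ignored.

```python
# Sketch of how a receiver treats a BSD-style keepalive probe.
# Sequence-number wraparound (mod 2**32) is ignored for clarity.

def make_keepalive_probe(snd_nxt):
    """Build the probe: one garbage byte at SEQ = SND.NXT - 1."""
    return {"seq": snd_nxt - 1, "length": 1}

def receive_segment(rcv_nxt, segment):
    """Return (accepted_bytes, ack_number) for an incoming data segment."""
    seg_end = segment["seq"] + segment["length"]
    if seg_end <= rcv_nxt:
        # Everything in the segment was already acknowledged: accept
        # nothing, but still answer with an ACK carrying RCV.NXT.
        return 0, rcv_nxt
    if segment["seq"] > rcv_nxt:
        # Out-of-order data: this toy model does not buffer it.
        return 0, rcv_nxt
    new_bytes = seg_end - rcv_nxt
    return new_bytes, rcv_nxt + new_bytes

probe = make_keepalive_probe(snd_nxt=1000)
accepted, ack = receive_segment(rcv_nxt=1000, segment=probe)
print(accepted, ack)
```

The probe carries no acceptable data, yet it still draws an ACK, and that
ACK is what the polling side takes as proof of life.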

Having said this, I should point out that keepalives are NOT a formal
part of the TCP spec. I also think they're a very bad idea.  I didn't
always feel this way. However, some long and frustrating experiences
with slow, unreliable and often expensive network paths (amateur packet
radio, as well as commercial X.25 networks that charge for every packet)
have turned me into a crusader against keepalive pinging.  It simply
isn't worth the cost, especially when there's no way for me to tell the
other end to cease and desist.

Besides, the whole philosophy of TCP and the datagram approach to
networking was supposed to be reliability and robustness in the face of
network problems. Why should the system gratuitously close a connection
just because the network path happens to go down for a few minutes? If
the connection has been idle during the entire outage, the user wouldn't
even know (or care) that there had been a problem, as long as it's back
by the time he sends more data. But keepalive pinging will make SURE he
knows in the most annoying way possible!

In the same category are TCPs that immediately close a connection when they
get an ICMP Unreachable message. At most they should abort connections
only before they are established; once established they should serve as
diagnostic messages only.

Phil


----- End Included Message -----


----- Begin Included Message -----

>From tcp-ip-RELAY@SRI-NIC.ARPA Sat Jun  3 03:07:16 1989
Date: 28 May 1989 14:23-EDT
Sender: CERF@A.ISI.EDU
Subject: Re: SO_KEEPALIVE considered harmful?
From: CERF@A.ISI.EDU
To: voder!pyramid!nsc!icldata!altos86!elxsi!beatnix!mre@UCBVAX.BERKELEY.EDU
Cc: tcp-ip@SRI-NIC.ARPA
In-Reply-To: <2681@elxsi.UUCP>
Content-Length: 1225
X-Lines: 24

When TCP was first designed, and for all subsequent versions, it was
thought inappropriate to impose any kind of semantics on the logical
connections established by TCP. In particular, no sense of absolute
timeout for the severing of a connection was desired. We thought that
such notions of "impatience" or "time to give up" ought to be the
choice of the upper level protocol using TCP as the basis merely for
reliable delivery.

A part of this view stemmed from the fact that the networks over which
TCP had to function, for the DoD applications we had in mind, were
potentially very unpredictable as to loss and delay. Mobile packet
radio systems had to function under jamming and radio shadow effects,
for instance. TCP never unilaterally severed connections but only
reported failure to achieve positive acknowledgement after a time
which could be controlled by the application or upper-level protocol.
It was up to the application to decide whether to sever the connection
and, even then, the choice to do so gracefully or abruptly was also
left to the application.

The use of a feature (X-level NOP) to test the liveness of a TCP
connection is consonant with the model against which the TCP was
designed. 

Vint Cerf


----- End Included Message -----


----- Begin Included Message -----

>From tcp-ip-RELAY@SRI-NIC.ARPA Sun Jun  4 11:02:20 1989
Date: Tue, 30 May 89 11:59:13 EDT
From: jas@proteon.com (John A. Shriver)
To: xanth!nic.MR.NET!ns!jmh@g.ms.uky.edu
Cc: tcp-ip@sri-nic.arpa
In-Reply-To: (1606's message of 23 May 89 21:02:32 GMT <1409@ns.network.com>
Subject: keep-alive
Content-Length: 1657
X-Lines: 37

I can detect loss of connectivity real easily.  Just type a character.
If the connection won't work, it will time out, or get an immediate
TCP reset. 

However, I have no desire to watch my Telnet windows just go *pop*
just because there was a lack of connectivity for thirty seconds.
This happens quite often in the real world, like when one route goes
away, and the routing protocols have to re-settle.  In the interim,
the keepalives will cause ICMP error messages to be sent (which
routers *must* send to meet RFC 1009), and the connection will be
gratuitously shot.

Keep-alives are, from my point of view, keep-deads.  They guarantee
that my connection *will die* any time there is any momentary
network outage.  Keep-alives are absolutely contrary to the robustness
principle.  (See TCP RFC.)

In my experience, keep-alives kill far more live connections than dead
ones. 

If we're going to retain the stupid idea of keep-alives, let's add a
session protocol to TCP-IP to put the connection back together after
the keep-alive kills it.  However, since I doubt many people want to
add a session protocol to TCP, I'd rather kill keep-alives.

Let's quit trying to bend over backwards to make the TCP/IP specs
match the 4.xBSD implementations.  They were experimental, and not
fully conformant to the specs.  They were optimized for a local
network, not an Internet.  The specs work, let's meet them.  Fix the
code.


If you want the server Telnet host to be able to clean up your
connection, many systems have inactivity timers, which can kill it for
you.  Please don't break our TCP protocol because of a limitation in
some other operating system.



----- End Included Message -----


----- Begin Included Message -----

>From tcp-ip-RELAY@SRI-NIC.ARPA Thu Jun  8 21:18:35 1989
From: karels%okeeffe.Berkeley.EDU@ucbvax.Berkeley.EDU (Mike Karels)
To: karn@thumper.bellcore.com (Phil R. Karn)
Cc: tcp-ip@sri-nic.arpa
Subject: Re: SO_KEEPALIVE considered harmful? 
In-Reply-To: Your message of Fri, 26 May 89 19:47:25 EDT.
Date: Thu, 08 Jun 89 16:28:37 PDT
Content-Length: 2079
X-Lines: 39

Sorry, I can't let this go by without commenting on Phil's message
and this discussion, even though the discussion has mostly died down.
(I haven't been reading tcp-ip very often, but noticed this subject
line going by.)

Last time Phil and I talked about keepalives in person, I asked him
whether he had problems with telnet/rlogin servers accumulating on
his systems if they didn't use keepalives.  We certainly accumulate
junk, including xterm programs, waiting for input from a half-open
connection.  Phil told me that he doesn't have problems, because
he runs a "wall" every night to force output to all users, and of
course breaking connections that time out.  In other words, Phil
violently objects to servers requesting keepalives from TCP, but
allows the system manager (himself) to force them above the application
level.  And before people jump up to point out the difference in time
scales, the current BSD code sends no keepalive packets until a connection
has been idle for 2 hr, and that interval is easily changeable.
One proposal for the Host Requirements document was to wait for 12 hr.
I think that's a bit high, but the difference is only a factor of 6.
Compare the number of keepalive packets with the number of packets
exchanged by an xterm and an X server over the course of a week
if used 4 hours a day!

Phil says:
	... I'd go a little further, though,
	and say that a REMOTE USER (not just the application code) must always
	be able to turn off keepalives, even on binary-only systems. It does no
	good to say "the application must be able to disable keepalives" when
	I'm having problems with a remote server that I have no administrative
	control over.

I'm sorry, Phil, but remote users have no more right to override system
management policies than do local users (at least on *our* systems!).
On some of the systems where I have guest accounts, local or remote
users are logged off if they aren't active for two hours.  I don't like
that, either, but I don't claim that the managers of those systems
have no right to enforce such a policy.

		Mike


----- End Included Message -----


----- Begin Included Message -----

>From tcp-ip-RELAY@SRI-NIC.ARPA Sat Jun 10 10:22:29 1989
Date: Sat, 10 Jun 89 08:34:23 PDT
From: Dave Crocker <dcrocker@ahwahnee.Stanford.EDU>
Subject: Re: SO_KEEPALIVE considered harmful?
To: dcrocker@ahwahnee.stanford.edu, stev@vax.ftp.com, tcp-ip@sri-nic.arpa
Content-Length: 1258
X-Lines: 29

Steve,

Let me try, one last time:

If the application can direct TCP as to the periodicity and the action
to be taken (notify application vs. abort connection) then the application
will not abort your connection unless the application programmer decided
to force that condition.  Under proper design, the programmer will give the
user a switch to set, indicating something about the "persistence" that
is desired.

With respect to having the mechanism in tcp or the application, I agree with
you, philosophically, that the mechanism should be in the application (although
I believe the OSI model would put it into the session layer, but that seems
mostly to be part of the application process, these days).

The major issues, however, are kernel vs. user space, and additional
complexity to the application protocol.

There is a remarkable economy that derives from putting this mechanism
into the kernel/transport system.  It may be an accident that TCP does
not have the mechanism but can be tricked into creating one, but it still
is remarkably simple.

Most application protocols have very simple interaction styles and tend to
be relatively easy to program.  To force time-based generation of action
would complexify these protocols significantly.

Dave


----- End Included Message -----


From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 17:55:58 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA00285 for tcp-impl-list; Tue, 29 Apr 1997 17:54:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA00266 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 17:54:10 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id RAA01369
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 17:54:07 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Tue, 29 Apr 1997 20:50:21 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Tue, 29 Apr 1997 20:50:21 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id UAA09796; Tue, 29 Apr 1997 20:51:50 -0400
Date: Tue, 29 Apr 1997 20:51:50 -0400
Message-Id: <199704300051.UAA09796@MAILSERV-2HIGH-A.FTP.COM>
To: tcp-impl@relay.engr.SGI.COM
Subject: TCP keep-alives & Reality bites
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Repository: mailserv-2high-a.ftp.com, [message accepted at Tue Apr 29 20:51:44 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Now let's make it clear that I find keepalives as evil as the next person
and wish they could be vaporized from the face of the earth.  However
in Windows-Land (and perhaps some of the Microsoft watchers on the list
might chime up also) we have all these somewhat shall we say - 
network-inexperienced - application programmers who do not understand either
network programming or the concept of multiple CPU synchronization.

For a variety of reasons in writing their client/server applications
using winsock they think they have very good reasons for requiring 
keep-alive, often for ridiculous periods like 60 or 90 seconds. So they
use setsockopt to turn on keepalive and call us, telling us that keepalive
doesn't work because nothing has gone out.

Many times their insistence on a short keepalive is to get instant
feedback when one end of the connection has gone away.  We all know
that there are better ways to determine this than keepalive and that
short keepalives can provide false positives in complex networks.

We tell them our keepalive is 2 hrs and they protest.  Loudly.  Often
a mutual customer is involved and competitive products with shorter
keepalives are mentioned.

After a few years of this, perhaps 7 or 8 years back, we provided a
configuration parameter by which a user could set our keepalive to
any value they wanted.

So application vendor A now delivers with their sterling and well
received application a readme that states something to the effect:

"keepalives on the FTP stack take too darned long and adversely affect
the throughput of our sterling app.  Simply twiddle the following
parameter and the FTP stack will perform much better".

Lovely.  For that one app.

However, the user has now set a persistent and global parameter, which
has set our keepalive down to a ridiculously low value so professional
TCP types like you and I can take traces of a connection spewing keepalives
and draw the wrong conclusion as to the culprit.
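
For contrast with a global knob, here is a rough sketch of setting the
keepalive idle time on one socket only.  It assumes a present-day
platform: SO_KEEPALIVE is the standard socket option, but TCP_KEEPIDLE is
a platform extension (Linux has it, for example), so the code checks for
it first.  The helper name enable_keepalive and the 7200-second default
are choices made for the example.

```python
import socket

def enable_keepalive(sock, idle_seconds=7200):
    """Turn keepalive on for one socket only; return the idle time applied, or None."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is a platform extension (present on Linux, for example),
    # so fall back to the system-wide default where it is missing.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
        return idle_seconds
    return None

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
print(keepalive_on, applied)
```

A setting scoped like this affects only the application that asked for
it, so other connections on the same machine keep the default interval.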

My point, which I perhaps have belaboured a bit too long, is that keepalive
timers are not just a stack issue, but also an issue of educating application
developers as to what keepalives are used for and what they should not
be used for.

L.



From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 17:57:33 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA00966 for tcp-impl-list; Tue, 29 Apr 1997 17:56:17 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA00937 for <tcp-impl@engr.sgi.com>; Tue, 29 Apr 1997 17:56:15 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA01607
	for <tcp-impl@engr.sgi.com>; Tue, 29 Apr 1997 17:56:06 -0700
	env-from (sparker@fstop.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id RAA13505 for <tcp-impl@engr.sgi.com>; Tue, 29 Apr 1997 17:46:11 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id RAA25656; Tue, 29 Apr 1997 17:46:09 -0700
Received: from fstop. by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id RAA29671; Tue, 29 Apr 1997 17:46:08 -0700
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id RAA20112; Tue, 29 Apr 1997 17:44:11 -0700
Message-Id: <199704300044.RAA20112@fstop.>
From: sparker@Eng.Sun.COM
To: tcp-impl@engr.sgi.com
cc: cschmec@Eng.Sun.COM
Subject: Proposed test for TCP initial slow start...
Date: Tue, 29 Apr 1997 17:44:11 -0700
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In my packet shell, I wrote a script which I think tests whether a
correspondent TCP has initial slow start.  The method I used was:

1.  Send an initial SYN, to port 19 (chargen)
2.  Wait for a SYN-ACK
3.  Send an ACK back
4.  Listen for two seconds, expecting no segments beyond the first data segment
    (However, notice if a segment is a retransmission of the first segment)
5.  Send an ACK back for the first segment
6.  Listen for two seconds, expecting two new segments


In writing this I assumed I didn't need to do anything special to elicit
the failure to use slow start.  By using 'chargen', I avoid needing to
have any program on the system being tested, at least most of the time.

I chose to listen for two seconds under the presumption the TCP being
tested complies with RFC2001's recommendation of 3s for an initial RTO.
The test notices if a retransmission occurs during the first two seconds
after the connection is established.  In that case, the test complains
about the initial RTO and exits.

The second loop could use some more refinement.  It doesn't do as much
checking of the sequences received as it could.  What it does check is
that it receives two segments with different sequence numbers from
the previous packet's.  However, as long as the ACK of the first segment
is acceptable, and a retransmission isn't already in progress, this
should be good enough.
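
The decision logic of the test, as described above, might look roughly
like this.  It is not the packet shell script itself: the function name,
the verdict strings, and the representation of each listening period as a
list of sequence numbers are all invented for the sketch, and it assumes
the first data segment arrives in the first listening period and that
exactly two new segments should follow the first ACK.

```python
def classify_slow_start(first_window, second_window):
    """Judge a sender from the data segment sequence numbers seen.

    first_window:  sequence numbers heard in the two seconds after the
                   handshake ACK, before any data has been acknowledged.
    second_window: sequence numbers heard in the two seconds after the
                   first data segment was acknowledged.
    """
    if not first_window:
        return "no data received"
    first_seq = first_window[0]
    later = first_window[1:]
    if first_seq in later:
        # The first segment was repeated within two seconds.
        return "initial RTO too short"
    if later:
        # A second, different segment arrived before any ACK was sent.
        return "no initial slow start"
    new_seqs = set(second_window) - {first_seq}
    if len(new_seqs) == 2:
        return "slow start observed"
    return "unexpected segment count after first ACK"

print(classify_slow_start([1], [1001, 2001]))
print(classify_slow_start([1, 1001], []))
```

Keeping the verdict separate from packet capture makes it easy to replay
recorded traces through the same check.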

Also, I don't take it further than slow start opening up from one segment
to two segments.  The test could be better by using a large window relative
to MSS and checking that slow start opens up correctly to a larger number
of segments.  I didn't do this, in part, because it seemed a bit like a
separate problem.

Anyway, I'm interested in input from the group about the test.

The test is found in the /opt/psh/scripts/tcp-impl directory of the packet
shell when installed.  The file no_initial_slow_start.pst is the source.
You can run them by invoking the test harness /opt/psh/bin/harness while
in that directory.  Edit the 'config' file to point it at the target.


Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr 29 19:24:16 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA19963 for tcp-impl-list; Tue, 29 Apr 1997 19:21:37 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA19958 for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 19:21:34 -0700
Received: from border.com (janus.border.com [199.71.190.98]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id TAA17493
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Apr 1997 19:21:27 -0700
	env-from (chk@rafael.rnd.border.com)
Received: by janus.border.com id <11649>; Tue, 29 Apr 1997 22:13:04 -0400
Message-Id: <97Apr29.221304edt.11649@janus.border.com>
To: der Mouse <mouse@rodents.montreal.qc.ca>
cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP keep-alives 
References: <199704292303.TAA03619@Twig.Rodents.Montreal.QC.CA>
In-reply-to: Your message of "Tue, 29 Apr 1997 19:03:41 -0400".
	 <199704292303.TAA03619@Twig.Rodents.Montreal.QC.CA> 
From: "C. Harald Koch" <chk@utcc.utoronto.ca>
Date: Tue, 29 Apr 1997 22:17:27 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

In message <199704292303.TAA03619@Twig.Rodents.Montreal.QC.CA>, der Mouse writes:
> 
> What's the opinion of the list here?  If keepalives are to be
> condemned, it seems to me we will also have to condemn anything else
> which causes an idle connection to be destroyed

Firewalls, practically by definition, break the letter of the TCP spec.

I would state to you that an idle telnet session is a potential security
hole, and thus the firewall is doing the correct thing by killing it. If you
*really* need it to stay up, use application level "keep alives" of some
sort.

Having said that, I point out that some firewalls use keepalives and/or
timeouts because it really is the only way to detect client or server
failures, when you're intercepting the TCP session somewhere in the middle
of the network. Without them, dead TCP state structures slowly build up on
the firewall until resource exhaustion; slowly can be a matter of hours on
busy WWW proxies.
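
To make the state build-up concrete, here is a toy model of a connection
table with an idle timeout.  It does not describe any particular
firewall: the class name, the key format, and the timeout value are
assumptions for illustration only.

```python
class ConnectionTable:
    """Per-connection state that is discarded after an idle timeout."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_seen = {}   # connection key -> time of last packet

    def packet(self, key, now):
        """Record traffic for a connection, creating state if needed."""
        self.last_seen[key] = now

    def expire(self, now):
        """Drop entries idle for at least idle_timeout; return the dropped keys."""
        dead = [k for k, t in self.last_seen.items()
                if now - t >= self.idle_timeout]
        for k in dead:
            del self.last_seen[k]
        return dead

table = ConnectionTable(idle_timeout=3600)
table.packet(("10.0.0.1", 1025, "10.0.0.2", 23), now=0)
table.packet(("10.0.0.3", 1026, "10.0.0.2", 80), now=3000)
print(table.expire(now=3600))
print(len(table.last_seen))
```

Without the expire step the table only ever grows, which is the resource
exhaustion described above; with it, a connection that is merely idle is
dropped along with the dead ones.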

-- 
Harald Koch <chk@utcc.utoronto.ca>

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 04:40:46 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA26270 for tcp-impl-list; Wed, 30 Apr 1997 04:39:00 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA26261 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 04:38:57 -0700
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id EAA28926
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 04:38:52 -0700
	env-from (mouse@Twig.Rodents.Montreal.QC.CA)
Received: (from mouse@localhost) by Twig.Rodents.Montreal.QC.CA (8.7.5/8.7.3) id HAA04885; Wed, 30 Apr 1997 07:34:58 -0400 (EDT)
Date: Wed, 30 Apr 1997 07:34:58 -0400 (EDT)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199704301134.HAA04885@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: TCP keep-alives
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I would state to you that an idle telnet session is a potential
> security hole, and thus the firewall is doing the correct thing by
> killing it.

I'd be inclined to agree.  However, I didn't say "telnet session",
because the TCP connections in question weren't/aren't telnet
connections.  (They're actually ssh connections.)

Note also that most of the hazard of an idle telnet connection is
hijacking (at least as far as I can see), and the firewall actually
_helps_ that - once the firewall drops its state, then the hijacker can
play with impunity because the other end of the connection is
guaranteed to not notice any peculiar segments.

> Having said that, [...]

But why is the firewall keeping connection state at all?  I've seen no
justification for it to do so.  The only danger I can see that it
prevents is that some TCP segments for nonexistent connections might
get through, and I can't see how that would be a danger.  (It'd be a
covert channel, sure, but the firewall doesn't even try to do anything
strong enough for covert channels to matter.)

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 06:44:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA16583 for tcp-impl-list; Wed, 30 Apr 1997 06:42:03 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA16523 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 06:41:57 -0700
Received: from grinch.eecs.umich.edu (grinch.eecs.umich.edu [141.213.8.89]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id GAA16912
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 06:41:55 -0700
	env-from (sdawson@eecs.umich.edu)
Received: from grinch.eecs.umich.edu (localhost [127.0.0.1]) by grinch.eecs.umich.edu (8.8.5/8.8.2) with ESMTP id JAA08549; Wed, 30 Apr 1997 09:35:26 -0400 (EDT)
Message-Id: <199704301335.JAA08549@grinch.eecs.umich.edu>
To: Barney Wolff <barney@databus.com>
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: draft description of "Insufficient interval between keepalives"
References: <3366799c0.4393@databus.databus.com>
From: Scott Dawson <sdawson@eecs.umich.edu>
In-Reply-To: Barney Wolff's message of Tue, 29 Apr 1997 18:28 EDT
Lines: 11
X-Mailer: Gnus v5.3/Emacs 19.34
Date: Wed, 30 Apr 1997 09:35:25 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


I definitely agree that configuration of the keepalive timer is
possible and should be allowed.  However, the trace I presented in
this example was taken using default settings, which is why it's a
problem (and a spec violation).

I'll change the presentation of the traces to mention that the traces
were taken using the default settings.

Thanks,
-Scott

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 07:37:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA28392 for tcp-impl-list; Wed, 30 Apr 1997 07:34:05 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA28352 for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 07:33:58 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA26609
	for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 07:33:56 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id KAA00905; Wed, 30 Apr 1997 10:30:03 -0400 (EDT)
Message-Id: <199704301430.KAA00905@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: backman@ftp.com
cc: tcp-impl@relay.engr.sgi.com
Subject: Re: TCP keep-alives & Reality bites 
In-reply-to: Your message of "Tue, 29 Apr 1997 20:51:50 EDT."
             <199704300051.UAA09796@MAILSERV-2HIGH-A.FTP.COM> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Wed, 30 Apr 1997 10:30:03 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Larry Backman writes:
> Now let's make it clear that I find keepalives as evil as the next person
> and wish they could be vaporized from the face of the earth.  However
> in Windows-Land (and perhaps some of the Microsoft watchers on the list
> might chime up also) we have all these somewhat shall we say - 
> network-inexperienced - application programmers who do not understand either
> network programming or the concept of multiple CPU synchronization.
[...]
> We tell them our keepalive is 2 hrs and they protest.  Loudly.  Often
> a mutual customer is involved and competitive products with shorter
> keepalives are mentioned.

I've had several apps where it was critical to know a TCP connection
was dead or jammed, and fast. This was for a trading floor application
where the TCP connection was sending out quote updates -- a bad place
to lose on getting information! If the TCP to one server went down, I
wanted to know that I needed to switch to the other.

I had a simple way to do this. My application protocol included a "are
you there" noop function. I'd send an "are you there" on a timer that
got reset if I got traffic (i.e. I only sent "are you there?" queries
if the line was idle), and if I got nothing back in a moment or two, I
switched.

In other words, I did keepalives at the application layer, where,
IMHO, they belonged.
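
A minimal sketch of that idle-timer scheme follows.  It is not the
original trading-floor code: the class name, the method names, and the
timeout values are invented, and the caller is assumed to supply the
current time and to do the actual sending and switching.

```python
class LivenessMonitor:
    """Track idle time on a connection and decide when to probe or switch."""

    def __init__(self, idle_limit, reply_limit):
        self.idle_limit = idle_limit      # idle seconds before asking
        self.reply_limit = reply_limit    # seconds to wait for an answer
        self.last_traffic = 0.0
        self.probe_sent_at = None

    def on_traffic(self, now):
        """Any received data resets the idle timer and clears a pending probe."""
        self.last_traffic = now
        self.probe_sent_at = None

    def poll(self, now):
        """Return 'ok', 'send_are_you_there', or 'switch_server'."""
        if self.probe_sent_at is not None:
            if now - self.probe_sent_at >= self.reply_limit:
                return "switch_server"
            return "ok"
        if now - self.last_traffic >= self.idle_limit:
            self.probe_sent_at = now
            return "send_are_you_there"
        return "ok"

monitor = LivenessMonitor(idle_limit=5.0, reply_limit=2.0)
monitor.on_traffic(now=0.0)
print(monitor.poll(now=3.0))   # traffic seen recently, nothing to do
print(monitor.poll(now=5.0))   # line has gone idle, ask the server
print(monitor.poll(now=8.0))   # no answer within the reply limit
```

Both limits belong to the application, so a quote feed can use seconds
while a backup program on the same machine uses hours.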

I don't understand why people expect TCP to provide this for them. One
size does *not* fit all for these things. I needed to switch in
seconds if I got a network outage. Most people have no such
requirement and would be hurt by switching in seconds.

> After a few years of this, perhaps 7 or 8 years back, we provided a
> configuration parameter by which a user could set our keepalive to
> any value they wanted.

Yeah, and then you find yourself with some of the things on a machine
taking too long to fail, and some gratuitously failing all the
time. This doesn't work. One size does NOT fit all.

You want your network backup program to last through a midnight router
reload. You want your network market quote service to switch off if
you don't get data for seconds. These are NOT the same requirement.

> My point, which I perhaps have belaboured a bit too long, is that keepalive
> timers are not just a stack issue, but also an issue of educating application
> developers as to what keepalives are used for and what they should not
> be used for.

One of the reasons keepalives make me upset is because apps guys DO
NOT understand them, and start to think the stack should take care of
this for them, which it probably shouldn't.

Perry

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 07:37:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA29123 for tcp-impl-list; Wed, 30 Apr 1997 07:36:05 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA29112 for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 07:36:04 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA27010
	for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 07:36:02 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id KAA00913; Wed, 30 Apr 1997 10:32:14 -0400 (EDT)
Message-Id: <199704301432.KAA00913@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: backman@ftp.com
cc: tcp-impl@relay.engr.sgi.com
Subject: Re: TCP keep-alives & Reality bites 
In-reply-to: Your message of "Tue, 29 Apr 1997 20:51:50 EDT."
             <199704300051.UAA09796@MAILSERV-2HIGH-A.FTP.COM> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Wed, 30 Apr 1997 10:32:14 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


BTW, if I didn't make it clear in my last message:

1) Some apps need to know that a connection is dead, even for seconds,
   and can't afford to wait for the normal TCP timeouts to
   happen. Even if you set keepalives to go out every five seconds,
   they wouldn't be happy because TCP would still take many minutes to
   report the connection dead.
2) Some apps are happy waiting for very long periods without any
   traffic.

One size doesn't fit all. This is an application issue.

Perry

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 07:42:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA00785 for tcp-impl-list; Wed, 30 Apr 1997 07:40:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA00775 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 07:40:56 -0700
Received: from postoffice.Reston.mci.net (postoffice.Reston.mci.net [204.70.128.20]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA27947
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 07:40:55 -0700
	env-from (gmiller@mci.net)
Received: from mci.net (ale [166.45.4.49])
	by postoffice.Reston.mci.net (8.8.5/8.8.5) with ESMTP id KAA28093
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 10:31:00 -0400 (EDT)
Message-Id: <199704301431.KAA28093@postoffice.Reston.mci.net>
X-Mailer: exmh version 1.6.9 8/22/96
To: tcp-impl@relay.engr.SGI.COM
Subject: Failure to send window scale option with shift.cnt == 0
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 30 Apr 1997 10:31:00 -0400
From: Greg Miller <gmiller@mci.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


I'd like to get some feedback from the list on a behavior I've observed from 
a particular TCP implementation. This implementation supports RFC1323 window 
scaling. It sends the window scaling option with a scale factor determined by 
the size of the socket receive buffer (SO_RCVBUF), as one would expect. 
However, this implementation sends the window scale option on an active open 
only if a shift of 1 or more is required for the given socket buffer size 
(i.e., it never sends an option with shift.cnt == 0).

The result of this behavior is that the TCP that does the passive open is 
prevented from using window scaling if the TCP that does the active open is 
using a receive buffer smaller than 65535 bytes. Failure to send the window 
scale option on the SYN prevents the option from appearing on the SYN/ACK.
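
The behavior described above can be sketched as follows. This is only an
illustration under stated assumptions: the function names and the
always_send flag are hypothetical, and the shift is chosen as the smallest
value that lets the 16-bit window field cover the receive buffer, with the
RFC1323 upper limit of 14.

```python
MAX_WINDOW_FIELD = 65535   # largest value of the 16-bit TCP window field
MAX_SHIFT = 14             # upper limit on shift.cnt in RFC1323


def window_shift(rcvbuf_bytes):
    """Smallest shift.cnt that lets the window field cover rcvbuf_bytes."""
    shift = 0
    while shift < MAX_SHIFT and (MAX_WINDOW_FIELD << shift) < rcvbuf_bytes:
        shift += 1
    return shift


def syn_sends_wscale(rcvbuf_bytes, always_send):
    """Does the active opener put the window scale option on its SYN?

    always_send=False models the observed implementation (option only
    when shift.cnt >= 1); always_send=True models sending it even when
    shift.cnt == 0.
    """
    return always_send or window_shift(rcvbuf_bytes) > 0
```

With a 32768-byte receive buffer, window_shift() gives 0, so the observed
implementation omits the option and the passive side cannot scale either.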

On this subject, RFC1323 says:
    [...]
    Thus, a TCP that is prepared to scale windows should send the option, 
    even if its own scale factor is 1. 

The spec says "should" and not "MUST" or even "SHOULD" so I think it'd be 
hard to call this behavior broken. It is unfortunate though. Comments?

Greg

----
Gregory J. Miller
vBNS Engineering
MCI Telecommunications              
Reston, VA 20191                                     gmiller@mci.net



From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 08:07:09 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA06938 for tcp-impl-list; Wed, 30 Apr 1997 08:03:45 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA06927 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 30 Apr 1997 08:03:41 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id JAA16981 for tcp-impl@cthulhu.engr.sgi.com; Wed, 30 Apr 1997 09:03:37 -0600
Date: Wed, 30 Apr 1997 09:03:37 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199704301503.JAA16981@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Greg Miller <gmiller@mci.net>

> ...
> I'd like to get some feedback from the list on a behavior I've observed from 
> a particular TCP implementation. This implementation supports RFC1323 window 
> scaling. It sends the window scaling option with a scale factor determined by 
> the size of the socket receive buffer (SO_RCVBUF), as one would expect. 
> However, this implementation sends the window scale option on an active open 
> only if a shift of 1 or more is required for the given socket buffer size 
> (i.e., it never sends an option with shift.cnt == 0).
> 
> The result of this behavior is that the TCP that does the passive open is 
> prevented from using window scaling if the TCP that does the active open is 
> using a receive buffer smaller than 65535 bytes. Failure to send the window 
> scale option on the SYN prevents the option from appearing on the SYN/ACK.
> 
> On this subject, RFC1323 says:
>     [...]
>     Thus, a TCP that is prepared to scale windows should send the option, 
>     even if its own scale factor is 1. 
> 
> The spec says "should" and not "MUST" or even "SHOULD" so I think it'd be 
> hard to call this behavior broken. It is unfortunate though. Comments?


How many applications want a shift of 0 in one direction but non-zero
in the other direction?  Of those, how many would be harmed or even
inconvenienced by having the active side that does not care specify a
SO_RCVBUF size greater than 128K?  How many don't just always set
a big SO_RCVBUF in both directions?

If you always send the option in every SYN, what happens when the peer
is broken and it croaks when it receives the option?  Do you require a
system-wide configuration flag that globally turns on sending the
option?  If so, again what happens if your system needs to talk to a
lame-o system that falls over when it receives the TCP option?  Do you
require a new switch to be set by the application that wants to send
big clumps but not receive them?

Why not use the SO_RCVBUF on the active opener as that new switch?  If
the active opener knows that it is desirable to use big windows in
either direction, why not let it set SO_RCVBUF large?

Unless you have made SO_SNDBUF a nop, you might use SO_SNDBUF as the
switch.  On fast enough media, the sender might need to stuff a lot of
bytes into the kernel's buffers to use the receiver's big windows.


On the other hand, maybe every system in the world can tolerate the option.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 08:51:32 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA16218 for tcp-impl-list; Wed, 30 Apr 1997 08:48:27 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA16200 for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 08:48:21 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id IAA13258
	for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 08:48:20 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-24)
	id <AA10227>; Wed, 30 Apr 1997 08:44:23 -0700
Date: Wed, 30 Apr 97 08:45:55 PDT
From: braden@ISI.EDU
Posted-Date: Wed, 30 Apr 97 08:45:55 PDT
Message-Id: <9704301545.AA08773@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA08773>; Wed, 30 Apr 97 08:45:55 PDT
To: tcp-impl@relay.engr.sgi.com, gmiller@mci.net
Subject: Re: Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

  *> 
  *> I'd like to get some feedback from the list on a behavior I've observed from 
  *> a particular TCP implementation. This implementation supports RFC1323 window 
  *> scaling. It sends the window scaling option with a scale factor determined by 
  *> the size of the socket receive buffer (SO_RCVBUF), as one would expect. 
  *> However, this implementation sends the window scale option on an active open 
  *> only if a shift of 1 or more is required for the given socket buffer size 
  *> (i.e., it never sends an option with shift.cnt == 0).
  *> 
  *> The result of this behavior is that the TCP that does the passive open is 
  *> prevented from using window scaling if the TCP that does the active open is 
  *> using a receive buffer smaller than 65535 bytes. Failure to send the window 
  *> scale option on the SYN prevents the option from appearing on the SYN/ACK.
  *> 
  *> On this subject, RFC1323 says:
  *>     [...]
  *>     Thus, a TCP that is prepared to scale windows should send the option, 
  *>     even if its own scale factor is 1. 
  *> 
  *> The spec says "should" and not "MUST" or even "SHOULD" so I think it'd be 
  *> hard to call this behavior broken. It is unfortunate though. Comments?
  *> 

RFC1323 was not considered a "requirements document" (Applicability
Statement), so it had no capitalized requirements words.  It expected
that its readers would be (WOULD BE!) adults, who would do what they
should do without anyone shouting at them.

This is clearly a requirement for interoperability.  There can be no
doubt that the implementation you describe is out of compliance with
RFC1323.  It's broken.

Bob Braden

  *> Greg
  *> 
  *> ----
  *> Gregory J. Miller
  *> vBNS Engineering
  *> MCI Telecommunications              
  *> Reston, VA 20191                                     gmiller@mci.net
  *> 
  *> 
  *> 

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 09:06:51 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA21107 for tcp-impl-list; Wed, 30 Apr 1997 09:04:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA21054 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 09:04:02 -0700
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA17048
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 09:04:01 -0700
	env-from (jt@mentat.com)
Received: from feller.mentat.com ([192.88.122.144]) by mentat.com (4.1/SMI-4.1)
	id AA23839; Wed, 30 Apr 97 08:58:15 PDT
Received: by feller.mentat.com (SMI-8.6/SMI-SVR4)
	id JAA01651; Wed, 30 Apr 1997 09:00:36 -0700
Date: Wed, 30 Apr 1997 09:00:36 -0700
From: jt@mentat.com (Jerry Toporek)
Message-Id: <199704301600.JAA01651@feller.mentat.com>
To: tcp-impl@relay.engr.SGI.COM, vjs@mica.denver.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> On the other hand, maybe every system in the world can tolerate the option.
> 

Well, this is the key question, isn't it?  Most "modern" TCP innovations have
been based on the assumption that we can add TCP options to SYN segments
without breaking implementations that do not recognize them.  We have never
seen any problem with this assumption, and add the window scale option to
every active connection request.  Has anyone had a problem with this?
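
For reference, the assumption amounts to a receiver walking the option list
by kind and length and stepping over kinds it does not know. A minimal
sketch of such a parser follows; the function name and the sample bytes are
made up for illustration and are not taken from any particular stack.

```python
def parse_tcp_options(data):
    """Return {kind: value_bytes} for a TCP option list.

    Unknown kinds are skipped over using their length byte, which is
    what lets new options appear on a SYN without confusing the parser.
    """
    opts = {}
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == 0:               # End of Option List
            break
        if kind == 1:               # No-Operation, a single padding byte
            i += 1
            continue
        if i + 1 >= len(data):      # truncated: no length byte
            break
        length = data[i + 1]
        if length < 2 or i + length > len(data):
            break                   # malformed length: stop, do not overrun
        opts[kind] = bytes(data[i + 2:i + length])
        i += length
    return opts


# A SYN option list: MSS 1460, NOP, window scale with shift.cnt == 0.
syn_options = bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 0])
```

A receiver that only understands MSS (kind 2) can read opts[2] from the
result and ignore kind 3 entirely.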

jt


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 09:11:00 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA22325 for tcp-impl-list; Wed, 30 Apr 1997 09:09:38 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA22315 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 09:09:36 -0700
Received: from zippy.psc.edu (zippy.psc.edu [128.182.61.149]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA18565
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 09:09:35 -0700
	env-from (mathis@psc.edu)
Received: (from mathis@localhost) by zippy.psc.edu (8.8.5/8.8.2) id MAA03610; Wed, 30 Apr 1997 12:04:53 -0400 (EDT)
Date: Wed, 30 Apr 1997 12:04:53 -0400 (EDT)
Message-Id: <199704301604.MAA03610@zippy.psc.edu>
From: Matt Mathis <mathis@psc.edu>
To: Greg Miller <gmiller@mci.net>
CC: tcp-impl@relay.engr.SGI.COM
In-reply-to: Greg Miller's message of Wed, 30 Apr 1997 10:31:00 -0400
Subject: Re: Failure to send window scale option with shift.cnt == 0
Reply-to: mathis@psc.edu
References: <199704301431.KAA28093@postoffice.Reston.mci.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


I would like to make the wndshift negotiation more aggressive.  The
current socket buffer size (set by SO_RCVBUF) is not the best
control for choosing the wndshift to negotiate.

The current approach creates problems if the server side of an
application does not have sufficient advance information on a plausible
range for cwnd or the receiver's window.  To get the winshift negotiation
correct but not consume excessive mbufs (on many systems) it is
necessary to set SO_RCVBUF very large before the listen, and then
adjust both SO_RCVBUF and SO_SNDBUF to more realistic values once the
client is known.
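
A sketch of that workaround on the server side, using the ordinary sockets
API.  The two sizes and the helper names are arbitrary choices for the
example, and whether the large pre-listen buffer actually drives the
negotiated shift depends on the stack.

```python
import socket

BIG = 1 << 20            # assumed "very large" size used only for negotiation
REALISTIC = 64 * 1024    # assumed "more realistic" size once the client is known


def make_listener(port=0):
    """Create a listening socket with a large SO_RCVBUF set before listen()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before listen() so accepted connections can inherit it and the
    # stack can pick a large enough window shift for the SYN/ACK.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BIG)
    s.bind(("127.0.0.1", port))
    s.listen(5)
    return s


def shrink_buffers(conn, size=REALISTIC):
    """Adjust both buffers on a connection after the handshake is done."""
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
```

A server would call make_listener(), then accept(), then shrink_buffers()
on the accepted socket once it knows something about the client.  The kernel
may clamp or round the requested sizes.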

No stock applications do this correctly, and it is a royal pain to
retrofit..... ("more realistic values" has its own set of research
questions.)

It would be better if the negotiation defaulted to attempting the
wndshift necessary to support the system-wide sb_max.  (Yes, there
should also be a way for applications to override it).  Then the
application can later adjust SO_RCVBUF and SO_SNDBUF up or down as it
sees fit.

This will be fatal to any dinosaurs that croak on winshift options.
Does anybody know of any?   I suspect that there are enough people
owning systems with the current behavior and default SO_RCVBUF > 64k to
have already persecuted the dinosaurs.

It also subjects all applications to window quantization.  Does
anybody know of any applications that depend on controlling the sender
through deliberately tiny windows?  (I suspect that these applications
are non-functional anyhow, due to silly window avoidance.)
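
To make the quantization concrete: with a shift in effect, the window the
peer sees is the true window with its low-order bits dropped.  A minimal
sketch, with a hypothetical helper name:

```python
def advertised_window(true_window, shift):
    """Window the peer reconstructs from the 16-bit field and the shift.

    The sender of the segment right-shifts its true window into the
    16-bit field; the receiver of the segment left-shifts it back, so
    the low 'shift' bits are lost.
    """
    field = min(true_window >> shift, 0xFFFF)   # what goes in the header
    return field << shift                       # what the peer computes
```

For example, with shift 4 a true window of 1000 bytes is seen as 992, and
any true window below 16 bytes is seen as zero.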

We are currently testing a kernel that uses sb_max to select the
winshift to negotiate.

Comments?

--MM--

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 09:11:20 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA22466 for tcp-impl-list; Wed, 30 Apr 1997 09:10:05 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA22449 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 30 Apr 1997 09:10:02 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id KAA17273 for tcp-impl@cthulhu.engr.sgi.com; Wed, 30 Apr 1997 10:09:56 -0600
Date: Wed, 30 Apr 1997 10:09:56 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199704301609.KAA17273@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: jt@mentat.com (Jerry Toporek)

> > On the other hand, maybe every system in the world can tolerate the option.
> > 
> 
> Well, this is the key question, isn't it?  Most "modern" TCP innovations have
> been based on the assumption that we can add TCP options to SYN segments
> without breaking implementations that do not recognize them.  We have never
> seen any problem with this assumption, and add the window scale option to
> every active connection request.  Has anyone had a problem with this?

My vague recollection from many years ago is that we did have
problems.  SGI has been shipping RFC 1323 support for a lot longer than
most other outfits.

I have some very clear memories of problems with systems that assumed
window sizes cannot be larger than 32765, and I don't mean only boxes
running flavors of UNIX.  One was a fancy 'scope/logic analyzer that
went crazy and sent one byte every couple of minutes in what looked
like some kind of window probe.  Such experiences tend to make you
conservative in what you send.

I changed SGI's default window size to 60K because it improved
performance significantly in interesting cases, but is there a
significant harm to being conservative in sending the window shift
option?  Again, in how many applications does active-opener not set a
large SO_RCVBUF but the passive opener wants to receive big windows?
How many of those do not have the active-opener set a large
SO_SNDBUF?


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 09:33:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA28352 for tcp-impl-list; Wed, 30 Apr 1997 09:31:35 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA28338 for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 09:31:33 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA23799
	for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 09:31:32 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-24)
	id <AA14816>; Wed, 30 Apr 1997 09:27:45 -0700
Date: Wed, 30 Apr 97 09:29:17 PDT
From: braden@ISI.EDU
Posted-Date: Wed, 30 Apr 97 09:29:17 PDT
Message-Id: <9704301629.AA08808@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA08808>; Wed, 30 Apr 97 09:29:17 PDT
To: tcp-impl@relay.engr.sgi.com, vjs@mica.denver.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


  *> > From: jt@mentat.com (Jerry Toporek)
  *> 
  *> > > On the other hand, maybe every system in the world can tolerate the option.
  *> > > 
  *> > 
  *> > Well, this is the key question, isn't it?  Most "modern" TCP innovations have
  *> > been based on the assumption that we can add TCP options to SYN segments
  *> > without breaking implementations that do not recognize them.  We have never
  *> > seen any problem with this assumption, and add the window scale option to
  *> > every active connection request.  Has anyone had a problem with this?
  *> 
  *> My vague recollection from many years ago is that we did have
  *> problems.  SGI has been shipping RFC 1323 support for a lot longer than
  *> most other outfits.

Vernon,

I believe jt is correct; there have been no problems with SYN options.

  *> 
  *> I have some very clear memories of problems with systems that assumed
  *> window sizes cannot be larger than 32765, and I don't mean only boxes
  *> running flavors of UNIX.  One was a fancy 'scope/logic analyzer that
  *> went crazy and sent one byte every couple of minutes in what looked
  *> like some kind of window probe.  Such experiences tend to make you
  *> conservative in what you send.

Yes, and that is exactly why large windows can only be used if BOTH
sides indicate ability to deal with them.  Which is why a system that
understands large windows but does not want to enlarge its own window
should send a window option with zero.
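
The rule can be sketched as follows.  The function names are hypothetical
and None stands for "option absent from the segment"; this is an
illustration of the both-sides rule, not code from any stack.

```python
def passive_reply(syn_shift, own_shift):
    """Shift the passive opener puts on its SYN/ACK, or None for no option.

    The option may appear on the SYN/ACK only if it appeared on the SYN.
    """
    return None if syn_shift is None else own_shift


def negotiate(syn_shift, synack_shift):
    """Return (active_rcv_shift, passive_rcv_shift) in effect afterwards.

    Scaling is enabled only if BOTH segments carried the option;
    otherwise both directions use shift 0.
    """
    if syn_shift is None or synack_shift is None:
        return (0, 0)
    return (syn_shift, synack_shift)
```

If the active opener omits the option, negotiate(None, passive_reply(None, 3))
gives (0, 0) and the passive side's large window is lost.  If it sends the
option with zero, negotiate(0, passive_reply(0, 3)) gives (0, 3).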

  *> 
  *> I changed SGI's default window size to 60K because it improved
  *> performance significantly in interesting cases, but is there a
  *> significant harm to being conservative in sending the window shift
  *> option?  Again, in how many applications does active-opener not set a
  *> large SO_RCVBUF but the passive opener wants to receive big windows?
  *> How many of those do not have the active-opener not set a large
  *> SO_SNDBUF?

This is not a popularity contest.  But how about an FTP data connection?
It has data flowing only one way, so only the receiver needs to use
the large window size.  The sender does not want to tie up receive
buffer space unnecessarily.  And some systems simply do not have
enough buffering capacity to allow a large window at all.  An
asymmetry of window sizes seems quite natural, to me.

Bob Braden

  *> 
  *> 
  *> Vernon Schryver,  vjs@sgi.com
  *> 

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 09:34:43 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA29001 for tcp-impl-list; Wed, 30 Apr 1997 09:33:26 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA28985 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 09:33:23 -0700
Received: from md.interlink.com (md.interlink.com [138.42.32.165]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA24200
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 09:33:20 -0700
	env-from (fab@fab.md.interlink.com)
Received: from fab.md.interlink.com by md.interlink.com (5.0/SMI-SVR4)
	id AA27085; Wed, 30 Apr 1997 12:23:36 +0500
Received: by fab.md.interlink.com (5.0/SMI-SVR4)
	id AA02156; Wed, 30 Apr 1997 12:31:03 +0500
Date: Wed, 30 Apr 1997 12:31:03 +0500
From: fab@fab.md.interlink.com (Fred Bohle)
Message-Id: <9704301631.AA02156@fab.md.interlink.com>
To: braden@ISI.EDU
Subject: Re: Failure to send window scale option with shift.cnt == 0
Cc: tcp-impl@relay.engr.SGI.COM, gmiller@mci.net
X-Sun-Charset: US-ASCII
Content-Length: 711
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Bob,


>   *> However, this implementation sends the window scale option on an active open 
>   *> only if a shift of 1 or more is required for the given socket buffer size 
>   *> (i.e., it never sends an option with shift.cnt == 0).


> This is clearly a requirement for interoperability.  There can be no
> doubt that the implementation you describe is out of compliance with
> RFC1323.  It's broken.
> 

Thanks, but no thanks.  This code is now in your ACP, (remember that?) and
our first attempt at implementing window scaling immediately ran into systems
that would choke on new TCP options.  This was the best compromise we could
work out.  

If you can think of a better way, please tell us all.

Fred

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 09:35:25 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA29281 for tcp-impl-list; Wed, 30 Apr 1997 09:34:08 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA29260 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 09:34:06 -0700
Received: from md.interlink.com (md.interlink.com [138.42.32.165]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA24354
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 09:34:03 -0700
	env-from (fab@fab.md.interlink.com)
Received: from fab.md.interlink.com by md.interlink.com (5.0/SMI-SVR4)
	id AA27088; Wed, 30 Apr 1997 12:24:21 +0500
Received: by fab.md.interlink.com (5.0/SMI-SVR4)
	id AA02159; Wed, 30 Apr 1997 12:31:49 +0500
Date: Wed, 30 Apr 1997 12:31:49 +0500
From: fab@fab.md.interlink.com (Fred Bohle)
Message-Id: <9704301631.AA02159@fab.md.interlink.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re:  Failure to send window scale option with shift.cnt == 0
X-Sun-Charset: US-ASCII
Content-Length: 493
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Jerry,


> ...  We have never
> seen any problem with this assumption, and add the window scale option to
> every active connection request.  Has anyone had a problem with this?


Absolutely!  In our first attempt to support this option, we always sent it.
Big mistake!  Almost immediately we got field reports of TCP implementations
that would not talk to us.  Some implementations discarded the packets, some
sent back resets.  Now it is configurable, as described in earlier emails.

Fred

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 09:48:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA02274 for tcp-impl-list; Wed, 30 Apr 1997 09:44:43 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA02264 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 09:44:41 -0700
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id JAA27159
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 09:44:40 -0700
	env-from (jt@mentat.com)
Received: from feller.mentat.com ([192.88.122.144]) by mentat.com (4.1/SMI-4.1)
	id AA24234; Wed, 30 Apr 97 09:38:43 PDT
Received: by feller.mentat.com (SMI-8.6/SMI-SVR4)
	id JAA01696; Wed, 30 Apr 1997 09:41:04 -0700
Date: Wed, 30 Apr 1997 09:41:04 -0700
From: jt@mentat.com (Jerry Toporek)
Message-Id: <199704301641.JAA01696@feller.mentat.com>
To: tcp-impl@relay.engr.SGI.COM, vjs@mica.denver.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> I changed SGI's default window size to 60K because it improved
> performance significantly in interesting cases, but is there a
> significant harm to being conservative in sending the window shift
> option?  Again, in how many applications does active-opener not set a
> large SO_RCVBUF but the passive opener wants to receive big windows?
> How many of those do not have the active-opener not set a large
> SO_SNDBUF?
> 

Don't know...  The whole question of how an application can intelligently
decide when a large window will be beneficial is a bit of a mystery.

Given that the active side has to be conservative, your heuristic is just
fine.  How do you decide whether to try negotiating on TCP Timestamps?

jt


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 10:00:30 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA06316 for tcp-impl-list; Wed, 30 Apr 1997 09:58:36 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA06306 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 30 Apr 1997 09:58:34 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id KAA17505 for tcp-impl@cthulhu.engr.sgi.com; Wed, 30 Apr 1997 10:58:27 -0600
Date: Wed, 30 Apr 1997 10:58:27 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199704301658.KAA17505@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

As for turning on the option only when (your equivalent to) sb_max
is large--I think most of the time you'll find it desirable to 
run systems with a large sb_max.  A system-wide switch on the scaling
option is awfully coarse.



> From: braden@ISI.EDU

> I believe jt is correct; there have been no problems with SYN options.

I did not realize you received copies of all trouble reports sent to
SGI's hotline, not to mention all bug reports in SGI's internal database.


>   *> I have some very clear memories of problems with systems that assumed
>   *> window sizes cannot be larger than 32765, and I don't mean only boxes
>   *> running flavors of UNIX.  One was a fancy 'scope/logic analyzer that
>   *> went crazy and sent one byte every couple of minutes in what looked
>   *> like some kind of window probe.  Such experiences tend to make you
>   *> conservative in what you send.
> 
> Yes, and that is exactly why large windows can only be used if BOTH
> sides indicate ability to deal with them.  Which is why a system that
> understands large windows but does not want to enlarge its own window
> should send a window option with zero.

That makes sense only if you also agree that no systems fall over
when they receive the TCP option itself.

As of about 1989, I do not agree with that premise.  I reserve
judgement for 1997.


>   *> I changed SGI's default window size to 60K because it improved
>   *> performance significantly in interesting cases, but is there a
>   *> significant harm to being conservative in sending the window shift
>   *> option?  Again, in how many applications does active-opener not set a
>   *> large SO_RCVBUF but the passive opener wants to receive big windows?
>   *> How many of those do not have the active-opener not set a large
>   *> SO_SNDBUF?
> 
> This is not a popularity contest.

That is true.  We are all adults and will make our own decisions on our
own criteria and live with the consequences.  You are welcome to your
opinions and also free to say whatever bad things you want about me.
I reserve the reciprocal privileges.


>                                    But how about an FTP data connection?
> It has data flowing only one way, so only the receiver needs to use
> the large window size.  The sender does not want to tie up receive
> buffer space unnecessarily.

Some systems do not "tie up receive buffer space" as the result of
the application's SO_RCVBUF or SO_SNDBUF.  Some do not even have
a meaningful sb_max, including some based on 4.*BSD socket code.

A file copy data connection might quite plausibly want to have a large
SO_SNDBUF size to minimize context switches.  To repeat that point for
at least the third time in different words, if you don't set SO_SNDBUF
to more than 8K, you might not get very impressive ttcp or netperf
numbers, even if your system is running at a very high clock rate.


>                              And some systems simply do not have
> enough buffering capacity to allow a large window at all.

That is a red herring, since in those situations it is
best to not use the scaling option at all in either direction.

>                                                            An
> asymmetry of window sizes seems quite natural, to me.

Asymmetric window sizes do make sense.  That is one reason I repeatedly
mentioned SO_SNDBUF.  Please note that "SND" is not the same string as
"RCV".


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 10:55:09 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA24306 for tcp-impl-list; Wed, 30 Apr 1997 10:53:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA24297 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 10:53:09 -0700
Received: from zippy.psc.edu (zippy.psc.edu [128.182.61.149]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA16422
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 10:53:07 -0700
	env-from (mathis@psc.edu)
Received: (from mathis@localhost) by zippy.psc.edu (8.8.5/8.8.2) id NAA03811; Wed, 30 Apr 1997 13:48:56 -0400 (EDT)
Date: Wed, 30 Apr 1997 13:48:56 -0400 (EDT)
Message-Id: <199704301748.NAA03811@zippy.psc.edu>
From: Matt Mathis <mathis@psc.edu>
To: tcp-impl@relay.engr.SGI.COM
In-reply-to: fab@fab.md.interlink.com's message of Wed, 30 Apr 1997 12:31:49
	+0500
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Reply-to: mathis@psc.edu
References: <9704301631.AA02159@fab.md.interlink.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>> ...  We have never
>> seen any problem with this assumption, and add the window scale option to
>> every active connection request.  Has anyone had a problem with this?
>
>Absolutely!  In our first attempt to support this option, we always
>sent it.  Big mistake!  Almost immediately we got field reports
.....

Were these "dinosaurs" or were they new systems with stacks that
had been, shall we say, "shipped before their time"?

If it was the latter, you were blamed for someone else's bug, because
you were in the minority: you were trying to be aggressive about new
features.

This phenomenon is a huge drag on implementing and deploying new TCP
features.  We as an industry (as a WG?) have got to find a way to
defend correct new TCP implementations from clueless network
administrators insisting on compatibility with busted code.

If nothing else, tcpimpl could recommend putting all of these backward
compatibility features under runtime switches.  It would be really nice
if tcpimpl could document (and even standardize switch names for) some
of the popular workarounds.  Thus, when the clueless network
administrator was told by the third new system vendor to "set
tcp_backward_conservative_winshift") the administrator might correctly
place the blame.

Note that SACK will also discover stacks that die on unexpected SYN
options.  Perhaps some of the vendors who have SACK in the field might
comment?

> Some do not even have a meaningful sb_max
True, I guess "preferred winshift" should have its own knob.

--MM--

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 10:59:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA25994 for tcp-impl-list; Wed, 30 Apr 1997 10:58:06 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA25982 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 30 Apr 1997 10:58:03 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id LAA17630 for tcp-impl@cthulhu.engr.sgi.com; Wed, 30 Apr 1997 11:19:41 -0600
Date: Wed, 30 Apr 1997 11:19:41 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199704301719.LAA17630@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: jt@mentat.com (Jerry Toporek)

> ...
> Don't know...  The whole question of how an application can intelligently
> decide when a large window will be beneficial is a bit of a mystery.

"A bit of a mystery" is an understatement.  You (or at least I) do not
want to use unnecessarily large windows.  At least every few hours and
often every minute, I have reason to curse the fact that the default
window size in SGI's product is 60K.  If you are running concurrent
file transfers and interactive traffic over a path with default windows
equivalent to seconds, if you can't get the people running the routers
to turn on TOS queuing, and if you cannot get the people running the
file transfers to shrink the default windows on their hosts, you will
have fun things like 3 second delays on character echoes.
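
A rough illustration of the arithmetic behind "windows equivalent to seconds": a full window of bulk data queued ahead of an interactive packet takes window/bandwidth seconds to drain. The link speeds below are assumptions chosen for illustration, not figures from this thread; only the 60K window comes from the message.

```python
def window_drain_seconds(window_bytes: int, link_bits_per_second: float) -> float:
    """Time for one full window of queued data to drain over the bottleneck link."""
    return window_bytes * 8 / link_bits_per_second


# 60K default window from the message; link speeds are hypothetical examples.
WINDOW_BYTES = 60 * 1024

if __name__ == "__main__":
    for bps in (56_000, 128_000, 1_544_000):
        delay = window_drain_seconds(WINDOW_BYTES, bps)
        print(f"{bps:>9} bit/s: {delay:.2f} s behind one full window")
```

On a slow enough path, a single file transfer with a full 60K window in flight can put an echoed character seconds behind the bulk data.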


> Given that the active side has to be conservative, your heuristic is just
> fine.  How do you decide whether to try negotiating on TCP Timestamps?

Why not the same?


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 11:01:32 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA26507 for tcp-impl-list; Wed, 30 Apr 1997 10:59:22 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA26492 for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 10:59:20 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id KAA18008
	for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 10:59:19 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-24)
	id <AA22420>; Wed, 30 Apr 1997 10:55:25 -0700
Date: Wed, 30 Apr 97 10:56:57 PDT
From: braden@ISI.EDU
Posted-Date: Wed, 30 Apr 97 10:56:57 PDT
Message-Id: <9704301756.AA08892@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA08892>; Wed, 30 Apr 97 10:56:57 PDT
To: tcp-impl@relay.engr.sgi.com, fab@fab.md.interlink.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


  *> From owner-tcp-impl@relay.engr.SGI.COM Wed Apr 30 09:40:09 1997
  *> Date: Wed, 30 Apr 1997 12:31:49 +0500
  *> From: fab@fab.md.interlink.com (Fred Bohle)
  *> To: tcp-impl@relay.engr.SGI.COM
  *> Subject: Re:  Failure to send window scale option with shift.cnt == 0
  *> X-Sun-Charset: US-ASCII
  *> Content-Length: 493
  *> Sender: owner-tcp-impl@relay.engr.SGI.COM
  *> Precedence: bulk
  *> X-Lines: 15
  *> 
  *> 
  *> Jerry,
  *> 
  *> 
  *> > ...  We have never
  *> > seen any problem with this assumption, and add the window scale option to
  *> > every active connection request.  Has anyone had a problem with this?
  *> 
  *> 
  *> Absolutely!  In our first attempt to support this option, we always sent it.
  *> Big mistake!  Almost immediately we got field reports of TCP implementations
  *> that would not talk to us.  Some implementations discarded the packets, some
  *> sent back resets.  Now it is configurable, as described in earlier emails.
  *> 
  *> Fred
  *> 
Fred,

Discarding the option would be acceptable (and indeed is the expected)
behavior.  Sending a RST is not nice, and crashing would be worse.
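
As a minimal sketch of the "discard the option" behavior, the parser below walks the option bytes of a SYN, keeps the options it understands, and skips everything else using the option's own length byte. The function name and the returned dictionary keys are invented for this example; the kind numbers (0 EOL, 1 NOP, 2 MSS, 3 Window Scale) and the cap of 14 on the shift count follow RFC 793 and RFC 1323.

```python
from typing import Dict

KIND_EOL, KIND_NOP, KIND_MSS, KIND_WSCALE = 0, 1, 2, 3
MAX_WINSHIFT = 14  # RFC 1323: larger shift counts are treated as 14


def parse_syn_options(raw: bytes) -> Dict[str, int]:
    """Return the options this (hypothetical) stack understands; skip the rest."""
    found: Dict[str, int] = {}
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == KIND_EOL:
            break
        if kind == KIND_NOP:
            i += 1
            continue
        # Every other option has a length byte covering kind + length + data.
        if i + 1 >= len(raw):
            break  # truncated option: stop, keep what was parsed so far
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            break  # malformed length: stop, do not reset or crash
        data = raw[i + 2 : i + length]
        if kind == KIND_MSS and length == 4:
            found["mss"] = int.from_bytes(data, "big")
        elif kind == KIND_WSCALE and length == 3:
            found["wscale"] = min(data[0], MAX_WINSHIFT)
        # Any other kind is unknown here: ignore it and step over it.
        i += length
    return found
```

The point is that an unrecognized option costs one addition to the index; nothing about it requires dropping the segment or sending a RST.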

Bob

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 12:08:43 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA18680 for tcp-impl-list; Wed, 30 Apr 1997 12:04:47 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA18595 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 12:04:27 -0700
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA04963
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 12:04:25 -0700
	env-from (jt@mentat.com)
Received: from feller.mentat.com ([192.88.122.144]) by mentat.com (4.1/SMI-4.1)
	id AA25242; Wed, 30 Apr 97 11:58:45 PDT
Received: by feller.mentat.com (SMI-8.6/SMI-SVR4)
	id MAA01813; Wed, 30 Apr 1997 12:01:07 -0700
Date: Wed, 30 Apr 1997 12:01:07 -0700
From: jt@mentat.com (Jerry Toporek)
Message-Id: <199704301901.MAA01813@feller.mentat.com>
To: tcp-impl@relay.engr.SGI.COM, vjs@mica.denver.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > Don't know...  The whole question of how an application can intelligently
> > decide when a large window will be beneficial is a bit of a mystery.
> 
> "A bit of a mystery" is an understatement.

Yes!

> 
> > Given that the active side has to be conservative, your heuristic is just
> > fine.  How do you decide whether to try negotiating on TCP Timestamps?
> 
> Why not the same?

Well, because, as Matt anticipated, my next question was going to be about
TCP SACK.  There is no heuristic there because I always want to try to
negotiate it on.

In general, we need to examine whether it is feasible to blackball any
implementation that can not swallow unrecognized options on SYN segments,
or do we have to provide a global switch to disable use of anything other
than the MSS option.  If the problem is sufficiently widespread, you then
have to ship with the global switch in the conservative position, to avoid
having system administrators everywhere going nuts discovering that they have
to throw the switch.  Few, if any, will then enable use of these options.
I do hope that we can decide that this approach is not necessary.

Can we try to gather some specific details on the extent of the problem?
Which specific implementations crash on receipt of options other than MSS,
and which do not crash but do black hole the connection request?

jt


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 12:22:35 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA23241 for tcp-impl-list; Wed, 30 Apr 1997 12:18:45 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA23237 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 30 Apr 1997 12:18:43 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id NAA18078 for tcp-impl@cthulhu.engr.sgi.com; Wed, 30 Apr 1997 13:18:38 -0600
Date: Wed, 30 Apr 1997 13:18:38 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199704301918.NAA18078@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: jt@mentat.com (Jerry Toporek)

> ...
> > > Given that the active side has to be conservative, your heuristic is just
> > > fine.  How do you decide whether to try negotiating on TCP Timestamps?
> > 
> > Why not the same?
> 
> Well, because, as Matt anticipated, my next question was going to be about
> TCP SACK.  There is no heuristic there because I always want to try to
> negotiate it on.
> 
> In general, we need to examine whether it is feasible to blackball any
> implementation that can not swallow unrecognized options on SYN segments,

Blackball implementations?  I don't know about your situation, but the
sales and support people I deal with say the funniest things whenever I
say something like "that other vendor's system is broken junk; tell
the customer to throw it out and get something that works."


> or do we have to provide a global switch to disable use of anything other
> than the MSS option.  If the problem is sufficiently widespread, you then
> have to ship with the global switch in the conservative position, to avoid
> having system administrators everywhere going nuts discovering that they have
> to throw the switch.  Few, if any, will then enable use of these options.
> I do hope that we can decide that this approach is not necessary.
> 
> Can we try to gather some specific details on the extent of the problem?
> Which specific implementations crash on receipt of options other than MSS,
> and which do not crash but do black hole the connection request?

As I've tried to say before, global options are of very little utility,
unless you are building single-purpose, probably embedded boxes.  If
you're building a multi-purpose box, then even when your box wants to
use a newfangled TCP (or IP or whatever) option with host X, it will
also have to deal with crufty old hosts U,V,W,Y, and Z.  Consider a
box that wants to talk to some peers where a fancy TCP option is a good
thing (e.g. SACK and satellites) but also wants to use FTP and SMTP to
any of millions of systems on the Internet.  No matter how you set a
global switch, it will be wrong.

Personally, I'd pick linking SACK with TCP-LW. 
    - A peer that can handle timestamps and window shifts is likely
      to do the right thing with SACK or other new options (including
      ignoring it if not understood)

    - if you want selective ACK's, you probably want large windows in
       at least one direction, and vice versa,
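
One possible reading of this suggestion, sketched in Python: whatever heuristic already decides to offer the large-window options on an active open also decides whether to offer SACK. The function name and option labels are invented for illustration, and how a stack arrives at `wants_large_window` (per socket, per destination, or otherwise) is deliberately left open.

```python
from typing import Set


def syn_options(wants_large_window: bool) -> Set[str]:
    """Options to place in an outgoing SYN under the 'link SACK with TCP-LW' idea."""
    options = {"mss"}  # always safe to send
    if wants_large_window:
        # Same decision covers window scale, timestamps, and SACK-permitted.
        options |= {"wscale", "timestamps", "sack_permitted"}
    return options
```

Connections that have no reason to ask for large windows then send only MSS, which is what the oldest peers expect.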
      
"Be conservative in what you send ..." is not just a good idea. 
It's the only thing that works outside of ivory towers.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 12:37:39 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA28419 for tcp-impl-list; Wed, 30 Apr 1997 12:34:03 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA28405 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 12:34:00 -0700
Received: from md.interlink.com (md.interlink.com [138.42.32.165]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id MAA12554
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 12:33:57 -0700
	env-from (fab@fab.md.interlink.com)
Received: from fab.md.interlink.com by md.interlink.com (5.0/SMI-SVR4)
	id AA28746; Wed, 30 Apr 1997 15:23:51 +0500
Received: by fab.md.interlink.com (5.0/SMI-SVR4)
	id AA02323; Wed, 30 Apr 1997 15:31:15 +0500
Date: Wed, 30 Apr 1997 15:31:15 +0500
From: fab@fab.md.interlink.com (Fred Bohle)
Message-Id: <9704301931.AA02323@fab.md.interlink.com>
To: tcp-impl@relay.engr.SGI.COM, vjs@mica.denver.sgi.com, jt@mentat.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
X-Sun-Charset: US-ASCII
Content-Length: 779
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Jerry,

> Can we try to gather some specific details on the extent of the problem?
> Which specific implementations crash on receipt of options other than MSS,
> and which do not crash but do black hole the connection request?

An S.E. did a search on our problem database and came up with the following
culprits:

A Lexmart (sp?) printer (lpd type) would lock up and require power-cycling
the box if the SYN had options in it (other than MSS?).

Old PC/NFS implementations would fail.

Some old Sun releases would fail.(!!!)

Data General would fail.

Unisys/Burroughs(sp?) would fail.

Silicon Graphics would fail. (!!! Vernon? Weren't you in on this thread?)


No, I don't have release numbers of these implementations.  Perhaps the
guilty can find out and let us know.

Fred

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 12:58:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA05752 for tcp-impl-list; Wed, 30 Apr 1997 12:54:53 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA05723 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 30 Apr 1997 12:54:49 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id NAA18217 for tcp-impl@cthulhu.engr.sgi.com; Wed, 30 Apr 1997 13:54:43 -0600
Date: Wed, 30 Apr 1997 13:54:43 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199704301954.NAA18217@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: fab@fab.md.interlink.com (Fred Bohle)
> To: tcp-impl@cthulhu.engr.sgi.com, vjs, jt@mentat.com

> > Can we try to gather some specific details on the extent of the problem?
> > Which specific implementations crash on receipt of options other than MSS,
> > and which do not crash but do black hole the connection request?
> 
> An S.E. did a search on our problem database and came up with the following
> culprits:
> 
> A Lexmart (sp?) printer (lpd type) would lock up and require power-cycling
> the box if the SYN had options in it (other than MSS?).
> 
> Old PC/NFS implementations would fail.
> 
> Some old Sun releases would fail.(!!!)
> 
> Data General would fail.
> 
> Unisys/Burroughs(sp?) would fail.
> 
> Silicon Graphics would fail. (!!! Vernon? Weren't you in on this thread?)
> 
> No, I don't have release numbers of these implementations.  Perhaps the
> guilty can find out and let us know.


That any versions of SGI boxes built since 1986 would do the wrong
thing with TCP options other than MSS is news to me.  I just spent some
time asking the SGI bug tracking system about likely combinations of
keywords, but found nothing, except that it doesn't go back past 1987.
(1986 is when Kipp and I tossed the previous TCP code as well as the
XNS support and jammed in the 4.3-BSD-beta network code.) I'm also
surprised that old Sun systems would croak instead of just ignoring the
option.  I'd rather suspect that you guys had a bug in how you shifted
the window or something.

But no matter, Murphy says that whenever you do something out of the
ordinary, you'll get caught, regardless of whether what you are doing
is legal and right.  "Be conservative in what you send, ..."


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 13:16:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA14186 for tcp-impl-list; Wed, 30 Apr 1997 13:14:27 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA14173 for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 13:14:25 -0700
Received: from alpha.xerox.com (alpha.Xerox.COM [13.1.64.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA23112
	for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 13:14:22 -0700
	env-from (fenner@parc.xerox.com)
Received: from crevenia.parc.xerox.com ([13.2.116.11]) by alpha.xerox.com with SMTP id <17934(15)>; Wed, 30 Apr 1997 13:09:42 PDT
Received: from localhost by crevenia.parc.xerox.com with SMTP id <177486>; Wed, 30 Apr 1997 13:09:27 -0700
To: jt@mentat.com (Jerry Toporek)
cc: tcp-impl@relay.engr.sgi.com, vjs@mica.denver.sgi.com
Subject: Re: Failure to send window scale option with shift.cnt == 0 
In-reply-to: Your message of "Wed, 30 Apr 97 12:01:07 PDT."
             <199704301901.MAA01813@feller.mentat.com> 
Date: Wed, 30 Apr 1997 13:09:20 PDT
From: Bill Fenner <fenner@parc.xerox.com>
Message-Id: <97Apr30.130927pdt.177486@crevenia.parc.xerox.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

jt@mentat.com (Jerry Toporek) wrote:
>Which specific implementations crash on receipt of options other than MSS,
>and which do not crash but do black hole the connection request?

FreeBSD ships with Matt's desired window scale behavior, and users
quickly discovered that Xylogics Annexes (used to?) mis-handle
VJ-compressed TCP packets with options.  A SYN exchange with options
succeeds but data packets with options get dropped.  The symptom is
that you can talk with anyone who doesn't implement RFC1323 but you
cannot talk to anyone who does.

NeXTStep's PPP server apparently has the same bug, according to the
FreeBSD FAQ.

  Bill

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 13:22:34 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA16302 for tcp-impl-list; Wed, 30 Apr 1997 13:21:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA16286 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 13:21:09 -0700
Received: from md.interlink.com (md.interlink.com [138.42.32.165]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA24633
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 13:21:05 -0700
	env-from (fab@fab.md.interlink.com)
Received: from fab.md.interlink.com by md.interlink.com (5.0/SMI-SVR4)
	id AA28965; Wed, 30 Apr 1997 16:10:53 +0500
Received: by fab.md.interlink.com (5.0/SMI-SVR4)
	id AA02350; Wed, 30 Apr 1997 16:18:12 +0500
Date: Wed, 30 Apr 1997 16:18:12 +0500
From: fab@fab.md.interlink.com (Fred Bohle)
Message-Id: <9704302018.AA02350@fab.md.interlink.com>
To: tcp-impl@relay.engr.SGI.COM, vjs@mica.denver.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
X-Sun-Charset: US-ASCII
Content-Length: 793
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Vernon,

> That any versions of SGI boxes built since 1986 would do the wrong
> thing with TCP options other than MSS is news to me.  I just spent some
> time asking the SGI bug tracking system about likely combinations of
> keywords, but found nothing, except that it doesn't go back past 1987.

> ... I'd rather suspect that you guys had a bug in how you shifted
> the window or something.

Hey! Don't shoot the messenger!  No, it wasn't a bug in our code.  It
was in the remote devices listed.

Lots of users have really ancient implementations sitting around in their
systems.  They seem to rely on the 'if it ain't broke, don't fix it' approach
to system maintenance.  Then we come along and send it an option it doesn't
recognize and it won't talk to us.  So it MUST be our fault!!


Fred

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 13:35:20 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA20653 for tcp-impl-list; Wed, 30 Apr 1997 13:33:23 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA20635 for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 13:33:20 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA27721
	for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 13:33:20 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-24)
	id <AA06131>; Wed, 30 Apr 1997 13:29:33 -0700
Date: Wed, 30 Apr 97 13:31:05 PDT
From: braden@ISI.EDU
Posted-Date: Wed, 30 Apr 97 13:31:05 PDT
Message-Id: <9704302031.AA09384@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA09384>; Wed, 30 Apr 97 13:31:05 PDT
To: fenner@parc.xerox.com
Subject: Re: Failure to send window scale option with shift.cnt == 0
Cc: tcp-impl@relay.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


   *> 
  *> FreeBSD ships with Matt's desired window scale behavior, and users
  *> quickly discovered that Xylogics Annexes (used to?) mis-handle
  *> VJ-compressed TCP packets with options.  A SYN exchange with options
  *> succeeds but data packets with options get dropped.  The symptom is
  *> that you can talk with anyone who doesn't implement RFC1323 but you
  *> cannot talk to anyone who does.
  *> 

Bill,

So, it happily negotiates a willingness to accept options in data
packets, and then black-holes data packets that contain options?  That's
really neat!

Bob

  *> NeXTStep's PPP server apparently has the same bug, according to the
  *> FreeBSD FAQ.
  *> 
  *>   Bill
  *> 

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 13:49:25 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA25842 for tcp-impl-list; Wed, 30 Apr 1997 13:46:10 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA25814 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 13:46:07 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA01068
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 13:46:03 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id PAA08647
	for tcp-impl@relay.engr.SGI.COM; Wed, 30 Apr 1997 15:44:01 -0500 (CDT)
Date: Wed, 30 Apr 1997 15:44:01 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199704302044.PAA08647@frantic.BSDI.COM>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Well, I'm stepping into the discussion, I'll try and be careful
where I step...

What BSD/OS does:

    Our system, BSD/OS, sends the Window Scale and Timestamps options
    in all SYNs, unless you turn off a system-wide sysctl variable,
    net.inet.tcp.do_rfc1323.  The value of the window scale option is
    inferred from the receive buffer.  I am not aware of any problem
    reports against BSD/OS directly relating to this.
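
A minimal sketch of inferring the shift count from the receive buffer, assuming the simplest rule: pick the smallest shift that lets the 16-bit window field cover the whole buffer. The constant and function names are chosen for this example and are not claimed to match the BSD/OS source.

```python
TCP_MAXWIN = 65535      # largest value the 16-bit window field can carry
TCP_MAX_WINSHIFT = 14   # largest shift count RFC 1323 allows


def winshift_for_buffer(rcvbuf_bytes: int) -> int:
    """Smallest shift such that (TCP_MAXWIN << shift) covers the receive buffer."""
    shift = 0
    while shift < TCP_MAX_WINSHIFT and (TCP_MAXWIN << shift) < rcvbuf_bytes:
        shift += 1
    return shift
```

With this rule any buffer of 65535 bytes or less yields a shift of 0, so the option is still sent but changes nothing about the advertised window.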

Some History:

    I used to work at Cray Research, Inc., and I probably did the first
    commercial implementation of RFC 1072/1185, later to become RFC
    1323.  (I don't remember if I yanked it or not, but UNICOS may very
    well still support the ECHO and ECHO-REPLY options...)

    Anyway, I ran into other TCP implementations that did not deal with
    the new TCP options.  I don't remember the details, but at least
    one of the broken hosts was a terminal server.  I believe that in
    all the broken cases, the remote host was initiating the connection.
    This is why RFC-1323 says that if you don't receive a Window Scale
    in the initial SYN, then don't put one into the SYN/ACK.
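
The RFC 1323 rule just described can be sketched as below. The function names are invented for this example. Note that a shift of 0 is still sent when the peer offered the option, since scaling is only enabled when both SYNs carried it.

```python
from typing import Optional


def synack_wscale(syn_wscale: Optional[int], our_shift: int) -> Optional[int]:
    """Window Scale value for the SYN/ACK, or None to omit the option entirely."""
    if syn_wscale is None:
        return None       # peer did not offer it: never volunteer it in the SYN/ACK
    return our_shift      # peer offered it: answer, even if our shift is 0


def scaling_enabled(syn_wscale: Optional[int], synack_wscale_value: Optional[int]) -> bool:
    """Scaling applies to the connection only if both SYN segments carried the option."""
    return syn_wscale is not None and synack_wscale_value is not None
```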

    Being paranoid, and having backwards compatibility stressed very
    strongly,  unless it has changed, UNICOS does not send the Window
    Scale option unless the application has explicitly asked for it
    via a setsockopt() call.  Also, in the absence of information from
    the application, UNICOS will also turn around a received Window
    Scale option, contrary to what RFC-1323 says.  I feel that this
    is important for applications like rcp, where by the time the
    application that knows that we want to use big buffers is running,
    we are long past any chance of setting the window scale option
    to a non-zero value.

A Key Point:

    There is one point that no one has seemed to recognize:

	Negotiating a non-zero window scale value does
	*not* mean that you have to use large buffers!

    That is to say, you can negotiate a window scale of 4, and have
    your send/receive buffers set at 8K, and things will work just
    fine!  The only effect of negotiating a non-zero window scale
    option is that the granularity of the minimum amount that the
    window can advance by goes up.  Let's see, with a window scale
    value of 4 (Windows up to 1MB), that's 2^4 = 16.  So, the
    advancement of the window has to be a multiple of 16 bytes.
    Gee, that's not an issue.  (Anyone who thinks they need to
    increment the window by less than 16 bytes needs to think long
    and hard about Silly Window Syndrome...)
    
    The problem is that implementations equate TCP Window with
    Receive Buffer Space and with Window Scale.  These are three
    separate items, and there is no reason why they have to be tied
    to each other.  There are relationships between them, but they
    don't have to be in lock step!
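
The arithmetic above can be checked with a short sketch. The helper names are invented for this example; the numbers (shift of 4, 8K buffer, 16-byte granularity, windows up to about 1MB) are the ones from the message.

```python
MAX_FIELD = 0xFFFF  # the window field in the TCP header is 16 bits


def window_field(window_bytes: int, shift: int) -> int:
    """Value placed in the header: the window right-shifted by the negotiated scale."""
    return min(window_bytes >> shift, MAX_FIELD)


def window_from_field(field: int, shift: int) -> int:
    """Window the receiver of the segment reconstructs from the header field."""
    return field << shift


if __name__ == "__main__":
    shift = 4
    buffer_bytes = 8 * 1024
    field = window_field(buffer_bytes, shift)
    print("header field:", field)
    print("window seen by peer:", window_from_field(field, shift))
    print("granularity in bytes:", 1 << shift)
    print("largest window at this shift:", window_from_field(MAX_FIELD, shift))
```

A small buffer simply produces a small field value; the negotiated shift only fixes the unit in which the window is expressed.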

Summary:

    I know there are broken hosts out there.  I know that if you
    send a Window Scale in the SYN/ACK when you didn't receive it
    in the SYN it will cause them grief.  That's why 1323 says don't
    send the Window Scale if you didn't receive one.

	Does anyone know of a host that people (want to) do an
	active connect to, that can't handle a Window Scale in
	the initial SYN packet?

    If the broken hosts are mostly things like terminal servers
    that only initiate connections, then I don't see a problem with
    always sending a Window Scale in the initial SYN.

    I think that it is perfectly reasonable for the application to
    be able to directly set the Window Scale value, rather than
    having it just be automatically calculated as a side effect of
    setting a large receive buffer, which also causes your initial
    advertised window to be large.  It's the difference between
    what the application wants to be able to use, vs. what it
    wants to use right now.

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 13:49:29 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA25862 for tcp-impl-list; Wed, 30 Apr 1997 13:46:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA25832 for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 13:46:09 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id NAA01142
	for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 13:46:07 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-24)
	id <AA07588>; Wed, 30 Apr 1997 13:41:48 -0700
Date: Wed, 30 Apr 97 13:43:16 PDT
From: braden@ISI.EDU
Posted-Date: Wed, 30 Apr 97 13:43:16 PDT
Message-Id: <9704302043.AA09393@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA09393>; Wed, 30 Apr 97 13:43:16 PDT
To: tcp-impl@relay.engr.sgi.com, vjs@mica.denver.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


  *> > 
  *> > In general, we need to examine whether it is feasible to blackball any
  *> > implementation that can not swallow unrecognized options on SYN segments,
  *> 
  *> Blackball implementations?  I don't know about your situation, but the
  *> sales and support people I deal with say the funniest things whenever I
  *> say something like "that other vendor's system is broken junk; tell
  *> the customer to throw it out and get something that works."
  *> 
  *> 
  *> > or do we have to provide a global switch to disable use of anything other
  *> > than the MSS option.  If the problem is sufficiently widespread, you then
  *> > have to ship with the global switch in the conservative position, to avoid
  *> > having system administrators everywhere going nuts discovering that they have
  *> > to throw the switch.  Few, if any, will then enable use of these options.
  *> > I do hope that we can decide that this approach is not necessary.
  *> > 

Sigh.  We went around on exactly this issue 10 years ago [!!] when we
did Host Requirements.  My view, and I think the "consensus" of the
Host Requirements WG, was something like the following:

	Sure, Vernon, the market place has its realities, and we
	understand that you have to deal with it.  We expect that you
	will do whatever you have to do.  But don't ask us to make the
	patently wrong protocol choices just to ease your conscience.
	Strict conformance with the protocol spec should require
	the default configuration be the long-term-desirable
	one, e.g., to send options on SYN packets.  If your company
	chooses, for its own marketing reasons, to default your system
	to not sending SYN options, to accommodate broken systems, you
	are breaking the rules.

Matt pointed out the dilemma; there is no other way to make progress.

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 14:15:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA07195 for tcp-impl-list; Wed, 30 Apr 1997 14:13:04 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA07180 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 14:13:02 -0700
Received: from md.interlink.com (md.interlink.com [138.42.32.165]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id OAA08542
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 14:12:55 -0700
	env-from (fab@fab.md.interlink.com)
Received: from fab.md.interlink.com by md.interlink.com (5.0/SMI-SVR4)
	id AA29453; Wed, 30 Apr 1997 17:03:20 +0500
Received: by fab.md.interlink.com (5.0/SMI-SVR4)
	id AA02380; Wed, 30 Apr 1997 17:10:45 +0500
Date: Wed, 30 Apr 1997 17:10:45 +0500
From: fab@fab.md.interlink.com (Fred Bohle)
Message-Id: <9704302110.AA02380@fab.md.interlink.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re:  Failure to send window scale option with shift.cnt == 0
X-Sun-Charset: US-ASCII
Content-Length: 1268
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Dave,

>     If the broken hosts are mostly things like terminal servers
>     that only initiate connections, then I don't see a problem with
>     always sending a Window Scale in the initial SYN.

The Lexmart printer I mentioned earlier is a counter-example.  Having
to power-cycle the box is a drastic result.  So we made disabling
window scaling (and with it, timestamps) an easy configuration option.
 
>     I think that it is perfectly reasonable for the application to
>     be able to directly set the Window Scale value, rather than
>     having it just be automatically calculated as a side effect of
>     setting a large receive buffer, which also causes your initial
>     advertised window to be large.  It's the difference between
>     what the application wants to be able to use, vs. what it
>     wants to use right now.

Consider the application FTP.  Configure the application with large buffers
and window scaling enabled.  Now FTP to new and old hosts.  New hosts will
work fine.  Old hosts will not establish the session.  How can you configure
this application (FTP) to use large buffers and window scaling with the new
hosts, and still talk to the old hosts?  Even the application level is not
fine enough control for this situation.

Fred

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 14:29:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA13759 for tcp-impl-list; Wed, 30 Apr 1997 14:26:56 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA13742 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 14:26:53 -0700
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id OAA12262
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 14:26:49 -0700
	env-from (jt@mentat.com)
Received: from rock.mentat.com ([192.88.122.136]) by mentat.com (4.1/SMI-4.1)
	id AA26313; Wed, 30 Apr 97 14:21:11 PDT
Date: Wed, 30 Apr 97 14:21:11 PDT
From: jt@mentat.com (Jerry Toporek)
Message-Id: <9704302121.AA26313@mentat.com>
To: tcp-impl@relay.engr.SGI.COM, vjs@mica.denver.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> Blackball implementations?  I don't know about your situation, but the
> sales and support people I deal with say the funniest things whenever I
> say something like "that other vendor's system is broken junk; tell
> the customer to throw it out and get something that works."

Sure...  It has never worked particularly well for me either...

> Personally, I'd pick linking SACK with TCP-LW. 
>     - A peer that can handle timestamps and window shifts is likely
>       to do the right thing with SACK or other new options (including
>       ignoring it if not understood)
> 
>     - if you want selective ACK's, you probably want large windows in
>        at least one direction, and vice versa,
>       
> "Be conservative in what you send ..." is not just a good idea. 
> It's the only thing that works outside of ivory towers.

Understood...  But I would be disappointed to have to tie SACK to Large Windows.
I don't expect that applications like Web browsers would often try to use
large windows, but I would hope that they would benefit greatly from
selective ACKs.

jt


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 14:46:49 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA21527 for tcp-impl-list; Wed, 30 Apr 1997 14:43:45 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA21518 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 14:43:42 -0700
Received: from poptart.home.net (poptart.home.net [24.0.8.9]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA16487
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 14:43:41 -0700
	env-from (rja@corp.home.net)
Received: from borg.eos.home.net ([24.0.8.40]) by poptart.home.net
          (Netscape Mail Server v1.1) with ESMTP id AAA20691
          for <tcp-impl@relay.engr.SGI.COM>;
          Wed, 30 Apr 1997 14:32:41 -0700
Received: (from rja@localhost) by borg.eos.home.net (8.7.5/8.7.3) id OAA02609 for tcp-impl@relay.engr.SGI.COM; Wed, 30 Apr 1997 14:32:40 -0700 (PDT)
From: rja@corp.home.net (Ran Atkinson)
Message-Id: <970430143239.ZM2607@borg.eos.home.net>
Date: Wed, 30 Apr 1997 14:32:39 -0700
In-Reply-To: David Borman <dab@BSDI.COM>
        "Re:  Failure to send window scale option with shift.cnt == 0" (Apr 30, 15:44)
References: <199704302044.PAA08647@frantic.BSDI.COM>
X-Mailer: Z-Mail (4.0.1 13Jan97)
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Failure to send window scale option with shift.cnt == 0
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Dave Borman's implementation [where there is a system administrator
knob (in the form of a sysctl) to enable/disable particular features
that might have interoperability issues] seems like the right thing
to do for commercial vendors who have to worry about backwards
compatibility with broken products.

In particular, if the default kernel variable value is to "do
the right thing" but the system admin can turn it off if necessary,
then this encourages the deployment of new technology while still
letting folks use those systems in environments with other
systems that are broken.

Ran
rja@home.net

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 14:50:48 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA23468 for tcp-impl-list; Wed, 30 Apr 1997 14:48:38 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA23457 for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 14:48:36 -0700
Received: from alpha.xerox.com (alpha.Xerox.COM [13.1.64.93]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id OAA17621
	for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 14:48:35 -0700
	env-from (fenner@parc.xerox.com)
Received: from crevenia.parc.xerox.com ([13.2.116.11]) by alpha.xerox.com with SMTP id <17015(1)>; Wed, 30 Apr 1997 14:44:45 PDT
Received: from localhost by crevenia.parc.xerox.com with SMTP id <177486>; Wed, 30 Apr 1997 14:44:37 -0700
To: braden@isi.edu
cc: fenner@parc.xerox.com, tcp-impl@relay.engr.sgi.com
Subject: Re: Failure to send window scale option with shift.cnt == 0 
In-reply-to: Your message of "Wed, 30 Apr 97 13:31:05 PDT."
             <9704302031.AA09384@can.isi.edu> 
Date: Wed, 30 Apr 1997 14:44:32 PDT
From: Bill Fenner <fenner@parc.xerox.com>
Message-Id: <97Apr30.144437pdt.177486@crevenia.parc.xerox.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

braden@isi.edu wrote:
>So, it happily negotiates a willingness to accept options in data
>packets, and then black-holes data packets that contain options?  That's
>really neat!

Not quite; the frustrating detail is that this applies to packets going
*through* these implementations when they are acting as routers.  The
problem occurs when you make connections *through* them (e.g. every
connection you make if you're a single host at the end of a PPP line).

This is why the symptom is that you cannot exchange data with another
host that implements RFC1323 when one of these routers is in the path
and has header compression enabled; these implementations do not need
to be either endpoint of the connection (and, in fact, presumably don't
implement RFC1323 themselves so the problem would not arise if the
system were one endpoint) to cause problems.

  Bill

From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 14:51:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA23893 for tcp-impl-list; Wed, 30 Apr 1997 14:49:28 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA23872 for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 14:49:26 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA17773
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 30 Apr 1997 14:49:22 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id QAA08771;
	Wed, 30 Apr 1997 16:47:45 -0500 (CDT)
Date: Wed, 30 Apr 1997 16:47:45 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199704302147.QAA08771@frantic.BSDI.COM>
To: jt@mentat.com, tcp-impl@relay.engr.SGI.COM, vjs@mica.denver.sgi.com
Subject: Re:  Failure to send window scale option with shift.cnt == 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Understood...  But I would be disappointed to have to tie SACK to Large Windows.
> I don't expect that applications like Web browsers would often try to use
> large windows, but I would hope that they would benefit greatly from
> selective ACKs.

Repeat after me:
	Negotiating the use of the Window Scale option does *not*
	mean that you have to use large buffers and windows.

	Negotiating the use of the Window Scale option does *not*
	mean that you have to use large buffers and windows.

	...

The problem is implementations that won't allow you to send the
Window Scale option unless you set the buffers large.  There is no
reason that you can't leave your buffers small, and still negotiate
the Window Scale option.  It's strictly an implementation detail.

But this is a moot point, because using SACK is independent of using
Window Scale.  To allow SACK, you negotiate the Sack-Permitted Option
in the SYNs.
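
A minimal sketch of the point above, assuming the option encodings from
RFC 1323 (Window Scale: kind 3, length 3) and RFC 2018 (SACK-Permitted:
kind 4, length 2).  The function name build_syn_options and the NOP
padding layout are illustrative choices, not taken from any particular
stack.  It shows that a SYN can carry a Window Scale option with a shift
of 0, and that SACK-Permitted is requested independently of it:

```c
#include <stddef.h>
#include <stdint.h>

/* Option kinds; the OPT_ names are local to this sketch. */
enum { OPT_NOP = 1, OPT_WSCALE = 3, OPT_SACK_PERMITTED = 4 };

/*
 * Write SYN options into buf: always a Window Scale option (even when
 * the shift is 0), and SACK-Permitted only if want_sack is nonzero.
 * Each option is padded with NOPs to a 4-byte boundary.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t build_syn_options(uint8_t *buf, size_t buflen,
                         uint8_t wscale_shift, int want_sack)
{
    size_t n = 0;
    size_t need = 4 + (want_sack ? 4 : 0);

    if (buflen < need)
        return 0;
    if (wscale_shift > 14)          /* RFC 1323 caps the shift at 14 */
        wscale_shift = 14;

    buf[n++] = OPT_NOP;
    buf[n++] = OPT_WSCALE;
    buf[n++] = 3;                   /* option length */
    buf[n++] = wscale_shift;        /* 0 is legal: small window, scaling on */

    if (want_sack) {
        buf[n++] = OPT_NOP;
        buf[n++] = OPT_NOP;
        buf[n++] = OPT_SACK_PERMITTED;
        buf[n++] = 2;               /* option length */
    }
    return n;
}
```

Calling build_syn_options(buf, sizeof(buf), 0, 1) yields eight bytes:
a Window Scale option with shift 0 followed by SACK-Permitted, with no
dependence on the socket buffer size.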

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Wed Apr 30 17:28:03 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA22688 for tcp-impl-list; Wed, 30 Apr 1997 17:24:51 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA22673 for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 17:24:48 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id RAA24339
	for <tcp-impl@relay.engr.sgi.com>; Wed, 30 Apr 1997 17:24:47 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Wed, 30 Apr 1997 20:21:00 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Wed, 30 Apr 1997 20:21:00 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id UAA17595; Wed, 30 Apr 1997 20:22:31 -0400
Date: Wed, 30 Apr 1997 20:22:31 -0400
Message-Id: <199705010022.UAA17595@MAILSERV-2HIGH-A.FTP.COM>
To: perry@piermont.com
Subject: Re: TCP keep-alives & Reality bites 
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@relay.engr.sgi.com
Repository: mailserv-2high-a.ftp.com, [message accepted at Wed Apr 30 20:22:23 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||In other words, I did keepalives at the application layer, where,
||IMHO, they belonged.
||
||I don't understand why people expect TCP to provide this for them. One
||size does *not* fit all for these things. I needed to switch in
||seconds if I got a network outage. Most people have no such
||requirement and would be hurt by switching in seconds.

I agree completely.  Please tell all the people who are graduating from
"my first window app" to "my first winsock app" :-).

||You want your network backup program to last through a midnight router
||reload. You want your network market quote service to switch off if
||you don't get data for seconds. These are NOT the same requirement.

Yes, we who understand TCP know this; everyone who gets the Microsoft
C compiler or Win32 SDK and decides to write a TCP app doesn't...
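
To make the "one size does not fit all" point concrete, here is a
hedged sketch of an application-level liveness check in which each
application supplies its own thresholds.  All names (liveness_policy,
check_peer) and the example numbers are hypothetical; the caller is
assumed to track when it last heard from the peer and to call
check_peer() periodically:

```c
enum liveness { PEER_OK, PEER_PROBE, PEER_DEAD };

/* Per-application policy: these are the knobs TCP cannot guess. */
struct liveness_policy {
    long probe_after;   /* seconds of silence before sending a probe */
    long dead_after;    /* seconds of silence before giving up */
};

/*
 * Decide what to do about a peer last heard from at last_heard,
 * given the current time now (both in seconds).
 */
enum liveness check_peer(const struct liveness_policy *pol,
                         long last_heard, long now)
{
    long silent = now - last_heard;

    if (silent >= pol->dead_after)
        return PEER_DEAD;
    if (silent >= pol->probe_after)
        return PEER_PROBE;
    return PEER_OK;
}
```

With a quote-service policy of {1, 3} and a backup policy of
{600, 3600}, five seconds of silence makes the quote client switch
over while the backup client does nothing at all.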



From owner-tcp-impl@relay.engr.sgi.com  Thu May  1 10:28:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA01115 for tcp-impl-list; Thu, 1 May 1997 10:24:52 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA01093 for <tcp-impl@relay.engr.SGI.COM>; Thu, 1 May 1997 10:24:49 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA02731
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 1 May 1997 10:24:44 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id SAA32199; Thu, 1 May 1997 18:19:33 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wMhq8-0005FjC; Wed, 30 Apr 97 23:27 BST
Message-Id: <m0wMhq8-0005FjC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP keep-alives
To: braden@ISI.EDU
Date: Wed, 30 Apr 1997 23:27:08 +0100 (BST)
Cc: tcp-impl@relay.engr.SGI.COM, mouse@rodents.montreal.qc.ca
In-Reply-To: <9704292326.AA08253@can.isi.edu> from "braden@ISI.EDU" at Apr 29, 97 04:26:53 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Stateful router?  That is precisely why Internet routers do not have
> per-flow state [[and why making RSVP-type reservations for TCP
> connections is a bad idea]].  An idle TCP connection should tie up no
> resources except at the end systems.  (This is one reason why I think
> IPSEC is architecturally superior to application relays for firewalls.)

Sometimes it is hard to avoid. The Linux masquerading firewall stuff
really has to keep state as it maps a whole network onto one IP address
(i.e. it's not a NAT). Keepalive saves having to modify all the clients.

I guess we could do spoofed ack probes from the router but we'd still
be keeping state internally.




From owner-tcp-impl@relay.engr.sgi.com  Thu May  1 11:51:45 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA26390 for tcp-impl-list; Thu, 1 May 1997 11:47:20 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA26306 for <tcp-impl@relay.engr.sgi.com>; Thu, 1 May 1997 11:47:11 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via SMTP id LAA24415
	for <tcp-impl@relay.engr.sgi.com>; Thu, 1 May 1997 11:47:00 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-24)
	id <AA06155>; Thu, 1 May 1997 11:43:09 -0700
Date: Thu, 1 May 97 11:44:38 PDT
From: braden@ISI.EDU
Posted-Date: Thu, 1 May 97 11:44:38 PDT
Message-Id: <9705011844.AA10699@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA10699>; Thu, 1 May 97 11:44:38 PDT
To: braden@ISI.EDU, alan@lxorguk.ukuu.org.uk
Subject: Re: TCP keep-alives
Cc: tcp-impl@relay.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


  *> From alan@lxorguk.ukuu.org.uk Thu May  1 10:25:16 1997
  *> From: alan@lxorguk.ukuu.org.uk (Alan Cox)
  *> Subject: Re: TCP keep-alives
  *> To: braden@ISI.EDU
  *> Date: Wed, 30 Apr 1997 23:27:08 +0100 (BST)
  *> Cc: tcp-impl@relay.engr.SGI.COM, mouse@rodents.montreal.qc.ca
  *> In-Reply-To: <9704292326.AA08253@can.isi.edu> from "braden@ISI.EDU" at Apr 29, 97 04:26:53 pm
  *> Content-Type: text
  *> Content-Length: 664
  *> X-Lines: 15
  *> 
  *> > Stateful router?  That is precisely why Internet routers do not have
  *> > per-flow state [[and why making RSVP-type reservations for TCP
  *> > connections is a bad idea]].  An idle TCP connection should tie up no
  *> > resources except at the end systems.  (This is one reason why I think
  *> > IPSEC is architecturally superior to application relays for firewalls.)
  *> 
  *> Sometimes it is hard to avoid. The Linux masquerading firewall stuff
  *> really has to keep state as it maps a whole network onto one IP address
  *> (i.e. it's not a NAT). Keepalive saves having to modify all the clients.
  *> 

Alan,

Sorry, I don't quite understand.  You mean that you are depending upon
TCP keepalives to maintain firewall state?  How about other transport
protocols?  How about UDP-based flows?

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Fri May  2 15:06:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA10959 for tcp-impl-list; Fri, 2 May 1997 15:03:30 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA10923 for <tcp-impl@relay.engr.sgi.com>; Fri, 2 May 1997 15:03:25 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA01698
	for <tcp-impl@relay.engr.sgi.com>; Fri, 2 May 1997 15:02:59 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id WAA18499; Fri, 2 May 1997 22:58:06 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wNPWW-0005FjC; Fri, 2 May 97 22:05 BST
Message-Id: <m0wNPWW-0005FjC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP keep-alives
To: braden@ISI.EDU
Date: Fri, 2 May 1997 22:05:47 +0100 (BST)
Cc: alan@lxorguk.ukuu.org.uk, tcp-impl@relay.engr.sgi.com
In-Reply-To: <9705011844.AA10699@can.isi.edu> from "braden@ISI.EDU" at May 1, 97 11:44:38 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Sorry, I don't quite understand.  You mean that you are depending upon
> TCP keepalives to maintain firewall state?  How about other transport

Yes. It times sockets out, knowing that keepalives will avoid the timeout
if the timer is set large enough.

> protocols?  How about UDP-based flows?

UDP is time based. This works in general. The masquerade is pulling
connection-oriented stunts on a non-connection-oriented protocol. It's an
example of using TCP keepalives, nothing more. It's definitely not their
intended purpose!
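
A rough sketch of the timeout behavior described above.  This is not
the Linux masquerading code; the structure, field names, and functions
(masq_entry, masq_touch, masq_expire) are hypothetical and only
illustrate why a keepalive interval shorter than the idle timeout keeps
an entry alive:

```c
#include <stddef.h>

/* One masqueraded flow, as a hypothetical table entry. */
struct masq_entry {
    int in_use;
    unsigned short masq_port;   /* port used on the firewall's address */
    long last_seen;             /* time of the last packet, in seconds */
};

/* Any packet on the flow, including a keepalive probe, refreshes it. */
void masq_touch(struct masq_entry *e, long now)
{
    e->last_seen = now;
}

/* Free every entry idle for longer than timeout; return how many. */
size_t masq_expire(struct masq_entry *tab, size_t n, long now, long timeout)
{
    size_t expired = 0;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].in_use && now - tab[i].last_seen > timeout) {
            tab[i].in_use = 0;
            expired++;
        }
    }
    return expired;
}
```

An idle connection whose end host sends keepalives more often than
timeout survives each sweep; one without keepalives loses its mapping.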

From owner-tcp-impl@relay.engr.sgi.com  Fri May  2 15:06:59 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA11286 for tcp-impl-list; Fri, 2 May 1997 15:04:23 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA11264 for <tcp-impl@relay.engr.SGI.COM>; Fri, 2 May 1997 15:04:19 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA01930
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 2 May 1997 15:04:11 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id WAA18490; Fri, 2 May 1997 22:57:20 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wN2tf-0005FjC; Thu, 1 May 97 21:56 BST
Message-Id: <m0wN2tf-0005FjC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Failure to send window scale option with shift.cnt == 0
To: fab@fab.md.interlink.com (Fred Bohle)
Date: Thu, 1 May 1997 21:56:10 +0100 (BST)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <9704302110.AA02380@fab.md.interlink.com> from "Fred Bohle" at Apr 30, 97 05:10:45 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> >     If the broken hosts are mostly things like terminal servers
> >     that only initiate connections, then I don't see a problem with
> >     always sending a Window Scale in the initial SYN.
> 
> The Lexmart printer I mentioned earlier is a counter-example.  Having
> to power-cycle the box is a drastic result.  So we made disabling
> window scaling (and with it, timestamps) an easy configuration option.

The best policy with such equipment is to publish exploits for the problem
to security lists. Most of the time vendors fix stuff. I've only had one
major vendor fail to fix a bug that got published that way.

It used to be the case that people shrugged off crashes like that; in the
Internet of today they have to fix them. That's, purely by accident,
extremely good news for progressing TCP standards.


From owner-tcp-impl@relay.engr.sgi.com  Fri May  2 15:28:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA20323 for tcp-impl-list; Fri, 2 May 1997 15:25:02 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA20231 for <tcp-impl@relay.engr.SGI.COM>; Fri, 2 May 1997 15:24:57 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA07551
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 2 May 1997 15:24:47 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id RAA12739;
	Fri, 2 May 1997 17:22:51 -0500 (CDT)
Date: Fri, 2 May 1997 17:22:51 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199705022222.RAA12739@frantic.BSDI.COM>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: testing tools
Cc: rstevens@kohala.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Jumping back to early last month:

> From: rstevens@kohala.com (W. Richard Stevens)
> Date: Wed, 2 Apr 1997 04:57:04 -0700
> Subject: testing tools
> ...
> 
> [In Vern's message of Apr  2, 12:06am he writes:]
> > 
> > Calling all testing tools [10 min]
> > 	Need to document
> > 	Encourage development of new tools
> > 		maybe simple raw socket interface for testing
> > 			particular problems?
> > 		(but how do you get the host TCP to shut up?)
> 
> I have such a test program and I use a raw socket to write my own TCP
> segments, and then libpcap (e.g., BPF on a BSD/OS system) to read back
> the replies.  The way I shut up my TCP, to keep it from sending back
> RSTs to all the replies to all the segments that I generated, is with
> the following kernel hack:
> 
>         /*
>          * Locate pcb for segment.
>          */
> findpcb:
>         /* Following hack to let me read and write my own TCP segments
>            using BPF, without confusing kernel.  Just patch tcp_ignport
>            (at beginning of this file) to desired value. */
>         if (htons(tcp_ignport) &&
>             (htons(tcp_ignport) == ti->ti_dport ||
>              htons(tcp_ignport) == ti->ti_sport))
>                 goto drop;
> 
> I could never figure out another way to do this.
> 
> 	Rich Stevens

I've been thinking about this, and I now have some modifications to
BSD/OS to allow a hook for raw TCP sockets.  Its purpose is to be
able to receive all the packets destined for a particular TCP port
by short-circuiting tcp_input().  You would create a raw socket like:

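	int s;
	struct sockaddr_in sin;
	int port = 4242;	/* TCP port whose segments we want to see */

	/* A raw socket on the TCP protocol itself. */
	s = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);

	/* With the proposed hook, bind() selects the port to intercept. */
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_len = sizeof(sin);
	sin.sin_port = htons(port);
	bind(s, (struct sockaddr *)&sin, sizeof(sin));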

At this point, all TCP packets destined for port 4242 would be
received on this file descriptor.  Any replies would be sent just
as you would send any packet on a raw IP socket.  In addition, a
connect() could be done to restrict it to packets coming from a
particular host/port.

For the most part, I like this.  However, there are some issues:

	1) Does this seem useful?  I don't want to put something
	   into our system that no one wants or would use.

	2) The TCP checksum is verified before finding the appropriate
	   PCB, so packets with bad TCP checksums don't make it up
	   through this interface.

	3) The TCP checksum field is always set to zero.

	4) The IP portion of the packet is zeroed out except for the
	   fields that are part of the TCP pseudo header (part of
	   verifying the checksum).
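
For reference, a sketch of the check that items 2 to 4 refer to: the
standard Internet checksum over the TCP pseudo header (source address,
destination address, zero, protocol 6, TCP length) followed by the
segment.  The function names are illustrative and this is not the
BSD/OS code.  Addresses are passed as four bytes in network order:

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum as big-endian 16-bit words. */
static uint32_t sum_words(const uint8_t *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;     /* odd byte is padded with zero */
    return sum;
}

/*
 * Ones-complement checksum of pseudo header plus TCP segment.
 * With the checksum field zeroed, the result is the value to store.
 * Over a received segment, a result of 0 means the checksum verifies.
 */
uint16_t tcp_checksum(const uint8_t src[4], const uint8_t dst[4],
                      const uint8_t *seg, uint16_t seglen)
{
    uint8_t pseudo[12] = {
        src[0], src[1], src[2], src[3],
        dst[0], dst[1], dst[2], dst[3],
        0, 6,                           /* zero byte, protocol = TCP */
        (uint8_t)(seglen >> 8), (uint8_t)(seglen & 0xff)
    };
    uint32_t sum = sum_words(pseudo, sizeof(pseudo), 0);

    sum = sum_words(seg, seglen, sum);
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)(~sum & 0xffff);
}
```

Because only the pseudo-header fields of the IP header take part, a
stack that verifies first can discard the rest of the IP header before
handing the segment up, which matches item 4.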

Some other points about the implementation:
      o There is a new TCP flag, TF_RAW.  Each raw TCP socket has this
	bit set.
      o Raw TCP sockets show up in the normal netstat output, but they
	are always in the CLOSED state.
      o The TCP fields are presented in network byte order.
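
Since the fields arrive in network byte order, a test program has to
convert them itself.  A small sketch, assuming the caller passes a
pointer to the start of the TCP header (after skipping any IP header
the interface returns); struct tcp_fields and parse_tcp_header are
names made up for this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Host-order view of the fixed part of a TCP header. */
struct tcp_fields {
    uint16_t sport, dport;
    uint32_t seq, ack;
    uint8_t  hdr_len;       /* header length in bytes */
    uint8_t  flags;         /* FIN=0x01 SYN=0x02 RST=0x04 PSH=0x08 ACK=0x10 URG=0x20 */
    uint16_t window;
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Returns 0 on success, -1 if the buffer cannot hold a valid header. */
int parse_tcp_header(const uint8_t *p, size_t len, struct tcp_fields *out)
{
    if (len < 20)
        return -1;
    out->sport   = rd16(p);
    out->dport   = rd16(p + 2);
    out->seq     = rd32(p + 4);
    out->ack     = rd32(p + 8);
    out->hdr_len = (uint8_t)((p[12] >> 4) * 4);   /* data offset in words */
    out->flags   = p[13] & 0x3f;
    out->window  = rd16(p + 14);
    if (out->hdr_len < 20 || out->hdr_len > len)
        return -1;
    return 0;
}
```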

Thoughts? Comments?

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Sat May  3 03:19:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA21661 for tcp-impl-list; Sat, 3 May 1997 03:17:31 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA21653 for <tcp-impl@relay.engr.SGI.COM>; Sat, 3 May 1997 03:17:29 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id DAA12182
	for <tcp-impl@relay.engr.SGI.COM>; Sat, 3 May 1997 03:17:00 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id LAA06682; Sat, 3 May 1997 11:14:04 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wNTdI-0005FjC; Sat, 3 May 97 02:29 BST
Message-Id: <m0wNTdI-0005FjC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: testing tools
To: dab@BSDI.COM (David Borman)
Date: Sat, 3 May 1997 02:29:04 +0100 (BST)
Cc: tcp-impl@relay.engr.SGI.COM, rstevens@kohala.com
In-Reply-To: <199705022222.RAA12739@frantic.BSDI.COM> from "David Borman" at May 2, 97 05:22:51 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 	1) Does this seem useful?  I don't want to put something
> 	   into our system that no one wants or would use.

Seems like overkill

> Thoughts? Comments?

The simple firewall stuff people have contributed to BSDI can notch out
ports to order. Fixing BSD so that you can open 1 or multiple raw sockets
onto a protocol the kernel also uses would solve the 2nd half and very
cleanly tidy stuff up for other uses where this is an issue (efficient
logging tools for example)


From owner-tcp-impl@relay.engr.sgi.com  Mon May  5 10:56:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA10819 for tcp-impl-list; Mon, 5 May 1997 10:49:23 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA10727 for <tcp-impl@relay.engr.SGI.COM>; Mon, 5 May 1997 10:49:09 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA15279
	for <tcp-impl@relay.engr.SGI.COM>; Mon, 5 May 1997 10:49:08 -0700
	env-from (sparker@fstop.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id KAA22549; Mon, 5 May 1997 10:39:14 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id KAA28719; Mon, 5 May 1997 10:39:07 -0700
Received: from fstop. by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id KAA04817; Mon, 5 May 1997 10:39:09 -0700
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id KAA29835; Mon, 5 May 1997 10:36:58 -0700
Message-Id: <199705051736.KAA29835@fstop.>
From: sparker@Eng.Sun.COM
To: David Borman <dab@BSDI.COM>
cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: testing tools 
Date: Mon, 05 May 1997 10:36:57 -0700
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


- I've been thinking about this, and I now have some modifications to
- BSD/OS to allow a hook for raw TCP sockets.  Its purpose is to be
- able to receive all the packets destined for a particular TCP port
- by short-circuiting tcp_input(). ...
- 
- 	1) Does this seem useful?  I don't want to put something
- 	   into our system that no one wants or would use.

Yes.  Right now, with the sockets API, there is no way to write TCP
tests without using a separate IP address, and providing IP & ARP
support underneath the tests.  Some agreed upon raw sockets interface
that allows TCP testing, IMHO, is a *very* useful thing to come to
agreement on.

SunOS 5.x already has such an interface.

- 	2) The TCP checksum is verified before finding the appropriate
- 	   PCB, so packets with bad TCP checksums don't make it up
- 	   through this interface.

This is consistent with our current stuff.  It also verifies the checksum
and tosses bad packets before returning data to a raw TCP socket.

- 	4) The IP portion of the packet is zeroed out except for the
- 	   fields that are part of the TCP pseudo header (part of
- 	   verifying the checksum).

Our code returns just the TCP header, and off the cuff I'm tempted to
propose that if you want the IP header, you should have set the IP_HDRINCL
socket option.

- Thoughts? Comments?

We support doing a connect() as well as a bind() on such sockets.  This
allows you to be more selective as to which packets you're interested in.
This also would allow things like tests to run in parallel.  We've recently
found this useful, because as we create tests for conditions which take
TCP through its timeout conditions, we find tests needing 10 minutes to run
not uncommon.  Being able to run those tests in parallel, IMHO, is a
worthwhile win.

I think the semantics of a TCP raw socket imply that the protocol isn't
really "happening" for you, so connect, rather akin to UDP, is just
setting the address up as a short-hand.

Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Mon May  5 14:48:18 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA29035 for tcp-impl-list; Mon, 5 May 1997 14:44:39 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA29017 for <tcp-impl@relay.engr.SGI.COM>; Mon, 5 May 1997 14:44:32 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA24845
	for <tcp-impl@relay.engr.SGI.COM>; Mon, 5 May 1997 14:44:31 -0700
	env-from (sparker@fstop.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id OAA21496; Mon, 5 May 1997 14:34:37 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id OAA19746; Mon, 5 May 1997 14:34:34 -0700
Received: from fstop. by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id OAA09834; Mon, 5 May 1997 14:34:36 -0700
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id OAA00460; Mon, 5 May 1997 14:32:25 -0700
Message-Id: <199705052132.OAA00460@fstop.>
From: sparker@Eng.Sun.COM
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: testing tools 
Date: Mon, 05 May 1997 14:32:25 -0700
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Alan,

- > Thoughts? Comments?
- 
- The simple firewall stuff people have contributed to BSDI can notch out
- ports to order. Fixing BSD so that you can open 1 or multiple raw sockets
- onto a protocol the kernel also uses would solve the 2nd half and very
- cleanly tidy stuff up for other uses where this is an issue (efficient
- logging tools for example)

So you have me at a disadvantage here, since I don't know what mechanisms
you're talking about...

Speaking from the point of view of not feeling constrained by BSD's
architecture, the transport provider interface in common use on streams
systems provides a structure where you can choose, by protocol:

	Local port #
	Local IP, local port #
	Local IP, local port #, remote IP, remote port #

a binding, and this binding is unrelated to the "transport level" code
requesting the binding.
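
A minimal sketch of the three binding forms listed above, written as a
lookup table that is independent of whichever transport code asked for the
binding. The names (Binding, demux, owner) are hypothetical, and the
"most specific binding wins" rule is an assumption made for illustration;
the message does not say how competing bindings are resolved.

```python
from typing import NamedTuple, Optional


class Binding(NamedTuple):
    """One registered binding; None means 'wildcard' for that field."""
    owner: str                      # hypothetical label for the requester
    local_port: int                 # always required
    local_ip: Optional[str] = None
    remote_ip: Optional[str] = None
    remote_port: Optional[int] = None

    def matches(self, lip, lport, rip, rport):
        # A wildcard (None) field matches anything; a set field must be equal.
        return (self.local_port == lport
                and self.local_ip in (None, lip)
                and self.remote_ip in (None, rip)
                and self.remote_port in (None, rport))

    def specificity(self):
        # Count how many optional fields are pinned down.
        return sum(f is not None
                   for f in (self.local_ip, self.remote_ip, self.remote_port))


def demux(bindings, lip, lport, rip, rport):
    """Return the most specific matching binding, or None (assumed policy)."""
    candidates = [b for b in bindings if b.matches(lip, lport, rip, rport)]
    return max(candidates, key=Binding.specificity, default=None)
```

With one binding of each form registered on the same port, a packet for
the fully specified connection goes to that binding, and everything else
on the port falls back to the less specific ones.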

This is just as simple as BSD, internally, I claim.  It's just different
than BSD.

That's really the key difference--When you ask for a raw socket, you
go through a different driver than when you ask for a TCP socket.
On BSD, the IP code wants to fanout incoming packets based on the
protocol number, but presumes 'raw' is where 'everything else' goes.

In general I feel this is an opportunity to clean up shortcomings in
the semantics of sockets, after having identified a useful area in which
to enhance this.

Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Tue May  6 06:15:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA25380 for tcp-impl-list; Tue, 6 May 1997 06:13:40 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA25373 for <tcp-impl@relay.engr.SGI.COM>; Tue, 6 May 1997 06:13:36 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id GAA24410
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 6 May 1997 06:13:31 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id IAA19570;
	Tue, 6 May 1997 08:12:02 -0500 (CDT)
Date: Tue, 6 May 1997 08:12:02 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199705061312.IAA19570@frantic.BSDI.COM>
To: dab@BSDI.COM, sparker@Eng.Sun.COM
Subject: Re: testing tools
Cc: tcp-impl@relay.engr.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: sparker@Eng.Sun.COM
> Date: Mon, 05 May 1997 10:36:57 -0700
> ...
> - 	1) Does this seem useful?  I don't want to put something
> - 	   into our system that no one wants or would use.
> 
> Yes.  Right now, with the sockets API, there is no way to write TCP
> tests without using a separate IP address, and providing IP & ARP
> support underneath the tests.  Some agreed upon raw sockets interface
> that allows TCP testing, IMHO, is a *very* useful thing to come to
> agreement on.

Then perhaps what I have put together would be a good starting
point for further discussions about raw TCP access via the socket API.

> SunOS 5.x already has such an interface.

I assume you mean via the streams API?

> - 	4) The IP portion of the packet is zeroed out except for the
> - 	   fields that are part of the TCP pseudo header (part of
> - 	   verifying the checksum).
> 
> Our code returns just the TCP header, and off the cuff I'm tempted to
> propose that if you want the IP header, you should have set the IP_HDRINCL
> socket option.

Historically, IP_HDRINCL has only applied to outbound packets.
Also, all other SOCK_RAW sockets return the IP header, so that is
why I left it there (even though it is quite munged...)
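
To illustrate why the pseudo-header fields are the ones worth keeping in
the munged IP header: they are all a receiver of the raw packet needs in
order to verify the TCP checksum. The following is only a sketch; the
function names are hypothetical, and the layout is the standard IPv4 TCP
pseudo header (source address, destination address, zero byte, protocol,
TCP length), not the code in either implementation discussed here.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudo header + TCP header and data (IPv4)."""
    pseudo = struct.pack("!4s4sBBH",
                         socket.inet_aton(src_ip),
                         socket.inet_aton(dst_ip),
                         0,                     # reserved zero byte
                         socket.IPPROTO_TCP,    # protocol number 6
                         len(segment))          # TCP length
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


def verify(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """A segment carrying a correct checksum field sums to zero."""
    return tcp_checksum(src_ip, dst_ip, segment) == 0
```

Nothing else from the IP header (TTL, ID, fragment fields, header
checksum) enters the computation, which is why zeroing those fields does
not stop a test program from validating what it receives.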

> We support doing a connect() as well as a bind() on such sockets.  This

Yes, I also support connect().  I was trying to keep my initial message
from being too long, and I neglected to mention that connect() also works,
allowing you to fully specify the TCP connection that you want raw access to.

I should also note that you have to specify at least the local port
number.  If you haven't done that, you won't get anything (i.e., you
can't have a fully wildcarded raw TCP socket to get all the packets
that no one else is waiting for).

> I think the semantics of TCP raw socket imply that the protocol isn't
> really "happening" for you, so connect, rather akin to UDP, is just
> setting the address up as a short-hand.

Exactly, I took the UDP connect code and special cased it into the
TCP connect code.

			-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Tue May  6 10:12:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA27659 for tcp-impl-list; Tue, 6 May 1997 10:08:43 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA27615 for <tcp-impl@relay.engr.SGI.COM>; Tue, 6 May 1997 10:08:36 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA24140
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 6 May 1997 10:08:35 -0700
	env-from (sparker@fstop.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id JAA15429; Tue, 6 May 1997 09:58:10 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id JAA00488; Tue, 6 May 1997 09:58:08 -0700
Received: from fstop. by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id JAA19118; Tue, 6 May 1997 09:58:09 -0700
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id JAA02130; Tue, 6 May 1997 09:55:53 -0700
Message-Id: <199705061655.JAA02130@fstop.>
From: sparker@Eng.Sun.COM
To: David Borman <dab@BSDI.COM>
cc: tcp-impl@relay.engr.SGI.COM
Subject: TCP raw sockets...
Date: Tue, 06 May 1997 09:55:53 -0700
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


- > SunOS 5.x already has such an interface.
- 
- I assume you mean via the streams API?

No, actually, raw TCP sockets, although not documented, have been a part
of SunOS 5.x pretty much since 5.0.  It is something we haven't documented
because we don't want our users depending on it, in no small part because
there has been no consensus about what a SOCK_RAW IPPROTO_TCP socket's
semantics are.  If an RFC were drafted in this area, I would expect
we would document and conform to that behavior.

You certainly can get at them via the streams interfaces, too.

- Historically, IP_HDRINCL has only applied to outbound packets.
- Also, all other SOCK_RAW sockets return the IP header, so that is
- why I left it there (even though it is quite munged...)

Yes, and I suppose being inconsistent in what the option means between
different protocols probably isn't helpful.  I certainly wouldn't mind
having an option indicating whether or not I get and send the IP headers,
though.  Mostly, I see myself wanting to ignore the IP headers....

- > We support doing a connect() as well as a bind() on such sockets.  This
- 
- Yes, I also support connect().  I was trying to keep my initial message
- from being too long, and I neglected to mention that connect() also works,
- allowing you to fully specify the TCP connection that you want raw access to.

Excellent.

Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Tue May  6 15:45:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA14680 for tcp-impl-list; Tue, 6 May 1997 15:40:53 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA14667 for <tcp-impl@relay.engr.SGI.COM>; Tue, 6 May 1997 15:40:50 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA11518
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 6 May 1997 15:40:44 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lxorguk.ukuu.org.uk (Ulxorguk@localhost) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with UUCP id XAA24434; Tue, 6 May 1997 23:32:38 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wOsDb-0005FHC; Tue, 6 May 97 22:56 BST
Message-Id: <m0wOsDb-0005FHC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: testing tools
To: sparker@Eng.Sun.COM
Date: Tue, 6 May 1997 22:56:19 +0100 (BST)
Cc: alan@lxorguk.ukuu.org.uk, tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199705052132.OAA00460@fstop.> from "sparker@Eng.Sun.COM" at May 5, 97 02:32:25 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> So you have me at a disadvantage here, since I don't know what mechanisms
> you're talking about...

OK, a while ago someone wrote some nice packet filters for the BSDI OS, and
derivatives of this PD code are in FreeBSD, NetBSD, OpenBSD and Linux today.
You can use these to stop the kernel seeing a packet - but the bpf filters
on BSD and SOCK_PACKET + libpcap on Linux still see it.

> 	Local port #
> 	Local IP, local port #
> 	Local IP, local port #, remote IP, remote port #

Ditto for sockets, only you can also do Local Port, Remote IP, Remote Port
by connecting while binding to the wildcard address.

> This is just as simple as BSD, internally, I claim.  It's just different
> than BSD.

[Umm. discussion that leads onto is off topic]

> On BSD, the IP code wants to fanout incoming packets based on the
> protocol number, but presumes 'raw' is where 'everything else' goes.

Yes. On Linux it's a hash, and you can listen to raw protocols that
are used by the kernel and by the system fine. It requires fixing a small
piece of code in ping (it's a bug really - ping -f assumes no other
mysterious replies will arrive and breaks if they do). Otherwise it
causes no hiccups.

> In general I feel this is an opportunity to clean up short-comings in
> the semantics of sockets, after having identified a useful area in which
> to enhance this.

The BSD raw socket API has problems on all OSes. For one, the BSD code
assumes some headers are byte swapped and tweaked by the kernel. The Linux
one doesn't do that, nor, it seems, do some other stacks. One reason for this
is that it messes up DMA-to-user-space implementations of stacks.
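
A small sketch of what the byte-swapping difference means for a program
building its own IP header. It assumes the traditional 4.4BSD-derived
convention, in which the total-length and fragment-offset fields are
handed to the kernel in host byte order and swapped there, while the
other convention passes the header exactly as it appears on the wire.
The function name and its parameters are hypothetical.

```python
import socket
import struct
import sys


def build_ip_header(total_len, ident, frag_off, src, dst, kernel_swaps):
    """Build a 20-byte IPv4 header with a zero checksum field.

    kernel_swaps=True  -> total_len and frag_off in host byte order
                          (assumed traditional BSD raw-socket convention).
    kernel_swaps=False -> every field in network byte order (wire format).
    """
    fmt16 = "=H" if kernel_swaps else "!H"
    return b"".join([
        struct.pack("!BB", 0x45, 0),        # version 4, IHL 5, TOS 0
        struct.pack(fmt16, total_len),
        struct.pack("!H", ident),
        struct.pack(fmt16, frag_off),
        struct.pack("!BBH", 64, socket.IPPROTO_TCP, 0),  # TTL, proto, cksum
        socket.inet_aton(src),
        socket.inet_aton(dst),
    ])


wire = build_ip_header(40, 1, 0, "10.0.0.1", "10.0.0.2", kernel_swaps=False)
host = build_ip_header(40, 1, 0, "10.0.0.1", "10.0.0.2", kernel_swaps=True)
# On a little-endian machine the two buffers differ in bytes 2-3, so a
# test tool written for one convention sends a bad length under the other.
print(sys.byteorder, wire[2:4].hex(), host[2:4].hex())
```

The two buffers are identical only on a big-endian host, which is one way
such a difference can go unnoticed until the code is ported.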

Alan


From owner-tcp-impl@relay.engr.sgi.com  Tue May 13 11:32:00 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA10314 for tcp-impl-list; Tue, 13 May 1997 11:23:55 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA10259 for <tcp-impl@engr.sgi.com>; Tue, 13 May 1997 11:23:49 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id LAA28481
	for <tcp-impl@engr.sgi.com>; Tue, 13 May 1997 11:23:44 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Tue, 13 May 1997 14:19:58 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Tue, 13 May 1997 14:19:58 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id OAA08126; Tue, 13 May 1997 14:21:41 -0400
Date: Tue, 13 May 1997 14:21:41 -0400
Message-Id: <199705131821.OAA08126@MAILSERV-2HIGH-A.FTP.COM>
To: tcp-impl@engr.sgi.com
Subject: No working group charter in http://www.ietf.org/html.charters/wg-dir.html
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Repository: mailserv-2high-a.ftp.com, [message accepted at Tue May 13 14:21:40 1997]
Originating-Client: tunes.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



In trying to wind my way back in time by reading the enlightened
pearls of our past discussions I noticed that the IETF working
group page does not have an entry for this working group.

L.


From owner-tcp-impl@relay.engr.sgi.com  Tue May 13 12:00:10 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA20247 for tcp-impl-list; Tue, 13 May 1997 11:56:49 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA20238 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 13 May 1997 11:56:47 -0700
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (970321.SGI.8.8.5/950213.SGI.AUTOCF) via ESMTP id LAA16042; Tue, 13 May 1997 11:56:46 -0700 (PDT)
Message-Id: <199705131856.LAA16042@refugee.engr.sgi.com>
X-Mailer: exmh version 2.0gamma 1/27/96
To: backman@ftp.com
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: No working group charter in http://www.ietf.org/html.charters/wg-dir.html 
In-reply-to: Message from backman@ftp.com of 13 May 1997 14:21:41 EDT
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 13 May 1997 11:56:44 -0700
From: Steve Alexander <sca@refugee.engr.sgi.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

backman@ftp.com (Larry Backman) writes:
>In trying to wind my way back in time by reading the enlightened
>pearls of our past discussions I noticed that the IETF working
>group page does not have an entry for this working group.

I thought that had been corrected, but thanks.  I'll let the ADs know.

In the meantime you can get one from:
	http://reality.sgi.com/csp/tcp-impl

-- Steve



From owner-tcp-impl@relay.engr.sgi.com  Tue May 13 12:05:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA22381 for tcp-impl-list; Tue, 13 May 1997 12:03:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA22347 for <tcp-impl@relay.engr.SGI.COM>; Tue, 13 May 1997 12:03:33 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id MAA18721
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 13 May 1997 12:03:31 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Tue, 13 May 1997 14:59:38 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Tue, 13 May 1997 14:59:38 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id PAA08678; Tue, 13 May 1997 15:01:24 -0400
Date: Tue, 13 May 1997 15:01:24 -0400
Message-Id: <199705131901.PAA08678@MAILSERV-2HIGH-A.FTP.COM>
To: sca@refugee.engr.sgi.com
Subject: Re: No working group charter in http://www.ietf.org/html.charters/wg-dir.html 
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high-a.ftp.com, [message accepted at Tue May 13 15:01:18 1997]
Originating-Client: tunes.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||I thought that had been corrected, but thanks.  I'll let the ADs know.
||
||In the meantime you can get one from:
||        http://reality.sgi.com/csp/tcp-impl
||
thank you - the obvious unwinding of relay.engr.sgi.com piece by piece
did not yield an archive.

L.


From owner-tcp-impl@relay.engr.sgi.com  Thu May 22 11:52:13 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA27603 for tcp-impl-list; Thu, 22 May 1997 11:46:21 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA27546 for <tcp-impl@relay.engr.sgi.com>; Thu, 22 May 1997 11:46:15 -0700
Received: from enterprise.hybrid.com (enterprise.hybrid.com [166.117.10.2]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id LAA24769
	for <tcp-impl@relay.engr.sgi.com>; Thu, 22 May 1997 11:46:14 -0700
	env-from (subir@enterprise.hybrid.com)
Received: by enterprise.hybrid.com with SMTP (Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63)
	id <01BC66A4.F3B498A0@enterprise.hybrid.com>; Thu, 22 May 1997 11:40:30 -0700
Message-ID: <c=US%a=_%p=hybrid%l=ENTERPRISE-970522184030Z-14563@enterprise.hybrid.com>
From: Subir Varma <subir@enterprise.hybrid.com>
To: "'tcp-impl@relay.engr.sgi.com'" <tcp-impl@relay.engr.sgi.com>,
        "'ipcdn@terayon.com'" <ipcdn@terayon.com>
Cc: Subir Varma <subir@enterprise.hybrid.com>,
        Rick Enns
	 <rne@enterprise.hybrid.com>
Subject: TCP performance in asymmetric networks
Date: Thu, 22 May 1997 11:40:30 -0700
X-Mailer:  Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hello,
        Most of the emerging broadband access networks are asymmetric in
nature, i.e., high BW in the downstream direction and low BW in the
upstream direction. Examples include HFC, ADSL, satellite, and wireless
access networks. In these systems, TCP Tahoe shows better performance as
compared to TCP Reno. This problem has been analyzed by Lakshman et al.
in their paper "Window based error recovery and flow control with a slow
ack channel: A study of TCP/IP performance", available at
http://tesla.csl.uiuc.edu/~madhow/publications.html. The reason for this
is the following: A single packet loss in TCP Reno (due to buffer
overflow or link error) leads to the subsequent loss of multiple
packets. As has been well documented, this results in a timer expiry.
The chain of events that leads from the loss of a single packet to the
loss of multiple packets is unique to the way the Fast Recovery algorithm
interacts with asymmetric channels. Assuming that the initial packet
loss occurs at a window size of W, in a symmetric network, TCP Reno is
able to inflate its window size in the Fast Recovery phase and transmit
about W/2 new packets. Thus at the start of the next cycle, when the
window size is set to W/2, the network is not subjected to a burst of
packets. In asymmetric networks on the other hand, due to the slow
upstream link, TCP Reno is not able to inflate its window size
sufficiently to transmit any new packets. This results in a burst of W/2
packets into the network at the start of the next cycle, which leads to
multiple packet losses and a timer expiry.
This performance problem is of concern to us, and we would like some
feedback on what can be done to alleviate it. In particular:
- TCP SACK can solve this problem. Is the IETF close to agreeing on a
SACK-based TCP congestion control scheme? RFC 2018 defines the packet
formats for relaying the SACK information to the sender, but it seems
there are several proposals on how to modify the basic window
increase/decrease algorithms in the presence of SACK.
- What is the present state of migration of the major TCP stacks from
Tahoe to Reno? In light of the fact that more and more end systems will
be on asymmetric links in the next few years, it may be a good idea
to continue with TCP Tahoe until SACK is available.
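
The Fast Recovery arithmetic described earlier in this message can be
sketched as a packet-counting model. This is a simplification made for
illustration, not the analysis from the cited paper: the function name is
hypothetical, one packet is lost from a window of W, and the number of
duplicate ACKs that reach the sender before recovery ends is an input
that stands in for the effect of the slow upstream link.

```python
def reno_fast_recovery(window: int, dupacks_received: int):
    """Return (new_packets_sent_during_recovery, burst_at_exit).

    window           -- W, packets outstanding when the single loss occurs
    dupacks_received -- duplicate ACKs seen by the sender before the
                        retransmission is acknowledged (assumed input)
    """
    if dupacks_received < 3:
        raise ValueError("fast retransmit needs three duplicate ACKs")

    ssthresh = window // 2
    cwnd = ssthresh + 3        # window after the third duplicate ACK
    outstanding = window       # sender still counts the whole old window
    new_packets = 0

    for _ in range(dupacks_received - 3):
        cwnd += 1              # inflate by one packet per extra dupack
        if cwnd > outstanding: # room for one new packet
            outstanding += 1
            new_packets += 1

    # The ACK for the retransmission ends recovery: the window deflates to
    # ssthresh and only the packets sent during recovery remain in flight.
    cwnd = ssthresh
    burst = max(0, cwnd - new_packets)
    return new_packets, burst


print(reno_fast_recovery(32, 31))   # all W-1 duplicate ACKs arrive
print(reno_fast_recovery(32, 10))   # few duplicate ACKs arrive in time
```

Under these assumptions, with W = 32 the symmetric case sends 15 new
packets during recovery and leaves almost no burst at exit, while the
case with only 10 duplicate ACKs sends none and releases 16 packets
back to back at the start of the next cycle.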

Regards,
Subir Varma

Hybrid Networks,
Cupertino, CA
www.hybrid.com


From owner-tcp-impl@relay.engr.sgi.com  Thu May 22 12:58:47 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA14038 for tcp-impl-list; Thu, 22 May 1997 12:53:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA14021 for <tcp-impl@relay.engr.SGI.COM>; Thu, 22 May 1997 12:53:04 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA17616
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 22 May 1997 12:53:02 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (brookfield.ans.net [204.148.1.1]) by brookfield.ans.net (8.7.5/8.7.3) with ESMTP id PAA04093; Thu, 22 May 1997 15:49:07 -0400 (EDT)
Message-Id: <199705221949.PAA04093@brookfield.ans.net>
To: Subir Varma <subir@enterprise.hybrid.com>
cc: "'tcp-impl@relay.engr.sgi.com'" <tcp-impl@relay.engr.SGI.COM>,
        "'ipcdn@terayon.com'" <ipcdn@terayon.com>,
        Rick Enns <rne@enterprise.hybrid.com>
Reply-To: curtis@ans.net
Subject: Re: TCP performance in asymmetric networks 
In-reply-to: Your message of "Thu, 22 May 1997 11:40:30 PDT."
             <c=US%a=_%p=hybrid%l=ENTERPRISE-970522184030Z-14563@enterprise.hybrid.com> 
Date: Thu, 22 May 1997 15:49:07 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <c=US%a=_%p=hybrid%l=ENTERPRISE-970522184030Z-14563@enterprise.hybrid.com>,
Subir Varma writes:
> Hello,
>         Most of the emerging broadband access networks are asymmetric in
> nature, i.e., high BW in the downstream direction and low BW in the
> upstream direction. Examples include HFC, ADSL, Satellite, Wireless
> access networks. In these systems, TCP Tahoe shows better performance as
> compared to TCP Reno. This problem has been analyzed by Lakshman et.al.
> in their paper "Window based error recovery and flow control with a slow
> ack channel: A study of TCP/IP performance", available at
> http://tesla.csl.uiuc.edu/~madhow/publications.html. The reason for this
...

Put RED at the bottleneck and this becomes a non-issue and then Reno
performs better.

There is a fix for TCP Reno that (me thinks) is now making it into the
BSD variants (and so should be in SysV variants in 3-5 years and
pee-cees in 5-10?).  With this Reno should also perform better.

SACK also fixes this.

> - What is the present state of migration of the major TCP stacks from
> Tahoe to Reno? In light of the fact that more and more end systems will
> be at the asymmetric links in the next few years, it may be a good idea
> to continue with TCP Tahoe until SACK is available.

That's sort of off topic, but...  An off the cuff (also somewhat glib
and possibly wrong) estimate is: workstations - mostly Reno already
though Solaris is something completely different (and a very poor
performer), PCs and Macs - mostly still trying to figure out what fast
retransmit does and why they need it (some notable exceptions, I think
ftp software is well beyond this).  What matters is what the sender
does, so for web access it matters what the server does.

So IMO if you do Reno, do the "newreno" fix until you can do SACK
rather than fall back to Tahoe.

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Thu May 22 13:15:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA17225 for tcp-impl-list; Thu, 22 May 1997 13:09:14 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA17118 for <tcp-impl@relay.engr.SGI.COM>; Thu, 22 May 1997 13:08:58 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA20928
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 22 May 1997 13:08:07 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id UAA29504; Thu, 22 May 1997 20:50:15 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wUdz2-0005GcC; Thu, 22 May 97 20:57 BST
Message-Id: <m0wUdz2-0005GcC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP performance in asymmetric networks
To: subir@enterprise.hybrid.com (Subir Varma)
Date: Thu, 22 May 1997 20:57:07 +0100 (BST)
Cc: tcp-impl@relay.engr.SGI.COM, ipcdn@terayon.com,
        subir@enterprise.hybrid.com, rne@enterprise.hybrid.com
In-Reply-To: <c=US%a=_%p=hybrid%l=ENTERPRISE-970522184030Z-14563@enterprise.hybrid.com> from "Subir Varma" at May 22, 97 11:40:30 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> - What is the present state of migration of the major TCP stacks from
> Tahoe to Reno? In light of the fact that more and more end systems will
> be at the asymmetric links in the next few years, it may be a good idea
> to continue with TCP Tahoe until SACK is available.

Please don't blindly assume Tahoe and Reno are the only options. Neither
of them has good delayed ack timing behaviour, for example.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Thu May 22 15:44:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA26914 for tcp-impl-list; Thu, 22 May 1997 15:36:37 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA26905 for <tcp-impl@relay.engr.sgi.com>; Thu, 22 May 1997 15:36:35 -0700
Received: from fw.com21.com (fw.com21.com [207.33.62.21]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA02628
	for <tcp-impl@relay.engr.sgi.com>; Thu, 22 May 1997 15:36:33 -0700
	env-from (nichols@com21.com)
Received: by fw.com21.com; id SAA11126; Thu, 22 May 1997 18:29:50 -0400 (EDT)
Received: from terra.com21.com(140.174.223.21) by fw.com21.com via smap (3.2)
	id xma011095; Thu, 22 May 97 18:29:38 -0400
Received: from walnut.com21.com (walnut.com21.com [140.174.223.120]) by terra.com21.com (8.8.5/8.6.5) with SMTP id PAA06854; Thu, 22 May 1997 15:33:31 -0700 (PDT)
Received: by walnut.com21.com (SMI-8.6/SMI-SVR4)
	id PAA01329; Thu, 22 May 1997 15:27:01 -0700
Date: Thu, 22 May 1997 15:27:01 -0700
From: nichols@com21.com (Kathleen Nichols)
Message-Id: <199705222227.PAA01329@walnut.com21.com>
To: tcp-impl@relay.engr.sgi.com, ipcdn@Terayon.COM,
        subir@enterprise.hybrid.com
Subject: Re: TCP performance in asymmetric networks
Cc: rne@enterprise.hybrid.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> From subir@enterprise.hybrid.com Thu May 22 13:04:42 1997

>         Most of the emerging broadband access networks are asymmetric in
> nature, i.e., high BW in the downstream direction and low BW in the
> upstream direction. Examples include HFC, ADSL, Satellite, Wireless
> access networks. In these systems, TCP Tahoe shows better performance as
> compared to TCP Reno. This problem has been analyzed by Lakshman et.al.
> in their paper "Window based error recovery and flow control with a slow
> ack channel: A study of TCP/IP performance", available at
> http://tesla.csl.uiuc.edu/~madhow/publications.html. The reason for this

Subir,
Have you actually observed the problems mentioned in this paper in your
cable data systems? I took a look at that paper and there seem to me
to be some assumptions made about the underlying MAC that would be
a poor design. Lakshman et al state that a solution is to control 
access to the upstream link. Com21's systems do this and it is
certainly possible within the MCNS spec. It seems like most of the
buffer overflow problems could be solved with sufficiently large buffers.
The asymmetry problem appears somewhat overblown, too, as the downstream
bandwidth is the limiter for two-way cable modems and the telco return
is only return limited if there's no sharing of the downstream. Admittedly,
there are still a lot of unknowns about cable data system performance,
but we haven't seen this sort of thing in detailed simulations or in
packet traces on our systems. I'd be inclined to try to get the MAC
designed right before trying to change the implementation of a protocol
that runs on a lot more kinds of systems.

	Kathie
	nichols@com21.com

From owner-tcp-impl@relay.engr.sgi.com  Thu May 22 16:45:05 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA14903 for tcp-impl-list; Thu, 22 May 1997 16:41:08 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA14885 for <tcp-impl@relay.engr.SGI.COM>; Thu, 22 May 1997 16:41:05 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA17964
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 22 May 1997 16:41:03 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Thu, 22 May 1997 19:37:16 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Thu, 22 May 1997 19:37:16 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id TAA15560; Thu, 22 May 1997 19:39:10 -0400
Date: Thu, 22 May 1997 19:39:10 -0400
Message-Id: <199705222339.TAA15560@MAILSERV-2HIGH-A.FTP.COM>
To: curtis@ans.net
Subject: Re: TCP performance in asymmetric networks 
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: subir@enterprise.hybrid.com, tcp-impl@relay.engr.SGI.COM,
        ipcdn@terayon.com, rne@enterprise.hybrid.com
Repository: mailserv-2high-a.ftp.com, [message accepted at Thu May 22 19:39:08 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||Put RED at the bottleneck and this becomes a non-issue and then Reno
||performs better.
||
||There is a fix for TCP Reno that (me thinks) is now making it into the
||BSD variants (and so should be in SysV variants in 3-5 years and
||pee-cees in 5-10?).  With this Reno should also perform better.
||
||SACK also fixes this.

ahem - our Win95 stack is 4.4BSD based, has SACK as well as Fast Recovery
and RFC 1323 big windows.  Of course our Win16 and DOS stacks, originally
from the PCIP code base, don't have any of this and are further limited by
only being able to queue one out-of-sequence packet.

As for PCs - it seems to me that the vendor pool has dwindled to far fewer
than 2-3 years back...evolution in action or perhaps monopoly in action :-)
(fish, fish, as I troll for flame mail from the upper left coast :-))

||That's sort of off topic, but...  An off the cuff (also somewhat glib
||and possibly wrong) estimate is: workstations - mostly Reno already
||though Solaris is something completely different (and a very poor
||performer), PCs and Macs - mostly still trying to figure out what fast
||retransmit does and why they need it (some notable exceptions, I think
||ftp software is well beyond this).  What matters is what the sender
||does, so for web access it matters what the server does.

:-) :-)  Yup - we have fast rexmit; but the fun part of setting up a
good testbed and diddling w/ slow start, rexmit, large window, SACK remains...




From owner-tcp-impl@relay.engr.sgi.com  Thu May 22 17:14:29 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA22201 for tcp-impl-list; Thu, 22 May 1997 17:08:22 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA22190 for <tcp-impl@relay.engr.SGI.COM>; Thu, 22 May 1997 17:08:19 -0700
Received: from doggate.microsoft.com (doggate.microsoft.com [131.107.2.63]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id RAA26217
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 22 May 1997 17:08:18 -0700
	env-from (henrysa@EXCHANGE.MICROSOFT.com)
Received: by DOGGATE with Internet Mail Service (5.0.1457.3)
	id <L2F3ZVNV>; Thu, 22 May 1997 17:06:46 -0700
Message-ID: <7D9A01DBBFD5CF11AD0F0000F8411F8A5E2877@ROADKILL>
From: "Henry Sanders (Exchange)" <henrysa@EXCHANGE.MICROSOFT.com>
To: curtis@ans.net, "'backman@ftp.com'" <backman@ftp.com>
Cc: subir@enterprise.hybrid.com, tcp-impl@relay.engr.SGI.COM,
        ipcdn@terayon.com, rne@enterprise.hybrid.com
Subject: RE: TCP performance in asymmetric networks 
Date: Thu, 22 May 1997 17:06:45 -0700
X-Priority: 3
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.0.1457.3)
Content-Type: text/plain
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> ahem - our Win95 stack is 4.4 Bsd based, has SACK as well as Fast recovery
> and 1323 big window.  Of course our Win16 and DOS stacks, originally
> from PCIP code base don't have any of this and are further limited by only
> being able to queue one out of sequence packet.
> 
FWIW, fast retransmit is shipping in NT. Window scaling, SACK, etc will
ship in the next versions of NT and Win95.

> As for PC's - it seems to me that the vendor pool has dwindled to far
> fewer than 2-3 years back...evolution in action or perhaps monopoly in
> action :-)
> (fish, fish, as I troll for flame mail from the upper left coast :-))
> 
Sorry Larry, I know you too well, you'll have to try harder. You could
try bringing up Winsock 1.1 FD_CLOSE behavior.... :)

Henry



From owner-tcp-impl@relay.engr.sgi.com  Fri May 23 09:54:21 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA06439 for tcp-impl-list; Fri, 23 May 1997 09:52:45 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA06397 for <tcp-impl@relay.engr.sgi.com>; Fri, 23 May 1997 09:52:39 -0700
Received: from enterprise.hybrid.com (enterprise.hybrid.com [166.117.10.2]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id JAA14263
	for <tcp-impl@relay.engr.sgi.com>; Fri, 23 May 1997 09:52:38 -0700
	env-from (subir@enterprise.hybrid.com)
Received: by enterprise.hybrid.com with SMTP (Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63)
	id <01BC675E.5DEE0D00@enterprise.hybrid.com>; Fri, 23 May 1997 09:47:46 -0700
Message-ID: <c=US%a=_%p=hybrid%l=ENTERPRISE-970523164744Z-14991@enterprise.hybrid.com>
From: Subir Varma <subir@enterprise.hybrid.com>
To: Subir Varma <subir@enterprise.hybrid.com>,
        "'tcp-impl@relay.engr.sgi.com'" <tcp-impl@relay.engr.sgi.com>,
        "'ipcdn@Terayon.COM'" <ipcdn@Terayon.COM>,
        "'nichols@com21.com'" <nichols@com21.com>
Cc: Rick Enns <rne@enterprise.hybrid.com>
Subject: RE: TCP performance in asymmetric networks
Date: Fri, 23 May 1997 09:47:44 -0700
X-Mailer:  Microsoft Exchange Server Internet Mail Connector Version 4.0.994.63
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Kathie,
           You have raised a number of questions, some of which I answer
below, but you have not covered the question of general interest, namely
the difference in performance between Reno and Tahoe in asymmetric
networks.
>
>Subir,
>Have you actually observed the problems mentioned in this paper in your
>cable data systems? 

As you point out, this paper assumes a rather poorly designed system; I
suspect the authors did so to make their analysis easier. Our system has
been designed for large deployments taking real world constraints into
account, and includes a number of optimizations that help to mitigate
these kinds of problems.

>I took a look at that paper and there seem to me
>to be some assumptions made about the underlying MAC that would be
>a poor design. Lakshman et al state that a solution is to control 
>access to the upstream link. Com21's systems do this and it is
>certainly possible within the MCNS spec. 

The Hybrid system also controls access to the upstream link. It should
be clear from reading the paper that any vendor who does not do so will
have a poorly performing system.

>It seems like most of the
>buffer overflow problems could be solved with sufficiently large buffers.

If this were always the case, then there would be no need for congestion
control at all ;-). There is no way to avoid buffer overflows in real
systems.

>The asymmetry problem appears somewhat overblown, too, as the downstream
>bandwidth is the limiter for two-way cable modems and the telco return
>is only return limited if there's no sharing of the downstream. 

Our studies show that the downstream is the limiting factor in both types
of systems.

>Admittedly,
>there are still a lot of unknowns about cable data system performance,
>but we haven't seen this sort of thing in detailed simulations or in
>packet traces on our systems. 

This depends on how extensive these tests have been.

>I'd be inclined to try to get the MAC
>designed right before trying to change the implementation of a protocol
>that runs on a lot more kinds of systems.

I still think that the problem pointed out in the paper is a real one.
It is true that the authors have oversimplified the system so that it
can be conveniently analyzed, but this is a common technique among
academics. The problem may not occur if there are only a few users in
the system, but as soon as there are a sufficiently large number of
users, buffer shortages will occur, and TCP congestion control will kick
in. I would also like to re-focus the discussion on the congestion
control scheme, rather than specific vendor implementations.

>	Kathie
>	nichols@com21.com

Subir Varma
Hybrid Networks.

From owner-tcp-impl@relay.engr.sgi.com  Fri May 23 13:05:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA28516 for tcp-impl-list; Fri, 23 May 1997 12:58:46 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA28500 for <tcp-impl@relay.engr.SGI.COM>; Fri, 23 May 1997 12:58:44 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA10997
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 23 May 1997 12:57:41 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@guru-transit1-336.swansea.cymru.net [163.164.160.20]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id UAA06814; Fri, 23 May 1997 20:52:13 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wUpAC-0005FdC; Fri, 23 May 97 08:53 BST
Message-Id: <m0wUpAC-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP performance in asymmetric networks
To: henrysa@EXCHANGE.MICROSOFT.com (Henry Sanders)
Date: Fri, 23 May 1997 08:53:24 +0100 (BST)
Cc: curtis@ans.net, backman@ftp.com, subir@enterprise.hybrid.com,
        tcp-impl@relay.engr.SGI.COM, ipcdn@terayon.com,
        rne@enterprise.hybrid.com
In-Reply-To: <7D9A01DBBFD5CF11AD0F0000F8411F8A5E2877@ROADKILL> from "Henry Sanders" at May 22, 97 05:06:45 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> FWIW, fast retransmit is shipping in NT. Window scaling, SACK, etc will
> ship in the next versions of NT and Win95.

And slow start. Surely you aren't going to release window scaling, large
windows and no slow start onto the net ?

Alan


From owner-tcp-impl@relay.engr.sgi.com  Tue May 27 08:50:45 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA00624 for tcp-impl-list; Tue, 27 May 1997 08:48:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA00594 for <tcp-impl@relay.engr.SGI.COM>; Tue, 27 May 1997 08:48:09 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA19539
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 27 May 1997 08:48:07 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id LAA12257; Tue, 27 May 1997 11:45:06 -0400 (EDT)
Message-Id: <199705271545.LAA12257@brookfield.ans.net>
To: "Henry Sanders (Exchange)" <henrysa@EXCHANGE.MICROSOFT.com>
cc: curtis@ans.net, "'backman@ftp.com'" <backman@ftp.com>,
        subir@enterprise.hybrid.com, tcp-impl@relay.engr.SGI.COM,
        ipcdn@terayon.com, rne@enterprise.hybrid.com
Reply-To: curtis@ans.net
Subject: Re: TCP performance in asymmetric networks 
In-reply-to: Your message of "Thu, 22 May 1997 17:06:45 PDT."
             <7D9A01DBBFD5CF11AD0F0000F8411F8A5E2877@ROADKILL> 
Date: Tue, 27 May 1997 11:45:05 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <7D9A01DBBFD5CF11AD0F0000F8411F8A5E2877@ROADKILL>, "Henry Sanders (E
xchange)" writes:
> > ahem - our Win95 stack is 4.4BSD based, has SACK as well as Fast
> > recovery and 1323 big window.  Of course our Win16 and DOS stacks,
> > originally from PCIP code base, don't have any of this and are further
> > limited by only being able to queue one out-of-sequence packet.
> > 
> FWIW, fast retransmit is shipping in NT. Window scaling, SACK, etc will
> ship in the next versions of NT and Win95.


Henry,

You didn't mention fast recovery.  Does that mean you are essentially
shipping a Tahoe-like TCP with NT and that Win95 and older NTs (most
currently deployed) are on a par with BSD TCP circa 1989?  At what
version did NT get fast recovery?
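
The Tahoe/Reno distinction behind this question can be sketched roughly as
below. This is only an illustration based on the description in RFC 2001,
not any vendor's code; the function name, the MSS value and the use of
min(cwnd, rwnd) as "the current window" are assumptions made for the
example.

```python
MSS = 1460  # bytes; illustrative segment size (assumption)


def on_third_dupack(cwnd, rwnd, variant):
    """Return (ssthresh, cwnd) right after the third duplicate ACK.

    Both variants retransmit the missing segment (fast retransmit);
    they differ only in what happens to cwnd afterwards.
    """
    # Half the window in use, but never less than two segments.
    ssthresh = max(min(cwnd, rwnd) // 2, 2 * MSS)
    if variant == "tahoe":
        # Fast retransmit only: fall back to slow start from one segment.
        return ssthresh, MSS
    if variant == "reno":
        # Fast recovery: continue from half the window, plus three segments
        # for the three duplicate ACKs already received.
        return ssthresh, ssthresh + 3 * MSS
    raise ValueError(f"unknown variant: {variant}")
```

With a 16-segment window, the Tahoe-like sender restarts from one segment,
while the Reno-like sender continues from 8 + 3 = 11 segments.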

Of course, the next version fixes everything (or is it the one after
that:).  I'd just like clarification on what NT and Win95 do now.

Thanks,

Curtis


From owner-tcp-impl@relay.engr.sgi.com  Tue May 27 10:22:58 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA26447 for tcp-impl-list; Tue, 27 May 1997 10:21:09 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA26430 for <tcp-impl@relay.engr.SGI.COM>; Tue, 27 May 1997 10:21:06 -0700
Received: from doggate.microsoft.com (doggate.microsoft.com [131.107.2.63]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id KAA15434
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 27 May 1997 10:21:06 -0700
	env-from (henrysa@EXCHANGE.MICROSOFT.com)
Received: by DOGGATE with Internet Mail Service (5.0.1457.3)
	id <L4NWBQ2F>; Tue, 27 May 1997 10:19:01 -0700
Message-ID: <7D9A01DBBFD5CF11AD0F0000F8411F8A5E288F@ROADKILL>
From: "Henry Sanders (Exchange)" <henrysa@EXCHANGE.MICROSOFT.com>
To: "'alan@lxorguk.ukuu.org.uk'" <alan@lxorguk.ukuu.org.uk>
Cc: curtis@ans.net, backman@ftp.com, subir@enterprise.hybrid.com,
        tcp-impl@relay.engr.SGI.COM, ipcdn@terayon.com,
        rne@enterprise.hybrid.com
Subject: RE: TCP performance in asymmetric networks
Date: Tue, 27 May 1997 10:18:39 -0700
X-Priority: 3
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.0.1457.3)
Content-Type: text/plain
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> And slow start. Surely you aren't going to release window scaling, large
> windows and no slow start onto the net ?
> 
Slow start has always been implemented in all versions of our NT/Win95
TCP stacks. 

Henry



From owner-tcp-impl@relay.engr.sgi.com  Tue May 27 12:26:58 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA04767 for tcp-impl-list; Tue, 27 May 1997 12:24:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA04761 for <tcp-impl@relay.engr.SGI.COM>; Tue, 27 May 1997 12:24:52 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA25663
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 27 May 1997 12:24:42 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id PAA13195; Tue, 27 May 1997 15:19:28 -0400 (EDT)
Message-Id: <199705271919.PAA13195@brookfield.ans.net>
To: "Henry Sanders (Exchange)" <henrysa@EXCHANGE.MICROSOFT.com>
cc: "'alan@lxorguk.ukuu.org.uk'" <alan@lxorguk.ukuu.org.uk>, curtis@ans.net,
        backman@ftp.com, subir@enterprise.hybrid.com,
        tcp-impl@relay.engr.SGI.COM, ipcdn@terayon.com,
        rne@enterprise.hybrid.com
Reply-To: curtis@ans.net
Subject: Re: TCP performance in asymmetric networks 
In-reply-to: Your message of "Tue, 27 May 1997 10:18:39 PDT."
             <7D9A01DBBFD5CF11AD0F0000F8411F8A5E288F@ROADKILL> 
Date: Tue, 27 May 1997 15:19:27 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <7D9A01DBBFD5CF11AD0F0000F8411F8A5E288F@ROADKILL>, "Henry Sanders (E
xchange)" writes:
> > And slow start. Surely you aren't going to release window scaling,
> > large windows and no slow start onto the net ?
> > 
> Slow start has always been implemented in all versions of our NT/Win95
> TCP stacks. 
> 
> Henry


Henry,

Are you claiming that all versions of NT and Win95 implement TCP slow
start and TCP congestion avoidance in conformance to RFC 2001?  Or are
you just stating that some form of slow start was implemented, but not
congestion avoidance?

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Tue May 27 15:30:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA21408 for tcp-impl-list; Tue, 27 May 1997 15:25:31 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA20257 for <tcp-impl@relay.engr.SGI.COM>; Tue, 27 May 1997 15:21:51 -0700
Received: from doggate.microsoft.com (doggate.microsoft.com [131.107.2.63]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id PAA25654
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 27 May 1997 15:21:50 -0700
	env-from (henrysa@EXCHANGE.MICROSOFT.com)
Received: by DOGGATE with Internet Mail Service (5.0.1457.3)
	id <LYZKNB4C>; Tue, 27 May 1997 15:19:39 -0700
Message-ID: <7D9A01DBBFD5CF11AD0F0000F8411F8A5E28A0@ROADKILL>
From: "Henry Sanders (Exchange)" <henrysa@EXCHANGE.MICROSOFT.com>
To: "'curtis@ans.net'" <curtis@ans.net>
Cc: "'alan@lxorguk.ukuu.org.uk'" <alan@lxorguk.ukuu.org.uk>, backman@ftp.com,
        subir@enterprise.hybrid.com, tcp-impl@relay.engr.SGI.COM,
        ipcdn@terayon.com, rne@enterprise.hybrid.com
Subject: RE: TCP performance in asymmetric networks 
Date: Tue, 27 May 1997 15:19:42 -0700
X-Priority: 3
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.0.1457.3)
Content-Type: text/plain
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Are you claiming that all versions of NT and Win95 implement TCP slow
> start and TCP congestion avoidance in conformance to RFC 2001?  Or are
> you just stating that some form of slow start was implemented, but not
> congestion avoidance?
> 
No, the TCPs in NT and Win95 implement slow start and congestion
avoidance. To the best of my knowledge the implementation conforms to
2001 in these areas.  
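
For reference, the slow start and congestion avoidance rules being discussed
can be sketched roughly as follows. This is a reading of RFC 2001 written
for illustration, not the NT or Win95 code; the function names and the MSS
value are assumptions.

```python
MSS = 1460  # bytes; illustrative segment size (assumption)


def on_new_ack(cwnd, ssthresh):
    """Grow cwnd when an ACK for new data arrives."""
    if cwnd <= ssthresh:
        # Slow start: one segment per ACK, i.e. roughly doubling per RTT.
        return cwnd + MSS
    # Congestion avoidance: about one segment per RTT in total.
    return cwnd + max(MSS * MSS // cwnd, 1)


def on_timeout(cwnd, rwnd):
    """Return (ssthresh, cwnd) after a retransmission timeout."""
    # Remember half the window in use, but at least two segments.
    ssthresh = max(min(cwnd, rwnd) // 2, 2 * MSS)
    # Restart from one segment; slow start runs until cwnd passes ssthresh.
    return ssthresh, MSS
```

A stack that implements only the first branch of on_new_ack has "some form
of slow start" but no congestion avoidance, which is the distinction the
question above is drawing.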

In answer to your previous mail, both fast recovery and fast retransmit
are available in NT 4.0 as of service pack 2 but are not yet available
for Win95. 

Henry



From owner-tcp-impl@relay.engr.sgi.com  Wed May 28 07:26:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA28395 for tcp-impl-list; Wed, 28 May 1997 07:22:37 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA28380 for <tcp-impl@engr.sgi.com>; Wed, 28 May 1997 07:22:30 -0700
Received: from mailhub.axion.bt.co.uk (mailhub.axion.bt.co.uk [132.146.5.4]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id HAA01372
	for <tcp-impl@engr.sgi.com>; Wed, 28 May 1997 07:22:19 -0700
	env-from (martin.tatham@bt-sys.bt.co.uk)
Received: from rambo.futures.bt.co.uk by mailhub.axion.bt.co.uk with SMTP (PP); Wed, 28 May 1997 11:02:54 +0100
Received: from maczebedee (actually macsmtp.futures.bt.co.uk) by rambo.futures.bt.co.uk with SMTP (PP);
          Wed, 28 May 1997 11:02:44 +0100
Message-ID: <n1347305251.71043@maczebedee>
Date: 28 May 1997 11:03:12 U
From: Martin Tatham <martin.tatham@bt-sys.bt.co.uk>
Subject: Slow-start after idle implementations
To: tcp-impl <tcp-impl@engr.sgi.com>
X-Mailer: Mail*Link SMTP for Quarterdeck Mail; Version 4.0.0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi,

There's been a bit of discussion on this list regarding slow-start after idle
- I would like to know which TCP implementations do slow-start after idle and
which do not. Does anyone have that information to hand?
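
To make clear what behaviour is being asked about: slow start after idle
means collapsing cwnd before sending if the connection has been quiet for a
while. A minimal sketch, assuming the idle threshold is one retransmission
timeout (as in BSD-derived stacks, to my understanding) and using
hypothetical names:

```python
def cwnd_before_send(cwnd, idle_time, rto, mss=1460):
    """Return the cwnd to use for the next transmission.

    idle_time and rto are in seconds. If nothing has been received for at
    least one RTO, the ACK clock is assumed lost and the sender restarts
    from a single segment instead of bursting a full window.
    """
    if idle_time >= rto:
        return mss
    return cwnd
```

An implementation without this check would send up to the old cwnd
back-to-back after the idle period.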

Thanks,

Martin
------------------------------------------------------
Martin Tatham             Tel: +44-1473-642498
B29 Room 129              Fax: +44-1473-649421
BT Laboratories           martin.tatham@bt-sys.bt.co.uk
------------------------------------------------------
________________________________________________________
Notice: This contribution is the personal view of the author and does not
 necessarily reflect the technical nor commercial direction of British 
Telecommunications plc.
________________________________________________________



From owner-tcp-impl@relay.engr.sgi.com  Wed May 28 07:38:55 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA01289 for tcp-impl-list; Wed, 28 May 1997 07:34:18 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA01245 for <tcp-impl@relay.engr.SGI.COM>; Wed, 28 May 1997 07:34:11 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA06087
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 28 May 1997 07:34:10 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id KAA02799; Wed, 28 May 1997 10:27:59 -0400 (EDT)
Message-Id: <199705281427.KAA02799@brookfield.ans.net>
To: "Henry Sanders (Exchange)" <henrysa@EXCHANGE.MICROSOFT.com>
cc: "'curtis@ans.net'" <curtis@ans.net>,
        "'alan@lxorguk.ukuu.org.uk'" <alan@lxorguk.ukuu.org.uk>,
        backman@ftp.com, subir@enterprise.hybrid.com,
        tcp-impl@relay.engr.SGI.COM, ipcdn@terayon.com,
        rne@enterprise.hybrid.com
Reply-To: curtis@ans.net
Subject: Re: TCP performance in asymmetric networks 
In-reply-to: Your message of "Tue, 27 May 1997 15:19:42 PDT."
             <7D9A01DBBFD5CF11AD0F0000F8411F8A5E28A0@ROADKILL> 
Date: Wed, 28 May 1997 10:27:58 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <7D9A01DBBFD5CF11AD0F0000F8411F8A5E28A0@ROADKILL>, "Henry Sanders (E
xchange)" writes:
> > Are you claiming that all versions of NT and Win95 implement TCP slow
> > start and TCP congestion avoidance in conformance to RFC 2001?  Or are
> > you just stating that some form of slow start was implemented, but not
> > congestion avoidance?
> > 
> No, the TCPs in NT and Win95 implement slow start and congestion
> avoidance. To the best of my knowledge the implementation conforms to
> 2001 in these areas.  
> 
> In answer to your previous mail, both fast recovery and fast retransmit
> are available in NT 4.0 as of service pack 2 but are not yet available
> for Win95. 
> 
> Henry


Thanks for the clarifications,

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Fri May 30 14:36:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA23342 for tcp-impl-list; Fri, 30 May 1997 14:34:39 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA23330 for <tcp-impl@relay.engr.SGI.COM>; Fri, 30 May 1997 14:34:37 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA09912
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 30 May 1997 14:34:36 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id OAA16650; Fri, 30 May 1997 14:24:42 -0700 (PDT)
Message-Id: <199705302124.OAA16650@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: dissertation available on end-to-end Internet dynamics
Date: Fri, 30 May 1997 14:24:42 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

[My apologies to those of you who will receive multiple copies of this
announcement from other mailing lists ...]

My dissertation on "Measurements and Analysis of End-to-End Internet Dynamics"
is now available from:

	ftp://ftp.ee.lbl.gov/papers/vp-thesis/dis.ps.gz

1.8 MB gzip compressed, 7 MB uncompressed, about 400 pages total.

Individual chapters are available, too, as described in:

	ftp://ftp.ee.lbl.gov/papers/vp-thesis/README

For tcp-impl, the most interesting is:

	ftp://ftp.ee.lbl.gov/papers/vp-thesis/tcp.ps.gz

which discusses a number of TCP implementation problems.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Jun  3 14:30:49 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA03190 for tcp-impl-list; Tue, 3 Jun 1997 14:28:38 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA02994 for <tcp-impl@engr.SGI.COM>; Tue, 3 Jun 1997 14:27:56 -0700
Received: from mailhub.Stanford.EDU (mailhub.Stanford.EDU [36.21.0.128]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA18295
	for <tcp-impl@engr.SGI.COM>; Tue, 3 Jun 1997 14:27:55 -0700
	env-from (aaa@stanford.edu)
Received: from tree2.Stanford.EDU (tree2.Stanford.EDU [36.83.0.37])
	by mailhub.Stanford.EDU (8.8.5/8.8.5/L) with SMTP id OAA19259;
	Tue, 3 Jun 1997 14:14:23 -0700 (PDT)
Date: Tue, 3 Jun 1997 14:14:07 -0700 (PDT)
From: "Amr A. Awadallah" <aaa@stanford.edu>
Subject: TCP: Brief Comment on cwnd Inflation during Fast Recovery.
Message-ID: <Pine.GSO.3.96.970603125349.17878A-100000@tree2.Stanford.EDU>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Apparently-To: <tcp-impl@engr.SGI.COM>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


All,

   For those of you who expressed interest in our comments on the effects
of cwnd inflation, we have added more comments/results on this subject at
this URL ( http://www-leland.stanford.edu/~aaa/tcp ). We included the patch
(diff file) for the current FreeBSD tcp_input.c. We would appreciate it if
interested developers would provide feedback to us about this change (in
terms of observed throughput). The change is only a couple of lines of code
and apparently leads to higher-throughput TCP sources. 

  We also provide arguments for and against the modification. The main
argument for the modification is that with the current cwnd inflation,
more packets are being sent into the network during the fast recovery
period, at the rate at which duplicate ACKs are coming back. This is
counterintuitive, given that entering fast recovery means that the
TCP source just lost a packet (indicated by the duplicate ACKs), and hence
the source should throttle back its sending rate. By continuing to send at
the rate at which duplicate ACKs are coming back, the source may force the
network to drop another one of its packets (e.g. due to RED gateways
[Floyd and Jacobson, IEEE/ACM Transactions on Networking, August 1993], or
simple buffer overflow). This may lead to invoking another fast recovery
cycle, or worse, invoking slow start (this can be clearly seen in the cwnd
vs time plots on the web page). The modification we made gives the
network a breathing period of less than 1 RTT, which allows the network to
catch its breath by draining congested buffers.  This avoids another
packet loss, thus leading to a smoother cwnd vs time behavior. It still
allows for packets to be sent during the fast recovery period but at a
much lower rate. 
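
  To make the inflation concrete, here is a rough sketch of the standard
Reno behaviour as described in RFC 2001, counted in whole segments. It
models only the unmodified algorithm and is not the patch from the web
page; the function name and the segment-based units are assumptions for
the example.

```python
def reno_fast_recovery_trace(cwnd_segments, dupacks):
    """Trace cwnd (in segments) through one fast recovery episode.

    Returns (ssthresh, trace), where trace holds cwnd after the third
    duplicate ACK, after each further duplicate ACK, and finally after
    the ACK for new data that ends fast recovery.
    """
    if dupacks < 3:
        raise ValueError("fast retransmit needs at least 3 duplicate ACKs")
    ssthresh = max(cwnd_segments // 2, 2)
    cwnd = ssthresh + 3            # third dup ACK: halve, then add 3
    trace = [cwnd]
    for _ in range(dupacks - 3):
        cwnd += 1                  # inflation: one segment per extra dup ACK
        trace.append(cwnd)
    cwnd = ssthresh                # new ACK arrives: deflate to ssthresh
    trace.append(cwnd)
    return ssthresh, trace
```

Starting from 16 segments with 6 duplicate ACKs, cwnd goes 11, 12, 13, 14
and then drops back to 8; the rise and sudden drop is the spike seen in a
cwnd vs time plot.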

  The main argument against the modification is that by using the normal
congestion avoidance cwnd increase during the fast recovery period (rather
than cwnd inflation), the source will not be able to keep the pipe full
(thus violating VJ's recommendations).  This in turn leads to a burst of
back-to-back packets at the end of the fast recovery period. We note
though that schemes like FACK [Mathis and Mahdavi, SIGCOMM '96] allow for
the regulation of such a burst (by pacing the burst). SACK [Floyd and Fall
CCR paper] also tackles this problem. We also note that this burst of
back-to-back packets is known to exist in current TCP implementations (at
least we observed it rather frequently in FreeBSD 2.1.6, as shown on the
web). The burst simply occurs due to cwnd sliding a considerable distance
when the non-duplicate ACK arrives, hence opening up lots of space for new
packets to be sent. The modification leads to a more aggressive TCP source
since it starts with a larger window size at the end of the fast-recovery
period.  It has also been pointed out to us that most TCP researchers think
the principles behind the current fast recovery algorithm work well.

  One last comment, we stumbled on the cwnd inflation spikes during our
research on TCP (which is on a totally different aspect). The spikes
appeared strange to us at first because in most papers on TCP congestion
avoidance (at least those that we read), one would rarely see a cwnd vs
time plot showing the cwnd inflation spikes. This was the main reason why
we were originally misled to believe that this was a bug in TCP, until
others corrected us by pointing out that this is truly how fast recovery
was designed to work (that is the spikes are a feature of TCP ! ).

Thanks for your interest and feedback,

Sincerely,

Amr A. Awadallah (aaa@stanford.edu)
Chetan Rai       (crai@cs.stanford.edu)

-----------------------------------------------------

PS: Sorry if you receive this e-mail more than once, this means you are
subscribed to too many mailing lists :)


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 11:15:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA05479 for tcp-impl-list; Thu, 5 Jun 1997 11:13:37 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA05466 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 11:13:35 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA03856
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 11:13:32 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id LAA27754; Thu, 5 Jun 1997 11:03:08 -0700 (PDT)
Message-Id: <199706051803.LAA27754@daffy.ee.lbl.gov>
To: tcp-impl@relay.engr.SGI.COM
Subject: TIME-WAIT truncation
Date: Thu, 05 Jun 1997 11:03:08 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Someone passed along the following URL via private email:

	http://www.microsoft.com/kb/articles/Q151/4/18.htm

It discusses a TCP implementation problem in which a connection can leave
TIME-WAIT before the full 2 MSL interval has elapsed, because there are
a limited (albeit large) number of TCBs available for TIME-WAIT.  My
impression is that there are other implementations that truncate TIME-WAIT
before a full 2 MSL because they use a definition of MSL smaller than
the standard one.
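
To illustrate the two kinds of truncation, here is a rough sketch. RFC 793
defines MSL as 2 minutes, so a full TIME-WAIT lasts 4 minutes. The bounded
table below, which recycles the oldest entry when it is full, is a
hypothetical model of "a limited number of TCBs" and is not taken from any
particular implementation.

```python
from collections import OrderedDict

RFC793_MSL = 120.0  # seconds; RFC 793 defines MSL as 2 minutes


def time_wait_duration(msl=RFC793_MSL):
    """Length of TIME-WAIT in seconds for a given MSL."""
    return 2 * msl


class TimeWaitTable:
    """Hypothetical fixed-size table of connections in TIME-WAIT."""

    def __init__(self, capacity, msl=RFC793_MSL):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.hold = time_wait_duration(msl)
        self.entries = OrderedDict()  # conn id -> time it entered TIME-WAIT

    def expire(self, now):
        """Drop entries that have completed the full 2*MSL wait."""
        while self.entries:
            conn, since = next(iter(self.entries.items()))
            if now - since < self.hold:
                break
            del self.entries[conn]

    def enter(self, conn, now):
        """Add conn; return [(conn, seconds_held)] for entries cut short."""
        self.expire(now)
        truncated = []
        while len(self.entries) >= self.capacity:
            # Table full: recycle the oldest TCB before its 2*MSL is up.
            old, since = self.entries.popitem(last=False)
            truncated.append((old, now - since))
        self.entries[conn] = now
        return truncated
```

An implementation that defines a smaller MSL, say 30 seconds (an
illustrative value), gets time_wait_duration(30.0) = 60 seconds, which is
the second form of truncation mentioned above.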

Is there a volunteer interested in documenting this problem?  (Further
discussion of it is fine too, of course.)

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 11:34:23 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA11394 for tcp-impl-list; Thu, 5 Jun 1997 11:30:19 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA11370 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 11:30:16 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id LAA08532
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 11:30:15 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA16437>; Thu, 5 Jun 1997 11:26:28 -0700
Date: Thu, 5 Jun 1997 11:26:28 -0700
Posted-Date: Thu, 5 Jun 1997 11:26:28 -0700
Message-Id: <199706051826.AA07116@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07116>; Thu, 5 Jun 1997 11:26:28 -0700
To: tcp-impl@relay.engr.SGI.COM, vern@ee.lbl.gov
Subject: Re: TIME-WAIT truncation
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@relay.engr.SGI.COM Thu Jun  5 11:22:20 1997
> To: tcp-impl@relay.engr.SGI.COM
> Subject: TIME-WAIT truncation
> Date: Thu, 05 Jun 1997 11:03:08 PDT
> From: Vern Paxson <vern@ee.lbl.gov>
> 
> Someone passed along the following URL via private email:
> 
> 	http://www.microsoft.com/kb/articles/Q151/4/18.htm
> 
> It discusses a TCP implementation problem in which a connection can leave
> TIME-WAIT before the full 2 MSL interval has elapsed, because there are
> a limited (albeit large) number of TCBs available for TIME-WAIT.  My
> impression is that there are other implementations that truncate TIME-WAIT
> before a full 2 MSL because they use a definition of MSL smaller than
> the standard one.
> 
> Is there a volunteer interested in documenting this problem?  (Further
> discussion of it is fine too, of course.)

Yup - we have some work here that's related, so this
would be a fine place for me to volunteer...

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 11:37:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA13421 for tcp-impl-list; Thu, 5 Jun 1997 11:35:22 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA13403 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 11:35:20 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA09802
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 11:35:15 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id TAA22558; Thu, 5 Jun 1997 19:34:18 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wZhVw-0005FdC; Thu, 5 Jun 97 19:44 BST
Message-Id: <m0wZhVw-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TIME-WAIT truncation
To: vern@ee.lbl.gov (Vern Paxson)
Date: Thu, 5 Jun 1997 19:43:59 +0100 (BST)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199706051803.LAA27754@daffy.ee.lbl.gov> from "Vern Paxson" at Jun 5, 97 11:03:08 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 	http://www.microsoft.com/kb/articles/Q151/4/18.htm
> 
> It discusses a TCP implementation problem in which a connection can leave
> TIME-WAIT before the full 2 MSL interval has elapsed, because there are
> a limited (albeit large) number of TCBs available for TIME-WAIT.  My
> impression is that there are other implementations that truncate TIME-WAIT
> before a full 2 MSL because they use a definition of MSL smaller than
> the standard one.

This is just a variant of an existing problem, since TIME-WAIT doesn't actually
work right anyway. Ian Heavens's documentation on all the other TIME-WAIT
problems certainly needs to be tacked in with this.


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 12:07:23 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA23339 for tcp-impl-list; Thu, 5 Jun 1997 12:04:04 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA23313 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 12:03:58 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id MAA17229
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 12:03:56 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Thu, 5 Jun 1997 15:00:09 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Thu, 5 Jun 1997 15:00:09 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id PAA16365; Thu, 5 Jun 1997 15:02:20 -0400
Date: Thu, 5 Jun 1997 15:02:20 -0400
Message-Id: <199706051902.PAA16365@MAILSERV-2HIGH-A.FTP.COM>
To: vern@ee.lbl.gov
Subject: Re: TIME-WAIT truncation
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high-a.ftp.com, [message accepted at Thu Jun  5 15:02:11 1997]
Originating-Client: tunes.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||Someone passed along the following URL via private email:
||
||        http://www.microsoft.com/kb/articles/Q151/4/18.htm
||
||It discusses a TCP implementation problem in which a connection can leave
||TIME-WAIT before the full 2 MSL interval has elapsed, because there are
||a limited (albeit large) number of TCBs available for TIME-WAIT.  My
||impression is that there are other implementations that truncate TIME-WAIT
||before a full 2 MSL because they use a definition of MSL smaller than
||the standard one.
||
||Is there a volunteer interested in documenting this problem?  (Further
||discussion of it is fine too, of course.)


we did it because they did it :-(.

Seriously, we were forced into reclaiming the oldest time-wait connection(s)
in a queue rather than continuing to allocate memory till we ate the
machine.

The traffic pattern was, of course, shaped by web servers that had thousands
of incoming connections which had not been cleanly closed by browsers/HTTP
test programs.

L.


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 12:35:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA04075 for tcp-impl-list; Thu, 5 Jun 1997 12:34:03 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA04069 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 12:34:01 -0700
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.219]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA25203
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 12:34:01 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185]) by palrel3.hp.com with SMTP (8.7.5/8.7.3) id MAA28660 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 12:33:53 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA19113; Thu, 5 Jun 1997 12:32:45 -0700
Message-Id: <3397145C.3C5C@cup.hp.com>
Date: Thu, 05 Jun 1997 12:32:44 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: backman@ftp.com
Cc: vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
Subject: Re: TIME-WAIT truncation
References: <199706051902.PAA16365@MAILSERV-2HIGH-A.FTP.COM>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> we did it because they did it :-(.
> 
> Seriously we were forced into reclaiming the oldest time-wait connection(s)
> in a queue rather than continuing to allocate memory till we ate the
> machine.

Would it have been more "correct" (indeed much less palatable) to stop
accepting or establishing new connections until there was a TIME_WAIT
slot available?

> Traffic pattern was of course shaped by web servers who had thousands of
> incoming connections which had not been cleanly closed by browsers/http
> test programs.

Wouldn't that have been FIN_WAIT_2's instead of TIME_WAIT? (Does that
mean there may be another limitation out there?)

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 12:53:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA12238 for tcp-impl-list; Thu, 5 Jun 1997 12:51:18 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA12220 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 12:51:16 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id MAA29487
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 12:51:15 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Thu, 5 Jun 1997 15:47:25 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Thu, 5 Jun 1997 15:47:25 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id PAA17006; Thu, 5 Jun 1997 15:49:36 -0400
Date: Thu, 5 Jun 1997 15:49:36 -0400
Message-Id: <199706051949.PAA17006@MAILSERV-2HIGH-A.FTP.COM>
To: raj@hpisrdq.cup.hp.com
Subject: Re: TIME-WAIT truncation
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high-a.ftp.com, [message accepted at Thu Jun  5 15:49:28 1997]
Originating-Client: tunes.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||> in a queue rather than continuing to allocate memory till we ate the
||> machine.
||
||Would it have been more "correct" (indeed much less palatable) to stop
||accepting or establishing new connections until there was a TIME_WAIT
||slot available?

Yes, except "they" did it this way, and so we had to copy "them" because
XYZ worked that way over them and not over us.

De facto standards, etc.
||
||> Traffic pattern was of course shaped by web servers who had thousands of
||> incoming connections which had not been cleanly closed by browsers/http
||> test programs.
||
||Wouldn't that have been FIN_WAIT_2's instead of TIME_WAIT? (Does that
||mean there may be another limitation out there?)
||

It's been a year and a quarter, but the code says:

/* XXX  23-Mar-96
** Go through list of TCP control blocks and free one which has been in
** TIME_WAIT for the longest time
*/
void
tcp_free_timewait()
{
    SM_QUEUE_ELEMENT *p, *pnext;
    register struct tcpcb *tp, *tpoldest;
    short   timer = 0x7fff;


    /*
     * Search through tcb's and find the oldest tcb in TIME_WAIT
     */
    tpoldest = NULL;
    
    for (p = SMQueueGetFirst(TcpCtl.trctl_queue); p != NULL; p = pnext) {

        tp = TRQ2TCB(p);
        pnext = SMQueueGetNext(TcpCtl.trctl_queue, p);

        if(tp->t_timer[TCPT_2MSL] != 0 &&
           tp->t_state == TCPS_TIME_WAIT &&
           tp->t_timer[TCPT_2MSL] < timer) {
            tpoldest = tp;
            timer = tp->t_timer[TCPT_2MSL];
        }
    }
    if(tpoldest) {
        SM_TRACE((TR_KR_TCP,
              "Forced close of TIME_WAIT. Timer #w,Socket:#d",
              timer, tpoldest->t_up));
        tcp_close(tpoldest);
    }
        
}

This is called from:
/*
 * Create a new TCP control block, making an
 * empty reassembly queue and hooking it to the argument
 * protocol control block.
 */
struct tcpcb *
tcp_newtcpcb(struct pcb *p_pcb)
{
    register struct tcpcb *tp;

    tp = (struct tcpcb *)SMGetBuffer(TcpPcbHdl);
    if (tp == NULL) {
        /* MLK 23-Mar-96
        ** Try to free the oldest TCP CB which is in TIME_WAIT
        */
        tcp_free_timewait();
        tp = (struct tcpcb *)SMGetBuffer(TcpPcbHdl);
        if (tp == NULL) {
            return (tp);
        }
    }


Ignoring our OS abstractions and functions, you should recognize the
BSD code from tcpcb.c.
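
For readers without the surrounding OS abstractions, the selection step in
tcp_free_timewait() can be sketched on its own. This is only an illustration:
struct tcb, its field names, the state constants and oldest_timewait() are
hypothetical stand-ins, and a plain array replaces the SMQueue list. With a
fixed 2MSL timer length, the entry with the least time remaining is the one
that entered TIME_WAIT first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for the BSD-style names above. */
enum { TCPS_ESTABLISHED = 4, TCPS_TIME_WAIT = 10 };

struct tcb {
    int   t_state;  /* connection state */
    short t_2msl;   /* ticks left on the 2MSL timer; 0 means not running */
};

/*
 * Return the index of the TIME_WAIT entry with the least time left on
 * its 2MSL timer, or -1 if no entry is in TIME_WAIT with a running timer.
 */
int oldest_timewait(const struct tcb *tab, size_t n)
{
    int   best = -1;
    short timer = 0;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].t_state != TCPS_TIME_WAIT || tab[i].t_2msl == 0)
            continue;               /* not a reclaim candidate */
        if (best < 0 || tab[i].t_2msl < timer) {
            best = (int)i;
            timer = tab[i].t_2msl;
        }
    }
    return best;
}
```

A caller would close the returned entry and retry its allocation, which is
the shape of the tcp_newtcpcb() excerpt above.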

L.



From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 13:07:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA17811 for tcp-impl-list; Thu, 5 Jun 1997 13:04:10 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA17787 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:04:07 -0700
Received: from doggate.exchange.microsoft.com (doggate.microsoft.com [131.107.2.63]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA02181
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:04:06 -0700
	env-from (henrysa@EXCHANGE.MICROSOFT.com)
Received: by DOGGATE with Internet Mail Service (5.0.1577.8)
	id <MJDG8DZ6>; Thu, 5 Jun 1997 13:02:33 -0700
Message-ID: <7D9A01DBBFD5CF11AD0F0000F8411F8A68086F@ROADKILL>
From: "Henry Sanders (Exchange)" <henrysa@EXCHANGE.MICROSOFT.com>
To: vern@ee.lbl.gov, "'backman@ftp.com'" <backman@ftp.com>
Cc: tcp-impl@relay.engr.SGI.COM
Subject: RE: TIME-WAIT truncation
Date: Thu, 5 Jun 1997 13:02:38 -0700
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.0.1577.8)
Content-Type: text/plain
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Seriously we were forced into reclaiming the oldest time-wait
> connection(s)
> in a queue rather than continuing to allocate memory till we ate the
> machine.
> 
Yep, that's exactly why we did it in the first place. It might be less
of an issue now than it was then, with machines having grown larger and
more web clients using keep alive.


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 13:07:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA17595 for tcp-impl-list; Thu, 5 Jun 1997 13:03:26 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA17573 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:03:23 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA02086
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:03:22 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA21608>; Thu, 5 Jun 1997 12:59:33 -0700
Date: Thu, 5 Jun 1997 12:59:24 -0700
Posted-Date: Thu, 5 Jun 1997 12:59:24 -0700
Message-Id: <199706051959.AA07240@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07240>; Thu, 5 Jun 1997 12:59:24 -0700
To: backman@ftp.com, raj@hpisrdq.cup.hp.com
Subject: Re: TIME-WAIT truncation
Cc: vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@relay.engr.SGI.COM Thu Jun  5 12:37:22 1997
> Date: Thu, 05 Jun 1997 12:32:44 -0700
> From: Rick Jones <raj@hpisrdq.cup.hp.com>
> Organization: Hewlett-Packard Co.
> To: backman@ftp.com
> Cc: vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
> Subject: Re: TIME-WAIT truncation
> References: <199706051902.PAA16365@MAILSERV-2HIGH-A.FTP.COM>
> 
> > we did it because they did it :-(.
> > 
> > Seriously we were forced into reclaiming the oldest time-wait connection(s)
> > in a queue rather than continuing to allocate memory till we ate the
> > machine.
> 
> Would it have been more "correct" (indeed much less palatable) to stop
> accepting or establishing new connections until there was a TIME_WAIT
> slot available?

It would have been "correct" to stop accepting new connections.
However, because the MSL on which TIME_WAIT depends
is not specified (each site is free to determine it), it
would be most correct to have changed the MSL, which is
what some recent OS patches for web performance do.

I.e., MSL currently is around 2 minutes; the OS patches
drop that to 15 seconds.
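
The effect of the MSL on TCB demand can be worked out directly: in steady
state, the number of connections sitting in TIME_WAIT is roughly the close
rate times the 2 MSL holding time. The sketch below only does that
arithmetic; the function name and the 100 connections per second used in the
example are illustrative assumptions, not figures from this thread.

```c
#include <assert.h>

/*
 * Approximate steady-state number of TIME_WAIT entries:
 * (connections closed per second) * (2 * MSL in seconds).
 */
unsigned long timewait_population(unsigned long conns_per_sec,
                                  unsigned long msl_sec)
{
    assert(msl_sec > 0);
    return conns_per_sec * 2UL * msl_sec;
}
```

At an assumed 100 connections per second, a 2 minute MSL gives
timewait_population(100, 120) = 24000 entries, while a 15 second MSL gives
timewait_population(100, 15) = 3000.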

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 13:21:53 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA22452 for tcp-impl-list; Thu, 5 Jun 1997 13:18:51 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA22414 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:18:47 -0700
Received: from rekk.dna.lth.se (rekk.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA07289
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:18:41 -0700
	env-from (erics@rekk.dna.lth.se)
Received: from rekk.dna.lth.se (localhost [127.0.0.1])
	by rekk.dna.lth.se (8.8.5/8.8.5) with ESMTP id WAA03952;
	Thu, 5 Jun 1997 22:16:15 +0200
Message-Id: <199706052016.WAA03952@rekk.dna.lth.se>
To: backman@ftp.com
cc: Eric.Schenk@dna.lth.se, vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
From: Eric.Schenk@dna.lth.se
Subject: Re: TIME-WAIT truncation 
In-reply-to: Your message of "Thu, 05 Jun 1997 15:02:20 EDT."
             <199706051902.PAA16365@MAILSERV-2HIGH-A.FTP.COM> 
Date: Thu, 05 Jun 1997 22:16:14 +0200
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Larry Backman <backman@ftp.com> writes:
>Seriously we were forced into reclaiming the oldest time-wait connection(s)
>in a queue rather than continuing to allocate memory till we ate the
>machine.
>
>Traffic pattern was of course shaped by web servers who had thousands of
>incoming connections which had not been cleanly closed by browsers/http
>test programs.

Rather than just tossing out the oldest time-waiter, it would be better
to grab the last sequence number information from that socket and make
sure to offer a sequence number just following that on the new connection.
If I recall correctly BSD boxes already do this if there is an incoming
connection to a specific port that is in TIME_WAIT, and recent Linux
releases (2.0 or older) do this as well. It should be possible to extend
this behavior to looking for a free port. This would at least keep
the sequence number space advancing correctly over successive connections.
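
A minimal sketch of the sequence-number idea, assuming 32-bit sequence space
and a fixed increment. The names tcp_seq, seq_gt(), next_iss() and the
ISS_INCR value are hypothetical, chosen for illustration and not taken from
any particular stack. It shows only the wraparound arithmetic and makes no
claim that the scheme is safe in all circumstances.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Hypothetical gap placed between the old and new sequence spaces. */
#define ISS_INCR 64000u

/* Wraparound-safe test: is a later than b in 32-bit sequence space? */
int seq_gt(tcp_seq a, tcp_seq b)
{
    return (int32_t)(a - b) > 0;
}

/*
 * Pick an initial send sequence for a new connection so that it lies
 * beyond the last sequence number used by the reclaimed TIME_WAIT entry.
 * Unsigned arithmetic wraps modulo 2^32, as sequence numbers do.
 */
tcp_seq next_iss(tcp_seq last_snd_nxt)
{
    tcp_seq iss = last_snd_nxt + ISS_INCR;

    assert(seq_gt(iss, last_snd_nxt));
    return iss;
}
```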

As to the concern about eating all of memory, it probably makes sense
to reduce TIME_WAITers to a smaller control structure. Most of the
stuff in the TCB isn't needed. I think you can get away with a timer,
the endpoint addresses and the last sequence number, but I haven't
thought about this for a while. You'll still potentially need a couple
of megabytes of memory, but on a web server taking thousands of
connections a minute this is probably acceptable.
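
As a rough illustration of such a reduced structure, the sketch below keeps
only the fields named above: a timer, the endpoint addresses and the last
sequence number. The struct and function names are hypothetical, IPv4 is
assumed, and the exact size depends on the compiler's padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact record kept for a connection in TIME_WAIT. */
struct tw_entry {
    uint32_t laddr;       /* local IPv4 address */
    uint32_t faddr;       /* foreign IPv4 address */
    uint16_t lport;       /* local port */
    uint16_t fport;       /* foreign port */
    uint32_t last_seq;    /* last sequence number used */
    uint16_t ticks_left;  /* remaining 2MSL timer ticks */
};

/* Memory needed to hold n compact TIME_WAIT records. */
size_t tw_table_bytes(size_t n)
{
    assert(n <= SIZE_MAX / sizeof(struct tw_entry));
    return n * sizeof(struct tw_entry);
}
```

On common ABIs this record is about 20 bytes, so a table of 100,000 entries
would come to roughly 2 megabytes.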

Of course none of this deals with the various other ways that TCP
connections can get into a non-synchronized state with the potential
for data duplication. My current favorite candidate for this is
dynamic IP addresses assigned by an ISP. One box drops off the net,
to be replaced by another with the same address, and with no knowledge
of the port numbers being used by the previous user of that IP address.
Lots of scenarios for bad things to happen there. Alan Cox has already
mentioned Ian Heavens's draft on the subject.

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 13:21:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA21696 for tcp-impl-list; Thu, 5 Jun 1997 13:16:42 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA21683 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:16:40 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA06576
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:16:38 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Thu, 5 Jun 1997 16:12:49 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Thu, 5 Jun 1997 16:12:49 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id QAA17315; Thu, 5 Jun 1997 16:15:00 -0400
Date: Thu, 5 Jun 1997 16:15:00 -0400
Message-Id: <199706052015.QAA17315@MAILSERV-2HIGH-A.FTP.COM>
To: henrysa@EXCHANGE.MICROSOFT.com
Subject: RE: TIME-WAIT truncation
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: vern@ee.lbl.gov, backman@ftp.com, tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high-a.ftp.com, [message accepted at Thu Jun  5 16:14:50 1997]
Originating-Client: tunes.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||> Seriously we were forced into reclaiming the oldest time-wait
||> connection(s)
||> in a queue rather than continuing to allocate memory till we ate the
||> machine.
||> 
||Yep, that's exactly why we did it in the first place. It might be less
||of an issue now than it was then, with machines having grown larger and
||more web clients using keep alive.

Not sure that keepalive helps the situation :-) and more memory in a
machine simply delays the problem; it doesn't solve it.

At one point in time we happily allocated mbuf after mbuf until the machine
ran out of memory and died.  No one complained about all the TIME-WAIT
sessions then :-) Then we put in a max memory threshold, which exposed our
inability to create a new session, which showed all the TIME-WAIT sessions,
which caused us to copy you in reclaiming old sessions rather than waiting
the proper 2MSL.

I also seem to recall that the issue was not uncovered with browsers; but
instead with test scripts emulating browsers.  I know that whatever memory
threshold we set for our memory allocation we could most definitely exhaust
the pool with TIME-WAIT connections.  Granted we were using perhaps only
64 Meg of memory (says the man who started when 512K was plenty for DOS..)
but even w/ a maxed out NT server the problem could still be recreated with
an aggressive enough test suite.

L.


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 13:33:33 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA26267 for tcp-impl-list; Thu, 5 Jun 1997 13:29:35 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA26251 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:29:32 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA10800
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:29:30 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Thu, 5 Jun 1997 16:23:02 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Thu, 5 Jun 1997 16:23:02 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id QAA17438; Thu, 5 Jun 1997 16:25:12 -0400
Date: Thu, 5 Jun 1997 16:25:12 -0400
Message-Id: <199706052025.QAA17438@MAILSERV-2HIGH-A.FTP.COM>
To: touch@ISI.EDU
Subject: Re: TIME-WAIT truncation
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: backman@ftp.com, raj@hpisrdq.cup.hp.com, vern@ee.lbl.gov,
        tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high-a.ftp.com, [message accepted at Thu Jun  5 16:25:07 1997]
Originating-Client: tunes.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||It would have been "correct" to stop accepting new connections.
||However, because the MSL on which TIME_WAIT depends
||is not specified (each site is free to determine it), it
||would be most correct to have changed the MSL, which is
||what some recent OS patches for web performance do.
||
||I.e., MSL currently is around 2 minutes; the OS patches
||drop that to 15 seconds.
||
Whoa!  The way I read RFC 1122 and friends, MSL is to be set to 2 minutes
and it would take an act of God and Jon Postel to change it.

On rescanning RFC 1122 quickly I see it notes that it was set arbitrarily to
2 minutes, but I seem to recall some serious math around slow WAN links
leading to that 2 minute figure.

Which perhaps brings up a very interesting topic for the list -
default values like MSL, initial RTT and backoff array.

L.


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 13:45:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA01624 for tcp-impl-list; Thu, 5 Jun 1997 13:43:22 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA01611 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 13:43:20 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA14099
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 13:43:18 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA23562>; Thu, 5 Jun 1997 13:39:18 -0700
Date: Thu, 5 Jun 97 13:40:49 PDT
From: braden@ISI.EDU
Posted-Date: Thu, 5 Jun 97 13:40:49 PDT
Message-Id: <9706052040.AA04058@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA04058>; Thu, 5 Jun 97 13:40:49 PDT
To: henrysa@exchange.microsoft.com, backman@ftp.com
Subject: RE: TIME-WAIT truncation
Cc: vern@ee.lbl.gov, tcp-impl@relay.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


I can't resist commenting that T/TCP (Transaction TCP) would have
avoided this problem.

I think it is generally agreed that the original Web browsers were
abusing TCP, trying to use it as a transaction transport protocol by
creating many short connections.  My Webby friends tell me that this
has now been fixed, and proper Web browsers using the latest HTTP
should use many fewer TCP connections.  There is an incentive to upgrade,
since the resulting user performance is significantly better.  So
perhaps it is time to fix broken TCPs that short-cut TIME-WAIT state.

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 13:47:46 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA02331 for tcp-impl-list; Thu, 5 Jun 1997 13:45:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA02320 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:45:26 -0700
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.219]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA14562
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:45:25 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185]) by palrel3.hp.com with SMTP (8.7.5/8.7.3) id NAA06892 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:44:58 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA19441; Thu, 5 Jun 1997 13:43:51 -0700
Message-Id: <33972506.5DCC@cup.hp.com>
Date: Thu, 05 Jun 1997 13:43:50 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: TIME-WAIT truncation
References: <199706052015.QAA17315@MAILSERV-2HIGH-A.FTP.COM>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>  not sure that keepalive helps the situation :-) but the more memory in a

It depends on the type of keepalive. TCP keepalives can help the
"FIN_WAIT_2" scenario. HTTP keepalives give you a lower TCP connection
rate for a given URL retrieval rate, which, depending on your URL
retrieval rate and all, might keep your TCP connection rate from
exceeding some arbitrary limit on the number of TIME_WAIT connections.

> machine simply delays the problem but doesn't solve it.

Indeed, it does not solve the problem; it only helps a little.

> At one point in time we happily allocated mbuf after mbuf until the machine
> ran out of memory and died.  No one complained about all the TIME-WAIT
> sessions then :-) Then we put in a max memory threshold which exposed our
> inability to create a new session which showed all the TIME-WAIT sessions
> which caused us to copy you in reclaiming old sessions rather than waiting
> the proper 2MSL.

That would seem to imply that the system+software combination was not
sized properly for the workload at hand and/or the application was
flawed. 

TCP calls for a 2MSL TIME_WAIT state after connection close (for some
not-asymptotic-to-zero value of MSL :). If the system (hardware+software)
cannot hold enough TIME_WAIT states to satisfy that connection rate
without undoing part of TCP's correctness algorithms, should it really
be operating (or allowed to operate) at that rate?

Lots of ways to go there: raise the bridge (more RAM), lower the river
(smaller TIME_WAIT space requirements), or dry up the river (don't
accept more connections until you have a slot for them, or get the
multitude of client systems to shoulder the TIME_WAIT burden by
rewriting the app).
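
The "don't accept more connections until you have a slot" option can be
sketched as a simple admission check. This is a hypothetical illustration,
not any vendor's code: struct tcb_pool, pool_try_reserve() and
pool_release() are made-up names, and the pool counts TIME_WAIT entries as
in use until their 2MSL timer expires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size pool of TCBs. */
struct tcb_pool {
    size_t capacity;  /* total TCBs available */
    size_t in_use;    /* TCBs held, including those in TIME_WAIT */
};

/*
 * Reserve a TCB for a new connection. Returns 1 on success, or 0 when
 * the pool is full, in which case the caller refuses the connection.
 * No TIME_WAIT entry is ever evicted to make room.
 */
int pool_try_reserve(struct tcb_pool *p)
{
    assert(p != NULL);
    if (p->in_use >= p->capacity)
        return 0;
    p->in_use++;
    return 1;
}

/* Release a TCB, for example when its 2MSL timer expires. */
void pool_release(struct tcb_pool *p)
{
    assert(p != NULL && p->in_use > 0);
    p->in_use--;
}
```

The trade-off is the one discussed above: correctness is preserved, but new
connections are turned away while the pool is full of TIME_WAIT entries.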

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 13:53:20 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA04456 for tcp-impl-list; Thu, 5 Jun 1997 13:51:46 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA04451 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 13:51:44 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA16097
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 13:51:43 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA24239>; Thu, 5 Jun 1997 13:47:37 -0700
Date: Thu, 5 Jun 97 13:49:08 PDT
From: braden@ISI.EDU
Posted-Date: Thu, 5 Jun 97 13:49:08 PDT
Message-Id: <9706052049.AA04066@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA04066>; Thu, 5 Jun 97 13:49:08 PDT
To: Eric.Schenk@dna.lth.se
Subject: Re: TIME-WAIT truncation
Cc: tcp-impl@relay.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


  *> 
  *> Rather than just tossing out the oldest time-waiter, it would be better
  *> to grab the last sequence number information from that socket and make
  *> sure to offer a sequence number just following that on the new connection.
  *> If I recall correctly BSD boxes already do this if there is an incoming
  *> connection to a specific port that is in TIME_WAIT, and recent Linux
  *> releases (2.0 or older) do this as well.

Eric,

When you start messing with TCP's reliable delivery mechanism, you have
to be very careful.  I am not positive, but I think I recall that the
scheme you mention is in fact formally incorrect and can be
demonstrated in particular circumstances to allow corrupted data.

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 13:53:20 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA04348 for tcp-impl-list; Thu, 5 Jun 1997 13:51:29 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA04269 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:51:09 -0700
Received: from doggate.exchange.microsoft.com (doggate.microsoft.com [131.107.2.63]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA15952
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 13:51:08 -0700
	env-from (henrysa@EXCHANGE.MICROSOFT.com)
Received: by DOGGATE with Internet Mail Service (5.0.1577.8)
	id <MJDG818X>; Thu, 5 Jun 1997 13:49:33 -0700
Message-ID: <7D9A01DBBFD5CF11AD0F0000F8411F8A680872@ROADKILL>
From: "Henry Sanders (Exchange)" <henrysa@EXCHANGE.MICROSOFT.com>
To: "'backman@ftp.com'" <backman@ftp.com>
Cc: vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
Subject: RE: TIME-WAIT truncation
Date: Thu, 5 Jun 1997 13:49:36 -0700
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.0.1577.8)
Content-Type: text/plain
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> the pool with TIME-WAIT connections.  Granted we were using perhaps
> only
> 64 Meg of memory (says the man who started when 512K was plenty for
> DOS..)
> but even w/ a maxed out NT server the problem could still be recreated
> with
> an aggressive enough test suite.
> 
Well, we first saw the issue on real life servers with a lot less than
64 Meg (this was a few years ago). With increased memory sizes and more
browsers using keep-alive it may be the case that the real life scenario
isn't such a problem any more. You're right that an aggressive enough
test can still cause this but I'm less worried about artificial test
cases like that.

Henry


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:07:18 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA07789 for tcp-impl-list; Thu, 5 Jun 1997 14:01:52 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA07760 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:01:45 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA19152
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:01:44 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA25175>; Thu, 5 Jun 1997 13:57:53 -0700
Date: Thu, 5 Jun 1997 13:57:45 -0700
Posted-Date: Thu, 5 Jun 1997 13:57:45 -0700
Message-Id: <199706052057.AA07424@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07424>; Thu, 5 Jun 1997 13:57:45 -0700
To: backman@ftp.com, Eric.Schenk@dna.lth.se
Subject: Re: TIME-WAIT truncation
Cc: vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Eric.Schenk@dna.lth.se
> Subject: Re: TIME-WAIT truncation 
> 
> Larry Backman <backman@ftp.com> writes:
> >Seriously we were forced into reclaiming the oldest time-wait connection(s)
> >in a queue rather than continuing to allocate memory till we ate the
> >machine.
> 
> Rather than just tossing out the oldest time-waiter, it would be better
> to grab the last sequence number information from that socket and make
> sure to offer a sequence number just following that on the new connection.
> If I recall correctly BSD boxes already do this if there is an incoming
> connection to a specific port that is in TIME_WAIT, and recent Linux

The client needs to do this, not the server, since the client is
the one opening the connection, and the source port is already chosen
by the time the SYN is sent.

Getting rid of the oldest number isn't quite right, however. It is
correct to keep a range of numbers (assuming the entire contents
of the range are in TIME_WAIT). However, with lots of quick connections,
the number space wraps fast and we're back where we started,
with a problem.
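
As a rough illustration of the sequence-number idea above (not code from
any of the stacks mentioned), the sketch below picks an initial sequence
number just past the last one used by a reclaimed TIME_WAIT connection,
using 32-bit modular comparison. The names seq_gt, next_isn and
connections_until_wrap, and the ISN_GAP offset, are made up for this
example.

```python
SEQ_MOD = 1 << 32      # TCP sequence numbers are 32 bits wide
ISN_GAP = 64 * 1024    # hypothetical safety offset, chosen only for illustration


def seq_gt(a, b):
    """True if sequence number a is 'after' b in 32-bit modular arithmetic."""
    diff = (a - b) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


def next_isn(last_seq):
    """Pick an ISN just beyond the last sequence number of the old connection."""
    return (last_seq + ISN_GAP) % SEQ_MOD


def connections_until_wrap(bytes_per_conn):
    """Count back-to-back connections before ISNs cover half the space.

    Beyond half the space, seq_gt can no longer tell old from new.
    """
    step = bytes_per_conn + ISN_GAP
    return (SEQ_MOD // 2) // step
```

Each reuse advances the ISN by at least ISN_GAP plus the bytes sent, so a
fast stream of short connections walks through the sequence space
quickly, which is the wrap problem described above.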

The easier solution is to make the client CLOSE the connection,
rather than the server (e.g., using size info in the HTTP header,
for HTTP). Then each client holds its own TCBs, and there's no
clogging at the server. 

Another solution is to modify TCP (which is what we're working on
here). The idea is to cause the client and server to essentially
swap functions at the CLOSE. Basically:

	client				server

	ESTABLISHED			ESTABLISHED
					(get a CLOSE call)
					send FIN
					goto FIN_WAIT_1
		<------ FIN -----------
	send ACK
		------- ACK ---------->
	goto CLOSE_WAIT			goto FIN_WAIT_2
	** SEND FIN
		------- FIN ---------->
	** SEND RST			goto TIME_WAIT
	** goto TIME_WAIT
		------- RST ---------->
					goto CLOSED

We're still working out the details, though, and this is NOT
something we'd recommend until it's clearer.
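
To make the diagram easier to follow, here is a minimal sketch that
replays its events in order and records both endpoints' states after
each step. It only transcribes the diagram; it is not an implementation
of the proposal, and the TRACE table and replay function are names
invented for this example. Where the diagram shows no state change, the
entry is None.

```python
# Events in diagram order; steps marked "**" are the proposed changes.
TRACE = [
    ("server", "CLOSE call, send FIN", "FIN_WAIT_1"),
    ("client", "recv FIN, send ACK", "CLOSE_WAIT"),
    ("server", "recv ACK", "FIN_WAIT_2"),
    ("client", "** send FIN", None),           # no state change shown
    ("server", "recv FIN", "TIME_WAIT"),
    ("client", "** send RST", "TIME_WAIT"),
    ("server", "recv RST", "CLOSED"),
]


def replay(trace):
    """Return the list of (client, server) state snapshots after each event."""
    state = {"client": "ESTABLISHED", "server": "ESTABLISHED"}
    history = [dict(state)]
    for actor, _event, new_state in trace:
        if new_state is not None:
            state[actor] = new_state
        history.append(dict(state))
    return history


if __name__ == "__main__":
    for step in replay(TRACE):
        print(f"client={step['client']:<12} server={step['server']}")
```

The final snapshot has the client in TIME_WAIT and the server in CLOSED,
which is the swap of roles the diagram is meant to show.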

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:08:55 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA09524 for tcp-impl-list; Thu, 5 Jun 1997 14:06:50 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA09471 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:06:43 -0700
Received: from gatekeeper2.kaiperm.org (gatekeeper2.kaiperm.org [198.5.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA21083
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:06:41 -0700
	env-from (abarclay@kaiperm.org)
Received: by gatekeeper2.kaiperm.org; (5.65v3.2/1.3/10May95) id AA12382; Thu, 5 Jun 1997 14:02:54 -0700
Received: (from comaxb@localhost)
          by nexus.ncal.kaiperm.org (8.8.4/8.8.4)
	  id OAA28066 for tcp-impl@relay.engr.SGI.COM; Thu, 5 Jun 1997 14:02:52 -0700 (PDT)
From: "Andy W. Barclay" <abarclay@kaiperm.org>
Message-Id: <199706052102.OAA28066@nexus.ncal.kaiperm.org>
Subject: Re: TIME-WAIT truncation
To: tcp-impl@relay.engr.SGI.COM
Date: Thu, 5 Jun 1997 14:02:52 -0700 (PDT)
X-Mailer: ELM [version 2.4 PL25]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi;

In a previous message, Larry Backman wrote:
> ||> in a queue rather than continuing to allocate memory till we ate the
> ||> machine.
> ||
> ||Would it have been more "correct" (indeed much less palatable) to stop
> ||accepting or establishing new connections until there was a TIME_WAIT
> ||slot available?
> 
> yes.  except "they" did it this way and so we had to copy "them" because
> XYZ worked that way over them and not over us.

To stop accepting connections until a slot becomes available leaves the
OS open to denial of service attacks, doesn't it? 
I understood that CERT suggested that one "randomly" pick a slot in the
TIME_WAIT state to free up, rather than use the oldest. 
The justification for this was that if someone was launching a Denial of
Service attack on your machine, picking a slot at random was most likely
to pick one that was opened as part of the attack.

If I am incorrect here, please set me straight.

-- 
Regards,
Andy W. Barclay.        abarclay@kaiperm.org
Always a UNIX Evangelist (and sometimes a system and network architect)

Nature sides with the hidden flaw.

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:08:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA09815 for tcp-impl-list; Thu, 5 Jun 1997 14:07:24 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA09700 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:07:12 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA21230
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:07:09 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA25765>; Thu, 5 Jun 1997 14:03:22 -0700
Date: Thu, 5 Jun 1997 14:03:21 -0700
Posted-Date: Thu, 5 Jun 1997 14:03:21 -0700
Message-Id: <199706052103.AA07483@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07483>; Thu, 5 Jun 1997 14:03:21 -0700
To: touch@ISI.EDU, backman@ftp.com
Subject: Re: TIME-WAIT truncation
Cc: raj@hpisrdq.cup.hp.com, vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: backman@ftp.com (Larry Backman)
> Originating-Client: tunes.ftp.com
> 
> 
> ||It would have been "correct" to stop accepting new connections.
> ||However, because the MSL on which TIME_WAIT depends
> ||is not specified (each site is free to determine it), it
> ||would be most correct to have changed the MSL, which is
> ||what some recent OS patches for web performance do.
> ||
> ||I.e., MSL currently is around 2 minutes; the OS patches
> ||drop that to 15 seconds.
> ||
> whoa!  The way I read 1122 and friends MSL is to be set to 2 minutes
> and it would take an act of God and Jon Postel to change it.
> 
> On rescanning 1122 quickly I see it notes that it was set arbitrarily to
> 2 minutes but I seem to recall some serious math around slow WAN links
> leading to that 2 minute figure.
> 
> Which perhaps brings up a very interesting topic for the list -
> default values like MSL, initial RTT and backoff array.

	Yes!

RFC 793 indicates: 

  For this specification the MSL is taken to be 2 minutes.  This
  is an engineering choice, and may be changed if experience indicates
  it is desirable to do so.

Which seems to support two conflicting readings:

	- MSL *MUST* be 2 minutes for TCP according to RFC 793

	- MSL may be changed if experience warrants

I could see the OS patches using the latter interpretation.
I would hope this is one of the things we could clear up.

Joe

----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:27:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA13824 for tcp-impl-list; Thu, 5 Jun 1997 14:23:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA13743 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:23:27 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA26406
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:23:26 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA26693>; Thu, 5 Jun 1997 14:19:38 -0700
Date: Thu, 5 Jun 1997 14:19:37 -0700
Posted-Date: Thu, 5 Jun 1997 14:19:37 -0700
Message-Id: <199706052119.AA07516@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07516>; Thu, 5 Jun 1997 14:19:37 -0700
To: tcp-impl@relay.engr.SGI.COM, abarclay@kaiperm.org
Subject: Re: TIME-WAIT truncation
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: "Andy W. Barclay" <abarclay@kaiperm.org>
> Subject: Re: TIME-WAIT truncation
> 
> > ||Would it have been more "correct" (indeed much less palatable) to stop
> > ||accepting or establishing new connections until there was a TIME_WAIT
> > ||slot available?
> 
> To stop accepting connections until a slot becomes available leaves the
> OS open to denial of service attacks, doesn't it? 
> I understood that CERT suggested that one "randomly" pick a slot in the
> TIME_WAIT state to free up, rather than use the oldest. 

This isn't quite proper. TIME_WAITs should never be freed, until
2*MSL, period. Otherwise, new connections can accept data from
old connections, violating TCP's conservative design goal.
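
A minimal sketch of that rule, assuming a table that maps each
connection to the time it entered TIME_WAIT: an entry becomes
reclaimable only once 2*MSL has elapsed. The function names and the
table layout are hypothetical, and the 120-second default is just the
RFC 793 figure discussed in this thread.

```python
MSL_SECONDS = 120  # RFC 793's stated engineering choice; stacks differ


def time_wait_expired(entered_at, now, msl=MSL_SECONDS):
    """True once a connection has spent at least 2*MSL in TIME_WAIT."""
    return now - entered_at >= 2 * msl


def reclaimable(entries, now, msl=MSL_SECONDS):
    """Return the ids of TIME_WAIT entries that may safely be freed.

    entries maps a connection id to the time (in seconds) at which that
    connection entered TIME_WAIT.
    """
    return [cid for cid, entered in entries.items()
            if time_wait_expired(entered, now, msl)]
```

For example, reclaimable({"a": 0, "b": 100}, now=240) frees only "a";
"b" still has 100 seconds to wait.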

> The justification for this was that if someone was launching a Denial of
> Service attack on your machine, picking a slot at random was most likely
> to pick one that was opened as part of the attack.

The only way to avoid SYN flooding is to deny the connection
attempts, which will happen anyway once the TCB memory space is full.

The better (and TCP friendly) solution is to queue incoming SYN requests,
and pick one randomly when a slot opens. This prevents the attacker
from timing the incoming SYNs to grab TCBs as they get freed.

That would satisfy CERT, too, I presume (someone should
check, though I suspect they'd look to us for verification anyway).

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:27:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA14341 for tcp-impl-list; Thu, 5 Jun 1997 14:25:17 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA14321 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:25:13 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA26734
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:25:07 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id WAA27423; Thu, 5 Jun 1997 22:24:08 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wZkBI-0005FkC; Thu, 5 Jun 97 22:34 BST
Message-Id: <m0wZkBI-0005FkC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TIME-WAIT truncation
To: touch@ISI.EDU
Date: Thu, 5 Jun 1997 22:34:52 +0100 (BST)
Cc: backman@ftp.com, Eric.Schenk@dna.lth.se, vern@ee.lbl.gov,
        tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199706052057.AA07424@ash.isi.edu> from "touch@ISI.EDU" at Jun 5, 97 01:57:45 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 		------- FIN ---------->
> 	** SEND RST			goto TIME_WAIT
> 	** goto TIME_WAIT
> 		------- RST ---------->
> 					goto CLOSED

This is unclear as it doesn't indicate which transitions are timeouts, but it
appears the last transition doesn't agree with RFC1337 and the earlier RST
ones seem designed to cause the bug Ian found to occur more often?


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:30:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA15762 for tcp-impl-list; Thu, 5 Jun 1997 14:29:05 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA15744 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:29:03 -0700
Received: from rekk.dna.lth.se (rekk.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA27807
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:29:01 -0700
	env-from (erics@rekk.dna.lth.se)
Received: from rekk.dna.lth.se (localhost [127.0.0.1])
	by rekk.dna.lth.se (8.8.5/8.8.5) with ESMTP id XAA04902;
	Thu, 5 Jun 1997 23:26:10 +0200
Message-Id: <199706052126.XAA04902@rekk.dna.lth.se>
To: braden@ISI.EDU
cc: Eric.Schenk@dna.lth.se, tcp-impl@relay.engr.SGI.COM
From: Eric.Schenk@dna.lth.se
Subject: Re: TIME-WAIT truncation 
In-reply-to: Your message of "Thu, 05 Jun 1997 13:49:08 PDT."
             <9706052049.AA04066@can.isi.edu> 
Date: Thu, 05 Jun 1997 23:26:10 +0200
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


braden@ISI.EDU <braden@ISI.EDU> writes:
>When you start messing with TCP's reliable delivery mechanism, you have
>to be very careful.  I am not positive, but I think I recall that the
>scheme you mention is in fact formally incorrect and can be
>demonstrated in particular circumstances to allow corrupted data.

Being a theoretician by training, I would be very interested to see
any such proof. Anyone got a hint at a reference?

In any case, I don't think the change I suggested should be considered
lightly. I would not be comfortable suggesting its implementation
before constructing a formal proof that it does not make the
reliable delivery problems of TCP worse than they already are.

That said, I think truncating TIME_WAIT is an even worse idea.

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:37:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA17876 for tcp-impl-list; Thu, 5 Jun 1997 14:35:13 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA17860 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 14:35:10 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA29165
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 14:35:08 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Thu, 5 Jun 1997 17:31:19 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Thu, 5 Jun 1997 17:31:19 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id RAA18624; Thu, 5 Jun 1997 17:33:29 -0400
Date: Thu, 5 Jun 1997 17:33:29 -0400
Message-Id: <199706052133.RAA18624@MAILSERV-2HIGH-A.FTP.COM>
To: braden@isi.edu
Subject: RE: TIME-WAIT truncation
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: henrysa@exchange.microsoft.com, backman@ftp.com, vern@ee.lbl.gov,
        tcp-impl@relay.engr.sgi.com
Repository: mailserv-2high-a.ftp.com, [message accepted at Thu Jun  5 17:33:28 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||I can't resist commenting that T/TCP (Transaction TCP) would have
||avoided this problem.
||

yes it would have.  If anyone ever implemented it.  We looked at it
2-3 years back; thought it was a good idea; then asked the question
"so who would we do TTCP with?".  That finished that.

||I think it is generally agreed that the original Web browsers were
||abusing TCP, trying to use it as a transaction transport protocol by
||creating many short connections.  My Webby friends tell me that this
||has now been fixed, and proper Web browsers using the latest HTTP
||should use many fewer TCP connections.  There is an incentive to upgrade,
||since the resulting user performance is significantly better.  So
||perhaps it is time to fix broken TCPs that short-cut TIME-WAIT state.
||
>From what I see on the wire the two leading commercial PC browsers
use HTTP V 1.0, not V 1.1 which would *supposedly* solve the problem.
I say supposedly because until HTTP V 1.1 is commercially deployed and
tested I'm not sure I'm ready to buy into its "solves the world's problems"
promise.

It looks good in theory, but theory...



From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:37:13 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA17916 for tcp-impl-list; Thu, 5 Jun 1997 14:35:18 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA17901 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:35:16 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA29208
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:35:15 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Thu, 5 Jun 1997 17:31:30 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Thu, 5 Jun 1997 17:31:30 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id RAA18634; Thu, 5 Jun 1997 17:33:40 -0400
Date: Thu, 5 Jun 1997 17:33:40 -0400
Message-Id: <199706052133.RAA18634@MAILSERV-2HIGH-A.FTP.COM>
To: raj@hpisrdq.cup.hp.com
Subject: Re: TIME-WAIT truncation
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high-a.ftp.com, [message accepted at Thu Jun  5 17:33:32 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||
||Lots of ways to go there - raise the bridge (more RAM), lower the river
||(smaller TIME_WAIT space requirements), or dry-up the river (don't
||accept more connection until you have a slot for them, or get the
||multitude of client systems to shoulder the TIME_WAIT burden by
||rewriting the app).

must be nice to live in a world where they don't bring 18 wheelers
down the garden path :-).

I seem to recall that NT was supposed to run on 8 meg in a 386....

L.



From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:42:03 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA19558 for tcp-impl-list; Thu, 5 Jun 1997 14:40:42 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA19551 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:40:40 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA00865
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:40:39 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA27381>; Thu, 5 Jun 1997 14:36:52 -0700
Date: Thu, 5 Jun 1997 14:36:43 -0700
Posted-Date: Thu, 5 Jun 1997 14:36:43 -0700
Message-Id: <199706052136.AA07533@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07533>; Thu, 5 Jun 1997 14:36:43 -0700
To: touch@ISI.EDU, alan@lxorguk.ukuu.org.uk
Subject: Re: TIME-WAIT truncation
Cc: backman@ftp.com, Eric.Schenk@dna.lth.se, vern@ee.lbl.gov,
        tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From alan@lxorguk.ukuu.org.uk Thu Jun  5 14:25:45 1997
> From: alan@lxorguk.ukuu.org.uk (Alan Cox)
> Subject: Re: TIME-WAIT truncation
> To: touch@ISI.EDU
> Date: Thu, 5 Jun 1997 22:34:52 +0100 (BST)
> Cc: backman@ftp.com, Eric.Schenk@dna.lth.se, vern@ee.lbl.gov,
>         tcp-impl@relay.engr.SGI.COM
> 
> > 		------- FIN ---------->
> > 	** SEND RST			goto TIME_WAIT
> > 	** goto TIME_WAIT
> > 		------- RST ---------->
> > 					goto CLOSED
> 
> This is unclear as it doesn't indicate which transitions are timeouts, but it
> appears the last transition doesn't agree with RFC1337 and the earlier RST
> ones seem designed to cause the bug Ian found to occur more often?

(I'll take all discussion of this topic off-line, since
this isn't even a draft yet...)

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:50:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA22055 for tcp-impl-list; Thu, 5 Jun 1997 14:48:16 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA22028 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:48:13 -0700
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.219]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA02480
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:48:12 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185]) by palrel3.hp.com with SMTP (8.7.5/8.7.3) id OAA15745 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:48:05 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA19565; Thu, 5 Jun 1997 14:46:57 -0700
Message-Id: <339733D0.1AB2@cup.hp.com>
Date: Thu, 05 Jun 1997 14:46:56 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: TIME-WAIT truncation
References: <199706052133.RAA18634@MAILSERV-2HIGH-A.FTP.COM>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> ||Lots of ways to go there - raise the bridge (more RAM), lower the river
> ...
> must be nice to live in a world where they don't bring 18 wheelers
> down the garden path :-).

No, here they only ask to bring them down the bike path :) When they ask
to go down the garden path we simply tell them "No"  and explain the
reason - it is amazing the effect of saying "That could lead to
undetected data corruption" has on folks.

> I seem to recall that NT was supposed to run on 8 meg in a 386....

Perhaps, but was it also supposed to handle 500 connection requests a
second? There is a difference between "running in 8 meg" and having to
support a high connection rate in 8 meg in a TCP correct manner. 

High TCP connection rates are not in the same space as 8MB 386's...

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:50:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA21232 for tcp-impl-list; Thu, 5 Jun 1997 14:46:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA21203 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:46:32 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA02137
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:46:31 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA27710>; Thu, 5 Jun 1997 14:42:45 -0700
Date: Thu, 5 Jun 1997 14:42:45 -0700
Posted-Date: Thu, 5 Jun 1997 14:42:45 -0700
Message-Id: <199706052142.AA07549@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07549>; Thu, 5 Jun 1997 14:42:45 -0700
To: braden@ISI.EDU, backman@ftp.com
Subject: RE: TIME-WAIT truncation
Cc: henrysa@exchange.microsoft.com, vern@ee.lbl.gov,
        tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@relay.engr.SGI.COM Thu Jun  5 14:39:21 1997
> Date: Thu, 5 Jun 1997 17:33:29 -0400
> To: braden@ISI.EDU
> Subject: RE: TIME-WAIT truncation
> From: backman@ftp.com (Larry Backman)
> Cc: henrysa@exchange.microsoft.com, backman@ftp.com, vern@ee.lbl.gov,
>         tcp-impl@relay.engr.SGI.COM
> Repository: mailserv-2high-a.ftp.com, [message accepted at Thu Jun  5 17:33:28 1997]
> Originating-Client: vxd-eth.ftp.com
> 
> 
> ||I can't resist commenting that T/TCP (Transaction TCP) would have
> ||avoided this problem.
> ||
> 
> yes it would have.  If anyone ever implemented it.  We looked at it
> 2-3 years back; thought it was a good idea; then asked the question
> "so who would we do TTCP with?".  That finished that.

See FreeBSD. To quote the old tomato sauce commercial, "it's in there".

Joe

----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:53:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA23009 for tcp-impl-list; Thu, 5 Jun 1997 14:51:00 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA22994 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 5 Jun 1997 14:50:57 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id PAA06270 for tcp-impl@relay.engr.sgi.com; Thu, 5 Jun 1997 15:50:54 -0600
Date: Thu, 5 Jun 1997 15:50:54 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706052150.PAA06270@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: touch@ISI.EDU

> ...
> > The justification for this was that if someone was launching a Denial of
> > Service attack on your machine, picking a slot at random was most likely
> > to pick one that was opened as part of the attack.
> 
> The only way to avoid SYN flooding is to deny the connection
> attempts, which will happen anyway once the TCB memory space is full.

SYN attacks do not involve TIME_WAIT.

"Denying the connection attempts" is a poor defense against a denial of
service attack such as SYN bombing, since causing you to deny
connection attempts is the purpose of the SYN bombing.

Have you estimated how big your listen queue would need to be at either
75 or 180 seconds (common listen queue timeouts) if you were being SYN
bombed at T1 or T3 rates, and were not denying any legitimate connections?
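
A back-of-the-envelope version of that estimate, under loudly stated
assumptions: the attacker fills the whole link with minimum-size SYNs of
40 bytes (20-byte IP header plus 20-byte TCP header, no options, no
link-layer framing), and every SYN occupies a slot for the full timeout.
The link rates are the nominal T1 and T3 line rates; the function names
are invented for this sketch.

```python
T1_BPS = 1_544_000     # nominal T1 line rate, bits per second
T3_BPS = 44_736_000    # nominal T3 line rate, bits per second
SYN_BYTES = 40         # assumption: bare IP + TCP headers, no options


def syns_per_second(link_bps, syn_bytes=SYN_BYTES):
    """Upper bound on SYN arrival rate if the link carries nothing else."""
    return link_bps // (syn_bytes * 8)


def queue_entries(link_bps, timeout_s, syn_bytes=SYN_BYTES):
    """Slots needed so that no entry is dropped before its timeout."""
    return syns_per_second(link_bps, syn_bytes) * timeout_s


for name, bps in (("T1", T1_BPS), ("T3", T3_BPS)):
    for timeout in (75, 180):
        print(f"{name} {timeout:>3}s: {queue_entries(bps, timeout):,} entries")
```

Under these assumptions a T1 needs roughly 360 thousand slots at a
75-second timeout, and a T3 roughly 25 million at 180 seconds.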


> The better (and TCP friendly) solution is to queue incoming SYN requests,
> and pick one randomly when a slot opens. This prevents the attacker
> from timing the incoming SYNs to grab TCBs as they get freed.

Pick one what when a slot opens, a slot or an incoming SYN request?
Under which circumstances, when a slot opens, should you do that picking?
After you've picked whichever it is, what do you do with it?  If it is
a SYN request, then you must discard it, which is what you'd do without
any picking when your listen queue is full, and is a bad idea since it
is what the attacker wants.  If it is a slot, then you are doing one of
the two forms of the second half of the standard SYN bomb defense.


> That would satisfy CERT, too, I presume (someone should
> check, though I suspect they'd look to us for verification anyway).

The SYN bombing issue is long over and solved, essentially
independently of CERT, and certainly independently of any verifying by
you or ISI.


The solution to SYN bombing is 
    - use as large a listen queue as you can afford, and try to make
       the cost of each slot on the listen queue small and searching a
       big queue fast so you can afford more.  At least 30K slots is
       one recommendation.

    - drop either the oldest or a random, pre-existing entry in your
       listen queue when it is full and a new SYN arrives.
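
The second point can be sketched as a bounded queue that evicts an
existing entry, either the oldest or a random one, when a SYN arrives
and the queue is full. This is a toy model, not code from any real
stack: the ListenQueue class and its method names are hypothetical, and
a plain list is used for clarity even though a real implementation
would want a faster lookup structure, per the first point.

```python
import random


class ListenQueue:
    """Toy queue of half-open connections with drop-on-full behaviour."""

    def __init__(self, capacity, policy="random", rng=None):
        if policy not in ("random", "oldest"):
            raise ValueError("policy must be 'random' or 'oldest'")
        self.capacity = capacity
        self.policy = policy
        self.rng = rng or random.Random()
        self.entries = []  # pending connection ids, oldest first

    def on_syn(self, conn_id):
        """Admit a new SYN; return the evicted id, or None if none was evicted."""
        evicted = None
        if len(self.entries) >= self.capacity:
            if self.policy == "oldest":
                index = 0
            else:
                index = self.rng.randrange(len(self.entries))
            evicted = self.entries.pop(index)
        self.entries.append(conn_id)
        return evicted

    def on_ack(self, conn_id):
        """Handshake completed: remove the entry from the pending queue."""
        if conn_id in self.entries:
            self.entries.remove(conn_id)
            return True
        return False
```

The point of evicting rather than refusing is that a new SYN is always
admitted, so a legitimate client still has a chance of completing its
handshake while the queue is under attack.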


Again, this has little if anything to do with TIME_WAIT.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:53:41 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA23453 for tcp-impl-list; Thu, 5 Jun 1997 14:52:16 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA23442 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:52:14 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA03418
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:52:13 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA28390>; Thu, 5 Jun 1997 14:48:28 -0700
Date: Thu, 5 Jun 1997 14:48:27 -0700
Posted-Date: Thu, 5 Jun 1997 14:48:27 -0700
Message-Id: <199706052148.AA07555@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07555>; Thu, 5 Jun 1997 14:48:27 -0700
To: touch@ISI.EDU, backman@ftp.com
Subject: Re: TIME-WAIT truncation
Cc: raj@hpisrdq.cup.hp.com, vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From backman@mailserv-2high-a.ftp.com Thu Jun  5 13:30:58 1997
> Date: Thu, 5 Jun 1997 16:25:12 -0400
> To: touch@ISI.EDU
> Subject: Re: TIME-WAIT truncation
> From: backman@ftp.com (Larry Backman)
> Cc: backman@ftp.com, raj@hpisrdq.cup.hp.com, vern@ee.lbl.gov,
>         tcp-impl@relay.engr.SGI.COM
> Repository: mailserv-2high-a.ftp.com, [message accepted at Thu Jun  5 16:25:07 1997]
> Originating-Client: tunes.ftp.com
> 
> 
> ||It would have been "correct" to stop accepting new connections.
> ||However, because the MSL on which TIME_WAIT depends
> ||is not specified (each site is free to determine it), it
> ||would be most correct to have changed the MSL, which is
> ||what some recent OS patches for web performance do.
> ||
> ||I.e., MSL currently is around 2 minutes; the OS patches
> ||drop that to 15 seconds.
> ||
> whoa!  The way I read 1122 and friends MSL is to be set to 2 minutes
> and it would take an act of God and Jon Postel to change it.

FYI - some locally accessible values:

SunOS 4.1.3	30 seconds

FreeBSD	2.2.1	30 seconds

Linux 2.0.28	1 minute

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 14:56:45 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA25090 for tcp-impl-list; Thu, 5 Jun 1997 14:55:16 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA25085 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:55:14 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA04074
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 14:55:12 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Thu, 5 Jun 1997 17:48:45 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Thu, 5 Jun 1997 17:48:45 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id RAA18849; Thu, 5 Jun 1997 17:50:56 -0400
Date: Thu, 5 Jun 1997 17:50:56 -0400
Message-Id: <199706052150.RAA18849@MAILSERV-2HIGH-A.FTP.COM>
To: touch@ISI.EDU
Subject: 2MSL and other constants
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: touch@ISI.EDU, backman@ftp.com, raj@hpisrdq.cup.hp.com, vern@ee.lbl.gov,
        tcp-impl@relay.engr.SGI.COM
Repository: mailserv-2high-a.ftp.com, [message accepted at Thu Jun  5 17:50:47 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


OK - Vern - I've got to confess - after 12 years in TCP-dom, 2 implementations
of my own, and bossing/managing 4 or 5 others, I don't understand
the rationales behind the initial constant values like 2MSL in TCP.

I have sat through at least a half dozen conference table/whiteboard
discussions focusing on the tradeoffs of reducing 2MSL to a lower value,
decreasing the fast timeout, etc.

Each time I go into my mail archives, go to the same papers, and reread the
same RFC sections.

I suspect I am not the only person confused.

It would be a good thing; it would be a blessing; it would be an immense
clarification to define *suggested* TCP constants and explain the rationale
behind those constants.

L.

||> ||
||> whoa!  The way I read 1122 and friends MSL is to be set to 2 minutes
||> and it would take an act of God and Jon Postel to change it.
||> 
||> On rescanning 1122 quickly I see it notes that it was set arbitrarily to
||> 2 minutes but I seem to recall some serious math around slow WAN links
||> leading to that 2 minute figure.
||> 
||> Which perhaps brings up a very interesting topic for the list -
||> default values like MSL, initial RTT and backoff array.
||
||        Yes!
||
||RFC793 indicates: 
||
||  For this specification the MSL is taken to be 2 minutes.  This
||  is an engineering choice, and may be changed if experience indicates
||  it is desirable to do so.
||
||Which seems to contraindicate:
||
||        - MSL *MUST* be 2 minutes for TCP according to RFC793
||
||        - MSL may be changed if experience warrants
||
||I could see the OS patches using the latter interpretation.
||I would hope this is one of the things we could clear up.



From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 15:17:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA00385 for tcp-impl-list; Thu, 5 Jun 1997 15:15:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA00370 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:14:59 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id PAA11443
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:14:58 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA29471>; Thu, 5 Jun 1997 15:11:10 -0700
Date: Thu, 5 Jun 1997 15:11:08 -0700
Posted-Date: Thu, 5 Jun 1997 15:11:08 -0700
Message-Id: <199706052211.AA07597@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07597>; Thu, 5 Jun 1997 15:11:08 -0700
To: tcp-impl@relay.engr.SGI.COM, vjs@mica.denver.sgi.com
Subject: Re: TIME-WAIT truncation
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: vjs@mica.denver.sgi.com (Vernon Schryver)
> 
> > From: touch@ISI.EDU
> 
> > ...
> > > The justification for this was that if someone was launching a Denial of
> > > Service attack on your machine, picking a slot at random was most likely
> > > to pick one that was opened as part of the attack.
> > 
> > The only way to avoid SYN flooding is to deny the connection
> > attempts, which will happen anyway once the TCB memory space is full.
> 
> SYN attacks do not involve TIME_WAIT.

Correct - they push things into the SYN_RCVD state, which, like TIME_WAIT,
eats up TCB space. Different timer, different cause, same space, same
general effect (except that TIME_WAIT is caused by established connections,
requiring more effort by the malicious host).

> "Denying the connection attempts" is a poor defense against a denial of
> service attack such as SYN bombing, since causing you to deny
> connection attempts is the purpose of the SYN bombing.

There is NO other solution to SYN attacks. SYN attacks serve two purposes:

	- deny other SYN attempts (presumably legitimate)

	- deny service to ongoing (ESTABLISHED) connections

There is no way to avoid denying access to other SYN attempts.
Every purported solution, e.g., authenticating incoming SYNs
via digests, checking them against incoming lists, etc., has
the additional effect of slowing SYN processing in general, and consuming
CPU resources in general, both of which slow SYN processing for
the legitimate requests.

The best you can do is:

	1- prevent SYN attacks from consuming large numbers of resources
		e.g., partition the TCB space, and limit the percentage
		allocated to SYN_RCVD

	2- prevent SYN attacks from consuming large amounts of CPU resources
		e.g., limiting the frequency of processing SYN requests

	3- prevent SYN attacks from affecting ESTABLISHED connections
		e.g., via solutions #1 and #2 together.

	4- prevent SYN attacks from synchronizing and completely starving
	   legitimate incoming SYNs
		e.g., pick randomly from queued incoming SYNs, not 'end
		of queue'
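
As a rough illustration of items 1 and 2 above, here is a minimal Python
sketch. It is not taken from any real stack: the class name SynAdmission,
the fraction-of-TCBs cap, and the token-bucket parameters are all
assumptions made for the example.

```python
class SynAdmission:
    """Hypothetical admission check for incoming SYNs.

    Caps SYN_RCVD entries at a fraction of the TCB space (item 1) and
    limits how often SYNs are processed with a token bucket (item 2).
    """

    def __init__(self, total_tcbs, syn_rcvd_fraction, syns_per_sec, burst):
        self.max_syn_rcvd = int(total_tcbs * syn_rcvd_fraction)
        self.syn_rcvd = 0                 # TCBs currently in SYN_RCVD
        self.rate = syns_per_sec          # token refill rate
        self.burst = burst                # bucket depth
        self.tokens = float(burst)
        self.last = 0.0                   # time of the previous call

    def admit(self, now):
        """Return True if a SYN arriving at time `now` gets a TCB."""
        # Refill the bucket for the time elapsed since the last SYN.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens < 1.0:
            return False                  # over the CPU budget: drop
        self.tokens -= 1.0                # looking at the SYN costs CPU
        if self.syn_rcvd >= self.max_syn_rcvd:
            return False                  # SYN_RCVD partition is full
        self.syn_rcvd += 1
        return True

    def release(self):
        """Call when a SYN_RCVD entry completes the handshake or times out."""
        if self.syn_rcvd > 0:
            self.syn_rcvd -= 1
```

Because the SYN_RCVD share is capped, TCBs held by ESTABLISHED connections
are never taken over by half-open ones, which is the point of item 3.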

> Have you estimated how big your listen queue would need to be at either
> 75 or 180 seconds (common listen queue timeouts) if you were being SYN
> bombed at T1 or T3 rates, and were not denying any legitimate connections?

I'm all open to a specific alternative, if anyone has one.
All alternatives I've seen have the above properties.
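
For a sense of scale, a back-of-the-envelope sketch in Python. It assumes
minimum-size 40-byte SYNs (20-byte IP header plus 20-byte TCP header, no
options), ignores link-layer framing, and uses nominal line rates of
1.544 Mb/s for T1 and 44.736 Mb/s for T3. The function names are made up
for the example.

```python
import math

SYN_BYTES = 40  # assumed: 20-byte IP header + 20-byte TCP header, no options


def syn_rate(link_bps):
    """SYNs per second an attacker can push through a link of link_bps."""
    return link_bps / (SYN_BYTES * 8)


def queue_slots(link_bps, timeout_sec):
    """Listen queue slots needed so that no entry is evicted before timeout."""
    return math.ceil(syn_rate(link_bps) * timeout_sec)


T1_BPS = 1.544e6     # nominal T1 line rate
T3_BPS = 44.736e6    # nominal T3 line rate

for name, bps in (("T1", T1_BPS), ("T3", T3_BPS)):
    for timeout in (75, 180):
        print(name, timeout, "s:", queue_slots(bps, timeout), "slots")
```

Under these assumptions the answer is hundreds of thousands of slots at
T1 rates and tens of millions at T3 rates.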

> > The better (and TCP friendly) solution is to queue incoming SYN requests,
> > and pick one randomly when a slot opens. This prevents the attacker
> > from timing the incoming SYNs to grab TCBs as they get freed.
> 
> Pick one what when a slot opens, a slot or an incoming SYN request?

When a TCB in the SYN_RCVD state times out, there is the opportunity
to process one incoming SYN. Pick one SYN randomly from the 
incoming queue of SYNs. If there is only one (the case most
of the time when there is no attack), everything works fine.
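
The selection step described above can be sketched in Python as follows.
This is only an illustration: the PendingSyns class, its capacity limit,
and the (address, port) tuples used to stand in for SYNs are assumptions,
not part of any existing implementation.

```python
import random


class PendingSyns:
    """Bounded holding area for SYNs that arrived while no slot was free."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity            # hypothetical cap on queued SYNs
        self.pending = []                   # e.g. (src_addr, src_port) tuples
        self.rng = rng or random.Random()

    def enqueue(self, syn):
        """Queue a SYN; return False (drop it) if the holding area is full."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append(syn)
        return True

    def on_slot_freed(self):
        """A SYN_RCVD TCB timed out: pick one queued SYN uniformly at random."""
        if not self.pending:
            return None
        i = self.rng.randrange(len(self.pending))
        # Swap the chosen entry to the end so removal is O(1); queue order
        # carries no meaning once selection is random.
        self.pending[i], self.pending[-1] = self.pending[-1], self.pending[i]
        return self.pending.pop()
```

With a single queued SYN the random choice is trivial, so the no-attack
case behaves exactly like an ordinary queue.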

> > That would satisfy CERT, too, I presume (someone should
> > check, though I suspect they'd look to us for verification anyway).
> 
> The SYN bombing issue is long over and solved, essentially
> independently of CERT, and certainly independently of any verifying by
> you or ISI.

Geez - *US* meant this group, not me or ISI. Though, if the SYN
bombing is so over and solved, why is the issue of partitioning
the processing and having to live with slower SYN processing 
still misunderstood?

> The solution to SYN bombing is 
>     - use as large a listen queue as you can afford, and try to make
>        the cost of each slot on the listen queue small and searching a
>        big queue fast so you can afford more.  At least 30K slots is
>        one recommendation.
> 
>     - drop either the oldest or a random, pre-existing entry in your
>        listen queue when it is full and a new SYN arrives.

No. The solution is:

	- LIMIT the size of the listen queue to some percentage of 
		the overall mbuf or TCB space

	- I don't know how to drop, but I know that picking randomly 
	  works, if the queue can hold them - I do know that picking
	  the newest one is wrong, since it allows the attacker to
	  synchronize with the timeout of the SYN_RCVD states.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 15:33:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA05458 for tcp-impl-list; Thu, 5 Jun 1997 15:30:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA05432 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:30:54 -0700
Received: from rekk.dna.lth.se (rekk.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA14882
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:30:53 -0700
	env-from (erics@rekk.dna.lth.se)
Received: from rekk.dna.lth.se (localhost [127.0.0.1])
	by rekk.dna.lth.se (8.8.5/8.8.5) with ESMTP id AAA05808;
	Fri, 6 Jun 1997 00:28:23 +0200
Message-Id: <199706052228.AAA05808@rekk.dna.lth.se>
To: vjs@mica.denver.sgi.com (Vernon Schryver)
cc: Eric.Schenk@dna.lth.se, tcp-impl@relay.engr.SGI.COM
From: Eric.Schenk@dna.lth.se
Subject: SYN bombing defense (was: TIME-WAIT truncation)
In-reply-to: Your message of "Thu, 05 Jun 1997 15:50:54 MDT."
             <199706052150.PAA06270@mica.denver.sgi.com> 
Date: Fri, 06 Jun 1997 00:28:23 +0200
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Straying from the original topic (note subject change):

Vernon Schryver <vjs@mica.denver.sgi.com> writes:
>The SYN bombing issue is long over and solved, essentially
>independently of CERT, and certainly independently of any verifying by
>you or ISI.
>
>The solution to SYN bombing is 
>    - use as large a listen queue as you can afford, and try to make
>       the cost of each slot on the listen queue small and searching a
>       big queue fast so you can afford more.  At least 30K slots is
>       one recommendation.
>
>    - drop either the oldest or a random, pre-existing entry in your
>       listen queue when it is full and a new SYN arrives.

A few remarks about SYN bombing. First, the approach Vernon
outlines above is not the only known solution. There is the
SYN cookie solution designed by Dan Bernstein and myself, and the
RST cookie solution which I designed independently based on
the idea of SYN cookies. SYN cookies have been implemented
for several TCP stacks. To my knowledge RST cookies have only
been implemented for the Linux TCP stack.

For those unfamiliar, both of these defenses rely on sending out a
cryptographic challenge cookie in response to a SYN packet after the
listen queue size passes some threshold value. The server side then
simply forgets that the original SYN ever occurred. If the client
side manages to respond to the challenge, the server either reconstructs
the original SYN from the information carried in the response (SYN cookie),
or adds the client address to a "verified" list and waits for
the client to retry (RST cookies) (verified clients are forgotten
after some reasonable timeout).
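
To make the stateless idea concrete, here is a minimal Python sketch of a
SYN-cookie-style check. It is not the actual SYN cookie bit layout: the
secret, the 64-second time slot, the use of HMAC-SHA256, and the function
names are all assumptions chosen for illustration, and the sketch leaves
out details such as encoding the MSS.

```python
import hashlib
import hmac
import struct

SECRET = b"server-local secret"   # hypothetical; would be a random key
SLOT_SECONDS = 64                 # hypothetical granularity of cookie age


def _mac(src, sport, dst, dport, slot):
    """32-bit keyed hash of the connection 4-tuple and a time slot."""
    msg = f"{src}|{sport}|{dst}|{dport}|{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def make_cookie(src, sport, dst, dport, now):
    """Value sent as the initial sequence number of the SYN-ACK.

    Nothing is stored on the server after this returns.
    """
    return _mac(src, sport, dst, dport, int(now) // SLOT_SECONDS)


def check_ack(src, sport, dst, dport, ack_number, now):
    """Validate the handshake's final ACK, which acknowledges cookie + 1."""
    cookie = (ack_number - 1) & 0xFFFFFFFF
    slot = int(now) // SLOT_SECONDS
    # Accept the current slot and the previous one to allow for the RTT.
    return any(cookie == _mac(src, sport, dst, dport, s)
               for s in (slot, slot - 1))
```

A spoofed source never sees the SYN-ACK, so it cannot produce a matching
ACK, and the server has spent no listen queue slot on it.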

Both of these defenses can sustain normal operation against extremely
high attack rates without increasing the backlog queue size. In robustness
testing of the Linux implementation we have flooded a machine as
fast as a 100 base-T ethernet would allow. This was around 20000 connection
attempts a second. The machine under attack was still able to take incoming
connections from across the Atlantic without noticeable delay, at least until
we increased the packet rate to the point that the ethernet was taking 50%
packet loss. I don't even want to think about how big a backlog queue you
would need to survive that kind of attack rate with a random drop strategy.

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 15:34:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA06193 for tcp-impl-list; Thu, 5 Jun 1997 15:32:55 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA06179 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:32:53 -0700
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA15404
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:32:47 -0700
	env-from (davem@caipfs.rutgers.edu)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id SAA10836;
	Thu, 5 Jun 1997 18:29:01 -0400 (EDT)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id SAA00877; Thu, 5 Jun 1997 18:26:56 -0400
Date: Thu, 5 Jun 1997 18:26:56 -0400
Message-Id: <199706052226.SAA00877@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: vjs@mica.denver.sgi.com
CC: tcp-impl@relay.engr.SGI.COM
In-reply-to: <199706052150.PAA06270@mica.denver.sgi.com>
	(vjs@mica.denver.sgi.com)
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Thu, 5 Jun 1997 15:50:54 -0600
   From: vjs@mica.denver.sgi.com (Vernon Schryver)

   SYN attacks do not involve TIME_WAIT.

   "Denying the connection attempts" is a poor defense against a
   denial of service attack such as SYN bombing, since causing you to
   deny connection attempts is the purpose of the SYN bombing.

Bullshit, if you can verify that the connection attempt is in fact
coming from the attacker, denying the connection attempt is the
perfect (and currently the only known good) solution to SYN bombing.

I kindly refer you to Dr. Bernstein's SYN cookies, and Eric Schenk's
RST cookies in Linux to see how this can be and is being done.  We have
measured this to be effective up to and until the point where a
100baseT ether was completely saturated with SYN packets; the system
was barely affected.  SYN/RST cookies make this possible.

At least the SYN cookies should be standardized and if anything
recommended for an implementation; they turn SYN bombs into some old
protocol bug we laugh with our friends about that is no longer an issue.

All the other techniques are flawed in one way or another, and are not
nearly as effective as the SYN/RST cookies.  Some of them require
configuration changes in userland; SYN/RST cookies only require a
new version of the OS to be booted.

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 15:43:30 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA08930 for tcp-impl-list; Thu, 5 Jun 1997 15:41:39 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA08917 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 15:41:36 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id PAA17193
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 15:41:35 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA01111>; Thu, 5 Jun 1997 15:37:29 -0700
Date: Thu, 5 Jun 97 15:39:03 PDT
From: braden@ISI.EDU
Posted-Date: Thu, 5 Jun 97 15:39:03 PDT
Message-Id: <9706052239.AA04188@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA04188>; Thu, 5 Jun 97 15:39:03 PDT
To: braden@ISI.EDU, Eric.Schenk@dna.lth.se
Subject: Re: TIME-WAIT truncation
Cc: tcp-impl@relay.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


  *> From erics@rekk.dna.lth.se Thu Jun  5 14:28:26 1997
  *> To: braden@ISI.EDU
  *> Cc: Eric.Schenk@dna.lth.se, tcp-impl@relay.engr.SGI.COM
  *> From: Eric.Schenk@dna.lth.se
  *> Subject: Re: TIME-WAIT truncation 
  *> In-Reply-To: Your message of "Thu, 05 Jun 1997 13:49:08 PDT."
  *>              <9706052049.AA04066@can.isi.edu> 
  *> Date: Thu, 05 Jun 1997 23:26:10 +0200
  *> Sender: erics@rekk.dna.lth.se
  *> Content-Length: 992
  *> X-Lines: 21
  *> 
  *> 
  *> braden@ISI.EDU <braden@ISI.EDU> writes:
  *> >When you start messing with TCP's reliable delivery mechanism, you have
  *> >to be very careful.  I am not positive, but I think I recall that the
  *> >scheme you mention is in fact formally incorrect and can be
  *> >demonstrated in particular circumstances to allow corrupted data.
  *> 
  *> Being a theoretician by training, I would be very interested to see
  *> any such proof. Anyone got a hint at a reference?
  *> 

Eric,

Perhaps I should not have said "formally".  It's not a proof, but I
buried an informal explanation of this error in the appendix to
RFC-1185.  I *hope* you don't want to argue about it, because thinking
about TCP reliability always makes my head hurt!

Sandy Murphy of TIS wrote a thesis and published a paper some years
ago containing a formal proof of correctness for a TCP-like protocol.
We could ask her about this case.

Bob Braden


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 15:46:18 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA09837 for tcp-impl-list; Thu, 5 Jun 1997 15:44:55 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA09822 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:44:52 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id PAA17844
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:44:50 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA01248>; Thu, 5 Jun 1997 15:41:04 -0700
Date: Thu, 5 Jun 1997 15:41:02 -0700
Posted-Date: Thu, 5 Jun 1997 15:41:02 -0700
Message-Id: <199706052241.AA07640@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07640>; Thu, 5 Jun 1997 15:41:02 -0700
To: vjs@mica.denver.sgi.com, davem@jenolan.rutgers.edu
Subject: SYN cookies
Cc: tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I kindly refer you to Dr. Bernstein's SYN cookies, and Eric Schenk's
> RST cookies in Linux to see how this can and is being done.  We have
> measured this to be effective up to and until the point where a
> 100baseT ether was completely saturated with SYN packets, the system
> was barely affected, SYN/RST cookies make this possible.
> 
> At least the SYN cookies should be standardized and if anything
> recommended for an implementation, they turn SYN bombs into some old
> protocol bug we laugh with our friends about and is no longer an issue.
> 
> All the other techniques are flawed in one way or another, and are not
> nearly as effective as the SYN/RST cookies, some of them require
> configuration changes in userland, SYN/RST cookies only require a
> new version of the OS to be booted.

Does this thread mean there is interest in including
SYN flooding defense in the RFC?

(I don't - I think the first RFC should deal with known
errors in known specs, and leave 'advanced function' to
later RFCs).

PS - authentication cookies sound great until you run at
rates in the 250 Mbps range - at which point, you may
deny more connection attempts due to CPU overload than
the SYN flooding you're trying to avoid.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 15:49:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA10754 for tcp-impl-list; Thu, 5 Jun 1997 15:47:35 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA10739 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 5 Jun 1997 15:47:32 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id QAA06508 for tcp-impl@cthulhu.engr.sgi.com; Thu, 5 Jun 1997 16:47:29 -0600
Date: Thu, 5 Jun 1997 16:47:29 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706052247.QAA06508@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: touch@ISI.EDU

> ...
> > "Denying the connection attempts" is a poor defense against a denial of
> > service attack such as SYN bombing, since causing you to deny
> > connection attempts is the purpose of the SYN bombing.
> 
> There is NO other solution to SYN attacks. SYN attacks serve two purposes:
> 
> 	- deny other SYN attempts (presumably legitimate)
> 
> 	- deny service to ongoing (ESTABLISHED) connections
> 
> There is no way to avoid denying access to other SYN attempts.
> Every purported solution, e.g., authenticating incoming SYNs
> via digests, checking them against incoming lists, etc., has
> the additional effect of slowing SYN processing in general, and consuming
> CPU resources in general, both of which slow SYN processing for
> the legitimate requests.

Yes, those were obviously non-solutions when they were proposed last year
in netnews groups.


> The best you can do is:
> 
> 	1- prevent SYN attacks from consuming large numbers of resources
> 		e.g., partition the TCB space, and limit the percentage
> 		allocated to SYN_RCVD
> 
> 	2- prevent SYN attacks from consuming large amounts of CPU resources
> 		e.g., limiting the frequency of processing SYN requests
> 
> 	3- prevent SYN attacks from affecting ESTABLISHED connections
> 		e.g., via solutions #1 and #2 together.
> 
> 	4- prevent SYN attacks from synchronizing and completely starving
> 	   legitimate incoming SYNs
> 		e.g., pick randomly from queued incoming SYNs, not 'end
> 		of queue'

There is one more thing you can do, which the well-known, widely
implemented and deployed defenses do, and that is to continue providing
your normal services despite the SYN bombing.  #4 and drop-oldest with
big listen queues, like RED in a different universe, are quite effective
for that.



> > The SYN bombing issue is long over and solved, essentially
> > independently of CERT, and certainly independently of any verifying by
> > you or ISI.
> 
> Geez - *US* meant this group. 

This mailing list is also long after the fact for solving SYN bombing.

As I said, the SYN bombing problem was solved and new releases or
patches distributed for major implementations last year or at least
last winter.  I've recognized the names of some of those directly
involved among contributors to this list.


>                               Not me or ISI, though, if the SYN
> bombing is so over and solved, why is the issue of partitioning
> the processing and having to live with slower SYN processing 
> still misunderstood?

Could it be because, except superficially, the problems are quite different?
What problem doesn't involve partitioning and appropriate scaling?


> > The solution to SYN bombing is 
> >     - use as large a listen queue as you can afford, and try to make
> >        the cost of each slot on the listen queue small and searching a
> >        big queue fast so you can afford more.  At least 30K slots is
> >        one recommendation.
> > 
> >     - drop either the oldest or a random, pre-existing entry in your
> >        listen queue when it is full and a new SYN arrives.
> 
> No. The solution is:
> 
> 	- LIMIT the size of the listen queue to some percentage of 
> 		the overall mbuf or TCB space

You mean the system shouldn't crash?  Does such a goal really need
stating here?

In fact, increasing the listen queue from its ridiculously tiny (in
1997 if not 1982) size of 10 (or 15) to 100 or 1000 goes a long way to
dealing with SYN bombs, as well as various similar side effects of
dealing with large streams of HTTP hits.  I think Dave Borman's
solution involves a specialized listen queue of 30,000 slots.  Details
involve the cryptographic strength of the hash table used for the
listen queue (assuming you use a hash table).


> 	- I don't know how to drop, but I know that picking randomly 
> 	  works, if the queue can hold them - I do know that picking
> 	  the newest one is wrong, since it allows the attacker to
> 	  synchronize with the timeout of the SYN_RCVD states.

It would be nice if there were archives of the mailing list in which
this was hashed out last year.


As I recall, the relevant parameters for SYN bombing are
    L = listen queue length in slots
    R = SYN bombing rate in SYNs/sec
    T = RTT to the most distant legitimate client

Interesting and common values of L are 15 (i.e. 10*3/2), 100, 1000, and
30,000.  Interesting values of T are less than 300 ms for directly
connected peers, and up to about 2 seconds for PPP/modems.  Interesting
and currently relevant values of R range from 5 (people just trying to
be a pain or real dummies) to 200 (v.34 modem with VJ header
compression) to 500 (ISDN) to 5000 as a generally agreed design limit.

Drop-oldest-entry in listen queue works fine provided T<L/R.  When T>L/R,
drop-random protects with probability ((L-1)/L)**(T*R).
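
The two conditions above are easy to evaluate numerically. The following
Python sketch just restates them with L, R, and T as defined earlier; the
function names and the sample parameter combinations are illustrative
choices, not measurements.

```python
def drop_oldest_safe(L, R, T):
    """True if a legitimate entry outlives the queue turnover time L/R.

    L = listen queue slots, R = attack SYNs/sec, T = client RTT in seconds.
    """
    return T < L / R


def drop_random_survival(L, R, T):
    """Probability that one entry survives T*R random evictions from L slots."""
    return ((L - 1) / L) ** (T * R)


# Sample points drawn from the "interesting" ranges listed above.
for L, R, T in ((1000, 500, 0.3), (100, 500, 2.0), (30000, 5000, 2.0)):
    print(f"L={L} R={R} T={T}: "
          f"drop-oldest safe={drop_oldest_safe(L, R, T)}, "
          f"drop-random survival={drop_random_survival(L, R, T):.6f}")
```

With a 100-slot queue, a 2 second RTT, and 500 SYNs/sec, neither policy
helps much, while 30,000 slots comfortably cover the 5000 SYNs/sec design
limit under drop-oldest.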

I like an explicit combination of drop-oldest with random-drop.
Others like a hybrid drop-oldest-within-random-hash-bucket.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 15:54:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA12221 for tcp-impl-list; Thu, 5 Jun 1997 15:53:09 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA12203 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:53:07 -0700
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA19779
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:53:04 -0700
	env-from (davem@caipfs.rutgers.edu)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id SAA11262;
	Thu, 5 Jun 1997 18:49:07 -0400 (EDT)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id SAA00891; Thu, 5 Jun 1997 18:47:03 -0400
Date: Thu, 5 Jun 1997 18:47:03 -0400
Message-Id: <199706052247.SAA00891@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: touch@ISI.EDU
CC: braden@ISI.EDU, backman@ftp.com, henrysa@exchange.microsoft.com,
        vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
In-reply-to: <199706052142.AA07549@ash.isi.edu> (touch@ISI.EDU)
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   From: touch@ISI.EDU
   Date: Thu, 5 Jun 1997 14:42:45 -0700

   > yes it would have.  If anyone ever implemented it.  We looked at it
   > 2-3 years back; thought it was a good idea; then asked the question
   > "so who would we do TTCP with?".  That finished that.

   See FreeBSD. To quote the old tomato sauce commercial, "it's in there".

That's nice, but that is one implementation.  Also, what about the work done
at MIT where researchers have preliminary results saying T/TCP does in
fact have bugs and can be proven to operate incorrectly (such as
performing a transaction twice, under certain circumstances)?

Also, you are ignoring the issue that, for example, T/TCP's initial
connection sequence can bomb out the TCP stack in printers and other
such things (I would like to point out that issues such as
this were enough for the IPNG working group to choose a new ethernet
header protocol number for IPv6).

I just don't want people to hear this "it's in there" for one system,
and thus think this justifies the work necessary for an
implementation of T/TCP in someone else's stack.

BTW, T/TCP is still in the RFCs as an experimental protocol
extension, is it not?

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 15:55:06 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA12446 for tcp-impl-list; Thu, 5 Jun 1997 15:53:51 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA12435 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:53:49 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA19916
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:53:47 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id XAA30210; Thu, 5 Jun 1997 23:52:03 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wZjkT-0005FiC; Thu, 5 Jun 97 22:07 BST
Message-Id: <m0wZjkT-0005FiC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TIME-WAIT truncation
To: touch@ISI.EDU
Date: Thu, 5 Jun 1997 22:07:09 +0100 (BST)
Cc: backman@ftp.com, raj@hpisrdq.cup.hp.com, vern@ee.lbl.gov,
        tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199706051959.AA07240@ash.isi.edu> from "touch@ISI.EDU" at Jun 5, 97 12:59:24 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> would be most correct to have changed the MSL, which is
> what some recent OS patches for web performance do.
> 
> I.e., MSL currently is around 2 minutes; the OS patches
> drop that to 15 seconds.

This is just as bad without PAWS. There are internet paths longer than
15 seconds worst case. Indeed my radio link home has the dubious pleasure
of being one of them.
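
To put rough numbers on what shortening MSL buys: TIME_WAIT lasts 2*MSL,
so the steady-state number of TIME_WAIT control blocks is roughly the
active-close rate times 2*MSL. A minimal sketch, assuming a hypothetical
server that actively closes 100 connections per second (a figure chosen
only for illustration, not taken from this thread):

```python
def time_wait_entries(closes_per_second, msl_seconds):
    """Approximate steady-state count of connections sitting in TIME_WAIT.

    Each actively closed connection stays in TIME_WAIT for 2 * MSL.
    """
    return closes_per_second * 2 * msl_seconds


CLOSES_PER_SECOND = 100  # hypothetical load, for illustration only

for msl in (120, 15):  # the 2-minute MSL vs. the 15-second patches
    entries = time_wait_entries(CLOSES_PER_SECOND, msl)
    print(f"MSL {msl:3d} s -> TIME_WAIT {2 * msl:3d} s, about {entries} entries")
```

The table shrinks by a factor of eight, but any segment delayed longer
than the shortened MSL on a slow path can then arrive after the state
that would have rejected it is gone.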

The problem isn't a TCP one. There is a TCP "right answer" which is to stop
accepting. There is a customer right answer. The two are not compatible.



From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 15:56:34 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA12851 for tcp-impl-list; Thu, 5 Jun 1997 15:55:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA12826 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 15:55:10 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA20133
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 15:55:08 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id XAA30216; Thu, 5 Jun 1997 23:53:16 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wZjtP-0005FlC; Thu, 5 Jun 97 22:16 BST
Message-Id: <m0wZjtP-0005FlC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: TIME-WAIT truncation - also another one ICMP verification
To: backman@ftp.com
Date: Thu, 5 Jun 1997 22:16:23 +0100 (BST)
Cc: tcp-impl@relay.engr.sgi.com
In-Reply-To: <199706052025.QAA17438@MAILSERV-2HIGH-A.FTP.COM> from "Larry Backman" at Jun 5, 97 04:25:12 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> whoa!  The way I read 1122 and friends MSL is to be set to 2 minutes
> and it would take an act of God and Jon Postel to change it.

I thought they were one and the same.

> On rescanning 1122 quickly I see it notes that it was set arbitrarily to
> 2 minutes but I seem to recall some serious math around slow WAN links
> leading to that 2 minute figure.

Two minutes seems very high, but the results of being too low can be quite
unpleasant. If you start an ACK fight between two low rtt T3's you will
often cripple the routers on the network and damage the link for others. 
Furthermore no flow control is applying at this point. You get a bit of
whatever fair queuing the router is trying but no VJ to save you.

RFC1337 showed that this case is hard to cause and fixes some problems with
TIME_WAIT but not all, and Ian Heavens found more. The fact that these
problems, though always present, almost never occur perhaps indicates we
are safer than it looks.

I wouldn't like to assume such a conclusion until someone with a decent
footing in queuing theory and probability worked it out. It could be that
the data corruption probability is actually lower than the odds of a bad
but correctly checksummed packet and that the ACK fight one is either very
unlikely or can be cured by some simple ack rate limiting.
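
One possible shape for "simple ack rate limiting" is a token bucket on
the ACKs sent in reply to unacceptable segments. This is only a sketch
under stated assumptions: the class name, the rate and the burst size are
all hypothetical, and no stack discussed in this thread is claimed to
work this way.

```python
class AckRateLimiter:
    """Token bucket: at most `rate` ACKs per second, bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = float(rate)      # tokens added per second
        self.burst = float(burst)    # bucket capacity
        self.tokens = float(burst)   # start with a full bucket
        self.last = 0.0              # time of the previous refill

    def allow(self, now):
        """Return True if an ACK may be sent at time `now` (in seconds)."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                 # over budget: stay silent


limiter = AckRateLimiter(rate=2, burst=2)   # hypothetical limits
print([limiter.allow(t) for t in (0.0, 0.0, 0.0, 0.5)])
```

With both ends limited like this, two hosts that keep answering each
other's out-of-window segments can exchange only a bounded number of ACKs
per second, however short the round-trip time is.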


And now a completely unrelated topic. What constraints are people putting
on believing RST frames and ICMP returns? Attacks based on these are getting
somewhat more common, especially one real nasty that IPSEC doesn't really
solve either: people spoofing ICMP fragmentation needed MTU = 68 type
responses.
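
As one illustration of such a constraint: an ICMP error quotes the IP
header plus the first 8 bytes of the offending datagram, which for TCP
covers the ports and the sequence number. A receiver can therefore
require that the quoted sequence number match data it actually has in
flight, and refuse to lower the path MTU below a local floor. The sketch
below is an assumption-laden example, not a description of any shipping
stack; the function names and the 576-byte floor are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MIN_ACCEPTED_MTU = 576   # assumption: local policy floor, not from this thread


def in_flight(seq, snd_una, snd_nxt):
    """True if seq lies in [snd_una, snd_nxt), compared modulo 2**32."""
    return ((seq - snd_una) & MASK32) < ((snd_nxt - snd_una) & MASK32)


def accept_frag_needed(quoted_seq, reported_mtu, snd_una, snd_nxt, current_mtu):
    """Return the new path MTU, or None if the ICMP message is ignored."""
    if not in_flight(quoted_seq, snd_una, snd_nxt):
        return None                      # quotes nothing we have outstanding
    if reported_mtu >= current_mtu:
        return None                      # not a reduction, nothing to do
    return max(reported_mtu, MIN_ACCEPTED_MTU)   # never drop below the floor


# A blind spoofer claiming MTU = 68 with a guessed sequence number:
print(accept_frag_needed(5000, 68, snd_una=900, snd_nxt=2000, current_mtu=1500))
# The same claim quoting a segment that really is in flight:
print(accept_frag_needed(1000, 68, snd_una=900, snd_nxt=2000, current_mtu=1500))
```

The sequence check only raises the bar for an off-path attacker who has
to guess, and the floor is a trade-off: a path whose real MTU is below it
would then need fragmentation to be allowed.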

Alan


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 15:58:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA13484 for tcp-impl-list; Thu, 5 Jun 1997 15:57:26 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA13476 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 15:57:24 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA20678
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 15:57:20 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id XAA30223; Thu, 5 Jun 1997 23:54:45 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wZjwv-0005FdC; Thu, 5 Jun 97 22:20 BST
Message-Id: <m0wZjwv-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TIME-WAIT truncation
To: braden@ISI.EDU
Date: Thu, 5 Jun 1997 22:20:01 +0100 (BST)
Cc: Eric.Schenk@dna.lth.se, tcp-impl@relay.engr.sgi.com
In-Reply-To: <9706052049.AA04066@can.isi.edu> from "braden@ISI.EDU" at Jun 5, 97 01:49:08 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> When you start messing with TCP's reliable delivery mechanism, you have
> to be very careful.  I am not positive, but I think I recall that the
> scheme you mention is in fact formally incorrect and can be
> demonstrated in particular circumstances to allow corrupted data.

If so, why has nobody ever RFC'd this fact and noted that RFC1122 is wrong
to note this optional idea? It would also be a hard one to lose. The way
some BSD apps work pretty much relies on this property.

And as we all know, TIME_WAIT is broken anyway.




From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 15:58:53 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA13523 for tcp-impl-list; Thu, 5 Jun 1997 15:57:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA13512 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:57:32 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id PAA20701
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 15:57:31 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA01746>; Thu, 5 Jun 1997 15:53:45 -0700
Date: Thu, 5 Jun 1997 15:53:44 -0700
Posted-Date: Thu, 5 Jun 1997 15:53:44 -0700
Message-Id: <199706052253.AA07668@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07668>; Thu, 5 Jun 1997 15:53:44 -0700
To: touch@ISI.EDU, davem@jenolan.rutgers.edu
Subject: Re: TIME-WAIT truncation
Cc: braden@ISI.EDU, backman@ftp.com, henrysa@exchange.microsoft.com,
        vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From davem@caipfs.rutgers.edu Thu Jun  5 15:49:13 1997
> Date: Thu, 5 Jun 1997 18:47:03 -0400
> From: "David S. Miller" <davem@jenolan.rutgers.edu>
> To: touch@ISI.EDU
> Cc: braden@ISI.EDU, backman@ftp.com, henrysa@exchange.microsoft.com,
>         vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
> Subject: Re: TIME-WAIT truncation
> 
>    From: touch@ISI.EDU
>    Date: Thu, 5 Jun 1997 14:42:45 -0700
> 
>    > yes it would have.  If anyone ever implemented it.  We looked at it
>    > 2-3 years back; thought it was a good idea; then asked the question
>    > "so who would we do TTCP with?".  That finished that.
> 
>    See FreeBSD. To quote the old tomato sauce commercial, "it's in there".
> 
> Thats nice, that is one implementation, also what about the work done
> at MIT where researchers have preliminary results saying T/TCP does in
> fact have bugs and can be proven to operate incorrectly (such as
> perform a transaction twice, under certain circumstances)?
> 
> Also, you are ignoring the issue of, for example, T/TCP's initial
> connection sequence can bomb out the TCP stack in printers and other
> things such as this (I would like to point out that issues such as
> this were enough for the IPNG working group to choose a new ethernet
> header protocol number for IPv6).
> 
> I just don't want people to hear this "it's in there" for one system,
> and thus thinking this should justify the work necessary for an
> implementation of T/TCP in someone else's stack.
> 
> BTW, T/TCP is still in the RFC's as an experimental protocol
> extension, is it not?

I believe all this is correct. As is the case that
"it's in there" is the correct counterexample to
"if anyone had ever implemented it". 

:-)

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:02:36 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA14475 for tcp-impl-list; Thu, 5 Jun 1997 16:01:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA14463 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:00:59 -0700
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA21426
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:00:56 -0700
	env-from (davem@caipfs.rutgers.edu)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id SAA11357;
	Thu, 5 Jun 1997 18:57:07 -0400 (EDT)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id SAA00895; Thu, 5 Jun 1997 18:55:02 -0400
Date: Thu, 5 Jun 1997 18:55:02 -0400
Message-Id: <199706052255.SAA00895@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: touch@ISI.EDU
CC: vjs@mica.denver.sgi.com, tcp-impl@relay.engr.SGI.COM
In-reply-to: <199706052241.AA07640@ash.isi.edu> (touch@ISI.EDU)
Subject: Re: SYN cookies
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   From: touch@ISI.EDU
   Date: Thu, 5 Jun 1997 15:41:02 -0700

   PS - authentication cookies sound great until you run at rates in
   the 250 Mbps range - at which point, you may deny more connection
   attempts due to CPU overload than the SYN flooding you're trying to
   avoid.

This is an implementation misfeature then. We have demonstrated that
even with a 100baseT pipe filled with SYN bombs, not only could new
legitimate connections come in successfully over a LAN, but CPU
utilization was very low and the attack could barely be noticed.
Response times were also acceptable for the legitimate web client
connections.  Only when the ether was brought to the point where 50%
collisions were present did the attack become noticeable. At that
point you are dealing with a denial of bandwidth attack: the attacker
needs to be directly connected to the high speed LAN the server is on
to even get to this point, and then you have other problems.

The biggest problem most implementations have, which could contribute
to CPU overload during a SYN bomb, is that a SYN causes an entire new
TCP control block to be allocated. This is not very intelligent and
makes you more prone to SYN bombs in the first place.  Under Linux we
only allocate micro control blocks (perhaps 16 bytes or so in size) on
the SYN, and only acquire a real TCP control block once established
state is reached.

In more recent implementations a fast persistent storage allocator
such as SLAB is used, so that the allocation and overhead cost of the
micro control blocks is lost in the noise.
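
To give a feel for how little a half-open connection needs to remember,
here is a sketch of a 16-byte record. The field choice is hypothetical
and is not the actual Linux layout; it assumes the local address is
implied by the listening socket.

```python
import socket
import struct

# Hypothetical layout, network byte order, no padding:
#   remote IPv4 address (4) | remote port (2) | local port (2)
#   peer's initial sequence number (4) | our initial sequence number (4)
MINI_TCB = struct.Struct("!4sHHII")


def pack_mini_tcb(remote_ip, remote_port, local_port, peer_isn, our_isn):
    """Record a received SYN in a fixed-size blob instead of a full TCB."""
    return MINI_TCB.pack(socket.inet_aton(remote_ip), remote_port,
                         local_port, peer_isn, our_isn)


def unpack_mini_tcb(blob):
    """Recover the fields when the final ACK of the handshake arrives."""
    ip, remote_port, local_port, peer_isn, our_isn = MINI_TCB.unpack(blob)
    return socket.inet_ntoa(ip), remote_port, local_port, peer_isn, our_isn


blob = pack_mini_tcb("192.0.2.7", 40000, 80, 123456789, 987654321)
print(MINI_TCB.size, unpack_mini_tcb(blob))
```

Fixed-size records like this are also what make a slab-style allocator
attractive: every allocation is the same size, so it can be served from a
preallocated pool.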

So what is the problem with SYN/RST cookie based defense mechanisms
again?

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:11:03 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA16544 for tcp-impl-list; Thu, 5 Jun 1997 16:08:39 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA16525 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:08:36 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA23330
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:08:34 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id TAA04101; Thu, 5 Jun 1997 19:04:31 -0400 (EDT)
Message-Id: <199706052304.TAA04101@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: braden@isi.edu
cc: tcp-impl@relay.engr.sgi.com
Subject: Re: TIME-WAIT truncation 
In-reply-to: Your message of "Thu, 05 Jun 1997 13:40:49 PDT."
             <9706052040.AA04058@can.isi.edu> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Thu, 05 Jun 1997 19:04:30 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


braden@ISI.EDU writes:
> I can't resist commenting that T/TCP (Transaction TCP) would have
> avoided this problem.

Of course, T/TCP has its own passel of troubles, including some
security problems...

Perry

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:11:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA16696 for tcp-impl-list; Thu, 5 Jun 1997 16:09:06 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA16678 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:09:04 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA23414
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:09:03 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA02262>; Thu, 5 Jun 1997 16:05:14 -0700
Date: Thu, 5 Jun 97 16:06:49 PDT
From: braden@ISI.EDU
Posted-Date: Thu, 5 Jun 97 16:06:49 PDT
Message-Id: <9706052306.AA04226@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA04226>; Thu, 5 Jun 97 16:06:49 PDT
To: tcp-impl@relay.engr.sgi.com, vjs@mica.denver.sgi.com
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



  *> 
  *> It would be nice if there were archives of the mailing list in which
  *> this was hashed out last year.
  *> 
  *> 

Vernon,

I don't know if this is the discussion you mean, but there was an
intense discussion of SYN flooding on the end2end-interest list
last year.  The archive is:

	ftp://ftp.isi.edu/end2end/end2end-interest-1996.mail

Bob Braden


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:19:51 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA19180 for tcp-impl-list; Thu, 5 Jun 1997 16:16:57 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA19167 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:16:55 -0700
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA25583
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:16:53 -0700
	env-from (davem@caipfs.rutgers.edu)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id TAA11840;
	Thu, 5 Jun 1997 19:13:06 -0400 (EDT)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id TAA00925; Thu, 5 Jun 1997 19:11:01 -0400
Date: Thu, 5 Jun 1997 19:11:01 -0400
Message-Id: <199706052311.TAA00925@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: vjs@mica.denver.sgi.com
CC: tcp-impl@relay.engr.SGI.COM
In-reply-to: <199706052247.QAA06508@mica.denver.sgi.com>
	(vjs@mica.denver.sgi.com)
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Thu, 5 Jun 1997 16:47:29 -0600
   From: vjs@mica.denver.sgi.com (Vernon Schryver)

   I like an explicit combination of drop-oldest with random-drop.
   Others like a hybrid drop-oldest-within-random-hash-bucket.

All of the techniques you have listed are insufficient solutions to
the SYN bombing problem, only SYN/RST cookies come close to being a
total solution.

A solution exists which knows precisely which SYN attempts should not
be allowed, it has zero chance of zapping legitimate users, and it
has been demonstrated that it can be implemented at very low cost.
Why do you dislike these techniques so much?

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:22:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA19947 for tcp-impl-list; Thu, 5 Jun 1997 16:19:21 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA19935 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:19:18 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA26036
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:19:16 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA02650>; Thu, 5 Jun 1997 16:15:29 -0700
Date: Thu, 5 Jun 97 16:16:57 PDT
From: braden@ISI.EDU
Posted-Date: Thu, 5 Jun 97 16:16:57 PDT
Message-Id: <9706052316.AA04245@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA04245>; Thu, 5 Jun 97 16:16:57 PDT
To: braden@ISI.EDU, alan@lxorguk.ukuu.org.uk
Subject: Re: TIME-WAIT truncation
Cc: tcp-impl@relay.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


  *> From owner-tcp-impl@relay.engr.SGI.COM Thu Jun  5 16:03:49 1997
  *> From: alan@lxorguk.ukuu.org.uk (Alan Cox)
  *> Subject: Re: TIME-WAIT truncation
  *> To: braden@ISI.EDU
  *> Date: Thu, 5 Jun 1997 22:20:01 +0100 (BST)
  *> Cc: Eric.Schenk@dna.lth.se, tcp-impl@relay.engr.SGI.COM
  *> In-Reply-To: <9706052049.AA04066@can.isi.edu> from "braden@ISI.EDU" at Jun 5, 97 01:49:08 pm
  *> Content-Type: text
  *> Sender: owner-tcp-impl@relay.engr.SGI.COM
  *> Precedence: bulk
  *> Content-Length: 531
  *> X-Lines: 13
  *> 
  *> > When you start messing with TCP's reliable delivery mechanism, you have
  *> > to be very careful.  I am not positive, but I think I recall that the
  *> > scheme you mention is in fact formally incorrect and can be
  *> > demonstrated in particular circumstances to allow corrupted data.
  *> 
  *> If so why has nobody ever RFC'd this fact and noted that RFC1122 is wrong
  *> to note this optional idea. It would also be a hard one to lose. The way
  *> some BSD apps work pretty much relies on this property.
  *> 

Alan,

Well, I did incorporate it into the Appendix of RFC-1185, as I noted earlier.
It is also briefly discussed by Rich Stevens in his Volume 1.

Bob Braden


From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:28:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA22230 for tcp-impl-list; Thu, 5 Jun 1997 16:26:42 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA22215 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 5 Jun 1997 16:26:39 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id RAA06719 for tcp-impl@cthulhu.engr.sgi.com; Thu, 5 Jun 1997 17:26:36 -0600
Date: Thu, 5 Jun 1997 17:26:36 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706052326.RAA06719@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: "David S. Miller" <davem@jenolan.rutgers.edu>

> ...
>    I like an explicit combination of drop-oldest with random-drop.
>    Others like a hybrid drop-oldest-within-random-hash-bucket.
> 
> All of the techniques you have listed are insufficient solutions to
> the SYN bombing problem, only SYN/RST cookies come close to being a
> total solution.
> 
> A solution exists which knows precisely which SYN attempts should not
> be allowed, it has zero chance of zapping legitimate users, and it
> has been demonstrated that it can be implemented at very low cost.
> Why do you dislike these techniques so much?


I didn't subscribe to this mailing list to get embroiled in the standard
linux.advocacy flamewars. 

Please direct your labels of "bullshit" and other, similarly assertive
comments elsewhere.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:38:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA24385 for tcp-impl-list; Thu, 5 Jun 1997 16:31:25 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA24372 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:31:23 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA28838
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:31:21 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA03586>; Thu, 5 Jun 1997 16:27:33 -0700
Date: Thu, 5 Jun 1997 16:27:32 -0700
Posted-Date: Thu, 5 Jun 1997 16:27:32 -0700
Message-Id: <199706052327.AA07724@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07724>; Thu, 5 Jun 1997 16:27:32 -0700
To: vjs@mica.denver.sgi.com, davem@jenolan.rutgers.edu
Subject: Re: TIME-WAIT truncation
Cc: tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: "David S. Miller" <davem@jenolan.rutgers.edu>
> To: vjs@mica.denver.sgi.com
> 
>    Date: Thu, 5 Jun 1997 16:47:29 -0600
>    From: vjs@mica.denver.sgi.com (Vernon Schryver)
> 
>    I like an explicit combination of drop-oldest with random-drop.
>    Others like a hybrid drop-oldest-within-random-hash-bucket.
> 
> All of the techniques you have listed are insufficient solutions to
> the SYN bombing problem, only SYN/RST cookies come close to being a
> total solution.
> 
> A solution exists which knows precisely which SYN attempts should not
> be allowed, it has zero chance of zapping legitimate users, and it
> has been demonstrated that it can be implemented at very low cost.
> Why do you dislike these techniques so much?

1. doesn't work when incoming requests can be from arbitrary sites
	e.g., web servers

2. can consume other resources at the server (CPU, in particular)
	though you've shown it doesn't consume enough to matter
	on 100 Mbps links

	in this sense, cookies do exactly the wrong thing -
	they make the SYN attack more effective when processing
	is the bottleneck (or may in fact make processing the
	bottleneck instead)

I'm not down on cookies, just that they still don't prevent
overloading a server with bad cookies and thus killing
existing valid connections.

Only limiting the processing for SYNs does that, which
then requires a drop policy for an overflowing SYN queue
that doesn't promote synchronization attacks either.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:39:55 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA26781 for tcp-impl-list; Thu, 5 Jun 1997 16:37:57 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA26722 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 5 Jun 1997 16:37:50 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id RAA06775 for tcp-impl@cthulhu.engr.sgi.com; Thu, 5 Jun 1997 17:37:43 -0600
Date: Thu, 5 Jun 1997 17:37:43 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706052337.RAA06775@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

One reason I don't like the cookie ideas is that they are more
complicated and use up a lot of the sequence number space.

Dealing with a stream of SYNs is only part of the problem.  It's also
nice to spread out your initial sequence numbers, so that if, for
example, you do happen to cheat on the 2*MSL in TIME_WAIT, then any
stale segments will be far away from the new window and so relatively
harmless.  If you use the initial sequence number for a cookie,
it can be hard to ensure that it is far away from previous uses (plural)
of the same (addr,port,addr,port) 4-tuple.
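
For readers who have not seen the cookie idea, the sketch below shows
why the initial sequence number stops being a free choice. It is an
illustration only: it does not follow Bernstein's actual bit layout, and
the secret, the HMAC-SHA256 construction and the coarse time counter are
all assumptions made for the example.

```python
import hashlib
import hmac
import socket
import struct

SECRET = b"hypothetical-per-boot-secret"   # assumption: random and private
MASK32 = 0xFFFFFFFF


def make_cookie(src_ip, src_port, dst_ip, dst_port, peer_isn, counter):
    """Derive the 32-bit ISN for the SYN-ACK from the connection identity."""
    msg = struct.pack("!4sH4sHII",
                      socket.inet_aton(src_ip), src_port,
                      socket.inet_aton(dst_ip), dst_port,
                      peer_isn & MASK32, counter & MASK32)
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def check_cookie(ack, src_ip, src_port, dst_ip, dst_port, peer_isn,
                 counter, max_age=2):
    """Accept the final ACK only if it acknowledges cookie + 1.

    No per-connection state is consulted: the cookie is recomputed for
    the current counter value and for the (max_age - 1) values before it.
    peer_isn is recovered from the ACK segment as its sequence number - 1.
    """
    wanted = (ack - 1) & MASK32
    return any(
        make_cookie(src_ip, src_port, dst_ip, dst_port,
                    peer_isn, counter - age) == wanted
        for age in range(max_age)
    )


isn = make_cookie("192.0.2.1", 40000, "198.51.100.2", 80, 12345, counter=7)
print(check_cookie((isn + 1) & MASK32,
                   "192.0.2.1", 40000, "198.51.100.2", 80, 12345, counter=8))
```

Because all 32 bits come out of the hash, successive ISNs for the same
4-tuple are scattered pseudo-randomly over the sequence space; the server
cannot also arrange for them to land far from earlier incarnations.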

For CPU cycles/SYN, it's hard to beat Dave Borman's solution.

Large listen queues and careful drop policies are effective and need only
tiny to modest changes to existing code.  Given a solution that works, only
religion or compelling technical arguments can make you switch to some
other solution.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:41:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA27682 for tcp-impl-list; Thu, 5 Jun 1997 16:39:39 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA27667 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:39:37 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA00369
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:39:35 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id TAA04967; Thu, 5 Jun 1997 19:35:43 -0400 (EDT)
Message-Id: <199706052335.TAA04967@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: touch@isi.edu
cc: tcp-impl@relay.engr.sgi.com, vjs@mica.denver.sgi.com
Subject: Re: TIME-WAIT truncation 
In-reply-to: Your message of "Thu, 05 Jun 1997 15:11:08 PDT."
             <199706052211.AA07597@ash.isi.edu> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Thu, 05 Jun 1997 19:35:42 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


touch@ISI.EDU writes:
> > From: vjs@mica.denver.sgi.com (Vernon Schryver)
> > "Denying the connection attempts" is a poor defense against a denial of
> > service attack such as SYN bombing, since causing you to deny
> > connection attempts is the purpose for the SYN bombing.
> 
> There is NO other solution to SYN attacks.

I think Vern is right in this instance -- and he's especially right
about the measures he listed to ameliorate such attacks. Make your
incoming queue large, make it fast to index, try to compress the TCBs
until you go to a full connection, and drop oldest (or random -- Vern
and I have disagreed on this) if you overflow.
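
The measures listed above can be sketched as a bounded table of
half-open connections with a pluggable drop policy. This is a toy model
under stated assumptions, not the BSDI or any other vendor's code; the
class and method names are hypothetical.

```python
import random
from collections import OrderedDict


class SynQueue:
    """Bounded table of half-open connections, keyed by 4-tuple."""

    def __init__(self, capacity, policy="oldest", rng=None):
        self.capacity = capacity
        self.policy = policy              # "oldest" or "random"
        self.rng = rng or random.Random()
        self.pending = OrderedDict()      # insertion order: oldest first

    def add(self, four_tuple, state):
        """Store a new SYN; return the evicted 4-tuple, or None."""
        evicted = None
        if four_tuple not in self.pending and len(self.pending) >= self.capacity:
            if self.policy == "oldest":
                evicted, _ = self.pending.popitem(last=False)
            else:
                # O(n) pick, fine for a sketch; a real table would index
                # entries so a random victim can be chosen in O(1).
                evicted = self.rng.choice(list(self.pending))
                del self.pending[evicted]
        self.pending[four_tuple] = state
        return evicted

    def complete(self, four_tuple):
        """Final ACK arrived: return the stored state, or None if dropped."""
        return self.pending.pop(four_tuple, None)


q = SynQueue(capacity=2, policy="oldest")
for key in ("a", "b", "c"):
    print(key, "evicts", q.add(key, state=key.upper()))
```

A legitimate client survives a flood whenever its final ACK arrives
before its entry is evicted, which is why a large queue matters as much
as the choice between drop-oldest and random drop.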

This is also the consensus among a lot of other folks, too, btw.

Perry

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:49:10 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA01032 for tcp-impl-list; Thu, 5 Jun 1997 16:44:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA01001 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:44:56 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA02361
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:44:54 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id TAA04985; Thu, 5 Jun 1997 19:40:52 -0400 (EDT)
Message-Id: <199706052340.TAA04985@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: "David S. Miller" <davem@jenolan.rutgers.edu>
cc: vjs@mica.denver.sgi.com, tcp-impl@relay.engr.sgi.com
Subject: Re: TIME-WAIT truncation 
In-reply-to: Your message of "Thu, 05 Jun 1997 18:26:56 EDT."
             <199706052226.SAA00877@jenolan.caipgeneral> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Thu, 05 Jun 1997 19:40:51 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


"David S. Miller" writes:
> I kindly refer you to Dr. Bernstein's SYN cookies,

If anything is bullshit, it's Bernstein's SYN cookies. They aren't
quite as funny as IDENT, but they are close.

The method Vern mentions is well tested in the field in systems like
BSDI and works VERY well.

Perry

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:49:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA02505 for tcp-impl-list; Thu, 5 Jun 1997 16:46:52 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA02468 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:46:49 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA03558
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:46:47 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id TAA04993; Thu, 5 Jun 1997 19:42:50 -0400 (EDT)
Message-Id: <199706052342.TAA04993@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: touch@isi.edu
cc: vjs@mica.denver.sgi.com, davem@jenolan.rutgers.edu,
        tcp-impl@relay.engr.sgi.com
Subject: Re: SYN cookies 
In-reply-to: Your message of "Thu, 05 Jun 1997 15:41:02 PDT."
             <199706052241.AA07640@ash.isi.edu> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Thu, 05 Jun 1997 19:42:50 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


touch@ISI.EDU writes:
> Does this thread mean there is interest in including
> SYN flooding defense in the RFC?

I was already writing a section for it.

(I'm embarrassingly late at it, but...)

Perry

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:53:00 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA05891 for tcp-impl-list; Thu, 5 Jun 1997 16:50:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA05869 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:50:31 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA05934
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:50:30 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id TAA05009; Thu, 5 Jun 1997 19:46:31 -0400 (EDT)
Message-Id: <199706052346.TAA05009@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: "David S. Miller" <davem@jenolan.rutgers.edu>
cc: touch@isi.edu, vjs@mica.denver.sgi.com, tcp-impl@relay.engr.sgi.com
Subject: Re: SYN cookies 
In-reply-to: Your message of "Thu, 05 Jun 1997 18:55:02 EDT."
             <199706052255.SAA00895@jenolan.caipgeneral> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Thu, 05 Jun 1997 19:46:31 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


"David S. Miller" writes:
> So what is the problem with SYN/RST cookie based defense mechanisms

Where to begin...

Perry

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:54:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA07587 for tcp-impl-list; Thu, 5 Jun 1997 16:52:52 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA07491 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:52:45 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA07207
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:52:43 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA04651>; Thu, 5 Jun 1997 16:48:53 -0700
Date: Thu, 5 Jun 1997 16:48:51 -0700
Posted-Date: Thu, 5 Jun 1997 16:48:51 -0700
Message-Id: <199706052348.AA07769@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07769>; Thu, 5 Jun 1997 16:48:51 -0700
To: touch@ISI.EDU, perry@piermont.com
Subject: Re: TIME-WAIT truncation
Cc: tcp-impl@relay.engr.sgi.com, vjs@mica.denver.sgi.com
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From perry@jekyll.piermont.com Thu Jun  5 16:35:49 1997
> X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
> To: touch@isi.edu
> Cc: tcp-impl@relay.engr.sgi.com, vjs@mica.denver.sgi.com
> Subject: Re: TIME-WAIT truncation 
> X-Reposting-Policy: redistribute only with permission
> Date: Thu, 05 Jun 1997 19:35:42 -0400
> From: "Perry E. Metzger" <perry@piermont.com>
> 
> 
> touch@ISI.EDU writes:
> > > From: vjs@mica.denver.sgi.com (Vernon Schryver)
> > > "Denying the connection attempts" is a poor defense against a denial of
> > > service attack such as SYN bombing, since causing you to deny
> > > connection attempts is the purpose for the SYN bombing.
> > 
> > There is NO other solution to SYN attacks.
> 
> I think Vern is right in this instance -- and he's especially right
> about the measures he listed to ameliorate such attacks. Make your
> incoming queue large, make it fast to index, try to compress the TCBs
> until you go to a full connection, and drop oldest (or random -- Vern
> and I have disagreed on this) if you overflow.
> 
> This is also the consensus among a lot of other folks, too, btw.

To be more clear -

there are plenty of things you can do to push
the cost of attacks out on the horizon -
including buying more memory

however, in the end, the SYNs will impinge
on the link BW or CPU processing overhead

In all cases, you can obviously:

	buy more resources

	limit the usage of the resources you have

Denying connection attempts is the only solution when
you're out of resources. The two solutions proposed are:
	
	- drop randomly

	- drop based on a validity check (cookies)

the former always works, but punishes all incoming connections
equally

the latter punishes all incoming connections by slowing
the processing of SYNs, but (apparently) this slowdown
has been shown acceptable for some systems, and it also rewards
'a priori known good' connections with successful processing

The question remains:

	- is this something that needs to be discussed in the RFC?
		(I'm asking - my first reaction is 'not yet' - there
		are plenty of documented specs that aren't implemented
		correctly, or bugs in the documentation, that deal with
		just basic operation rather than security or performance)

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:54:58 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA07671 for tcp-impl-list; Thu, 5 Jun 1997 16:52:58 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA07634 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:52:55 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA07298
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 16:52:54 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id TAA05047; Thu, 5 Jun 1997 19:49:03 -0400 (EDT)
Message-Id: <199706052349.TAA05047@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: "David S. Miller" <davem@jenolan.rutgers.edu>
cc: vjs@mica.denver.sgi.com, tcp-impl@relay.engr.sgi.com
Subject: Re: TIME-WAIT truncation 
In-reply-to: Your message of "Thu, 05 Jun 1997 19:11:01 EDT."
             <199706052311.TAA00925@jenolan.caipgeneral> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Thu, 05 Jun 1997 19:49:03 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


"David S. Miller" writes:
> All of the techniques you have listed are insufficient solutions to
> the SYN bombing problem, only SYN/RST cookies come close to being a
> total solution.

As with many mechanisms developed by Dan Bernstein, that is not a
universally embraced opinion.

Perry

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:56:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA08864 for tcp-impl-list; Thu, 5 Jun 1997 16:55:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA08822 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 5 Jun 1997 16:55:04 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id RAA06891 for tcp-impl@cthulhu.engr.sgi.com; Thu, 5 Jun 1997 17:55:01 -0600
Date: Thu, 5 Jun 1997 17:55:01 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706052355.RAA06891@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: touch@ISI.EDU

> ...
> however, in the end, the SYNs will impinge
> on the link BW or CPU processing overhead

There is little you can do about bandwidth denial of service attacks,
except ensure that they use enough bit/sec to allow you to trace them
to their sources.

> ...
> Denying connection attempts is the only solution when
> you're out of resources. The two solutions proposed are:
> 	
> 	- drop randomly
> 
> 	- drop based on a validity check (cookies)
> ...

Drop-oldest has been mentioned several times this afternoon, and most
people seem to agree that it is a useful and important tool.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:58:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA09455 for tcp-impl-list; Thu, 5 Jun 1997 16:56:17 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA09437 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:56:15 -0700
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA08127
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:56:14 -0700
	env-from (davem@caipfs.rutgers.edu)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id TAA12402;
	Thu, 5 Jun 1997 19:52:24 -0400 (EDT)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id TAA00988; Thu, 5 Jun 1997 19:50:21 -0400
Date: Thu, 5 Jun 1997 19:50:21 -0400
Message-Id: <199706052350.TAA00988@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: touch@ISI.EDU
CC: vjs@mica.denver.sgi.com, tcp-impl@relay.engr.SGI.COM
In-reply-to: <199706052327.AA07724@ash.isi.edu> (touch@ISI.EDU)
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   From: touch@ISI.EDU
   Date: Thu, 5 Jun 1997 16:27:32 -0700

   1. doesn't work when incoming requests can be from arbitrary sites
	   e.g., web servers

False, I've specifically tested the effects of SYN/RST cookies over
100BaseT on a web server with 8 thousand or so web connections, the
results were the same.  I've even launched SYN bombs at a machine which
was running one of the various web performance benchmarks using 5 or 6
high speed clients, again over a saturated 100baseT.  The effects were
minimal, when the SYN bomb was not taking place the web server
performance was not measurably decreased due to the SYN/RST
protection code in the kernel (that is, I ran a web benchmark on
two versions of Linux, one had SYN/RST cookies compiled in, one did
not, this was the only difference, and the numbers were the same for
both sets of runs)

This further shows that you can get a close to zero cost
implementation with no effect on CPU utilization or responsiveness for
legitimate connections...

   2. can consume other resources at the server (CPU, in particular)
      though you've shown it doesn't consume sufficient for 100 Mbps
      links

Where is the CPU loss?  Again this can be implemented in such a way
that this does not even happen at all.  I mean seriously, what is the
code path for a well written SYN cookie implementation?  Could be made
to be nothing more than:

retry_search:
	/* Find the TCB for this packet's address/port 4-tuple. */
	tcb = hashed_demultiplex(saddr, sport, daddr, dport);
	if(!tcb)
		goto drop;
	[ ... ]
	if(tcb->state == TCP_LISTEN) {
		if(!SYN && !RST) {
			/* See if this is a syn cookie validation
			 * response.
			 */
			if(syn_cookie_response_is_valid(tcb, packet)) {
				setup_new_connection(tcb, packet);
				goto retry_search;
			}
			/* Not a valid response: reset the sender. */
			send_reset(tcb, packet);
		}
		/* Only a bare SYN to a unicast address may go on. */
		if(RST || !SYN || ACK || MULTICAST_OR_BRDCAST(daddr))
			goto drop;
		seq = get_secure_sequence_number(...);
		tcp_conn_request();
	}

tcp_conn_request() uses a hashing scheme to see whether a valid cookie
probe response has been received from the other end in this request
within some aging timeout threshold.

Once you get a valid probe, the cost is one hash lookup for validation
on each SYN and perhaps 2 or 3 compare instructions in the new
connection code path.  Every validation timeout interval, one cookie
probe packet is sent and one received response must be verified. If you
think about it, for web client connection patterns to a server this
should perform extremely well.
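
The "one hash lookup plus a few compares" cost can be pictured with a
minimal sketch like the one below. This is not the Linux code under
discussion; the table layout, the names (vtab, vsrc_mark_valid,
vsrc_is_valid) and the sizes are all assumptions made up for
illustration, and a collision simply overwrites the older entry.

```c
#include <stdint.h>
#include <time.h>

#define VTAB_SIZE  1024   /* hypothetical table size: 2^10 slots */
#define VALID_SECS 60     /* hypothetical aging threshold in seconds */

struct vsrc {
    uint32_t saddr;       /* source address that answered a probe */
    time_t   stamp;       /* when its probe response was verified */
    int      used;        /* nonzero once the slot holds an entry */
};

static struct vsrc vtab[VTAB_SIZE];

/* Multiplicative hash; the top 10 bits index the 1024-slot table. */
static unsigned vhash(uint32_t saddr)
{
    return (uint32_t)(saddr * 2654435761u) >> 22;
}

/* Record that saddr returned a valid probe response at time now. */
void vsrc_mark_valid(uint32_t saddr, time_t now)
{
    struct vsrc *v = &vtab[vhash(saddr)];
    v->saddr = saddr;
    v->stamp = now;
    v->used  = 1;
}

/* SYN path: one table lookup and three compares. */
int vsrc_is_valid(uint32_t saddr, time_t now)
{
    const struct vsrc *v = &vtab[vhash(saddr)];
    return v->used && v->saddr == saddr && now - v->stamp <= VALID_SECS;
}
```

A SYN from a source for which vsrc_is_valid() returns 0 would trigger a
fresh probe; one that returns 1 goes straight to connection setup.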

Where is all this CPU overhead you speak of?

	   in this sense, cookies do exactly the wrong thing -
	   they make the SYN attack more effective when processing
	   is the bottleneck (or may in fact make processing the
	   bottleneck instead)

Where is this extra processing which becomes the bottleneck?  We don't
even setup the micro TCB if we know the source is invalid.

   I'm not down on cookies, just that they still don't prevent
   overloading a server with bad cookies and thus killing
   existing valid connections.

It doesn't happen in practice, I've run extensive tests and have
studied real web servers on the net running with our SYN/RST cookie
code dealing with large numbers of concurrent connections, the
problems you describe are not occurring in any of the situations I
have studied.

   Only limiting the processing for SYNs does that, which
   then requires a drop policy for an overflowing SYN queue
   that doesn't promote synchronization attacks either.

SYN cookies have a drop policy, if source never answers the probes,
drop connections sent by him.

In fact in one of my tests I believe I left the incoming connection
queue at something silly like 15 or 16 connections for the web server,
a 5 client web benchmark and a SYN bomb over directly connected
100baseT never overflowed the queue.

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 16:59:36 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA10011 for tcp-impl-list; Thu, 5 Jun 1997 16:57:33 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA10003 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:57:31 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA08365
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:57:27 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id QAA28685; Thu, 5 Jun 1997 16:47:14 -0700 (PDT)
Message-Id: <199706052347.QAA28685@daffy.ee.lbl.gov>
To: touch@ISI.EDU
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: TIME-WAIT truncation
In-reply-to: Your message of Thu, 05 Jun 1997 14:03:21 PDT.
Date: Thu, 05 Jun 1997 16:47:13 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> RFC793 indicates: 
> 
>   For this specification the MSL is taken to be 2 minutes.  This
>   is an engineering choice, and may be changed if experience indicates
>   it is desirable to do so.
> 
> Which seems to contraindicate:
> 
> 	- MSL *MUST* be 2 minutes for TCP according to RFC793
> 
> 	- MSL may be changed if experience warrants

FWIW, I've always interpreted that wording as meaning "if the collective
Internet engineering experience indicates, then it may be desirable to
change MSL" - namely, just a statement that the standard constant might
be changed in the future; but not a license to change it per individual
experience.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 17:00:26 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA10741 for tcp-impl-list; Thu, 5 Jun 1997 16:59:03 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA10722 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:59:00 -0700
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA08636
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 16:58:59 -0700
	env-from (davem@caipfs.rutgers.edu)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id TAA12415;
	Thu, 5 Jun 1997 19:55:13 -0400 (EDT)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id TAA00992; Thu, 5 Jun 1997 19:53:10 -0400
Date: Thu, 5 Jun 1997 19:53:10 -0400
Message-Id: <199706052353.TAA00992@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: vjs@mica.denver.sgi.com
CC: tcp-impl@relay.engr.SGI.COM
In-reply-to: <199706052326.RAA06719@mica.denver.sgi.com>
	(vjs@mica.denver.sgi.com)
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Thu, 5 Jun 1997 17:26:36 -0600
   From: vjs@mica.denver.sgi.com (Vernon Schryver)

   Please direct your labels of "bullshit" and other, similarly
   assertive comments elsewhere.

I have not mentioned Linux in that mail, I am asking you to address
what you think are the shortcomings of Dr. Bernstein's SYN cookie
protection scheme, compared to the ones which you state you prefer.

Please address the issue and do not ignore it; if there is a clear
deficiency in Dr. Bernstein's schemes, everyone would benefit from it
being stated here.

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 17:02:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA11959 for tcp-impl-list; Thu, 5 Jun 1997 17:01:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA11941 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:00:59 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id RAA09042
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:00:58 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA04972>; Thu, 5 Jun 1997 16:57:12 -0700
Date: Thu, 5 Jun 1997 16:57:10 -0700
Posted-Date: Thu, 5 Jun 1997 16:57:10 -0700
Message-Id: <199706052357.AA07798@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07798>; Thu, 5 Jun 1997 16:57:10 -0700
To: touch@ISI.EDU, vern@ee.lbl.gov
Subject: Re: TIME-WAIT truncation
Cc: tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From vern@ee.lbl.gov Thu Jun  5 16:47:34 1997
> To: touch@ISI.EDU
> Cc: tcp-impl@relay.engr.SGI.COM
> Subject: Re: TIME-WAIT truncation
> Date: Thu, 05 Jun 1997 16:47:13 PDT
> From: Vern Paxson <vern@ee.lbl.gov>
> 
> > RFC793 indicates: 
> > 
> >   For this specification the MSL is taken to be 2 minutes.  This
> >   is an engineering choice, and may be changed if experience indicates
> >   it is desirable to do so.
> > 
> > Which seems to contraindicate:
> > 
> > 	- MSL *MUST* be 2 minutes for TCP according to RFC793
> > 
> > 	- MSL may be changed if experience warrants
> 
> FWIW, I've always interpreted that wording as meaning "if the collective
> Internet engineering experience indicates, then it may be desirable to
> change MSL" - namely, just a statement that the standard constant might
> be changed in the future; but not a license to change it per individual
> experience.
> 
> 		Vern


Me too - that's why I was surprised to see alternate
values in so many places.

Another problem is "who does it hurt"-

picking a small value affects only connections you participate
in (of course), but it can affect both sides of the connection -

if either side reuses the port number too quickly, data from an
old (closed) connection can float in and corrupt a current connection
on either end
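
As a minimal sketch of the rule at stake: the names below are invented
for illustration, the 2*MSL hold time is the TIME-WAIT period from
RFC 793, and MSL_SECS is exactly the constant being argued about.

```c
#include <time.h>

#define MSL_SECS 120   /* RFC 793's 2 minutes; an assumption to adjust */

/* A closed connection's address/port pair stays reserved for 2*MSL so
 * that stray segments from it have expired before the pair is reused.
 * Returns nonzero once the pair may safely carry a new connection. */
int pair_reusable(time_t closed_at, time_t now)
{
    return now - closed_at >= 2 * MSL_SECS;
}
```

Shrinking MSL_SECS on one host shortens this window for connections to
every peer, which is why a locally chosen value affects both ends.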

This is strong evidence for a global value, but what value?
	
2 minutes seems large
	when was the last time a packet arrived that late?
	with store-and-forward, 20 hops, with 1 Kbps per hop, 
	and a 5000-bit packet, 2 minutes is about right
	but networks aren't that slow on all the hops these days...

30 seconds seems OK
	20 hops, 10 Kbps per hop makes this AOK.

smaller than that seems too small...
	requires far fewer hops worst case, or much larger average BW
	per hop.

(sure - some analysis would help nail the numbers down, but 
order-of-magnitude seems OK)
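
The order-of-magnitude figures above can be checked with a rough
sketch, assuming pure store-and-forward with no queueing or
propagation delay (the function names are made up for illustration):

```c
#include <stdio.h>

/* Worst-case store-and-forward delay in seconds: each hop must receive
 * the whole packet before forwarding it, so the per-hop serialization
 * delays add up. Queueing and propagation delay are ignored. */
double store_and_forward_delay(int hops, double bits_per_sec,
                               double packet_bits)
{
    return hops * (packet_bits / bits_per_sec);
}

/* Print the two cases from the text: a 5000-bit packet over 20 hops. */
void print_msl_cases(void)
{
    printf("1 Kbps per hop:  %.0f s\n",
           store_and_forward_delay(20, 1000.0, 5000.0));
    printf("10 Kbps per hop: %.0f s\n",
           store_and_forward_delay(20, 10000.0, 5000.0));
}
```

With these inputs the first case comes to 100 seconds, a little under
2 minutes, and the second to 10 seconds, comfortably inside 30.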

Joe 

----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 17:07:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA14033 for tcp-impl-list; Thu, 5 Jun 1997 17:06:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA14002 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 17:05:56 -0700
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA10316
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 17:05:54 -0700
	env-from (davem@caipfs.rutgers.edu)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id UAA12599;
	Thu, 5 Jun 1997 20:02:05 -0400 (EDT)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id UAA01002; Thu, 5 Jun 1997 20:00:02 -0400
Date: Thu, 5 Jun 1997 20:00:02 -0400
Message-Id: <199706060000.UAA01002@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: perry@piermont.com
CC: vjs@mica.denver.sgi.com, tcp-impl@relay.engr.sgi.com
In-reply-to: <199706052340.TAA04985@jekyll.piermont.com> (perry@piermont.com)
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Thu, 05 Jun 1997 19:40:51 -0400
   From: "Perry E. Metzger" <perry@piermont.com>

   "David S. Miller" writes:
   > I kindly refer you to Dr. Bernstein's SYN cookies,

   If anything is bullshit, it's Bernstein's SYN cookies. They aren't
   quite as funny as IDENT, but they are close.

   The method Vern mentions is well tested in the field in systems like
   BSDI and works VERY well.

The SYN/RST cookies have been tested in the field as well.  This does
not make either Vernon's solution or the SYN cookies solution better or
worse than the other at face value; it just says that both can be made
to work.

Now instead of saying it does or does not work, why not list
explicitly the problems Bernstein's SYN cookies have and under what
conditions it can cause problems, and how the other solutions handle
this problem, because there are people here who are very much
convinced that SYN/RST cookies are in fact one of the best solutions
to the problem.  Vernon has given a good explanation in another mail I
believe.

You continually go "where to start listing the problems..." and other
vague responses without content like this, this does not further the
discussion.

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 17:09:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA14630 for tcp-impl-list; Thu, 5 Jun 1997 17:07:38 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA14625 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:07:36 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id RAA10765
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:07:35 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA05388>; Thu, 5 Jun 1997 17:03:50 -0700
Date: Thu, 5 Jun 1997 17:03:48 -0700
Posted-Date: Thu, 5 Jun 1997 17:03:48 -0700
Message-Id: <199706060003.AA07811@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07811>; Thu, 5 Jun 1997 17:03:48 -0700
To: touch@ISI.EDU, davem@jenolan.rutgers.edu
Subject: Re: TIME-WAIT truncation
Cc: vjs@mica.denver.sgi.com, tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>    1. doesn't work when incoming requests can be from arbitrary sites
> 	   e.g., web servers
> 
> False, I've specifically tested the effects of SYN/RST cookies over
> 100BaseT on a web server with 8 thousand or so web connections, the
> results were the same.  I've even launched SYN bombs at a machine which

Did you try probes over a firewall?

Did you try them through a NAT designed for the web traffic,
but nothing else?


>    2. can consume other resources at the server (CPU, in particular)
>       though you've shown it doesn't consume sufficient for 100 Mbps
>       links
> 
> Where is the CPU loss?  Again this can be implemented in such a way
> that this does not even happen at all.  I mean seriously, what is the
> code path for a well written SYN cookie implementation?  Could be made
> to be nothing more than:

You run code that doesn't consume CPU?

> Once you get a valid probe, the cost is one hash lookup for validation
> on each SYN and perhaps 2 or 3 compare instructions in the new
> connection code path.  Every validation timeout interval, one cookie
> probe packet is sent and one received response must be verified. If you
> think about it, for web client connection patterns to a server this
> should perform extremely well.
> 
> Where is all this CPU overhead you speak of?

Right there...

> Where is this extra processing which becomes the bottleneck?  We don't
> even setup the micro TCB if we know the source is invalid.

The hash, the probe, etc.

> It doesn't happen in practice, I've run extensive tests and have
> studied real web servers on the net running with our SYN/RST cookie
> code dealing with large numbers of concurrent connections, the
> problems you describe are not occurring in any of the situations I
> have studied.

I have a server providing info off a RAM disk over a gigabit network.
It does happen in practice - just not yet.

> SYN cookies have a drop policy, if source never answers the probes,
> drop connections sent by him.

Why are the probes not just like SYN attacks? What if I don't want
to answer probes, for security reasons? Why should TCP fail in those
cases?

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 17:11:47 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA15332 for tcp-impl-list; Thu, 5 Jun 1997 17:10:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA15325 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:09:59 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id RAA11231
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:09:57 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA05525>; Thu, 5 Jun 1997 17:06:09 -0700
Date: Thu, 5 Jun 1997 17:06:08 -0700
Posted-Date: Thu, 5 Jun 1997 17:06:08 -0700
Message-Id: <199706060006.AA07817@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA07817>; Thu, 5 Jun 1997 17:06:08 -0700
To: tcp-impl@relay.engr.SGI.COM, vjs@mica.denver.sgi.com
Subject: Re: TIME-WAIT truncation
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: vjs@mica.denver.sgi.com (Vernon Schryver)
> To: tcp-impl@relay.engr.SGI.COM
> 
> > From: touch@ISI.EDU
> 
> > ...
> > however, in the end, the SYNs will impinge
> > on the link BW or CPU processing overhead
> 
> There is little you can do about bandwidth denial of service attacks,
> except ensure that they use enough bit/sec to allow you to trace them
> to their sources.

Or knock them out upstream (with an active filter), e.g., 
at the high-BW entry point to a LAN.

> Drop-oldest has been mentioned several times this afternoon, and most
> people seem to agree that it is a useful and important tool.

Drop oldest doesn't deal with the sync problem - if I know
approximately when your TCBs will free up, I pulse the floods
to coincide. That's why I prefer some other drop - random or somesuch.
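
The difference between the two policies can be sketched as follows.
This is only an illustration under stated assumptions: the queue
layout, the names and the tiny SYNQ_SIZE are hypothetical, and rand()
stands in for what would need to be a less predictable random source.

```c
#include <stdlib.h>

#define SYNQ_SIZE 8          /* illustrative; real queues are far larger */

struct synq_entry {
    unsigned int   saddr;    /* source address of the pending SYN */
    unsigned short sport;    /* source port of the pending SYN */
};

struct synq {
    struct synq_entry slot[SYNQ_SIZE];
    int head;                /* index of the oldest entry */
    int count;               /* number of occupied slots */
};

enum drop_policy { DROP_OLDEST, DROP_RANDOM };

/* Queue a pending SYN; when the queue is full, evict per policy.
 * Returns the index of the slot the new entry was stored in. */
int synq_add(struct synq *q, struct synq_entry e, enum drop_policy p)
{
    int idx;

    if (q->count < SYNQ_SIZE) {
        idx = (q->head + q->count) % SYNQ_SIZE;   /* next free slot */
        q->count++;
    } else if (p == DROP_OLDEST) {
        idx = q->head;                            /* victim is predictable */
        q->head = (q->head + 1) % SYNQ_SIZE;
    } else {
        /* Victim is chosen at random; note that age order is no
         * longer tracked exactly after a random eviction. */
        idx = rand() % SYNQ_SIZE;
    }
    q->slot[idx] = e;
    return idx;
}
```

With DROP_OLDEST an attacker who knows the queue size knows which entry
goes next; with DROP_RANDOM every pending entry has the same chance of
surviving a given overflow.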

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 17:11:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA15514 for tcp-impl-list; Thu, 5 Jun 1997 17:10:32 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA15500 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:10:30 -0700
Received: from rekk.dna.lth.se (rekk.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA11334
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:10:28 -0700
	env-from (erics@rekk.dna.lth.se)
Received: from rekk.dna.lth.se (localhost [127.0.0.1])
	by rekk.dna.lth.se (8.8.5/8.8.5) with ESMTP id CAA07350;
	Fri, 6 Jun 1997 02:07:49 +0200
Message-Id: <199706060007.CAA07350@rekk.dna.lth.se>
To: touch@ISI.EDU
cc: Eric.Schenk@dna.lth.se, vjs@mica.denver.sgi.com, davem@jenolan.rutgers.edu,
        tcp-impl@relay.engr.SGI.COM
From: Eric.Schenk@dna.lth.se
Subject: Re: TIME-WAIT truncation 
In-reply-to: Your message of "Thu, 05 Jun 1997 16:27:32 PDT."
             <199706052327.AA07724@ash.isi.edu> 
Date: Fri, 06 Jun 1997 02:07:48 +0200
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


touch@ISI.EDU <touch@ISI.EDU> writes:
>> A solution exists which knows precisely which SYN attempts should not
>> be allowed, it has zero chance of zapping legitimate users, and it
>> has been demonstrated that it can be implemented at very low cost,
>> why do you dislike these techniques so much?
>
>1. doesn't work when incoming requests can be from arbitrary sites
>	e.g., web servers

Huh? A cookie defense doesn't care where the incoming requests come from.

>2. can consume other resources at the server (CPU, in particular)
>	though you've shown it doesn't consume sufficient for
>	100 Mbps links
>
>	in this sense, cookies do exactly the wrong thing -
>	they make the SYN attack more effective when processing
>	is the bottleneck (or may in fact make processing the
>	bottleneck instead)

There is certainly a tradeoff here. The question is if it is
a tradeoff we need to worry about in practice. First, you should
never invoke a defense against SYN flooding unless you really
are under attack. This means that under normal circumstances there
is no penalty (for either drop methods or cookie methods).

The cost of creating SYN cookies (measured) isn't much more than
the cost of driving your networking connections at full blast.
If you don't have the CPU power for that, then you are going to
have other troubles.

Also, if someone has the capacity to attack your machine with
enough SYN packets to saturate your LAN, you're dead anyway,
the choice of defense is meaningless in this situation.

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 17:23:20 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA18373 for tcp-impl-list; Thu, 5 Jun 1997 17:19:56 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA18359 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:19:53 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA13166
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:19:52 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.5/1.43r)
	id RAA28794; Thu, 5 Jun 1997 17:09:56 -0700 (PDT)
Message-Id: <199706060009.RAA28794@daffy.ee.lbl.gov>
To: touch@ISI.EDU
Cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: TIME-WAIT truncation
In-reply-to: Your message of Thu, 05 Jun 1997 16:57:10 PDT.
Date: Thu, 05 Jun 1997 17:09:55 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> This is strong evidence for a global value, but what value?

This is hard!  I've observed ping times on my local Ethernet of 60 sec -
totally bizarre, and doubtless due to the remote end sitting on the ICMP's
(a bunch of them, actually), who knows why.

There's a basic problem which is that now that TTL is de facto only a hop
count (per RFC 1812), there's really no enforcement of segment lifetime in
the network, so nothing fundamental to build on.

> 30 seconds seems OK
> 	20 hops, 10 Kbps per hop makes this AOK.

Alan mentioned personal experience on paths for which this is too low ...
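
For scale, the "20 hops, 10 Kbps per hop" estimate quoted above can be
reproduced with a short calculation.  This is only a sketch: the 1500-byte
packet size is an assumption (the quoted line does not give one), and
queueing and propagation delay are ignored entirely.

```python
def serialization_delay(packet_bytes, hops, link_bps):
    """Seconds spent clocking one packet onto the wire at every hop."""
    return hops * packet_bytes * 8 / link_bps


# Assumed: one 1500-byte packet crossing 20 store-and-forward hops
# at 10 kbit/s each.
worst_case = serialization_delay(1500, 20, 10_000)
print(worst_case)  # 24.0
```

Under those assumptions serialization alone takes 24 seconds, which leaves
only 6 seconds of headroom for queueing before a 30 second bound is
exceeded.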

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 17:23:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA19005 for tcp-impl-list; Thu, 5 Jun 1997 17:21:18 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA18962 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 5 Jun 1997 17:21:14 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id SAA07076 for tcp-impl@cthulhu.engr.sgi.com; Thu, 5 Jun 1997 18:21:09 -0600
Date: Thu, 5 Jun 1997 18:21:09 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706060021.SAA07076@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: touch@ISI.EDU

> ...
> > There is little you can do about bandwidth denial of service attacks,
> > except ensure that they use enough bit/sec to allow you to trace them
> > to their sources.
> 
> Or knock them out upstream (with an active filter), e.g., 
> at the high-BW entry point to a LAN.

True.

> > Drop-oldest has been mentioned several times this afternoon, and most
> > people seem to agree that it is a useful and important tool.
> 
> Drop oldest doesn't deal with the sync problem - if I know
> approximately when your TCBs will free up, I pulse the floods
> to coincide. That's why I prefer some other drop - random or somesuch.

There is no "sync problem" that I can see.  SYN bombing defenses must
be and are based on worst case assumptions.  We all assume that the bad
guy will send as many SYNs/sec as his connections allow all of the
time.  If he could do any pulsing, it is assumed that he would make his
pulse have the worst case duty cycle of 100%.  Whenever the bad guy
pauses in his flood of SYNs, he is doing you a favor by relaxing the
pressure.

I think that everyone agrees that drop-oldest does poorly when the SYN
rate per RTT to the farthest good guy exceeds your listen queue
length.  That is when random-drop works well.  As long as the SYN rate
is much less, then drop-oldest works very well.

When drop-oldest works (i.e. SYN rate < L/RTT), it is cheapest of all,
provided your system is already able to handle listen queue lengths
necessary to exceed 10,000,000 hits/day.  (A simple, singly linked list
of 1000 or more entries can be a little slow for the pruning done by
the timers.)
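
The L/RTT rule of thumb above can be put into numbers.  What follows is a
rough model, assuming the listen queue is always full and every attack SYN
evicts exactly one entry.  The function names and the example figures are
illustrative and are not taken from any implementation.

```python
def drop_oldest_survives(queue_len, syn_rate, rtt):
    """True if a legitimate half-open entry outlives one RTT.

    With drop-oldest, a new entry is evicted after queue_len further
    SYNs arrive, i.e. after queue_len / syn_rate seconds.
    """
    return rtt < queue_len / syn_rate


def random_drop_survival(queue_len, syn_rate, rtt):
    """Probability that an entry is still queued after one RTT.

    Each arriving SYN evicts one of queue_len entries uniformly at
    random, so a given entry survives each eviction with probability
    1 - 1/queue_len.
    """
    evictions = syn_rate * rtt
    return (1.0 - 1.0 / queue_len) ** evictions


# Hypothetical figures: 1000-entry queue, 2000 attack SYNs per second.
near = drop_oldest_survives(1000, 2000, rtt=0.1)   # entry lasts 0.5 s
far = drop_oldest_survives(1000, 2000, rtt=1.0)    # evicted before the ACK
far_random = random_drop_survival(1000, 2000, rtt=1.0)  # about 0.135
```

In this model drop-oldest is all-or-nothing: the nearby client always gets
through and the distant one never does.  Random drop still lets roughly one
distant attempt in seven complete.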


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 17:26:13 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA21096 for tcp-impl-list; Thu, 5 Jun 1997 17:24:46 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA21070 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:24:43 -0700
Received: from rekk.dna.lth.se (rekk.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA14099
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:24:42 -0700
	env-from (erics@rekk.dna.lth.se)
Received: from rekk.dna.lth.se (localhost [127.0.0.1])
	by rekk.dna.lth.se (8.8.5/8.8.5) with ESMTP id CAA07517;
	Fri, 6 Jun 1997 02:20:19 +0200
Message-Id: <199706060020.CAA07517@rekk.dna.lth.se>
To: vjs@mica.denver.sgi.com (Vernon Schryver)
cc: Eric.Schenk@dna.lth.se, tcp-impl@relay.engr.SGI.COM
From: Eric.Schenk@dna.lth.se
Subject: Re: TIME-WAIT truncation 
In-reply-to: Your message of "Thu, 05 Jun 1997 17:37:43 MDT."
             <199706052337.RAA06775@mica.denver.sgi.com> 
Date: Fri, 06 Jun 1997 02:20:18 +0200
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Vernon Schryver <vjs@mica.denver.sgi.com> writes:
>One reason I don't like the cookie ideas is that they are more
>complicated and use up a lot of the sequence number space.

Complicated yes. However, the idea can be integrated with the
generation of secure sequence numbers in such a way that
they do not use up much extra sequence number space.
Currently my implementations wrap the space about twice
as fast as a purely clock based implementation.

>Dealing with a stream of SYNs is only part of the problem.  It's also
>nice to spread out your initial sequence numbers, so that if, for
>example, you do happen to cheat on the 2*MSL in TIME_WAIT, then any
>stale segments will be far away from the new window and so relatively
>harmless.  If you use the initial sequence number for a cookie,
>it can be hard to ensure that is far away from previous uses (plural)
>of the same (addr,port,addr,port) 4-tuple.

Actually, there are a few tricks involved, but if you do it right
you can guarantee that the sequence numbers generated by a cookie
system are monotonically increasing for any given (addr,port,addr,port)
4-tuple, without memorizing any information, and that the cookie
numbers advance at roughly the usual clocked TCP rate. All without
giving up more than a couple of bits of the security.
This was all hashed out by Dan Bernstein and myself on the syncookies
mailing list last year.

>For CPU cycles/SYN, it's hard to beat Dave Borman's solution.
>
>Large listen queues and careful drop policies are effective and tiny to
>modest changes to existing code.  Given a solution that works, only
>religion or compelling technical arguments can make you switch to some
>other solution

Initially I was fairly strongly in favor of this approach. I have
come to dislike it for two reasons.

1) It requires changes to user space programs to enlarge the backlog
   queues. This means that lots of systems run by the relatively less
   informed will remain open to attack.

2) Unless you have insanely large backlog queues you end up cutting
   off people who have long delay paths to your server, or when the
   network is experiencing persistent loss. (Think packet radio,
   transatlantic cables, etc...) This is (barely) tolerable if
   you assume that the attacker will be attacking you over a modem
   or an ISDN link, unfortunately this has proved to be rather optimistic.
   I am aware of actual attacks that the ISP failed to track that were on
   the order of 2000 packets a second. In one case this went on for days.
   In this case a random drop or tail drop approach would not have helped
   at all.

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 17:31:18 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA22955 for tcp-impl-list; Thu, 5 Jun 1997 17:29:15 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA22943 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:29:13 -0700
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA15185
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 17:29:10 -0700
	env-from (davem@caipfs.rutgers.edu)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id UAA13277;
	Thu, 5 Jun 1997 20:25:20 -0400 (EDT)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id UAA01025; Thu, 5 Jun 1997 20:23:16 -0400
Date: Thu, 5 Jun 1997 20:23:16 -0400
Message-Id: <199706060023.UAA01025@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: touch@ISI.EDU
CC: touch@ISI.EDU, vjs@mica.denver.sgi.com, tcp-impl@relay.engr.SGI.COM
In-reply-to: <199706060003.AA07811@ash.isi.edu> (touch@ISI.EDU)
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   From: touch@ISI.EDU
   Date: Thu, 5 Jun 1997 17:03:48 -0700

   > False, I've specifically tested the effects of SYN/RST cookies over
   > 100BaseT on a web server with 8 thousand or so web connections, the
   > results were the same.  I've even launched SYN bombs at a machine which

   Did you try probes over a firewall?

   Did you try them through a NAT designed for the web traffic,
   but nothing else?

No I did not, this is an interesting twist I should investigate.

   I have a server providing info off a RAM disk over a gigabit network.
   It does happen in practice - just not yet.

Is the problem sequence number wrap around or something along these
lines, or is it a pure CPU overhead problem?  Also, if it is a CPU
overhead problem, is your CPU in your implementation touching the data
in the packets before a validation could be performed?

   Why are the probes not just like SYN attacks? What if I don't want
   to answer probes, for security reasons? Why should TCP fail in those
   cases?

Ok, this is an important case.  Although it could be argued that your
end is not running a "properly functioning TCP" because one which was
would answer the probes.  What other aspects of TCP fail to work in
the presence of such a firewall and is this desirable?

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 18:12:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA02059 for tcp-impl-list; Thu, 5 Jun 1997 18:10:53 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA02053 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 18:10:52 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id SAA28543
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 18:10:50 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from ash.isi.edu (ash-a.isi.edu) by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA07810>; Thu, 5 Jun 1997 18:07:04 -0700
Date: Thu, 5 Jun 1997 18:07:03 -0700
Posted-Date: Thu, 5 Jun 1997 18:07:03 -0700
Message-Id: <199706060107.AA08006@ash.isi.edu>
Received: by ash.isi.edu (5.65c/4.0.3-6)
	id <AA08006>; Thu, 5 Jun 1997 18:07:03 -0700
To: touch@ISI.EDU, davem@jenolan.rutgers.edu
Subject: Re: TIME-WAIT truncation
Cc: vjs@mica.denver.sgi.com, tcp-impl@relay.engr.SGI.COM
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>    I have a server providing info off a RAM disk over a gigabit network.
>    It does happen in practice - just not yet.
> 
> Is the problem sequence number wrap around or something along these
> lines, or is it a pure CPU overhead problem?  Also, if it is a CPU
> overhead problem, is your CPU in your implementation touching the data
> in the packets before a validation could be performed?

CPU overhead (perhaps also DMA). CPU isn't touching the data, but DMA is.

>    Why are the probes not just like SYN attacks? What if I don't want
>    to answer probes, for security reasons? Why should TCP fail in those
>    cases?
> 
> Ok, this is an important case.  Although it could be argued that your
> end is not running a "properly functioning TCP" because one which was
> would answer the probes.  What other aspects of TCP fail to work in
> the presence of such a firewall and is this desirable?

"properly functioning" means 'modified to support probes',
i.e., nonstandard?

Or do probes work with existing implementations?



At this point, I think we have some pretty strong evidence
that:

	SYN flooding is important

	RFCs addressing this would be useful

Until the latter is true, I propose that this issue
is premature to be discussed further for this RFC,
however.

(I thought we were trying to limit ourselves to
things in RFCs, which TIME_WAIT is but SYN flooding isn't)

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 18:17:18 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA03018 for tcp-impl-list; Thu, 5 Jun 1997 18:15:52 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA03010 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 18:15:50 -0700
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.219]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA00304
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 18:15:49 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185]) by palrel3.hp.com with SMTP (8.7.5/8.7.3) id SAA10209 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 18:15:40 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA20601; Thu, 5 Jun 1997 18:14:32 -0700
Message-Id: <33976478.19CE@cup.hp.com>
Date: Thu, 05 Jun 1997 18:14:32 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: TIME-WAIT truncation
References: <199706060020.CAA07517@rekk.dna.lth.se>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Eric.Schenk@dna.lth.se wrote:
> Vernon Schryver <vjs@mica.denver.sgi.com> writes:
> >Large listen queues and careful drop policies are effective and tiny to
> ...
> Initially I was fairly strongly in favor of this approach. I have
> come to dislike it for two reasons.
> 
> 1) It requires changes to user space programs to enlarge the backlog
>    queues. This means that lots of systems run by the relatively less
>    informed will remain open to attack.

There is nothing to prevent an OS patch from changing the default queue
length for SYN-RECVD, and then use the parameters to listen only for the
established-waiting-for-accept queue.
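
A toy model of that split, to make the two limits concrete.  This is a
sketch only: the class, its method names and the 1024 default are
hypothetical and do not describe any particular kernel.

```python
from collections import deque

KERNEL_SYN_QUEUE_LEN = 1024   # hypothetical OS-wide default set by the patch


class ListenSocket:
    """Keeps half-open and fully established connections in separate queues."""

    def __init__(self, backlog, syn_limit=KERNEL_SYN_QUEUE_LEN):
        self.syn_rcvd = {}            # half-open connections, keyed by peer
        self.accept_queue = deque()   # established, waiting for accept()
        self.accept_limit = backlog   # the value the application gave listen()
        self.syn_limit = syn_limit    # chosen by the OS, not the application

    def on_syn(self, peer):
        """Admit a SYN if the OS-sized half-open queue has room."""
        if len(self.syn_rcvd) >= self.syn_limit:
            return False
        self.syn_rcvd[peer] = "SYN-RCVD"
        return True

    def on_ack(self, peer):
        """Move a completed handshake to the accept queue if it has room."""
        if peer not in self.syn_rcvd:
            return False
        if len(self.accept_queue) >= self.accept_limit:
            return False              # stays half-open in this model
        del self.syn_rcvd[peer]
        self.accept_queue.append(peer)
        return True


sock = ListenSocket(backlog=10)       # application asked for only 10
admitted = sum(sock.on_syn(peer) for peer in range(500))
```

Here all 500 SYNs are admitted even though the application passed a backlog
of 10, because the small value only bounds the accept queue.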

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 18:23:49 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA04755 for tcp-impl-list; Thu, 5 Jun 1997 18:21:28 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA04744 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 18:21:25 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA01479
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 18:21:23 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id VAA05166; Thu, 5 Jun 1997 21:17:27 -0400 (EDT)
Message-Id: <199706060117.VAA05166@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: touch@isi.edu
cc: perry@piermont.com, tcp-impl@relay.engr.sgi.com, vjs@mica.denver.sgi.com
Subject: Re: TIME-WAIT truncation 
In-reply-to: Your message of "Thu, 05 Jun 1997 16:48:51 PDT."
             <199706052348.AA07769@ash.isi.edu> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Thu, 05 Jun 1997 21:17:22 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


touch@ISI.EDU writes:
> 	- drop randomly
> 
> 	- drop based on a validity check (cookies)
> 
> the former always works, but punishes all incoming connections
> equally
> 
> the latter punishes all incoming connections by slowing
> the processing of SYNs, but (apparently) this slowdown
> has been shown acceptable for some systems, and it also rewards
> 'a priori known good' connections with successful processing

As bandwidths increase, the latter becomes easier and easier to
subvert, actually. It's a bad hack.


Perry

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 18:23:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA05183 for tcp-impl-list; Thu, 5 Jun 1997 18:22:37 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA05169 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 18:22:35 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA01653
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 18:22:34 -0700
	env-from (craig@aland.bbn.com)
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id SAA23493; Thu, 5 Jun 1997 18:18:15 -0700 (PDT)
Message-Id: <199706060118.SAA23493@aland.bbn.com>
To: touch@ISI.EDU
cc: tcp-impl@relay.engr.SGI.COM
Subject: Re: TIME-WAIT truncation 
In-reply-to: Your message of Thu, 05 Jun 97 18:07:03 -0700.
             <199706060107.AA08006@ash.isi.edu> 
Date: Thu, 05 Jun 97 18:18:15 -0700
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    (I thought we were trying to limit ourselves to
    things in RFCs, which TIME_WAIT is but SYN flooding isn't)

There's a known solution to this problem (often employed).  Someone
quickly writes an RFC on the new problem.  That RFC then gets cited
by the BCP/STD that follows.

Craig

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 18:30:10 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA06425 for tcp-impl-list; Thu, 5 Jun 1997 18:28:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA06398 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 5 Jun 1997 18:28:08 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id TAA07424 for tcp-impl@cthulhu.engr.sgi.com; Thu, 5 Jun 1997 19:28:01 -0600
Date: Thu, 5 Jun 1997 19:28:01 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706060128.TAA07424@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TIME-WAIT truncation
References: <199706060020.CAA07517@rekk.dna.lth.se>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Rick Jones <raj@hpisrdq.cup.hp.com>


> > >Large listen queues and careful drop policies are effective and tiny to
> > ...
> > Initially I was fairly strongly in favor of this approach. I have
> > come to dislike it for two reasons.
> > 
> > 1) It requires changes to user space programs to enlarge the backlog
> >    queues. This means that lots of systems run by the relatively less
> >    informed will remain open to attack.
> 
> There is nothing to prevent an OS patch from changing the default queue
> length for SYN-RECVD, and then use the parameters to listen only for the
> established-waiting-for-accept queue.

True.

However, in some notable cases, it pays to consider increasing the
listen queue length regardless of worries about SYN attacks or
defenses.  All defenses seem to want a listen queue big enough to not
trigger the defense except when the system is under attack.  Some
applications are too smart, and do something silly like listen(s,10).
Sendmail is an example.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 18:42:18 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA09358 for tcp-impl-list; Thu, 5 Jun 1997 18:40:23 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA09351 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 18:40:21 -0700
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA04286
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 18:40:20 -0700
	env-from (davem@caipfs.rutgers.edu)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id VAA14517;
	Thu, 5 Jun 1997 21:36:30 -0400 (EDT)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id VAA01078; Thu, 5 Jun 1997 21:34:26 -0400
Date: Thu, 5 Jun 1997 21:34:26 -0400
Message-Id: <199706060134.VAA01078@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: touch@ISI.EDU
CC: vjs@mica.denver.sgi.com, tcp-impl@relay.engr.SGI.COM
In-reply-to: <199706060107.AA08006@ash.isi.edu> (touch@ISI.EDU)
Subject: Re: TIME-WAIT truncation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   From: touch@ISI.EDU
   Date: Thu, 5 Jun 1997 18:07:03 -0700

   "properly functioning" means 'modified to support probes',
   i.e., nonstandard?

The cookie probes work with any compliant/normal TCP implementation.
We've seen the SYN version work with just about anything starting the
connections, the RST cookies seem to sometimes cause the win95 stack
to not function properly.

   Or do probes work with existing implementations?

Yes, they work with existing implementations as just described.

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 18:58:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA12151 for tcp-impl-list; Thu, 5 Jun 1997 18:56:49 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA12144 for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 18:56:47 -0700
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA07171
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 18:56:45 -0700
	env-from (davem@caipfs.rutgers.edu)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id VAA14707
	for <tcp-impl@relay.engr.sgi.com>; Thu, 5 Jun 1997 21:52:59 -0400 (EDT)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id VAA01096; Thu, 5 Jun 1997 21:50:55 -0400
Date: Thu, 5 Jun 1997 21:50:55 -0400
Message-Id: <199706060150.VAA01096@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: tcp-impl@relay.engr.sgi.com
Subject: a quick clarification...
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


The SYN/RST cookie flood prevention requires no changes whatsoever at
the client end machines, only the server needs the necessary
additional code in the stack.  Some people were not clear on this
point.
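
To illustrate why the client needs no changes, here is a minimal sketch of
the general SYN cookie idea: the server puts a keyed hash in the initial
sequence number of its SYN-ACK, and any ordinary TCP echoes it back as
ack - 1.  This is not the construction worked out on the syncookies list.
HMAC-SHA256, the 64-second tick and every name below are assumptions made
for the example, and it does not try to encode the MSS or to keep sequence
numbers monotonic per 4-tuple.

```python
import hashlib
import hmac
import struct

SECRET = b"per-boot random secret"   # hypothetical; a real server would randomize it
TICK_SECONDS = 64                    # hypothetical coarse clock used to age cookies
MASK32 = 0xFFFFFFFF


def _mac(conn, client_isn, tick):
    """32-bit keyed hash over the 4-tuple, the client's ISN and a time tick."""
    msg = repr((conn, client_isn, tick)).encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def syn_ack_isn(conn, client_isn, now):
    """Server ISN for the SYN-ACK; nothing is stored per connection."""
    return _mac(conn, client_isn, int(now) // TICK_SECONDS)


def ack_is_valid(conn, seq, ack, now, max_age_ticks=2):
    """Recompute the cookie from the handshake's final ACK and compare."""
    client_isn = (seq - 1) & MASK32
    cookie = (ack - 1) & MASK32
    tick = int(now) // TICK_SECONDS
    return any(
        _mac(conn, client_isn, tick - age) == cookie
        for age in range(max_age_ticks)
    )


# Example handshake; addresses are documentation-range placeholders.
conn = ("192.0.2.7", 40000, "198.51.100.1", 80)
isn = syn_ack_isn(conn, client_isn=1000, now=0)
genuine = ack_is_valid(conn, seq=1001, ack=(isn + 1) & MASK32, now=10)
```

A spoofed source never sees the SYN-ACK, so it cannot produce a matching
ACK, while a real client only has to behave as it always does.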

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 19:22:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA16081 for tcp-impl-list; Thu, 5 Jun 1997 19:20:58 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA15989 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 19:20:14 -0700
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA12606
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 19:20:09 -0700
	env-from (mouse@Twig.Rodents.Montreal.QC.CA)
Received: (from mouse@localhost)
	by Twig.Rodents.Montreal.QC.CA (8.8.5/8.8.5) id WAA16509;
	Thu, 5 Jun 1997 22:16:14 -0400 (EDT)
Date: Thu, 5 Jun 1997 22:16:14 -0400 (EDT)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199706060216.WAA16509@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@relay.engr.SGI.COM
Subject: SYN/RST cookies (was Re: a quick clarification...)
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The SYN/RST cookie flood prevention requires no changes whatsoever at
> the client end machines, only the server needs the necessary
> additional code in the stack.

What _are_ those changes?  Anything with Dan B.'s name on it is
somewhat suspect, but the idea seems sane, sane enough to make me want
to judge it on its own merits.

What I haven't seen is enough information to allow me to do so.  Could
someone who knows how this mechanism works possibly explain to me where
the cookie gets stashed in the packet such that the stack can be
confident it will come back intact when the peer is for real?

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun  5 19:42:30 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA19666 for tcp-impl-list; Thu, 5 Jun 1997 19:40:22 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA19654 for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 19:40:19 -0700
Received: from caipfs.rutgers.edu (caipfs.rutgers.edu [128.6.155.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA15982
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 5 Jun 1997 19:40:17 -0700
	env-from (davem@caipfs.rutgers.edu)
Received: from jenolan.caipgeneral (jenolan.rutgers.edu [128.6.111.5])
	by caipfs.rutgers.edu (8.8.5/8.8.5) with SMTP id WAA15374;
	Thu, 5 Jun 1997 22:36:31 -0400 (EDT)
Received: by jenolan.caipgeneral (SMI-8.6/SMI-SVR4)
	id WAA01171; Thu, 5 Jun 1997 22:34:26 -0400
Date: Thu, 5 Jun 1997 22:34:26 -0400
Message-Id: <199706060234.WAA01171@jenolan.caipgeneral>
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: mouse@Rodents.Montreal.QC.CA
CC: tcp-impl@relay.engr.SGI.COM
In-reply-to: <199706060216.WAA16509@Twig.Rodents.Montreal.QC.CA> (message from
	der Mouse on Thu, 5 Jun 1997 22:16:14 -0400 (EDT))
Subject: Re: SYN/RST cookies (was Re: a quick clarification...)
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Thu, 5 Jun 1997 22:16:14 -0400 (EDT)
   From: der Mouse  <mouse@Rodents.Montreal.QC.CA>

   What _are_ those changes?  Anything with Dan B.'s name on it is
   somewhat suspect, but the idea seems sane, sane enough to make me
   want to judge it on its own merits.

An archive of the syncookies mailing lists, where pretty much all the
details of the technique were discussed, is available from:

ftp://koobera.math.uic.edu/pub/docs/syncookies-archive

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 00:07:58 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA28008 for tcp-impl-list; Fri, 6 Jun 1997 00:05:29 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA27981 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:05:27 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA25234
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:05:24 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id IAA12528; Fri, 6 Jun 1997 08:03:08 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wZtDq-0005FdC; Fri, 6 Jun 97 08:14 BST
Message-Id: <m0wZtDq-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TIME-WAIT truncation
To: touch@ISI.EDU
Date: Fri, 6 Jun 1997 08:14:06 +0100 (BST)
Cc: braden@ISI.EDU, backman@ftp.com, henrysa@exchange.microsoft.com,
        vern@ee.lbl.gov, tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199706052142.AA07549@ash.isi.edu> from "touch@ISI.EDU" at Jun 5, 97 02:42:45 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > yes it would have.  If anyone ever implemented it.  We looked at it
> > 2-3 years back; thought it was a good idea; than asked the question
> > "so who would we do TTCP with?".  That finished that.
> See FreeBSD. To quote the old tomato sauce commercial, "it's in there".

And demonstrating the point admirably. T/TCP has shown a huge number of
interworking problems with other stacks. Not because the BSD code is wrong
but because T/TCP has real bugs and because it violates aspects of RFC793.
Also because of bugs in other client stacks.

That's one of the things that led me to conclude the only way to get T/TCP
workable is going to be to say (if (ipv6) (do_ttcp))

T/TCP also only helps if you are getting lots of connections from each
host. The pathological worst case - all clients don't support ttcp, or
all clients are different, is as bad

Alan


From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 00:23:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA01175 for tcp-impl-list; Fri, 6 Jun 1997 00:20:54 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA01153 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:20:49 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA27643
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:20:49 -0700
	env-from (Jerry.Chu@Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.13]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id AAA24794; Fri, 6 Jun 1997 00:10:53 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id AAA18973; Fri, 6 Jun 1997 00:10:00 -0700
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id AAA10799; Fri, 6 Jun 1997 00:10:58 -0700
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id AAA21677; Fri, 6 Jun 1997 00:08:35 -0700
Date: Fri, 6 Jun 1997 00:08:35 -0700
From: Jerry.Chu@Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199706060708.AAA21677@taipei.eng.sun.com>
To: Eric.Schenk@dna.lth.se, braden@ISI.EDU
Subject: Re: TIME-WAIT truncation
Cc: tcp-impl@relay.engr.SGI.COM
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>  *>
>  *> Rather than just tossing out the oldest time-waiter, it would be better
>  *> to grab the last sequence number information from that socket and make
>  *> sure to offer a sequence number just following that on the new connection.
>  *> If I recall correctly BSD boxes already do this if there is an incoming
>  *> connection to a specific port that is in TIME_WAIT, and recent Linux
>  *> releases (2.0 or older) do this as well.
> 
>Eric,
> 
>When you start messing with TCP's reliable delivery mechanism, you have
>to be very careful.  I am not positive, but I think I recall that the
>scheme you mention is in fact formally incorrect and can be
>demonstrated in particular circumstances to allow corrupted data.
>   
>Bob Braden

Not to mention that even the BSD code for the simple scheme didn't get it right.

                        /* If a new connection request is received
                         * while in TIME_WAIT, drop the old connection
                         * and start over if the sequence numbers
                         * are above the previous ones.
                         */
                        if (tiflags & TH_SYN &&
                            tp->t_state == TCPS_TIME_WAIT &&
                            SEQ_GT(ti->ti_seq, tp->rcv_nxt)) {
                                iss = tp->rcv_nxt + TCP_ISSINCR;
                                /*        ^^^^^^^
                                 * should be snd_nxt, not rcv_nxt */

                                tp = tcp_close(tp);
                                goto findpcb;
                        }
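
For reference, a minimal self-contained sketch of the same check with the
suggested correction applied. The struct, the function name, and the
increment value are placeholders made up for this illustration, not the
BSD definitions; only the wraparound-safe comparison idiom and the use of
snd_nxt come from the discussion above.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wraparound-safe "a is after b" test on 32-bit sequence numbers. */
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/* Placeholder increment; the real constant is defined by the stack. */
#define ISSINCR_SKETCH (125u * 1024u)

/* Hypothetical subset of the state kept for a connection in TIME_WAIT. */
struct tw_state {
    tcp_seq rcv_nxt; /* next sequence number expected from the peer */
    tcp_seq snd_nxt; /* next sequence number we would have sent */
};

/*
 * Returns 1 and stores a new initial send sequence number in *iss if a SYN
 * carrying syn_seq may replace the TIME_WAIT connection, 0 otherwise.
 * The new ISS is based on our own send sequence space (snd_nxt).
 */
static int time_wait_new_iss(const struct tw_state *tw, tcp_seq syn_seq,
                             tcp_seq *iss)
{
    if (!SEQ_GT(syn_seq, tw->rcv_nxt))
        return 0;
    *iss = tw->snd_nxt + ISSINCR_SKETCH;
    return 1;
}
```

The incoming SYN is still compared against rcv_nxt, since that is the
peer's sequence space; only the value handed to the new connection as its
ISS changes.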

Jerry

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 00:35:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA03150 for tcp-impl-list; Fri, 6 Jun 1997 00:33:23 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA03142 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:33:19 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA29215
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:33:16 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id IAA13457; Fri, 6 Jun 1997 08:32:11 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wZtg1-0005FfC; Fri, 6 Jun 97 08:43 BST
Message-Id: <m0wZtg1-0005FfC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: SYN cookies
To: perry@piermont.com
Date: Fri, 6 Jun 1997 08:43:13 +0100 (BST)
Cc: davem@jenolan.rutgers.edu, touch@ISI.EDU, vjs@mica.denver.sgi.com,
        tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199706052346.TAA05009@jekyll.piermont.com> from "Perry E. Metzger" at Jun 5, 97 07:46:31 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> "David S. Miller" writes:
> > So what is the problem with SYN/RST cookie based defense mechanisms
> 
> Where to begin...

Begin at the beginning and end at the end. I know several RST cookie problems
- notably a lot of "smart" firewalls say "ooh I wonder what this is I think
I'll drop it"

SYN cookies have some interesting sequence space issues and, like the other
dropping techniques and time wait queue shortening techniques, I'm not convinced
they follow all the formal requirements we ought to have. As to performance,
that one is a red herring - indeed the secure sequence number takes longer
to compute

Alan


From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 00:42:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA05995 for tcp-impl-list; Fri, 6 Jun 1997 00:41:21 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA05977 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:41:18 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA00352
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:41:15 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id IAA13677; Fri, 6 Jun 1997 08:39:30 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wZtn3-0005FdC; Fri, 6 Jun 97 08:50 BST
Message-Id: <m0wZtn3-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TIME-WAIT truncation
To: touch@ISI.EDU
Date: Fri, 6 Jun 1997 08:50:29 +0100 (BST)
Cc: davem@jenolan.rutgers.edu, vjs@mica.denver.sgi.com,
        tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199706060003.AA07811@ash.isi.edu> from "touch@ISI.EDU" at Jun 5, 97 05:03:48 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Did you try probes over a firewall?

Yep. Several. They break RST cookies. SYN cookies are fine - think about
how they work and that should be absolutely obvious.

> > that this does not even happen at all.  I mean seriously, what is the
> > code path for a well written SYN cookie implementation?  Could be made
> > to be nothing more than:
> You run code that doesn't consume CPU?

It uses very little - and may well be using less than the cost of
picking a random node from a queue each time and discarding it. Remember
for a good SYN drop scheme your drop random function ought to be 
cryptographically secure to stop timing attacks.

> > probe packet is sent and one received response must be verified. If you
> > think about it, for web client connection patterns to a server this is
> > should perform extremely well.
> > Where is all this CPU overhead you speak of?
> Right there...

And the 8080 isn't used for fast web servers so the clock count on this is
almost irrelevant. It'll cost you more to pull the cache lines for the 
packet that got into 'I' state from the snoop of the packet DMA and far
more to do the other basic stuff.

> I have a server providing info off a RAM disk over a gigabit network.
> It does happen in practice - just not yet.

Dave - this we can try with the Myrinet boards. I'd still expect memory
bandwidth to kick in first.

> Why are the probes not just like SYN attacks? What if I don't want
> to answer probes, for security reasons? Why should TCP fail in those
> cases?

The probes are valid TCP responses to your SYN. If you don't wish to answer
a SYN|ACK then the conversation is a trifle irrelevant


From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 00:48:14 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA07510 for tcp-impl-list; Fri, 6 Jun 1997 00:45:19 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA07498 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:45:16 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA01196
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:45:13 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id IAA13699; Fri, 6 Jun 1997 08:44:22 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wZtrl-0005FdC; Fri, 6 Jun 97 08:55 BST
Message-Id: <m0wZtrl-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TIME-WAIT truncation
To: vern@ee.lbl.gov (Vern Paxson)
Date: Fri, 6 Jun 1997 08:55:21 +0100 (BST)
Cc: touch@ISI.EDU, tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199706060009.RAA28794@daffy.ee.lbl.gov> from "Vern Paxson" at Jun 5, 97 05:09:55 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> There's a basic problem which is that now that TTL is de facto only a hop
> count (per RFC 1812), there's really no enforcement of segment lifetime in
> the network, so nothing fundamental to build on.

IPv6 discards the whole thing. This pretty much makes PAWS mandatory on an
IPv6 network. The right answer to all this is the PAWS stuff. We just need
more people supporting it, and it's back to the usual catch-22 situation with
T/TCP etc again (this time not so badly).

> > 30 seconds seems OK
> > 	20 hops, 10 Kbps per hop makes this AOK.
> 
> Alan mentioned personal experience on paths for which this is too low ...

Mixed 1200/9600 baud AX.25 networks regularly go over the 30 seconds. Radio
endpoints onto real networks sometimes hit 15 seconds RTT, and you can certainly
drive 64Kbit lines to a 3 second rtt with some poorer router technology.

The radio one is ok until you have a radio user each end and a slow path in
the middle. Then you will indeed just about occasionally hit the 30 seconds.
This kind of technology ought to be the butt end of the worst of internet
now.

I also don't have good figures for some of the emerging satellite networks -
can anyone comment on RTTs across those?

Alan



From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 00:52:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA09281 for tcp-impl-list; Fri, 6 Jun 1997 00:51:24 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA09267 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:51:21 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA01944
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 00:51:19 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id IAA14042; Fri, 6 Jun 1997 08:49:37 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wZtwl-0005FeC; Fri, 6 Jun 97 09:00 BST
Message-Id: <m0wZtwl-0005FeC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TIME-WAIT truncation
To: davem@jenolan.rutgers.edu (David S. Miller)
Date: Fri, 6 Jun 1997 09:00:31 +0100 (BST)
Cc: touch@ISI.EDU, vjs@mica.denver.sgi.com, tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199706060134.VAA01078@jenolan.caipgeneral> from "David S. Miller" at Jun 5, 97 09:34:26 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> We've seen the SYN version work with just about anything starting the
> connections, the RST cookies seem to sometimes cause the win95 stack
> to not function properly.

I am not yet convinced it is Win95. Various firewalls drop RST cookies, which
is specifically why I recommend vendors do SYN but not RST. Win95 machines
are much more likely to be behind a firewall, and so the small sample size
I have doesn't conclusively prove RST cookies and Win95 is a problem. It does
prove RST cookies, while very clever, are not real world.



From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 01:08:00 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA12072 for tcp-impl-list; Fri, 6 Jun 1997 01:06:32 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA12063 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 01:06:28 -0700
Received: from rekk.dna.lth.se (rekk.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id BAA04078
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 01:06:27 -0700
	env-from (erics@rekk.dna.lth.se)
Received: from rekk.dna.lth.se (localhost [127.0.0.1])
	by rekk.dna.lth.se (8.8.5/8.8.5) with ESMTP id KAA12627;
	Fri, 6 Jun 1997 10:02:49 +0200
Message-Id: <199706060802.KAA12627@rekk.dna.lth.se>
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: Eric.Schenk@dna.lth.se, tcp-impl@relay.engr.SGI.COM
From: Eric.Schenk@dna.lth.se
Subject: Re: SYN cookies 
In-reply-to: Your message of "Fri, 06 Jun 1997 08:43:13 BST."
             <m0wZtg1-0005FfC@lightning.swansea.linux.org.uk> 
Date: Fri, 06 Jun 1997 10:02:48 +0200
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


(trimmed the CC list, everyone here is on the mailing list).

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
>Begin at the beginning and end at the end. I know several RST cookie problems
>- notably a lot of "smart" firewalls say "ooh I wonder what this is I think
>I'll drop it"

Indeed. This is the one flaw in what I think is otherwise probably the best
scheme (in terms of questions of whether or not it changes TCP semantics).
The specific problem is that some firewalls drop RST frames.
The way RST cookies work is to send a SYNACK that is guaranteed to
elicit an RST from the client. When the RST is received this is taken
as a sign that the client actually exists, and further attempts by
the client to connect will be permitted (and no further cookies
will be sent until the current permission expires).

>SYN cookies have some interesting sequence space issues and like the other
>dropping techniques and time wait queue shortening techniques Im not convinced
>they follow all the formal requirements we ought to have.

I think the sequence space issues in SYN cookies are mostly taken care of,
however I have just noticed one case where things are a bit dicey.
In particular, there is a transition problem between sequence numbers
generated in the usual secure sequence number way, and SYN cookie
sequence numbers. Either one used consistently generates proper monotone
increasing sequences that don't clock over too fast. However, they don't
generate the same sequences, and switching from one mode to the other might
result in a backwards jump in the sequence space. Offhand I do not know
if this issue can be resolved; I would have to think about this a bit.

>As to performance
>that one is a red herring - indeed the secure sequence number takes longer
>to compute 

Well, that's not quite true. The secure sequence number costs about the
same to compute as a SYN cookie, perhaps a bit more, depending on your
specific secure sequence number hash. In fact the algorithms are
suspiciously similar. The major difference is that the clock portion
of the calculation is taken from the remote side's sequence number
in the case of a SYN cookie.

However, the main point you make stands. The cost of generating a
cookie is the same as the cost of generating a SYNACK frame.
In a random or tail drop scheme you must look for a TCB to drop,
and you must generate a new TCB for the incoming frame.
In a cookie scheme you don't do anything extra at this point,
you don't even have to create a TCB for the incoming frame at this point.
In both schemes we do send out a response to every incoming SYN packet.

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 06:53:39 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA21347 for tcp-impl-list; Fri, 6 Jun 1997 06:51:58 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA21335 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 6 Jun 1997 06:51:55 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id HAA08478 for tcp-impl@cthulhu.engr.sgi.com; Fri, 6 Jun 1997 07:51:50 -0600
Date: Fri, 6 Jun 1997 07:51:50 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706061351.HAA08478@mica.denver.sgi.com>
Subject: Re: SYN cookies
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Eric.Schenk@dna.lth.se

> ...
> >As to performance
> >that one is a red herring - indeed the secure sequence number takes longer
> >to compute 
> 
> Well, that's not quite true. The secure sequence number costs about the
> same to compute as a SYN cookie, perhaps a bit more, depending on your
> specific secure sequence number hash. In fact the algorithms are
> suspiciously similar. The major difference is that the clock portion
> of the calculation is taken from the remote sides sequence number
> in the case of a SYN cookie.

What is this about "<<the>> secure sequence number"?
Using a cryptographic PRNG is one way.  Another is to use an extremely
cheap (and weak) hash and a local source of randomness, such as a
microsecond or better clock.  The bad guy must guess your initial
sequence number exactly, not just get close.

> However, the main point you make stands. The cost of generating a
> cookie is the same as the cost of generating a SYNACK frame.
> In a random or tail drop scheme you must look for a TCB to drop,
> and you must generate a new TCB for the incoming frame.
> In a cookie scheme you don't do anything extra at this point,
> you don't even have to create a TCB for the incoming frame at this point.
> In both schemes we do send out a response to every incoming SYN packet.

Again, tail-drop is free if you have the machinery to deal with large
MHit/day web serving.  In real life (as opposed to benchmarks), large
loads imply large listen queues, because so many clients never send
more than the initial SYN before dropping off the net.  Random-drop
need not cost many cycles, depending on your queue structure.  If you
use a hash table (as in the BSDI code), it need not be completely
resistant to any collisions caused by bad guys, but only resistant to
substantial numbers of collisions per second.

In the cookie scheme, you do generate and save an entry in a local hash
table, don't you?  To prevent bad things from happening from collisions
in cookies, since the 32-bit initial sequence number does not have
enough space to encode the state of the new connection.  That state
must include at least
    (addr,port,addr,port)
    start-time, so that you can discard it
    local initial sequence number

The initial cookie proposals I saw involved no local state.  There was
a hash computed of (addr,port,addr,port,remote seq #) and returned as a
SYN-ACK, and forgotten.  The hope was that if the peer responded with
an ACK and the right sequence number, it must be the right good guy.
Obviously, without local state, collisions can and will happen, and
some good guys will suffer, just as with random-drop, although perhaps
less often than with random-drop.  The recent statements that
absolutely no good guys can be lost imply that there is now local state
saved for each SYN-ACK sent, since otherwise collisions would happen,
and good guys would be hit.
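
A rough sketch of the stateless scheme as described above: the SYN-ACK's
sequence number is a keyed hash of (addr,port,addr,port,remote seq #),
nothing is stored, and the returning ACK is checked by recomputing the
hash. The FNV-1a mix is a non-cryptographic stand-in used only to keep the
example short, and every name and the secret are made up for illustration;
this is not taken from any of the implementations discussed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection identifier; rseq is the peer's initial seq #. */
struct syn_key {
    uint32_t raddr, laddr;
    uint16_t rport, lport;
    uint32_t rseq;
};

/* FNV-1a step over a buffer. NOT cryptographic; a real implementation
 * would need a keyed cryptographic hash here. */
static uint32_t mix(uint32_t h, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* Value sent as the SYN-ACK's initial sequence number; nothing is saved. */
static uint32_t cookie_make(const struct syn_key *k, uint32_t secret)
{
    uint32_t h = 2166136261u;
    h = mix(h, &secret, sizeof secret);
    h = mix(h, &k->raddr, sizeof k->raddr);
    h = mix(h, &k->laddr, sizeof k->laddr);
    h = mix(h, &k->rport, sizeof k->rport);
    h = mix(h, &k->lport, sizeof k->lport);
    h = mix(h, &k->rseq, sizeof k->rseq);
    return h;
}

/*
 * Check a returning ACK: its sequence number should be rseq + 1 and its
 * acknowledgment number cookie + 1. The rseq field of *conn is ignored
 * and rebuilt from the segment, since no state was kept.
 */
static int cookie_check(const struct syn_key *conn, uint32_t secret,
                        uint32_t seg_seq, uint32_t seg_ack)
{
    struct syn_key k = *conn;
    k.rseq = seg_seq - 1;
    return cookie_make(&k, secret) == seg_ack - 1;
}
```

Because the check depends only on values carried in the ACK plus the
secret, any ACK whose acknowledgment number happens to match the hash is
accepted, which is the collision concern raised above.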

Given state in a local hash table, I see no advantage to cookies
compared to the BSDI code.

If the bad guy knows the target is using cookies without local state,
wouldn't the effective attack not be just SYNs but a mixture of just
enough SYNs to trigger the cookie defense and mostly ACKs intended to
collide with good guys?


Vernon Schryver,  vjs@sgi.com
¾ï$ýùdOI+óë2zåtàÙÓ‚ÏÉÅÁj¹°«¥¡‡A9›”&\
.pdefaults
.loc-cshrc
.nevotinit
.gamtables
.signature
.sgihelprc
.insightrc
.Xdefaults

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 07:57:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA05389 for tcp-impl-list; Fri, 6 Jun 1997 07:54:40 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA05381 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 07:54:38 -0700
Received: from mailhost1.BayNetworks.COM (ext-ns3.baynetworks.com [134.177.3.16]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA28895
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 07:54:36 -0700
	env-from (mpatel@BayNetworks.COM)
Received: from mailhost.BayNetworks.COM ([134.177.1.107]) 
	by mailhost1.BayNetworks.COM (8.8.5/BNET-97/05/05-E) with ESMTP
	id HAA09337; Fri, 6 Jun 1997 07:44:39 -0700 (PDT)
	for <tcp-impl@relay.engr.SGI.COM>
Received: from pobox.engeast.BayNetworks.COM (pobox.corpeast.baynetworks.com [192.32.151.199]) 
	by mailhost.BayNetworks.COM (8.8.5/BNET-97/06/05-I) with ESMTP
	id HAA08362; Fri, 6 Jun 1997 07:44:38 -0700 (PDT)
	for <tcp-impl@relay.engr.SGI.COM>
Posted-Date: Fri, 6 Jun 1997 07:44:38 -0700 (PDT)
Received: from horizon.engeast (horizon [192.32.170.181])
	by pobox.engeast.BayNetworks.COM (SMI-8.6/BNET-97/04/24-S) with SMTP
	id KAA12569; Fri, 6 Jun 1997 10:44:39 -0400
	for <tcp-impl@relay.engr.SGI.COM>
Received: from horizon (localhost) by horizon.engeast (4.1/SMI-4.1)
	id AA04907; Fri, 6 Jun 97 10:44:39 EDT
Message-Id: <33982256.345BF651@baynetworks.com>
Date: Fri, 06 Jun 1997 10:44:38 -0400
From: "Manish Patel(x67071)" <mpatel@BayNetworks.COM>
Organization: Bay Networks
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 4.1.4 sun4m)
Mime-Version: 1.0
To: tcp-impl@relay.engr.SGI.COM
Subject: subscribe
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

subscribe

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 08:30:10 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA15578 for tcp-impl-list; Fri, 6 Jun 1997 08:27:52 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA15567 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 08:27:50 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA07654
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 08:27:42 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id KAA01535;
	Fri, 6 Jun 1997 10:26:44 -0500 (CDT)
Date: Fri, 6 Jun 1997 10:26:44 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199706061526.KAA01535@frantic.BSDI.COM>
To: davem@jenolan.rutgers.edu, tcp-impl@relay.engr.SGI.COM
Subject: Re: SYN/RST cookies (was Re: a quick clarification...)
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Well, I've been trying to not get embroiled in this SYN-attack
discussion, and I'm going to try and limit it to just this
one note.  However, since I did do a fair amount of work on
this last fall, I might be able to clarify a few things.

First, some generic observations.  When planning a defense against
any kind of attack, you need to plan for the worst case scenario.

Problem:
    The listen queue is too small, and when full it just denies
    new connections.  SYN flooding keeps the queue full with bogus
    connections that don't go away until the connection times out.

Assumptions:
    o No one else is going to stop the SYN-attack for you
    o You cannot identify the bad SYN packets from the good SYN packets

The goal was:
    Allow legitimate connections to succeed in spite of a SYN-attack.

The only way to do this is to:
    1) Make the listen queue much larger
    2) When the listen queue is full, drop an existing
       embryonic connection, not the new SYN, so that
    3) You respond to every SYN

Several approaches were discussed.  The main ones were:
    1) Make the listen queues deeper, and implement random drop
    2) Create a minimal state SYN cache
    3) Send the state to the src, and create the connection when
       the reply is received.

The SYN-cookie falls under #3.  The main advantage of it is that
(1) it places the queue with state information into the network,
making it infinitely deep, and (2) we don't have to do anything to
time-out those items.

The disadvantage of #3 is that all TCP state that cannot be gotten
from the returning ACK has to be encoded in the 32 bit sequence
number, with enough cryptography to ensure that you don't make
yourself vulnerable to ACK spoofing.  From looking at the SYN-cookie
archives, the only additional state encoded is the MSS, in the top
3 bits.  It does not save Window Scale information.  That requires 8
more bits, four for each direction.  Knowing whether or not SACK-ALLOWED
was received requires 1 more bit.  Now you are down to 20 bits for
the cookie.  Is that enough? I don't know.
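
The bit budget above can be written out as a packing sketch. The field
order and all names are assumptions chosen for this illustration (only the
MSS being in the top 3 bits comes from the text); the point is simply that
3 + 4 + 4 + 1 = 12 bits of state leave 20 bits for the hash.

```c
#include <stdint.h>

/* Hypothetical layout of the 32-bit sequence number:
 *   [31:29] MSS table index   [28:25] send window scale
 *   [24:21] recv window scale [20]    SACK-ALLOWED seen
 *   [19:0]  remaining bits for the cryptographic cookie
 */
struct cookie_fields {
    unsigned mss_idx;    /* 0..7 */
    unsigned snd_wscale; /* 0..15 */
    unsigned rcv_wscale; /* 0..15 */
    unsigned sack_ok;    /* 0 or 1 */
    uint32_t hash20;     /* low 20 bits used */
};

static uint32_t cookie_pack(const struct cookie_fields *f)
{
    return ((uint32_t)(f->mss_idx & 0x7u) << 29) |
           ((uint32_t)(f->snd_wscale & 0xFu) << 25) |
           ((uint32_t)(f->rcv_wscale & 0xFu) << 21) |
           ((uint32_t)(f->sack_ok & 0x1u) << 20) |
           (f->hash20 & 0xFFFFFu);
}

static struct cookie_fields cookie_unpack(uint32_t c)
{
    struct cookie_fields f;
    f.mss_idx    = (c >> 29) & 0x7u;
    f.snd_wscale = (c >> 25) & 0xFu;
    f.rcv_wscale = (c >> 21) & 0xFu;
    f.sack_ok    = (c >> 20) & 0x1u;
    f.hash20     = c & 0xFFFFFu;
    return f;
}
```

With 20 bits left, a blind guess at the hash portion succeeds with
probability 1 in 2^20 per forged ACK, which is the quantity the question
"Is that enough?" turns on.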

I chose to implement a minimal state SYN cache, with oldest
drop on overflow.  In 32 bytes I can retain all the information
needed (for an IPv4 connection, IPv6 is another issue...)  The
cached syns are kept in a hash table for easy lookup.  On table
overflow, the oldest entry in the hash bucket is dropped.  No
application changes are needed, because the concept of listen()
backlog was changed to be the maximum number of *established*
connections that are allowed to be queued up.  The whole point
of the listen backlog is to keep a listen socket from continuing
to accept new connections when the application is not accepting
them.  So, we only drop incoming SYNs if the backlog of established
connections exceeds the amount specified by listen().
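
A small sketch of the general shape described above: cached SYNs in a hash
table, with the oldest entry in a bucket overwritten when the bucket is
full. This is not the released BSD/OS code; the table sizes, field layout,
hash function, and all names are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SC_BUCKETS 64u /* hypothetical table size */
#define SC_DEPTH   4u  /* hypothetical entries per bucket */

struct sc_entry {
    uint32_t raddr, laddr;
    uint16_t rport, lport;
    uint32_t irs, iss; /* peer's and our initial sequence numbers */
    uint32_t stamp;    /* insertion order; 0 marks an empty slot */
};

struct sc_table {
    struct sc_entry slot[SC_BUCKETS][SC_DEPTH];
    uint32_t clock;    /* counter wraparound is ignored in this sketch */
};

static void sc_init(struct sc_table *t) { memset(t, 0, sizeof *t); }

static unsigned sc_bucket(uint32_t raddr, uint16_t rport, uint16_t lport)
{
    return (raddr ^ (((uint32_t)rport << 16) | lport)) % SC_BUCKETS;
}

/* Store e, overwriting the oldest slot in its bucket (empty slots count
 * as oldest). Returns 1 if a live entry was dropped to make room. */
static int sc_insert(struct sc_table *t, const struct sc_entry *e)
{
    struct sc_entry *b = t->slot[sc_bucket(e->raddr, e->rport, e->lport)];
    struct sc_entry *victim = &b[0];
    for (unsigned i = 1; i < SC_DEPTH; i++)
        if (b[i].stamp < victim->stamp)
            victim = &b[i];
    int dropped = victim->stamp != 0;
    *victim = *e;
    victim->stamp = ++t->clock;
    return dropped;
}

/* Find the cached SYN matching a returning ACK's four-tuple, or NULL. */
static struct sc_entry *sc_lookup(struct sc_table *t, uint32_t raddr,
                                  uint16_t rport, uint32_t laddr,
                                  uint16_t lport)
{
    struct sc_entry *b = t->slot[sc_bucket(raddr, rport, lport)];
    for (unsigned i = 0; i < SC_DEPTH; i++)
        if (b[i].stamp != 0 && b[i].raddr == raddr && b[i].rport == rport &&
            b[i].laddr == laddr && b[i].lport == lport)
            return &b[i];
    return NULL;
}
```

Every SYN gets a slot and therefore a SYN-ACK; the only connections lost
are those whose entry is overwritten before their ACK comes back.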

Oldest drop vs. Random drop:

This is the difference between a cliff and a slope.  With oldest
drop, if the RTT for a legitimate connection is less than the
queue depth/attack rate, you can guarantee that the connection
will succeed.  If your RTT is larger, then you can guarantee
that the connection will fail.  With random drop, every connection
has some probability of losing, and that probability goes up
with your RTT.
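
The cliff and the slope can be put in numbers under a simplified model,
which is an assumption of this sketch and not something measured: a single
full queue of `depth` entries, with `arrivals` further SYNs (attack rate
times RTT) showing up before the legitimate ACK returns, and random drop
evicting a uniformly chosen entry on each arrival.

```c
/* Oldest drop: the entry is evicted by the depth-th later arrival, so it
 * survives exactly when fewer than depth SYNs arrive during the RTT. */
static int oldest_drop_survives(unsigned depth, unsigned arrivals)
{
    return arrivals < depth;
}

/* Random drop: each arrival evicts our entry with probability 1/depth,
 * so the survival probability is (1 - 1/depth)^arrivals. */
static double random_drop_survival(unsigned depth, unsigned arrivals)
{
    double p = 1.0;
    for (unsigned i = 0; i < arrivals; i++)
        p *= 1.0 - 1.0 / (double)depth;
    return p;
}
```

For example, with a depth of 1000 and 500 attack SYNs per second, oldest
drop always succeeds below a 2 second RTT and always fails at or above it,
while in this model random drop at a 1 second RTT succeeds roughly 60% of
the time.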

As to the pros and cons of syn-cookies vs. a SYN-cache, both
have their flaws.

  o Neither one solves the problem of the returning ACK from a
    valid connection being lost.  The cookie approach can't
    retransmit the SYN/ACK, since it doesn't have any state, and
    the SYN-cache chooses not to retransmit the SYN/ACK, since
    most of the retransmissions would be for bogus connections.

  o SYN cookies don't retain state about SACK or window scale.

  o SYN cache requires memory, about 1 MB to allow a queue of
    30,000 deep.

  o SYN cookies must be cryptographically secure to prevent
    a forged ACK attack.  If you don't already have MD5 in
    your kernel, you have to add it.

I'll also point out that along with the code that I released for
4.4BSD-Lite2 (ftp://ftp.bsdi.com/contrib/bsdi_contrib/44Lite-SYNcache.gz)
I put in comments about how to rip out just the caching code and
replace it with something like the SYN-cookie defense.

If you want more information about the syn-caching code, pick up
the distribution and read all about it.

The bottom line for me is that the SYN-cache solves the problem
that we set out to solve, and for our situation (BSD/OS), we felt
that it was a better solution than the SYN-cookie approach.  If
at some point that proves to be false, we'll rethink our solution.

I won't tell anyone which approach is better, because they both
have their flaws.  The best I can do is point out the pros and
cons of each, and let the other person decide which trade-offs
best fit their situation.

I've got well over a megabyte of mail discussion about the
SYN-flood-attack and defenses against it; let's not continue
to rehash the whole thing here.

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 09:34:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA10697 for tcp-impl-list; Fri, 6 Jun 1997 09:30:35 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA10614 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 09:30:30 -0700
Received: from himmelsborg.dna.lth.se (himmelsborg.dna.lth.se [130.235.16.11]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA29012
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 09:30:28 -0700
	env-from (erics@regin.dna.lth.se)
Received: (from erics@localhost) by himmelsborg.dna.lth.se (8.7.6/8.7.3/perf) id SAA10268 for <@EMIL:tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 18:19:25 +0200 (MET DST)
X-Authentication-Warning: himmelsborg.dna.lth.se: erics set sender to erics@regin.dna.lth.se using -f
Received: from regin (regin [130.235.16.115]) by himmelsborg.dna.lth.se (8.7.6/8.7.3/perf) with ESMTP id SAA10261; Fri, 6 Jun 1997 18:19:18 +0200 (MET DST)
Received: from regin by regin (SMI-8.6/) id SAA10717; Fri, 6 Jun 1997 18:19:17 +0200
Message-Id: <199706061619.SAA10717@regin>
To: David Borman <dab@BSDI.COM>
cc: Eric.Schenk@dna.lth.se, tcp-impl@relay.engr.SGI.COM
From: Eric.Schenk@dna.lth.se
Subject: Re: SYN/RST cookies (was Re: a quick clarification...) 
In-reply-to: Your message of "Fri, 06 Jun 1997 10:26:44 CDT."
             <199706061526.KAA01535@frantic.BSDI.COM> 
Date: Fri, 06 Jun 1997 18:19:16 +0200
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



David Borman <dab@BSDI.COM> writes:
>I've got well over a megabyte of mail discussion about the
>SYN-flood-attack and defenses against it, let's not continue
>to rehash the whole thing here.

Let me agree with this. As David has said, there are advantages
and disadvantages to both approaches. I think this has strayed
well off topic for this list. I will respond to a few points in
private email, and if anyone is interested in discussing the
pros and cons of various approaches I'll probably be interested,
but I will not be posting anything further on this topic here.

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 10:52:30 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA05726 for tcp-impl-list; Fri, 6 Jun 1997 10:48:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA05653 for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 10:48:27 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA25705
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Jun 1997 10:48:24 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id SAA00860; Fri, 6 Jun 1997 18:39:07 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wa37h-0005FdC; Fri, 6 Jun 97 18:48 BST
Message-Id: <m0wa37h-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: SYN/RST cookies (was Re: a quick clarification...)
To: dab@BSDI.COM (David Borman)
Date: Fri, 6 Jun 1997 18:48:25 +0100 (BST)
Cc: davem@jenolan.rutgers.edu, tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199706061526.KAA01535@frantic.BSDI.COM> from "David Borman" at Jun 6, 97 10:26:44 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I've got well over a megabyte of mail discussion about the
> SYN-flood-attack and defenses against it, let's not continue
> to rehash the whole thing here.

Fair point - can well agree to document this solely as a

"TCP stacks may be exposed to attacks of the following form ....."

"Several methods of protection for this exist. There is no agreement as
 to the best. [1]"

[1] Reference to appropriate list archives/RFC documents


From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 11:52:14 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA26235 for tcp-impl-list; Fri, 6 Jun 1997 11:49:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA25644 for <tcp-impl@relay.engr.sgi.com>; Fri, 6 Jun 1997 11:47:49 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA16444
	for <tcp-impl@relay.engr.sgi.com>; Fri, 6 Jun 1997 11:47:47 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id OAA08257; Fri, 6 Jun 1997 14:43:34 -0400 (EDT)
Message-Id: <199706061843.OAA08257@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: dab@bsdi.com (David Borman), davem@jenolan.rutgers.edu,
        tcp-impl@relay.engr.sgi.com
Subject: Re: SYN/RST cookies (was Re: a quick clarification...) 
In-reply-to: Your message of "Fri, 06 Jun 1997 18:48:25 BST."
             <m0wa37h-0005FdC@lightning.swansea.linux.org.uk> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Fri, 06 Jun 1997 14:43:20 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Alan Cox writes:
> > I've got well over a megabyte of mail discussion about the
> > SYN-flood-attack and defenses against it, let's not continue
> > to rehash the whole thing here.
> 
> Fair point - can well agree to document this solely as a
> 
> "TCP stacks may be exposed to attacks of the following form ....."
> 
> "Several methods of protection for this exist. There is no agreement as
>  to the best. [1]"
> 
> [1] Reference to appropriate list archives/RFC documents

I disagree. The approach outlined by Vern and essentially in use in
BSDI has very wide support and should be implemented by most
vendors. The SYN cookie approach is at very best (I'll be very very
charitable here) experimental.


Perry

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 11:52:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA27038 for tcp-impl-list; Fri, 6 Jun 1997 11:50:56 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA27018 for <tcp-impl@relay.engr.sgi.com>; Fri, 6 Jun 1997 11:50:54 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA17141
	for <tcp-impl@relay.engr.sgi.com>; Fri, 6 Jun 1997 11:50:50 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id TAA02996; Fri, 6 Jun 1997 19:49:34 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wa4Fk-0005FdC; Fri, 6 Jun 97 20:00 BST
Message-Id: <m0wa4Fk-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: SYN/RST cookies (was Re: a quick clarification...)
To: perry@piermont.com
Date: Fri, 6 Jun 1997 20:00:47 +0100 (BST)
Cc: alan@lxorguk.ukuu.org.uk, dab@bsdi.com, davem@jenolan.rutgers.edu,
        tcp-impl@relay.engr.sgi.com
In-Reply-To: <199706061843.OAA08257@jekyll.piermont.com> from "Perry E. Metzger" at Jun 6, 97 02:43:20 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I disagree. The approach outlined by Vern and essentially in use in
> BSDI has very wide support and should be implemented by most
> vendors. The SYN cookie approach is at very best (I'll be very very
> charitable here) experimental.

You have failed to provide reasons, logic or justification. I don't think
you have a leg to stand on in any such argument other than as proof that
all should be documented.

In future if you want to disparage Dan Bernstein in general and stuff he
has been involved in do it in an appropriate forum - say alt.flame for
example, not at the expense of a working group.

And I have nothing more to say on that matter.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Fri Jun  6 12:25:35 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA07510 for tcp-impl-list; Fri, 6 Jun 1997 12:23:06 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA07484 for <tcp-impl@relay.engr.sgi.com>; Fri, 6 Jun 1997 12:22:58 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA27546
	for <tcp-impl@relay.engr.sgi.com>; Fri, 6 Jun 1997 12:22:54 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id PAA08377; Fri, 6 Jun 1997 15:18:21 -0400 (EDT)
Message-Id: <199706061918.PAA08377@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: tcp-impl@relay.engr.sgi.com
Subject: Re: SYN/RST cookies (was Re: a quick clarification...) 
In-reply-to: Your message of "Fri, 06 Jun 1997 20:00:47 BST."
             <m0wa4Fk-0005FdC@lightning.swansea.linux.org.uk> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Fri, 06 Jun 1997 15:18:16 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Alan Cox writes:
> > I disagree. The approach outlined by Vern and essentially in use in
> > BSDI has very wide support and should be implemented by most
> > vendors. The SYN cookie approach is at very best (I'll be very very
> > charitable here) experimental.
> 
> You have failed to provide reasons, logic or justification.

I haven't re-done flame wars done elsewhere, yes. I'm not that
interested in them.

Here, though, is a starter: the TCP sequence number etc. isn't big
enough to provide enough space to encode port/source info and all the
rest of the TCP state you need, let alone encode them AND provide
enough room for cryptographic authentication. Too much of the space is
meaningful -- would be recognised by a host as "valid". Writing a
program to burn a machine running SYN cookie code to the floor doesn't
look particularly hard. It is, in fact, far worse, since I can induce
fake state on many hosts that WILL NOT GO AWAY -- the machines will
think that a SYN-SYNACK exchange has happened after just getting a
single packet. Do a little creative guessing, and you can tie the
machine in knots -- worse knots than you could tie it in without SYN
cookies. Given this, why bother with the exercise? The whole thing
protects NOTHING.
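
The space argument above can be put into rough numbers.  The sketch
below is only arithmetic under assumptions that are not from this
thread: it supposes the 32-bit sequence number is split into a time
counter, an MSS index, and a hash, with 5 and 3 bits used for the first
two purely as an example, and it supposes a hypothetical attacker
packet rate.  It does not settle whether the remaining hash space is
adequate; it only shows how the budget is computed.

```python
SEQ_BITS = 32  # width of the TCP sequence number field


def hash_bits(counter_bits, mss_bits):
    """Bits left for the authenticating hash after the encoded state."""
    return SEQ_BITS - counter_bits - mss_bits


def expected_guesses(bits):
    """Size of the space a blind forger must search for one valid cookie."""
    return 2 ** bits


def seconds_per_forged_connection(bits, packets_per_second):
    """Time to try every hash value once at a given (assumed) packet rate."""
    return expected_guesses(bits) / packets_per_second


if __name__ == "__main__":
    bits = hash_bits(counter_bits=5, mss_bits=3)
    print(bits, expected_guesses(bits))
    print(seconds_per_forged_connection(bits, 10_000))
```

Every bit spent on encoding more connection state (window scale, SACK
permission, and so on) halves the space left for authentication.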

The "cookie" idea was not invented by Dan. It was invented by me and
several other people on a private mailing list formed immediately
after the SYN flood thing first appeared. We rejected our own idea --
as well as variations on the theme like PING state probes -- because
they DON'T WORK. Dan came along, re-invented it, and has gotten lots
of people to adopt the concept.

I mean, I really loved my ping state probe idea. It was simple and
elegant, and just plain won't work on a real internet where things
filter ICMP.

> I don't think you have a leg to stand on in any such argument other
> than as proof that all should be documented.

Whatever, Alan. I must admit that I haven't done much on this topic
because I frankly don't care about the security of Linux boxes and
they are the only ones adopting this thing. Call me callous, but after
flames I got in the past trying to point out problems to linux types I
just decided "let 'em burn -- doesn't hurt me or my clients".

However, that doesn't mean it's going to get into an RFC.

Perry

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun 12 04:23:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA18741 for tcp-impl-list; Thu, 12 Jun 1997 04:21:43 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA18733 for <tcp-impl@relay.engr.SGI.COM>; Thu, 12 Jun 1997 04:21:40 -0700
Received: from mercury.spider.com (mercury.spider.com [194.217.109.6]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id EAA09712
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 12 Jun 1997 04:21:37 -0700
	env-from (ian@spider.com)
Received: from asimov.spider.com (asimov.spider.com [194.217.109.66]) by mercury.spider.com (8.8.3/8.8.3) with SMTP id MAA14199; Thu, 12 Jun 1997 12:17:42 +0100 (BST)
Received: from malatesta by asimov.spider.com (SMI-8.6/SMI-SVR4)
	id MAA10373; Thu, 12 Jun 1997 12:17:16 +0100
Message-ID: <339FDABB.7F41@spider.com>
Date: Thu, 12 Jun 1997 12:17:15 +0100
From: Ian Heavens <ian@spider.com>
Organization: Spider Software Ltd.
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: touch@ISI.EDU
CC: tcp-impl@relay.engr.SGI.COM, vern@ee.lbl.gov
Subject: Re: TIME-WAIT truncation
References: <199706051826.AA07116@ash.isi.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

touch@ISI.EDU wrote:
> 
> > From owner-tcp-impl@relay.engr.SGI.COM Thu Jun  5 11:22:20 1997
> > To: tcp-impl@relay.engr.SGI.COM
> > Subject: TIME-WAIT truncation
> > Date: Thu, 05 Jun 1997 11:03:08 PDT
> > From: Vern Paxson <vern@ee.lbl.gov>
> >
> > Someone passed along the following URL via private email:
> >
> >       http://www.microsoft.com/kb/articles/Q151/4/18.htm
> >
> > It discusses a TCP implementation problem in which a connection can leave
> > TIME-WAIT before the full 2 MSL interval has elapsed, because there are
> > a limited (albeit large) number of TCBs available for TIME-WAIT.  My
> > impression is that there are other implementations that truncate TIME-WAIT
> > before a full 2 MSL because they use a definition of MSL smaller than
> > the standard one.
> >
> > Is there a volunteer interested in documenting this problem?  (Further
> > discussion of it is fine too, of course.)
> 
> Yup - we have some work here that's related, so this
> would be a fine place for me to volunteer...
>

I think it would be good to identify clearly the separate TIME-WAIT
issues, since they are revisited fairly frequently.  There are at least
the following issues:

Need for TW (port wraparound and SN wraparound)
Adequacy of TIME-WAIT as a mechanism (MSL choice, TTL issues, 
    T/TCP/HSE extensions)
Implementation issues (port usage, memory and performance)
Workarounds - pros and cons (MSL reduction, TCB reuse)
TIME-WAIT and RSTs (RSTs are not a workaround to avoid TIME-WAIT).
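
As a rough illustration of the implementation issues in the list above
(memory and port usage), the sketch below estimates how many TCBs sit
in TIME-WAIT and how fast one client can reconnect to the same server
before it runs out of local ports.  The function names, the connection
rates, and the port count are hypothetical; the 2-minute MSL is the
value given in RFC 793, and the 30-second figure is used only as an
example of a shorter definition.

```python
def time_wait_seconds(msl_seconds):
    """TIME-WAIT lasts two maximum segment lifetimes."""
    return 2 * msl_seconds


def time_wait_tcbs(closes_per_second, msl_seconds):
    """Steady-state number of TCBs held in TIME-WAIT at a given close rate."""
    return closes_per_second * time_wait_seconds(msl_seconds)


def max_connection_rate(local_ports, msl_seconds):
    """Connections per second to one server address and port before the
    client's local ports are all tied up in TIME-WAIT."""
    return local_ports / time_wait_seconds(msl_seconds)


if __name__ == "__main__":
    for msl in (120, 30):  # RFC 793 value, then a shorter example value
        print(msl, time_wait_seconds(msl),
              time_wait_tcbs(100, msl),
              round(max_connection_rate(4000, msl), 2))
```

The same arithmetic shows why reducing MSL or reusing TCBs is tempting
as a workaround: both quantities scale directly with the 2 MSL interval.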

Perhaps a section on TIME-WAIT in the draft?

ian


-- 
Ian Heavens, Spider Software Ltd., http://www.spider.com/
8 John's Place, Leith, Edinburgh EH6 7EL. 
Tel +44 131 475 7015 fax. +44 131 475 7001  ian@spider.com

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 11:18:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA16029 for tcp-impl-list; Fri, 20 Jun 1997 11:10:21 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA15997 for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 11:10:17 -0700
Received: from VNET.IBM.COM (vnet.ibm.com [204.146.168.194]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id LAA14749
	for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 11:10:14 -0700
	env-from (kennethw@VNET.IBM.COM)
From: kennethw@VNET.IBM.COM
Message-Id: <199706201810.LAA14749@sgi.sgi.com>
Received: from RALVM12 by VNET.IBM.COM (IBM VM SMTP V2R3) with BSMTP id 2639;
   Fri, 20 Jun 97 14:04:05 EDT
Date: Fri, 20 Jun 97 13:58:31 EDT
To: tcp-impl@relay.engr.sgi.com
cc: andyc@VNET.IBM.COM
Subject: Proposed TCP Group Extensions
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Andy Capella and I would like to submit to this working group an
internet-draft defining groups of objects to extend those
objects defined by RFC2012. We would base the initial submission
on a set of objects that we have implemented for our TCP
implementation in an Enterprise Specific MIB. We feel that there
is a need to extend the current TCP Group of objects, both from a
TCP Configuration perspective and to provide useful information
for a particular TCP Connection by AUGMENTation of the TCP
Connection Table.

We would prefer that this be done via a standard MIB as opposed
to an Enterprise Specific MIB definition, to enable interoperable
Management Application support. The following are the groups of
objects that we would define by function:

TCP CONFIGURATION OBJECTS <= These objects would be mandatory

tcpKeepAliveTimer  Integer32 (0..35791)     DEFVAL { 20 }
   "TCP Keepalive timer, expressed in minutes.
    A value of 0 deactivates the timer."
tcpReceiveBufferSize  Integer32 (256..262144)  DEFVAL { 16384 }
   "TCP Receive buffer size, expressed in OCTETs."
tcpSendBufferSize  Integer32 (256..262144)  DEFVAL { 16384 }
   "TCP Send buffer size, expressed in bytes."
tcpRestrictLowPorts  INTEGER { false(0), true(1) }   DEFVAL { false }
   "Indicates if TCP low ports are restricted to
    authorized servers/socket applications."
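
The ranges listed above can be checked mechanically.  The sketch below
is a hypothetical validator, not part of the proposal; the dictionary
and function names are invented for illustration.  It also records one
observation: the upper bound of 35791 minutes equals the number of
whole minutes that fit in a signed 32-bit millisecond counter.  That is
only a plausible reading of where the bound comes from, since the
proposal does not say.

```python
INT32_MAX = 2**31 - 1

# (low, high) bounds copied from the object list above.
CONFIG_RANGES = {
    "tcpKeepAliveTimer": (0, 35791),          # minutes; 0 disables the timer
    "tcpReceiveBufferSize": (256, 262144),    # octets
    "tcpSendBufferSize": (256, 262144),       # octets
    "tcpRestrictLowPorts": (0, 1),            # false(0), true(1)
}


def whole_minutes_in_int32_milliseconds():
    """Largest whole number of minutes a signed 32-bit ms counter can hold."""
    return INT32_MAX // 60_000


def validate(name, value):
    """Raise ValueError if value falls outside the range listed for name."""
    low, high = CONFIG_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} outside {low}..{high}")
    return value


if __name__ == "__main__":
    print(whole_minutes_in_int32_milliseconds())
    print(validate("tcpKeepAliveTimer", 20))
```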

TCP CONNECTION TABLE AUGMENTATION <= Mandatory. I am working with
   a few others in the TN3270E working group on Response Time
   Measurement, and I believe that, from an application perspective,
   the following set of objects would be useful, at least from
   a problem determination perspective.

tcpConnLastActivity  TimeTicks
  "The number of 100ths of seconds  since  this  entry
   was last used."
tcpConnBytesIn  Integer32
  "The number of bytes received from IP for this
   connection."
tcpConnBytesOut  Integer32
  "The number of bytes sent to IP for this connection."
tcpConnReXmt  Integer32
  "Number of retransmissions"
tcpConnRoundTripTime   Integer32
  "The amount of time that has elapsed, measured in
   milliseconds, from when the last TCP segment was
   transmitted by the TCP Stack until the ACK was
   received."
tcpConnRoundTripVariance  Integer32
  "Round trip time variance."

TCP CONNECTION TABLE AUGMENTATION <= optional

tcpConnActiveOpen  Integer32
  "The number of times that this connection has made a
   direct transition to the SYN-RCVD state from the
   listen state."
tcpConnOptions   OCTET STRING (SIZE(1..40))
  "IP options (see RFC 791)"
tcpConnOutBuffered  Integer32
  "Number of outgoing bytes buffered"
tcpConnUsrSndNxt  Integer32
  "Sequence number of next byte for user"
tcpConnSndNxt  Integer32
  "Sequence number of next byte for TCP"
tcpConnSndUna  Integer32
  "Sequence number of sent/unacked byte"
tcpConnOutgoingPush  Integer32
   "Sequence number of last pushed byte"
tcpConnOutgoingUrg  Integer32
   "Sequence number of last urg byte"
tcpConnOutgoingWinSeq  Integer32
   "Last sequence number in snd window"
tcpConnSendWindowSeq  Integer32
   "Last sequence number used, win update"
tcpConnSendWindowAck   Integer32
   "Last Ack number used, win update"
tcpConnInBuffered  Integer32
   "Number of incoming bytes buffered"
tcpConnRcvNxt  Integer32
   "Sequence number of next byte for TCP"
tcpConnUsrRcvNxt  Integer32
   "Sequence number of next byte for user"
tcpConnIncomingPush  Integer32
   "Sequence number of last pushed byte"
tcpConnIncomingUrg  Integer32
  "Sequence number of 'urgent' byte received"
tcpConnIncomingWinSeq   Integer32
  "Last sequence number in receive window"
tcpConnMaxSndWnd   Integer32
  "Maximum send window seen"
tcpConnReXmtCount  Integer32
  "Current retransmission count"
tcpConnCongestionWnd  Integer32
  "Congestion window"
tcpConnSSThresh  Integer32
  "Slow start threshold"
tcpConnInitSndSeq  Integer32
  "Initial Send Sequence Number"
tcpConnInitRcvSeq  Integer32
  "Initial Receive Sequence Number"
tcpConnSendMSS   Integer32
  "Maximum Segment Size we can send"
tcpConnSndWl1  Integer32
  "Sequence of last window"
tcpConnSndWl2   Integer32
  "Ack of last window"
tcpConnSndWnd   Integer32
  "Send Window size"
tcpConnPendTcpRecv  Integer32
  "TCP non_block read flag"
tcpConnRcvBufSize  Integer32
  "Receive buffer size"
tcpConnRttSeq   Integer32
  "4 byte value for the TCP sequence number"
tcpConnBackoffCount  Integer32
  "The value of this object will always be zero since
   it currently isn't reported."
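
One detail worth checking in the list above is the use of Integer32
for sequence numbers.  TCP sequence numbers are unsigned 32-bit values,
while SMIv2 Integer32 is signed (-2147483648..2147483647), so half of
the sequence space does not fit without being reinterpreted.  The
sketch below, with hypothetical helper names, shows the mismatch and a
wraparound-safe comparison.  Whether a draft should use an unsigned
type such as Unsigned32 instead is a suggestion here, not something the
proposal states.

```python
INTEGER32_MIN = -2**31
INTEGER32_MAX = 2**31 - 1
SEQ_MASK = 0xFFFFFFFF


def fits_integer32(value):
    """True if value can be reported in an Integer32 without reinterpretation."""
    return INTEGER32_MIN <= value <= INTEGER32_MAX


def as_integer32(seq):
    """What an unsigned 32-bit sequence number looks like if forced into
    a signed 32-bit object (two's complement reinterpretation)."""
    seq &= SEQ_MASK
    return seq - 2**32 if seq > INTEGER32_MAX else seq


def seq_before(a, b):
    """Modulo-2**32 comparison: True if sequence number a precedes b."""
    return ((a - b) & SEQ_MASK) > INTEGER32_MAX


if __name__ == "__main__":
    print(fits_integer32(0x80000000), as_integer32(0x80000000))
    print(seq_before(0xFFFFFFF0, 0x00000010))
```

A management application reading such objects would have to undo the
reinterpretation before doing any sequence-space arithmetic.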

Is there an interest in this type of submission?

Thanks, Ken White
IBM Networking Systems

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 11:51:51 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA28878 for tcp-impl-list; Fri, 20 Jun 1997 11:48:36 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA28870 for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 11:48:33 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA25057
	for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 11:48:27 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id OAA15435; Fri, 20 Jun 1997 14:44:18 -0400 (EDT)
Message-Id: <199706201844.OAA15435@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: kennethw@vnet.ibm.com
cc: tcp-impl@relay.engr.sgi.com, andyc@vnet.ibm.com
Subject: Re: Proposed TCP Group Extensions 
In-reply-to: Your message of "Fri, 20 Jun 1997 13:58:31 EDT."
             <199706201810.LAA14749@sgi.sgi.com> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Fri, 20 Jun 1997 14:44:16 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


If I were a security breaker, I'd love to have that list of information
available over the net. It would make cracking systems ever so much
easier. Having TCP sequence numbers handed to you without having to
guess would make hijacking connections far simpler than it is now, for
instance.

Indeed, I'd say you've just listed virtually everything I'd want to
know about the internal state of a remote machine's TCP to make it
easier for me to attack the system. I couldn't have drawn up a better
list if I'd been intentionally trying.
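
A rough sense of the difference can be had from simple arithmetic.  The
sketch below assumes, for illustration only, that a forged segment is
accepted when its sequence number lands anywhere in the receive window,
and it uses the 16384-byte default buffer size from the proposal as the
window.  The function name is hypothetical.

```python
SEQ_SPACE = 2**32


def blind_guesses(window_bytes):
    """Forged segments needed to cover the whole sequence space when each
    one covers a single receive window (blind attacker)."""
    return SEQ_SPACE // window_bytes


if __name__ == "__main__":
    # Blind attacker versus one who has read the next expected sequence
    # number directly from the management interface.
    print(blind_guesses(16384), 1)
```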

Perry


From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 13:05:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA19984 for tcp-impl-list; Fri, 20 Jun 1997 13:02:36 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA19950 for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 13:02:32 -0700
Received: from VNET.IBM.COM (vnet.ibm.com [204.146.168.194]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA15196
	for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 13:02:29 -0700
	env-from (kennethw@VNET.IBM.COM)
From: kennethw@VNET.IBM.COM
Message-Id: <199706202002.NAA15196@sgi.sgi.com>
Received: from RALVM12 by VNET.IBM.COM (IBM VM SMTP V2R3) with BSMTP id 7862;
   Fri, 20 Jun 97 16:02:24 EDT
Date: Fri, 20 Jun 97 15:58:28 EDT
To: tcp-impl@relay.engr.sgi.com
cc: andyc@VNET.IBM.COM, perry@piermont.com
Subject: Re: Proposed TCP Group Extensions
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Perry,

>If I were a security breaker, I'd love to have that list of information
>available over the net. It would make cracking systems ever so much
>easier. Having TCP sequence numbers handed to you without having to
>guess would make hijacking connections far simpler than it is now, for
>instance.

>Indeed, I'd say you've just listed virtually everything I'd want to
>know about the internal state of a remote machine's TCP to make it
>easier for me to attack the system. I couldn't have drawn up a better
>list if I'd been intentionally trying.

You raise a good point that certainly needs to be addressed from
several perspectives. First, the set of objects that I think
you are referring to would be defined in an optional group, not a
mandatory one. The objects in the mandatory groups are no worse off
from a security perspective than those already defined in the current
TCP Group. Our intent is not to obsolete the current TCP Group defined
by RFC2012 but to define some useful extensions. All MIB internet-drafts
must address security in a security section. Typically, remote creation
and in general SET support is addressed by stating that implementers
do not need to support these objects/functions in an insecure
environment. I could add the recommendation that the optional group
also not be supported in an insecure network.

The SNMPv3 Working Group is planning to come out with a set of standards
by the end of this year to provide both an authentication and a privacy
(encryption) framework. An administrator can define object views
with the security QoS defined as appropriate on that view. The
security framework that is available to a platform implementing these
objects would dictate what gets implemented with the internet-draft
providing guidance. The implementation that I worked on has implemented
SNMPv2u (RFC1909 and RFC1910) in order to provide some level of
protection while waiting for SNMPv3.

The second perspective on the issue is exactly how useful this
information would be to an entity attempting to steal or misuse
a connection. Using SNMP to retrieve this information while
looking at the data flow in real-time to attempt to steal or
misuse a connection would be challenging for an active connection
since I suspect that the data retrieved via SNMP would not
correlate exactly to the current connection data flow.

Finally, I understand that there are various security layers that
have been and are being defined within the IETF. For example, IP
Security. If the IP layer is protected, does presentation of the
objects in the optional group still make misuse easier? I haven't
been following the security work so I don't know. Regardless of
this, these objects would still fall within the SNMP Security
Framework available to a platform implementing any of these
objects. Does this address your concern?

Regards, Ken

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 13:31:13 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA27899 for tcp-impl-list; Fri, 20 Jun 1997 13:28:23 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA27883 for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 13:28:20 -0700
Received: from jekyll.piermont.com ([206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA20587
	for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 13:28:16 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id QAA15828; Fri, 20 Jun 1997 16:23:42 -0400 (EDT)
Message-Id: <199706202023.QAA15828@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: kennethw@vnet.ibm.com
cc: tcp-impl@relay.engr.sgi.com, andyc@vnet.ibm.com
Subject: Re: Proposed TCP Group Extensions 
In-reply-to: Your message of "Fri, 20 Jun 1997 15:58:28 EDT."
             <199706202002.QAA14292@frankenstein.piermont.com> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Fri, 20 Jun 1997 16:23:42 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


kennethw@VNET.IBM.COM writes:
> The second perspective on the issue is exactly how useful this
> information would be to an entity attempting to steal or misuse
> a connection. Using SNMP to retrieve this information while
> looking at the data flow in real-time to attempt to steal or
> misuse a connection would be challenging for an active connection
> since I suspect that the data retrieved via SNMP would not
> correlate exactly to the current connection data flow.

I see you haven't been examining the literature on session
stealing. You can already do it with very, very rough guesses -- the
SNMP provided data would make the guess range microscopic -- so small
as to guarantee success. You are giving out the sorts of information
that improve the chances of success with a wide variety of attacks
that are now marginal, too.

> Finally, I understand that there are various security layers that
> have been and are being defined within the IETF. For example, IP
> Security. If the IP layer is protected, does presentation of the
> objects in the optional group still make misuse easier?

I'm an active participant in the security area. As one of the people
who has his name on some of the IPSEC RFCs, let me say that encryption
isn't going to be ubiquitous any time soon.

> Regardless of this, these objects would still fall within the SNMP
> Security Framework available to a platform implementing any of these
> objects. Does this address your concern?

No. You still have leaked very, very sensitive information to a far
wider perimeter than has obvious need for the data. It makes me very,
very jittery, especially given the quality of past SNMP security
efforts.

Perry

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 13:33:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA29082 for tcp-impl-list; Fri, 20 Jun 1997 13:31:37 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA29038 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 20 Jun 1997 13:31:34 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id OAA18037 for tcp-impl@cthulhu.engr.sgi.com; Fri, 20 Jun 1997 14:31:16 -0600
Date: Fri, 20 Jun 1997 14:31:16 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706202031.OAA18037@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Proposed TCP Group Extensions
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The second perspective on the issue is exactly how useful this
> information would be to an entity attempting to steal or misuse
> a connection. Using SNMP to retrieve this information while
> looking at the data flow in real-time to attempt to steal or
> misuse a connection would be challenging for an active connection
> since I suspect that the data retrieved via SNMP would not
> correlate exactly to the current connection data flow.

If the data are useful for the merely curious, then they are also
useful for those with "practical" goals.


More to the point, of what use are those values?  I've long had a
jaundiced view of what many people call "MIB Mania," but stuffing not
just the TCP state machine state into a MIB but also the de facto
congestion avoidance and other state seems over the top even if you
don't believe the IETF is suffering from MIB Mania.

Never mind the non-trivial costs of keeping all of that information
where it can be gathered by your SNMP code.  Don't worry about the
costs of gathering it.  Never mind the Heisenberg effects of gathering
and reporting it (e.g. consider the effects on the RTT as all of those
SNMP packets stack up in router queues).  What are you going to do when
we implement more knobs and buttons?

Sometimes it is necessary to determine details of the state of a system.
However, I think that is best done other than with SNMP.  SNMP is not a
replacement for all other debugging and remote-access tools.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 14:02:21 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA06517 for tcp-impl-list; Fri, 20 Jun 1997 13:59:00 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA06505 for <tcp-impl@relay.engr.SGI.COM>; Fri, 20 Jun 1997 13:58:58 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA01476
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 20 Jun 1997 13:58:52 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA04347>; Fri, 20 Jun 1997 13:55:04 -0700
Date: Fri, 20 Jun 1997 13:54:57 -0700
Posted-Date: Fri, 20 Jun 1997 13:54:57 -0700
Message-Id: <199706202054.NAA25625@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <NAA25625>; Fri, 20 Jun 1997 13:54:57 -0700
To: tcp-impl@relay.engr.SGI.COM, vjs@mica.denver.sgi.com
Subject: Re: Proposed TCP Group Extensions
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@relay.engr.SGI.COM Fri Jun 20 13:35:15 1997
> Date: Fri, 20 Jun 1997 14:31:16 -0600
> From: vjs@mica.denver.sgi.com (Vernon Schryver)
> To: tcp-impl@relay.engr.SGI.COM
> Subject: Re: Proposed TCP Group Extensions
> 
> > The second perspective on the issue is exactly how useful this
> > information would be to an entity attempting to steal or misuse
> > a connection. Using SNMP to retrieve this information while
> > looking at the data flow in real-time to attempt to steal or
> > misuse a connection would be challenging for an active connection
> > since I suspect that the data retrieved via SNMP would not
> > correlate exactly to the current connection data flow.
> 
> If the data are useful for the merely curious, then they are also
> useful for those with "practical" goals.
> 
> 
> More to the point, of what use are those values?  I've long had a
> jaundiced view of what many people call "MIB Mania," but stuffing not
> just the TCP state machine state into a MIB but also the de facto
> congestion avoidance and other state seems over the top even if you
> don't believe the IETF is suffering from MIB Mania.

See RFC-2140. Also addresses the security implications.

The general idea is that they _might_ be useful for
LAN-wide caching of TCP startup parameters. Not to mention 
statistics (but I'm not sure MIB info on such statistics is
safe, security-wise, in any case).

Joe

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 14:13:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA09950 for tcp-impl-list; Fri, 20 Jun 1997 14:10:29 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA09943 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 20 Jun 1997 14:10:26 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id PAA18296 for tcp-impl@cthulhu.engr.sgi.com; Fri, 20 Jun 1997 15:10:13 -0600
Date: Fri, 20 Jun 1997 15:10:13 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706202110.PAA18296@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Proposed TCP Group Extensions
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: touch@ISI.EDU

> ...
> > More to the point, of what use are those values?  I've long had a
> > jaundiced view of what many people call "MIB Mania," but stuffing not
> > just the TCP state machine state into a MIB but also the de facto
> > congestion avoidance and other state seems over the top even if you
> > don't believe the IETF is suffering from MIB Mania.
> 
> See RFC-2140. Also addresses the security implications.
> 
> The general idea is that they _might_ be useful for
> LAN-wide caching of TCP startup parameters. Not to mention 
> statistics (but I'm not sure MIB info on such statistics is
> safe, security-wise, in any case).

Your Informational RFC does not convince me.  For example, of what
earthly good is it to cache the MSS?  Are you proposing to remove the
MSS negotiation?  I don't mean to imply that you should not
experiment.  Maybe something will turn out to be useful, particularly
the RTT and perhaps the congestion windows.  (Why didn't you mention
congestion windows?)   Personally, I would write, deploy, and test code
first and write RFCs second, but that probably just shows my age and
decrepitude.

I disagree with the statement that RFC 2140 "addresses <<the>> security
implications."  I would agree that it addresses <<some of the>> obvious
security worries.


As long as no one expects the suggested MIB values to be widely
implemented, defining them is probably the politically correct,
if MIB Maniacal, course.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 14:24:30 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA12878 for tcp-impl-list; Fri, 20 Jun 1997 14:21:00 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA12864 for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 14:20:58 -0700
Received: from VNET.IBM.COM (vnet.ibm.com [204.146.168.194]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA09482
	for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 14:20:56 -0700
	env-from (kennethw@VNET.IBM.COM)
From: kennethw@VNET.IBM.COM
Message-Id: <199706202120.OAA09482@sgi.sgi.com>
Received: from RALVM12 by VNET.IBM.COM (IBM VM SMTP V2R3) with BSMTP id 0897;
   Fri, 20 Jun 97 17:14:52 EDT
Date: Fri, 20 Jun 97 17:11:15 EDT
To: tcp-impl@relay.engr.sgi.com
Subject: Re: Proposed TCP Group Extensions
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Vernon,

>> The second perspective on the issue is exactly how useful this
>> information would be to an entity attempting to steal or misuse
>> a connection. Using SNMP to retrieve this information while
>> looking at the data flow in real-time to attempt to steal or
>> misuse a connection would be challenging for an active connection
>> since I suspect that the data retrieved via SNMP would not
>> correlate exactly to the current connection data flow.

>If the data are useful for the merely curious, then they are also
>useful for those with "practical" goals.

>More to the point, of what use are those values?  I've long had a
>jaundiced view of what many people call "MIB Mania," but stuffing not
>just the TCP state machine state into a MIB but also the de facto
>congestion avoidance and other state seems over the top even if you
>don't believe the IETF is suffering from MIB Mania.

I placed the TCP State information in an optional group exactly
because I was concerned with the cost and security issues with
respect to this. I don't believe that those objects in the
mandatory groups are as debatable, but I would appreciate some
feedback on them as well as on the optional group of TCP Connection
Table objects.

>Never mind the non-trivial costs of keeping all of that information
>where it can be gathered by your SNMP code.  Don't worry about the
>costs of gathering it.  Never mind the Heisenberg effects of gathering
>and reporting it (e.g. consider the effects on the RTT as all of those
>SNMP packets stack up in router queues).  What are you going to do when
>we implement more knobs and buttons?

I was thinking that the two groups that AUGMENT the TCP Connection
Table would be implemented in separate tables to make it easier
for a management application. I can visualize a management application
performing a GET-BULK of the entire mandatory group but not the
optional group. The optional group of objects is of use in
problem determination. You would retrieve a single row in this table
as opposed to the whole table. Perhaps some of the objects in the
optional group should move to the mandatory one, like tcpConnMaxSndWnd
for example.

As a MIB implementer, I find the cost of implementation is somewhat
invariant with respect to the number of objects. If you have access
to the TCP connection information, then doing 1 or 50 objects
is not that much more. I do agree that the modeling of the data
presented needs to be carefully thought out and that the use of the
data being presented needs to be debated.

>Sometimes it is necessary to determine details of the state of a system.
>However, I think that is best done other than with SNMP.  SNMP is not a
>replacement for all other debugging and remote-access tools.

I agree that SNMP is not the only tool that can or should be used.
It is, however, the management protocol of the IETF. Whether or not
SNMP is used to retrieve management information in particular
circumstances is, in my opinion, implementation dependent and doesn't
preclude modeling as MIB objects.

What is your opinion with respect to the objects in the mandatory
groups? Are there other objects that should be added? Are there
objects in the optional group that should move to a mandatory
group? Should the optional group be split into categories of objects?
Perhaps not all are as controversial.

My original posting of objects was not intended to present a
final list of objects and their groupings but to determine if there
is interest in defining sets of objects to extend the TCP Group.

Thanks, Ken

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 14:39:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA17536 for tcp-impl-list; Fri, 20 Jun 1997 14:35:44 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA17513 for <tcp-impl@relay.engr.SGI.COM>; Fri, 20 Jun 1997 14:35:40 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA14943
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 20 Jun 1997 14:35:39 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA06328>; Fri, 20 Jun 1997 14:31:50 -0700
Date: Fri, 20 Jun 1997 14:31:43 -0700
Posted-Date: Fri, 20 Jun 1997 14:31:43 -0700
Message-Id: <199706202131.OAA26597@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <OAA26597>; Fri, 20 Jun 1997 14:31:43 -0700
To: tcp-impl@relay.engr.SGI.COM, vjs@mica.denver.sgi.com
Subject: Re: Proposed TCP Group Extensions
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@relay.engr.SGI.COM Fri Jun 20 14:15:29 1997
> Date: Fri, 20 Jun 1997 15:10:13 -0600
> From: vjs@mica.denver.sgi.com (Vernon Schryver)
> To: tcp-impl@relay.engr.SGI.COM
> Subject: Re: Proposed TCP Group Extensions
> 
> > From: touch@ISI.EDU
> 
> > ...
> > > More to the point, of what use are those values?  I've long had a
> > > jaundiced view of what many people call "MIB Mania," but stuffing not
> > > just the TCP state machine state into a MIB but also the de facto
> > > congestion avoidance and other state seems over the top even if you
> > > don't believe the IETF is suffering from MIB Mania.
> > 
> > See RFC-2140. Also addresses the security implications.
> > 
> > The general idea is that they _might_ be useful for
> > LAN-wide caching of TCP startup parameters. Not to mention 
> > statistics (but I'm not sure MIB info on such statistics is
> > safe, security-wise, in any case).
> 
> Your Informational RFC does not convince me.  For example, of what
> earthly good is it to cache the MSS?  Are you proposing to remove the
> MSS negotiation?  I don't mean to imply that you should not

MSS isn't the interesting case. Window size, RTT, and variance thereof
are much more useful, as mentioned in the RFC.

If my host spends 10 RTTs converging on a good set of these values
to connect to a host overseas, it might be useful if another host
in the LAN can re-use my values, rather than using (conservative, but)
wrong values.
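
A minimal sketch of that kind of reuse, assuming a hypothetical cache
shared by hosts on the LAN. The class and function names, the
10-minute expiry, and the 3-second fallback are illustrative
assumptions, not something the RFC specifies:

```python
import time
from dataclasses import dataclass


@dataclass
class PathParams:
    srtt_ms: float     # smoothed round-trip time learned by some host
    rttvar_ms: float   # RTT variance estimate
    cwnd_bytes: int    # congestion window that host converged on
    updated: float     # when the entry was last refreshed (seconds)


class SharedParamCache:
    """Hypothetical LAN-wide cache of TCP startup parameters, keyed by remote host."""

    def __init__(self, max_age_s=600.0):
        self.max_age_s = max_age_s   # assumed expiry; stale paths are ignored
        self._entries = {}

    def publish(self, remote, srtt_ms, rttvar_ms, cwnd_bytes, now=None):
        now = time.time() if now is None else now
        self._entries[remote] = PathParams(srtt_ms, rttvar_ms, cwnd_bytes, now)

    def lookup(self, remote, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(remote)
        if entry is None or now - entry.updated > self.max_age_s:
            return None              # unknown or too old to trust
        return entry


DEFAULT_SRTT_MS = 3000.0  # assumed conservative starting value


def initial_rtt_ms(cache, remote, now=None):
    """Start from a neighbour's converged RTT if one is cached, else the default."""
    entry = cache.lookup(remote, now)
    return entry.srtt_ms if entry else DEFAULT_SRTT_MS
```

The expiry is the interesting design choice here: how long another
host's values stay trustworthy is exactly the sharing-algorithm
question, and this sketch does not try to answer it.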

> experiment.  Maybe something will turn out to be useful, particularly
> the RTT and perhaps the congestion windows.  (Why didn't you mention
> congestion windows?)   Personally, I would write, deploy, and test code

I did (page 1, see list).

Sharing code has already been implemented in T/TCP, as also indicated
in the RFC. The issue is what algorithm should be used to govern the
sharing. Sharing (or not) is an implementation detail that this RFC
raises as having protocol implications. Another implementation is
exactly what will not help resolve that issue.

Joe

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 14:59:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA24549 for tcp-impl-list; Fri, 20 Jun 1997 14:56:19 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA24541 for <tcp-impl@relay.engr.SGI.COM>; Fri, 20 Jun 1997 14:56:16 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA21918
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 20 Jun 1997 14:56:14 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id WAA23258; Fri, 20 Jun 1997 22:53:23 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wfBqg-0005FdC; Fri, 20 Jun 97 23:08 BST
Message-Id: <m0wfBqg-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Proposed TCP Group Extensions
To: perry@piermont.com
Date: Fri, 20 Jun 1997 23:08:06 +0100 (BST)
Cc: kennethw@vnet.ibm.com, tcp-impl@relay.engr.SGI.COM, andyc@vnet.ibm.com
In-Reply-To: <199706202023.QAA15828@jekyll.piermont.com> from "Perry E. Metzger" at Jun 20, 97 04:23:42 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I'm an active participant in the security area. As one of the people
> who has his name on some of the IPSEC RFCs, let me say that encryption
> isn't going to be ubiquitous any time soon.

This is becoming more and more obvious. Also, IPSEC still doesn't address
the ICMP MUST FRAGMENT TCP denial attacks.


From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 15:54:27 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA07746 for tcp-impl-list; Fri, 20 Jun 1997 15:52:09 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA07725 for <tcp-impl@relay.engr.SGI.COM>; Fri, 20 Jun 1997 15:52:02 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA08745
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 20 Jun 1997 15:51:46 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id XAA23676; Fri, 20 Jun 1997 23:45:19 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wfC2r-0005FdC; Fri, 20 Jun 97 23:20 BST
Message-Id: <m0wfC2r-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Proposed TCP Group Extensions
To: kennethw@VNET.IBM.COM
Date: Fri, 20 Jun 1997 23:20:41 +0100 (BST)
Cc: tcp-impl@relay.engr.SGI.COM, andyc@VNET.IBM.COM
In-Reply-To: <199706201810.LAA14749@sgi.sgi.com> from "kennethw@VNET.IBM.COM" at Jun 20, 97 01:58:31 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> tcpRestrictLowPorts  INTEGER { false(0), true(1) }   DEFVAL { false }
>    "Indicates if TCP low ports are restricted to
>     authorized servers/socket applications."

Some systems already let you set bitmasks of ports. So this doesn't work.

> tcpConnBytesIn  Integer32
>   "The number of bytes received from IP for this
>    connection."

We wrap int32's of bytes on a connection in minutes over a fast network

> tcpConnRoundTripTime   Integer32
>   "The amount of time that has elapsed, measured in
>    milliseconds, from when the last TCP segment was
>    transmitted by the TCP Stack until the ACK was
>    received."

But that isn't the round trip time the stack keeps.

> Is there an interest in this type of submission?

Several of your other values blindly assume VJ flow control. This may be
true now but not I suspect in the future.


From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 15:59:39 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA09284 for tcp-impl-list; Fri, 20 Jun 1997 15:57:42 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA09250 for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 15:57:32 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA10721
	for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 15:57:31 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id SAA16366; Fri, 20 Jun 1997 18:53:13 -0400 (EDT)
Message-Id: <199706202253.SAA16366@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: tcp-impl@relay.engr.sgi.com
Subject: Re: Proposed TCP Group Extensions 
In-reply-to: Your message of "Fri, 20 Jun 1997 23:08:06 BST."
             <m0wfBqg-0005FdC@lightning.swansea.linux.org.uk> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Fri, 20 Jun 1997 18:53:07 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Alan Cox writes:
> > I'm an active participant in the security area. As one of the people
> > who has his name on some of the IPSEC RFCs, let me say that encryption
> > isn't going to be ubiquitous any time soon.
> 
> This is becoming more and more obvious. Also, IPSEC still doesn't address
> the ICMP MUST FRAGMENT TCP denial attacks.

What attacks would those be?

Perry

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 16:37:26 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA17886 for tcp-impl-list; Fri, 20 Jun 1997 16:34:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA17859 for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 16:34:09 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA20705
	for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 16:34:07 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id TAA16496; Fri, 20 Jun 1997 19:29:46 -0400 (EDT)
Message-Id: <199706202329.TAA16496@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: kennethw@vnet.ibm.com, tcp-impl@relay.engr.sgi.com, andyc@vnet.ibm.com
Subject: Re: Proposed TCP Group Extensions 
In-reply-to: Your message of "Fri, 20 Jun 1997 23:20:41 BST."
             <m0wfC2r-0005FdC@lightning.swansea.linux.org.uk> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Fri, 20 Jun 1997 19:29:46 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Alan Cox writes:
> > tcpRestrictLowPorts  INTEGER { false(0), true(1) }   DEFVAL { false }
> >    "Indicates if TCP low ports are restricted to
> >     authorized servers/socket applications."
> 
> Some systems already let you set bitmasks of ports. So this doesn't work.

Also, this is valuable information to an attacker.

> > tcpConnBytesIn  Integer32
> >   "The number of bytes received from IP for this
> >    connection."
> 
> We wrap int32's of bytes on a connection in minutes over a fast network

BTW, this is also useful to an attacker. It lets you estimate what the
sequence number should be for the connection even if you don't get it
explicitly.
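
The arithmetic behind that estimate is short. A sketch, assuming the
peer's initial sequence number is known or guessable, that the byte
count covers only in-order payload with no retransmitted duplicates,
and that no FIN has arrived yet (the function name is hypothetical):

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap


def expected_rcv_nxt(initial_rcv_seq, bytes_in):
    """Estimate the next sequence number the receiver expects.

    The SYN consumes one sequence number, then each payload byte
    consumes one more; everything is modulo 2**32.
    """
    return (initial_rcv_seq + 1 + bytes_in) % SEQ_MOD
```

Because the result is taken modulo 2**32, a byte counter that has
itself wrapped still yields the same estimate.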

> Several of your other values blindly assume VJ flow control. This may be
> true now but not I suspect in the future.

I would be very shocked to discover something other than VJ congestion
control showing up in the foreseeable future, though I won't say
'never'.

Perry

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 17:30:39 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA29382 for tcp-impl-list; Fri, 20 Jun 1997 17:28:38 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA29362 for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 17:28:35 -0700
Received: from VNET.IBM.COM (vnet.ibm.com [204.146.168.194]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id RAA01852
	for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 17:28:31 -0700
	env-from (kennethw@VNET.IBM.COM)
From: kennethw@VNET.IBM.COM
Message-Id: <199706210028.RAA01852@sgi.sgi.com>
Received: from RALVM12 by VNET.IBM.COM (IBM VM SMTP V2R3) with BSMTP id 3886;
   Fri, 20 Jun 97 20:22:28 EDT
Date: Fri, 20 Jun 97 20:16:21 EDT
To: tcp-impl@relay.engr.sgi.com
cc: alan@lxorguk.ukuu.org.uk, andyc@VNET.IBM.COM
Subject: Re: Proposed TCP Group Extensions
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Alan,
   Thanks for your feedback.

>> tcpRestrictLowPorts  INTEGER { false(0), true(1) }   DEFVAL { false }
>>    "Indicates if TCP low ports are restricted to
>>     authorized servers/socket applications."

>Some systems already let you set bitmasks of ports. So this doesn't work

This object was intended to indicate whether the low ports are
restricted or not. Some systems restrict low port usage to
authorized applications. We allow the customer to remove this
restriction. My viewpoint on this was that the concept of
low port restriction is a common concept. In our implementation we also
have a PORT Table to specify which applications are allowed access
to which ports. I wasn't sure about representing this in a common
way. I assumed that it was more common for implementations to
either restrict low port usage or not, and that specification of
which apps are allocated to which ports would be too implementation
dependent.

>> tcpConnBytesIn  Integer32
>>   "The number of bytes received from IP for this
>>    connection."

>We wrap int32's of bytes on a connection in minutes over a fast network

Good point. First, both tcpConnBytesIn and tcpConnBytesOut should be
Counter32, not Integer32. Second, we could do something similar to
what the new ifMib has done and define two new objects:

   tcpConnHCBytesIn   Counter64
   tcpConnHCBytesOut  Counter64

The ifMib states:

   "For interfaces that operate at 20,000,000 (20 million) bits per
    second or less, 32-bit byte and packet counters MUST be used.  For
    interfaces that operate faster than 20,000,000 bits/second, and
    slower than 650,000,000 bits/second, 32-bit packet counters MUST
    be used and 64-bit octet counters MUST be used.  For interfaces
    that operate at 650,000,000 bits/second or faster, 64-bit packet
    counters AND 64-bit octet counters MUST be used."

TCP doesn't have an equivalent to a fixed MTU and a packet count
so we could just state that implementations that are designed
to handle connections transferring at 20,000,000 bits per second or
less would be required to only implement the 32 bit counters.
Implementations that support connections that can handle data transfer
rates faster than 20,000,000 bits/second must implement the 64-bit
counters as well. I'm not a fan of Counter64 objects. I've done them
for ATM Interfaces. The only benefit you get is that once you
implement the 64 bit counters then the 32 bit ones are "free" since
they are just the lower 32 bits of the 64-bit ones. Quoting from the
ifMib, the expected wrap conditions are:

"(1) The cost of maintaining 64-bit counters is relatively high,
     so minimizing the number of agents which must support them is
     desirable.  Common interfaces (such as 10Mbs Ethernet) should
     not require them.
(2)  64-bit counters are a new feature, introduced in SNMPv2.  It
     is reasonable to expect that support for them will be spotty
     for the immediate future.  Thus, we wish to limit them to as
     few systems as possible.  This, in effect, means that 64-bit
     counters should be limited to higher speed interfaces.
     Ethernet (10,000,000 bps) and Token Ring (16,000,000 bps) are
     fairly wide-spread so it seems reasonable to not require 64-
     bit counters for these interfaces.
(3)  The 32-bit octet counters will wrap in the following times,
     for the following interfaces (when transmitting maximum-sized
     packets back-to-back):
     -   10Mbs Ethernet: 57 minutes,
     -   16Mbs Token Ring: 36 minutes,
     -   a US T3 line (45 megabits): 12 minutes,
     -   FDDI: 5.7 minutes
(4)  The 32-bit packet counters wrap in about 57 minutes when 64-
     byte packets are transmitted back-to-back on a 650,000,000
     bit/second link.
As an aside, a 1-terabit/second (1,000 Gbs) link will cause a 64
bit octet counter to wrap in just under 5 years.  Conversely, an
81,000,000 terabit/second link is required to cause a 64-bit
counter to wrap in 30 minutes.  We believe that, while technology
rapidly marches forward, this link speed will not be achieved for
at least several years, leaving sufficient time to evaluate the
introduction of 96 bit counters."
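
As a rough check of the figures quoted above, the wrap time of an octet
counter is just its range divided by the octet rate. The sketch below
assumes the link runs at its full nominal rate and ignores framing
overhead; the function name and the nominal link rates (for example
100,000,000 bits/second for FDDI) are illustrative and not taken from
the ifMib.

```python
def wrap_seconds(bits_per_second, counter_bits=32):
    """Seconds until an octet counter of the given width wraps at full rate."""
    octets_per_second = bits_per_second / 8
    return 2 ** counter_bits / octets_per_second


# Nominal link rates in bits per second (illustrative assumptions).
LINKS = {
    "10Mbs Ethernet": 10_000_000,
    "16Mbs Token Ring": 16_000_000,
    "US T3": 45_000_000,
    "FDDI": 100_000_000,
}

for name, rate in LINKS.items():
    print(f"{name}: {wrap_seconds(rate) / 60:.1f} minutes")
```

The same function with counter_bits=64 and a rate of 10**12 bits/second
gives the "just under 5 years" figure for a 1-terabit/second link.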

Most of the TCP Connections in my environment won't ever wrap
a Counter32 object. Most are Telnet sessions transferring
relatively small amounts of data. Even the FTP sessions that I've
seen don't get near to a wrap condition. In general wrapping
32 bit counters is less likely at the TCP layer than at either the
interface or the IP layers.
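
For illustration, here is a minimal sketch of the two counter
relationships mentioned above: deriving the 32-bit value from the
64-bit one, and computing a delta between two polls of a Counter32
that may have wrapped once. The function names are hypothetical, and
the delta is only meaningful if the counter wrapped at most once
between polls and no counter discontinuity occurred.

```python
COUNTER32_MODULUS = 2 ** 32


def counter32_from_hc(hc_value):
    """Return the low-order 32 bits of a 64-bit (HC) counter value."""
    return hc_value & 0xFFFFFFFF


def counter32_delta(previous, current):
    """Octets counted between two polls, allowing for a single wrap."""
    return (current - previous) % COUNTER32_MODULUS
```

A manager polling a fast connection would need to poll more often than
the wrap time for this delta to be trustworthy.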

We may also want to consider adding an object to indicate when a
counter discontinuity occurs, similar to ifCounterDiscontinuityTime:

  tcpConnCounterDiscontinuityTime   TimeStamp

The question should be raised so I will raise it myself as to why
keep track of byteIn and byteOut counts to begin with. I see two
primary reasons:

   1. Accounting - Most customers that I have talked with want
        an accounting record of the amount of data transferred on
        a TCP Connection basis. Agreement on a common set of objects
        that can be kept on a connection basis does get us closer
        to enabling the development of a standard accounting
        application or at least identifying the data that should
        be kept on a connection basis that other methods than SNMP
        are used to collect and store. To really perform accounting
        via SNMP would require definition of several more objects
        depending on the model implemented. I view TCP accounting
        itself to be out of scope for consideration by this working
        group but can see the need for a standard set of TCP
        Connection statistics.
   2. Problem determination - seeing some indication of the amount of
        data being transferred does help in problem determination.

Another object that we keep that is somewhat implementation
dependent is an identification of the local application that the
connection is going to.  This is also very useful
in identifying FTP versus Telnet connections, for example. Such
an object, if it exists, could be as follows:

    tcpConnLocalReceiverId  OCTET STRING(0..??)

An implementation that doesn't know this would just not implement the
object or if they were kind return a null OCTET STRING.

>> tcpConnRoundTripTime   Integer32
>>   "The amount of time that has elapsed, measured in
>>    milliseconds, from when the last TCP segment was
>>    transmitted by the TCP Stack until the ACK was
>>    received."

>But that isnt the round trip time the stack keeps

Why not? Is there a better metric that should be represented?

>> Is there an interest in this type of submission?

>Several of your other values blindly assume VJ flow control. This may be
>true now but not I suspect in the future.

I'm only attempting to address what exists now, not to plan for
potential changes.
Thanks again for your comments.

Ken

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 18:43:14 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA11505 for tcp-impl-list; Fri, 20 Jun 1997 18:39:29 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA11379 for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 18:39:15 -0700
Received: from VNET.IBM.COM (vnet.ibm.com [204.146.168.194]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id SAA17694
	for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 18:39:13 -0700
	env-from (kennethw@VNET.IBM.COM)
From: kennethw@VNET.IBM.COM
Message-Id: <199706210139.SAA17694@sgi.sgi.com>
Received: from RALVM12 by VNET.IBM.COM (IBM VM SMTP V2R3) with BSMTP id 4401;
   Fri, 20 Jun 97 21:33:09 EDT
Date: Fri, 20 Jun 97 21:28:10 EDT
To: tcp-impl@relay.engr.sgi.com
cc: andyc@VNET.IBM.COM, perry@piermont.com
Subject: Re: Proposed TCP Group Extensions
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Date: Fri, 20 Jun 1997 19:29:46 -0400

Perry,
   Your main objection to the proposal to me, and please let me
know if I'm interpreting your notes wrong, appears to be that
some of the objects defined in the proposal make it easier for
an attacker to gain access to TCP connections. I think I've
attempted to explain that though security needs to be addressed
that it shouldn't be the deciding factor in whether the objects
exists or not. The security issue with respect to SNMP should
not be the deciding factor. I do agree that it does affect how the
objects are modeled and the platforms where they will be implemented.
  A number of platforms have implemented their own secure SET support
using SNMPv1 or SNMPv2 community strings while others have implemented
SNMPv2u or SNMPv2*. Unfortunately, these solutions don't interoperate
in general nor are they universally deployed. It looks like the SNMPv3
working group is still on track to defining a standard in this
area by the end of this year.
  An extension to the TCP MIB needs to provide guidance with respect
to whether SETs should be allowed or even if a certain set of objects
should be retrievable but leave it up to its implementer to decide for
their environment what is appropriate. The TCP/IP product that I work
on didn't implement SET support for those objects defined by RFC1213
until we could provide some way of ensuring that the sets were secure.
This didn't prevent for example MIB-2 defining tcpConnState as a
read-write object nor all of the work being done in various working
groups to enable remote creation and management. All of which cause
large security problems if blindly implemented.

Regards, Ken

From owner-tcp-impl@relay.engr.sgi.com  Fri Jun 20 19:36:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA18989 for tcp-impl-list; Fri, 20 Jun 1997 19:32:37 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA18979 for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 19:32:30 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA01221
	for <tcp-impl@relay.engr.sgi.com>; Fri, 20 Jun 1997 19:32:27 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id WAA16735; Fri, 20 Jun 1997 22:28:37 -0400 (EDT)
Message-Id: <199706210228.WAA16735@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: kennethw@vnet.ibm.com
cc: tcp-impl@relay.engr.sgi.com, andyc@vnet.ibm.com
Subject: Re: Proposed TCP Group Extensions 
In-reply-to: Your message of "Fri, 20 Jun 1997 21:28:10 EDT."
             <199706210133.VAA16345@frankenstein.piermont.com> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Fri, 20 Jun 1997 22:28:33 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


kennethw@VNET.IBM.COM writes:
>    Your main objection to the proposal to me, and please let me
> know if I'm interpreting your notes wrong, appears to be that
> some of the objects defined in the proposal make it easier for
> an attacker to gain access to TCP connections.

No, that's not my only objection, but that's certainly _an_ objection.

> I think I've attempted to explain that though security needs to be
> addressed that it shouldn't be the deciding factor in whether the
> objects exists or not.

Dunno. Given the security considerations, and the fact that it isn't
entirely obvious why one wants these pieces of data exposed, I wonder
quite a bit. The latter, btw, is something I'm quite serious about: I
see no particularly good legitimate reason for exposing many of these
bits of information other than "gee whiz" factors.

Perry

From owner-tcp-impl@relay.engr.sgi.com  Sat Jun 21 06:59:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA00596 for tcp-impl-list; Sat, 21 Jun 1997 06:57:42 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA00591 for <tcp-impl@relay.engr.SGI.COM>; Sat, 21 Jun 1997 06:57:39 -0700
Received: from merit.edu (merit.edu [198.108.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id GAA16127
	for <tcp-impl@relay.engr.SGI.COM>; Sat, 21 Jun 1997 06:57:37 -0700
	env-from (wsimpson@greendragon.com)
Received: from Bill.Simpson.DialUp.Mich.Net (pm035-25.dialip.mich.net [141.211.7.36])
	by merit.edu (8.8.5/8.8.5) with SMTP id JAA08505
	for <tcp-impl@relay.engr.SGI.COM>; Sat, 21 Jun 1997 09:53:46 -0400 (EDT)
Date: Sat, 21 Jun 97 02:07:27 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <6066.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Proposed TCP Group Extensions
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: kennethw@VNET.IBM.COM
> Andy Capella and would like to submit to this working group an
> internet-draft defining groups of objects to extend those
> objects defined by RFC2012.

Wrong Working Group.


> We would prefer that this is done via a standard MIB as oppose
> to Enterprise Specific MIB definition to enable interoperable Management
> Application support. The following are the groups of objects
> that we would define by function:
>...
> Is there an interest in this type of submission?
>
Count me as strongly opposed.  One of the objectives of the TCP MIB was
to narrow the list to be as small as possible.

Recent messages to this list talked about keeping small TCBs for Syn
attacks.  Have you counted the number of bytes you just proposed adding?

The general rule for MIBs is:

    Never add a variable unless more than 50% of the management stations
    will need to view it on a regular basis!

Nota Bene: It is not SNMonitoringP.  SNMP is about MANAGING networks,
not debugging platforms.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Sat Jun 21 08:11:05 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA04705 for tcp-impl-list; Sat, 21 Jun 1997 08:09:25 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA04700 for <tcp-impl@relay.engr.SGI.COM>; Sat, 21 Jun 1997 08:09:22 -0700
Received: from VNET.IBM.COM (vnet.ibm.com [204.146.168.194]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id IAA23527
	for <tcp-impl@relay.engr.SGI.COM>; Sat, 21 Jun 1997 08:09:22 -0700
	env-from (kennethw@VNET.IBM.COM)
From: kennethw@VNET.IBM.COM
Message-Id: <199706211509.IAA23527@sgi.sgi.com>
Received: from RALVM12 by VNET.IBM.COM (IBM VM SMTP V2R3) with BSMTP id 7367;
   Sat, 21 Jun 97 11:03:19 EDT
Date: Sat, 21 Jun 97 10:33:02 EDT
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Proposed TCP Group Extensions
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

William,

>> From: kennethw@VNET.IBM.COM
>> Andy Capella and would like to submit to this working group an
>> internet-draft defining groups of objects to extend those
>> objects defined by RFC2012.

> Wrong Working Group.

>> We would prefer that this is done via a standard MIB as oppose
>> to Enterprise Specific MIB definition to enable interoperable Management
>> Application support. The following are the groups of objects
>> that we would define by function:
>>...
>> Is there an interest in this type of submission?
>>
>Count me as strongly opposed.  One of the objectives of the TCP MIB was
>to narrow the list to be as small as possible.

The TCP-MIB in RFC2012 has 19 objects where the first four:

     tcpRtoAlgorithm
     tcpRtoMin
     tcpRtoMax
     tcpMaxConn

all allow an entity to see the configuration of a TCP. There are also
10 global counters. All 14 of these objects are simple objects.
The first part of the proposal consists of adding 3 to 4 simple objects

   tcpKeepAliveTimer
   tcpReceiveBufferSize
   tcpSendBufferSize
   tcpRestrictLowPorts

(tcpRestrictLowPorts may not be needed) to enhance configuration. I also
believe that all of the configuration objects should be defined as
read-write capable and not read-only to enable remote configuration. The
amount of storage that this adds isn't an issue.

>Recent messages to this list talked about keeping small TCBs for Syn
>attacks. Have you counted the number of bytes you just proposed adding?

The TCP Connection Table contains 5 objects of which the first four
are its indexes. The second part of the proposal attempted to
define a small set of objects that could be added:

   tcpConnLastActivity
   tcpConnBytesIn
   tcpConnBytesOut
   tcpConnReXmt
   tcpConnRoundTripTime
   tcpConnRoundTripVariance

I claim that, except for BytesIn and BytesOut, TCP needs
to keep these values anyway, so that the actual storage increase is
8 OCTETs per connection, which isn't that significant compared to
the total.

>The general rule for MIBs is:

> Never add a variable unless more than 50% of the management stations
>   will need to view it on a regular basis!

I don't think that you mean this as an absolute, since almost none
of the transport MIB work (AToMMIB for example) would be useful,
since I doubt that 50% or more of the current Management platforms
are doing ATM Management. I think that this needs to be considered
from the point of view of the function being managed.

Adding a few objects to enhance configuration and adding a handful
of objects to the TCP Connection table is the core of the proposal.
I was never adamant that the TCP state information be included which
was why they were listed as optional. I had thought that possibly
a few of them should be included with the non-optional group
and that by listing them all I could get that type of feedback.

Given that the TCP Connection Table exists and that it does allow
a SET to tcpConnState the TCP Group also adds the capability to
manage connections.

>Nota Bene: It is not SNMonitoringP.  SNMP is about MANAGING networks,
>not debugging platforms.

Regards, Ken

From owner-tcp-impl@relay.engr.sgi.com  Sat Jun 21 09:52:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA11272 for tcp-impl-list; Sat, 21 Jun 1997 09:51:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA11256 for <tcp-impl@relay.engr.SGI.COM>; Sat, 21 Jun 1997 09:51:08 -0700
Received: from merit.edu (merit.edu [198.108.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA10411
	for <tcp-impl@relay.engr.SGI.COM>; Sat, 21 Jun 1997 09:51:07 -0700
	env-from (wsimpson@greendragon.com)
Received: from Bill.Simpson.DialUp.Mich.Net (pm012-20.dialip.mich.net [141.211.7.188])
	by merit.edu (8.8.5/8.8.5) with SMTP id MAA09868
	for <tcp-impl@relay.engr.SGI.COM>; Sat, 21 Jun 1997 12:47:20 -0400 (EDT)
Date: Sat, 21 Jun 97 16:18:09 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <6068.wsimpson@greendragon.com>
To: tcp-impl@relay.engr.SGI.COM
Subject: Re: Proposed TCP Group Extensions
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: kennethw@VNET.IBM.COM
> >Count me as strongly opposed.  One of the objectives of the TCP MIB was
> >to narrow the list to be as small as possible.
>
> The TCP-MIB in RFC2012 has 19 objects where the first four:
>
>      tcpRtoAlgorthim
>      tcpRtoMin
>      tcpRtoMax
>      tcpMaxConn
>
Yes.  Your point?  Note that these are all values needed for MANAGING
networks!  They tell the poor swamped network manager how and why the
box is sending particular _rates_ of packets on the link.


> all allow an entity to see the configuration of a TCP. There are also
> 10 global counters. All 14 of these objects are simple objects.
> The first part of the proposal consists of add 3 to 4 simple objects
>
>    tcpKeepAliveTimer
>    tcpReceiveBufferSize
>    tcpSendBufferSize
>    tcpRestrictLowPorts
>
How do _any_ of these affect packets sent on the link?

Maybe keepalivetimer, but it cannot be mandatory, since there is a
strong aversion to keepalive in this community!


> (tcpRestrictLowPorts may not be needed) to enhance configuration. I also
> believe that all of the configuration objects should be defined as
> read-write capable and not read-only to enable remote configuration. The
> amount of storage that this adds isn't an issue.
>
You think RTO algorithm is a remotely manageable quantity?

You think buffersizes are a remotely manageable quantity, on a global
connection basis?

You think restricting low ports should be remotely managed?

And Perry thought there were security holes.  He ain't seen nothin' yet.

Folks, it is my personal opinion that there is some serious confusion
here as to the purpose of SNMP, and I cannot see any purpose to
continuing this discussion.  Please have a heart to heart with the O&M
Area Directors.


> >The general rule for MIBs is:
> >
> > Never add a variable unless more than 50% of the management stations
> >   will need to view it on a regular basis!
>
> I don't think that you mean this as an absolute since almost none
> of the transport MIB work (AToMMIB for example) work would be useful
> since I doubt that 50% or more of the current Management platforms
> are doing ATM Management. I think that this needs to be considered
> from the function being managed.
>
There's a good idea.  If you are using ATM for transport, just put this
stuff in the already bloated ATM MIB.  You don't need TCP at all in that
case....

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Sat Jun 21 18:19:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA14174 for tcp-impl-list; Sat, 21 Jun 1997 18:15:40 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA14166 for <tcp-impl@relay.engr.sgi.com>; Sat, 21 Jun 1997 18:15:37 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA12438
	for <tcp-impl@relay.engr.sgi.com>; Sat, 21 Jun 1997 18:15:36 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id CAA00302; Sun, 22 Jun 1997 02:13:52 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wfQoe-0005FhC; Sat, 21 Jun 97 15:07 BST
Message-Id: <m0wfQoe-0005FhC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Proposed TCP Group Extensions
To: kennethw@VNET.IBM.COM
Date: Sat, 21 Jun 1997 15:07:00 +0100 (BST)
Cc: tcp-impl@relay.engr.sgi.com, alan@lxorguk.ukuu.org.uk, andyc@VNET.IBM.COM
In-Reply-To: <199706210022.BAA24669@snowcrash.cymru.net> from "kennethw@VNET.IBM.COM" at Jun 20, 97 08:16:21 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>      -   10Mbs Ethernet: 57 minutes,
>      -   16Mbs Token Ring: 36 minutes,
>      -   a US T3 line (45 megabits): 12 minutes,
>      -   FDDI: 5.7 minutes

1Gbit ethernet:  under 1 minute... 

> Most of the TCP Connections in my environment won't ever wrap
> a Counter32 object. Most are Telnet sessions transferring

An SNMP MIB should work for all environments or be vendor private...

> >> tcpConnRoundTripTime   Integer32
> >>   "The amount of time that has elapsed, measured in
> >>    milliseconds, from when the last TCP segment was
> >>    transmitted by the TCP Stack until the ACK was
> >>    received."
> 
> >But that isnt the round trip time the stack keeps
> Why not? Is there a better metric that should be represented?

TCP doesn't know the round trip time from when the last TCP segment was
transmitted to the ACK for many cases - if the last segment was retransmitted
and then an ACK occurred, we never know if the ACK is for the original or
for the retransmission.

If you assume Van Jacobson then you can return the smoothed round trip
time estimate - that's probably the useful bit for graphing network
behaviour and flow.
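
A minimal sketch of the kind of smoothed estimate meant here, assuming
a Van Jacobson style estimator with the usual 1/8 and 1/4 gains and
with samples from retransmitted segments discarded because the ACK is
ambiguous (Karn's rule). The class and method names are hypothetical,
and seeding the variance with half of the first sample is an assumption
of this sketch, not something stated in this thread.

```python
class RttEstimator:
    """Smoothed RTT and mean deviation, in the style of Van Jacobson."""

    def __init__(self):
        self.srtt = None    # smoothed round trip time (ms)
        self.rttvar = None  # smoothed mean deviation (ms)

    def sample(self, rtt_ms, retransmitted=False):
        """Feed one measured RTT; ambiguous samples are ignored."""
        if retransmitted:
            # The ACK may be for the original or the retransmission,
            # so the measurement cannot be trusted.
            return
        if self.srtt is None:
            # First valid sample (assumed initialisation).
            self.srtt = float(rtt_ms)
            self.rttvar = rtt_ms / 2
            return
        err = rtt_ms - self.srtt
        self.srtt += err / 8
        self.rttvar += (abs(err) - self.rttvar) / 4

    def rto(self):
        """Retransmit timeout derived from the estimate, or None."""
        if self.srtt is None:
            return None
        return self.srtt + 4 * self.rttvar
```

A MIB object could then report srtt (and perhaps rttvar) rather than
the time for the last segment, which the stack often cannot know.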

> >Several of your other values blindly assume VJ flow control. This may be
> >true now but not I suspect in the future.
> I'm only attempting to address what is not plan for potential changes.

Ok perhaps a "flow control type" enumeration of

	0	Unknown
	1	Van Jacobson
	2	Vegas


would be useful ?
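
For illustration only, the enumeration above could be modelled like
this. The class name is hypothetical, the numeric values are the ones
listed above, and nothing here is an agreed MIB definition.

```python
from enum import IntEnum


class FlowControlType(IntEnum):
    """Hypothetical 'flow control type' values as proposed above."""
    UNKNOWN = 0
    VAN_JACOBSON = 1
    VEGAS = 2


def describe(value):
    """Map a reported integer to a name; unlisted values become UNKNOWN."""
    try:
        return FlowControlType(value).name
    except ValueError:
        return FlowControlType.UNKNOWN.name
```

Treating unlisted values as unknown lets a management station keep
working if new algorithms are added to the enumeration later.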


From owner-tcp-impl@relay.engr.sgi.com  Sat Jun 21 19:36:36 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA19537 for tcp-impl-list; Sat, 21 Jun 1997 19:34:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA19520 for <tcp-impl@relay.engr.sgi.com>; Sat, 21 Jun 1997 19:34:08 -0700
Received: from jekyll.piermont.com (jekyll.piermont.com [206.1.51.15]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA19029
	for <tcp-impl@relay.engr.sgi.com>; Sat, 21 Jun 1997 19:34:01 -0700
	env-from (perry@jekyll.piermont.com)
Received: from [[UNIX: localhost]] ([[UNIX: localhost]]) by jekyll.piermont.com (8.8.5/8.6.12) with SMTP id WAA28616; Sat, 21 Jun 1997 22:24:53 -0400 (EDT)
Message-Id: <199706220224.WAA28616@jekyll.piermont.com>
X-Authentication-Warning: jekyll.piermont.com: [[UNIX: localhost]] didn't use HELO protocol
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: tcp-impl@relay.engr.sgi.com
Subject: Re: Proposed TCP Group Extensions 
In-reply-to: Your message of "Sat, 21 Jun 1997 15:07:00 BST."
             <m0wfQoe-0005FhC@lightning.swansea.linux.org.uk> 
Reply-To: perry@piermont.com
X-Reposting-Policy: redistribute only with permission
Date: Sat, 21 Jun 1997 22:24:47 -0400
From: "Perry E. Metzger" <perry@piermont.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Alan Cox writes:
> Ok perhaps a "flow control type" enumeration of
> 
> 	0	Unknown
> 	1	Van Jacobson
> 	2	Vegas

I think that Vegas is pretty much stillborn -- and if you are new
enough to implement a MIB you'd damn well better be implementing
Van J.

Perry

From owner-tcp-impl@relay.engr.sgi.com  Sun Jun 22 05:12:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA22166 for tcp-impl-list; Sun, 22 Jun 1997 05:10:09 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA22155 for <tcp-impl@relay.engr.sgi.com>; Sun, 22 Jun 1997 05:10:06 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id FAA04314
	for <tcp-impl@relay.engr.sgi.com>; Sun, 22 Jun 1997 05:10:04 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id NAA05696; Sun, 22 Jun 1997 13:08:44 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wfRPu-0005FnC; Sat, 21 Jun 97 15:45 BST
Message-Id: <m0wfRPu-0005FnC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Proposed TCP Group Extensions
To: perry@piermont.com
Date: Sat, 21 Jun 1997 15:45:29 +0100 (BST)
Cc: alan@lxorguk.ukuu.org.uk, tcp-impl@relay.engr.sgi.com
In-Reply-To: <199706202253.SAA16366@jekyll.piermont.com> from "Perry E. Metzger" at Jun 20, 97 06:53:07 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > This is becoming more and more obvious. Also IPSEC still doesnt address
> > the ICMP MUST FRAGMENT tcp denial attacks
> What attacks would those be?

Send an ICMP MUST FRAGMENT MTU=68 to someone running a secure session. Now
they can pick either

1.	Ignore it because it isn't signed - doesn't work because the packet
	may be real mtu lowering info

2.	Believe it and watch the performance crash

The problem is this is the one case where we have to trust a potentially
unsigned frame if we want to do TCP mtu discovery. If you drop MTU 
discovery from secure TCP sessions it seems ok

Alan


From owner-tcp-impl@relay.engr.sgi.com  Sun Jun 22 05:49:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA24014 for tcp-impl-list; Sun, 22 Jun 1997 05:46:43 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA24009 for <tcp-impl@relay.engr.sgi.com>; Sun, 22 Jun 1997 05:46:41 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id FAA06718
	for <tcp-impl@relay.engr.sgi.com>; Sun, 22 Jun 1997 05:45:58 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id NAA05980; Sun, 22 Jun 1997 13:44:55 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wfmFp-0005FdC; Sun, 22 Jun 97 14:00 BST
Message-Id: <m0wfmFp-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Proposed TCP Group Extensions
To: perry@piermont.com
Date: Sun, 22 Jun 1997 14:00:29 +0100 (BST)
Cc: alan@lxorguk.ukuu.org.uk, tcp-impl@relay.engr.sgi.com
In-Reply-To: <199706220224.WAA28616@jekyll.piermont.com> from "Perry E. Metzger" at Jun 21, 97 10:24:47 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > Ok perhaps a "flow control type" enumeration of
> > 	0	Unknown
> > 	1	Van Jacobson
> > 	2	Vegas
> 
> I think that Vegas is pretty much stillborn -- and if you are new
> enough to implement a MIB you'd damn well better be implementing
> Van J.

For now, but a MIB should also reflect future expansion. Who knows what
flow control we will be using in 5 years time. Most people use variants
of VJ, and stuff like SACK gives us further information that may in future
make us change from VJ.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Sun Jun 22 07:49:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA29588 for tcp-impl-list; Sun, 22 Jun 1997 07:46:45 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA29584 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 22 Jun 1997 07:46:37 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id IAA22738 for tcp-impl@cthulhu.engr.sgi.com; Sun, 22 Jun 1997 08:46:33 -0600
Date: Sun, 22 Jun 1997 08:46:33 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706221446.IAA22738@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Proposed TCP Group Extensions
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: alan@lxorguk.ukuu.org.uk (Alan Cox)

> > > This is becoming more and more obvious. Also IPSEC still doesnt address
> > > the ICMP MUST FRAGMENT tcp denial attacks
> > What attacks would those be?
> 
> Send an ICMP MUST FRAGMENT MTU=68 to someone running a secure session. Now
> they can pick either
> 
> 1.	Ignore it because it isnt signed - doesn't work because the packet
> 	may be real mtu lowering info
> 
> 2.	Believe it and watch the peformance crash
> 
> The problem is this is the one case where we have to trust a potentially
> unsigned frame if we want to do TCP mtu discovery. If you drop MTU 
> discovery from secure TCP sessions it seems ok


You omitted the most reasonable response:

 3. ignore it because 68 is so ridiculously tiny that it is obviously bogus
   and if the path MTU were that bad, you may as well close the connection.

"Be generous in what you accept" is not the same as "be stupid and do not
sanity-check anything."

Reasonable systems should drop any demands for an MTU less than 128 and
probably anything less than 256.  256 is wrong for modems made in the
last 5 years, but people are still using recommendations based on delay
computations based on 2400 bit/sec modems.

If your path really has a 9K (ATM) or 64K MTU (HIPPI), reducing it to
256 has at least as much effect as reducing a 1500 to 68.  Merely
reducing 4352 or 1500 to 256 is a Very Bad Thing for performance, so the
point stands, if rephrased with a non-bogus number, and somewhat weakly.
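
A minimal sketch of the sanity check argued for above, in Python.  The
128-byte floor is the more conservative of the two figures mentioned; the
function name and the idea of passing in the current path MTU are
assumptions made for illustration, not code from any real stack.

```python
# Hypothetical floor: ICMP "fragmentation needed" demands below this are ignored.
MIN_PLAUSIBLE_MTU = 128

def accept_pmtu_demand(current_mtu, demanded_mtu, floor=MIN_PLAUSIBLE_MTU):
    """Return the path MTU to use after an ICMP 'fragmentation needed' message."""
    if demanded_mtu < floor:
        return current_mtu   # implausibly small, treat the message as bogus
    if demanded_mtu >= current_mtu:
        return current_mtu   # such a message may only lower the estimate
    return demanded_mtu
```

With these rules a demand for MTU=68 leaves a 1500-byte estimate untouched,
while a demand for 576 is honored.  Raising the floor to 256 is a one-line
change, at the cost of breaking the slow links that really do use 128.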


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Sun Jun 22 08:46:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA03805 for tcp-impl-list; Sun, 22 Jun 1997 08:44:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA03794 for <tcp-impl@relay.engr.SGI.COM>; Sun, 22 Jun 1997 08:44:04 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA20732
	for <tcp-impl@relay.engr.SGI.COM>; Sun, 22 Jun 1997 08:43:55 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id QAA07632; Sun, 22 Jun 1997 16:42:04 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wfp0y-0005FdC; Sun, 22 Jun 97 16:57 BST
Message-Id: <m0wfp0y-0005FdC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Proposed TCP Group Extensions
To: vjs@mica.denver.sgi.com (Vernon Schryver)
Date: Sun, 22 Jun 1997 16:57:20 +0100 (BST)
Cc: tcp-impl@relay.engr.SGI.COM
In-Reply-To: <199706221446.IAA22738@mica.denver.sgi.com> from "Vernon Schryver" at Jun 22, 97 08:46:33 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Reasonable systems should drop any demands for an MTU less than 128 and
> probably anything less than 256.  256 is wrong for modems made in the
> last 5 years, but people are still using recommendations based on delay
> computations based on 2400 bit/sec modems.

People are still running 2400 baud modems, and things like AX.25
(AX.25 MTU is sometimes 128 on bad paths, 256 on fast).

I accept we can take the "under 128, assume wrong" case.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Sun Jun 22 14:17:17 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA25504 for tcp-impl-list; Sun, 22 Jun 1997 14:11:40 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA25499 for <tcp-impl@relay.engr.sgi.com>; Sun, 22 Jun 1997 14:11:37 -0700
Received: from lox.sandelman.ottawa.on.ca (lox.sandelman.ottawa.on.ca [205.233.54.146]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA20871
	for <tcp-impl@relay.engr.sgi.com>; Sun, 22 Jun 1997 14:11:35 -0700
	env-from (mcr@istari.sandelman.ottawa.on.ca)
Received: from istari.sandelman.ottawa.on.ca (istari.sandelman.ottawa.on.ca [205.233.54.136]) by lox.sandelman.ottawa.on.ca (8.7.5/8.7.3) with ESMTP id QAA28779; Sun, 22 Jun 1997 16:53:39 -0400 (EDT)
Received: from istari.sandelman.ottawa.on.ca ([[UNIX: localhost]]) by istari.sandelman.ottawa.on.ca (8.7.5/8.7.3) with ESMTP id RAA23811; Sun, 22 Jun 1997 17:22:38 -0400 (EDT)
Message-Id: <199706222122.RAA23811@istari.sandelman.ottawa.on.ca>
To: ipsec@tis.com, tcp-impl@relay.engr.sgi.com
CC: vjs@mica.denver.sgi.com (Vernon Schryver)
Reply-To: ipsec@tis.com
Subject: ICMP must fragment and IPsec
In-reply-to: Your message of "Sun, 22 Jun 1997 08:46:33 MDT."
             <199706221446.IAA22738@mica.denver.sgi.com> 
Date: Sun, 22 Jun 1997 17:22:33 -0400
From: "Michael C. Richardson" <mcr@sandelman.ottawa.on.ca>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

-----BEGIN PGP SIGNED MESSAGE-----


>>>>> "Alan" == Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

    Alan> This is becoming more and more obvious. Also IPSEC still
    Alan> doesnt address the ICMP MUST FRAGMENT tcp denial attacks

>>>>> "Perry" == Perry Metzger <perry@piermont.com> writes:
    Perry> What attacks would those be?

    Alan> Send an ICMP MUST FRAGMENT MTU=68 to someone running a secure
    Alan> session. Now they can pick either
    Alan> 
    Alan> 1. Ignore it because it isnt signed - doesn't work because the
    Alan> packet may be real mtu lowering info
    Alan> 
    Alan> 2. Believe it and watch the peformance crash
    Alan> 
    Alan> The problem is this is the one case where we have to trust a
    Alan> potentially unsigned frame if we want to do TCP mtu
    Alan> discovery. If you drop MTU discovery from secure TCP sessions
    Alan> it seems ok

>>>>> "Vernon" == Vernon Schryver <vjs@mica.denver.sgi.com> writes:
    Vernon> You omitteded the most reasonable response:

    Vernon>  3. ignore it because 68 is so ridiculously tiny that it
    Vernon> is obviously bogus and if the path MTU were that bad, you
    Vernon> may as well close the connection.

    Vernon> "Be generous in what you accept" is not the same "be
    Vernon> stupid and do not sanity-check anything."

    Vernon> Reasonable systems should drop any demands for an MTU less
    Vernon> than 128 and probably anything less than 256.  256 is

  Vernon's suggestions seem like rather good judgement, but some kind of
authenticated Path MTU discovery is still needed, in my opinion.

  One way might be to have an ICMP or TCP option that requests the
other end to provide a response, giving the size of the largest
fragment received. This would be enclosed in the SA that the TCP data
is flowing in. This is in some sense a variation of the TCP MSS option.

  A TCP option is probably harder to implement, and harder to deploy,
but has the advantage of not requiring another packet, and not
requiring any dances with ISAKMP per-protocol/port ("connection") SA
keying. However, we already have to deal with making sure that other
TCP related ICMPs get transported/tunnelled properly. [ICMP_UNREACH,
ICMP_SOURCEQUENCH, are there others?]

  An ICMP option has the advantages that it doesn't require TCP to
know anything about the fragmentation of the packets, which is just
gross. Still, ICMP then needs to know, but that might be easier for
some people to implement.
  
  Another possibility would be to have the receiver advise the sender
in a TCP option, whenever the max-size fragment received in the past
2MSL changes by more than 10%. This might be more effective than Path
MTU discovery in the cases where the route changes in a way that
affects the MTU, since Path MTU discovery, as far as I understand, is
done only at the beginning of the connection.
	[... not according to rfc1191. The DF bit is set on all
	packets, so PMTU is always done]
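
  A rough sketch of the receiver-side bookkeeping this would need, in
Python. The class and method names are hypothetical, and the 240 second
window assumes 2MSL with the nominal MSL of two minutes; nothing here
is an existing TCP option.

```python
from collections import deque

class FragmentSizeAdvisor:
    """Tracks the largest fragment seen in a sliding window (hypothetical helper)."""

    def __init__(self, window=240.0, threshold=0.10):
        self.window = window          # seconds; 2*MSL with MSL = 120 s assumed
        self.threshold = threshold    # relative change that triggers an advisory
        self.samples = deque()        # (timestamp, size) pairs, oldest first
        self.last_advised = None      # size carried in the last advisory sent

    def observe(self, now, size):
        """Record a received fragment; return a size to advertise, or None."""
        self.samples.append((now, size))
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] < now - self.window:
            self.samples.popleft()
        largest = max(s for _, s in self.samples)
        if self.last_advised is None:
            self.last_advised = largest
            return largest
        change = abs(largest - self.last_advised) / self.last_advised
        if change > self.threshold:
            self.last_advised = largest
            return largest
        return None
```

  The first fragment always produces an advisory; after that one is sent
only when the windowed maximum moves by more than the threshold in
either direction.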
	
  I can't think of many cases in the currently deployed internet where
the MTU might change during a connection. Usually, the smallest MTU is
on the edges at that 28.8 (or that 2400 baud modem) link, and that
isn't likely to change suddenly. I can see mobileip possibly changing
this. If/when mobileip is deployed en masse, it will definitely include
IPsec.

  Getting Path MTU information back to the sending TCP is not an
insurmountable challenge to VPN uses of IPsec, but it isn't easy. In
the case of "Virtual Circuit" style security gateway traversal (see
draft-richardson-ipsec-traversal-cert-00.txt), there are several MTU's
involved: the end to end MTU, and the MTU between each security
gateway.

  Both are important to know. An argument against the TCP option (and
for an authenticated ICMP) is that it would not be usable with non-TCP
things, like ESP. 
  Note: when I say authenticated ICMP, I mean using either ESP or
AH. Probably the ICMP is no less sensitive than the data, so ESP if the
data is encrypted.
  
  In conclusion: an ICMP need frag message placed in the tunnel

   :!mcr!:            |  Network security programming, currently
   Michael Richardson | on contract with DataFellows F-Secure IPSec
 WWW: <A HREF="http://www.sandelman.ottawa.on.ca/People/Michael_Richardson/Bio.html">mcr@sandelman.ottawa.on.ca</A>. PGP key available.

  
  
  

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: latin1
Comment: Processed by Mailcrypt 3.4, an Emacs/PGP interface

iQB1AwUBM62XlKZpLyXYhL+BAQH9dAMAuR/oq9ylmAAvBl7Kedg/v9O4Z3/5kwwJ
gzJ8WqQ8FIKXqPf7GYHNsIQxNBDztZzgZ/c+9r7sbRTqSVeelvcNr51lLp+fBhHz
3OFufP+Gn0W4TOFIPaPIbOtPf2d+rIkH
=SIJJ
-----END PGP SIGNATURE-----

From owner-tcp-impl@relay.engr.sgi.com  Sun Jun 22 16:36:09 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA05491 for tcp-impl-list; Sun, 22 Jun 1997 16:34:00 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA05477 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 22 Jun 1997 16:33:55 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id RAA23828; Sun, 22 Jun 1997 17:33:49 -0600
Date: Sun, 22 Jun 1997 17:33:49 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706222333.RAA23828@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com, ipsec@tis.com
Subject: Re:  ICMP must fragment and IPsec
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> To: ipsec@tis.com, tcp-impl@cthulhu.engr.sgi.com
> CC: vjs (Vernon Schryver)

>   One way might be to have an ICMP or TCP option that requests the
> other end to provide a response, giving the size of the largest
> fragment received. This would be enclosed in the SA that the TCP data
> is flowing in. This is in some sense a variation of the TCP MSS option.

What is this "other end"?
If talking to the other end of a TCP connection were enough, then the
MSS negotiation would be enough and the Path MTU Discovery mechanism
would not be needed.  In fact, the MSS negotiation is often not enough
because a vast number of boxes between the ends might legitimately tell
you to reduce your MTU.  The boxes in the path vary, as routes change.
Not only might every router that might sometimes be in the path need to
send a reduce-MTU indication of some kind, but so might bridges (e.g.,
FDDI-Ethernet bridges that necessarily IP fragment, honor the DF bit,
and send the ICMP message).

The trouble with authenticating path MTU information (regardless of its
form) is key distribution.  How would you get the right key to all of
the boxes that might be in the path so that you could trust them?
Given the implications for the security of the keys in sending them to
every router in the net that might touch your packets, who would want
to?  You can't just determine that a box is who it says it is.  After
you authenticate box99.bad.guy.com as the sender of the ICMP error
message telling you to use an MTU of 68, what do you do?

The only response to worries about path MTU messages, as well as source
quenches, port, net, and host Unreachables, and many other such
indications is to cross your fingers and ignore any that would
have serious consequences, such as a message telling you to use an MTU
of 68.


> ...
>   I can't think of many cases in the current deployed internet where
> the MTU might change during a connection. Usually, the smallest MTU is
> on the edges at that 28.8 (or that 2400 baud modem) link, and that
> isn't likely to change suddendly. I can see mobileip possibly changing
> this. If/when mobileip is deployed en-mass, it will definitely include
> IPsec.

There is a vast amount of topology on the edges, what with ATM (9180),
FDDI (4352), Ethernet (1500 or 1492), PPP (256 to more than 1500), and
Frame Relay.

Besides, if you stay away from the edges and in the center where routes
don't flap among links with different MTU's, you may as well fix your
MTU=1500 and forget Path MTU discovery.


>   Getting Path MTU information back to the sending TCP is not an
> unsurmountable challenge to VPN uses of IPsec, but it isn't easy.

On the contrary, I think the key distribution problem is insurmountable
and makes authenticated path MTU information impossible.  For example,
look at any real life network and see how many FDDI-Ethernet bridges
there are that have not been given an IP address (manually or by bootp
or DHCP) and so cannot send the ICMP message when they drop a packet
with the DF bit set.  If people cannot manage to set their IP addresses
(thus wrecking not only MTU discovery but SNMP), how can you expect
them to do something useful about keys?


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Sun Jun 22 19:20:55 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA16042 for tcp-impl-list; Sun, 22 Jun 1997 19:07:39 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA16037 for <tcp-impl@relay.engr.sgi.com>; Sun, 22 Jun 1997 19:07:32 -0700
Received: from lox.sandelman.ottawa.on.ca (lox.sandelman.ottawa.on.ca [205.233.54.146]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA22036
	for <tcp-impl@relay.engr.sgi.com>; Sun, 22 Jun 1997 19:07:28 -0700
	env-from (mcr@istari.sandelman.ottawa.on.ca)
Received: from istari.sandelman.ottawa.on.ca (istari.sandelman.ottawa.on.ca [205.233.54.136]) by lox.sandelman.ottawa.on.ca (8.7.5/8.7.3) with ESMTP id VAA29717; Sun, 22 Jun 1997 21:51:28 -0400 (EDT)
Received: from istari.sandelman.ottawa.on.ca ([[UNIX: localhost]]) by istari.sandelman.ottawa.on.ca (8.7.5/8.7.3) with ESMTP id WAA24200; Sun, 22 Jun 1997 22:20:29 -0400 (EDT)
Message-Id: <199706230220.WAA24200@istari.sandelman.ottawa.on.ca>
To: vjs@mica.denver.sgi.com (Vernon Schryver)
CC: tcp-impl@relay.engr.sgi.com, ipsec@tis.com
Subject: Re: ICMP must fragment and IPsec 
In-reply-to: Your message of "Sun, 22 Jun 1997 17:33:49 MDT."
             <199706222333.RAA23828@mica.denver.sgi.com> 
Date: Sun, 22 Jun 1997 22:20:22 -0400
From: "Michael C. Richardson" <mcr@sandelman.ottawa.on.ca>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

-----BEGIN PGP SIGNED MESSAGE-----


  [I'm hesitant about continuing to post this to both lists, but I'm
not sure where else it should go. I know that it doesn't really fit
into either charter, since we aren't documenting TCP bugs that need
fixing, and aren't discussing the things that we REALLY need to finish
to terminate the IPsec group. 
  Please give me advice by private email. Maybe I should write this up
as a draft, since I can see it getting more fleshed out in my mind.]

>>>>> "Vernon" == Vernon Schryver <vjs@mica.denver.sgi.com> writes:
    Vernon> What is this "other end"?  If talking to the other end of
    Vernon> a TCP connection were enough, then the MSS negotiation
    Vernon> would be enough and the Path MTU Discovery mechanism would
    Vernon> not be needed.  In fact, the MSS negotiation is often not
 
  I am assuming that packets that are sent that are too big, or become
too big due to n-levels of ESP/AH transport/tunnels will be
fragmented if the DF bit is not set.

  How much the packet gets fragmented can be determined by the
receiving host and/or tunnel end-point: it can observe the largest
fragment that was successfully received and participated in
reassembly. This information can be relayed to the sending host via an
ICMP Datagram Too Big message that can be put into the tunnel.

  This appears to be screwed up by multiple paths with different
MTUs. However, it is easily fixed by only taking PMTU information from
packets that were fragmented: a larger packet that arrives intact
clearly took a different route, so it doesn't matter. Eventually, if
the correspondent nodes can adjust their PMTU appropriately, all
packets arrive unfragmented. Clearly, the rate at which ICMPs are sent
needs to be limited to not more than one per RTT.
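
  As a sketch of the two rules just stated (learn only from datagrams
that were actually fragmented, and report at most once per RTT), in
Python. The class name, the list-of-sizes interface and the fixed RTT
value are all assumptions for illustration.

```python
class ReassemblyObserver:
    """Receiver-side PMTU estimate from reassembly (hypothetical helper)."""

    def __init__(self, rtt):
        self.rtt = rtt            # seconds; assumed known and fixed here
        self.last_report = None   # time the last report was sent, if any

    def on_datagram(self, now, fragment_sizes):
        """fragment_sizes: sizes of the IP pieces that made up one datagram.

        Returns the PMTU estimate to report to the sender, or None.
        """
        if len(fragment_sizes) < 2:
            return None   # arrived intact: took a path that needs no report
        if self.last_report is not None and now - self.last_report < self.rtt:
            return None   # rate limit: at most one report per RTT
        self.last_report = now
        return max(fragment_sizes)
```

  A datagram that arrives in pieces of 576, 576 and 388 bytes yields an
estimate of 576; an intact 1500-byte datagram yields nothing.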

  This works well on the end system that reassembles the packets. So,
getting an estimate of the PMTU from a transport mode, or tunnel mode
terminating on the end node is easy.
  If the tunnel doesn't terminate on the end node (but on a security
gateway), then one observes that the gateway must reassemble the ESP
or AH packets to terminate the tunnel. The gateway node can then
	a) send an ICMP (in transport? or tunnel?) to the originating
	gateway, informing it of the PMTU that it sees, and the 
	originating gateway will send an ICMP to nodes that set the 
	DF bit when they do normal PMTU.
	
	b) alternatively, the PMTU between gateways could be
	communicated with an ISAKMP message.

	c) send an ICMP through the tunnel, to the TCP originating
	(TLO in the terminology of my traversal draft) node. The TLO
	sees this packet just like it would when it got ICMP messages
	when security wasn't present. However, it never sees the 
	routes that were tunnelled through, but the far end tunnel
	point would collect all that info for it anyway.
	
  BTW: not all current IPsec implementations do the right thing with
the DF flag. In theory, a DF flag on a packet going into a tunnel
should cause the tunnel wrapper to have the DF bit set, and the ICMP
Datagram Too Big messages adjusted in size and passed back to the
originating host. This assumes we can trust them. At least one
implementation in Detroit dropped the packets immediately, since the
new packet size didn't fit the outgoing interface's MTU, so the ICMP
would have been trusted in that case. My recommendation is to never
set the DF bit on the outside packet until you can deal with the
ICMP. That is, the problem is punted for a future working group, but
it means that IPsec VPN tunnels become non-compliant routers since
they don't honour the DF bit.
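
  The "adjusted in size" step amounts to subtracting the encapsulation
overhead from the MTU the outer ICMP reported. A sketch in Python; the
function name is hypothetical and the overhead figures in the usage
note are placeholders, not real ESP or AH sizes.

```python
def inner_mtu(reported_mtu, overheads):
    """MTU to pass back to the originating host.

    reported_mtu: MTU carried in the Datagram Too Big for the outer packet.
    overheads:    bytes added by each level of encapsulation (assumed known).
    """
    adjusted = reported_mtu - sum(overheads)
    if adjusted <= 0:
        raise ValueError("tunnel overhead exceeds the reported MTU")
    return adjusted
```

  For example, inner_mtu(1500, [20, 24]) gives 1456 for one outer IPv4
header plus an illustrative 24 bytes of security header. With n levels
of tunnelling, the list simply grows to n entries.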
	
  
    Vernon> The trouble with authenticating path MTU information
    Vernon> (regardless of its form) is key distribution.  How would

  My position is that you can't distribute the keys. Please see my
draft for another place where it appears the intermediate security
gateways need the keys.

    Vernon> The only response to worries about path MTU messages, as
    Vernon> well as source quences, port, net, and host Unreachables,
    Vernon> and many other such indications is to be to cross your
    Vernon> fingers and ignore any that would have serious
    Vernon> consequences, such as an message telling you to use an MTU
    Vernon> of 68.

  I think we can do better than that.

    >> ...  I can't think of many cases in the current deployed
    >> internet where the MTU might change during a
    >> connection. Usually, the smallest MTU is on the edges at that
    >> 28.8 (or that 2400 baud modem) link, and that isn't likely to
    >> change suddendly. I can see mobileip possibly changing
    >> this. If/when mobileip is deployed en-mass, it will definitely
    >> include IPsec.

    Vernon> There is a vast amount of topology on the edges, what with
    Vernon> ATM (9180), FDDI (4352), Ethernet (1500 or 1492), PPP (256
    Vernon> to more than 1500), and Frame Relay.

  But, do any of these things *change* during the course of a TCP
connection?
  While the routing on the backbone might change, my guess is that the
MTU between backbone nodes is almost always going to be more than the
1500 typical of a T1. I guess two networks that do ATM with their
supplier could get a much lower MTU if the supplier's ATM backbone
goes down and they move to their backup T1s.

    Vernon> Besides, if you stay away from the edges and in the center
    Vernon> where routes don't flap among links with different MTU's,
    Vernon> you may as well fix your MTU=1500 and forget Path MTU
    Vernon> discover.

  Yes, there is this option.

    >> Getting Path MTU information back to the sending TCP is not an
    >> unsurmountable challenge to VPN uses of IPsec, but it isn't
    >> easy.

    Vernon> On the contrary, I think the key distribution problem is
    Vernon> insurmountable and makes authenticated path MTU
    Vernon> information impossible.  For example, look at any real

  I agree: trying to get the ICMP Datagram Too Big messages
authenticated is not possible. 
  I don't agree that getting authenticated PMTU information is
impossible. 

   :!mcr!:            |  Network security programming, currently
   Michael Richardson | on contract with DataFellows F-Secure IPSec
 WWW: <A HREF="http://www.sandelman.ottawa.on.ca/People/Michael_Richardson/Bio.html">mcr@sandelman.ottawa.on.ca</A>. PGP key available.


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: latin1
Comment: Processed by Mailcrypt 3.4, an Emacs/PGP interface

iQB1AwUBM63dYaZpLyXYhL+BAQEYaQL/fWe8YzpkZAGRKsGTEv8PJ0sQyM1Jlx4I
NmJCIj8Y9y1+giyg1ZZeLhmh0x15nYQnt/0dH1I3sf+KXZRIYz00LfPVq13MLAkP
5uijiFUS3c6UddKCTwOrR08uqwhydSG/
=1PJL
-----END PGP SIGNATURE-----

From owner-tcp-impl@relay.engr.sgi.com  Sun Jun 22 20:25:29 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA22138 for tcp-impl-list; Sun, 22 Jun 1997 20:11:45 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA22131 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 22 Jun 1997 20:11:40 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id VAA24233; Sun, 22 Jun 1997 21:11:29 -0600
Date: Sun, 22 Jun 1997 21:11:29 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706230311.VAA24233@mica.denver.sgi.com>
Subject: Re: ICMP must fragment and IPsec
Cc: ipsec@tis.com, tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: "Michael C. Richardson" <mcr@sandelman.ottawa.on.ca>

> ...
>     >> ...  I can't think of many cases in the current deployed
>     >> internet where the MTU might change during a
>     >> connection. ...

>     Vernon> There is a vast amount of topology on the edges, what with
>     Vernon> ATM (9180), FDDI (4352), Ethernet (1500 or 1492), PPP (256
>     Vernon> to more than 1500), and Frame Relay.
> 
>   But, do any of these things *change* during the course of a TCP
> connection?

They do, at least for plain-vanilla IP in Silicon Graphics' internal
network.  (Lots of FDDI and some ATM and HIPPI backbones.)  Of course,
anything through the Internet will probably be filtered to 1500, but
anyone with HIPPI, 802.5, FDDI, or some flavors of ATM will have other
PMTU's, at least for "servers on the backbone".  Add a little
redundancy--say a few high-metric backup 1500 links in your campus, and
it's easy to get flapping in your PMTU for some real applications.

For the duration of an HTTP 1.0 transfer, you'd expect the PMTU to be
constant.  I often have TCP connections that stay up for days.  Often
when demand-dialed PPP/ISDN dies, I switch to PPP/modems, and then back
to PPP/ISDN, keeping the same TCP connections alive.  If I followed
conventional wisdom for modems, I'd be switching between 1500 and
something smaller.


>   While the routing on the backbone might change, my guess is that the
> MTU between backbone nodes is almost always going to be more than the
> 1500 typical of a T1. I guess, two networks, that do ATM with their
> supplier could get a much lower MTU if the supplier's ATM backbone
> goes down, and the move their backup T1s. 

Yes, of course, on the "backbone" (whatever that means today).  But
anyone with FDDI, HIPPI, ATM, 802.5, and PPP out in the marches
(marshes?) has other PMTU's some of the time, at least between "big
servers on the backbone."  Consider big NFS servers doing network
backups.


>   I don't agree that getting authenticated PMTU information is
> impossible. 

Again, how?  Say you do authenticate router99.bad.guy.net as the
furshure source of the PMTU info, in whatever form, that you just got.
How do you know that router99.bad.guy.net is in your path?  It seems to
me that if you could know your path reliably, a lot of problems would
be vastly easier, from IP security to IP routing.  For that matter, if
you know which routers might touch your packets, then also knowing
the MTU's of their links is a trivial addition, and so you don't need
any ad hoc or post hoc Path MTU discovery, timers, probing, DF-bit
kludges, lost user-data etc.

Well, I guess you could require everyone use a link-state routing
protocol that covers the entire Internet, and have every host maintain
a global link-state table, and pay attention to PMTU, source quench,
etc only from routers that are on a path that is close to the current
minimum for your packets.  Or continually use something like
`traceroute` to map the path, and pay attention only to PMTU info from
close to the most recent path.  Or we could get rid of TCP and switch to
XTP--I think the initial route setup of XTP could be changed to track
the path.  Or best, get rid of IP and run TCP over ATM end-to-end.

My point is that (reasonable) packet switching means not knowing who's
moving your packets, at least in large networks, and there's no
escaping the implications of that fact.


Vernon Schryver,  vjs@sgi.com
¾ï$ýùdOI+óë2zåtàÙÓ‚ÏÉÅÁj¹°«¥¡‡A9›”&\
.pdefaults
.loc-cshrc
.nevotinit
.gamtables
.signature
.sgihelprc
.insightrc
.Xdefaults

From owner-tcp-impl@relay.engr.sgi.com  Mon Jun 23 06:45:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA19957 for tcp-impl-list; Mon, 23 Jun 1997 06:42:58 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA19952 for <TCP-IMPL@cthulhu.engr.sgi.com>; Mon, 23 Jun 1997 06:42:57 -0700
Received: from alcor.process.com (alcor.process.com [192.42.95.16]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id GAA07938
	for <TCP-IMPL@RELAY.ENGR.SGI.COM>; Mon, 23 Jun 1997 06:42:55 -0700
	env-from (VOLZ@PROCESS.COM)
Date:     Mon, 23 Jun 1997 09:42 -0400
From: VOLZ@PROCESS.COM (Bernie Volz)
Message-Id: <009B6342469A34A0.31C9@PROCESS.COM>
To: vjs@mica.denver.sgi.com, TCP-IMPL@cthulhu.engr.sgi.com, IPSEC@TIS.COM
Subject:  Re:  ICMP must fragment and IPsec
X-VMS-To: SMTP%"vjs@mica.denver.sgi.com"
X-VMS-Cc:  TCP-IMPL@RELAY.ENGR.SGI.COM, IPSEC@TIS.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>What is this "other end"?
>If talking to the other end of a TCP connection were enough, then the
>MSS negotiation would be enough and the Path MTU Discovery mechanism
>would not be needed.  In fact, the MSS negotiation is often not enough
>because a vast number of boxes between the ends might legitimately tell
>you to reduce your MTU.

Please don't confuse MSS with MTU. The Maximum Segment Size has
*NOTHING* to do with MTU. The MSS reflects the maximum segment
size a TCP implementation is willing and able to receive, and that
has nothing to do with the MTU of an interface.

For example ... if MSS were MTU, what would happen if a multi-homed host
with an Ethernet and FDDI interface switched a connection from Ethernet
(w/MTU of 1500) to FDDI (w/MTU of 4352)? You would *NOT* have wanted TCP
to send an MSS of only 1500.

- Bernie Volz
  Process Software Corporation

From owner-tcp-impl@relay.engr.sgi.com  Mon Jun 23 08:07:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA29706 for tcp-impl-list; Mon, 23 Jun 1997 08:05:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA29677 for <TCP-IMPL@cthulhu.engr.sgi.com>; Mon, 23 Jun 1997 08:05:02 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id JAA25015; Mon, 23 Jun 1997 09:04:58 -0600
Date: Mon, 23 Jun 1997 09:04:58 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706231504.JAA25015@mica.denver.sgi.com>
To: TCP-IMPL@cthulhu.engr.sgi.com, VOLZ@PROCESS.COM (Bernie Volz)
Subject: Re:  ICMP must fragment and IPsec
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: VOLZ@PROCESS.COM (Bernie Volz)
> To: vjs, TCP-IMPL@cthulhu.engr.sgi.com, IPSEC@TIS.COM

(I'm sending only to the TCP-IMPL list)

> ...
> Please don't confuse MSS with MTU. The Maximum Segment Size has
> *NOTHING* to do with MTU. The MSS reflects what the maximum segment
> size a TCP implementation is willing and able to receive and that
> has nothing to do with the MTU of an interface.
> 
> For example ... if MSS was MTU, what would happen if a multi-homed host
> with an Ethernet and FDDI interface switched a connection from Ethernet
> (w/MTU of 1500) to FDDI (w/MTU of 4352). You would *NOT* have wanted TCP
> to send an MSS with only 1500.


On the contrary, please understand the meaning and use of the MSS, and
particularly the use of the MSS in preventing fragmentation.  If the
MSS were only the maximum TCP segment that you could reassemble from
IP fragments, then practically every system would request 64K.

If your TCP implementation follows the (at least de facto) standard,
you will use an MSS of 1500 if the interface your host used to send
its first packets for the TCP connection was the Ethernet interface.
Moreover, a case can be made that if a routing change might ever
cause the TCP connection to switch from an initial choice of your FDDI
interface to the Ethernet interface, then you negotiate an MSS of 1500
in order to prevent fragmentation.
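
A small Python sketch of that rule.  It assumes IPv4 and TCP headers
without options (20 bytes each), so the value carried in the MSS option is
the MTU less 40; the "MSS of 1500" above is shorthand for sizing segments
to a 1500-byte MTU.  The function name is made up for illustration.

```python
IPV4_HEADER = 20  # bytes, header without options
TCP_HEADER = 20   # bytes, header without options

def advertised_mss(interface_mtus):
    """MSS to advertise, given the MTU of every interface the connection might use."""
    return min(interface_mtus) - IPV4_HEADER - TCP_HEADER
```

For the multi-homed host in question, advertised_mss([1500, 4352]) is
1460, which avoids fragmentation whichever interface the route ends up on;
a host with only the FDDI interface would advertise 4312.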


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Mon Jun 23 08:38:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA05824 for tcp-impl-list; Mon, 23 Jun 1997 08:33:51 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA05819 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 23 Jun 1997 08:33:44 -0700
Received: from FNAL.FNAL.Gov (fnal.fnal.gov [131.225.110.17]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA00610
	for <tcp-impl@relay.engr.sgi.com>; Mon, 23 Jun 1997 08:33:41 -0700
	env-from (crawdad@gungnir.fnal.gov)
Received: from gungnir.fnal.gov ("port 45521"@gungnir.fnal.gov)
 by FNAL.FNAL.GOV (PMDF V5.0-8 #3998) id <01IKEQC1X7TQ0007ZP@FNAL.FNAL.GOV>;
 Mon, 23 Jun 1997 10:33:31 -0600
Received: from gungnir.fnal.gov by gungnir.fnal.gov (SMI-8.6/SMI-SVR4)
 id KAA12274; Mon, 23 Jun 1997 10:33:26 -0500
Date: Mon, 23 Jun 1997 10:33:25 -0500
From: Matt Crawford <crawdad@FNAL.GOV>
Subject: Re: ICMP must fragment and IPsec
In-reply-to: "22 Jun 1997 17:33:49 MDT."
 <"199706222333.RAA23828"@mica.denver.sgi.com>
To: vjs@mica.denver.sgi.com (Vernon Schryver)
Cc: tcp-impl@cthulhu.engr.sgi.com, ipsec@tis.com
Message-id: <199706231533.KAA12274@gungnir.fnal.gov>
Content-transfer-encoding: 7BIT
X-Face: 
 /RKQi"kntyd}7l)d8n%'Dum<~(aMW3,5g&'NiH5I4Jj|wT:j;Qa$!@A<~/*C:{:MmAQ:o%S /KKi}G4_.||4I[9!{%3]Hd"a*E{<k&QF?d6L7o&zLqb%kXn!!]ykXMKtTiy9#20]$EKP/^Z$T]'P6,
 8L#r&mH4PB<ljN,_.=iCpv#N:HIcy5t7{HV:<=g=V?^;-d,J*xkq0r
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> >   One way might be to have an ICMP or TCP option that requests the
> > other end to provide a response, giving the size of the largest
> > fragment received. This would be enclosed in the SA that the TCP data
> > is flowing in. This is in some sense a variation of the TCP MSS option.
> 
> What is this "other end"?
> If talking to the other end of a TCP connection were enough, then the
> MSS negotiation would be enough ...

No, I think he meant for one end to tell the other what was the size
of the largest IP packet-or-fragment it has actually received.  It
can't rightly be a TCP option, because TCP wouldn't know this.  And
besides, it becomes pretty hairy at any level when you try to find
out what was the largest packet received "lately."  Ugh.

				Matt Crawford


From owner-tcp-impl@relay.engr.sgi.com  Mon Jun 23 09:01:32 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA11579 for tcp-impl-list; Mon, 23 Jun 1997 08:57:35 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA11575 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 23 Jun 1997 08:57:30 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id JAA25294; Mon, 23 Jun 1997 09:57:20 -0600
Date: Mon, 23 Jun 1997 09:57:20 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706231557.JAA25294@mica.denver.sgi.com>
To: ipsec@tis.com, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: ICMP must fragment and IPsec
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Matt Crawford <crawdad@FNAL.GOV>
> To: vjs (Vernon Schryver)
> Cc: tcp-impl@cthulhu.engr.sgi.com, ipsec@tis.com

> > >   One way might be to have an ICMP or TCP option that requests the
> > > other end to provide a response, giving the size of the largest
> > > fragment received. This would be enclosed in the SA that the TCP data
> > > is flowing in. This is in some sense a variation of the TCP MSS option.
> > 
> > What is this "other end"?
> > If talking to the other end of a TCP connection were enough, then the
> > MSS negotiation would be enough ...
> 
> No, I think he meant for one end to tell the other what was the size
> of the largest IP packet-or-fragment it has actually received.  It
> can't rightly be a TCP option, because TCP wouldn't know this.  And
> besides, it becomes pretty hairy at any level when you try to find
> out what was the largest packet received "lately."  Ugh.


Yes, I foolishly missed that very interesting idea.

I'm not bothered by the "lately" in "the largest packet received
lately", since you must have timers even to use the DF bit.  The
idea just moves the timers.  Instead, it bothers me that:

  - it requires changes in both hosts.  PMTU discovery is a hack that
    works without any changes in the rest of the net including the
    peer.  Consider how long it has taken for routers to support the
    improved ICMP error message.

  - it also requires protocol changes.  The IETF ain't what it used to be.

  - when would you re-probe to discover if the PMTU has increased?
    This is not a showstopper, but doesn't have an obviously neat answer.

  - I think it assumes UDP does not need PMTU discovery.

  - it assumes no intermediate router is doing fragment reassembly.
    I don't know of any that do that, but it is a recurring idea
    for good reasons.

  - it assumes the largest fragment is as large as the PMTU.
    First, since all but the last IP fragment must be a multiple of 8
    bytes, the largest fragment will generally be the largest multiple
    of 8 less than or equal to the MTU.  For example, you'll probably
    guess 1496 or 1488 instead of 1500 or 1492 for an Ethernet segment.

    Second, instead of the usual algorithm, a router might try to
    fragment into evenly sized pieces.  At the cost of a divide
    instruction (cheap on modern CPUs), that can reduce the total
    fragmentation should the datagram have to be fragmented twice.
    Consider the silly UDP/IP fragment sizes often seen from
    NFS servers with FDDI interfaces.
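
The two fragmentation strategies in the last item can be sketched as
follows, in Python.  This is an illustration under stated assumptions,
not any router's actual code: it assumes a 20-byte IPv4 header with no
options and applies the multiple-of-8 rule to the payload carried after
that header, so how far the largest fragment falls below the MTU depends
on the link.  The function names are hypothetical.

```python
IP_HEADER = 20  # assumption: IPv4 header with no options


def _split(payload, chunk):
    """Cut `payload` bytes into pieces of `chunk` bytes plus a final
    remainder; return on-the-wire sizes (header included)."""
    sizes = []
    while payload > chunk:
        sizes.append(chunk + IP_HEADER)
        payload -= chunk
    sizes.append(payload + IP_HEADER)
    return sizes


def usual_fragments(total_len, mtu):
    """Fill every fragment but the last as full as the MTU allows."""
    if total_len <= mtu:
        return [total_len]
    max_chunk = (mtu - IP_HEADER) // 8 * 8
    return _split(total_len - IP_HEADER, max_chunk)


def even_fragments(total_len, mtu):
    """Use the same number of fragments, but of roughly equal size."""
    if total_len <= mtu:
        return [total_len]
    payload = total_len - IP_HEADER
    max_chunk = (mtu - IP_HEADER) // 8 * 8
    count = -(-payload // max_chunk)   # ceiling division
    chunk = -(-payload // count)       # even share, rounded up
    chunk = (chunk + 7) // 8 * 8       # then up to a multiple of 8
    return _split(payload, chunk)


if __name__ == "__main__":
    # An 8220-byte datagram leaving an FDDI interface (MTU 4352).
    print(usual_fragments(8220, 4352))
    # A 4352-byte datagram crossing an Ethernet (MTU 1500).
    print(usual_fragments(4352, 1500))
    print(even_fragments(4352, 1500))
```

In either case a receiver that reports max() of the fragment sizes it
saw may report less than the path MTU, which is the concern raised above.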


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Mon Jun 23 10:30:35 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA06725 for tcp-impl-list; Mon, 23 Jun 1997 10:27:25 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA06429 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 23 Jun 1997 10:26:39 -0700
Received: from gecko.nas.nasa.gov (gecko.nas.nasa.gov [129.99.34.45]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA03452
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 23 Jun 1997 10:26:36 -0700
	env-from (kml@nas.nasa.gov)
Received: from gecko.nas.nasa.gov (kml@localhost)
	by gecko.nas.nasa.gov (8.8.3/NAS.6.1) with ESMTP id KAA12912; Mon, 23 Jun 1997 10:26:30 -0700 (PDT)
Message-Id: <199706231726.KAA12912@gecko.nas.nasa.gov>
To: Matt Crawford <crawdad@FNAL.GOV>
cc: vjs@mica.denver.sgi.com (Vernon Schryver), tcp-impl@cthulhu.engr.sgi.com,
        ipsec@tis.com
Subject: Re: ICMP must fragment and IPsec 
In-reply-to: Your message of "Mon, 23 Jun 1997 10:33:25 CDT."
             <199706231533.KAA12274@gungnir.fnal.gov> 
Date: Mon, 23 Jun 1997 10:26:29 -0700
From: "Kevin M. Lahey" <kml@nas.nasa.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

In message <199706231533.KAA12274@gungnir.fnal.gov>Matt Crawford writes
>> >   One way might be to have an ICMP or TCP option that requests the
>> > other end to provide a response, giving the size of the largest
>> > fragment received. This would be enclosed in the SA that the TCP data
>> > is flowing in. This is in some sense a variation of the TCP MSS option.
>> 
>> What is this "other end"?
>> If talking to the other end of a TCP connection were enough, then the
>> MSS negotiation would be enough ...
>
>No, I think he meant for one end to tell the other what was the size
>of the largest IP packet-or-fragment it has actually received.  It
>can't rightly be a TCP option, because TCP wouldn't know this.  And
>besides, it becomes pretty hairy at any level when you try to find
>out what was the largest packet received "lately."  Ugh.

Then, too, wouldn't this fail under IPv6, since only the
originating host can fragment packets?  If routers are just dropping
the packets (and sending ICMP messages) rather than fragmenting and
forwarding, the end system would never get any useful fragment sizes to
deal with.

Kevin Lahey 
kml@nas.nasa.gov

From owner-tcp-impl@relay.engr.sgi.com  Mon Jun 23 15:36:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA05751 for tcp-impl-list; Mon, 23 Jun 1997 15:33:56 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA05741 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 23 Jun 1997 15:33:54 -0700
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA00244
	for <tcp-impl@relay.engr.sgi.com>; Mon, 23 Jun 1997 15:33:53 -0700
	env-from (Chris.Schmechel@Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.13]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id PAA00897 for <tcp-impl@relay.engr.sgi.com>; Mon, 23 Jun 1997 15:56:47 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id PAA14580; Mon, 23 Jun 1997 15:33:48 -0700
Received: from mont-blanc.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id PAA14464; Mon, 23 Jun 1997 15:06:22 -0700
Received: by mont-blanc.eng.sun.com (SMI-8.6/SMI-SVR4)
	id PAA01627; Mon, 23 Jun 1997 15:04:09 -0700
Message-Id: <199706232204.PAA01627@mont-blanc.eng.sun.com>
Received: by NeXT.Mailer (Solaris OpenStep-1.1-sparc-17 Mar 1997 Version 1.1 )
From: Chris Schmechel <Chris.Schmechel@Eng.Sun.COM>
Date: Mon, 23 Jun 1997 15:04:08 +0800
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Input Needed - Testing Tools for TCP...
cc: Chris.Schmechel@Eng.Sun.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hi -

I'm finishing up an Internet Draft which is a survey of testing
tools for TCP.  At this time, I would like to solicit suggestions and
input from the general group on tools to include, etc.

Thanks,

-Chris Schmechel
 <Chris.Schmechel@Eng.Sun.COM>

 

From owner-tcp-impl@relay.engr.sgi.com  Mon Jun 23 22:17:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA19719 for tcp-impl-list; Mon, 23 Jun 1997 22:14:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA19714 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 23 Jun 1997 22:14:39 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id WAA13071
	for <tcp-impl@relay.engr.SGI.COM>; Mon, 23 Jun 1997 22:14:39 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id WAA29809; Mon, 23 Jun 1997 22:14:38 -0700 (PDT)
Message-Id: <199706240514.WAA29809@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: tentative scheduling for Munich
Date: Mon, 23 Jun 1997 22:14:38 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

We're tentatively scheduled for the following slot at the Munich IETF:

        Monday, August 11 at 1930-2200 (opposite calsch, snmpv3, udlr)

No word yet whether we'll have an MBone room (we asked for one).

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Jun 24 07:20:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA03878 for tcp-impl-list; Tue, 24 Jun 1997 07:18:26 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA03869 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 07:18:18 -0700
Received: from www10.w3.org (www10.w3.org [18.23.0.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA28294
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 07:18:15 -0700
	env-from (frystyk@w3.org)
Received: from big (big.w3.org [18.29.0.116]) by www10.w3.org (8.8.5/8.7.3) with SMTP id KAA16141; Tue, 24 Jun 1997 10:18:13 -0400 (EDT)
X-Authentication-Warning: www10.w3.org: Host big.w3.org [18.29.0.116] claimed to be big
Message-Id: <3.0.1.32.19970624101812.00970580@pop.w3.org>
X-Sender: frystyk@pop.w3.org
X-Mailer: Windows Eudora Pro Version 3.0.1 (32)
Date: Tue, 24 Jun 1997 10:18:12 -0400
To: Chris Schmechel <Chris.Schmechel@Eng.Sun.COM>,
        tcp-impl@cthulhu.engr.sgi.com
From: Henrik Frystyk Nielsen <frystyk@w3.org>
Subject: Re: Input Needed - Testing Tools for TCP...
Cc: Eric Prudhommeaux <eric@apocalypse.org>
In-Reply-To: <199706232204.PAA01627@mont-blanc.eng.sun.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

At 03:04 PM 6/23/97 +0800, Chris Schmechel wrote:

>I'm finishing up an Internet Draft which is a survey of testing  
>tools for TCP.  As this time, I would like to solicit out to the  
>general group suggestions/inputs for tools to include, etc.

The ones we used for HTTP/1.1 performance testing are listed at

	http://www.w3.org/Protocols/HTTP/Performance/#TCP

Eric Prud'hommeaux also wrote some Perl scripts to convert between the
various tools and to be able to extract average figures over several runs
of tcpdump.

Hope this helps,

Henrik
--
Henrik Frystyk Nielsen, <frystyk@w3.org>
World Wide Web Consortium
http://www.w3.org/People/Frystyk

From owner-tcp-impl@relay.engr.sgi.com  Tue Jun 24 12:57:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA04238 for tcp-impl-list; Tue, 24 Jun 1997 12:54:35 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA04228 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 12:54:33 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA02521
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 24 Jun 1997 12:54:30 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id MAA00999; Tue, 24 Jun 1997 12:04:12 -0700 (PDT)
Message-Id: <199706241904.MAA00999@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: paper on analyzing packet traces of TCP behavior available
Date: Tue, 24 Jun 1997 12:04:12 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

[My apologies to those of you who will receive multiple copies of this
announcement from other mailing lists ...]

The paper "Automated Packet Trace Analysis of TCP Implementations",
to appear in SIGCOMM '97, is now available from:

	ftp://ftp.ee.lbl.gov/papers/vp-tcpanaly-sigcomm97.ps.Z

I've appended the abstract.

		Vern


Automated Packet Trace Analysis of TCP Implementations

Vern Paxson
Network Research Group
Lawrence Berkeley National Laboratory
vern@ee.lbl.gov

We describe "tcpanaly", a tool for automatically analyzing a TCP
implementation's behavior by inspecting packet traces of the TCP's
activity.  Doing so requires surmounting a number of hurdles, including
detecting packet filter measurement errors, coping with ambiguities due to
the distance between the measurement point and the TCP, and accommodating a
surprisingly large range of behavior among different TCP implementations.
We discuss why our efforts to develop a fully general tool failed, and
detail a number of significant differences among 8 major TCP implementations,
some of which, if ubiquitous, would devastate Internet performance.  The
most problematic TCPs were all independently written, suggesting that
correct TCP implementation is fraught with difficulty.  Consequently, it
behooves the Internet community to develop testing programs and reference
implementations.

From owner-tcp-impl@relay.engr.sgi.com  Tue Jun 24 16:05:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA26878 for tcp-impl-list; Tue, 24 Jun 1997 16:03:03 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA26871 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 16:03:02 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA29413
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 24 Jun 1997 16:02:59 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id QAA01755; Tue, 24 Jun 1997 16:02:58 -0700 (PDT)
Message-Id: <199706242302.QAA01755@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: TCP research issues from a tcp-impl perspective
Date: Tue, 24 Jun 1997 16:02:58 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

In a couple of weeks I'll be attending a meeting of the IRTF's End-to-End
Research Group.  One of the agenda items is a presentation by me on TCP
research issues from the perspective of the tcp-impl WG.  The issues I've
noted so far are:

	- How to compute RTO when using high-resolution timings

	- How to compute RTO when timing more than one packet per RTT
	  (i.e., how to adjust the constants for the exponentially weighted
	  moving average)

	- Is the initial slow-start cwnd going to be increased, and if
	  so, to what?

	- How to fix the MSS*MSS/cwnd granularity problem (Rich Stevens
	  noted that if cwnd > MSS*MSS, then due to integer arithmetic
	  it'll never grow any larger)

	- How long a sending pause merits a new slow-start

	- Sharing cwnd across connections

	- Caching cwnd over time

	- What about deploying Vegas?

	- What about deploying Janey Hoe's changes?

	- Should below-sequence pure acks be acked (for keep-alives)?

	- Is it time to revisit constants like MSL and initial RTO?
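
The MSS*MSS/cwnd granularity item in the list above can be illustrated
with a short Python sketch.  It assumes the congestion-avoidance step
adds MSS*MSS/cwnd bytes to cwnd per ACK and that the division is done in
integer arithmetic; the function names are hypothetical.

```python
def ca_increment(cwnd, mss):
    """Per-ACK congestion-avoidance increment in bytes, truncated the
    way integer division would truncate it."""
    return mss * mss // cwnd


def grow(cwnd, mss, acks):
    """Apply the increment once per ACK and return the resulting cwnd."""
    for _ in range(acks):
        cwnd += ca_increment(cwnd, mss)
    return cwnd


if __name__ == "__main__":
    mss = 512
    # Below MSS*MSS the window still opens a few bytes per ACK.
    print(ca_increment(64 * 1024, mss))
    # Just past MSS*MSS the increment truncates to zero ...
    print(ca_increment(mss * mss + 1, mss))
    # ... so no number of ACKs makes the window any larger.
    print(grow(mss * mss + 1, mss, 1000) == mss * mss + 1)
```

With an MSS of 512 the threshold is 262144 bytes, which is only reachable
with window scaling; larger segments push the threshold higher.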

I invite suggestions for other issues, either via the list or private
email.  Also, if you particularly care about a few specific issues, let
me know via private email; that'll help with prioritizing the list.

	Thanks,

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Jun 24 16:41:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA08977 for tcp-impl-list; Tue, 24 Jun 1997 16:39:44 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA08971 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 16:39:42 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA10082
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 16:39:38 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Tue, 24 Jun 1997 19:39:38 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Tue, 24 Jun 1997 19:39:38 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id TAA26265; Tue, 24 Jun 1997 19:42:09 -0400
Date: Tue, 24 Jun 1997 19:42:09 -0400
Message-Id: <199706242342.TAA26265@MAILSERV-2HIGH-A.FTP.COM>
To: vern@ee.lbl.gov
Subject: Re: TCP research issues from a tcp-impl perspective
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@cthulhu.engr.sgi.com
Repository: mailserv-2high-a.ftp.com, [message accepted at Tue Jun 24 19:42:05 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||research issues from the perspective of the tcp-impl WG.  The issues I've
||noted so far are:
||
I'm not sure if your list implies window scaling, SACK and fast recovery
but it seems to me that they should get added to the MUST list as we go
forward.  I know it's not research at this point but...

I suspect an item indicating our discussion of, and later disassociation
from, the SYN attack is also in order so it doesn't come up again :-).

I've heard from parties involved with TCPs other than ours, as well as
from board vendors, the desire to reduce the ACK stream on "highly reliable
LANs".  The thought here (which I personally disagree with) is that on
a high speed LAN, say 100-Meg Ether or FDDI, the 1 ACK per 2 data packets
exchange unnecessarily clutters the LAN and that somehow backing off the
ACK policy to 1 ACK per 4 packets, or
1 ACK per window size/packetsize-some constant,
will lead to fewer small packets on the wire, less host processing,
better LAN utilization, more cycles for the CPU, etc.

The way it has been explained to me - reducing the ACK frequency can yield
up to 20% performance improvement over a LAN/decrease collisions on a LAN,
etc.

It personally bothers me that something like this could possibly be
deployed without serious research cycles invested by non-commercial
parties.

L.




From owner-tcp-impl@relay.engr.sgi.com  Tue Jun 24 17:18:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA18573 for tcp-impl-list; Tue, 24 Jun 1997 17:16:26 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA18560 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 17:16:24 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA19187
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 17:16:24 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id RAA02090; Tue, 24 Jun 1997 17:16:19 -0700 (PDT)
Message-Id: <199706250016.RAA02090@daffy.ee.lbl.gov>
To: backman@ftp.com
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective
In-reply-to: Your message of Tue, 24 Jun 1997 19:42:09 PDT.
Date: Tue, 24 Jun 1997 17:16:19 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I'm not sure if your list implies window scaling, SACK and fast recovery
> but it seems to me that they should get added to the MUST list as we go
> forward.  I know its not research at this point but...

I wasn't thinking of bringing these up, because as you note they're not
research.  Since they're implemented as TCP options (well, except for
fast recovery), it seems a little funny to make them MUST's - I also
don't see why they have to be MUST's in order for things to go forward.
The key is that implementations that care about high performance implement
them, and that still works even if they're options.

> I suspect an item indicating our discussion, and later dissassociation with
> the SYN attack is also in order so it doesn't come up again :-).

Okay, I'll add it to the list.

> ... the desire to reduce the ACK stream on "highly reliable LAN"'s.

I'll add this too if some other tcp-impl'ers chime in that they'd like to
see it on the list.

	Thanks,

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Jun 24 17:24:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA20648 for tcp-impl-list; Tue, 24 Jun 1997 17:23:28 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA20641 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 17:23:26 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id RAA20660
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 17:23:22 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Tue, 24 Jun 1997 20:23:21 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Tue, 24 Jun 1997 20:23:21 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id UAA26454; Tue, 24 Jun 1997 20:25:52 -0400
Date: Tue, 24 Jun 1997 20:25:52 -0400
Message-Id: <199706250025.UAA26454@MAILSERV-2HIGH-A.FTP.COM>
To: vern@ee.lbl.gov
Subject: Re: TCP research issues from a tcp-impl perspective
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@cthulhu.engr.sgi.com
Repository: mailserv-2high-a.ftp.com, [message accepted at Tue Jun 24 20:25:46 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||> I'm not sure if your list implies window scaling, SACK and fast recovery
||> but it seems to me that they should get added to the MUST list as we go
||> forward.  I know its not research at this point but...
||
||I wasn't thinking of bringing these up, because as you note they're not
||research.  Since they're implemented as TCP options (well, except for
||fast recovery), it seems a little funny to make them MUST's - I also
||don't see why they have to be MUST's in order for things to go forward.
||The key is that implementations that care about high performance implement
||them, and for that still works even if they're options.

fair enough; I'll desist on this one.

L.




From owner-tcp-impl@relay.engr.sgi.com  Tue Jun 24 17:35:45 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA22992 for tcp-impl-list; Tue, 24 Jun 1997 17:34:04 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA22968 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 17:34:01 -0700
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.219]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA23206
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 17:34:00 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185]) by palrel3.hp.com with SMTP (8.7.5/8.7.3) id RAA11928 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 17:34:00 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA27226; Tue, 24 Jun 1997 17:33:51 -0700
Message-Id: <33B0676F.6731@cup.hp.com>
Date: Tue, 24 Jun 1997 17:33:51 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: backman@ftp.com
Cc: vern@ee.lbl.gov, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective
References: <199706242342.TAA26265@MAILSERV-2HIGH-A.FTP.COM>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The way it has been explained to me - reducing the ACK frequency can yield
> up to 20% performance improvement over a LAN/decrease collisions on a LAN,
> etc.

I could believe it more or less - the per-packet processing (as opposed
to per byte) is "roughly" the same for an ACK and a data packet, and if
you have the acks go to epsilon, that is a 33% reduction in the number
of packets processed by the systems. 
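
The 33% figure can be checked with a few lines of Python.  The sketch
assumes, as stated above, that an ACK and a data packet cost about the
same per-packet processing, and that the baseline is one ACK for every
two data packets; the function names are hypothetical.

```python
from fractions import Fraction


def packets_per_data_packet(data_per_ack):
    """Total packets (data plus ACKs) handled per data packet when one
    ACK is sent for every `data_per_ack` data packets."""
    return 1 + Fraction(1, data_per_ack)


def reduction(old_data_per_ack, new_data_per_ack=None):
    """Fractional drop in packet count when the ACK policy changes.
    new_data_per_ack=None models ACKs going to (almost) nothing."""
    old = packets_per_data_packet(old_data_per_ack)
    if new_data_per_ack is None:
        new = Fraction(1)
    else:
        new = packets_per_data_packet(new_data_per_ack)
    return (old - new) / old


if __name__ == "__main__":
    print(reduction(2))      # ACKs eliminated entirely
    print(reduction(2, 4))   # backing off to 1 ACK per 4 data packets
```

Backing off only to 1 ACK per 4 packets removes a smaller share of the
packet count than eliminating ACKs outright.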

> It personally bothers me that something like this could possibly be
> deployed without serious research cycles invested by non-commercial
> parties.

You sayin' .com can't do serious research? I wonder if there are any
dueling fields left near Munich :) (though I doubt I could attend :( )

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Tue Jun 24 17:59:36 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA29009 for tcp-impl-list; Tue, 24 Jun 1997 17:56:47 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA29004 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Jun 1997 17:56:45 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id SAA29625 for tcp-impl@cthulhu.engr.sgi.com; Tue, 24 Jun 1997 18:56:35 -0600
Date: Tue, 24 Jun 1997 18:56:35 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706250056.SAA29625@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > The way it has been explained to me - reducing the ACK frequency can yield
> > up to 20% performance improvement over a LAN/decrease collisions on a LAN,
> > etc.
> 
> I could believe it more or less - the per-packet processing (as opposed
> to per byte) is "roughly" the same for an ACK and a data packet, and if
> you have the acks go to epsilon, that is a 33% reduction in the number
> of packets processed by the systems. 

If you hack the BSD code (e.g. play with the delayed ACK stuff) to
significantly reduce the number of ACKs, then you can avoid the
Ethernet Capture Effect.  On the other hand, if you use a TCP window of
more than ~20-30K and fully 802.3-compliant MAC chips without BLAM, you
will hit the Ethernet capture effect and you will get only about 70% of
Ethernet bandwidth (10 Mbit/s or 100 Mbit/s).

I've never figured out a reasonable and trivial way to reduce ACKs on
"highly reliable" (Ethernet) LANs without messing up high speed WANs.

Given BLAM, I doubt fiddling with the ACK rate is justified for Ethernet.

Those interested in paths with asymmetric bandwidth and who are already
using "ACK compression" (e.g. "cable-TV modems") would probably
appreciate such a mechanism.  (I don't speak for such people, but I've
known some.)  Whether that would be a Good Thing(tm) for the Internet
is not clear, at least to me.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 02:13:29 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA03300 for tcp-impl-list; Wed, 25 Jun 1997 02:12:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA03283 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 02:12:04 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id CAA22466
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 02:12:03 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Wed, 25 Jun 1997 05:12:01 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Wed, 25 Jun 1997 05:12:01 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id FAA27440; Wed, 25 Jun 1997 05:14:32 -0400
Date: Wed, 25 Jun 1997 05:14:32 -0400
Message-Id: <199706250914.FAA27440@MAILSERV-2HIGH-A.FTP.COM>
To: raj@hpisrdq.cup.hp.com
Subject: Re: TCP research issues from a tcp-impl perspective
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: vern@ee.lbl.gov, tcp-impl@cthulhu.engr.sgi.com
Repository: mailserv-2high-a.ftp.com, [message accepted at Wed Jun 25 05:14:24 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||> It personally bothers me that something like this could possibly be
||> deployed without serious research cycles invested by non-commercial
||> parties.
||
||You sayin' .com can't do serious research? I wonder if there are any
||dueling fields left near Munich :) (though I doubt I could attend :( )
||
Not at all; I'm saying that I want impartial and objective analysis
independent of politico/marketing battles to make "my workstation
faster than your workstation".

L.



From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 04:30:41 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA13207 for tcp-impl-list; Wed, 25 Jun 1997 04:29:14 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA13200 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 04:29:12 -0700
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id EAA10829
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 04:29:08 -0700
	env-from (mouse@Twig.Rodents.Montreal.QC.CA)
Received: (from mouse@localhost)
	by Twig.Rodents.Montreal.QC.CA (8.8.5/8.8.5) id HAA11783;
	Wed, 25 Jun 1997 07:29:05 -0400 (EDT)
Date: Wed, 25 Jun 1997 07:29:05 -0400 (EDT)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199706251129.HAA11783@Twig.Rodents.Montreal.QC.CA>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>> It personally bothers me that something like this could possibly be
>> deployed without serious research cycles invested by non-commercial
>> parties.

> You sayin' .com can't do serious research?

I'm not the person who wrote the text you're replying to, but I agree
with it, and here's why: commercial research has a nasty tendency to
end up doing the commercially expedient thing instead of the right
thing.  Just think about how often we've seen, on this very list,
someone from some commercial TCP vendor say something like "yeah, it
may be broken, but we had to do it because customers were saying "but
it works with vendor X's stack".".

Briefly: I don't trust partisan research.  Apparently whoever wrote the
initial quote doesn't either.

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 07:54:58 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA06855 for tcp-impl-list; Wed, 25 Jun 1997 07:53:22 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA06844 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 07:53:18 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id HAA16483
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 07:53:16 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Wed, 25 Jun 1997 10:53:08 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Wed, 25 Jun 1997 10:53:08 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id KAA29127; Wed, 25 Jun 1997 10:55:39 -0400
Date: Wed, 25 Jun 1997 10:55:39 -0400
Message-Id: <199706251455.KAA29127@MAILSERV-2HIGH-A.FTP.COM>
To: mouse@Rodents.Montreal.QC.CA
Subject: Re: TCP research issues from a tcp-impl perspective
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: tcp-impl@cthulhu.engr.sgi.com
Repository: mailserv-2high-a.ftp.com, [message accepted at Wed Jun 25 10:55:35 1997]
Originating-Client: tunes.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||> You sayin' .com can't do serious research?
||
||I'm not the person who wrote the text you're replying to, but I agree
||with it, and here's why: commercial research has a nasty tendency to
||end up doing the commercially expedient thing instead of the right
||thing.  Just think about how often we've seen, on this very list,
||someone from some commercial TCP vendor say something like "yeah, it
||may be broken, but we had to do it because customers were saying "but
||it works with vendor X's stack".".
||
I wrote the initial quote and you also quoted words of mine above :-).

And in a nutshell, that's why us commercial types love having the research
types to keep us honest :-)


From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 09:27:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA28255 for tcp-impl-list; Wed, 25 Jun 1997 09:25:39 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA28244 for <tcp-impl@engr.sgi.com>; Wed, 25 Jun 1997 09:25:36 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id JAA17293
	for <tcp-impl@engr.sgi.com>; Wed, 25 Jun 1997 09:25:35 -0700
	env-from (solensky@ftp.com)
Received: from ftp.com by ftp.com  ; Wed, 25 Jun 1997 12:25:34 -0400
Received: from mailserv-2high.ftp.com by ftp.com  ; Wed, 25 Jun 1997 12:25:34 -0400
Received: from fenway.ftp.com by MAILSERV-2HIGH.FTP.COM (SMI-8.6/SMI-SVR4)
	id MAA06230; Wed, 25 Jun 1997 12:21:41 -0400
Message-Id: <199706251621.MAA06230@MAILSERV-2HIGH.FTP.COM>
X-Mapi-Messageclass: IPM
To: vern@ee.lbl.gov, backman@ftp.com
Cc: tcp-impl@engr.sgi.com
X-Mailer: FTP Software Internet Mail 2.0
Mime-Version: 1.0
From: Frank T Solensky <solensky@ftp.com>
Subject: RE: TCP research issues from a tcp-impl perspective
Date: Wed, 25 Jun 1997 12:25:28 -0400
Content-Type: text/plain; charset=US-ASCII; X-MAPIextension=".TXT"
Content-Transfer-Encoding: quoted-printable
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>>Reply to your message of 6/24/97 8:27 PM
>
>> ... the desire to reduce the ACK stream on "highly reliable LAN"'s.
>
>I'll add this too if some other tcp-impl'ers chime in that they'd like to
>see it on the list.

I'd second but I'm not sure I really count as "other" in this case...

Larry: the paper Vern mentioned yesterday has some arguments
against it for connections over wide-area nets: see "stretch acks"
on page 11.

>I've never figured out a reasonable and trivial way to reduce ACKs on
>"highly reliable" (Ethernet) LANs without messing up high speed WANs.

Could testing for the destination address being on the same subnet
be this form of test?  Though I suspect the burstiness argument
could apply over a LAN as well.
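
A minimal sketch of the same-subnet test suggested above, in Python.
The function name and the idea of passing the local address and prefix
length as arguments are assumptions for illustration; a real stack
would consult its interface and routing tables instead.

```python
import ipaddress

def on_same_subnet(local_addr: str, prefix_len: int, peer_addr: str) -> bool:
    """Return True if peer_addr lies inside the local interface's subnet."""
    # strict=False lets us pass a host address and have the host bits masked off.
    local_net = ipaddress.ip_network(f"{local_addr}/{prefix_len}", strict=False)
    return ipaddress.ip_address(peer_addr) in local_net

# Example: two hosts on one /24 versus a host on a different network.
print(on_same_subnet("192.26.75.20", 24, "192.26.75.26"))  # True
print(on_same_subnet("192.26.75.20", 24, "192.26.80.37"))  # False
```

A test like this only says the peer is one hop away; it says nothing
about whether reduced ACKs would cause bursts on that LAN.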
								--Frank


From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 11:29:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA05786 for tcp-impl-list; Wed, 25 Jun 1997 11:27:14 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA05767 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 11:27:10 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA02532
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 25 Jun 1997 11:27:05 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id NAA06629;
	Wed, 25 Jun 1997 13:26:31 -0500 (CDT)
Date: Wed, 25 Jun 1997 13:26:31 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199706251826.NAA06629@frantic.BSDI.COM>
To: alan@lxorguk.ukuu.org.uk, vjs@mica.denver.sgi.com
Subject: Re: Proposed TCP Group Extensions
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Re:  ICMP MUST FRAGMENT tcp denial attacks

It seems to me that the easiest thing to do is to look
at the TCP ports in the returned header and verify that
they belong to an active connection, and if so, then
verify that the sequence number is within the send window
for that connection.  ICMP returns 64 bits beyond the IP
header, so that is 8 bytes, which for TCP is the port
numbers and the sequence field.

The attacker can't see the packets, so he'd have to guess
the sequence number.  If he can see the packets, it doesn't
really matter then, because there are worse things he can
do to you if he can see your packets.
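
The check described above might be sketched as follows, in Python. The
connection table layout, the function names, and the choice of
snd_una/snd_max as the bounds of "within the send window" are all
assumptions made for illustration, not any particular stack's code.

```python
import struct

def seq_in_send_window(seq, snd_una, snd_max):
    """True if snd_una <= seq < snd_max in 32-bit modular sequence space."""
    return ((seq - snd_una) & 0xFFFFFFFF) < ((snd_max - snd_una) & 0xFFFFFFFF)

def icmp_error_is_plausible(quoted_tcp, remote_addr, connections):
    """Validate the 8 bytes of TCP header echoed back in an ICMP error.

    quoted_tcp:  bytes following the quoted IP header (ports + sequence).
    remote_addr: destination address taken from the quoted IP header.
    connections: hypothetical table mapping
                 (local_port, remote_addr, remote_port) -> (snd_una, snd_max).
    """
    if len(quoted_tcp) < 8:
        return False
    # The quoted packet is one we sent, so its source port is our local port.
    local_port, remote_port, seq = struct.unpack("!HHI", quoted_tcp[:8])
    conn = connections.get((local_port, remote_addr, remote_port))
    if conn is None:
        return False  # no active connection matches the quoted ports
    snd_una, snd_max = conn
    return seq_in_send_window(seq, snd_una, snd_max)
```

With a table holding one connection whose outstanding data spans
sequence numbers 1000 to 3000, a quoted sequence of 1500 is accepted
and a quoted sequence of 5000 is rejected.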

			-David Borman, dab@bsdi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 11:37:10 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA08888 for tcp-impl-list; Wed, 25 Jun 1997 11:35:49 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA08873 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 11:35:47 -0700
Received: from postoffice.Reston.mci.net (postoffice.Reston.mci.net [204.70.128.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA04931
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 11:35:46 -0700
	env-from (gmiller@mci.net)
Received: from mci.net (ale [166.45.4.49])
	by postoffice.Reston.mci.net (8.8.5/8.8.5) with ESMTP id OAA14818;
	Wed, 25 Jun 1997 14:35:42 -0400 (EDT)
Message-Id: <199706251835.OAA14818@postoffice.Reston.mci.net>
X-Mailer: exmh version 1.6.9 8/22/96
To: Chris Schmechel <Chris.Schmechel@Eng.Sun.COM>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Input Needed - Testing Tools for TCP... 
In-reply-to: Your message of "Mon, 23 Jun 1997 15:04:08 +0800."
             <199706232204.PAA01627@mont-blanc.eng.sun.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 25 Jun 1997 14:35:42 -0400
From: Greg Miller <gmiller@mci.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


>I'm finishing up an Internet Draft which is a survey of testing  
>tools for TCP.  At this time, I would like to solicit out to the  
>general group suggestions/inputs for tools to include, etc.

You might find this page helpful:

http://www.nlanr.net/Caidants/meastools.html

Greg


-- 
Gregory J. Miller
vBNS Engineering
MCI Telecommunications              
Reston, VA 20191                                     gmiller@mci.net



From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 11:42:43 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA10974 for tcp-impl-list; Wed, 25 Jun 1997 11:41:14 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA10956 for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 11:41:11 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA06460
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 11:41:07 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id NAA06674;
	Wed, 25 Jun 1997 13:40:55 -0500 (CDT)
Date: Wed, 25 Jun 1997 13:40:55 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199706251840.NAA06674@frantic.BSDI.COM>
To: TCP-IMPL@cthulhu.engr.sgi.com, vjs@mica.denver.sgi.com, VOLZ@PROCESS.COM
Subject: Re:  ICMP must fragment and IPsec
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@cthulhu.engr.sgi.com Mon Jun 23 10:16:21 1997
> Date: Mon, 23 Jun 1997 09:04:58 -0600
> From: vjs@mica.denver.sgi.com (Vernon Schryver)
> To: TCP-IMPL@cthulhu.engr.sgi.com, VOLZ@PROCESS.COM (Bernie Volz)
> Subject: Re:  ICMP must fragment and IPsec
> Precedence: bulk
> 
> > From: VOLZ@PROCESS.COM (Bernie Volz)
> > To: vjs, TCP-IMPL@cthulhu.engr.sgi.com, IPSEC@TIS.COM
> 
> (I'm sending only to the TCP-IMPL list)
> 
> > ...
> > Please don't confuse MSS with MTU. The Maximum Segment Size has
> > *NOTHING* to do with MTU. The MSS reflects what the maximum segment
> > size a TCP implementation is willing and able to receive and that
> > has nothing to do with the MTU of an interface.
> > 
> > For example ... if MSS was MTU, what would happen if a multi-homed host
> > with an Ethernet and FDDI interface switched a connection from Ethernet
> > (w/MTU of 1500) to FDDI (w/MTU of 4352). You would *NOT* have wanted TCP
> > to send an MSS with only 1500.
> 
> 
> On the contrary, please understand the meaning and use of the MSS, and
> particularly the use of the MSS in preventing fragmentation.  If the
> MSS were only the maximum TCP segment that you could reassemble from
> IP fragments, then practically every system would request 64K.
> 
> If your TCP implementation follows the (at least de facto) standard,
> you will use an MSS of 1500 if the interface your host used to send
> its first packets for the TCP connection was the Ethernet interface.
> Moreover, a case can be made in that if a routing change might ever
> cause the TCP connection to switch from an initial choice of your FDDI
> interface to the Ethernet interface, then you negotiate an MSS of 1500
> in order to prevent fragmentation.
> 
> 
> Vernon Schryver,  vjs@sgi.com

Vernon,
I disagree with you and agree with Bernie.  If you have both an
FDDI and an Ethernet interface, it makes sense to use an MSS of 4352,
not 1500.  Then, if routing changes from the ethernet to the FDDI,
the Path MTU code can discover the larger MTU.  But if you sent a
1500 MSS, you're stuck.

Most systems should be able to send an MSS of 65535, but it is
an optimization to use the MTU of the largest/outgoing interface.
And in the bad old days, hosts weren't smart enough to use
smaller packets when presented with a large MSS, so using the
MTU of the outgoing interface made much more sense.  But in today's
world, there's no reason to limit yourself to small MSS values.
The biggest problem with PMTU is all the hosts that still stick
in 576 in the MSS because the other host is not "local".  Using
1500 instead of 4352 when you have both FDDI and Ethernet is
the same sort of problem, only not as severe.
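
A rough sketch of the argument above, in Python. It assumes the sender
limits each segment to the smaller of the MSS the peer advertised and
what the current path MTU allows, and it subtracts 40 bytes of IPv4 and
TCP headers (no options) from the MTU, whereas the thread uses the raw
MTU figures as shorthand. The function name is hypothetical.

```python
IP_TCP_HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options

def effective_send_mss(peer_mss, path_mtu):
    """Largest TCP payload the sender will put in one segment."""
    return min(peer_mss, path_mtu - IP_TCP_HEADERS)

# Peer advertised an Ethernet-sized MSS; the route later moves to FDDI.
print(effective_send_mss(1460, 4352))  # 1460: stuck at the small value
# Peer advertised an FDDI-sized MSS; PMTU discovery can use the larger MTU.
print(effective_send_mss(4312, 4352))  # 4312
# Same large advertisement while the path is still Ethernet.
print(effective_send_mss(4312, 1500))  # 1460
```

The advertised MSS acts only as a ceiling here; the path MTU does the
rest, which is the point of advertising the larger value.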

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 11:58:55 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA16751 for tcp-impl-list; Wed, 25 Jun 1997 11:57:24 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA16730 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 11:57:21 -0700
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA11651
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 11:57:21 -0700
	env-from (amr@hplms2.hpl.hp.com)
Received: from hplms2.hpl.hp.com by hplms26.hpl.hp.com with ESMTP
	(1.37.109.16/15.5+ECS 3.3+HPL1.1S) id AA080785038; Wed, 25 Jun 1997 11:57:19 -0700
Received: from cslseed3 (cslseed3.hpl.hp.com) by hplms2.hpl.hp.com with SMTP
	(1.37.109.16/15.5+ECS 3.3+HPL1.1I) id AA056415041; Wed, 25 Jun 1997 11:57:21 -0700
Message-Id: <33B16A0A.42FD@hpl.hp.com>
Date: Wed, 25 Jun 1997 11:57:14 -0700
From: "Amr A. Awadallah" <amr@hplms2.hpl.hp.com>
Organization: HP Labs
X-Mailer: Mozilla 3.0 (WinNT; I)
Mime-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Given BLAM, I doubt fiddling with the ACK rate is justified for Ethernet.

  Given Ethernet switches and full duplex cards (with one host per port,
which is the trend), fiddling with the ACK rate is not justified
since the capture effect is not there in the first place.

  Also fiddling with the ACK rate will change the rate at which cwnd
increases, especially during slow-start.  If we ACK once every 4 packets,
then this means cwnd will increase by 4 for each ACK, and 4 packets will
be released into the net back-to-back. Back-to-back packets -> capture
effect. But again this is only during the slow-start phase, which is
short for reliable networks: the source's window hits the receiver's
advertised window pretty quickly and stays stable at that value.

  I wonder if anybody out there is doing research on how to regulate the
back-to-back "effect". It has been pointed out to me by others that
selective ACKs combined with the FACK mechanism may provide a solution
to this problem, but I don't see how. The source still does not have
immediate information of how many packets left the network. The
information is always delayed by the delayed-ACK period.

-- Amr

-- 
 Amr A. Awadallah       ####  /    ####   Hewlett-Packard
 Computer Systems Lab   ###  /_  __ ###   MS: 3L-1
 amr@hpl.hp.com         ##  / / /_/  ##   1501 Page Mill Road
 Phone: (415) 236-2381  ###    /    ###   Palo Alto, CA 94304
 FAX:   (415) 857-7029  ####  /    ####

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 12:12:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA20886 for tcp-impl-list; Wed, 25 Jun 1997 12:09:25 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA20870 for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 12:09:22 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id NAA01870; Wed, 25 Jun 1997 13:09:13 -0600
Date: Wed, 25 Jun 1997 13:09:13 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706251909.NAA01870@mica.denver.sgi.com>
To: VOLZ@PROCESS.COM, TCP-IMPL@cthulhu.engr.sgi.com
Subject: Re:  ICMP must fragment and IPsec
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: David Borman <dab@BSDI.COM>
> To: TCP-IMPL@cthulhu.engr.sgi.com, vjs, VOLZ@PROCESS.COM

> ...
> Vernon,
> I disagree with you and agree with Bernie.  If you have both an
> FDDI and an Ethernet interface, it makes sense to use an MSS of 4352,
> not 1500.  Then, if routing changes from the ethernet to the FDDI,
> the Path MTU code can discover the larger MTU.  But if you sent a
> 1500 MSS, you're stuck.
> 
> Most systems should be able to send an MSS of 65535, but it is
> an optimization to use the MTU of the largest/outgoing interface.
> And in the bad old days, hosts weren't smart enough to use
> smaller packets when presented with a large MSS, so using the
> MTU of the outgoing interface made much more sense.  But in today's
> world, there's no reason to limit yourself to small MSS values.
> The biggest problem with PMTU is all the hosts that still stick
> in 576 in the MSS because the other host is not "local".  Using
> 1500 instead of 4352 when you have both FDDI and Ethernet is
> the same sort of problem, only not as severe.


Why not use an MSS of 65535?  What is being optimized by the choice of
MSS?  I think most systems can IP fragment faster than they can TCP
segment, so they are better off sending 64K TCP segments as a bunch of
IP fragments.  (I've liked that idea ever since someone told me about
using it in an FDDI adapter for a certain supercomputer company's bus
to get what was then an impressive TCP/IP/FDDI speed.)

What if someone turns on that HIPPI interface with the 64K MTU that was
turned off when the TCP connection was established?  (Those 700
Mbit/sec TCP/IP/HIPPI numbers are a tad easier with a full size MSS.)

The reason I see to use a smaller MSS is safety or conservatism.  If
PMTU discovery were perfect, there would be no reason to use anything
except MSS=64K.  In the real world, if you use a larger MSS than your
minimum MTU, then you risk fragmentation if PMTU discovery is not used
or usable by the TCP peer.  If you use too small an MSS, you pay in
performance.  (As Rick Jones and I keep saying, a good value for the
cost of using an MTU of 1500 over FDDI is 4352/1500, and that ain't hay.)
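
To put a number on the 4352/1500 figure, here is a small worked example
in Python. It assumes 40 bytes of IPv4 and TCP headers per packet and a
1,000,000-byte transfer picked purely for illustration; the function
name is hypothetical.

```python
def packets_needed(payload_bytes, mtu, headers=40):
    """Number of full-MTU packets needed to carry payload_bytes of TCP data."""
    per_packet = mtu - headers
    return (payload_bytes + per_packet - 1) // per_packet  # integer ceiling

ethernet_sized = packets_needed(1_000_000, 1500)
fddi_sized = packets_needed(1_000_000, 4352)
print(ethernet_sized)  # 685
print(fddi_sized)      # 232
print(round(4352 / 1500, 2))  # 2.9
```

Roughly three times as many packets, and so roughly three times the
per-packet overhead, for the same data.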

In the FDDI/Ethernet case and if you always use MSS=4352, what happens if:

  1. the peer could do PMTU discovery but it has been turned off
    because some smart guy has installed some Cabletron or Network
    Peripherals FDDI-Ethernet bridges, or a DEC FDDI-Ethernet bridge
    and failed to set the bridge's IP address.

  2. the peer is one of the many millions of boxes that either do
    not support PMTU discovery at all or do not support it by default.

In #1, TCP will work for some stuff, but simply not work at all for big
stuff.  At SGI we demonstrated that between a couple of major internal
email systems, where small messages went just fine, but bigger messages
started and then the TCP connection would mysteriously die as the
bridges silently discarded packets with DF=1.

In #2, at best your routers will be IP fragmenting.
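
For case #2, a sketch of how many IP fragments a router would produce,
in Python. It assumes a 20-byte IPv4 header without options and the
usual rule that every fragment except the last carries a multiple of 8
bytes of payload; the function name is hypothetical.

```python
def ip_fragment_count(datagram_len, link_mtu, ip_header=20):
    """Fragments produced when a datagram crosses a link with a smaller MTU."""
    payload = datagram_len - ip_header
    # Fragment offsets are in 8-byte units, so round the per-fragment payload down.
    per_fragment = (link_mtu - ip_header) // 8 * 8
    return -(-payload // per_fragment)  # integer ceiling

print(ip_fragment_count(4352, 1500))   # 3: FDDI-sized datagram onto Ethernet
print(ip_fragment_count(1500, 1500))   # 1: fits, no fragmentation
print(ip_fragment_count(65535, 1500))  # 45: maximum-size datagram onto Ethernet
```

In case #1 the same datagrams carry DF=1, so instead of being split
they are dropped, which matches the stalled mail transfers described
above.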


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 12:13:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA21556 for tcp-impl-list; Wed, 25 Jun 1997 12:12:04 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA21538 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 12:11:58 -0700
Received: from aruba.lerc.nasa.gov (aruba.lerc.nasa.gov [139.88.35.16]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA16286
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 12:11:57 -0700
	env-from (mallman@lawyers.lerc.nasa.gov)
Received: from lawyers.lerc.nasa.gov by aruba.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01a-main)
        id PAA24320; Wed, 25 Jun 1997 15:11:46 -0400 (EDT)
Received: from lawyers by lawyers.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-local)
        id PAA01661; Wed, 25 Jun 1997 15:11:44 -0400 (EDT)
Message-Id: <199706251911.PAA01661@lawyers.lerc.nasa.gov>
To: "Amr A. Awadallah" <amr@hplms2.hpl.hp.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
From: Mark Allman <mallman@skynet.lerc.nasa.gov>
Reply-To: mallman@skynet.lerc.nasa.gov
Subject: Re: TCP research issues from a tcp-impl perspective 
Date: Wed, 25 Jun 1997 15:11:44 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


A thesis on extended ACK intervals done a couple of years ago at OU
by Stacy Johnson (with Shawn Ostermann) is available at...

    http://jarok.cs.ohiou.edu/papers/

As I recall, extending the ACK interval has a positive effect on
transfers on local networks but doesn't impact performance much on
wide-area networks.  If memory serves me correctly, these conclusions
hold for long transfers (I don't believe short transfers were part
of the work).  Also, I don't remember any discussion about the
implications to slow start.  However, the paper may provide some
useful data points.

allman

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 12:34:03 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA26224 for tcp-impl-list; Wed, 25 Jun 1997 12:32:27 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA26219 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 12:32:25 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA23336
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 12:32:24 -0700
	env-from (craig@aland.bbn.com)
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id MAA14997; Wed, 25 Jun 1997 12:29:06 -0700 (PDT)
Message-Id: <199706251929.MAA14997@aland.bbn.com>
To: "Amr A. Awadallah" <amr@hplms2.hpl.hp.com>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective 
In-reply-to: Your message of Wed, 25 Jun 97 11:57:14 -0700.
             <33B16A0A.42FD@hpl.hp.com> 
Date: Wed, 25 Jun 97 12:29:06 -0700
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


      I wonder if anybody out there is doing research on how to regulate the
    back-to-back "effect".

Tim Shepard and I are working on it in a different context (turns out
back-to-back packets also cause interesting overrun effects in routers
in some circumstances).

Craig

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 12:47:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA28595 for tcp-impl-list; Wed, 25 Jun 1997 12:45:32 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA28586 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 12:45:30 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id MAA27986
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 12:45:29 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Wed, 25 Jun 1997 15:41:30 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Wed, 25 Jun 1997 15:41:30 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id PAA02164; Wed, 25 Jun 1997 15:44:00 -0400
Date: Wed, 25 Jun 1997 15:44:00 -0400
Message-Id: <199706251944.PAA02164@MAILSERV-2HIGH-A.FTP.COM>
To: craig@aland.bbn.com
Subject: Re: TCP research issues from a tcp-impl perspective 
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: amr@hplms2.hpl.hp.com, tcp-impl@cthulhu.engr.sgi.com
Repository: mailserv-2high-a.ftp.com, [message accepted at Wed Jun 25 15:43:52 1997]
Originating-Client: tunes.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||      I wonder if anybody out there is doing research on how to regulate the
||    back-to-back "effect".
||
||Tim Shepard and I are working on it in a different context (turns out
||back-to-back packets also cause interesting overrun effects in routers
||in some circumstances).
||
Oooh cool; you mean if I crank up my Pentium w/ a fast enough ether-card
I can toast a router, nah, couldn't happen :-)


From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 12:51:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA29527 for tcp-impl-list; Wed, 25 Jun 1997 12:49:50 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA29517 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 12:49:48 -0700
Received: from picard.cs.ohiou.edu (picard.cs.ohiou.edu [132.235.3.128]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA29049
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 12:49:45 -0700
	env-from (sdo@picard.cs.ohiou.edu)
Received: from picard.cs.ohiou.edu by picard.cs.ohiou.edu (8.6.11/1.930630)
	id PAA03733; Wed, 25 Jun 1997 15:49:38 -0400
Message-Id: <199706251949.PAA03733@picard.cs.ohiou.edu>
To: Chris Schmechel <Chris.Schmechel@Eng.Sun.COM>
From: "Shawn Ostermann" <sdo@picard.cs.OhioU.Edu>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Input Needed - Testing Tools for TCP... 
Date: Wed, 25 Jun 1997 15:49:38 -0400
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> I'm finishing up an Internet Draft which is a survey of testing  
> tools for TCP.  At this time, I would like to solicit out to the  
> general group suggestions/inputs for tools to include, etc.

Several folks have gotten extremely useful information from the
tcptrace TCP analysis/visualization program that we did here.  Its
home page is at:

http://jarok.cs.ohiou.edu/software/tcptrace/tcptrace.html

Shawn
-------------------------------------------------------------------------
   Dr. Shawn Ostermann  -  Assistant Professor  -  Ohio University
      140 Morton Hall, Ohio University, Athens, Ohio  45701-2979
 ostermann@cs.ohiou.edu -- FAX: (614)593-0406 -- Voice: (614)593-1242
    http://www.cs.ohiou.edu/~osterman   http://jarok.cs.ohiou.edu

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 12:56:45 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA01317 for tcp-impl-list; Wed, 25 Jun 1997 12:55:24 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA01302 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 12:55:22 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA01013
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 12:55:00 -0700
	env-from (craig@aland.bbn.com)
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id MAA15117; Wed, 25 Jun 1997 12:54:04 -0700 (PDT)
Message-Id: <199706251954.MAA15117@aland.bbn.com>
To: backman@ftp.com
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective 
In-reply-to: Your message of Wed, 25 Jun 97 15:44:00 -0400.
             <199706251944.PAA02164@MAILSERV-2HIGH-A.FTP.COM> 
Date: Wed, 25 Jun 97 12:54:04 -0700
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    
    ||      I wonder if anybody out there is doing research on how to regulate the
    ||    back-to-back "effect".
    ||
    ||Tim Shepard and I are working on it in a different context (turns out
    ||back-to-back packets also cause interesting overrun effects in routers
    ||in some circumstances).
    ||
    Oooh cool; you mean if I crank up my Pentium w/ a fast enough ether-card
    I can toast a router, nah, couldn't happen :-)

Actually no, you toast your own TCP connection....

Craig

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 13:11:23 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA05259 for tcp-impl-list; Wed, 25 Jun 1997 13:09:48 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA05248 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 13:09:46 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA06728
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 13:09:45 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id NAA03407; Wed, 25 Jun 1997 13:02:47 -0700 (PDT)
Message-Id: <199706252002.NAA03407@daffy.ee.lbl.gov>
To: "Amr A. Awadallah" <amr@hplms2.hpl.hp.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective
In-reply-to: Your message of Wed, 25 Jun 1997 11:57:14 PDT.
Date: Wed, 25 Jun 1997 13:02:47 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>   Also fiddling with the ACK rate will change the rate at which cwnd
> increases, specially during slow-start.  If we ACK once every 4 packets,
> then this means cwnd will increase by 4 for each ACK ...

If following RFC 2001, cwnd increases by 1 for each ack, regardless of
how much new data was acked (provided it's > 0).  So acking once every
4 packets leads to cwnd opening *slower* (since the ack arrival rate is
lower) than acking every 2 packets.
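
A toy model of the rule stated above, in Python. It assumes cwnd is
counted in segments, the sender transmits a full cwnd each round trip,
nothing is lost, the receiver window never limits the sender, and any
leftover segments in a round are still covered by one final ACK. The
function name and these simplifications are illustrative only.

```python
def slow_start_cwnd(rounds, segments_per_ack, initial_cwnd=1):
    """cwnd (in segments) after each round trip, growing by 1 per ACK received."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rounds):
        # One ACK per segments_per_ack segments, rounded up for the leftovers.
        acks = -(-cwnd // segments_per_ack)
        cwnd += acks  # +1 segment per ACK, however much data each ACK covers
        history.append(cwnd)
    return history

print(slow_start_cwnd(5, 1))  # [1, 2, 4, 8, 16, 32]
print(slow_start_cwnd(5, 2))  # [1, 2, 3, 5, 8, 12]
print(slow_start_cwnd(5, 4))  # [1, 2, 3, 4, 5, 7]
```

Under these assumptions, acking every 4 packets opens the window more
slowly than acking every 2, which in turn is slower than acking every
packet.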

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 13:14:03 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA06155 for tcp-impl-list; Wed, 25 Jun 1997 13:12:46 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA06149 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 13:12:44 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA07998
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 13:12:43 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA26783>; Wed, 25 Jun 1997 13:12:31 -0700
Date: Wed, 25 Jun 1997 13:12:23 -0700
Posted-Date: Wed, 25 Jun 1997 13:12:23 -0700
Message-Id: <199706252012.NAA21200@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <NAA21200>; Wed, 25 Jun 1997 13:12:23 -0700
To: amr@hplms2.hpl.hp.com, craig@aland.bbn.com
Subject: Re: TCP research issues from a tcp-impl perspectiveT
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@cthulhu.engr.sgi.com Wed Jun 25 12:37:32 1997
> To: "Amr A. Awadallah" <amr@hplms2.hpl.hp.com>
> Cc: tcp-impl@cthulhu.engr.sgi.com
> Subject: Re: TCP research issues from a tcp-impl perspective 
> Date: Wed, 25 Jun 97 12:29:06 -0700
> From: Craig Partridge <craig@aland.bbn.com>
> 
> 
>       I wonder if anybody out there is doing research on how to regulate the
>     back-to-back "effect".
> 
> Tim Shepard and I are working on it in a different context (turns out
> back-to-back packets also cause interesting overrun effects in routers
> in some circumstances).
> 
> Craig
> 

This is related to the reason why Keshav's 'packet pair'
(sending two packets back to back through a path to discover
the bottleneck bandwidth) may not measure the max bandwidth.

I brought this up back in 90-91, but the claim was that 
'work conserving' queuing would avoid the problem. It may be
the case that this "effect" is an artifact that destroys the
work-conserving property of an otherwise-conserving discipline...
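
For reference, the basic packet-pair arithmetic looks roughly like the
sketch below.  It assumes the two packets queue back to back at the
bottleneck and nothing perturbs their spacing afterwards, which is exactly
the assumption being questioned here.  The function name and units are
invented for illustration.

```python
def packet_pair_estimate(packet_bytes, arrival_gap_s):
    """Estimate bottleneck bandwidth (bits/s) from one packet pair.

    packet_bytes:  size of the second packet of the pair
    arrival_gap_s: spacing between the two arrivals at the receiver

    Assumes the gap equals the second packet's transmission time on the
    bottleneck link; cross traffic or later bunching breaks this.
    """
    if arrival_gap_s <= 0:
        raise ValueError("arrival gap must be positive")
    return packet_bytes * 8 / arrival_gap_s
```

If something downstream of the bottleneck compresses the pair, the
measured gap shrinks and the estimate comes out too high; if other
traffic slips in between the two packets, it comes out too low.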

Joe

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 13:19:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA07736 for tcp-impl-list; Wed, 25 Jun 1997 13:18:06 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA07708 for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 13:18:03 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA10139
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 13:17:53 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA26972>; Wed, 25 Jun 1997 13:16:56 -0700
Date: Wed, 25 Jun 1997 13:16:49 -0700
Posted-Date: Wed, 25 Jun 1997 13:16:49 -0700
Message-Id: <199706252016.NAA21314@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <NAA21314>; Wed, 25 Jun 1997 13:16:49 -0700
To: VOLZ@PROCESS.COM, TCP-IMPL@cthulhu.engr.sgi.com, vjs@mica.denver.sgi.com
Subject: Re:  ICMP must fragment and IPsec
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> To: VOLZ@PROCESS.COM, TCP-IMPL@cthulhu.engr.sgi.com
> Subject: Re:  ICMP must fragment and IPsec
> 
> > From: David Borman <dab@BSDI.COM>
> > To: TCP-IMPL@cthulhu.engr.sgi.com, vjs, VOLZ@PROCESS.COM
> 
> > ...
> > Vernon,
> > I disagree with you and agree with Bernie.  If you have both an
> > FDDI and an ethernet interface, it makes sense to use an MSS of 4352,
> > not 1500.  Then, if routing changes from the ethernet to the FDDI,
> > the Path MTU code can discover the larger MTU.  But if you sent a
> > 1500 MSS, you're stuck.
> > 
> > Most systems should be able to send an MSS of 65535, but it is
> > an optimization to use the MTU of the largest/outgoing interface.
> 
> 
> Why not use an MSS of 65535?  What is being optimized by the choice of
> MSS?  I think most systems can IP fragment faster than they can TCP
> segment, so they are better off sending 64K TCP segments as a bunch of
> IP fragments.  (I've liked that idea ever since someone told me about
> using it in an FDDI adapter for a certain supercomputer company's bus
> to get what was then an impressive TCP/IP/FDDI speed.)
> 
> The reason I see to use a smaller MSS is safety or conservatism.  If

Or latency. Reducing the size of the data chunks reduces the
store-and-forward latency throughout the path, including at the end
hosts.
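
A back-of-the-envelope version of the latency point, as a sketch only.  It
assumes identical links, no queueing, no propagation delay, no header
overhead, and that total_bytes is a multiple of chunk_bytes; every name
below is made up for the example.

```python
def store_and_forward_delay(total_bytes, chunk_bytes, hops, link_bps):
    """Seconds until the last bit of total_bytes has crossed `hops` links.

    Each hop must receive a whole chunk before it can forward it
    (store and forward).  Idealised model: equal link speeds, no
    queueing, no propagation delay, no per-chunk headers.
    """
    chunks = total_bytes // chunk_bytes
    per_chunk_s = chunk_bytes * 8 / link_bps
    # The first chunk pays the serialisation time on every hop; the
    # remaining chunks pipeline behind it, one chunk time apart.
    return (hops + chunks - 1) * per_chunk_s
```

With 64 Kbytes over three hops, sending it as one chunk costs three full
serialisation times, while 1 Kbyte chunks overlap on the links and finish
in roughly a third of that.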

> 
> In #2, at best your routers will be IP fragmenting.
> 

PS - once things fragment, and I lose a frag of a packet, the packet is
hosed. ATM all over again.

Joe

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 14:04:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA21680 for tcp-impl-list; Wed, 25 Jun 1997 14:01:35 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA21660 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 14:01:31 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA23407
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 14:00:46 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id VAA06490; Wed, 25 Jun 1997 21:55:51 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wgzFs-0005FjC; Wed, 25 Jun 97 22:05 BST
Message-Id: <m0wgzFs-0005FjC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP research issues from a tcp-impl perspective
To: solensky@ftp.com (Frank T Solensky)
Date: Wed, 25 Jun 1997 22:05:32 +0100 (BST)
Cc: vern@ee.lbl.gov, backman@ftp.com, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199706251621.MAA06230@MAILSERV-2HIGH.FTP.COM> from "Frank T Solensky" at Jun 25, 97 12:25:28 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Could testing for the destination address being on the same subnet
> be this form of test?  Though I suspect the burstiness argument
> could also apply over a LAN as well..

Same subnet for ATM can easily be an international link - ditto for
private SMDS nets



From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 14:04:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA21840 for tcp-impl-list; Wed, 25 Jun 1997 14:01:54 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA21818 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 14:01:47 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA23725
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 14:01:10 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id WAA06540; Wed, 25 Jun 1997 22:00:16 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wgzKI-0005FfC; Wed, 25 Jun 97 22:10 BST
Message-Id: <m0wgzKI-0005FfC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP research issues from a tcp-impl perspective
To: amr@hplms2.hpl.hp.com (Amr A. Awadallah)
Date: Wed, 25 Jun 1997 22:10:06 +0100 (BST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <33B16A0A.42FD@hpl.hp.com> from "Amr A. Awadallah" at Jun 25, 97 11:57:14 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> to this problem, but I don't see how. The source still does not have
> immediate information of how many packets left the network. The
> information is always delayed by the delayed-ACK period.

I played briefly with treating the cwnd as "packets/timeperiod" value
and over a 28.8 with some crude tests it seemed to reduce packet loss
and burstiness. Trying to extrapolate this to make it work at 10baseT
in the Linux kernel was basically not viable so I took it no further.
Also, as I'm not a mathematician of any kind, I've no idea if the scheme
has any mathematical rights/wrongs.
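
One possible reading of "packets/timeperiod", sketched purely as an
illustration: spread a window's worth of packets evenly over a period
instead of sending them back to back.  This is not a description of what
the Linux code did; the function and its parameters are hypothetical.

```python
def pacing_schedule(cwnd_packets, period_s, start_s=0.0):
    """Return evenly spaced send times for cwnd_packets in one period.

    Hypothetical helper: treats cwnd as a rate (packets per period)
    rather than a burst allowance.  The period might be an RTT
    estimate; that choice is an assumption, not from the thread.
    """
    if cwnd_packets <= 0:
        return []
    gap = period_s / cwnd_packets
    return [start_s + i * gap for i in range(cwnd_packets)]
```

For example, 4 packets over a 200 ms period would go out 50 ms apart.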



From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 14:12:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA25358 for tcp-impl-list; Wed, 25 Jun 1997 14:10:43 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA25345 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 14:10:40 -0700
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA28099
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 14:10:38 -0700
	env-from (amr@hplms2.hpl.hp.com)
Received: from hplms2.hpl.hp.com by hplms26.hpl.hp.com with ESMTP
	(1.37.109.16/15.5+ECS 3.3+HPL1.1S) id AA110093038; Wed, 25 Jun 1997 14:10:38 -0700
Received: from cslseed3 (cslseed3.hpl.hp.com) by hplms2.hpl.hp.com with SMTP
	(1.37.109.16/15.5+ECS 3.3+HPL1.1I) id AA121403045; Wed, 25 Jun 1997 14:10:45 -0700
Message-Id: <33B1894E.5063@hpl.hp.com>
Date: Wed, 25 Jun 1997 14:10:38 -0700
From: "Amr A. Awadallah" <amr@hplms2.hpl.hp.com>
Organization: HP Labs
X-Mailer: Mozilla 3.0 (WinNT; I)
Mime-Version: 1.0
To: Vern Paxson <vern@ee.lbl.gov>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective
References: <199706252002.NAA03407@daffy.ee.lbl.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> If following RFC 2001, cwnd increases by 1 for each ack, regardless of
> how much new data was acked (provided it's > 0).  So acking once every
> 4 packets leads to cwnd opening *slower* (since the ack arrival rate is
> lower) than acking every 2 packets.

  This not only means that the source would be slower (gives 
advantage to other sources with a more frequent ACK rate), it also means
that conservation of packets will be violated during the fast recovery
period (cwnd inflation). The source is not sending a new packet into the
net for each packet that leaves, it is sending a new packet into the net
for each ACK it receives (regardless of how many packets this ACK might
be ACKing).

  I guess this explains why at the end of fast-recovery the source might
not have released into the network as many packets as left, hence the
window slides and (for large windows) a considerable number of packets
are sent out back-to-back.

Even though tcp_input.c has this comment:

		/*
		 * When new data is acked, open the congestion window.
		 * If the window gives us less than ssthresh packets
		 * in flight, open exponentially (maxseg per packet).
		 * Otherwise open linearly: maxseg per window
		 * (maxseg^2 / cwnd per packet).
		 */

It still does the increment per ACK, not per packet.
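
Roughly, the update that comment describes can be sketched as below.  This
is a paraphrase in Python, not the BSD source: the function name and the
max_window cap are assumptions made for illustration.

```python
def open_cwnd(cwnd, ssthresh, maxseg, max_window=65535):
    """Return the new cwnd (bytes) after ONE ack that covers new data.

    Mirrors the quoted comment: below ssthresh add a full maxseg
    (exponential opening), above it add maxseg*maxseg/cwnd (about one
    maxseg per window).  The increment is the same whether the ack
    covers one segment or several, which is the point made above.
    max_window is an assumed upper bound, not taken from the thread.
    """
    incr = maxseg
    if cwnd > ssthresh:
        incr = maxseg * maxseg // cwnd
    return min(cwnd + incr, max_window)
```

Nothing in the function looks at how many bytes the ack covered, so a
receiver that acks every 4th segment triggers a quarter as many
increments as one that acks every segment.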

Thanks for the clarification,

-- Amr

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 14:37:13 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA03595 for tcp-impl-list; Wed, 25 Jun 1997 14:35:32 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA03571 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 14:35:30 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA07058
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 14:35:29 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Wed, 25 Jun 1997 17:31:43 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Wed, 25 Jun 1997 17:31:43 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id RAA03054; Wed, 25 Jun 1997 17:34:13 -0400
Date: Wed, 25 Jun 1997 17:34:13 -0400
Message-Id: <199706252134.RAA03054@MAILSERV-2HIGH-A.FTP.COM>
To: vern@ee.lbl.gov
Subject: Re: TCP research issues from a tcp-impl perspective
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: amr@hplms2.hpl.hp.com, tcp-impl@cthulhu.engr.sgi.com
Repository: mailserv-2high-a.ftp.com, [message accepted at Wed Jun 25 17:34:06 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||>   Also fiddling with the ACK rate will change the rate at which cwnd
||> increases, specially during slow-start.  If we ACK once every 4 packets,
||> then this means cwnd will increase by 4 for each ACK ...
||
||If following RFC 2001, cwnd increases by 1 for each ack, regardless of
||how much new data was acked (provided it's > 0).  So acking once every
||4 packets leads to cwnd opening *slower* (since the ack arrival rate is
||lower) than acking every 2 packets.

ugh.  Presumably the decreased ACK rate was paired w/ a larger window to
really zip that LAN speed up.

Larger window, slower cwnd opening, hmmm.  



From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 14:59:48 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA10508 for tcp-impl-list; Wed, 25 Jun 1997 14:57:24 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA10502 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 14:57:22 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA16764
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 14:57:19 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id OAA03691; Wed, 25 Jun 1997 14:57:08 -0700 (PDT)
Message-Id: <199706252157.OAA03691@daffy.ee.lbl.gov>
To: touch@ISI.EDU
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspectiveT
In-reply-to: Your message of Wed, 25 Jun 1997 13:12:23 PDT.
Date: Wed, 25 Jun 1997 14:57:08 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> This is related to the reason why Keshav's 'packet pair'
> (sending two packets back to back through a path to discover
> the bottleneck bandwidth) may not measure the max bandwidth.

With some added considerations and effort, packet pair (bunch) usually
works quite well.  This is discussed in a paper that I recently announced
(but not on tcp-impl):

	ftp://ftp.ee.lbl.gov/papers/vp-pkt-dyn-sigcomm97.ps.Z

- Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 15:15:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA15588 for tcp-impl-list; Wed, 25 Jun 1997 15:13:38 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA15582 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 15:13:36 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA24754
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 15:13:24 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id PAA03764; Wed, 25 Jun 1997 15:06:47 -0700 (PDT)
Message-Id: <199706252206.PAA03764@daffy.ee.lbl.gov>
To: "Amr A. Awadallah" <amr@hplms2.hpl.hp.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective
In-reply-to: Your message of Wed, 25 Jun 1997 14:10:38 PDT.
Date: Wed, 25 Jun 1997 15:06:47 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>   This not only means that the source would be slower (gives 
> advantage to other sources with a more frequent ACK rate), it also means
> that conservation of packets will be violated during the fast recovery
> period (cwnd inflation). The source is not sending a new packet into the
> net for each packet that leaves, it is sending a new packet into the net
> for each ACK it receives (regardless of how many packets this ACK might
> be ACKing).

But the acks that are arriving are dups - these are sent one per
out-of-sequence data packet received, and not delayed.  So the
conservation of packets remains correct (modulo lost acks).

>   I guess this explains why at the end of fast-recovery the source might
> have not released into the network as much packets as left, hence the
> window slides and (for large windows) a considerable number of packets
> are sent out back-to-back.

If all goes well, fast recovery doesn't suffer from any window sliding.

*But* there are some widespread Reno bugs in which the window is not deflated,
and those sure lead to bursts when fast recovery is over!  They're discussed in

	``Performance Problems in BSD4.4 TCP,''
	L. Brakmo and L. Peterson,
	Computer Communication Review, 25(5), pp. 69-84, October 1995.

and briefly in the paper I announced yesterday.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 15:22:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA17949 for tcp-impl-list; Wed, 25 Jun 1997 15:21:03 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA17820 for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 15:20:58 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA27889
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 15:20:47 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id RAA07092;
	Wed, 25 Jun 1997 17:20:38 -0500 (CDT)
Date: Wed, 25 Jun 1997 17:20:38 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199706252220.RAA07092@frantic.BSDI.COM>
To: TCP-IMPL@cthulhu.engr.sgi.com, vjs@mica.denver.sgi.com, VOLZ@PROCESS.COM
Subject: Re:  ICMP must fragment and IPsec
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Date: Wed, 25 Jun 1997 13:09:13 -0600
> From: vjs@mica.denver.sgi.com (Vernon Schryver)
> Subject: Re:  ICMP must fragment and IPsec
> ...
> Why not use an MSS of 65535?  What is being optimized by the choice of
> MSS?  I think most systems can IP fragment faster than they can TCP
> segment, so they are better off sending 64K TCP segments as a bunch of
> IP fragments.  (I've liked that idea ever since someone told me about
> using it in an FDDI adapter for a certain supercomputer company's bus
> to get what was then an impressive TCP/IP/FDDI speed.)

Not a good idea.  If you lose one, you lose them all.  Many years ago
I ran into a case where two Crays with a 16K MTU for their local
networks were separated by the ARPANET.  They sent 16K TCP packets
that got fragmented.  The loss rate was such that they usually lost
one of the fragments.  Then they'd retransmit another 16k packet,
which lost one of the fragments, and the connection would sit like
that, never succeeding in doing any useful work.
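
The arithmetic behind that failure mode can be sketched as follows.  The
numbers in the usage note are illustrative, not measurements from the
case above, and the model assumes each fragment is lost independently;
the function name and parameters are invented for the example.

```python
import math


def datagram_survival(datagram_bytes, path_mtu, frag_loss_rate, ip_header=20):
    """Probability that every fragment of one IP datagram arrives.

    Assumes independent fragment loss and a fixed 20-byte IP header
    (no options).  Non-final fragments carry a multiple of 8 bytes of
    payload, as IP fragmentation requires.
    """
    if datagram_bytes <= path_mtu:
        fragments = 1
    else:
        payload = datagram_bytes - ip_header
        per_fragment = (path_mtu - ip_header) // 8 * 8
        fragments = math.ceil(payload / per_fragment)
    return (1.0 - frag_loss_rate) ** fragments
```

For instance, a 16384-byte datagram crossing a 1006-byte MTU becomes 17
fragments; with an assumed 10% fragment loss rate fewer than one datagram
in five gets through whole, and each retransmission faces the same odds.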

Another item is that if you send a huge TCP packet, nothing can be
acked until the whole thing arrives at the remote system.  That
can really kill VJ slowstart/congestion avoidance.

> What if someone turns on that HIPPI interface with the 64K MTU that was
> turned off when the TCP connection was established?  (Those 700
> Mbit/sec TCP/IP/HIPPI numbers are a tad easier with a full size MSS.)

> The reason I see to use a smaller MSS is safety or conservatism.  If

Yes, I agree with that.  The only question is how conservative?
Old code said "576, if not local", that is too conservative.
Using the MTU of the outgoing interface is the best conservative
answer.  There are only two reasons why I'd argue for a larger
MSS:
	1) The routing could change, and switch to a larger
	   MTU path.  If you set MSS too small, you can't
	   take advantage of it.
	2) Asymmetric routes.  If my path out goes through the
	   ethernet, but his path back goes through the FDDI,
	   then when I stick in 1500, he's hosed.

> PMTU discovery were perfect, there would be no reason to use anything
> except MSS=64K.  In the real world, if you use a larger MSS than your

Hosts that don't support PMTUD aren't supposed to send packets larger
than 576 into the outside world, regardless of what was received
in the MSS option.
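
As a sketch of that rule (names are hypothetical, and the 40 bytes of
IP+TCP header with no options is an assumption):

```python
def max_send_datagram(peer_mss, local_mtu, dest_is_local, does_pmtud,
                      headers=40):
    """Largest IP datagram (bytes) a sender should emit under this rule.

    peer_mss:      value received in the peer's MSS option
    local_mtu:     MTU of the outgoing interface
    dest_is_local: True if the destination counts as local
    does_pmtud:    True if this host implements Path MTU discovery
    headers:       assumed IP+TCP header size, no options
    """
    limit = min(peer_mss + headers, local_mtu)
    if not dest_is_local and not does_pmtud:
        # No PMTUD and a non-local peer: stay at 576 regardless of
        # how large an MSS the peer advertised.
        limit = min(limit, 576)
    return limit
```

So a peer advertising an FDDI-sized MSS only gets FDDI-sized packets back
from hosts that are local or that can discover the path MTU.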

> minimum MTU, then you risk fragmentation if PMTU discovery is not used
> or usable by the TCP peer.  If you use too small an MSS, you pay in
> performance.  (As Rick Jones and I keep saying, a good value for the
> cost of using an MTU of 1500 over FDDI is 4352/1500, and that aint hay.)
> 
> In the FDDI/Ethernet case and if you always use MSS=4352, what happens if:
> 
>   1. the peer could do PMTU discovery but it has been turned off
>     because some smart guy has installed some Cabletron or Network
>     Peripherals FDDI-Ethernet bridges, or a DEC FDDI-Ethernet bridge
>     and failed to set the bridge's IP address.

So you advocate hamstringing all of your FDDI connections, on the off
chance that some clueless person bridges FDDI<->ethernet?  Gag.
So provide a switch, if you must, but at least default to allowing
FDDI to use FDDI size packets...

>   2. the peer is one of the many millions of boxes that either do
>     not support PMTU discovery at all or do not support it by default.

Any host that does not support Path MTU discovery, and sends packets
larger than 576 to remote hosts, regardless of the received MSS
value, is broken.  It's the old 4.2 hosts that acted like this that
caused all the MSS=576 for non-local connections in the first place!
I'd hope that we are beyond that!  (Yes, I know about subnetsarelocal,
and I'm just not going to worry about that in this discussion...)

> In #1, TCP will work for some stuff, but simply not work at all for big
> stuff.  At SGI we demonstrated that between a couple of major internal
> email systems, where small messages went just fine, but bigger messages
> started and then the TCP connection would mysteriously die as the
> bridges silently discarded packets with DF=1.

That's where the black-hole-detection portion of your PMTUD
implementation should kick in.

> In #2, at best your routers will be IP fragmenting.

Only if the non-PMTUD hosts are stupid enough to send large
packets to remote hosts.

> Vernon Schryver,  vjs@sgi.com

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 16:34:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA08473 for tcp-impl-list; Wed, 25 Jun 1997 16:32:04 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA08453 for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 16:32:02 -0700
Received: from ftp.com (ftp.com [128.127.2.122]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA16750
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 16:32:01 -0700
	env-from (backman@mailserv-2high-a.ftp.com)
Received: from ftp.com by ftp.com  ; Wed, 25 Jun 1997 19:31:57 -0400
Received: from mailserv-2high-a.ftp.com by ftp.com  ; Wed, 25 Jun 1997 19:31:57 -0400
Received: by MAILSERV-2HIGH-A.FTP.COM (SMI-8.6/SMI-SVR4)
	id TAA03377; Wed, 25 Jun 1997 19:34:28 -0400
Date: Wed, 25 Jun 1997 19:34:28 -0400
Message-Id: <199706252334.TAA03377@MAILSERV-2HIGH-A.FTP.COM>
To: dab@BSDI.COM
Subject: Re:  ICMP must fragment and IPsec
From: backman@ftp.com (Larry Backman)
Reply-To: backman@ftp.com
Cc: TCP-IMPL@cthulhu.engr.sgi.com, vjs@mica.denver.sgi.com, VOLZ@PROCESS.COM
Repository: mailserv-2high-a.ftp.com, [message accepted at Wed Jun 25 19:34:25 1997]
Originating-Client: vxd-eth.ftp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


||Not a good idea.  If you lose one, you lose them all.  Many years ago
||I ran into a case where two Crays with a 16K MTU for their local
||networks were separated by the ARPANET.  They sent 16K TCP packets
||that got fragmented.  The loss rate was such that they usually lost
||one of the fragments.  Then they'd retransmit another 16k packet,
||which lost one of the fragments, and the connection would sit like
||that, never succeeding in doing any useful work.
||

hmmm.  where have I seen this before :-)  Can you say in my NFS past
w/ 8K UDP packets and no backoff.  Which led to us, as well as everyone
else, implementing TCP as part of their NFS.

I fondly remember one certain vendor's ethernet card which reliably
dropped packet 4 of a 6 packet 8K UDP NFS_READ.  We had many happy
customers the day we added "autotuning" to NFS which was nothing more
than slow start with a high water cwnd applied to NFS.

And I know this is alien to the list but it seems to me that as we
are relying on MTU discovery to solve all our problems because we
are only dealing with TCP, good old NFS over UDP keeps chugging along,
churning out those large atomic UDP writes.  Granted that NFS over TCP
is somewhat standard now; but an uncontrolled NFS over UDP can make
a mess out of a pair of LANs interconnected by a slow WAN bridge.

L.







From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 17:31:29 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA25079 for tcp-impl-list; Wed, 25 Jun 1997 17:29:36 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA25065 for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 17:29:33 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id SAA02756; Wed, 25 Jun 1997 18:29:26 -0600
Date: Wed, 25 Jun 1997 18:29:26 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706260029.SAA02756@mica.denver.sgi.com>
To: VOLZ@PROCESS.COM, TCP-IMPL@cthulhu.engr.sgi.com
Subject: Re:  ICMP must fragment and IPsec
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: David Borman <dab@BSDI.COM>

> > Why not use an MSS of 65535? ...

> Not a good idea.  If you lose one, you lose them all. ...

True.

> ...
> Another item is that if you send a huge TCP packet, nothing can be
> acked until the whole thing arrives at the remote system.  That
> can really kill VJ slowstart/congestion avoidance.

True, but it sure helps if you're running MByte windows on local pipes.

> ...
> > The reason I see to use a smaller MSS is safety or conservatism.  If
> 
> Yes, I agree with that.  The only question is how conservative?
>
> Old code said "576, if not local", that is too conservative.

There is a spectrum:
    1. MSS=576.
    2. MSS=MTU of interface.
    3. MSS=max(MTU's of non-local interfaces).
    4. MSS=9180, 16K, 32K, 65535 or some other special number.

We all agree #1 is too conservative and at least the 64K end of #4
is too risky.

> Using the MTU of the outgoing interface is the best conservative
> answer.

It's also quick and easy to compute and more well defined than #3.

>          There are only two reasons why I'd argue for a larger
> MSS:
> 	1) The routing could change, and switch to a larger
> 	   MTU path.  If you set MSS too small, you can't
> 	   take advantage of it.
> 	2) Asymetric routes.  If my path out goes through the
> 	   ethernet, but his path back goes through the FDDI,
> 	   then when I stick in 1500, he's hosed.

True, but
  - when routing is subject to much change, performance is generally
    less of an issue than having things work.
  - topologies complicated enough for asymmetric paths are usually
    (but not always) either misconfigured (e.g. interface metrics)
    or complicated enough to make working a bigger question than
    performance.

> > PMTU discovery were perfect, there would be no reason to use anything
> > except MSS=64K.  In the real world, if you use a larger MSS than your
> 
> Hosts that don't support PMTUD aren't supposed to send packets larger
> than 576 into the outside world, regardless of what was received
> in the MSS option.

If that 576 were changed to something reasonable for 1990, such as
1500, then that sentiment would be influential.  But since 576 is so
tiny, that old injunction is and will be widely ignored.  Hosts that
don't have PMTU discovery use subnetsarelocal or allnetsarelocal=1.
I think in the real world, there are 2 main choices, PMTU discovery
or *arelocal.  The third, using 576, is minor.

Besides, 576 doesn't help the bridge problem.

> ...
> So you advocate hamstringing all of your FDDI connections, on the off
> chance that some clueless person bridges FDDI<->ethernet?  Gag.

Of course I oppose using 1500 on FDDI.  Instead, I tell the persistently
clue-challenged about their personal problems.  (I also years ago added a
ridiculous-MTU.switch so that they can make things work after we've
discussed their foolishness.)


> So provide a switch, if you must, but at least default to allowing
> FDDI to use FDDI size packets...
> ...

Yes, at least when the path is known to start at 4352.

The question is whether it is reasonable to tell the TCP peer to use
4352 when you know the peer's packets are likely to be fragmented onto
your Ethernet (except when one of those wonderful bridges that neither
fragment nor send ICMP messages is involved, so the packets are just
dropped).

Those bridges are almost enough of a peeve of mine to advocate such a
practice, since it would make them useless except when those who buy
them configure their hosts to use 1500 instead of 4352 over FDDI.  You
wouldn't want every TCP connection to suffer enough timeouts to invoke
a PMTU Discovery blackhole mechanism.  I suspect that at most vendors,
including SGI, I'd be overruled.



You'd be doing the world a favor if the document said that any
FDDI-Ethernet bridge that does not fragment is junk.  That would at
least make it easier to argue with the dupes of Cabletron and NPI
salescritters that for years claimed that just dropping big IP/FDDI
packets is just fine, causing neither performance nor interoperability
problems.  (Yes, I understand NPI and Cabletron have started to do
something to fix their products, but only after publishing a lot of
'interesting' statements.)


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Jun 25 22:47:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA13145 for tcp-impl-list; Wed, 25 Jun 1997 22:45:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA13140 for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 22:45:57 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id WAA13864
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 25 Jun 1997 22:45:56 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id WAA04984; Wed, 25 Jun 1997 22:45:54 -0700 (PDT)
Message-Id: <199706260545.WAA04984@daffy.ee.lbl.gov>
To: vjs@mica.denver.sgi.com (Vernon Schryver)
Cc: VOLZ@PROCESS.COM, TCP-IMPL@cthulhu.engr.sgi.com
Subject: Re: ICMP must fragment and IPsec
In-reply-to: Your message of Wed, 25 Jun 1997 18:29:26 PDT.
Date: Wed, 25 Jun 1997 22:45:54 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>   - topologies complicated enough for asymmetric paths are usually
>     (but not always) either misconfigured (e.g. interface metrics)
>     or complicated enough to make working a bigger question than
>     performance.

Not at all - about half of the 1,000 Internet paths I measured in my
routing study had a significant asymmetry.  The paper's available from:

	ftp://ftp.ee.lbl.gov/papers/routing.SIGCOMM.ps.Z

- Vern (the other one)

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun 26 05:50:09 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA20900 for tcp-impl-list; Thu, 26 Jun 1997 05:48:44 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA20890 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Jun 1997 05:48:42 -0700
Received: from socks1.raleigh.ibm.com (socks1.raleigh.ibm.com [204.146.167.124]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id FAA07799
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Jun 1997 05:48:41 -0700
	env-from (narten@raleigh.ibm.com)
Received: from rtpmail01.raleigh.ibm.com by socks1.raleigh.ibm.com (AIX 4.1/UCB 5.64/RTP-FW1.0)
          id AA23480; Thu, 26 Jun 1997 08:48:39 -0400
Received: from cichlid.raleigh.ibm.com (cichlid.raleigh.ibm.com [9.37.83.123])
	by rtpmail01.raleigh.ibm.com (8.8.5/8.8.5/RTP-ral-1.1) with SMTP id IAA35780;
	Thu, 26 Jun 1997 08:48:39 -0400
Received: from lig32-224-57-91.us.lig-dial.ibm.com by cichlid.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA11942; Thu, 26 Jun 1997 08:47:20 -0400
Received: from hygro.raleigh.ibm.com (localhost [127.0.0.1]) by hygro.raleigh.ibm.com (8.7.6/8.7.3) with ESMTP id IAA00992; Thu, 26 Jun 1997 08:47:28 -0400
Message-Id: <199706261247.IAA00992@hygro.raleigh.ibm.com>
To: Vern Paxson <vern@ee.lbl.gov>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective 
In-Reply-To: Your message of "Tue, 24 Jun 1997 16:02:58 PDT."
             <199706242302.QAA01755@daffy.ee.lbl.gov> 
Date: Thu, 26 Jun 1997 08:47:28 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 	- What about deploying Vegas?

I'd suggest adding: How to keep the send window from expanding
excessively once the maximum throughput has been attained. For
example, on modem links, don't keep increasing the send window when
doing so simply increases queuing delays without making better
utilization of the links. Vegas has features that seem to help in this
space.

Thomas

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun 26 07:40:46 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA04494 for tcp-impl-list; Thu, 26 Jun 1997 07:39:18 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA04479 for <TCP-IMPL@cthulhu.engr.sgi.com>; Thu, 26 Jun 1997 07:39:16 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id IAA03854; Thu, 26 Jun 1997 08:39:11 -0600
Date: Thu, 26 Jun 1997 08:39:11 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199706261439.IAA03854@mica.denver.sgi.com>
To: TCP-IMPL@cthulhu.engr.sgi.com, VOLZ@PROCESS.COM
Subject: Re: ICMP must fragment and IPsec
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> To: vjs (Vernon Schryver)
> Cc: VOLZ@PROCESS.COM, TCP-IMPL@cthulhu.engr.sgi.com
> From: Vern Paxson <vern@ee.lbl.gov>
> 
> >   - topologies complicated enough for asymmetric paths are usually
> >     (but not always) either misconfigured (e.g. interface metrics)
> >     or complicated enough to make working a bigger question than
> >     performance.
> 
> Not at all - about half of the 1,000 Internet paths I measured in my
> routing study had a significant asymmetry.  The paper's available from:
> 
> 	ftp://ftp.ee.lbl.gov/papers/routing.SIGCOMM.ps.Z

I think you are agreeing with me.

As I see it, any non-trivial path through the Internet is "complicated
enough to make working a bigger question than performance."  Among
those 1000 paths, how many would you recommend use an MTU and an MSS of
32K and IP fragment in order to squeeze an extra 5% out of the hosts?
How many were fast paths where such games are profitable?  (In the
mid-1990's, 100 Mbit/sec for a single TCP connection is (was) the
slowest you might consider "fast.")


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Thu Jun 26 11:02:34 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA00843 for tcp-impl-list; Thu, 26 Jun 1997 10:58:53 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA00837 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Jun 1997 10:58:51 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA03558
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Jun 1997 10:58:45 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id KAA05759; Thu, 26 Jun 1997 10:58:37 -0700 (PDT)
Message-Id: <199706261758.KAA05759@daffy.ee.lbl.gov>
To: Thomas Narten <narten@raleigh.ibm.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective 
In-reply-to: Your message of Thu, 26 Jun 1997 08:47:28 PDT.
Date: Thu, 26 Jun 1997 10:58:37 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > 	- What about deploying Vegas?
> 
> I'd suggest adding: How to keep the send window from expanding
> excessively once the maximum throughput has been attained.

A good one!  Added to the list.

	Thanks,

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Jun 30 13:12:06 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA05397 for tcp-impl-list; Mon, 30 Jun 1997 13:09:50 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA05325 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 30 Jun 1997 13:09:39 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA01946
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 30 Jun 1997 13:09:37 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id QAA08999; Mon, 30 Jun 1997 16:06:08 -0400 (EDT)
Message-Id: <199706302006.QAA08999@brookfield.ans.net>
To: backman@ftp.com
cc: craig@aland.bbn.com, amr@hplms2.hpl.hp.com, tcp-impl@cthulhu.engr.sgi.com
Reply-To: curtis@ans.net
Subject: Re: TCP research issues from a tcp-impl perspective 
In-reply-to: Your message of "Wed, 25 Jun 1997 15:44:00 EDT."
             <199706251944.PAA02164@MAILSERV-2HIGH-A.FTP.COM> 
Date: Mon, 30 Jun 1997 16:06:08 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199706251944.PAA02164@MAILSERV-2HIGH-A.FTP.COM>, Larry Backman writes:
> 
> Oooh cool; you mean if I crank up my Pentium w/ a fast enough ether-card
> I can toast a router, nah, couldn't happen :-)


There is more than one active host on the Internet (by a long shot).
Even on some LANs.  If you send back to back you are more likely to
see a multiple drop due to the use of tail drop.  Queues are finite
and TCP tends to periodically fill the queue at the bottleneck.

Curtis


From owner-tcp-impl@relay.engr.sgi.com  Mon Jun 30 13:17:49 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA06843 for tcp-impl-list; Mon, 30 Jun 1997 13:15:47 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA06834 for <TCP-IMPL@cthulhu.engr.sgi.com>; Mon, 30 Jun 1997 13:15:45 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA03433
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Mon, 30 Jun 1997 13:15:43 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id QAA09044; Mon, 30 Jun 1997 16:15:13 -0400 (EDT)
Message-Id: <199706302015.QAA09044@brookfield.ans.net>
To: touch@ISI.EDU
cc: VOLZ@PROCESS.COM, TCP-IMPL@cthulhu.engr.sgi.com, vjs@mica.denver.sgi.com
Reply-To: curtis@ans.net
Subject: Re: ICMP must fragment and IPsec 
In-reply-to: Your message of "Wed, 25 Jun 1997 13:16:49 PDT."
             <199706252016.NAA21314@rum.isi.edu> 
Date: Mon, 30 Jun 1997 16:15:13 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199706252016.NAA21314@rum.isi.edu>, touch@ISI.EDU writes:
> > 
> > The reason I see to use a smaller MSS is safety or conservatism.  If
> 
> Or latency. Reducing the size of the data chunks reduces the
> store-and-forward latency throughout the path, including at the end
> hosts.


1500B * (8b/B) / 1.5 Mb/s -> 8 msec.  That's T1 (a slow interface).
At 56K this becomes a bit over 200 msec so it makes sense to lower the
MTU on the 56K link not the MSS at the host.
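
The arithmetic above can be checked with a short sketch.  The helper
name is made up for illustration, and the link rates are the rounded
figures used above (1.5 Mb/s for T1, 56 kb/s for the slow link), not
exact line rates.

```c
/* Hypothetical helper: time in milliseconds needed to clock one packet
 * of `bytes` bytes onto a link running at `bits_per_sec`. */
static double serialization_delay_ms(double bytes, double bits_per_sec)
{
    return bytes * 8.0 * 1000.0 / bits_per_sec;
}
```

Calling serialization_delay_ms(1500, 1.5e6) gives 8 msec, and
serialization_delay_ms(1500, 56000) gives roughly 214 msec, which is
the "bit over 200 msec" figure.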

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Mon Jun 30 23:56:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA03390 for tcp-impl-list; Mon, 30 Jun 1997 23:54:46 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA03381 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 30 Jun 1997 23:54:44 -0700
Received: from merit.edu (merit.edu [198.108.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA28242
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 30 Jun 1997 23:54:42 -0700
	env-from (wsimpson@greendragon.com)
Received: from Bill.Simpson.DialUp.Mich.Net (pm002-04.dialip.mich.net [141.211.7.140])
	by merit.edu (8.8.5/8.8.5) with SMTP id CAA18178
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 1 Jul 1997 02:54:40 -0400 (EDT)
Date: Tue, 1 Jul 97 04:11:45 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <6162.wsimpson@greendragon.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I'm marking the ones I care about:

>         - How to compute RTO when timing more than one packet per RTT
>           (i.e., how to adjust the constants for the exponentially weighted
>           moving average)
>
Uh, sounds like it could be harmful.  Why?


>         - Is the initial slow-start cwnd going to be increased, and if
>           so, to what?
>
minimum( 2*MSS remote, 2*MSS local calculated from MTU );
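
Spelled out as a rough C sketch, that rule might look like the
following.  The function name is hypothetical, and deriving the local
MSS as MTU minus 40 bytes assumes IPv4 and TCP headers with no options;
neither detail comes from the proposal itself.

```c
#include <stdint.h>

/* Hypothetical helper: initial congestion window in bytes, taken as the
 * smaller of twice the peer's advertised MSS and twice the MSS derived
 * from the local interface MTU. */
static uint32_t initial_cwnd(uint32_t remote_mss, uint32_t local_mtu)
{
    /* Assumption: 20 bytes of IPv4 header plus 20 bytes of TCP header. */
    uint32_t local_mss = local_mtu > 40 ? local_mtu - 40 : 0;
    uint32_t from_remote = 2 * remote_mss;
    uint32_t from_local = 2 * local_mss;

    return from_remote < from_local ? from_remote : from_local;
}
```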


>         - How long a sending pause merits a new slow-start
>
Simple, use delayed-Ack TO (DATO)!  If you already released the channel
that long, you have to assume that someone else has begun to fill it, at
least with 1/2 DATO bandwidth worth of packets.  And it won't hurt as
much to slow start when there already has been your own delay.


>         - Is it time to revisit constants like MSL and initial RTO?
>
Yeah, drop MSL to no more than 30 seconds, but IRTO should stay at
current level (3 seconds).  Actually, I have to hand tune IRTO to 5-6
seconds on 14.4Kbps modems here to completely avoid early unnecessary
retransmissions, but 3-4 seconds works fine for 28.8Kbps.  Modem speeds
are finally getting closer to the design speeds for the ARPAnet....

Also, MinRTO should be 200 ms and DATO should be 200-300 ms, never
shorter.  It would save a heck of a lot of useless retransmissions I see
every day.  I'm guessing that a couple stacks are using around 50 ms for
each, and that's much too short, even at T3!  All it takes is one burp
of congestion, and I see an avalanche effect.

Of course, my numbers are empirical, based on many years of playing with
slow WAN links.  I'm afraid all the research is looking at fast links.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Tue Jul  1 01:23:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA14962 for tcp-impl-list; Tue, 1 Jul 1997 01:21:14 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA14935 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 1 Jul 1997 01:21:10 -0700
Received: from snowcrash.cymru.net ([163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id BAA13107
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 1 Jul 1997 01:21:09 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (root@centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id JAA24932; Tue, 1 Jul 1997 09:19:08 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wiy5I-0005FfC; Tue, 1 Jul 97 09:14 BST
Message-Id: <m0wiy5I-0005FfC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP research issues from a tcp-impl perspective
To: wsimpson@greendragon.com (William Allen Simpson)
Date: Tue, 1 Jul 1997 09:14:48 +0100 (BST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <6162.wsimpson@greendragon.com> from "William Allen Simpson" at Jul 1, 97 04:11:45 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> >         - Is it time to revisit constants like MSL and initial RTO?
> Yeah, drop MSL to no more than 30 seconds, but IRTO should stay at
> current level (3 seconds).  Actually, I have to hand tune IRTO to 5-6

Before you drop the MSL and IRTO times you might want to test some
really slow links

> Of course, my numbers are empirical, based on many years of playing with
> slow WAN links.  I'm afraid all the research is looking at fast links.

Look at some really slow radio links while you are at it. This is one hop
on a half duplex 9600 baud radio link with a bit of queueing occurring:

64 bytes from AAA.BBB.CCC.DDD: icmp_seq=1 ttl=64 time=1120.9 ms
64 bytes from AAA.BBB.CCC.DDD: icmp_seq=2 ttl=64 time=1260.6 ms
64 bytes from AAA.BBB.CCC.DDD: icmp_seq=3 ttl=64 time=1720.6 ms


Alan


From owner-tcp-impl@relay.engr.sgi.com  Tue Jul  1 08:39:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA24304 for tcp-impl-list; Tue, 1 Jul 1997 08:37:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA24296 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 1 Jul 1997 08:37:55 -0700
Received: from rekk.dna.lth.se (rekk.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA24887
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 1 Jul 1997 08:37:53 -0700
	env-from (erics@rekk.dna.lth.se)
Received: from rekk.dna.lth.se (localhost [127.0.0.1])
	by rekk.dna.lth.se (8.8.5/8.8.5) with ESMTP id RAA00364;
	Tue, 1 Jul 1997 17:36:03 +0200
Message-Id: <199707011536.RAA00364@rekk.dna.lth.se>
To: "William Allen Simpson" <wsimpson@greendragon.com>
cc: Eric.Schenk@dna.lth.se, tcp-impl@cthulhu.engr.sgi.com
From: Eric.Schenk@dna.lth.se
Subject: Re: TCP research issues from a tcp-impl perspective 
In-reply-to: Your message of "Tue, 01 Jul 1997 04:11:45 GMT."
             <6162.wsimpson@greendragon.com> 
Date: Tue, 01 Jul 1997 17:36:03 +0200
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


"William Allen Simpson" <wsimpson@greendragon.com> writes:
>I'm marking the ones I care about:
>
>>         - How to compute RTO when timing more than one packet per RTT
>>           (i.e., how to adjust the constants for the exponentially weighted
>>           moving average)
>>
>Uh, sounds like it could be harmful.  Why?

>From memory, RFC1323 implies that timestamps allow this to be done,
and that some sort of aliasing effect in the RTO calculation would be 
removed if an RTT sample were taken on every packet. Looking
at a slightly old version of the BSD code reveals that when RFC1323
timestamps are used a sample will be taken on every packet.
This will mean that old data will decay out of the RTT much faster
than normal when using RFC1323. The open question appears to be,
to me, whether it should be mandated that RTT samples must only be
taken once per round trip, or if some adjusted VJ style filter
should be applied when taking one sample per packet.
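
To make the "decays faster" point concrete, here is a small sketch of
an exponentially weighted moving average and of how much weight the old
estimate keeps after several updates.  The function names are invented
for illustration, and the gain of 1/8 used in the note below is the
value commonly used for SRTT, taken here as an assumption rather than
something read out of the BSD code.

```c
/* One EWMA step: move the smoothed RTT toward the new sample by `gain`. */
static double srtt_update(double srtt, double sample, double gain)
{
    return (1.0 - gain) * srtt + gain * sample;
}

/* Fraction of the original estimate still present after `samples`
 * updates with the same gain, i.e. (1 - gain) raised to `samples`. */
static double old_weight_after(unsigned samples, double gain)
{
    double weight = 1.0;

    while (samples-- > 0)
        weight *= 1.0 - gain;
    return weight;
}
```

With a gain of 1/8, one sample per RTT leaves 87.5% of the old estimate
in place after a round trip, while eight samples per RTT leave only
about 34%.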

In practice I suspect the coarse 500ms timer for RTO in BSD based
code means that any problem introduced by sampling on every packet
will be hidden on networks with a real RTT of less than 500ms
when the pipeline is full. 

>>         - How long a sending pause merits a new slow-start
>>
>Simple, use delayed-Ack TO (DATO)!  If you already released the channel
>that long, you have to assume that someone else has begun to fill it, at
>least with 1/2 DATO bandwidth worth of packets.  And it won't hurt as
>much to slow start when there already has been your own delay.

Is this much different than the 1 RTO timeout currently used
in several implementations? And do you want to imply that DATO
is fixed (e.g. to 200ms) or should it be allowed to be a computed
quantity?

>Also, MinRTO should be 200 ms and DATO should be 200-300 ms, never
>shorter.  It would save a heck of a lot of useless retransmissions I see
>every day.  I'm guessing that a couple stacks are using around 50 ms for
>each, and that's much too short, even at T3!  All it takes is one burp
>of congestion, and I see an avalanche effect.

MinRTO must never be smaller than DATO, or you will fall over against
anything that does delayed ACKs. In practice this currently means that
you must not set MinRTO to less than 200ms. If we allow DATO to vary,
e.g. as computed from packet interarrival times, then the issue
becomes even more cloudy. Ideally the DATO should be included in
the calculation of RTT, but it is not clear how to accomplish this,
since it is not easily determined which ACKs have been delayed and which
have not, and also it is not necessarily the case that the DATO will be
fixed.
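
A minimal sketch of the floor described above, in C.  The function name
and the idea of passing in an assumed peer DATO are illustrative only,
since the sender cannot observe the peer's real DATO.  The
srtt + 4*mdev form is the usual Jacobson-style estimate and is an
assumption here.

```c
#include <stdint.h>

/* Hypothetical helper: retransmission timeout in milliseconds, never
 * allowed to drop below either the configured minimum or the delayed-ACK
 * timeout we assume the peer is using. */
static uint32_t rto_with_floor(uint32_t srtt_ms, uint32_t mdev_ms,
                               uint32_t min_rto_ms, uint32_t assumed_dato_ms)
{
    uint32_t floor_ms = min_rto_ms > assumed_dato_ms ? min_rto_ms
                                                     : assumed_dato_ms;
    uint32_t rto_ms = srtt_ms + 4 * mdev_ms;

    return rto_ms > floor_ms ? rto_ms : floor_ms;
}
```

For a path with a 20 ms SRTT, a MinRTO of 50 ms would on its own allow
a timeout well inside a 200 ms delayed ACK.  With the floor applied the
result is 200 ms.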

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38



From owner-tcp-impl@relay.engr.sgi.com  Tue Jul  1 09:24:20 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA02242 for tcp-impl-list; Tue, 1 Jul 1997 09:22:18 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA02228 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 1 Jul 1997 09:22:13 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA05502
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 1 Jul 1997 09:22:03 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id LAA17249;
	Tue, 1 Jul 1997 11:21:50 -0500 (CDT)
Date: Tue, 1 Jul 1997 11:21:50 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199707011621.LAA17249@frantic.BSDI.COM>
To: Eric.Schenk@dna.lth.se, wsimpson@greendragon.com
Subject: Re: TCP research issues from a tcp-impl perspective
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@cthulhu.engr.sgi.com Tue Jul  1 10:51:38 1997
> To: "William Allen Simpson" <wsimpson@greendragon.com>
> cc: Eric.Schenk@dna.lth.se, tcp-impl@cthulhu.engr.sgi.com
> From: Eric.Schenk@dna.lth.se
> Subject: Re: TCP research issues from a tcp-impl perspective 
> Date: Tue, 01 Jul 1997 17:36:03 +0200
> Precedence: bulk
> 
> 
> "William Allen Simpson" <wsimpson@greendragon.com> writes:
> >I'm marking the ones I care about:
> >
> >>         - How to compute RTO when timing more than one packet per RTT
> >>           (i.e., how to adjust the constants for the exponentially weighted
> >>           moving average)
> >>
> >Uh, sounds like it could be harmful.  Why?
> 
> >From memory, RFC1323 implies that timestamps allow this to be done,
> and that some sort of aliasing effect in the RTO calculation would be 
> removed if an RTT sample were taken on every packet. Looking
> at a slightly old version of the BSD code reveals that when RFC1323
> timestamps are used a sample will be taken on every packet.
> This will mean that old data will decay out of the RTT much faster
> than normal when using RFC1323. The open question appears to be,
> to me, whether it should be mandated that RTT samples must only be
> taken once per round trip, or if some adjusted VJ style filter
> should be applied when taking one sample per packet.
> 
> In practice I suspect the coarse 500ms timer for RTO in BSD based
> code means that any problem introduced by sampling on every packet
> will be hidden on networks with a real RTT of less than 500ms
> when the pipeline is full. 

*sigh*... I'm behind on getting the revision to RFC1323 out; I'm
trying to get the final editing done in the next week or so.  One
addition to 1323+ will be a short discussion about the fact that
when using timestamps, you can get more than 1 RTT sample
per RTT, and the code that figures out the SRTT needs to take
this into account.

			David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Wed Jul  2 02:02:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA01542 for tcp-impl-list; Wed, 2 Jul 1997 02:00:22 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA01533 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 02:00:14 -0700
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id CAA09379
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 02:00:07 -0700
	env-from (luigi@labinfo.iet.unipi.it)
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id JAA14377; Wed, 2 Jul 1997 09:58:06 +0200
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199707020758.JAA14377@labinfo.iet.unipi.it>
Subject: Re: TCP research issues from a tcp-impl perspective
To: dab@BSDI.COM (David Borman)
Date: Wed, 2 Jul 1997 09:58:05 +0200 (MET DST)
Cc: Eric.Schenk@dna.lth.se, wsimpson@greendragon.com,
        tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199707011621.LAA17249@frantic.BSDI.COM> from "David Borman" at Jul 1, 97 11:21:31 am
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 1847      
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

About the RTT filtering issue:

> *sigh*... I'm behind on getting the revision to RFC1323 out, I'm
> trying to get the final editing done in the next week or so.  One
> addition to 1323+ will be a short discussion about the fact that
> when using timestamps, you can get more than 1 RTT sample
> per RTT, and the code that figures out the SRTT needs to take
> this into account.

I think that the purpose of filtering RTT should be clarified.
Should SRTT track averages or maximums, or discard exceedingly large
or exceedingly small samples?  And how many RTTs should it take for
a sample to be (practically) forgotten?  Probably a recommendation
should be made to use high resolution timings when possible.

In order to correct the computation when more samples per RTT are
available, you also need to know (dynamically) how many samples
you have per RTT.  This is a bit tricky to implement, and probably
not worth the effort: RTT estimates can have such a big variance
on top of the real network RTT (because of low resolution timers,
delayed acks, acks triggered by user processes reading data, etc.)
that one might also wonder if linear filtering is appropriate at
all for the purpose.  For example, one might want to compute

	NEWRTT = MAX(alpha*OLDRTT + (1-alpha)*SAMPLE, SAMPLE)

to track large samples more effectively, in order to have more
conservative timeouts.
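
As a sketch, the NEWRTT formula above translates directly into C.  The
function name is invented for illustration, and any particular value of
alpha (for example 0.875) is an assumption.

```c
/* NEWRTT = MAX(alpha*OLDRTT + (1-alpha)*SAMPLE, SAMPLE) */
static double rtt_filter_max(double old_rtt, double sample, double alpha)
{
    double smoothed = alpha * old_rtt + (1.0 - alpha) * sample;

    /* A sample above the smoothed value replaces it outright. */
    return smoothed > sample ? smoothed : sample;
}
```

The effect is asymmetric: a sample larger than the old estimate is
adopted immediately, while a smaller one only pulls the estimate down
at the usual filter rate.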

	Cheers
	Luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________
filtering should aim more at removing 



From owner-tcp-impl@relay.engr.sgi.com  Wed Jul  2 07:34:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA27863 for tcp-impl-list; Wed, 2 Jul 1997 07:32:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA27835 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 07:32:08 -0700
Received: from merit.edu (merit.edu [198.108.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA28372
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 07:32:06 -0700
	env-from (wsimpson@greendragon.com)
Received: from Bill.Simpson.DialUp.Mich.Net (pm036-08.dialip.mich.net [141.211.7.50])
	by merit.edu (8.8.5/8.8.5) with SMTP id KAA20100
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 10:32:03 -0400 (EDT)
Date: Wed, 2 Jul 97 13:54:57 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <6176.wsimpson@greendragon.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Excellent points!  And thanks for the ping data, Alan.


> From: Eric.Schenk@dna.lth.se
> >>	     - How long a sending pause merits a new slow-start
> >>
> >Simple, use delayed-Ack TO (DATO)!  If you already released the channel
> >that long, you have to assume that someone else has begun to fill it, at
> >least with 1/2 DATO bandwidth worth of packets.  And it won't hurt as
> >much to slow start when there already has been your own delay.
>
> Is this much different than the 1 RTO timeout currently used
> in several implementations? And do you want to imply that DATO
> is fixed (e.g. to 200ms) or should it be allowed to be a computed
> quantity?
>
(As you opine later) DATO is always less than RTO, isn't it?  So, I'm
suggesting a shorter time period than RTO.

My estimation is that, on average, 1/2 the bandwidth since the last data
you sent (not received) could be used by someone else in your outgoing
path.  And that value is reflected by the delayed Ack time, not the
entire RTT.


> MinRTO must never be smaller than DATO, or you will fall over against
> anything that does delayed ACKs. In practice this currently means that
> you must not set MinRTO to less than 200ms.

Hmmm, wouldn't the DATO of the peer be actually included in the RTT
calculation, as the Ack shows up later?  So, even when DATO of the peer
is longer, MinRTO won't be hit unless the peer was sending bidirectional
data, and then suddenly quits.

I just looked and actually use MinRTO of 500 ms, but MinATO of 200 ms.


> If we allow DATO to vary,
> e.g. as computed from packet interarrival times, then the issue
> becomes even more cloudy. Ideally the DATO should be included in
> the calculation of RTT, but it is not clear how to accomplish this,
> since it is not easily determined which ACKs have been delayed and which
> have not, and also it is not necessarily the case that the DATO will be
> fixed.
>
I've been using:
	uint32 dato = tcb->to.srtt / 2
			+ tcb->to.mdev;
	dato = max( dato, 500 );
	dato = min( dato, 200 );

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26	DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3	59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Wed Jul  2 08:43:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA07269 for tcp-impl-list; Wed, 2 Jul 1997 08:41:40 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA07264 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 08:41:38 -0700
Received: from rekk.dna.lth.se (rekk.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA15618
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 08:41:35 -0700
	env-from (erics@rekk.dna.lth.se)
Received: from rekk.dna.lth.se (localhost [127.0.0.1])
	by rekk.dna.lth.se (8.8.5/8.8.5) with ESMTP id RAA08232;
	Wed, 2 Jul 1997 17:39:47 +0200
Message-Id: <199707021539.RAA08232@rekk.dna.lth.se>
To: "William Allen Simpson" <wsimpson@greendragon.com>
cc: Eric.Schenk@dna.lth.se, tcp-impl@cthulhu.engr.sgi.com
From: Eric.Schenk@dna.lth.se
Subject: Re: TCP research issues from a tcp-impl perspective 
In-reply-to: Your message of "Wed, 02 Jul 1997 13:54:57 GMT."
             <6176.wsimpson@greendragon.com> 
Date: Wed, 02 Jul 1997 17:39:47 +0200
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


"William Allen Simpson" <wsimpson@greendragon.com> writes:
>> Is this much different than the 1 RTO timeout currently used
>> in several implementations? And do you want to imply that DATO
>> is fixed (e.g. to 200ms) or should it be allowed to be a computed
>> quantity?
>>
>(As you opine later) DATO is always less than RTO, isn't it?  So, I'm
>suggesting a shorter time period than RTO.
>
>My estimation is that, on average, 1/2 the bandwidth since the last data
>you sent (not received) could be used by someone else in your outgoing
>path.  And that value is reflected by the delayed Ack time, not the
>entire RTT.

This seems to make intuitive sense to me, but I would want to run
a lot of testing with this change to see what the real effects are.

>> MinRTO must never be smaller than DATO, or you will fall over against
>> anything that does delayed ACKs. In practice this currently means that
>> you must not set MinRTO to less than 200ms.
>
>Hmmm, wouldn't the DATO of the peer be actually included in the RTT
>calculation, as the Ack shows up later?  So, even when DATO of the peer
>is longer, MinRTO won't be hit unless the peer was sending bidirectional
>data, and then suddenly quits.
>
>I just looked and actually use MinRTO of 500 ms, but MinATO of 200 ms.

There are a few different issues mixed in here I think.
First, ACKs are not always delayed, and we cannot easily tell
which ACKs were delayed, and which were not. In fact, when data
is streaming at full speed ACKs are never delayed; we just send immediate
ACKs every second packet. When we stop sending, the final ACK is likely
to be delayed. This means that the computed RTT is almost always an
underestimate of the real RTT + DATO, and we stand at risk of a false
retransmit of the final packet in any group of continuously transmitted
packets.

In the standard BSD implementation timers only have a 500ms
granularity, and the DATO is fixed to 200ms. Also RFC1122 mandates
that DATO never be more than 500ms. This means that the error is
at most 1 tick of the retransmission timer, and the retransmission
timer tends to be more than 1 tick overestimated in any case.
(See details in Van Jacobson's original paper on this stuff).

But, if you have an implementation that uses higher accuracy timers
we start to get into trouble since the DATO of the remote side
cannot be determined by the sender, and the RTT calculation now
has high enough accuracy that we will get a false retransmit
on the last packet in a train. 

>> If we allow DATO to vary,
>> e.g. as computed from packet interarrival times, then the issue
>> becomes even more cloudy. Ideally the DATO should be included in
>> the calculation of RTT, but it is not clear how to accomplish this,
>> since it is not easily determined which ACKs have been delayed and which
>> have not, and also it is not necessarily the case that the DATO will be
>> fixed.
>>
>I've been using:
>	uint32 dato = tcb->to.srtt / 2
>			+ tcb->to.mdev;
>	dato = max( dato, 500 );
>	dato = min( dato, 200 );

Shouldn't the max and min be exchanged in the above?

Anyway, I think we've been over this ground once before a few months back,
but I'll say it again.

If we are going to go with a calculated DATO I favor something like:

	dato = average packet interarrival time
	dato = min(dato, 500)

The average packet interarrival time should be calculated in a way
similar to the RTO, with the difference that extreme samples (say
larger than RTO) should be thrown away. (We only want to sample
time between packet arrivals in a continuous stream, not the time
for pauses in sending.)
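
A rough sketch of what I mean, in C. The struct, the function names,
the 1/8 gain and the millisecond units are all assumptions made for
illustration; this is not code from any particular stack.

```c
#include <stdint.h>

#define DATO_MAX_MS 500u   /* upper bound on the ACK delay (RFC 1122) */

/* Hypothetical per-connection state; all times in milliseconds. */
struct iat_state {
    uint32_t avg_iat;   /* smoothed packet interarrival time */
    int      primed;    /* nonzero once a first sample has been taken */
};

/* Feed one interarrival sample. Samples larger than the current RTO are
 * treated as pauses in sending and thrown away. */
static void iat_sample(struct iat_state *s, uint32_t sample_ms, uint32_t rto_ms)
{
    if (sample_ms > rto_ms)
        return;
    if (!s->primed) {
        s->avg_iat = sample_ms;
        s->primed = 1;
        return;
    }
    /* avg = 7/8 * avg + 1/8 * sample (the 1/8 gain is an assumption) */
    s->avg_iat = (7u * s->avg_iat + sample_ms) / 8u;
}

/* dato = min(average packet interarrival time, 500) */
static uint32_t dato_from_iat(const struct iat_state *s)
{
    return s->avg_iat < DATO_MAX_MS ? s->avg_iat : DATO_MAX_MS;
}
```

Before the first sample the average is 0, so this sketch would ACK
immediately until it has seen two packets back to back.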

Here's my reasoning. We delay ACK's in the hope of reducing the
total number of ACK's we send to every second packet, and in the hope
of piggybacking the ACK on a data packet going the other way.
However, we don't want to delay the ACK too much, since this messes up the
RTO calculations on the remote side. In a stream of continuous packets
using the average packet interarrival time means we will normally
never have a delayed ACK timeout event. This is the smallest
estimator we can use that gives us this effect. The estimator
you suggest is substantially more conservative, especially when
the congestion window is large. This will probably peg the delayed
ACK to 500ms in long/slow haul cases.

A fixed DATO of x ms means that the remote side must add in a bias of x ms
to account for the DATO which it cannot measure. Generally because of
the widespread use of systems with a fixed 200ms DATO this means that
RTO must never fall below 200ms, and perhaps should really be calculated
RTO + 200ms (This is unclear to me, Vern Paxson suggested the additive
calculation to me, and it makes sense, I'm just not sure it makes a
real difference). If DATO varies in some relationship to RTT, then
we need the RTO to be biased by that relationship. If we use your
suggested DATO calculation the bias is easily calculated.
I claim that the bias for my suggested DATO can similarly be calculated
from the interarrival time of the ACKs, or alternatively by multiplying
the RTO by (1+1/cwnd). In the Linux stack I take the second approach.
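
To make the two biasing options concrete, here is a small sketch in C.
The function names are made up, cwnd is assumed to be counted in
packets, and times are in milliseconds; this is an illustration of the
arithmetic only, not the actual Linux code.

```c
#include <stdint.h>

/* Multiplicative bias: rto * (1 + 1/cwnd), with cwnd in packets.
 * A cwnd of 0 is treated as 1 to avoid dividing by zero. */
static uint32_t rto_biased_by_cwnd(uint32_t rto_ms, uint32_t cwnd_pkts)
{
    if (cwnd_pkts == 0)
        cwnd_pkts = 1;
    return rto_ms + rto_ms / cwnd_pkts;
}

/* Additive bias: rto + the peer's (assumed fixed) delayed ACK timeout. */
static uint32_t rto_biased_additive(uint32_t rto_ms, uint32_t dato_ms)
{
    return rto_ms + dato_ms;
}
```

With an RTO of 400 ms and a cwnd of 4 packets the first form gives
500 ms; the additive form with a 200 ms DATO gives 600 ms.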

Of course, this whole discussion is meaningless if you aren't using
high resolution timers for your TCP.

A few further related issues.

-   If the average packet interarrival time goes above 500ms,
    then an interesting question arises: is there still a benefit
    to be had from delaying the ACK? It will cost us a timeout,
    and we will always take that hit, since the next packet will
    never come before that timeout. In a one way stream it would
    be better to ACK immediately, and avoid the 500ms extra bias
    being added to the RTO calculation on the remote side.
    On the other hand, if data is going two ways, then we really would
    like to wait a bit for a data packet to piggyback the ACK on.
    If we wanted to we could also start measuring the average time
    until the system makes such a packet available, and when that
    exceeds 500ms, skip the delayed ACK altogether.

-   The RFCs are not terribly clear on exactly how the RTO should
    be used. There are two interpretations, which differ
    quite substantially in their effect:

    (1) When a packet is sent at time T, we expect a reply for that packet
	by time T + RTO.

    (2) When a packet is sent at time T, we expect a reply for the earliest
	unacknowledged packet we have sent by time T + RTO.

    BSD derived stacks take interpretation (2). However, just reading the
    RFCs, interpretation (1) seems the most natural, and is quite simple
    to implement if the outgoing data is pre-packetized. In BSD style
    implementations, where the packets are constructed from the send queue
    on the fly every time a packet is needed, interpretation (1) would
    be fairly hard to implement, but (2) is a natural.
    The difference between these two interpretations is 1/2 RTT on a
    symmetric path. In situations where the timers are accurate enough
    for this to make a difference, interpretation (1) will kill fast
    retransmit, since a timeout will often occur before fast retransmit
    can take place.
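
The two interpretations can be sketched as follows. This models only
the wording of (1) and (2) above; the names are hypothetical and it is
not meant to describe how any BSD derived stack actually manages its
timer.

```c
#include <stdint.h>

/* Interpretation (1): every packet carries its own deadline,
 * fixed at the moment that packet is sent. */
static uint32_t deadline_per_packet(uint32_t sent_at_ms, uint32_t rto_ms)
{
    return sent_at_ms + rto_ms;
}

/* Interpretation (2): one timer per connection. Each transmission at
 * time T moves the deadline for the earliest unacknowledged packet
 * to T + RTO. */
struct rexmit_timer {
    uint32_t expires_at_ms;
};

static void timer_on_send(struct rexmit_timer *t, uint32_t now_ms, uint32_t rto_ms)
{
    t->expires_at_ms = now_ms + rto_ms;
}
```

With an RTO of 300 ms and packets sent at 0 ms and 50 ms, the first
packet times out at 300 ms under (1), but not before 350 ms under (2).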

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38



From owner-tcp-impl@relay.engr.sgi.com  Wed Jul  2 13:51:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA07147 for tcp-impl-list; Wed, 2 Jul 1997 13:48:13 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA07133 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 13:48:11 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA04920
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 13:47:51 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id PAA19105;
	Wed, 2 Jul 1997 15:47:06 -0500 (CDT)
Date: Wed, 2 Jul 1997 15:47:06 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199707022047.PAA19105@frantic.BSDI.COM>
To: luigi@labinfo.iet.unipi.it
Subject: Re: TCP research issues from a tcp-impl perspective
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From luigi@labinfo.iet.unipi.it Wed Jul  2 04:00:03 1997
> From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
> Subject: Re: TCP research issues from a tcp-impl perspective
> To: dab@BSDI.COM (David Borman)
> Date: Wed, 2 Jul 1997 09:58:05 +0200 (MET DST)
> Cc: Eric.Schenk@dna.lth.se, wsimpson@greendragon.com,
>         tcp-impl@cthulhu.engr.sgi.com
> X-Mailer: ELM [version 2.4 PL23]
> ...
> I think that the purpose of filtering RTT should be clarified.
> Should SRTT track averages, maximums, remove exceedingly large or
> exceedingly small samples, or what; and how many RTTs it takes for
> a sample to be (practically) forgotten.  Probably a recommendation
> should be made to use high resolution timings when possible.
> 
> In order to correct the computation when more samples per RTT are
> available, you also need to know (dynamically) how many samples
> you have per RTT, but this is a bit tricky to implement, and probably
> not worth the effort, since there might be such a big variance on
> RTT estimates on top of what the real network RTT is (because of
> low resolution timers, delayed acks, or acks triggered by user
> processes reading data, etc.) that one might also wonder if linear
> filtering is appropriate at all for the purpose (e.g. one might
> want to compute
> 
> 	NEWRTT = MAX(alpha*OLDRTT + (1-alpha)*SAMPLE , SAMPLE )
> 
> to track large samples more effectively, in order to have more
> conservative timeouts...

Dynamically tracking and changing ALPHA on the fly is not
that difficult, especially if you don't worry about being
exact, but rather better than the current code.  The whole
goal of the SRTT is to be able to set the retransmit timer
so that it doesn't go off early and cause unnecessary
retransmissions, but also to try and keep it from waiting
longer than necessary before retransmitting.

The simple code is to count the # of calculated RTT values
during one RTT, and use that result to adjust the ALPHA
weighting for the next RTT values.

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Wed Jul  2 14:06:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA11052 for tcp-impl-list; Wed, 2 Jul 1997 14:04:21 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA11037 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 14:04:17 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA09580
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 14:04:16 -0700
	env-from (craig@aland.bbn.com)
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id OAA21713; Wed, 2 Jul 1997 14:03:43 -0700 (PDT)
Message-Id: <199707022103.OAA21713@aland.bbn.com>
To: David Borman <dab@BSDI.COM>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective 
In-reply-to: Your message of Wed, 02 Jul 97 15:47:06 -0500.
             <199707022047.PAA19105@frantic.BSDI.COM> 
Date: Wed, 02 Jul 97 14:03:42 -0700
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    Dynamically tracking and changing ALPHA on the fly is not
    that difficult

Pardon me -- I've been half watching this discussion and don't recall
seeing an answer to this question.

    Why do we need to weight the samples at all?  

If we get 20 samples in this RTT, vs 10 in the last, it isn't immediately
clear to me that we should weight the 20 differently from the 10 -- they are,
after all the 20 most recent samples and presumably correctly reflect the most
recent state of the network.

Where I might imagine, maybe, fiddling is in the variance estimator, but
as I recall, it already is known to be very sensitive to extreme values
and thus works OK.

Could someone explain what I've missed?

Thanks!

Craig

From owner-tcp-impl@relay.engr.sgi.com  Wed Jul  2 14:26:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA14920 for tcp-impl-list; Wed, 2 Jul 1997 14:23:16 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA14915 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 14:23:14 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA17647
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 14:23:09 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id QAA19191;
	Wed, 2 Jul 1997 16:22:53 -0500 (CDT)
Date: Wed, 2 Jul 1997 16:22:53 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199707022122.QAA19191@frantic.BSDI.COM>
To: craig@aland.bbn.com, dab@BSDI.COM
Subject: Re: TCP research issues from a tcp-impl perspective
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From craig@aland.bbn.com Wed Jul  2 16:04:20 1997
> To: David Borman <dab@BSDI.COM>
> cc: tcp-impl@cthulhu.engr.sgi.com
> Subject: Re: TCP research issues from a tcp-impl perspective 
> Date: Wed, 02 Jul 97 14:03:42 -0700
> From: Craig Partridge <craig@aland.bbn.com>
> 
> 
>     Dynamically tracking and changing ALPHA on the fly is not
>     that difficult
> 
> Pardon me -- I've been half watching this discussion and don't recall
> seeing an answer to this question.

I didn't want to get into details, because I haven't coded it up
so I can't say whether or not it works for sure, but my plan is to
use 4 weighting factors: 7/8, 15/16, 31/32 and 63/64.  When there
was one RTT value in the previous RTT, set the weighting to 7/8.
If there were 2-3 samples, use 15/16, 4-7, use 31/32, and 8 or more,
use 63/64.

Rather than having a static ALPHA value, I'd make it a variable
in the TCP control block.  I'd also change the stored value to
have 6 fixed decimal points (for 63/64).  I've already looked
at the code, and it seems straightforward enough.  BTW, sticking
with the 4 weightings keeps the fixed decimal point math easier.

To count them I'd just use the old BSD RTT estimator code.  Reset
the counter when I start timing a sequence number, and when the
ack for that number comes back, see how many timestamps I've
gotten in the mean time.
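
Roughly, and untested, the selection and update might look like the
sketch below. The names, the choice to treat zero samples like one,
and the exact fixed-point layout (SRTT scaled by 64, times in
milliseconds) are assumptions for illustration only.

```c
#include <stdint.h>

#define SRTT_FRAC_BITS 6   /* SRTT is stored scaled by 64 */

/* Map the number of RTT samples seen during the previous RTT to a
 * shift k, so that the weight on the old SRTT is (2^k - 1) / 2^k. */
static unsigned alpha_shift(unsigned samples_prev_rtt)
{
    if (samples_prev_rtt <= 1)
        return 3;   /* 7/8   (0 samples treated like 1) */
    if (samples_prev_rtt <= 3)
        return 4;   /* 15/16 */
    if (samples_prev_rtt <= 7)
        return 5;   /* 31/32 */
    return 6;       /* 63/64 */
}

/* srtt += (sample - srtt) / 2^shift, in fixed point. */
static uint32_t srtt_update(uint32_t srtt_scaled, uint32_t sample_ms, unsigned shift)
{
    int32_t delta = (int32_t)(sample_ms << SRTT_FRAC_BITS) - (int32_t)srtt_scaled;
    return (uint32_t)((int32_t)srtt_scaled + delta / (1 << shift));
}
```

For example, with SRTT at 100 ms (6400 scaled) and a 164 ms sample,
the 7/8 weighting moves SRTT to 108 ms while the 63/64 weighting
moves it to 101 ms.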

>     Why do we need to weight the samples at all?  
> 
> If we get 20 samples in this RTT, vs 10 in the last, it isn't immediately
> clear to me that we should weight the 20 differently from the 10 -- they are,
> after all the 20 most recent samples and presumably correctly reflect the most
> recent state of the network.

Since the RTT is being used to set the SRTT, and the SRTT is being
used to set the retransmit interval, which is a multiple of the SRTT,
I want the SRTT to be smoothed over the last several RTTs, not just
the last several packets.  Thus, when I'm getting more values per
RTT, each value needs to be weighted less, so that the SRTT is over
the last several RTTs.

> Where I might imagine, maybe, fiddling is in the variance estimator, but
> as I recall, it already is known to be very sensitive to extreme values
> and thus works OK.
> 
> Could someone explain what I've missed?

If you've got 50 packets in flight per RTT, do you really want
your SRTT estimate based mostly on just the last 8 or so packets?
If the RTT was varying by any amount, you probably wouldn't get
good SRTT/variance calculations.
> 
> Thanks!
> 
> Craig

Hopefully that explains better my thinking.
		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Wed Jul  2 14:44:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA19164 for tcp-impl-list; Wed, 2 Jul 1997 14:41:46 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA19150 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 14:41:44 -0700
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA22836
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 14:41:41 -0700
	env-from (luigi@labinfo.iet.unipi.it)
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id WAA15367; Wed, 2 Jul 1997 22:40:28 +0200
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199707022040.WAA15367@labinfo.iet.unipi.it>
Subject: Re: TCP research issues from a tcp-impl perspective
To: craig@aland.bbn.com (Craig Partridge)
Date: Wed, 2 Jul 1997 22:40:27 +0200 (MET DST)
Cc: dab@BSDI.COM, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199707022103.OAA21713@aland.bbn.com> from "Craig Partridge" at Jul 2, 97 02:03:23 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 1012      
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Pardon me -- I've been half watching this discussion and don't recall
> seeing an answer to this question.
> 
>     Why do we need to weight the samples at all?  
> 
> If we get 20 samples in this RTT, vs 10 in the last, it isn't immediately
> clear to me that we should weight the 20 differently from the 10 -- they are,
> after all the 20 most recent samples and presumably correctly reflect the most
> recent state of the network.

But they might also be very closely correlated, as opposed to samples
which are 1 RTT away from each other, and they could cause old values
to be forgotten too early.

	Cheers
	Luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________

From owner-tcp-impl@relay.engr.sgi.com  Wed Jul  2 15:11:13 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA25368 for tcp-impl-list; Wed, 2 Jul 1997 15:09:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA25342 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 15:09:05 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA29612
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 15:09:04 -0700
	env-from (craig@aland.bbn.com)
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id PAA21951; Wed, 2 Jul 1997 15:05:01 -0700 (PDT)
Message-Id: <199707022205.PAA21951@aland.bbn.com>
To: David Borman <dab@BSDI.COM>
cc: tcp-impl@cthulhu.engr.sgi.com, craig@aland.bbn.com
Subject: Re: TCP research issues from a tcp-impl perspective 
In-reply-to: Your message of Wed, 02 Jul 97 16:22:53 -0500.
             <199707022122.QAA19191@frantic.BSDI.COM> 
Date: Wed, 02 Jul 97 15:05:01 -0700
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    Since the RTT is being used to set the SRTT, and the SRTT is being
    used to set the retransmit interval, which is a multiple of the SRTT,
    I want the SRTT to be smoothed over the last several RTTs, not just
    the last several packets.

And I say "why?"

If the RTT measured in the last round-trip time is 5, what's wrong with
RTT going to 5, provided RTTVAR is still OK?

Think of RTT as your base round trip time (without queueing effects) and
RTTVAR as reflecting queueing effects.

Then it should be obvious that since no sample can be less than the
queueing free round-trip time, all valid samples (i.e. not tainted by
retransmission) should be equally useful in setting RTT.  No scaling is
required.

The only issue is whether RTTVAR correctly tracks variations in queueing
in the path.  And that's determined by *g*, not *alpha*, in the Jacobson
model.

I've done some experiments feeding round-trip time patterns through the
Jacobson algorithm and while *g* isn't perfect, it does pretty well adjusting
to a run of 50 packets and then a spike.
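
For anyone who wants to repeat this kind of experiment, a minimal
sketch of the estimator in C follows. It assumes a gain of 1/8 on the
mean, 1/4 on the deviation, and a timeout of mean plus four deviations,
which are the values commonly used; the struct and function names are
made up, and this is not the harness I used.

```c
/* Hypothetical estimator state; times in milliseconds. */
struct rtt_est {
    double srtt;     /* smoothed round-trip time */
    double rttvar;   /* smoothed mean deviation */
};

/* One update per valid (non-retransmitted) RTT sample. */
static void rtt_update(struct rtt_est *e, double sample)
{
    double err = sample - e->srtt;

    e->srtt += err / 8.0;                    /* mean gain 1/8 (assumed) */
    if (err < 0.0)
        err = -err;
    e->rttvar += (err - e->rttvar) / 4.0;    /* deviation gain 1/4 (assumed) */
}

/* Retransmit timeout: mean plus four deviations (assumed multiplier). */
static double rtt_rto(const struct rtt_est *e)
{
    return e->srtt + 4.0 * e->rttvar;
}
```

Starting from srtt = 100 and rttvar = 25, feeding 50 samples of 100
followed by a single sample of 200 shows how the timeout behaves just
before and just after the spike.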

Craig

From owner-tcp-impl@relay.engr.sgi.com  Wed Jul  2 15:59:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA03206 for tcp-impl-list; Wed, 2 Jul 1997 15:57:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA03186 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 15:57:04 -0700
Received: from rekk.dna.lth.se (rekk.dna.lth.se [130.235.16.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA14028
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 15:57:00 -0700
	env-from (erics@rekk.dna.lth.se)
Received: from rekk.dna.lth.se (localhost [127.0.0.1])
	by rekk.dna.lth.se (8.8.5/8.8.5) with ESMTP id AAA11137;
	Thu, 3 Jul 1997 00:53:17 +0200
Message-Id: <199707022253.AAA11137@rekk.dna.lth.se>
To: Craig Partridge <craig@aland.bbn.com>
cc: Eric.Schenk@dna.lth.se, David Borman <dab@BSDI.COM>,
        tcp-impl@cthulhu.engr.sgi.com
From: Eric.Schenk@dna.lth.se
Subject: Re: TCP research issues from a tcp-impl perspective 
In-reply-to: Your message of "Wed, 02 Jul 1997 15:05:01 PDT."
             <199707022205.PAA21951@aland.bbn.com> 
Date: Thu, 03 Jul 1997 00:53:16 +0200
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Craig Partridge <craig@aland.bbn.com> writes:
>If the RTT measured in the last round-trip time is 5, what's wrong with
>RTT going to 5, provided RTTVAR is still OK?
>
>Think of RTT as your base round trip time (without queueing effects) and
>RTTVAR as reflecting queueing effects.
>
>Then it should be obvious that since no sample can be less than the
>queueing free round-trip time, all valid samples (i.e. not tainted by
>retransmission) should be equally useful in setting RTT.  No scaling is
>required.

Up to here I'm moderately convinced, but I'd want to read the
math in Van's paper again to see the reasoning behind the original
choice of alpha.

>The only issue is whether RTTVAR correctly tracks variations in queueing
>in the path.  And that's determined by *g*, not *alpha*, in the Jacobson
>model.

This is where I think the real concern is. If we are tracking
RTTVAR within a window for every packet in a 50 packet window,
then it will decay away to 0 rather rapidly. If we get a spike
every few RTT the model will have lost all knowledge of the spike
just before it occurs.
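
To put a rough number on the decay: assuming a deviation gain of 1/4
and samples that land exactly on the smoothed mean, each sample
multiplies RTTVAR by 3/4. The helper below (a made-up name, for
illustration only) counts how many such samples it takes to fall
below a given floor.

```c
/* Count the quiet samples needed before rttvar drops below floor,
 * assuming each one applies rttvar -= rttvar / 4.
 * floor must be positive, otherwise the loop never ends. */
static unsigned samples_until_decayed(double rttvar, double floor)
{
    unsigned n = 0;

    while (rttvar >= floor) {
        rttvar -= rttvar / 4.0;
        n++;
    }
    return n;
}
```

Under those assumptions a deviation of 100 ms drops below 1 ms after
17 samples, well inside a single 50 packet window.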

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38

From owner-tcp-impl@relay.engr.sgi.com  Wed Jul  2 20:14:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA11078 for tcp-impl-list; Wed, 2 Jul 1997 20:12:19 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA11072 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 20:12:16 -0700
Received: from merit.edu (merit.edu [198.108.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id UAA08521
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 20:12:14 -0700
	env-from (wsimpson@greendragon.com)
Received: from Bill.Simpson.DialUp.Mich.Net (pm012-26.dialip.mich.net [141.211.7.194])
	by merit.edu (8.8.5/8.8.5) with SMTP id XAA06314
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 2 Jul 1997 23:12:12 -0400 (EDT)
Date: Thu, 3 Jul 97 01:03:02 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <6181.wsimpson@greendragon.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP research issues from a tcp-impl perspective
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Eric.Schenk@dna.lth.se
> >I've been using:
> >	uint32 dato = tcb->to.srtt / 2
> >			+ tcb->to.mdev;
> >	dato = max( dato, 500 );
> >	dato = min( dato, 200 );
>
> Shouldn't the max and min be exchanged in the above?
>
Oops, my mistake in retyping the #defines.  Swap that 200 and 500 (or
max and min).
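
With the bounds the right way round, the calculation would read
something like the sketch below. The function name and the #define
names are placeholders (I am not reproducing my actual #defines), and
times are assumed to be in milliseconds.

```c
#include <stdint.h>

#define DATO_MIN_MS 200u   /* lower bound on the delayed ACK timeout */
#define DATO_MAX_MS 500u   /* upper bound on the delayed ACK timeout */

/* dato = srtt/2 + mdev, clamped to [DATO_MIN_MS, DATO_MAX_MS]. */
static uint32_t dato_from_rtt(uint32_t srtt_ms, uint32_t mdev_ms)
{
    uint32_t dato = srtt_ms / 2 + mdev_ms;

    if (dato < DATO_MIN_MS)
        dato = DATO_MIN_MS;
    if (dato > DATO_MAX_MS)
        dato = DATO_MAX_MS;
    return dato;
}
```

So an srtt of 600 ms with an mdev of 50 ms yields 350 ms, while very
short or very long paths are pinned at 200 ms and 500 ms respectively.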


> Anyway, I think we've been over this ground once before a few months back,
> but I'll say it again.
>
> If we are going to go with a calculated DATO I favor something like:
>
> 	dato = average packet interarrival time
> 	dato = min(dato, 500)
>
> The average packet interarrival time should be calculated in a way
> similar to the RTO, with the difference that extreme samples (say
> larger than RTO) should be thrown away. (We only want to sample
> time between packet arrivals in a continuous stream, not the time
> for pauses in sending.)
>
Hmmm, I'm not sure that the sample time for a continuous stream is what
we want at all.  Those are the times when the DATO _doesn't_ fire.
Packets arrive in trains.  The times we want are the inter-packet-train
intervals.  Otherwise, DATO will fire too soon.


> Here's my reasoning. We delay ACK's in the hope of reducing the
> total number of ACK's we send to every second packet, and in the hope
> of piggybacking the ACK on a data packet going the other way.

Yes.

> However, we don't want to delay the ACK too much, since this messes up the
> RTO calculations on the remote side. In a stream of continuous packets
> using the average packet interarrival time means we will normally
> never have a delayed ACK timeout event.

It seems to me that using packet interarrival means you will fire DATO
at the end of _every_ packet train.

Using packet train interval instead, the DATO event will only _rarely_
occur, at actual pauses in the traffic.

> This is the smallest
> estimator we can use that gives us this effect. The estimator
> you suggest is substantially more conservative, especially when
> the congestion window is large. This will probably peg the delayed
> ACK to 500ms in long/slow haul cases.
>
Yes, much more conservative.  I'm looking for the largest reasonable
estimator, not the smallest.  My idea is that DATO will never fire
except in already slow cases (when, I argue, it doesn't matter as much).

Anyway, interesting points.  I think we've given the IRTF end2end their
money's worth on this topic.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 17 11:30:38 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA26227 for tcp-impl-list; Thu, 17 Jul 1997 11:28:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA26215 for <tcp-impl@engr.sgi.com>; Thu, 17 Jul 1997 11:28:39 -0700
Received: from ietf.org (ietf.org [132.151.1.19]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id LAA20279
	for <tcp-impl@engr.sgi.com>; Thu, 17 Jul 1997 11:28:38 -0700
	env-from (cclark@ietf.org)
Received: from ietf.ietf.org by ietf.org id aa12422; 17 Jul 97 13:23 EDT
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce@ietf.org
cc: tcp-impl@engr.sgi.com
From: Internet-Drafts@ietf.org
Reply-to: Internet-Drafts@ietf.org
Subject: I-D ACTION:draft-ietf-tcpimpl-tools-00.txt
Date: Thu, 17 Jul 1997 13:23:41 -0400
Message-ID:  <9707171323.aa12422@ietf.org>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

--NextPart

 A New Internet-Draft is available from the on-line Internet-Drafts 
 directories. This draft is a work item of the TCP Implementation Working 
 Group of the IETF.                                                        

       Title     : Some Testing Tools for TCP Implementors                 
       Author(s) : S. Parker, C. Schmechel
       Filename  : draft-ietf-tcpimpl-tools-00.txt
       Pages     : 12
       Date      : 07/16/1997

Available tools for testing TCP implementations are catalogued by this 
memo.  Hopefully disseminating this information will encourage those 
responsible for building and maintaining TCP to make the best use of 
available tests.  The type of testing the tool provides, the type of tests 
it is capable of doing, and its availability is enumerated.                

Internet-Drafts are available by anonymous FTP.  Login with the username
"anonymous" and a password of your e-mail address.  After logging in,
type "cd internet-drafts" and then
     "get draft-ietf-tcpimpl-tools-00.txt".
A URL for the Internet-Draft is:
ftp://ds.internic.net/internet-drafts/draft-ietf-tcpimpl-tools-00.txt
 
Internet-Drafts directories are located at:	
	                                                
     o  Africa:  ftp.is.co.za                    
	                                                
     o  Europe:  ftp.nordu.net            	
                 ftp.nis.garr.it                 
	                                                
     o  Pacific Rim: munnari.oz.au               
	                                                
     o  US East Coast: ds.internic.net           
	                                                
     o  US West Coast: ftp.isi.edu               
	                                                
Internet-Drafts are also available by mail.	
	                                                
Send a message to:  mailserv@ds.internic.net. In the body type: 
     "FILE /internet-drafts/draft-ietf-tcpimpl-tools-00.txt".
							
NOTE: The mail server at ds.internic.net can return the document in
      MIME-encoded form by using the "mpack" utility.  To use this
      feature, insert the command "ENCODING mime" before the "FILE"
      command.  To decode the response(s), you will need "munpack" or
      a MIME-compliant mail reader.  Different MIME-compliant mail readers
      exhibit different behavior, especially when dealing with
      "multipart" MIME messages (i.e., documents which have been split
      up into multiple messages), so check your local documentation on
      how to manipulate these messages.
							
							

Below is the data which will enable a MIME compliant mail reader 
implementation to automatically retrieve the ASCII version
of the Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type:  Message/External-body;
        access-type="mail-server";
        server="mailserv@ds.internic.net"

Content-Type: text/plain
Content-ID: <19970716104402.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-tools-00.txt

--OtherAccess
Content-Type:   Message/External-body;
        name="draft-ietf-tcpimpl-tools-00.txt";
        site="ds.internic.net";
        access-type="anon-ftp";
        directory="internet-drafts"

Content-Type: text/plain
Content-ID: <19970716104402.I-D@ietf.org>

--OtherAccess--

--NextPart--

From owner-tcp-impl@relay.engr.sgi.com  Mon Jul 21 13:25:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA09194 for tcp-impl-list; Mon, 21 Jul 1997 13:22:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA09161 for <tcp-impl@engr.sgi.com>; Mon, 21 Jul 1997 13:22:38 -0700
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA27354
	for <tcp-impl@engr.sgi.com>; Mon, 21 Jul 1997 13:22:37 -0700
	env-from (Chris.Schmechel@Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.25]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id NAA20690 for <tcp-impl@engr.sgi.com>; Mon, 21 Jul 1997 13:52:02 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id NAA20304; Mon, 21 Jul 1997 13:22:07 -0700
Received: from mont-blanc by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id NAA08383; Mon, 21 Jul 1997 13:21:46 -0700
Message-Id: <199707212021.NAA08383@jurassic.eng.sun.com>
Date: Mon, 21 Jul 1997 13:21:29 -0700 (PDT)
From: Chris Schmechel <Chris.Schmechel@Eng.Sun.COM>
Reply-To: Chris Schmechel <Chris.Schmechel@Eng.Sun.COM>
Subject: Reminder of New Internet Draft TCP-IMPL
To: tcp-impl@engr.sgi.com
Cc: cschmec@Eng.Sun.COM
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Content-MD5: 2JbG1HPnH23gkYAcIgUx6w==
X-Mailer: dtmail 1.2.0 CDE Version 1.2 SunOS 5.6 sun4u sparc 
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hi -

Just a reminder since Munich is coming up quickly, that
ftp://ds.internic.net/internet-drafts/draft-ietf-tcpimpl-tools-00.txt
is available.

Comments and/or suggestions are helpful.  Please make them to the list.

Thanks,

-Chris Schmechel
 <Chris.Schmechel@Eng.Sun.COM>
 


From owner-tcp-impl@relay.engr.sgi.com  Sat Jul 26 22:17:09 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA27255 for tcp-impl-list; Sat, 26 Jul 1997 22:12:14 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from refugee.engr.sgi.com (refugee.engr.sgi.com [150.166.61.22]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA27251 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 26 Jul 1997 22:12:12 -0700
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (970321.SGI.8.8.5/970502.SGI.AUTOCF) via ESMTP id WAA03045 for <tcp-impl@engr.sgi.com>; Sat, 26 Jul 1997 22:12:12 -0700 (PDT)
Message-Id: <199707270512.WAA03045@refugee.engr.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: test (ignore)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <3039.869980331.1@refugee.engr.sgi.com>
Date: Sat, 26 Jul 1997 22:12:11 -0700
From: Steve Alexander <sca@refugee.engr.sgi.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Checking majordomo functionality.  Ignore this.

-- Steve

From owner-tcp-impl@relay.engr.sgi.com  Mon Jul 28 22:36:53 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA03087 for tcp-impl-list; Mon, 28 Jul 1997 20:47:20 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA02794 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 28 Jul 1997 20:45:50 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id UAA17801
	for <tcp-impl@relay.engr.SGI.COM>; Mon, 28 Jul 1997 20:45:48 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id UAA03816; Mon, 28 Jul 1997 20:45:48 -0700 (PDT)
Message-Id: <199707290345.UAA03816@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: testing, please ignore
Date: Mon, 28 Jul 1997 20:45:48 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Just like it says ...

From owner-tcp-impl@relay.engr.sgi.com  Tue Jul 29 09:53:18 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA20380 for tcp-impl-list; Tue, 29 Jul 1997 09:43:33 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA20327 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 29 Jul 1997 09:43:18 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA11411
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 29 Jul 1997 09:43:16 -0700
	env-from (craig@aland.bbn.com)
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id JAA13298; Tue, 29 Jul 1997 09:42:54 -0700 (PDT)
Message-Id: <199707291642.JAA13298@aland.bbn.com>
To: tcp-impl@cthulhu.engr.sgi.com
cc: van@ee.lbl.gov
Subject: RTT estimation - a retraction
Reply-To: Craig Partridge <craig@aland.bbn.com>
From: Craig Partridge <craig@aland.bbn.com>
Date: Tue, 29 Jul 97 09:42:53 -0700
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hi folks:

Shortly before I went on vacation mid-month, I got to talk with Van Jacobson
about the RTT estimation issues.  And Van convinced me that I was wrong and that
Dave Borman was right -- so I want to circulate a retraction.

Unfortunately, since I went on vacation, I don't remember all the details
of Van's arguments.  So here are the highlights as I remember them and
Van can correct me.

The basic issues were (a) taking multiple RTTs per window (esp. taking one
RTT per segment) and (b) using higher precision timers (esp. when using
timestamps).  I'd argued this wasn't a big deal and didn't require changes.

Regarding multiple RTTs, Van pointed out that if you take multiple RTTs,
then, because the RTT estimator gives a lot of credence to each new
sample (large alpha), your RTT estimate will end up tracing the results of
the most recent packets -- which means at the end of a burst, your RTT
estimate will largely reflect the size of the queue you built up in your
burst.  Exactly why this is evil is a detail that, unfortunately, I don't
recall.

Regarding higher precision -- Van said one nice feature of coarser timers
is that they acted as an initial filter -- taking out some of the noise in
the RTT samples before feeding the samples into the estimator.
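
The multiple-samples point can be illustrated with a minimal sketch. It is
not taken from Van's argument; the gain of 1/8, the 100 ms base RTT and the
10 ms of extra queueing per segment are all assumptions chosen for
illustration. It feeds a smoothed estimator either one sample per window or
one sample per segment of the same burst.

```python
def srtt_after(samples, initial, gain=0.125):
    """Run samples through srtt += gain * (sample - srtt)."""
    srtt = initial
    for sample in samples:
        srtt += gain * (sample - srtt)
    return srtt

# Hypothetical burst of 16 segments: each one waits 10 ms longer in the
# queue than the one before it (values in milliseconds).
burst = [100 + 10 * i for i in range(16)]

one_per_window = srtt_after(burst[:1], initial=100.0)   # time only the first segment
one_per_segment = srtt_after(burst, initial=100.0)      # time every segment
```

Under these assumptions the per-window estimate stays at the 100 ms base,
while the per-segment estimate is pulled well above it by the queue the
burst itself built.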

Craig

E-mail: craig@aland.bbn.com or craig@bbn.com

From owner-tcp-impl@relay.engr.sgi.com  Tue Jul 29 13:09:33 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA01385 for tcp-impl-list; Tue, 29 Jul 1997 13:06:32 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA01308 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 29 Jul 1997 13:06:25 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA13076
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 29 Jul 1997 13:06:24 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id NAA05275; Tue, 29 Jul 1997 13:06:23 -0700 (PDT)
Message-Id: <199707292006.NAA05275@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: revised I-D on Known TCP Implementation Problems
Date: Tue, 29 Jul 1997 13:06:23 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

A revised version of the I-D on known TCP implementation problems is
now available from

	http://reality.sgi.com/sca/tcp-impl/prob-01.txt

I sent it to the tcp-impl list twice over the last week, but unfortunately
(unknown to us) the list has a filter limiting messages to 40 KB, so the
email got chucked both times.

I've already sent this along to the IETF editor since the deadline is
tomorrow.  It'll be one of the agenda items at Munich, so I'd like to
encourage discussion on the mailing list between now and then.

The changes are: (1) descriptions of keepalive problems, contributed by
Scott Dawson; (2) the "Significance" category now describes the environments
for which the problem is significant, rather than trying to assign a
Critical/Serious/Non-Critical ranking (which Joe Touch pointed out is
problematic because some problems are critical in some environments and
no big deal in others).

See (some of) you in Munich ...

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Jul 29 17:18:19 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA09195 for tcp-impl-list; Tue, 29 Jul 1997 10:50:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA08765 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 29 Jul 1997 10:48:41 -0700
Received: from swan.ml.org (eerandy.swan.ac.uk [137.44.4.77]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA04053
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 29 Jul 1997 10:48:35 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (surd [137.44.10.205]) by swan.ml.org (8.7.4/8.7.3) with SMTP id SAA26429; Tue, 29 Jul 1997 18:48:13 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0wtHLv-0005FiC; Tue, 29 Jul 97 19:50 BST
Message-Id: <m0wtHLv-0005FiC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: RTT estimation - a retraction
To: craig@aland.bbn.com
Date: Tue, 29 Jul 1997 19:50:34 +0100 (BST)
Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
In-Reply-To: <199707291642.JAA13298@aland.bbn.com> from "Craig Partridge" at Jul 29, 97 09:42:53 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Regarding higher precision -- Van said one nice feature of coarser timers
> is that they acted as an initial filter -- taking out some of the noise in
> the RTT samples before feeding the samples into the estimator.

The Linux fine timers are effectively being range limited because of the
"noise" and implicit noise of weaker timers. It's also noticeable that the BSD
timers do bad things, but often it's the implementation, not the randomness.

The issue is that the BSD[1] timers are not "randomly bad" but are spaced evenly
and synchronized between sockets. This gives a very very different behaviour.
Consider..

	[BSD type timers] -- 100baseT----router-->64K

If you dump the BSD derived box and have a lot of active connections you 
sometimes see patterns where a wave of retransmits is fired off over the
100baseT, overrunning the small buffers on the router and dropping a pile of
the packets. This causes another wave of retransmits.

Fortunately there seems to be enough randomness in the system that even when
trying to make it fall down and degenerate to failure it doesn't.  As networks
speed up the 'timer synchronization' is going to be visible at the 100Hz which
Linux uses as well as at the 2Hz of the BSD slow timeout. So it's a general
issue.

[1] BSD meaning BSD like timers. I doubt BSD is the only thing that does
its timers this way.
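
A rough sketch of the synchronization effect, under assumed numbers: 100
connections whose retransmit deadlines are spread evenly over one second,
serviced either by a 500 ms tick (the 2Hz slow timeout) or by a 10 ms tick
(100Hz). The deadline spacing and the helper names are hypothetical.

```python
from collections import Counter

def firing_times(deadlines_ms, tick_ms):
    """Round each deadline up to the next timer tick (integer arithmetic)."""
    return [-(-d // tick_ms) * tick_ms for d in deadlines_ms]

# Hypothetical deadlines: one every 10 ms from 1000 ms to 1990 ms.
deadlines = [1000 + 10 * i for i in range(100)]

def largest_wave(tick_ms):
    """Largest number of retransmits released on a single tick."""
    return max(Counter(firing_times(deadlines, tick_ms)).values())

wave_2hz = largest_wave(500)
wave_100hz = largest_wave(10)
```

With these inputs the 500 ms tick releases 50 retransmits at once, while the
10 ms tick releases them one at a time. A faster link shrinks the time scale
at which the same clumping matters, which is the sense in which the 100Hz
tick can show the same pattern.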



From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 09:31:10 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA04130 for tcp-impl-list; Thu, 31 Jul 1997 09:27:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA04073 for <tcp-impl@engr.sgi.com>; Thu, 31 Jul 1997 09:27:52 -0700
Received: from ietf.org (ietf.org [132.151.1.19]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id JAA13306
	for <tcp-impl@engr.sgi.com>; Thu, 31 Jul 1997 09:27:50 -0700
	env-from (cclark@ietf.org)
Received: from ietf.ietf.org by ietf.org id aa09201; 31 Jul 97 9:57 EDT
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary='NextPart'
To: IETF-Announce@ietf.org
Cc: tcp-impl@engr.sgi.com
From: Internet-Drafts@ietf.org
Reply-to: Internet-Drafts@ietf.org
Subject: I-D ACTION: draft-ietf-tcpimpl-prob-01.txt
Date: Thu, 31 Jul 1997 09:57:13 -0400
Message-ID:  <9707310957.aa09201@ietf.org>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

--NextPart
		
A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Implementation Working Group of the IETF.

	Title		: Known TCP Implementation Problems
	Author(s)	: Sydney Dawson and Vern Paxson
	Filename	: draft-ietf-tcpimpl-prob-01.txt
	Pages		: 20
	Date		: 1997-07-30
	
This memo catalogs a number of known TCP implementation problems.  
The goal in doing so is to improve conditions in the existing Internet 
by enhancing the quality of current TCP/IP implementations.

Internet-Drafts are available by anonymous FTP.  Log in with the username
'anonymous' and a password of your e-mail address.  After logging in,
type 'cd internet-drafts' and then
	'get draft-ietf-tcpimpl-prob-01.txt'.
A URL for the Internet-Draft is:
ftp://ds.internic.net/internet-drafts/draft-ietf-tcpimpl-prob-01.txt

Internet-Drafts directories are located at:

	Africa:	ftp.is.co.za
	
	Europe: ftp.nordu.net
		ftp.nis.garr.it
			
	Pacific Rim: munnari.oz.au
	
	US East Coast: ds.internic.net
	
	US West Coast: ftp.isi.edu

Internet-Drafts are also available by mail.

Send a message to:	mailserv@ds.internic.net.  In the body type:
	'FILE /internet-drafts/draft-ietf-tcpimpl-prob-01.txt'.
	
NOTE:	The mail server at ds.internic.net can return the document in
	MIME-encoded form by using the 'mpack' utility.  To use this
	feature, insert the command 'ENCODING mime' before the 'FILE'
	command.  To decode the response(s), you will need 'munpack' or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	'multipart' MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary='OtherAccess'

--OtherAccess
Content-Type:  Message/External-body;
	access-type='mail-server';
	server='mailserv@ds.internic.net'
	
Content-Type: text/plain
Content-ID:	<19970730162839.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-prob-01.txt

--OtherAccess
Content-Type:	Message/External-body;
	name='draft-ietf-tcpimpl-prob-01.txt';
	site='ds.internic.net';
	access-type='anon-ftp';
	directory='internet-drafts'
	
Content-Type: text/plain
Content-ID:	<19970730162839.I-D@ietf.org>

--OtherAccess--

--NextPart--



From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 16:38:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA29302 for tcp-impl-list; Thu, 31 Jul 1997 16:34:29 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA29001; Thu, 31 Jul 1997 16:33:38 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA03586; Thu, 31 Jul 1997 16:33:36 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id QAA10375; Thu, 31 Jul 1997 16:33:35 -0700 (PDT)
Message-Id: <199707312333.QAA10375@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: revising RFC 2001 within tcp-impl
Cc: Allyn.Romanow@Eng.Sun.COM, sca@refugee.engr.sgi.com, rstevens@kohala.com,
        floyd@ee.lbl.gov, mallman@lerc.nasa.gov, craig@aland.bbn.com
Date: Thu, 31 Jul 1997 16:33:35 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

RFC 2001 is a proposed standard that defines TCP congestion control, namely
the specifics of slow start, congestion avoidance, fast retransmit, and
fast recovery.  We (meaning Allyn, Steve & I) have been discussing the need
to revise it and have agreed that (1) it needs revision, and (2) tcp-impl
is an appropriate forum for developing the revision.  Rich Stevens (author
of 2001) has agreed to do the necessary editing to revise it, based on
input from tcp-impl.

The need for revision comes for two reasons.  The first is that, in its
present form, 2001 standardizes on exactly the Reno algorithms.  For some
aspects of congestion control, this is overly stringent.  For example,
there are a number of TCPs that during congestion avoidance count how much
data has been acked and increase cwnd by one segment after successfully
sending cwnd's worth of data.  This differs from the "cwnd += MSS*MSS/cwnd"
algorithm used by Reno, though the differences are minor.  We would not want
to consider these TCPs as failing to be standard-compliant.
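
For comparison, a minimal sketch of the two congestion-avoidance increase
rules. It assumes full-sized segments, one ack per segment, and an MSS of
1460 bytes; the function names are illustrative and not taken from any
particular implementation.

```python
MSS = 1460  # assumed segment size in bytes

def reno_per_ack(cwnd, acks):
    """Reno-style rule: cwnd += MSS*MSS/cwnd on every ack."""
    for _ in range(acks):
        cwnd += MSS * MSS / cwnd
    return cwnd

def counted(cwnd, acks):
    """Count acked bytes; add one MSS once a full cwnd has been acked."""
    acked = 0
    for _ in range(acks):
        acked += MSS
        if acked >= cwnd:
            acked -= cwnd
            cwnd += MSS
    return cwnd

start = 10 * MSS
after_reno = reno_per_ack(start, 10)   # one window's worth of acks
after_counted = counted(start, 10)
```

After one window of acks the counting variant has grown by exactly one
segment, and the per-ack formula by slightly less than one, because each
increment is divided by an already enlarged cwnd.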

The other reason for revision is that there appears to be widespread
support for increasing the initial value of cwnd beyond its present value
of a single segment, up to two segments (at least; see below).  This
improves performance, both by getting data sent more quickly, and by
avoiding a potentially lengthy delay if the receiver only acks the first
segment using a delayed ack.
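
As a rough illustration of "getting data sent more quickly", here is an
idealized slow-start model. It assumes no loss, an ack for every segment,
and cwnd doubling once per round trip; it ignores the delayed-ack stall
mentioned above, so it understates the benefit. The function name is
hypothetical.

```python
def rtts_to_send(segments, initial_cwnd):
    """Round trips needed to send `segments` under idealized slow start."""
    cwnd, sent, rtts = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd   # send a full window this round trip
        cwnd *= 2      # every segment acked, so the window doubles
        rtts += 1
    return rtts

short_iw1 = rtts_to_send(2, initial_cwnd=1)
short_iw2 = rtts_to_send(2, initial_cwnd=2)
longer_iw1 = rtts_to_send(10, initial_cwnd=1)
longer_iw2 = rtts_to_send(10, initial_cwnd=2)
```

In this model a two-segment transfer takes two round trips from an initial
window of one and a single round trip from an initial window of two; a
ten-segment transfer drops from four round trips to three.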

Two pragmatic arguments that it's okay to start with two segments are:

	(1) TCPs are constantly sending two back-to-back packets anyway -
	    whenever ack-every-other is used, and always during slow-start.
	    So the net already puts up with pretty much exactly this load
	    anyway.

	(2) Plenty of TCPs already start in some circumstances with
	    an initial window of two segments, due to bugs or design.

So starting with two rather than one is only a lose when the net is so
loaded that it can only sustain a single packet in flight per connection;
and you quickly get to that state anyway, when the second packet is lost,
just as you would after one RTT if you started cwnd at one segment.

Note that the proposed change only affects the initial slow-start, not
slow-start after loss.

Sally Floyd, Mark Allman and Craig Partridge have an I-D that discusses
increasing the initial cwnd to two segments or possibly more (which is
potentially more controversial).  I haven't seen an official announcement
of it yet, but it's unofficially available from:

http://gigahertz.lerc.nasa.gov/~mallman/papers/draft-floyd-incr-init-win-00.txt

Our plan is to begin work on revising 2001 in tcp-impl in parallel with the
development of their I-D.  For now, we will assume that the initial cwnd
change will be to start with two segments.  If down the line the I-D
successfully establishes a different initial value, it is straight-forward
to amend the 2001 revision accordingly.

This will be an agenda topic at Munich.  I'm hoping that we can have some
fruitful discussion of it beforehand on the mailing list.

A final comment: a key point is that our primary goal is to clarify RFC 2001,
not to overhaul it.  For example, I'm predisposed against revising it so that
during slow-start cwnd is opened by the amount of data acked, rather than
one segment per ack (as we discussed earlier on the list), as this is a change
that will significantly alter TCP burstiness compared to how it works in
the net today.  I would also have considered increasing the initial cwnd
out of scope, were it not that there appears widespread consensus that going
to two segments is okay, and an I-D in the works thoroughly exploring the
issues.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 17:02:41 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA05549 for tcp-impl-list; Thu, 31 Jul 1997 16:59:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA05392; Thu, 31 Jul 1997 16:59:04 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA13033; Thu, 31 Jul 1997 16:59:03 -0700
	env-from (craig@aland.bbn.com)
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id QAA18030; Thu, 31 Jul 1997 16:57:55 -0700 (PDT)
Message-Id: <199707312357.QAA18030@aland.bbn.com>
To: touch@ISI.EDU
cc: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov, Allyn.Romanow@eng.sun.com,
        sca@refugee.engr.sgi.com, rstevens@kohala.com, floyd@ee.lbl.gov,
        mallman@lerc.nasa.gov
Subject: Re: revising RFC 2001 within tcp-impl 
In-reply-to: Your message of Thu, 31 Jul 97 16:51:01 -0700.
             <199707312351.QAA09844@rum.isi.edu> 
Date: Thu, 31 Jul 97 16:57:54 -0700
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    > Note that the proposed change only affects the initial slow-start, not
    > slow-start after loss.

    Seems like both are just as important, and there doesn't appear to
    be a reason for treating them differently, is there?

Huge difference.  There exist paths in the Internet where the delay*bw
product divided by the number of TCP connections is indeed < 4.  (The
major trans-Atlantic fiber was one such spot just a year ago).

You'd like to give TCP the maximum dynamic range to deal with these types
of situations.

Craig

From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 17:34:46 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA12888 for tcp-impl-list; Thu, 31 Jul 1997 17:32:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA12616 for <tcp-impl@engr.sgi.com>; Thu, 31 Jul 1997 17:30:13 -0700
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA21319
	for <tcp-impl@engr.sgi.com>; Thu, 31 Jul 1997 17:30:06 -0700
	env-from (Jerry.Chu@Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.13]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id RAA28490; Thu, 31 Jul 1997 17:30:06 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id RAA15123; Thu, 31 Jul 1997 17:30:03 -0700
Received: from taipei.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id RAA03550; Thu, 31 Jul 1997 17:30:05 -0700
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id RAA18162; Thu, 31 Jul 1997 17:27:02 -0700
Date: Thu, 31 Jul 1997 17:27:02 -0700
From: Jerry.Chu@Eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199708010027.RAA18162@taipei.eng.sun.com>
To: craig@aland.bbn.com, touch@isi.edu
Subject: Re: revising RFC 2001 within tcp-impl
Cc: tcp-impl@engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Agree. Changing the initial slow-start is only a one-shot deal near
connection-establishment time, whereas changing the slow-start after loss will
constitute a *persistent* change if loss continues, and I think it's
a lot more dangerous.

Jerry

> From owner-tcp-impl@cthulhu.engr.sgi.com  Thu Jul 31 17:04:35 1997
> To: touch@ISI.EDU
> cc: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov, Allyn.Romanow@Eng,
>         sca@refugee.engr.sgi.com, rstevens@kohala.com, floyd@ee.lbl.gov,
>         mallman@lerc.nasa.gov
> Subject: Re: revising RFC 2001 within tcp-impl 
> Date: Thu, 31 Jul 97 16:57:54 -0700
> From: Craig Partridge <craig@aland.bbn.com>
> 
> 
>     > Note that the proposed change only affects the initial slow-start, not
>     > slow-start after loss.
> 
>     Seems like both are just as important, and there doesn't appear to
>     be a reason for treating them differently, is there?
> 
> Huge difference.  There exist paths in the Internet where the delay*bw
> product divided by the number of TCP connections is indeed < 4.  (The
> major trans-Atlantic fiber was one such spot just a year ago).
> 
> You'd like to give TCP the maximum dynamic range to deal with these types
> of situations.
> 
> Craig
> 

From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 17:54:02 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA17286 for tcp-impl-list; Thu, 31 Jul 1997 17:50:02 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA17092 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 17:49:18 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA09950
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 16:51:45 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA10470>; Thu, 31 Jul 1997 16:51:34 -0700
Date: Thu, 31 Jul 1997 16:51:01 -0700
Posted-Date: Thu, 31 Jul 1997 16:51:01 -0700
Message-Id: <199707312351.QAA09844@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <QAA09844>; Thu, 31 Jul 1997 16:51:01 -0700
To: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov
Subject: Re: revising RFC 2001 within tcp-impl
Cc: Allyn.Romanow@eng.sun.com, sca@refugee.engr.sgi.com, rstevens@kohala.com,
        floyd@ee.lbl.gov, mallman@lerc.nasa.gov, craig@aland.bbn.com
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> To: tcp-impl@cthulhu.engr.sgi.com
> Subject: revising RFC 2001 within tcp-impl
> Date: Thu, 31 Jul 1997 16:33:35 PDT
> From: Vern Paxson <vern@ee.lbl.gov>
> 
> RFC 2001 is a proposed standard that defines TCP congestion control, namely
> the specifics of slow start, congestion avoidance, fast retransmit, and
...
> The other reason for revision is that there appears to be widespread
> support for increasing the initial value of cwnd beyond its present value
> of a single segment, up to two segments (at least; see below).  This
> improves performance, both by getting data sent more quickly, and by
> avoiding a potentially lengthy delay if the receiver only acks the first
> segment using a delayed ack.
...
> 	(2) Plenty of TCPs already start in some circumstances with
> 	    an initial window of two segments, due to bugs or design.

It may be important that the two design points be mutually exclusive,
e.g.,
	start with TWO packets and DO NOT increase the window for SYN-ACK

	start with ONE packet and DO increase the window for SYN-ACK (the bug)

doing both results in starting with a window of 4, double the current 
value...

> Note that the proposed change only affects the initial slow-start, not
> slow-start after loss.

Seems like both are just as important, and there doesn't appear to
be a reason for treating them differently, is there?

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 18:12:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA22866 for tcp-impl-list; Thu, 31 Jul 1997 18:06:31 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA22580 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 18:05:17 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id SAA29827
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 18:05:13 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA14755>; Thu, 31 Jul 1997 18:05:04 -0700
Date: Thu, 31 Jul 1997 18:04:30 -0700
Posted-Date: Thu, 31 Jul 1997 18:04:30 -0700
Message-Id: <199708010104.SAA11638@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <SAA11638>; Thu, 31 Jul 1997 18:04:30 -0700
To: touch@ISI.EDU, craig@aland.bbn.com
Subject: Re: revising RFC 2001 within tcp-impl
Cc: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov, Allyn.Romanow@eng.sun.com,
        sca@refugee.engr.sgi.com, rstevens@kohala.com, floyd@ee.lbl.gov,
        mallman@lerc.nasa.gov
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From craig@aland.bbn.com Thu Jul 31 16:58:56 1997
> To: touch@ISI.EDU
> Cc: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov, Allyn.Romanow@eng.sun.com,
>         sca@refugee.engr.sgi.com, rstevens@kohala.com, floyd@ee.lbl.gov,
>         mallman@lerc.nasa.gov
> Subject: Re: revising RFC 2001 within tcp-impl 
> Date: Thu, 31 Jul 97 16:57:54 -0700
> From: Craig Partridge <craig@aland.bbn.com>
> 
> 
>     > Note that the proposed change only affects the initial slow-start, not
>     > slow-start after loss.
> 
>     Seems like both are just as important, and there doesn't appear to
>     be a reason for treating them differently, is there?
> 
> Huge difference.  There exist paths in the Internet where the delay*bw
> product divided by the number of TCP connections is indeed < 4.  (The
> major trans-Atlantic fiber was one such spot just a year ago).
> 
> You'd like to give TCP the maximum dynamic range to deal with these types
> of situations.

Sure - but consider that startups are becoming more significant
(e.g., for short bursts of packets, which occur even for persistent HTTP,
when clients need only one object from a site)

One justification for using different parameters for startup and restart
is that restarts dominate. If that's the case, then using a small
value for startup isn't significant, because the connection has to
persist long enough for several restarts (otherwise restarts don't dominate).

The other is that you need a large startup value because the connections
are short enough that the initial slowstart is a significant issue.
In that case, restarts aren't an issue.

We don't yet know which case will dominate, so we have no real reason 
to treat either case as 'needing particular dynamic range' capabilities.

(e.g., I could argue that the startup needs the dynamic range too)

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 18:36:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA00855 for tcp-impl-list; Thu, 31 Jul 1997 18:31:46 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA00757 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 18:31:21 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA05904
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 18:31:20 -0700
	env-from (craig@aland.bbn.com)
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id SAA18194; Thu, 31 Jul 1997 18:30:25 -0700 (PDT)
Message-Id: <199708010130.SAA18194@aland.bbn.com>
To: touch@ISI.EDU
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: revising RFC 2001 within tcp-impl 
In-reply-to: Your message of Thu, 31 Jul 97 18:04:30 -0700.
             <199708010104.SAA11638@rum.isi.edu> 
Date: Thu, 31 Jul 97 18:30:25 -0700
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Joe:
    
I'm afraid I don't quite follow your logic.  Let me try coming at
the problem another way.

First, let's clearly label the three slow start starting windows:

    there's the initial window (IW), used at connection start
    there's the restart window (RW), used after a connection is idle
    there's the loss window (LW), used after a loss

Now consider the two possible conditions:

    (1) IF the available path capacity per TCP connection is > 2
	
	THEN the IW, LW and RW can all be 2 since there's enough
	    capacity

    (2) IF the available path capacity per TCP connection is < 2

	THEN IW can be 2, since it will promptly suffer loss and learn
	    its window is too big

	    LW must be 1, since we need to be able to get our congestion
	    window that low

	    RW could possibly be 2, since on restart we look rather like
	    a new TCP connection, but if we allowed ourself to become
	    idle, we presumably aren't concerned about high performance
	    anyway, so how's about we are nice to the other connections
	    and set RW=1?

Further observe that when we restart the window, or do slow start after
a loss, we don't know if we are in condition (1) or (2).  When restarting,
more connections may have arrived while we were quiet, and loss presumably
indicated the arrival of new traffic (e.g., new connections).

So for LW and RW we must assume we might be in condition (2). So LW must
be 1, and RW ought to be 1.
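
The argument above can be written out as a small table lookup. This is only
a sketch of the reasoning in this message: the window values (in segments)
come from conditions (1) and (2), the dictionary keys and function name are
made up for illustration, and the "take the smaller value" rule encodes the
point that the sender cannot tell which condition it is in.

```python
# Per-condition window values from the message, in segments.
BY_CONDITION = {
    "capacity > 2": {"IW": 2, "RW": 2, "LW": 2},
    "capacity < 2": {"IW": 2, "RW": 1, "LW": 1},
}

def conservative(window):
    """Value acceptable under either condition (the smaller of the two)."""
    return min(values[window] for values in BY_CONDITION.values())

choice = {w: conservative(w) for w in ("IW", "RW", "LW")}
```

This yields IW=2, RW=1 and LW=1. The IW entry is only as safe as the
assumption that an initial window of two is acceptable in condition (2).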

Craig

PS: Vern's note suggests the new I-D says it is safe to set IW=2.  I'll
observe that the I-D was revised just before it went out to point out that
there are at least two situations that we don't understand well enough
to say that increasing the initial window is safe.  (A little more
simulation is needed).

From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 20:59:51 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA17396 for tcp-impl-list; Thu, 31 Jul 1997 20:57:53 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA17383 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 20:57:50 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id UAA10710
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 20:57:50 -0700
	env-from (touch@ISI.EDU)
Received: by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA18202>; Thu, 31 Jul 1997 20:57:47 -0700
Date: Thu, 31 Jul 1997 20:57:47 -0700
From: touch@ISI.EDU (Joe Touch)
Message-Id: <199708010357.AA18202@zephyr.isi.edu>
To: craig@aland.bbn.com
Subject: Re: revising RFC 2001 within tcp-impl
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Craig Partridge <craig@aland.bbn.com>
...
> First, let's clearly label the three slow start starting windows:
> 
>     there's the initial window (IW), used at connection start
>     there's the restart window (RW), used after a connection is idle
>     there's the loss window (LW), used after a loss

I had been assuming that RW and LW were the same; granted
that is not required.

> Further observe that when we restart the window, or do slow start after
> a loss, we don't know if we are in condition (1) or (2).  When restarting,
> more connections may have arrived while we were quiet, and loss presumably
> indicated the arrival of new traffic (e.g., new connections).
> 
> So for LW and RW we must assume we might be in condition (2). So LW must
> be 1, and RW ought to be 1.

That's where I was going.

I thought the case was being made that IW = RW = 4 (or pick a 'larger'
number than 2). LW and RW, as I think you're pointing out above, both
should be conservative.

Now consider two cases for IW, large or small:

more than 10-20 packets in the connection (from open to close,
approx.):
	
	the connection will take (best case, 'order' estimates):

		1 RTT to open
		log2(numpackets) RTTs to transmit the data (approx)
	
	If IW is 1, 2, or 4 we save 0, 1, or 2 RTTs out of a total
	of 5 or more, i.e., the connection saves 25% or less time

	the longer the connection, the less the win for a large IW
	
	(if the connection is BW bound, not latency bound,
	transmission overwhelms the RTT propagation latency,
	and the win is even less dramatic)

    **  IW can be small or large, it doesn't help or hurt much either way.
	
less than 10-20 packets in the connection:

	cost to transmit the packets is, as above

		2 - 6 RTTs regular IW

		2 - 4 RTTs with large IW

	i.e., a reduction of 30-75% of the time

   ** 	Large IW helps, but then only if it also dominates
	the network behavior, i.e., most of the time we're
	injecting bursts of 4 packets. If most of the connections
	are short, this means most of the time we're twice 
	as bursty as we are now. That would appear to be bad.
	(I don't believe current analysis considers this case,
	where most of the traffic is very short connections)

So either it doesn't help or it hurts.

In which case, we agree (don't know if it's for the same reason)
that IW and RW both should be small.
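
The 'order' estimate above can be checked with a small Python sketch.  It
assumes an idealized slow start in which every segment is acked, the window
doubles each RTT, and there is no loss and no delayed ack; the function name
is hypothetical and used only for this illustration.

```python
def rtts_to_send(num_packets, initial_window):
    """Count RTTs needed to send num_packets under idealized slow start.

    Assumes the window doubles every RTT, with no loss and no delayed
    acks.  Connection setup (1 RTT to open) is not included.
    """
    cwnd = initial_window
    sent = 0
    rtts = 0
    while sent < num_packets:
        sent += cwnd   # one window's worth of packets goes out this RTT
        cwnd *= 2      # slow start doubles the window each RTT
        rtts += 1
    return rtts


# A 16-packet transfer under the assumptions above:
for iw in (1, 2, 4):
    print(iw, rtts_to_send(16, iw))
```

Under these assumptions a 16-packet transfer needs 5, 4, and 3 RTTs for
IW = 1, 2, and 4, i.e. a saving of 0, 1, or 2 RTTs before counting the
1 RTT to open.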

Joe


----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 22:51:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA13682 for tcp-impl-list; Thu, 31 Jul 1997 22:48:45 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA13675 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 22:48:42 -0700
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id WAA01144
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 22:48:42 -0700
	env-from (kcpoon@jurassic.Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.13]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id WAA20272 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 22:48:40 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id WAA23507; Thu, 31 Jul 1997 22:48:37 -0700
Received: from shield.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id WAA01408; Thu, 31 Jul 1997 22:48:39 -0700
Received: by shield.eng.sun.com (SMI-8.6/SMI-SVR4)
	id WAA11519; Thu, 31 Jul 1997 22:48:37 -0700
Date: Thu, 31 Jul 1997 22:48:37 -0700
From: kcpoon@jurassic.Eng.Sun.COM (Kacheong Poon)
Message-Id: <199708010548.WAA11519@shield.eng.sun.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: revising RFC 2001 within tcp-impl
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Included message from Craig Partridge <craig@aland.bbn.com>:

>----
>	    RW could possibly be 2, since on restart we look rather like
>	    a new TCP connection, but if we allowed ourselves to become
>	    idle, we presumably aren't concerned about high performance
>	    anyway, so how's about we are nice to the other connections
>	    and set RW=1?
>----

If we are already not nice to the network, I don't know why we "ought" to
be nice in restart?  Say, I open a file for reading and that file is fetched
by NFS/TCP and a TCP connection is made to the server.  I read it and then I
want to access another file, since the NFS TCP connection is already
established, another transfer is made using the same connection.  Why should
I be "punished" the second time by TCP as it uses RW=1?  I don't think I'm
not concerned about high performance the second time.  This may be a trivial
example, but I guess there are applications out there which do similar
things.

Is there another stronger reason why RW should not be equal to IW?  If one
says RW=2 is bad, I will also say IW=2 is bad for the same reasons.  If one
says IW=2 is OK, I will say RW=2 is also OK.  In both cases, TCP presumably
knows nothing about the network, unless TCP is sharing info with other
active connections to the same network.  Why should we treat them
differently?  I think IW should be equal to RW and LW should be equal to 1.


							K. Poon.
							kcpoon@eng.sun.com


From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 23:12:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA18048 for tcp-impl-list; Thu, 31 Jul 1997 23:10:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA18025 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 23:10:06 -0700
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA04494
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 23:10:06 -0700
	env-from (kcpoon@jurassic.Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.13]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id XAA21644 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 23:10:04 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id XAA25707; Thu, 31 Jul 1997 23:09:58 -0700
Received: from shield.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id XAA04587; Thu, 31 Jul 1997 23:10:00 -0700
Received: by shield.eng.sun.com (SMI-8.6/SMI-SVR4)
	id XAA11556; Thu, 31 Jul 1997 23:09:57 -0700
Date: Thu, 31 Jul 1997 23:09:57 -0700
From: kcpoon@jurassic.Eng.Sun.COM (Kacheong Poon)
Message-Id: <199708010609.XAA11556@shield.eng.sun.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: revising RFC 2001 within tcp-impl
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Included message from touch@ISI.EDU (Joe Touch):

>----
>more than 10-20 packets in the connection (from open to close,
>approx.):
>	
>	the connection will take (best case, 'order' estimates):
>
>		1 RTT to open
>		log2(numpackets) RTTs to transmit the data (approx)
>	
>	If IW is 1, 2, or 4 we save 0, 1, or 2 RTTs out of a total
>	of 5 or more, i.e., the connection saves 25% or less time

I guess it is a "theoretical" best case.  One of the reasons to have an
initial cwnd larger than 1 is that many implementations, like BSD, delay
acks.  So the first ack will be delayed, presumably by 200ms, if cwnd is 1.
Say, in a departmental network with RTT ~10ms, the saving is not a mere 25%
if no ack is delayed.  The saving from avoiding delayed acks is not
significant if RTT is on the order of 100ms.

>In which case, we agree (don't know if it's for the same reason)
>that IW and RW both should be small.

It seems to me that it makes sense to have cwnd greater than 1 if TCP knows
"in advance" that RTT will be small.  If RTT "is" large, use IW=RW=1.  Or if
TCP does not delay ack initially and after a restart, then IW=RW=1 is good
enough.

>----


							K. Poon.
							kcpoon@eng.sun.com


From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 23:14:47 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA28382 for tcp-impl-list; Thu, 31 Jul 1997 16:32:04 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA27720 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 16:29:56 -0700
Received: from merit.edu (merit.edu [198.108.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA02717
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 16:29:52 -0700
	env-from (wsimpson@greendragon.com)
Received: from Bill.Simpson.DialUp.Mich.Net (pm012-28.dialip.mich.net [141.211.7.196])
	by merit.edu (8.8.6/8.8.5) with SMTP id TAA21378
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 19:29:51 -0400 (EDT)
Date: Thu, 31 Jul 97 23:18:30 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <6378.wsimpson@greendragon.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: revised I-D on Known TCP Implementation Problems
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Vern Paxson <vern@ee.lbl.gov>
> I sent it to the tcp-impl list twice over the last week, but unfortunately
> (unknown to us) the list has a filter limiting messages to 40 KB, so the
> email got chucked both times.
>
Glad to see the list is being managed intelligently.  I'm so sick of
folks sending 130KB messages to lists instead of internet-drafts.

It really isn't a problem to send a new draft every few weeks, rather
than only one per 4 months....  It's just a little -nn increment.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 23:14:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA17286 for tcp-impl-list; Thu, 31 Jul 1997 17:50:02 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA17092 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 17:49:18 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA09950
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 16:51:45 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA10470>; Thu, 31 Jul 1997 16:51:34 -0700
Date: Thu, 31 Jul 1997 16:51:01 -0700
Posted-Date: Thu, 31 Jul 1997 16:51:01 -0700
Message-Id: <199707312351.QAA09844@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <QAA09844>; Thu, 31 Jul 1997 16:51:01 -0700
To: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov
Subject: Re: revising RFC 2001 within tcp-impl
Cc: Allyn.Romanow@eng.sun.com, sca@refugee.engr.sgi.com, rstevens@kohala.com,
        floyd@ee.lbl.gov, mallman@lerc.nasa.gov, craig@aland.bbn.com
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> To: tcp-impl@cthulhu.engr.sgi.com
> Subject: revising RFC 2001 within tcp-impl
> Date: Thu, 31 Jul 1997 16:33:35 PDT
> From: Vern Paxson <vern@ee.lbl.gov>
> 
> RFC 2001 is a proposed standard that defines TCP congestion control, namely
> the specifics of slow start, congestion avoidance, fast retransmit, and
...
> The other reason for revision is that there appears to be widespread
> support for increasing the initial value of cwnd beyond its present value
> of a single segment, up to two segments (at least; see below).  This
> improves performance, both by getting data sent more quickly, and by
> avoiding a potentially lengthy delay if the receiver only acks the first
> segment using a delayed ack.
...
> 	(2) Plenty of TCPs already start in some circumstances with
> 	    an initial window of two segments, due to bugs or design.

It may be important that the two design points be mutually exclusive,
e.g.,
	start with TWO packets and DO NOT increase the window for SYN-ACK

	start with ONE packet and DO increase the window for SYN-ACK (the bug)

doing both results in starting with a window of 4, double the current 
value...

> Note that the proposed change only affects the initial slow-start, not
> slow-start after loss.

Seems like both are just as important, and there doesn't appear to
be a reason for treating them differently, is there?

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Jul 31 23:16:37 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA08019 for tcp-impl-list; Thu, 31 Jul 1997 15:21:24 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA07789 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 31 Jul 1997 15:20:19 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA11609
	for <tcp-impl@relay.engr.SGI.COM>; Thu, 31 Jul 1997 15:20:16 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id PAA10131; Thu, 31 Jul 1997 15:20:16 -0700 (PDT)
Message-Id: <199707312220.PAA10131@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: E2E Feedback on TCP research issues from a tcp-impl perspective
Date: Thu, 31 Jul 1997 15:20:15 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Here's a summary of the input I received on TCP research issues at the
meeting earlier this month of the IRTF's End-to-End Research Group.
The role of these comments is to help delineate the line between research
issues and implementation issues.  The answers are not definitive in
an IETF standardization sense, but carry significant weight with me
for discussions of which issues are in scope for tcp-impl and which
are out of scope because they are still active research areas.

		Vern


Q: Is the initial slow-start cwnd going to be increased, and if
   so, to what?
A: Consensus in E2E that it can be increased to at least two packets,
   as TCPs send two back-to-back packets all the time anyway.  It may
   be increased to more, per a pending I-D by Sally Floyd and Mark Allman
   [and now Craig Partridge; more about this in the next message I send
   to tcp-impl].

Q: How can we keep the send window from expanding excessively once the
   maximum throughput has been attained?
A: Deploy RED.

Q: SACK is a done deal, right?
A: Yes.

Q: How to compute RTO when using high-resolution timings?
Q: How to compute RTO when timing more than one packet per RTT
   (i.e., how to adjust the constants for the exponentially weighted
   moving average)?
A: These are subtle, but Van Jacobson has worked them out.  He will
   write something up and Vern Paxson will help get it turned into an RFC.

Q: How long a sending pause merits a new slow-start?
A: One RTO.  The thinking is that an RTO is an upper bound estimate
   on how long it takes to get feedback from the network.  Beyond
   this amount of time, you (1) have lost incoming acks for self-clocking,
   and (2) don't reliably know anything about the network's state, so you
   should be conservative and hunt for available bandwidth again.

Q: What about waiting for more than two packets before ack'ing on a LAN?
A: No.  If this really buys you performance, you should scrutinize your
   TCP implementation to find out why it's so expensive to process an ack.

Q: Sharing cwnd across connections?
A: A research issue.  The need for this is ameliorated with HTTP 1.1.

Q: Caching cwnd/ssthresh over time?
A: cwnd, no.  ssthresh, perhaps, with a small time constant; but a
   research issue.

Q: How to defend against SYN flooding attacks?
A: No near-term recommendation.  Longer term: ingress filtering.

Q: What about deploying Vegas?
Q: What about deploying Janey Hoe's changes?
A: These unfortunately slipped between the cracks during the discussion.
   To the extent that opinions were expressed, it is clear that significant
   elements of these are still viewed as research areas.

Q: Should below-sequence pure acks be acked (for keep-alives)?
A: E2E consensus is that keep-alives belong at the application layer;
   no advice on addressing this problem further in TCP.

Q: What about fast retransmit on fewer than 3 dups, if no more dups
   are coming?
A: If you use SACK, then you can reliably know when data has left
   the pipe.  On fewer than 3 dups, this lets you safely send new
   data.  Retransmitting on fewer than 3 is risky.

Q: What about the RST issues Ian Heavens has raised?
A: It would be nice in principle to fix these, as they reveal a flaw
   in the proof of TCP's correctness.  However, in practice the
   corruption scenarios appear unlikely, so it's not clear it's worth
   going through the major procedural headaches necessary to fix this.
   [Ian notes that these issues might also be profitably studied in
   the context of designing a TCP successor.]

Q: What about Joe Touch / Ted Faber's scheme for shifting TIME-WAIT
   into the client?
A: No, because this gives a disincentive for upgrading to HTTP 1.1.
   [Subsequently, Ted & Joe told me that they have measurements showing
   that it's a win with both HTTP 1.0 and 1.1, so this issue will
   no doubt be revisited.  Clearly, right now it's research.]

Q: Is it time to revisit constants like MSL and initial RTO?
A: No.  For MSL, it might be good to document ways of reducing dependence
   on it, such as by using PAWS.

Q: What about systematizing informational data sent in RSTs?
A: This is actually an old idea, going back to the mid 1980's.  It
   appears that the benefit is only marginal, so it's not clear it's
   worth the major procedural headaches necessary to standardize on it.

Q: How to fix the MSS*MSS/cwnd granularity problem (Rich Stevens
   noted that if cwnd > MSS*MSS, then due to integer arithmetic
   it'll never grow any larger)?
A: Add 1.
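
To make the last item concrete, here is a small Python sketch of the
integer-arithmetic problem.  It assumes cwnd and MSS are both in bytes and
that the congestion avoidance increment per ack is MSS*MSS/cwnd; reading
"Add 1" as "add one byte to that increment" is an assumption of this
example, and the function name is hypothetical.

```python
def ca_increment(cwnd, mss, add_one=False):
    """Per-ack congestion avoidance increment, in bytes.

    With integer division the increment truncates to 0 as soon as
    cwnd > mss * mss, so cwnd stops growing.  With add_one=True the
    increment is always at least 1 (one assumed reading of "Add 1").
    """
    inc = (mss * mss) // cwnd
    if add_one:
        inc += 1
    return inc


mss = 512                                     # mss * mss == 262144
print(ca_increment(300000, mss))              # cwnd > mss*mss: no growth
print(ca_increment(300000, mss, add_one=True))
```

With MSS = 512 and cwnd = 300000 bytes the plain increment is 0, so the
window is stuck; with the extra 1 it keeps growing, if slowly.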

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 09:04:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA29471 for tcp-impl-list; Fri, 1 Aug 1997 09:01:55 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA29446 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 09:01:51 -0700
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA14883
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 09:01:51 -0700
	env-from (Erik.Nordmark@Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.13]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id JAA22790; Fri, 1 Aug 1997 09:01:50 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id JAA13823; Fri, 1 Aug 1997 09:01:43 -0700
Received: from bobo.eng.sun.com by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id JAA12169; Fri, 1 Aug 1997 09:01:40 -0700
Received: by bobo.eng.sun.com (SMI-8.6/SMI-SVR4)
	id JAA20427; Fri, 1 Aug 1997 09:01:38 -0700
Date: Fri, 1 Aug 1997 09:01:38 -0700
From: Erik.Nordmark@Eng.Sun.COM (Erik Nordmark)
Message-Id: <199708011601.JAA20427@bobo.eng.sun.com>
To: touch@ISI.EDU
Subject: Re: revising RFC 2001 within tcp-impl
Cc: tcp-impl@cthulhu.engr.sgi.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> Now consider two cases for IW, large or small:
> 
> more than 10-20 packets in the connection (from open to close,
> approx.):
> 	
> 	the connection will take (best case, 'order' estimates):
> 
> 		1 RTT to open
> 		log2(numpackets) RTTs to transmit the data (approx)
> 	
> 	If IW is 1, 2, or 4 we save 0, 1, or 2 RTTs out of a total
> 	of 5 or more, i.e., the connection saves 25% or less time

There is one other factor that is missing from the above.
A large number of TCP implementations ack every other packet (as they
should) but do this even at the beginning of the TCP connection.
If the sender uses IW = 1 you will incur an additional delayed ack timer
(200 ms) when sending to such receivers.

The correct answer here might be to fix the receive side of TCP to
not delay acks for the first data packet in the connection, but
it seems like folks have been fixing the sending TCP to use IW = 2
instead.

> So either it doesn't help or it hurts.

But how much does it really hurt?
If the path can only handle 1 packet per RTT with IW = 1
the sender will do this:
	send 1 packet
	receive ack
	send 2 packets - one is dropped
	receive ack for one packet
	time out for lost packet - retransmit
	receive ack
	send 1 packet
	<repeat>

With IW = 2 the above pattern changes from 1,2,1,2,... to 2,1,2,1,...
For a long connection this doesn't really matter.
Thus I think this only gets worse when
 - the path can only handle 1 packet per RTT, AND
 - the amount of data sent is more than 1 packet but less than 3 packets

In the 2 packet case with IW = 1 there will be no loss but
with IW = 2 one of the packets will be dropped and retransmitted.

How important is this case compared to the benefits?
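
The pattern above can be reproduced with a toy Python model.  It assumes
the path delivers at most one packet per RTT, that any excess in a round is
dropped, that a drop resets the window to 1 (the retransmission is counted
as that round's packet), and that the window otherwise doubles.  The
function name and this simplified loss model are assumptions made for
illustration only.

```python
def simulate(total_packets, initial_window, capacity=1):
    """Return (windows used per round, number of drops).

    Toy model: each round sends min(cwnd, remaining) packets, the path
    delivers at most `capacity` of them, and the rest are dropped.
    A round with a drop is followed by cwnd = 1; otherwise cwnd doubles.
    """
    remaining = total_packets
    cwnd = initial_window
    windows = []
    drops = 0
    while remaining > 0:
        sent = min(cwnd, remaining)
        delivered = min(sent, capacity)
        lost = sent - delivered
        windows.append(sent)
        drops += lost
        remaining -= delivered   # lost packets still need to be sent
        cwnd = 1 if lost else cwnd * 2
    return windows, drops


print(simulate(6, 1))   # windows alternate 1,2,1,2,...
print(simulate(6, 2))   # windows alternate 2,1,2,1,...
print(simulate(2, 1))   # two packets, IW = 1: no loss
print(simulate(2, 2))   # two packets, IW = 2: one drop
```

In this model a 2-packet transfer sees no drop with IW = 1 and one drop
with IW = 2, while a 3-packet transfer sees one drop either way.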

   Erik

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 10:12:55 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA16338 for tcp-impl-list; Fri, 1 Aug 1997 10:06:14 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA16190 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 10:05:44 -0700
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.235]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA07135
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 10:05:27 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel1.hp.com (8.8.5/8.8.5) with SMTP id KAA14958
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 10:05:26 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA06653; Fri, 1 Aug 1997 10:04:06 -0700
Message-Id: <33E21705.4265@cup.hp.com>
Date: Fri, 01 Aug 1997 10:04:05 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: touch@ISI.EDU
Cc: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov, Allyn.Romanow@Eng.Sun.COM,
        sca@refugee.engr.sgi.com, rstevens@kohala.com, floyd@ee.lbl.gov,
        mallman@lerc.nasa.gov, craig@aland.bbn.com
Subject: Re: revising RFC 2001 within tcp-impl
References: <199707312351.QAA09844@rum.isi.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

touch@ISI.EDU wrote:
> It may be important that the two design points be mutually exclusive,
> e.g.,
>         start with TWO packets and DO NOT increase the window for SYN-ACK
> 
>         start with ONE packet and DO increase the window for SYN-ACK (the bug)
> 
> doing both results in starting with a window of 4, double the current
> value...

Indeed. Instead of having to enumerate in a later draft, why not simply
state that after the three-way handshake is complete cwnd will be N and
leave it up to the implementation as to how it gets to N. (I am assuming
N >= 2)

I suspect keeping the wording to what the external behaviour should be
is preferred.

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 10:19:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA19062 for tcp-impl-list; Fri, 1 Aug 1997 10:15:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA18850 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 10:14:58 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id KAA12415
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 10:14:38 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA07162>; Fri, 1 Aug 1997 10:14:31 -0700
Date: Fri, 1 Aug 1997 10:13:58 -0700
Posted-Date: Fri, 1 Aug 1997 10:13:58 -0700
Message-Id: <199708011713.KAA05341@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <KAA05341>; Fri, 1 Aug 1997 10:13:58 -0700
To: tcp-impl@cthulhu.engr.sgi.com, kcpoon@jurassic.eng.sun.com
Subject: Re: revising RFC 2001 within tcp-impl
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@cthulhu.engr.sgi.com Thu Jul 31 23:13:57 1997
> Date: Thu, 31 Jul 1997 23:09:57 -0700
> From: kcpoon@jurassic.Eng.Sun.COM (Kacheong Poon)
> To: tcp-impl@cthulhu.engr.sgi.com
> Subject: Re: revising RFC 2001 within tcp-impl
> 
> Included message from touch@ISI.EDU (Joe Touch):
> 
> >more than 10-20 packets in the connection (from open to close,
> >approx.):
> >	
> >	the connection will take (best case, 'order' estimates):
> >
> >		1 RTT to open
> >		log2(numpackets) RTTs to transmit the data (approx)
> >	
> >	If IW is 1, 2, or 4 we save 0, 1, or 2 RTTs out of a total
> >	of 5 or more, i.e., the connection saves 25% or less time
> 
> I guess it is a "theoretical" best case.  One of the reasons to have larger
> than 1 initial cwnd is that many implementations, like BSD, delay acks.  So
> the first ack will be delayed, presumably by 200ms, if cwnd is 1.  Say, in 

This presumes that delayed ACKs are good, which, for the very small
window case, may be debatable.

> It seems to me that it makes sense to have cwnd greater than 1 if TCP knows
> "in advance" that RTT will be small.  If RTT "is" large, use IW=RW=1.  Or if
> TCP does not delay ack initially and after a restart, then IW=RW=1 is good
> enough.

A small RTT does preclude either being useful, as you indicate.

However, RTT alone isn't sufficient to justify a large IW or RW.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 10:34:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA22229 for tcp-impl-list; Fri, 1 Aug 1997 10:29:27 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA22171 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 10:29:16 -0700
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.235]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA19052
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 10:29:14 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel1.hp.com (8.8.5/8.8.5) with SMTP id KAA21482
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 10:29:12 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA06727; Fri, 1 Aug 1997 10:27:52 -0700
Message-Id: <33E21C97.7FA1@cup.hp.com>
Date: Fri, 01 Aug 1997 10:27:51 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: touch@ISI.EDU
Cc: raj@hpisrdq.cup.hp.com, tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov,
        Allyn.Romanow@eng.sun.com, sca@refugee.engr.sgi.com,
        rstevens@kohala.com, floyd@ee.lbl.gov, mallman@lerc.nasa.gov,
        craig@aland.bbn.com
Subject: Re: revising RFC 2001 within tcp-impl
References: <199708011720.KAA05512@rum.isi.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

touch@ISI.EDU wrote:
> > Indeed. Instead of having to enumerate in a later draft, why not simply
> > state that after the three-way handshake is complete cwnd will be N and
> > leave it up to the implementation as to how it gets to N. (I am assuming
> > N >= 2)
> 
> Because the value of N is important and determines the group
> congestion properties of the Internet. It should not be
> 'left to the implementation'.

Oops. Mind going faster than fingers...

I should have said "I'm assuming that for the document N will be
selected as some value, at least 2." In other words, we would all pick
some value. I didn't mean to leave the selection of that value as an
implementation choice.

rick

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 11:46:10 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA11449 for tcp-impl-list; Fri, 1 Aug 1997 11:43:26 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA11416 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 11:43:22 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id KAA22710
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 10:38:07 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA09005>; Fri, 1 Aug 1997 10:37:54 -0700
Date: Fri, 1 Aug 1997 10:37:21 -0700
Posted-Date: Fri, 1 Aug 1997 10:37:21 -0700
Message-Id: <199708011737.KAA05917@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <KAA05917>; Fri, 1 Aug 1997 10:37:21 -0700
To: touch@ISI.EDU, raj@hpisrdq.cup.hp.com
Subject: Re: revising RFC 2001 within tcp-impl
Cc: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov, Allyn.Romanow@eng.sun.com,
        sca@refugee.engr.sgi.com, rstevens@kohala.com, floyd@ee.lbl.gov,
        mallman@lerc.nasa.gov, craig@aland.bbn.com
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Rick Jones <raj@hpisrdq.cup.hp.com>
> To: touch@ISI.EDU
> Subject: Re: revising RFC 2001 within tcp-impl
> 
> touch@ISI.EDU wrote:
> > > Indeed. Instead of having to enumerate in a later draft, why not simply
> > > state that after the three-way handshake is complete cwnd will be N and
> > > leave it up to the implementation as to how it gets to N. (I am assuming
> > > N >= 2)
> > 
> > Because the value of N is important and determines the group
> > congestion properties of the Internet. It should not be
> > 'left to the implementation'.
> 
> Oops. Mind going faster than fingers...
> 
> I should have said "I'm assuming that for the document N will be
> selected as some value, at least 2." In other words, we would all pick
> some value. I didn't mean to leave the selection of that value as an
> implementation choice.

As Craig pointed out, current implementations have three
values for N, for IW, LW, and RW. 

For most, LW = 1, as it should be.

"Leaving it to the implementer", we have current versions that
end up with 
	IW = 2 (because the window opened due to the SYN-ACK)
	RW = 1 (explicitly)

Earlier e-mails (others' and mine) make the case that IW=RW,
and that both start at either 1 or 2.

Perhaps we require the following, regardless of 'how the
implementer gets there' (as you say):

	IW = RW = n (for some specific n, as you say)

	LW = 1 

(this is a little more specific than just requiring IW = n, as
you proposed, which is where I was going...)
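
To make the proposal concrete, here is a minimal sketch of the three
windows as one policy object. The names (WindowPolicy, cwnd_after, the
event strings) are made up for illustration, and n = 2 is only a
placeholder default, since the list has not settled on a value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowPolicy:
    """Congestion window, in segments, at three points in a connection."""
    n: int = 2            # assumed placeholder; the thread has not picked n
    loss_window: int = 1  # LW: window after a retransmission timeout

    @property
    def initial_window(self) -> int:
        """IW: window once the handshake has completed."""
        return self.n

    @property
    def restart_window(self) -> int:
        """RW: window when restarting after an idle period (same n as IW)."""
        return self.n


def cwnd_after(event: str, policy: WindowPolicy) -> int:
    """Return the window to use after a (hypothetical) named event."""
    windows = {
        "handshake": policy.initial_window,
        "idle_restart": policy.restart_window,
        "timeout": policy.loss_window,
    }
    return windows[event]
```

The point of tying IW and RW to a single n is that no code path can make
them drift apart, however the implementer "gets there".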

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 12:08:26 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA18436 for tcp-impl-list; Fri, 1 Aug 1997 12:03:18 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA18092 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 12:01:40 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id MAA25694
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 12:01:38 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA14516>; Fri, 1 Aug 1997 12:01:23 -0700
Date: Fri, 1 Aug 1997 12:00:50 -0700
Posted-Date: Fri, 1 Aug 1997 12:00:50 -0700
Message-Id: <199708011900.MAA08051@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <MAA08051>; Fri, 1 Aug 1997 12:00:50 -0700
To: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov
Subject: Re: revising RFC 2001 within tcp-impl
Cc: Allyn.Romanow@eng.sun.com, sca@refugee.engr.sgi.com, rstevens@kohala.com,
        floyd@ee.lbl.gov, mallman@lerc.nasa.gov, craig@aland.bbn.com
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> To: tcp-impl@cthulhu.engr.sgi.com
> Subject: revising RFC 2001 within tcp-impl
> From: Vern Paxson <vern@ee.lbl.gov>
...
> Two pragmatic arguments that it's okay to start with two segments are:
...
> 	(2) Plenty of TCPs already start in some circumstances with
> 	    an initial window of two segments, due to bugs or design.

FYI, John Heidemann, who works with me, pointed out that in SunOS,
this occurs only for the side accepting the connection.

I.e., after the 3-way handshake:

	SYN-issuer (active open)
		IW = 1
		(SYN-ACK is NOT treated as a data ACK,
		and the window is not increased)

	SYN-receiver (passive open) 
		IW = 2
		(ACK, the 3rd packet in the 3-way handshake of
		SYN, SYN-ACK, ACK, is treated as if it were a
		data ACK, in a sense, and the window IS increased)

Thus the IW=2 case only occurs for the passive open side,
so it matters where the passive side has the data, e.g., HTTP.

In the case where the active open side has the data, IW = 1,
e.g., FTP.
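
A small sketch of the behavior described above, as I understand it. This
is a toy model, not SunOS code: the function name, the role strings, and
the counts_handshake_ack flag are all invented for illustration.

```python
def initial_cwnd(role: str, start: int = 1,
                 counts_handshake_ack: bool = True) -> int:
    """Window (in segments) once the handshake completes, in a toy model.

    role: "active" for the SYN-issuer, "passive" for the SYN-receiver.
    counts_handshake_ack: assumed flag modelling a stack that treats the
        final ACK of the handshake as if it acknowledged data.
    """
    if role not in ("active", "passive"):
        raise ValueError("role must be 'active' or 'passive'")
    cwnd = start
    if role == "passive" and counts_handshake_ack:
        # Slow start opens the window by one segment per ACK counted.
        cwnd += 1
    # On the active side the SYN-ACK is not counted, so cwnd is unchanged.
    return cwnd
```

With the defaults, the passive side ends up at 2 and the active side at 1,
which is the asymmetry between the HTTP and FTP cases.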

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 12:49:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA29405 for tcp-impl-list; Fri, 1 Aug 1997 12:46:33 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA29191 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 12:46:01 -0700
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA09352
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 12:45:59 -0700
	env-from (kcpoon@jurassic.Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.25]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id MAA07810 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 12:45:57 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id MAA00466; Fri, 1 Aug 1997 12:45:53 -0700
Received: from shield by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id MAA09482; Fri, 1 Aug 1997 12:45:54 -0700
Date: Fri, 1 Aug 1997 12:45:52 -0700 (PDT)
From: Kacheong Poon <kcpoon@jurassic.Eng.Sun.COM>
Reply-To: Kacheong Poon <kcpoon@jurassic.Eng.Sun.COM>
Subject: Re: revising RFC 2001 within tcp-impl
To: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: "Your message with ID" <199708011601.JAA20427@bobo.eng.sun.com>
Message-ID: <Roam.SIMC.2.0.870464752.21062.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> With IW = 2 the above pattern changes from 1,2,1,2,... to 2,1,2,1,...
> For a long connection this doesn't really matter.
> Thus I think this only gets worse when
>  - the path can only handle 1 packet per RTT, AND
>  - the amount of data sent is more than 1 packet but less than 3 packets

Is this a "theoretical" worst case?  How about a real-world setting where
buffering in the network is dynamic?  Suppose a Web server has M existing
connections and N new requests come in.  If IW=2, what is the impact on the
existing connections?  If the injection of these 2N packets causes congestion,
what will be the result?  Will the M+N connections share the bandwidth
"fairly" after that?  If IW=1, the acks for the N new packets come in at
different times, so there will not be any "synchronized" burst of packets.
Will all connections go on smoothly in this case?  Or is the above just another
"theoretical" case that we can safely ignore?

							K. Poon
							kcpoon@eng.sun.com




From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 12:53:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA01583 for tcp-impl-list; Fri, 1 Aug 1997 12:52:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA01410 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 12:51:30 -0700
Received: from firewall.agranat.com (agranat.com [146.115.131.2]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA11745
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 12:51:27 -0700
	env-from (lawrence@devnix.agranat.com)
Received: from s1.Agranat.COM (root@s1 [192.104.71.130]) by firewall.agranat.com (8.6.12/8.6.9) with ESMTP id PAA12443 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 15:51:26 -0400
Received: from devnix.agranat.com (root@devnix.agranat.com [192.104.71.180]) by s1.Agranat.COM (8.6.12/8.6.9) with ESMTP id PAA16169 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 15:51:22 -0400
Received: from devnix.agranat.com (lawrence@localhost [127.0.0.1]) by devnix.agranat.com (8.6.12/8.6.9s1) with ESMTP id PAA09737 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 15:52:03 -0400
Message-Id: <199708011952.PAA09737@devnix.agranat.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: revising RFC 2001 within tcp-impl
In-reply-to: <199708011900.MAA08051@rum.isi.edu>
Date: Fri, 01 Aug 1997 15:52:01 -0400
From: "Scott Lawrence" <lawrence@agranat.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


>>>>> "JT" == Joe Touch <touch@isi.edu> writes:

JT> FYI, John Heidemann, who works with me, pointed out that in SunOS,
JT> this occurs only for the side accepting the connection.
JT> ...
JT> Thus the IW=2 case only occurs for the passive open side,
JT> so it matters where the passive side has the data, e.g., HTTP.

  HTTP connections are always initiated by the client, which must send
  the request before the server will have anything to say, so HTTP is
  not a good example.

--
Scott Lawrence           EmWeb Embedded Server       <lawrence@agranat.com>
Agranat Systems, Inc.        Engineering            http://www.agranat.com/

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 12:53:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA01665 for tcp-impl-list; Fri, 1 Aug 1997 12:52:20 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA01645 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 12:52:16 -0700
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA12170
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 12:52:14 -0700
	env-from (kcpoon@jurassic.Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.25]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id MAA08777; Fri, 1 Aug 1997 12:51:42 -0700
Received: from jurassic.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id MAA01571; Fri, 1 Aug 1997 12:51:39 -0700
Received: from shield by jurassic.eng.sun.com (SMI-8.6/SMI-SVR4)
	id MAA11172; Fri, 1 Aug 1997 12:51:40 -0700
Date: Fri, 1 Aug 1997 12:51:38 -0700 (PDT)
From: Kacheong Poon <kcpoon@jurassic.Eng.Sun.COM>
Reply-To: Kacheong Poon <kcpoon@jurassic.Eng.Sun.COM>
Subject: Re: revising RFC 2001 within tcp-impl
To: touch@ISI.EDU
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: "Your message with ID" <199708011900.MAA08051@rum.isi.edu>
Message-ID: <Roam.SIMC.2.0.870465098.5177.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> FYI, John Heidemann, who works with me, pointed out that in SunOS,
> this occurs only for the side accepting the connection.

I guess this is a known behaviour of BSD derived TCP stacks.  SunOS 4.x's
stack is based on BSD.  I am wondering if this is a valid test to see if
a stack is BSD based...

							K. Poon
							kcpoon@eng.sun.com



From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 13:17:13 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA07653 for tcp-impl-list; Fri, 1 Aug 1997 13:15:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA07613 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 13:15:04 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA20869
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 13:15:03 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA18722>; Fri, 1 Aug 1997 13:14:58 -0700
Date: Fri, 1 Aug 1997 13:14:24 -0700
Posted-Date: Fri, 1 Aug 1997 13:14:24 -0700
Message-Id: <199708012014.NAA09926@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <NAA09926>; Fri, 1 Aug 1997 13:14:24 -0700
To: tcp-impl@cthulhu.engr.sgi.com, lawrence@agranat.com
Subject: Re: revising RFC 2001 within tcp-impl
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@cthulhu.engr.sgi.com Fri Aug  1 12:54:36 1997
> To: tcp-impl@cthulhu.engr.sgi.com
> Subject: Re: revising RFC 2001 within tcp-impl
> Date: Fri, 01 Aug 1997 15:52:01 -0400
> From: "Scott Lawrence" <lawrence@agranat.com>
> 
> 
> >>>>> "JT" == Joe Touch <touch@isi.edu> writes:
> 
> JT> FYI, John Heidemann, who works with me, pointed out that in SunOS,
> JT> this occurs only for the side accepting the connection.
> JT> ...
> JT> Thus the IW=2 case only occurs for the passive open side,
> JT> so it matters where the passive side has the data, e.g., HTTP.
> 
>   HTTP connections are always initiated by the client, which must send
>   the request before the server will have anything to say, so HTTP is
>   not a good example.

Why not?

	syn---->

	<---syn+ack
client window is 1
	---ack->
		server window is 2
	---req->
		server can now send *2* packets of data back
	<-ack--
client window is 2 but who cares? (client isn't really sending much)



I was trying to indicate that HTTP _could_ take advantage of the
passive-side window starting at 2, but FTP could not. Is that not the case?

Joe


----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 13:24:34 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA09709 for tcp-impl-list; Fri, 1 Aug 1997 13:22:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA09701 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 13:22:31 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA23832
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 13:22:31 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA19224>; Fri, 1 Aug 1997 13:22:30 -0700
Date: Fri, 1 Aug 1997 13:21:55 -0700
Posted-Date: Fri, 1 Aug 1997 13:21:55 -0700
Message-Id: <199708012021.NAA10109@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <NAA10109>; Fri, 1 Aug 1997 13:21:55 -0700
To: touch@ISI.EDU, kcpoon@jurassic.eng.sun.com
Subject: Re: revising RFC 2001 within tcp-impl
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@cthulhu.engr.sgi.com Fri Aug  1 12:54:27 1997
> Date: Fri, 1 Aug 1997 12:51:38 -0700 (PDT)
> From: Kacheong Poon <kcpoon@jurassic.Eng.Sun.COM>
> Subject: Re: revising RFC 2001 within tcp-impl
> To: touch@ISI.EDU
> Cc: tcp-impl@cthulhu.engr.sgi.com
> 
> > FYI, John Heidemann, who works with me, pointed out that in SunOS,
> > this occurs only for the side accepting the connection.
> 
> I guess this is a known behaviour of BSD derived TCP stacks.  SunOS 4.x's
> stack is based on BSD.  I am wondering if this is a valid test to see if
> a stack is BSD based...

BSD-based might be over-ambitious.

There's certainly a test for seeing if this occurs, regardless
of other BSD-like behavior. Tests for specific behaviors like
this might be more applicable.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 15:16:40 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA10466 for tcp-impl-list; Fri, 1 Aug 1997 15:14:04 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA10325 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 15:13:30 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA27524
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 15:13:21 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id PAA12792; Fri, 1 Aug 1997 15:13:20 -0700 (PDT)
Message-Id: <199708012213.PAA12792@daffy.ee.lbl.gov>
To: Kacheong Poon <kcpoon@jurassic.Eng.Sun.COM>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: revising RFC 2001 within tcp-impl
In-reply-to: Your message of Fri, 01 Aug 1997 12:45:52 PDT.
Date: Fri, 01 Aug 1997 15:13:19 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Suppose a Web server has M existing
> connections and N new requests come in.  If IW=2, what is the impact on the
> existing connections?

If the N new requests are not synchronized, then I don't see how their
arrivals are significantly different from N arrivals with IW=1.  It's the
same number of packets.  The traffic pattern is identical except that with
IW=1, you have an initial single-packet flight, while with IW=2 you either
don't have this, or it instead comes at the end of the connection (in which
case it will generally be at the tail end of a larger slow-start flight, if
no loss occurred; so an RTT's worth of latency gets reduced to a considerably
smaller latency).

Ironically, for TCPs that use "heartbeat" delay ack timers, IW=1 *will*
synchronize new connections, since they'll all get their first acks
at around the same time.
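
The flight pattern argument can be sketched in a few lines. This is an
idealized model with assumptions that do not hold in general: no loss, one
ACK per segment (no delayed ACKs), and the window doubling every RTT. The
function name is invented for illustration.

```python
from typing import List


def flights(total_segments: int, iw: int) -> List[int]:
    """Sizes of successive slow-start flights for a transfer.

    Assumes an idealized sender: no loss, every segment ACKed, so the
    window doubles each round trip.
    """
    if iw < 1:
        raise ValueError("initial window must be at least 1 segment")
    sent, cwnd, result = 0, iw, []
    while sent < total_segments:
        burst = min(cwnd, total_segments - sent)  # last flight may be short
        result.append(burst)
        sent += burst
        cwnd *= 2
    return result
```

For a 7-segment transfer this gives [1, 2, 4] with IW=1 and [2, 4, 1] with
IW=2: the same packets, with the single-packet flight moved to the end.
For 6 segments it gives [1, 2, 3] versus [2, 4], i.e., one flight fewer.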

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 15:51:32 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA15636 for tcp-impl-list; Fri, 1 Aug 1997 13:43:48 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA15613 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 13:43:45 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id KAA15782
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 10:21:40 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA07985>; Fri, 1 Aug 1997 10:21:24 -0700
Date: Fri, 1 Aug 1997 10:20:51 -0700
Posted-Date: Fri, 1 Aug 1997 10:20:51 -0700
Message-Id: <199708011720.KAA05512@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <KAA05512>; Fri, 1 Aug 1997 10:20:51 -0700
To: touch@ISI.EDU, raj@hpisrdq.cup.hp.com
Subject: Re: revising RFC 2001 within tcp-impl
Cc: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov, Allyn.Romanow@eng.sun.com,
        sca@refugee.engr.sgi.com, rstevens@kohala.com, floyd@ee.lbl.gov,
        mallman@lerc.nasa.gov, craig@aland.bbn.com
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Rick Jones <raj@hpisrdq.cup.hp.com>
> Subject: Re: revising RFC 2001 within tcp-impl
> 
> touch@ISI.EDU wrote:
> > It may be important that the two design points be mutually exclusive,
> > e.g.,
> >         start with TWO packets and DO NOT increase the window for SYN-ACK
> > 
> >         start with ONE packet and DO increase the window for SYN-ACK (the bug)
> > 
> > doing both results in starting with a window of 4, double the current
> > value...
> 
> Indeed. Instead of having to enumerate in a later draft, why not simply
> state that after the three-way handshake is complete cwnd will be N and
> leave it up to the implementation as to how it gets to N. (I am assuming
> N >= 2)

Because the value of N is important and determines the group 
congestion properties of the Internet. It should not be 
'left to the implementation'. 

Joe

----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug  1 15:55:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA15701 for tcp-impl-list; Fri, 1 Aug 1997 13:43:58 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA15683 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 13:43:55 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id KAA17711
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 1 Aug 1997 10:26:35 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA08328>; Fri, 1 Aug 1997 10:26:29 -0700
Date: Fri, 1 Aug 1997 10:25:55 -0700
Posted-Date: Fri, 1 Aug 1997 10:25:55 -0700
Message-Id: <199708011725.KAA05635@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <KAA05635>; Fri, 1 Aug 1997 10:25:55 -0700
To: touch@ISI.EDU, Erik.Nordmark@eng.sun.com
Subject: Re: revising RFC 2001 within tcp-impl
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Erik.Nordmark@Eng.Sun.COM (Erik Nordmark)
> Subject: Re: revising RFC 2001 within tcp-impl
> 
> > Now consider two cases for IW, large or small:
> > 
> > more than 10-20 packets in the connection (from open to close,
> > approx.):
...
> > 	If IW is 1, 2, or 4 we save 0, 1, or 2 RTTs out of a total
> > 	of 5 or more, i.e., the connection saves 25% or less time
> 
> There is one other factor that is missing from the above.
> A large number of TCP implementations ack every other packet (as they
> should) but do this even in the beginning of the TCP connection.

The code I've seen delays ACKs until 2 _full_ packets arrive,
not 'every other packet'. I.e., it incurs a 200ms timeout when
the source sends a partial packet then a full one (early versions
of HTTP servers, which sent the HTTP headers out, then opened the
file and started sending content), or a full packet then a partial
one (when the file ends on a partial packet, even when it contains
an even number of packets).
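
To illustrate the full-versus-partial point, here is a toy receiver model.
It is a sketch under stated assumptions, not any stack's actual code: the
receiver ACKs at once only when two full-sized segments are outstanding,
the sender stalls at the end of each flight, and the MSS of 1460 bytes and
the function name are made up for the example.

```python
from typing import Iterable, Sequence


def delayed_ack_timeouts(flights: Iterable[Sequence[int]],
                         mss: int = 1460) -> int:
    """Count delayed-ACK timer expiries (200 ms each in the text above).

    flights: segment sizes in bytes, grouped by flight; the sender is
        assumed to wait for an ACK at the end of every flight.
    """
    timeouts = 0
    for flight in flights:
        full = 0         # full-sized segments not yet acknowledged
        pending = False  # is any data still unacknowledged?
        for size in flight:
            pending = True
            if size >= mss:
                full += 1
            if full == 2:
                # Two full segments: immediate ACK covers everything so far.
                full, pending = 0, False
        if pending:
            timeouts += 1  # ACK only arrives when the timer fires
    return timeouts
```

In this model a flight of [300, 1460] (headers, then content) or
[1460, 500] (file ending on a partial packet) costs one timeout each,
while [1460, 1460] costs none.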

> The correct answer here might be to fix the receive side of TCP to
> not delay acks for the first data packet in the connection but
> it seems like folks have been fixing the sending TCP to use IW = 2
> instead.

Maybe the former has more utility?? :-)

> > So either it doesn't help or it hurts.
> 
> But how much does it really hurt?

Depends on how big the IW is. I'm not really making a case
for not using IW=RW=2 (as in many current implementations), 
rather I'm saying not to increase that to 4 or 8.

As you observe, 2 is fine given delayed ACKs.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Tue Aug  5 16:24:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA22359 for tcp-impl-list; Tue, 5 Aug 1997 16:22:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA22328 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 5 Aug 1997 16:22:10 -0700
Received: from darkstar.isi.edu (darkstar.isi.edu [128.9.128.127]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA09335
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 5 Aug 1997 16:22:07 -0700
	env-from (johnh@ISI.EDU)
Received: from dash.isi.edu by darkstar.isi.edu (5.65c/5.61+local-27)
	id <AA29292>; Tue, 5 Aug 1997 16:21:54 -0700
Received: from dash.isi.edu (localhost.isi.edu [127.0.0.1])
          by dash.isi.edu (8.8.5/8.8.4) with ESMTP
	  id QAA24592; Tue, 5 Aug 1997 16:23:22 -0700
Message-Id: <199708052323.QAA24592@dash.isi.edu>
X-Url: http://www.isi.edu/~johnh/
To: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov, Allyn.Romanow@eng.sun.com,
        sca@refugee.engr.sgi.com, rstevens@kohala.com, floyd@ee.lbl.gov,
        mallman@lerc.nasa.gov
Subject: Re: revising RFC 2001 within tcp-impl
Date: Tue, 05 Aug 1997 16:23:22 -0700
From: John Heidemann <johnh@ISI.EDU>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


While the distinctions between IW and RW are useful,
no RFC currently mandates restart-after-idle behavior.

If a revised 2001 proposes a new IW value,
it seems to me like the strongest statement it can currently make
is something like:

	If a TCP implementation implements slow-start restart
	after idle periods, then it SHOULD use the same
	RW as IW.

(A separate interesting discussion would be recommending
slow-start restart after idle.)

   -John Heidemann


From owner-tcp-impl@relay.engr.sgi.com  Tue Aug  5 22:45:30 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA19506 for tcp-impl-list; Tue, 5 Aug 1997 22:43:46 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA19498 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 5 Aug 1997 22:43:44 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id WAA21586
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 5 Aug 1997 22:43:43 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id WAA23238; Tue, 5 Aug 1997 22:43:42 -0700 (PDT)
Message-Id: <199708060543.WAA23238@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: seeking Munich agenda items
Date: Tue, 05 Aug 1997 22:43:42 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

If you have an item you'd like to see on the Munich agenda, please send
me and Steve (sca@refugee.engr.sgi.com) email.  So far the items I've
noted are:

	- overview of keepalive additions to the "known problems" I-D
	- discussion of the change to the Significance category
	  for that same I-D
	- presentation of the testing tools I-D
	- presentation of the E2E feedback on TCP research issues
	- discussion of revising RFC 2001
	- plea for volunteers to document other known problems

Thanks,

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Aug  7 10:44:25 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA20148 for tcp-impl-list; Thu, 7 Aug 1997 10:42:21 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA20092 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 7 Aug 1997 10:42:15 -0700
Received: from mailhost.Ipsilon.COM (mailhost.ipsilon.com [205.226.5.12]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA11538
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 7 Aug 1997 10:42:14 -0700
	env-from (minshall@ipsilon.com)
Received: from red.ipsilon.com (red.Ipsilon.COM [205.226.1.58]) by mailhost.Ipsilon.COM (8.6.11/8.6.10) with ESMTP id KAA10368; Thu, 7 Aug 1997 10:42:09 -0700
Received: from red.ipsilon.com by red.ipsilon.com (8.6.12) id KAA09810; Thu, 7 Aug 1997 10:42:09 -0700
Message-Id: <199708071742.KAA09810@red.ipsilon.com>
X-Mailer: exmh version 1.6.9 8/22/96
To: Craig Partridge <craig@aland.bbn.com>
cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
Subject: Re: RTT estimation - a retraction 
In-reply-to: Your message of "Tue, 29 Jul 1997 09:42:53 PDT."
             <199707291642.JAA13298@aland.bbn.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 07 Aug 1997 10:42:08 -0700
From: Greg Minshall <minshall@ipsilon.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Craig, and all,

(Just to say a word or so before you all get together in Munich.)

You want your long-running RTT estimator (which you use to drive 
retransmissions) to "remember" the past, rather than be dominated by all the 
packets in this window (the "recent" past).

You can certainly *measure* the RTT as many times per window as you'd like.  
What you don't want to do is fold more than one measurement (or estimation) 
into your long-running estimator, since that would cause your long-running 
estimator to "forget" older measurements.

(Now, one could take the *average* of measured RTTs during a window, or, more 
conservatively, the MAX of measured RTTs during a window, as the estimate of 
the RTT to plug into the long-running RTT estimator once per window.)
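
A rough sketch of the once-per-window idea, next to the every-sample
variant it is meant to avoid. The function names are invented, and the
gain of 1/8 is an assumption chosen for the example, not something stated
above.

```python
from statistics import mean
from typing import Callable, Sequence

GAIN = 0.125  # assumed smoothing gain (1/8); not specified in the text


def fold_window(estimate: float, window_rtts: Sequence[float],
                summarize: Callable[[Sequence[float]], float] = max,
                gain: float = GAIN) -> float:
    """Fold ONE summary of a window's RTT samples into the estimate.

    summarize is max (the conservative choice) or statistics.mean.
    """
    if not window_rtts:
        return estimate  # nothing measured this window
    sample = summarize(window_rtts)
    return estimate + gain * (sample - estimate)


def fold_every_sample(estimate: float, window_rtts: Sequence[float],
                      gain: float = GAIN) -> float:
    """Fold every sample in; the old estimate decays once per sample."""
    for rtt in window_rtts:
        estimate += gain * (rtt - estimate)
    return estimate
```

With a gain of g and k samples in a window, fold_window leaves the old
estimate with weight (1 - g), while fold_every_sample leaves it with
(1 - g)^k, which is the "forgetting" effect. Pass summarize=mean for the
averaging variant.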

Cheers,  Greg Minshall

From owner-tcp-impl@relay.engr.sgi.com  Thu Aug  7 11:15:10 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA07677 for tcp-impl-list; Thu, 7 Aug 1997 11:12:54 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA07646 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 7 Aug 1997 11:12:51 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id LAA20993
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 7 Aug 1997 11:12:50 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA18170>; Thu, 7 Aug 1997 11:12:48 -0700
Date: Thu, 7 Aug 1997 11:12:47 -0700
Posted-Date: Thu, 7 Aug 1997 11:12:47 -0700
Message-Id: <199708071812.LAA09346@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <LAA09346>; Thu, 7 Aug 1997 11:12:47 -0700
To: craig@aland.bbn.com, minshall@ipsilon.com
Subject: Re: RTT estimation - a retraction
Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@cthulhu.engr.sgi.com Thu Aug  7 10:46:17 1997
> To: Craig Partridge <craig@aland.bbn.com>
> Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
> Subject: Re: RTT estimation - a retraction 
> Date: Thu, 07 Aug 1997 10:42:08 -0700
> From: Greg Minshall <minshall@ipsilon.com>
> 
> Craig, and all,
> 
> (Just to say a word or so before you all get together in Munich.)
> 
> You want your long-running RTT estimator (which you use to drive 
> retransmissions) to "remember" the past, rather than be dominated by all the 
> packets in this window (the "recent" past).
> 
> You can certainly *measure* the RTT as many times per window as you'd like.  
> What you don't want to do is fold more than one measurement (or estimation) 
> into your long-running estimator, since that would cause your long-running 
> estimator to "forget" older measurements.
> 
> (Now, one could take the *average* of measured RTTs during a window, or, more 
> conservatively, the MAX of measured RTTs during a window, as the estimate of 
> the RTT to plug into the long-running RTT estimator once per window.)

	R-est[t] = estimated RTT at time t
	R-mes[t] = measured RTT at time t

The current algorithm uses an inverse-exponential decay, 

	R-est[t] = R-est[t-1] * alpha + R-mes[t] * beta

where alpha + beta = 1.

The net effect is that:

	R-est[t] = R-mes[t] * beta 
		+ R-mes[t-1] * beta * alpha
		+ R-mes[t-2] * beta * alpha^2
		...

(I may be getting the names for alpha and beta switched, here).
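
As a minimal sketch of the recurrence and its expansion above, assuming alpha is the weight on history and beta = 1 - alpha (the function names are hypothetical and not taken from any TCP stack):

```c
#include <assert.h>
#include <stddef.h>

/*
 * One step of the exponentially weighted estimator:
 *   R-est[t] = R-est[t-1] * alpha + R-mes[t] * beta,  alpha + beta = 1.
 * alpha is assumed to be the weight on history.
 */
static double rtt_est_update(double r_est_prev, double r_mes, double alpha)
{
    assert(alpha >= 0.0 && alpha <= 1.0);
    double beta = 1.0 - alpha;
    return r_est_prev * alpha + r_mes * beta;
}

/*
 * Weight that a sample taken k updates ago still carries in the current
 * estimate: beta * alpha^k, the k-th term of the expanded sum.
 */
static double rtt_sample_weight(unsigned k, double alpha)
{
    double w = 1.0 - alpha;
    while (k-- > 0)
        w *= alpha;
    return w;
}

/* Fold a series of measurements into an estimate, one update per sample. */
static double rtt_est_fold(double r_est0, const double *r_mes, size_t n,
                           double alpha)
{
    double est = r_est0;
    for (size_t i = 0; i < n; i++)
        est = rtt_est_update(est, r_mes[i], alpha);
    return est;
}
```

Each extra update multiplies the weight of everything older by alpha, which is why folding many samples per window into the estimator makes it "forget" the past faster. The value of alpha is left to the caller; any specific value such as 0.875 would be an illustrative choice only.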


There are other predictors, such as:

	sliding average over a window (window = 10x largest RTT, e.g.)
		(for windowed-averages, another equation is required
		to estimate the window, e.g., a constant, 
		10 * max(R-mes[t]) over the last window, etc.)

	weighted average over a window

	max over a window

I don't know which of these has been tested in a real system.
It might be worth re-examining, given the nature of satellite
channels and web traffic, though.
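
The windowed predictors listed above could be sketched roughly as follows. This assumes a window counted in samples rather than in time (the time-based window mentioned above would need a timestamp per sample), and RTT_WIN and all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define RTT_WIN 8   /* hypothetical window length, in samples */

struct rtt_window {
    double sample[RTT_WIN];
    size_t count;   /* number of valid samples, at most RTT_WIN */
    size_t next;    /* slot the next sample overwrites */
};

/* Record a measurement, overwriting the oldest one once the window is full. */
static void rtt_window_add(struct rtt_window *w, double r_mes)
{
    w->sample[w->next] = r_mes;
    w->next = (w->next + 1) % RTT_WIN;
    if (w->count < RTT_WIN)
        w->count++;
}

/* Sliding average over the window. */
static double rtt_window_mean(const struct rtt_window *w)
{
    assert(w->count > 0);
    double sum = 0.0;
    for (size_t i = 0; i < w->count; i++)
        sum += w->sample[i];
    return sum / (double)w->count;
}

/* Max over the window, the more conservative predictor. */
static double rtt_window_max(const struct rtt_window *w)
{
    assert(w->count > 0);
    double max = w->sample[0];
    for (size_t i = 1; i < w->count; i++)
        if (w->sample[i] > max)
            max = w->sample[i];
    return max;
}
```

A weighted average would follow the same pattern with a per-slot weight. Unlike the exponential form, all of these need per-connection storage for the window.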

Joe

	
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 12 10:44:05 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA05435 for tcp-impl-list; Tue, 12 Aug 1997 10:39:50 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA05414 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 12 Aug 1997 10:39:47 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id KAA04662
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 12 Aug 1997 10:39:46 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA23752>; Tue, 12 Aug 1997 10:39:42 -0700
Date: Tue, 12 Aug 1997 10:39:42 -0700
Posted-Date: Tue, 12 Aug 1997 10:39:42 -0700
Message-Id: <199708121739.KAA08900@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <KAA08900>; Tue, 12 Aug 1997 10:39:42 -0700
To: tcp-impl@cthulhu.engr.sgi.com, vern@ee.lbl.gov
Subject: does this qualify as a bug?
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

It appears that the slow-start restart code in FreeBSD (at least)
is not what was intended. Is this the kind of bug that the I-D
is seeking (is slow-start restart considered part of the default
operation??)

Joe


----- Begin Included Message -----

>From touch@ISI.EDU Tue Aug 12 10:33:21 1997
Date: Tue, 12 Aug 1997 10:33:13 -0700
From: touch@ISI.EDU
To: lsam@ISI.EDU
Subject: HTTP 1.1 and slow-start restart (or lack thereof)


----- Begin Included Message -----

>From touch@ISI.EDU Tue Aug 12 10:31:46 1997
Date: Tue, 12 Aug 1997 10:31:44 -0700
From: touch@ISI.EDU
To: tcp-over-satellite@achtung.sp.trw.com
Subject: HTTP 1.1 and slow-start restart (or lack thereof)
Cc: touch@ISI.EDU


As promised last week, here are the results of experiments
to determine the behavior of HTTP 1.1 and slow-start restart.

A bit of background:

Some versions of TCP have a mechanism called 'slow-start restart',
which is intended to cause the congestion window to restart as if
at the beginning of a connection, whenever the 
  
  "pipe is empty because we haven't _sent_ anything for at 
  least a round trip time"-
	Jacobson, Sigcomm 88, 'Congestion Avoidance and Control', 
	available at ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
	(pseudo-italics mine - Joe)

Kacheong Poon <kcpoon@jurassic.eng.Sun.COM> correctly asserted
that, in existing implementations:

	_receiving_ a packet is enough to avoid this restart,
	and, in HTTP 1.1, regardless of how long the connection is
	idle, the subsequent request will reset the idle timer
	at the server, and the window won't restart there

We have confirmed this behavior on FreeBSD 2.2.1, using Apache
servers with persistent connections and Netscape Navigator 4.

There appears to be a discrepancy between Jacobson's paper and
the BSD implementation. In Jacobson's paper, the code is given as
(page 19, code on right side):

	int idle = (snd_max == snd_una);
	if (idle && now - lastsnd > rto)
		cwnd = 1;

However, in the FreeBSD code, 'now - lastsnd' is replaced
by 't_idle >= rxtcur', which isn't quite the same:

	idle = (tp->snd_max == tp->snd_una);
	if (idle && tp->t_idle >= tp->t_rxtcur)
		/*
		 * We have been idle for "a while" and no acks are
		 * expected to clock out any data we send --
		 * slow start to get ack "clock" running again.
		 */
		tp->snd_cwnd = tp->t_maxseg;

Wright/Stevens TCP/IP Illustrated, V2 recognizes this bug. The code
above is on page 853 of V2, in the beginning of Chapter 26. On page
889, Exercise 26.1 asks: (pseudo-italics his):

  Slow start is resumed in Figure 26.1 when there is a pause in
  the _sending_ of data, yet the amount of idle time is calculated as
  the amount of time since the last segment was _received_ on the
  connection. Why doesn't TCP calculate the idle time as the amount of
  time since the last segment was _sent_ on the connection?

The answer proposed on page 1085, Appendix A, is:

  The counter t_idle is always running for a connection, whereas TCP
  does not measure the amount of time since the last segment was sent
  on a connection.
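
To make the difference between the two idle tests concrete, here is a minimal sketch. The struct and function names are hypothetical, times are in arbitrary ticks, and the comparisons only mirror the two fragments quoted above rather than any complete implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection state; times are in arbitrary ticks. */
struct conn {
    unsigned long last_sent;   /* time the last segment was sent */
    unsigned long last_rcvd;   /* time the last segment was received */
    unsigned long rto;         /* current retransmit timeout */
    unsigned long snd_una;     /* oldest unacknowledged sequence number */
    unsigned long snd_max;     /* highest sequence number sent */
};

/* Test as given in the paper: idle time is measured from the last send. */
static bool restart_by_last_send(const struct conn *c, unsigned long now)
{
    assert(now >= c->last_sent);
    bool idle = (c->snd_max == c->snd_una);
    return idle && (now - c->last_sent > c->rto);
}

/* BSD-style test: idle time is measured from the last segment received. */
static bool restart_by_last_recv(const struct conn *c, unsigned long now)
{
    assert(now >= c->last_rcvd);
    bool idle = (c->snd_max == c->snd_una);
    return idle && (now - c->last_rcvd >= c->rto);
}
```

In the persistent-connection case, the server's last send is long past when the next request arrives, but the request itself has just refreshed the receive time. The send-based test then asks for a restart while the receive-based test does not.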

------------------------------------------

So, in summary, two bits of news:

	KC was right, existing implementations are net-unfriendly
	and do not do slow-start restart between request gaps 
	in HTTP with persistent connections

	I was also right (at least I'll claim so :-), that this is
	not the intended behavior and is the result of an implementation
	bug.

I'll forward this to the appropriate places; in the meantime, we're
working on a patch for FreeBSD to yield the correct behavior.

Joe

----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/


----- End Included Message -----



----- End Included Message -----


----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 12 12:00:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA03903 for tcp-impl-list; Tue, 12 Aug 1997 11:57:27 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA03891 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 12 Aug 1997 11:57:24 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA06840
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 12 Aug 1997 11:57:23 -0700
	env-from (kcpoon@jurassic.eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id LAA00727; Tue, 12 Aug 1997 11:47:29 -0700
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id LAA16398; Tue, 12 Aug 1997 11:47:27 -0700
Received: from shield (shield [129.146.83.81])
	by jurassic.eng.sun.com (8.8.7+Sun.Alpha.7/8.8.7) with SMTP id LAA23067;
	Tue, 12 Aug 1997 11:47:27 -0700 (PDT)
Date: Tue, 12 Aug 1997 11:47:26 -0700 (PDT)
From: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
Reply-To: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
Subject: Re: does this qualify as a bug?
To: touch@ISI.EDU
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: "Your message with ID" <199708121739.KAA08900@rum.isi.edu>
Message-ID: <Roam.SIMC.2.0.871411646.1903.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>   "pipe is empty because we haven't _sent_ anything for at 
>   least a round trip time"-
> 	Jacobson, Sigcomm 88, 'Congestion Avoidance and Control', 
> 	available at ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
> 	(pseudo-italics mine - Joe)

Actually, footnote 23 in that paper says that last receive time can be used
because of send/receive symmetry.  I guess the reason is that as long as
ACKs are coming back, the ACK clock is not lost.  But in the HTTP/1.1 case,
the new segment is not a pure ACK, it is a request with data.  So this causes
the problem.

But how long should the idle period be before restarting slow start?  When does
this idle timer start?  If we use last send time and 1 RTO idle time, by the
time the ACK for the last sent segment comes back, it is already 1 RTT later.
But we may not want to restart slow start that soon, given that the network may
be stable over that "short" period of time.  I guess this has been discussed
somewhere before.  Is there any conclusion?

							K. Poon
							kcpoon@eng.sun.com



From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 12 13:59:47 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA21884 for tcp-impl-list; Tue, 12 Aug 1997 13:57:21 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA21868 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 12 Aug 1997 13:57:19 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA28825
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 12 Aug 1997 13:57:17 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA04983>; Tue, 12 Aug 1997 13:57:16 -0700
Date: Tue, 12 Aug 1997 13:57:16 -0700
Posted-Date: Tue, 12 Aug 1997 13:57:16 -0700
Message-Id: <199708122057.NAA09122@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <NAA09122>; Tue, 12 Aug 1997 13:57:16 -0700
To: touch@ISI.EDU, kcpoon@jurassic.eng.sun.com
Subject: Re: does this qualify as a bug?
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From kcpoon@jurassic.eng.Sun.COM Tue Aug 12 12:36:13 1997
> Date: Tue, 12 Aug 1997 11:47:26 -0700 (PDT)
> From: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
> Subject: Re: does this qualify as a bug?
> To: touch@ISI.EDU
> Cc: tcp-impl@cthulhu.engr.sgi.com
> 
> >   "pipe is empty because we haven't _sent_ anything for at 
> >   least a round trip time"-
> > 	Jacobson, Sigcomm 88, 'Congestion Avoidance and Control', 
> > 	available at ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
> > 	(pseudo-italics mine - Joe)
> 
> Actually, footnote 23 in that paper says that last receive time can be used
> because of send/receive symmetry.  I guess the reason is that as long as
> ACKs are coming back, the ACK clock is not lost.  But in the HTTP/1.1 case,
> the new segment is not a pure ACK, it is a request with data.  So this causes
> the problem.
> 
> But how long should the idle period be before restarting slow start?  When does
> this idle timer start?  If we use last send time and 1 RTO idle time, by the
> time the ACK for the last sent segment comes back, it is already 1 RTT later.
> But we may not want to restart slow start that soon, given that the network may
> be stable over that "short" period of time.  I guess this has been discussed
> somewhere before.  Is there any conclusion?

There is a different way to solve the problem.

If the "window allowed to send" (snd_cwnd - (snd_nxt - snd_una))?
exceeds the "current window size" (snd_cwnd)?
by more than X% (for some X), then the slow-start restart should ensue,

(can someone suggest the exact formula)?

I vaguely recall a similar mechanism, during ACK processing, though
I can't find it in the BSD code (?). When an ACK
arrives that slides the window forward by more than 1/2,
the window is reset and slow-start restart ensues (does anyone recall?)

Given that the intent is to avoid a line-rate burst, then clearly
the best mechanism is "monitor and limit the window allowed-to-send size".

This means - on EVERY window size update, if the amount allowed-to-send
exceeds a threshold (X% of cwnd, 2 MSS, ??), the cwnd needs to be reduced
to disallow the bursts.

Maybe this means restart - maybe not. Maybe we just decrease the window
and allow only a burst of 2 packets until ACK clocking resumes (when
the window will reopen anyway, or at least slide forward and allow 
the next transmission).
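
One way the "monitor and limit" idea could look, as a sketch only: the threshold is taken as a fixed number of segments (max_burst, a hypothetical parameter), and the function name is made up. The exact formula is still an open question in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp cwnd so that at most max_burst segments can leave back to back.
 * in_flight = snd_nxt - snd_una is the data sent but not yet acknowledged;
 * cwnd - in_flight is the amount currently allowed to be sent.
 */
static uint32_t limit_burst(uint32_t cwnd, uint32_t snd_una, uint32_t snd_nxt,
                            uint32_t mss, uint32_t max_burst)
{
    assert(mss > 0 && max_burst > 0);
    uint32_t in_flight = snd_nxt - snd_una;   /* modular, survives wraparound */
    uint32_t allowed = cwnd > in_flight ? cwnd - in_flight : 0;
    uint32_t limit = max_burst * mss;

    if (allowed > limit)
        cwnd = in_flight + limit;             /* shrink, never grow */
    return cwnd;
}
```

Called on every window update, this leaves cwnd alone while ACK clocking keeps the allowed amount small, and only cuts it when a large gap has opened up, for example after an idle period.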

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Wed Aug 13 13:10:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA11480 for tcp-impl-list; Wed, 13 Aug 1997 13:07:25 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA11467 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 13:07:22 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA19860
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 13:07:20 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id QAA06688; Wed, 13 Aug 1997 16:07:02 -0400 (EDT)
Message-Id: <199708132007.QAA06688@brookfield.ans.net>
To: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
cc: touch@ISI.EDU, tcp-impl@cthulhu.engr.sgi.com
Reply-To: curtis@ans.net
Subject: Re: does this qualify as a bug? 
In-reply-to: Your message of "Tue, 12 Aug 1997 11:47:26 PDT."
             <Roam.SIMC.2.0.871411646.1903.kcpoon@jurassic> 
Date: Wed, 13 Aug 1997 16:07:02 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <Roam.SIMC.2.0.871411646.1903.kcpoon@jurassic>, Kacheong Poon writes
:
> >   "pipe is empty because we haven't _sent_ anything for at 
> >   least a round trip time"-
> > 	Jacobson, Sigcomm 88, 'Congestion Avoidance and Control', 
> > 	available at ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
> > 	(pseudo-italics mine - Joe)
> 
> Actually, footnote 23 in that paper says that last receive time can be used
> because of send/receive symmetry.  I guess the reason is that as long as
> ACKs are coming back, the ACK clock is not lost.  But in the HTTP/1.1 case,
> the new segment is not a pure ACK, it is a request with data.  So this causes
> the problem.

What problem:

	rcv				snd
		<-- send last segment
			1/2 RTT
		  get last segment
		    send ack -->
		  send request -->
			1/2 RTT
		  get last ACK
		  get request

The request should come at some time very shortly after the last ACK
(<<RTT for all but the smallest RTTs).  As long as the ACKs keep
resetting the idle time, there is data to be sent almost immediately
after going idle.

> But how long should the idle period be before restarting slow start?  When does
> this idle timer start?  If we use last send time and 1 RTO idle time, by the
> time the ACK for the last sent segment comes back, it is already 1 RTT later.

I would like to suggest that we consider reducing cwnd by a factor of
2 for every 1/2 RTO of idle time rather than go back to a window of 1,
where idle is measured from the last segment sent.  If cwnd drops
below ssthresh, it will quickly return to ssthresh.  

Pros and cons are: 1) This does reduce the burst size after idle.  2)
It does reduce the offered load, in case other connections have taken
up some of the bottleneck bandwidth while this one was idle.  3) It does
not bring the window back to 1, which is a very drastic measure for large
window TCP, particularly for very large RTT (satellite).

I've suggested this before (actually halving for each full RTT), so
sorry for repeating myself.

> But we may not want to restart slow start that soon, given that the network may
> be stable over that "short" period of time.  I guess this has been discussed
> somewhere before.  Is there any conclusion?

No conclusions that I know of.  :)

> 							K. Poon
> 							kcpoon@eng.sun.com

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Wed Aug 13 13:23:40 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA15296 for tcp-impl-list; Wed, 13 Aug 1997 13:18:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA15281 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 13:18:56 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA23098
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 13:18:55 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA01472>; Wed, 13 Aug 1997 13:18:47 -0700
Date: Wed, 13 Aug 1997 13:18:47 -0700
Posted-Date: Wed, 13 Aug 1997 13:18:47 -0700
Message-Id: <199708132018.NAA10410@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <NAA10410>; Wed, 13 Aug 1997 13:18:47 -0700
To: kcpoon@jurassic.eng.sun.com, curtis@ans.net
Subject: Re: does this qualify as a bug?
Cc: touch@ISI.EDU, tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From curtis@brookfield.ans.net Wed Aug 13 13:07:20 1997
> To: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
> Cc: touch@ISI.EDU, tcp-impl@cthulhu.engr.sgi.com
> Subject: Re: does this qualify as a bug? 
> Date: Wed, 13 Aug 1997 16:07:02 -0400
> From: Curtis Villamizar <curtis@brookfield.ans.net>
> 
> 
> In message <Roam.SIMC.2.0.871411646.1903.kcpoon@jurassic>, Kacheong Poon writes
> :
> > >   "pipe is empty because we haven't _sent_ anything for at 
> > >   least a round trip time"-
> > > 	Jacobson, Sigcomm 88, 'Congestion Avoidance and Control', 
> > > 	available at ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
> > > 	(pseudo-italics mine - Joe)
> > 
> > Actually, footnote 23 in that paper says that last receive time can be used
> > because of send/receive symmetry.  I guess the reason is that as long as
> > ACKs are coming back, the ACK clock is not lost.  But in the HTTP/1.1 case,
> > the new segment is not a pure ACK, it is a request with data.  So this causes
> > the problem.
> 
> What problem:
> 
> 	rcv				snd
> 		<-- send last segment
> 			1/2 RTT
> 		  get last segment
> 		    send ack -->
> 		  send request -->
> 			1/2 RTT
> 		  get last ACK
> 		  get request
> 
> The request should come at some time very shortly after the last ACK
> (<<RTT for all but the smallest RTTs).  As long as the ACKs keep
> resetting the idle time, there is data to be sent almost immediately
> after going idle.

The request comes whenever the user clicks (we're talking
about user-created gaps in the stream at this point).

We've seen traces with gaps in the 5-second range. The last ACK
comes before the gap; the request arrives after the gap.
In this case, TCP blasts into the network - which is just as
bad as "no slow-start" - which is listed as a bug.

> > But how long should the idle period be before restarting slow start?  When does
> > this idle timer start?  If we use last send time and 1 RTO idle time, by the
> > time the ACK for the last sent segment comes back, it is already 1 RTT later.
> 
> I would like to suggest that we consider reducing cwnd by a factor of
> 2 for every 1/2 RTO of idle time rather than go back to a window of 1,
> where idle is measured from the last segment sent.  If cwnd drops
> below ssthresh, it will quickly return to ssthresh.  

There are several suggestions for a solution. I prefer one that
limits the "transmission rights" - preventing the window from
sliding forward far enough to enable a burst of more than a few
packets (4 or so).

However, without an implementation (which we don't have either,
but we're working on), it's premature to pick a winner.


Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Wed Aug 13 16:00:10 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA10157 for tcp-impl-list; Wed, 13 Aug 1997 15:57:40 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA10138 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 15:57:37 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA23093
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 15:57:35 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id SAA07090; Wed, 13 Aug 1997 18:57:13 -0400 (EDT)
Message-Id: <199708132257.SAA07090@brookfield.ans.net>
To: touch@ISI.EDU
cc: kcpoon@jurassic.eng.sun.com, tcp-impl@cthulhu.engr.sgi.com
Reply-To: curtis@ans.net
Subject: Re: does this qualify as a bug? 
In-reply-to: Your message of "Tue, 12 Aug 1997 13:57:16 PDT."
             <199708122057.NAA09122@rum.isi.edu> 
Date: Wed, 13 Aug 1997 18:57:13 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199708122057.NAA09122@rum.isi.edu>, touch@ISI.EDU writes:
> 
> There is a different way to solve the problem.
> 
> If the "window allowed to send" (snd_cwnd - (snd_nxt - snd_una))?
> exceeds the "current window size" (snd_cwnd)?
> by more than X% (for some X), then the slow-start restart should ensue,
> 
> (can someone suggest the exact formula)?
> 
> I vaguely recall a similar mechanism, during ACK processing, though
> I can't find it in the BSD code (?). When an ACK
> arrives that slides the window forward by more than 1/2,
> the window is reset and slow-start restart ensues (does anyone recall?)
> 
> Given that the intent is to avoid a line-rate burst, then clearly
> the best mechanism is "monitor and limit the window allowed-to-send size".


If you were previously in near steady state and near congestion, one
TCP flow going idle would reduce the standing queue at the bottleneck.
If the TCP flow that went idle was not a significant contributor to
the bottleneck load, then the line rate burst would not be significant
relative to other traffic.  If the TCP flow was a significant or among
the dominant contributors then the idle period of one RTT would very
likely allow the queue to drain.  The <1 RTT idle would not result in
a major increase in offered load by the other TCP flows (assuming
that most of them are not in initial slow start with no ssthresh).  If
the single TCP flow was accounting for some portion of the bandwidth
(1/Nth for small N) then a full window burst should be well under one
to two delay bandwidth product for the link (the accepted minimum
queueing requirement).

Going to slow start is a very drastic move for large window TCP.  It
takes on the order of N RTTs to reach a window of 2^N (Jamshid argues
that should be 1.5^N).  For a satellite link and a small transfer, that
is a long time.
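
A rough way to put numbers on this, as a sketch: count round trips from a one-segment window, multiplying by a growth factor each RTT. The function name is hypothetical, and both growth factors are simply the two values mentioned above.

```c
#include <assert.h>

/*
 * Count the round trips slow start needs to grow from one segment to
 * target_segs segments when the window is multiplied by `growth` each RTT.
 */
static unsigned rtts_to_reach(double target_segs, double growth)
{
    assert(growth > 1.0);
    unsigned rtts = 0;
    double cwnd = 1.0;

    while (cwnd < target_segs) {
        cwnd *= growth;
        rtts++;
    }
    return rtts;
}
```

For a 64-segment window this gives 6 round trips at a factor of 2 and 11 at a factor of 1.5. Assuming, purely for illustration, a 500 ms satellite RTT, that is about 3 and 5.5 seconds before the window is back to its old size.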

Sending large line rate bursts is bad but not terrible.

Slow starting large window TCPs is wasteful but not terrible.

There is a tradeoff and the BSD implementation currently is capable of
sending full window bursts (idle slightly under RTO) or slow starting
TCP (idle slightly over RTO).

It would be nice if pacing could be recovered without waiting multiple
RTTs to do so.  Unfortunately, the only way to do that would be to
clock out packets according to some estimate of pacing based on cwnd
and RTO and that would involve using timers to clock out the packets.
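
The timer-based alternative could be sketched as below, assuming the pacing estimate is simply one window spread evenly over one round-trip estimate. The function name, the microsecond units, and the choice of which round-trip estimate to pass in are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Gap between segment departures, in microseconds, that spreads one
 * window of cwnd bytes evenly over one round-trip estimate.
 */
static uint64_t pacing_gap_us(uint64_t rtt_us, uint32_t cwnd, uint32_t mss)
{
    assert(cwnd > 0 && mss > 0);
    /* segments per window, rounded up; 64-bit to avoid overflow */
    uint64_t segs = ((uint64_t)cwnd + mss - 1) / mss;
    return rtt_us / segs;
}
```

The cost is the one noted above: a timer has to fire once per segment until ACK clocking takes over again.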

I prefer the idea of reducing cwnd by some factor for given idle times
so that the burst size is limited proportionally to the prior value of
cwnd but it does not have to be limited to some small fixed integer.
This does tend to reduce the worst case burst to 1/2 window, avoids
slow start unless a lot of RTOs have elapsed, and reduces the burst
and increases the tendency to go all the way back to slow start as the
number of RTOs increases rather than have a sharp cutoff.

	idle = (tp->snd_max == tp->snd_una);
	if (idle) {
		/* outer "if" avoids dividing if possible */
		idleshift = 2 * tp->t_idle / tp->t_rxtcur;
		if (idleshift > 0) {
			tp->snd_cwnd >>= idleshift;
			if (tp->snd_cwnd < tp->t_maxseg)
				tp->snd_cwnd = tp->t_maxseg;
		}
	}

I think the code above would implement the suggestion I made on
end2end-interest early today.  If you wanted to be even more
aggressive about limiting burst size, the 2 could be replaced with a
larger integer (4?).  Even then in cases where the variation in
measured RTT was large compared to min RTT you might get a full window
burst.  While far from perfect, this would be an improvement over
current behavior short of clocking out packets on a timer.
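
For anyone wanting to experiment outside the kernel, the fragment above can be restated as a standalone function. This is a sketch: the function name and the plain integer parameters are stand-ins for the tcpcb fields, and the guard against a zero timeout and the shift clamp are additions that the fragment does not have.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Halve cwnd once per half-RTO of idle time, never going below one
 * segment.  t_idle and t_rxtcur are in the same (arbitrary) tick units.
 */
static uint32_t decay_cwnd(uint32_t cwnd, uint32_t maxseg,
                           uint32_t t_idle, uint32_t t_rxtcur)
{
    assert(t_rxtcur > 0);                 /* addition: avoid divide by zero */
    uint64_t idleshift = (2 * (uint64_t)t_idle) / t_rxtcur;

    if (idleshift > 31)
        idleshift = 31;   /* addition: shifting 32 bits by >= 32 is undefined */
    cwnd >>= idleshift;
    if (cwnd < maxseg)
        cwnd = maxseg;
    return cwnd;
}
```

With a 65536-byte window and t_rxtcur of 12 ticks, 6 idle ticks halve the window, 12 quarter it, and anything under 6 leaves it untouched, which is the full-window burst case mentioned above.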

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Wed Aug 13 16:23:18 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA16386 for tcp-impl-list; Wed, 13 Aug 1997 16:15:55 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA16370 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 16:15:52 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA01254
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 16:15:51 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA13227>; Wed, 13 Aug 1997 16:15:50 -0700
Date: Wed, 13 Aug 1997 16:15:50 -0700
Posted-Date: Wed, 13 Aug 1997 16:15:50 -0700
Message-Id: <199708132315.QAA10889@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <QAA10889>; Wed, 13 Aug 1997 16:15:50 -0700
To: touch@ISI.EDU, curtis@ans.net
Subject: Re: does this qualify as a bug?
Cc: kcpoon@jurassic.eng.sun.com, tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From curtis@brookfield.ans.net Wed Aug 13 15:57:32 1997
...
> > Given that the intent is to avoid a line-rate burst, then clearly
> > the best mechanism is "monitor and limit the window allowed-to-send size".
> 
> If you were previously in near steady state and near congestion, one
> TCP flow going idle would reduce the standing queue at the bottleneck.

In steady state, if the TCP flow is idle for more than an RTT,
the other 'steady state' TCPs will gobble that queue hole by
probing.

> If the TCP flow that went idle was not a significant contributor to
> the bottleneck load, then the line rate burst would not be significant
> relative to other traffic.  If the TCP flow was a significant or among

If we knew that, we wouldn't need to avoid the line-rate bursts. 
In general, it's the agnosticism that causes the conservative design anyway.

> Going to slow start is a very drastic move for large window TCP.  It
> takes on the order of N RTTs to reach a window of 2^N (Jamshid argues
> that should be 1.5^N).  For a satellite link and a small transfer, that
> is a long time.

Sure - I'm not sure that this is the right solution.
There are separable issues:

	- detecting the failure of ACK clocking and thus
		the potential for line-rate bursting
	
		(which can occur with big initial windows,
		or when the transmission is idle for a period)

	- what to do about it

> Sending large line rate bursts is bad but not terrible.

Once RED is implemented, agreed. Until then, this is not the 
case. I can grab a chunk of the queue and starve out other
ack-clocking TCPs, which is clearly anti-social. At BEST,
I will consume BW and router resources up to the first
router at which I lose packets. Either way, it's a bad thing.

> (code included)
> 
> I think the code above would implement the suggestion I made on
> end2end-interest early today.  If you wanted to be even more

Our research group here at ISI is examining this. We will implement
a variety of algorithms (this included) and see what happens. 
If anyone has other suggested algorithms, please post them.

> burst.  While far from perfect, this would be an improvement over
> current behavior short of clocking out packets on a timer.

PS - as mentioned before, we have an implementation of this as well,
called 'rate-based pacing' that clocks out packets using rate estimates
until ACK clocking resumes. We'll compare complexity and computational 
overhead, as well as efficiency and 'niceness'.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Wed Aug 13 17:31:32 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA07648 for tcp-impl-list; Wed, 13 Aug 1997 17:28:20 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA07636 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 17:28:18 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA22586
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 17:28:16 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id UAA07770; Wed, 13 Aug 1997 20:27:59 -0400 (EDT)
Message-Id: <199708140027.UAA07770@brookfield.ans.net>
To: touch@ISI.EDU
cc: kcpoon@jurassic.eng.sun.com, curtis@ans.net, tcp-impl@cthulhu.engr.sgi.com
Reply-To: curtis@ans.net
Subject: Re: does this qualify as a bug? 
In-reply-to: Your message of "Wed, 13 Aug 1997 13:18:47 PDT."
             <199708132018.NAA10410@rum.isi.edu> 
Date: Wed, 13 Aug 1997 20:27:59 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199708132018.NAA10410@rum.isi.edu>, touch@ISI.EDU writes:
> 
> The request comes whenever the user clicks (we're talking
> about user-created gaps in the stream at this point).

Oh.  If the user can read a page in under 1 RTO that's a fast reader.
So let it slow start.

> We've seen traces with gaps in the 5-second range. The last ACK
> comes before the gap; the request arrives after the gap.
> In this case, TCP blasts into the network - which is just as
> bad as "no slow-start" - which is listed as a bug.

That would be a bug.  The BSD code fragments I've seen would lead me
to think that BSD doesn't do this unless RTO was above 5 seconds for
some reason.  By the time you have a large window you should also have
a reasonably accurate RTO.

> > I would like to suggest that we consider reducing cwnd by a factor of
> > 2 for every 1/2 RTO of idle time rather than go back to a window of 1,
> > where idle is measured from the last segment sent.  If cwnd drops
> > below ssthresh, it will quickly return to ssthresh.  
> 
> There are several suggestions for a solution. I prefer one that
> limits the "transmission rights" - preventing the window from
> sliding forward far enough to enable a burst of more than a few
> packets (4 or so).

I think that is overkill but if you can do this in a simple way and
still avoid the slow start delay, that's great.
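
For reference, the halving suggestion quoted above (cut cwnd by a
factor of 2 for every 1/2 RTO of idle time, measured from the last
segment sent) might look roughly like the sketch below.  The function
name, the tick units and the one-segment floor are assumptions made
for illustration, not code from any BSD release:

```c
#include <stdint.h>

/*
 * Hypothetical helper: halve cwnd once per half RTO of idle time,
 * never going below one segment.  idle_ticks and rto_ticks use the
 * same (arbitrary) tick unit.
 */
static uint32_t
decay_cwnd_on_idle(uint32_t cwnd, uint32_t mss, uint32_t idle_ticks,
    uint32_t rto_ticks)
{
	uint32_t half_rto = rto_ticks / 2;
	uint32_t halvings;

	if (half_rto == 0)
		return cwnd;	/* RTO too small to subdivide; leave cwnd alone */

	halvings = idle_ticks / half_rto;
	while (halvings-- > 0 && cwnd > mss)
		cwnd /= 2;

	return cwnd < mss ? mss : cwnd;
}
```

With a short idle period the window shrinks only slightly, and after a
long one it bottoms out at one segment, which is the same place a full
slow-start restart would begin.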

> However, without an implementation (which we don't have either,
> but we're working on), it's premature to pick a winner.

Thanks for considering the code I posted earlier.

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Wed Aug 13 17:40:39 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA10480 for tcp-impl-list; Wed, 13 Aug 1997 17:38:38 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA10462 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 17:38:35 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id RAA25044
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 13 Aug 1997 17:38:34 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA17033>; Wed, 13 Aug 1997 17:38:34 -0700
Date: Wed, 13 Aug 1997 17:38:33 -0700
Posted-Date: Wed, 13 Aug 1997 17:38:33 -0700
Message-Id: <199708140038.RAA10934@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <RAA10934>; Wed, 13 Aug 1997 17:38:33 -0700
To: touch@ISI.EDU, curtis@ans.net
Subject: Re: does this qualify as a bug?
Cc: kcpoon@jurassic.eng.sun.com, tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From curtis@brookfield.ans.net Wed Aug 13 17:28:10 1997
> To: touch@ISI.EDU
> Cc: kcpoon@jurassic.eng.sun.com, curtis@ans.net, tcp-impl@cthulhu.engr.sgi.com
> Subject: Re: does this qualify as a bug? 
> Date: Wed, 13 Aug 1997 20:27:59 -0400
> From: Curtis Villamizar <curtis@brookfield.ans.net>
> 
> 
> In message <199708132018.NAA10410@rum.isi.edu>, touch@ISI.EDU writes:
> > 
> > The request comes whenever the user clicks (we're talking
> > about user-created gaps in the stream at this point).
> 
> Oh.  If the user can read a page in under 1 RTO that's a fast reader.
> So let it slow start.
> 
> > We've seen traces with gaps in the 5-second range. The last ACK
> > comes before the gap; the request arrives after the gap.
> > In this case, TCP blasts into the network - which is just as
> > bad as "no slow-start" - which is listed as a bug.
> 
> That would be a bug.  The BSD code fragments I've seen would lead me
> to think that BSD doesn't do this unless RTO was above 5 seconds for
> some reason.  By the time you have a large window you should also have
> a reasonably accurate RTO.

FreeBSD just won't shut the window down, because it just received
data (and reset t_idle to 0). The condition is:

	idle = (tp->snd_max == tp->snd_una);

idle is true when all outstanding data is acknowledged.

	if (idle && tp->t_idle >= tp->t_rxtcur)
		tp->snd_cwnd = tp->t_maxseg;

At this point t_idle is 0, so this is never true.

So FreeBSD will burst to whatever amount of window is
possible, i.e., a full window if all outstanding sends
have been ACKd.
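
To make the difference concrete, here is a minimal sketch that
contrasts an idle test keyed on time since the last receive (the
behavior described above) with one keyed on time since the last send.
The struct and field names are hypothetical simplifications, only
loosely modeled on the fragment above, and the send-based test is one
candidate fix rather than an agreed one:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection state (not the BSD tcpcb). */
struct conn {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_max;	/* highest sequence number sent */
	uint32_t rto;		/* retransmit timeout, in ticks */
	uint32_t last_rcv;	/* time the last segment was received */
	uint32_t last_snd;	/* time the last segment was sent */
};

/* Restart test keyed on time since last receive. */
static bool
restart_by_rcv_idle(const struct conn *c, uint32_t now)
{
	bool idle = (c->snd_max == c->snd_una);

	return idle && (now - c->last_rcv) >= c->rto;
}

/* Restart test keyed on time since last send. */
static bool
restart_by_snd_idle(const struct conn *c, uint32_t now)
{
	bool idle = (c->snd_max == c->snd_una);

	return idle && (now - c->last_snd) >= c->rto;
}
```

In the trace scenario (last send at tick 0, request arriving at tick
50 with an RTO of 3 ticks), the receive-based test sees zero idle time
because the request itself just arrived, while the send-based test
sees 50 ticks of idle and would restart slow-start.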

If this is indeed a bug, I'll write it up for the I-D.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Thu Aug 14 08:13:53 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA03245 for tcp-impl-list; Thu, 14 Aug 1997 08:11:31 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA03234 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 14 Aug 1997 08:11:29 -0700
Received: from merit.edu (merit.edu [198.108.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA27682
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 14 Aug 1997 08:11:11 -0700
	env-from (wsimpson@greendragon.com)
Received: from Bill.Simpson.DialUp.Mich.Net (pm002-27.dialip.mich.net [141.211.7.163])
	by merit.edu (8.8.6/8.8.5) with SMTP id LAA04688
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 14 Aug 1997 11:11:09 -0400 (EDT)
Date: Thu, 14 Aug 97 13:03:22 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <6436.wsimpson@greendragon.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: does this qualify as a bug?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> If this is indeed a bug, I'll write it up for the I-D.
>
Having read the previous postings, I agree this is a bug, please write
it up!

But someone also needs to write up the fix in RFC2001bis.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug 15 15:42:49 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA16394 for tcp-impl-list; Fri, 15 Aug 1997 15:36:39 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA16306 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 15 Aug 1997 15:36:17 -0700
Received: from ni1.ni.net (ni1.ni.net [192.215.247.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA11552
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 15 Aug 1997 15:36:16 -0700
	env-from (wti2@ni.net)
Received: from LOCALNAME (wti2.ni.net [199.107.84.12])
	by ni1.ni.net (8.8.5/8.8.5) with SMTP id PAA22209
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 15 Aug 1997 15:35:58 -0700 (PDT)
Date: Fri, 15 Aug 1997 15:35:58 -0700 (PDT)
Message-Id: <199708152235.PAA22209@ni1.ni.net>
X-Sender: wti2@ni.net
X-Mailer: Windows Eudora Pro Version 2.1.2
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: tcp-impl@cthulhu.engr.sgi.com
From: Engineering <wti2@ni.net>
Subject: HELP
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

HELP


From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 19 14:56:13 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA11540 for tcp-impl-list; Tue, 19 Aug 1997 14:47:15 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA11490 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 14:46:56 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA07503
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 14:46:55 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id OAA15220; Tue, 19 Aug 1997 14:46:53 -0700 (PDT)
Message-Id: <199708192146.OAA15220@daffy.ee.lbl.gov>
To: touch@ISI.EDU
Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
Subject: Re: does this qualify as a bug?
In-reply-to: Your message of Wed, 13 Aug 1997 17:38:33 PDT.
Date: Tue, 19 Aug 1997 14:46:53 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I talked with Van about slow-start after idle issues.  His basic argument is:
any time you've lost the ack clock, you need to recover it, and the mechanism
for doing so is a slow-start.  So after 1 RTO (at which point you don't expect
any more feedback from the receiver), you need to begin a new slow-start, just
as though you were beginning a new connection.

There may be room in the future for introducing pacing techniques that
supplement the ack clock.  But this is TCP research, and out of scope
for tcp-impl.

So my leaning is that for the RFC 2001 revision, we add a description of
slow-start after 1 RTO idle, with a window the same as the initial connection
window.  The behavior you described is indeed a bug, probably one that's
very easy to make, and should be written up as such.  Clearly, we will need
to be careful in our description of what exactly "1 RTO idle" means.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 19 15:15:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA24130 for tcp-impl-list; Tue, 19 Aug 1997 15:10:27 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA24115 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 15:10:23 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id PAA18168
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 15:10:23 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA18957>; Tue, 19 Aug 1997 15:10:19 -0700
Date: Tue, 19 Aug 1997 15:10:19 -0700
Posted-Date: Tue, 19 Aug 1997 15:10:19 -0700
Message-Id: <199708192210.PAA02816@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <PAA02816>; Tue, 19 Aug 1997 15:10:19 -0700
To: touch@ISI.EDU, vern@ee.lbl.gov
Subject: Re: does this qualify as a bug?
Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


----- Begin Included Message -----

> To: touch@ISI.EDU
> Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
> Subject: Re: does this qualify as a bug?
> Date: Tue, 19 Aug 1997 14:46:53 PDT
> From: Vern Paxson <vern@ee.lbl.gov>
> 
> I talked with Van about slow-start after idle issues.  His basic argument is:
> any time you've lost the ack clock, you need to recover it, and the mechanism
> for doing so is a slow-start.  So after 1 RTO (at which point you don't expect
> any more feedback from the receiver), you need to begin a new slow-start, just
> as though you were beginning a new connection.

> So my leaning is that for the RFC 2001 revision, we add a description of
> slow-start after 1 RTO idle, with a window the same as the initial connection
> window.  The behavior you described is indeed a bug, probably one that's
> very easy to make, and should be written up as such.  Clearly, we will need
> to be careful in our description of what exactly "1 RTO idle" means.
> 
> 		Vern

Agreed - there are separable issues:

	- how to know when the clock is lost

	- what to do about it when you have new data to send

The bug is best aimed at the former; the latter is research (as you said).

However, 1 RTO idle may not be the best description of 'losing the ack
clock'.  The clock is lost any time the window front moves forward
without data to send.

For now, we can probably use the conservative 'if you've lost the
clock for 1 RTO, then you _certainly_ have lost it'.

A better estimate would take into account the 'permission to send'.
I would consider that a better fix to the bug, rather than 'research'
per se.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 19 15:41:25 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA06086 for tcp-impl-list; Tue, 19 Aug 1997 15:35:26 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA05952 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 15:35:06 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA00779
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 15:35:05 -0700
	env-from (craig@aland.bbn.com)
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id PAA05803; Tue, 19 Aug 1997 15:34:22 -0700 (PDT)
Message-Id: <199708192234.PAA05803@aland.bbn.com>
To: Vern Paxson <vern@ee.lbl.gov>
cc: touch@ISI.EDU, tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
Subject: Re: does this qualify as a bug? 
In-reply-to: Your message of Tue, 19 Aug 97 14:46:53 -0700.
             <199708192146.OAA15220@daffy.ee.lbl.gov> 
Date: Tue, 19 Aug 97 15:34:22 -0700
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    So my leaning is that for the RFC 2001 revision, we add a description of
    slow-start after 1 RTO idle, with a window the same as the initial connection
    window.

Vern:
    
    I thought at the meeting in Munich we'd agreed that the restart window
should be:

    MIN(IW,CW)

where IW is the initial congestion window and CW is the most recent
congestion window.  (The idea being that if you turn out to be a domain
where the last known congestion window was less than your initial window,
you ought to stay small).
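
A minimal sketch of that rule, assuming the "1 RTO idle" test
discussed earlier in the thread (function names and tick units are
hypothetical):

```c
#include <stdint.h>

/* Restart window: the smaller of the initial window and the most
 * recent congestion window, i.e. MIN(IW,CW). */
static uint32_t
restart_window(uint32_t iw, uint32_t cw)
{
	return iw < cw ? iw : cw;
}

/* Hypothetical helper: congestion window to use when new data is
 * about to be sent after idle_ticks of idle time. */
static uint32_t
cwnd_after_idle(uint32_t cwnd, uint32_t iw, uint32_t idle_ticks,
    uint32_t rto_ticks)
{
	if (idle_ticks >= rto_ticks)
		return restart_window(iw, cwnd);
	return cwnd;	/* not idle long enough; keep the current window */
}
```

Note that with IW equal to one segment the MIN only matters if the
last congestion window was itself below one segment.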

Craig

From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 19 16:02:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA15391 for tcp-impl-list; Tue, 19 Aug 1997 15:56:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA15364 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 15:56:09 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id PAA11027
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 15:56:08 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA20744>; Tue, 19 Aug 1997 15:56:07 -0700
Date: Tue, 19 Aug 1997 15:56:06 -0700
Posted-Date: Tue, 19 Aug 1997 15:56:06 -0700
Message-Id: <199708192256.PAA03056@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <PAA03056>; Tue, 19 Aug 1997 15:56:06 -0700
To: vern@ee.lbl.gov, craig@aland.bbn.com
Subject: Re: does this qualify as a bug?
Cc: touch@ISI.EDU, tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From craig@aland.bbn.com Tue Aug 19 15:35:42 1997
> To: Vern Paxson <vern@ee.lbl.gov>
> Cc: touch@ISI.EDU, tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
> Subject: Re: does this qualify as a bug? 
> Date: Tue, 19 Aug 97 15:34:22 -0700
> From: Craig Partridge <craig@aland.bbn.com>
> 
> 
>     So my leaning is that for the RFC 2001 revision, we add a description of
>     slow-start after 1 RTO idle, with a window the same as the initial connection
>     window.
> 
> Vern:
>     
>     I thought at the meeting in Munich we'd agreed that the restart window
> should be:
> 
>     MIN(IW,CW)
> 
> where IW is the initial congestion window and CW is the most recent
> congestion window.  (The idea being that if you turn out to be a domain
> where the last known congestion window was less than your initial window,
> you ought to stay small).

In the current design, IW = 1 which makes that function degenerate.

Can we postpone this new function as 'research to be determined later'??
(when IW > 1 has been accepted?)

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 19 17:08:47 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA08123 for tcp-impl-list; Tue, 19 Aug 1997 17:06:30 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA08107 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 17:06:27 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA16096
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 17:06:26 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id RAA15419; Tue, 19 Aug 1997 17:06:17 -0700 (PDT)
Message-Id: <199708200006.RAA15419@daffy.ee.lbl.gov>
To: Craig Partridge <craig@aland.bbn.com>
Cc: touch@ISI.EDU, tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
Subject: Re: does this qualify as a bug? 
In-reply-to: Your message of Tue, 19 Aug 1997 15:34:22 PDT.
Date: Tue, 19 Aug 1997 17:06:17 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>     I thought at the meeting in Munich we'd agreed that the restart window
> should be:
> 
>     MIN(IW,CW)

Oops, yes, that's the right way to phrase it.

Joe writes:

> Can we postpone this new function as 'research to be determined later'??
> (when IW > 1 has been accepted?)

Part of the 2001 update is to put in hooks for IW > 1 (we will initially
use IW=2, and edit the doc as needed down the line).  So it's appropriate
to adopt Craig's correction above.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 19 17:08:46 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA07396 for tcp-impl-list; Tue, 19 Aug 1997 17:04:13 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA07388 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 17:04:11 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA15325
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 17:04:10 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id RAA15393; Tue, 19 Aug 1997 17:04:08 -0700 (PDT)
Message-Id: <199708200004.RAA15393@daffy.ee.lbl.gov>
To: touch@ISI.EDU
Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
Subject: Re: does this qualify as a bug?
In-reply-to: Your message of Tue, 19 Aug 1997 15:10:19 PDT.
Date: Tue, 19 Aug 1997 17:04:08 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> A better estimate would take into account the 'permission to send'.
> I would consider that a better fix to the bug, rather than 'research'
> per se.

If I followed the thread right, the "permission to send" idea you outlined
includes potential per-ack window fiddling, for acks that advance by more
than a (so far unspecified) threshold.  This sounds like a potentially
complicated fix that will need research to flesh it out.  Is that right,
or are you referring above to a different fix?

I agree that the fundamental problem is recovering the ack clock whenever
it's lost, which can occur due to reasons other than running idle.  But I'm
reluctant to try to address the general problem in the 2001 update, because
I think doing so will be significantly harder than just firming up
slow-start-on-idle.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 19 17:44:15 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA17211 for tcp-impl-list; Tue, 19 Aug 1997 17:39:39 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA17157 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 17:39:27 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id RAA02282
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 17:39:26 -0700
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA25038>; Tue, 19 Aug 1997 17:39:25 -0700
Date: Tue, 19 Aug 1997 17:39:24 -0700
Posted-Date: Tue, 19 Aug 1997 17:39:24 -0700
Message-Id: <199708200039.RAA03302@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-6)
	id <RAA03302>; Tue, 19 Aug 1997 17:39:24 -0700
To: touch@ISI.EDU, vern@ee.lbl.gov
Subject: Re: does this qualify as a bug?
Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
X-Sun-Charset: US-ASCII
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > A better estimate would take into account the 'permission to send'.
> > I would consider that a better fix to the bug, rather than 'research'
> > per se.
> 
> If I followed the thread right, the "permission to send" idea you outlined
> includes potential per-ack window fiddling, for acks that advance by more
> than a (so far unspecified) threshold.  This sounds like a potentially
> complicated fix that will need research to flesh it out.  Is that right,
> or are you referring above to a different fix?

The check of 'when did ack clocking fail' is based on some
trigger - either due to per-packet processing
(just before send, during receive/ACK processing, etc), or
due to a timer.

The check that is done at that trigger can be based on
	idle time since last send
	idle time since last receive
	current window state

It seems that all solutions are points in that overall design space.
In that case, there may be room to pick an 'appropriate' point,
rather than a point which is an artifact (idle recv > RTO, currently).
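
As one illustration of the 'current window state' point in that
space, the sketch below flags a suspect ack clock whenever the sender
could emit more than a few segments back-to-back.  This is only one
reading of the idea; the names and the burst threshold are assumptions,
not a worked-out proposal:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: bytes that could be sent back-to-back right now. */
static uint32_t
sendable_burst(uint32_t snd_una, uint32_t snd_max, uint32_t cwnd,
    uint32_t rwnd)
{
	uint32_t wnd = cwnd < rwnd ? cwnd : rwnd;
	uint32_t in_flight = snd_max - snd_una;	/* wraps correctly */

	return in_flight >= wnd ? 0 : wnd - in_flight;
}

/* Hypothetical: treat the ack clock as suspect if the permitted
 * burst exceeds max_burst_segs full-sized segments. */
static bool
ack_clock_suspect(uint32_t burst_bytes, uint32_t mss,
    uint32_t max_burst_segs)
{
	return burst_bytes > (uint64_t)mss * max_burst_segs;
}
```

With everything acknowledged and a 16-segment window, the permitted
burst is the whole window and the check fires; with 15 segments still
in flight only one more segment can go out and it does not.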

> I agree that the fundamental problem is recovering the ack clock whenever
> it's lost, which can occur due to reasons other than running idle.  But I'm
> reluctant to try to address the general problem in the 2001 update, because
> I think doing so will be significantly harder than just firming up
> slow-start-on-idle.

Recovering vs. detecting.

> > Can we postpone this new function as 'research to be determined later'??
> > (when IW > 1 has been accepted?)
> 
> Part of the 2001 update is to put in hooks for IW > 1 (we will initially
> use IW=2, and edit the doc as needed down the line).  So it's appropriate
> to adopt Craig's correction above.

That being the case, I don't feel that detecting is
any more or less research than IW > 1.


Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 19 21:06:00 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA28465 for tcp-impl-list; Tue, 19 Aug 1997 21:02:38 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA28457 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 21:02:36 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id VAA23500
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 21:02:34 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.6/1.43r)
	id VAA15812; Tue, 19 Aug 1997 21:02:32 -0700 (PDT)
Message-Id: <199708200402.VAA15812@daffy.ee.lbl.gov>
To: touch@ISI.EDU
Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
Subject: Re: does this qualify as a bug?
In-reply-to: Your message of Tue, 19 Aug 1997 17:39:24 PDT.
Date: Tue, 19 Aug 1997 21:02:32 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The check that is done at that trigger can be based on
> 	idle time since last send
> 	idle time since last receive
> 	current window state

I don't agree that the line between implementation and research is the same
as that between detecting loss of the ack clock vs. what to do about it.
I agree that, in principle, detecting loss of the clock by inspecting the
window state is on a par with doing so based on idle time, and perhaps better.
But I think in practice there's a significant difference.  If "current window
state" is based on how much data has been acked by the most recent ack (which
is what I understand you mean by it), then (1) connection performance along
the forward path will now be degraded by conditions along the reverse path
(due to lost acks), which is not presently the case; and (2) people interested
in playing tricks with acks (e.g., for bandwidth-asymmetric paths) will lose
big if those tricks are considered by the endpoint to reflect loss of the ack
clock.  So I think the window-state approach will be significantly more
contentious than the idle-time approach.  Since we're trying to go for
minimal changes and minimal contention, our present course should be to
codify/correct the existing idle-time scheme, and not try at this point to
standardize on a different approach.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Aug 19 22:55:30 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA12753 for tcp-impl-list; Tue, 19 Aug 1997 22:52:24 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA12745 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 22:52:22 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id WAA10070
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 19 Aug 1997 22:52:21 -0700
	env-from (touch@ISI.EDU)
Received: by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA02333>; Tue, 19 Aug 1997 22:52:20 -0700
Date: Tue, 19 Aug 1997 22:52:20 -0700
From: touch@ISI.EDU (Joe Touch)
Message-Id: <199708200552.AA02333@zephyr.isi.edu>
To: vern@ee.lbl.gov
Subject: Re: does this qualify as a bug?
Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
X-Auto-Sig-Adder-By: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Vern Paxson <vern@ee.lbl.gov>
> 
> > The check that is done at that trigger can be based on
> > 	idle time since last send
> > 	idle time since last receive
> > 	current window state
...
> I agree that, in principle, detecting loss of the clock by inspecting the
> window state is on a par with doing so based on idle time, and perhaps better.
> But I think in practice there's a significant difference.  If "current window
> state" is based on how much data has been acked by the most recent ack (which
> is what I understand you mean by it), then (1) connection performance along
> the forward path will now be degraded by conditions along the reverse path
> (due to lost acks), which is not presently the case; and (2) people interested

"not presently the case" argues against any bug fix.
Maybe the forward path should be degraded by conditions along the
reverse. Maybe it already is - losing ACKs can cause burstiness 
at the source that can overrun buffers on the forward path and
cause loss, even when a true paced stream would not have.

> in playing tricks with acks (e.g., for bandwidth-asymmetric paths) will lose
> big if those tricks are considered by the endpoint to reflect loss of the ack
> clock.  So I think the window-state approach will be significantly more

"tricks" that lose ack clocks cause source bursts, and maybe they
should be 'punished'. That doesn't seem inconsistent either.

> contentious than the idle-time approach.  Since we're trying to go for
> minimal changes and minimal contention, our present course should be to
> codify/correct the existing idle-time scheme, and not try at this point to
> standardize on a different approach.

My point is that either approach is 'different' than what is currently
done, and that using RTO-since-send may not be any more correct than
the current RTO-since-receive in indicating the true problem - loss
of ACK clock.

The correct fix is to precisely define 'loss of ack clock'.
There is certainly leeway in the term 'loss' - lose it a little
(be off by 50% when you lose one 'ACK') or lose it a lot (be off
by as much as you can ever be, when all data is ACKd and you have
sent nothing, i.e., 1 RTO).

I'll certainly endeavor to present that continuum, and be conservative
in the overhead of computation for the 'bug fix' version - would
that be a reasonable path?

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Wed Aug 20 10:40:48 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA04396 for tcp-impl-list; Wed, 20 Aug 1997 10:37:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA04362 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 20 Aug 1997 10:37:03 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA11542
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 20 Aug 1997 10:37:00 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id KAA17189; Wed, 20 Aug 1997 10:36:27 -0700 (PDT)
Message-Id: <199708201736.KAA17189@daffy.ee.lbl.gov>
To: touch@ISI.EDU (Joe Touch)
Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
Subject: Re: does this qualify as a bug?
In-reply-to: Your message of Tue, 19 Aug 1997 22:52:20 PDT.
Date: Wed, 20 Aug 1997 10:36:27 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> "not presently the case" argues against any bug fix.

No - "not presently the case, and significant, and we're not chartered
to make changes other than minor refinements" argues against the bug fix.

> Maybe the forward path should be degraded by conditions along the
> reverse. Maybe it already is ...

Yes, maybe - research question, how the window-state approach alters TCP
dynamics and throughput from its current behavior.

> "tricks" that lose ack clocks cause source bursts, and maybe they
> should be 'punished'. That doesn't seem inconsistent either.

Maybe they should - my point was that standardizing on the window-state
approach will be more contentious than the idle-time approach.

> My point is that either approach is 'different' than what is currently
> done

I don't see how the idle-time approach is significantly different from
what is (supposed) to be already done.  Certainly not to the same degree
as switching to a window-state approach.

> The correct fix is to precisely define 'loss of ack clock'.

This is the correct fix for TCP-ng; and maybe also for a subsequent TCP
revision.  But not for tcp-impl, in which our job is to stick as close
to current practice/standards as feasible.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Aug 21 15:05:32 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA04759 for tcp-impl-list; Thu, 21 Aug 1997 14:58:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA04736 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 21 Aug 1997 14:58:09 -0700
Received: from janus.3com.com (janus.3com.com [129.213.128.99]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA16966
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 21 Aug 1997 14:58:08 -0700
	env-from (dougw@3com.com)
Received: from new-york.3com.com (new-york.3com.com [129.213.157.12])
	by janus.3com.com (8.8.2/8.8.5) with ESMTP id OAA09193
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 21 Aug 1997 14:58:02 -0700 (PDT)
Received: from chicago.nsd.3com.com (chicago.nsd.3com.com [129.213.157.11])
	by new-york.3com.com (8.8.2/8.8.5) with ESMTP id OAA28066
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 21 Aug 1997 14:50:31 -0700 (PDT)
Received: from morrito.nsd.3com.com (morrito.nsd.3com.com [129.213.16.8])
	by chicago.nsd.3com.com (8.8.2/8.8.5) with ESMTP id OAA17941;
	Thu, 21 Aug 1997 14:56:05 -0700 (PDT)
From: Douglas Wolff <dougw@3com.com>
Received: (from dougw@localhost)
	by morrito.nsd.3com.com (8.8.2/8.8.5) id OAA00816;
	Thu, 21 Aug 1997 14:58:00 -0700 (PDT)
Date: Thu, 21 Aug 1997 14:58:00 -0700 (PDT)
Message-Id: <199708212158.OAA00816@morrito.nsd.3com.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: does this qualify as a bug?
Cc: dougw@ewd.3Com.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-MD5: PgLF5hRxqZHOap1LYiXy2w==
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I talked with Van about slow-start after idle issues.  His basic argument is:
> any time you've lost the ack clock, you need to recover it, and the mechanism
> for doing so is a slow-start.  So after 1 RTO (at which point you don't expect
> any more feedback from the receiver), you need to begin a new slow-start, just
> as though you were beginning a new connection.

Does slow-start after idle adjust the RTO at all (say, start timer backoff)? Since
the connection has been idle, any round trip time estimate is likely to be stale.
I've seen a large percentage of TCP retries on private networks with TCP
applications that each send short, infrequent exchanges of data. I have a hunch part
of the problem is this stale RTT estimate.

Doug Wolff
3Com Corporation

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug 22 22:58:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA23724 for tcp-impl-list; Fri, 22 Aug 1997 22:57:10 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA23711 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 22 Aug 1997 22:57:04 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id WAA03946
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 22 Aug 1997 22:57:03 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id WAA22972; Fri, 22 Aug 1997 22:57:00 -0700 (PDT)
Message-Id: <199708230557.WAA22972@daffy.ee.lbl.gov>
To: Douglas Wolff <dougw@3com.com>
Cc: tcp-impl@cthulhu.engr.sgi.com, van@ee.lbl.gov
Subject: Re: does this qualify as a bug?
In-reply-to: Your message of Thu, 21 Aug 1997 14:58:00 PDT.
Date: Fri, 22 Aug 1997 22:56:59 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Does slow-start after idle adjust the RTO at all (say, start timer backoff)?

Good question.  My leaning is no - I'd expect that RTO remains a decent
estimate for quite a while, and it would be nice to avoid the possible
performance hit that would come from resetting it to the startup value.
I discussed this with Van and this is his thinking on it too.
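
A minimal sketch of the behavior described above: after an idle period of
at least one RTO the congestion window collapses to a restart value, while
the RTO estimate itself is left alone.  Measuring idle time from the last
send, the one-segment restart window, and the names SenderState and
on_send_attempt are all assumptions made for illustration, not taken from
any particular stack.

```python
from dataclasses import dataclass

RESTART_WINDOW = 1  # segments; assumed restart value, as for a new connection


@dataclass
class SenderState:
    cwnd: int         # congestion window, in segments
    rto: float        # current retransmission timeout, in seconds
    last_send: float  # time of the most recent transmission, in seconds


def on_send_attempt(state: SenderState, now: float) -> SenderState:
    """Apply slow-start after idle just before sending new data.

    If the connection has been idle for at least one RTO, the ack clock
    is assumed lost and cwnd is reset.  The RTO is deliberately not
    touched, so a previously measured estimate keeps being used.
    """
    idle = now - state.last_send
    if idle >= state.rto:
        state.cwnd = RESTART_WINDOW
    state.last_send = now
    return state
```

With cwnd=8 and rto=1.5, a send 0.5 seconds after the previous one leaves
cwnd unchanged; a send 2.5 seconds after the previous one resets cwnd to 1
and keeps rto at 1.5.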

> I've seen a large percentage of TCP retries on private networks with a TCP 
> application that each send short, infrequent exchanges of data. I have a
> hunch part of the problem is this stale RTT estimate.

You see them needlessly retransmitting?  If so, some traces would resolve
whether they're doing so because the RTO was too stale, or whether for some
other reason.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug 29 09:14:33 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA25508 for tcp-impl-list; Fri, 29 Aug 1997 09:12:18 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA25486 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 29 Aug 1997 09:12:16 -0700
Received: from janus.3com.com (janus.3com.com [129.213.128.99]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA07134
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 29 Aug 1997 09:12:11 -0700
	env-from (dougw@3com.com)
Received: from new-york.3com.com (new-york.3com.com [129.213.157.12])
	by janus.3com.com (8.8.2/8.8.5) with ESMTP id JAA16652
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 29 Aug 1997 09:12:07 -0700 (PDT)
Received: from chicago.nsd.3com.com (chicago.nsd.3com.com [129.213.157.11])
	by new-york.3com.com (8.8.2/8.8.5) with ESMTP id JAA23746
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 29 Aug 1997 09:03:21 -0700 (PDT)
Received: from morrito.nsd.3com.com (morrito.nsd.3com.com [129.213.16.8])
	by chicago.nsd.3com.com (8.8.2/8.8.5) with ESMTP id JAA23385
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 29 Aug 1997 09:10:11 -0700 (PDT)
From: Douglas Wolff <dougw@3com.com>
Received: (from dougw@localhost)
	by morrito.nsd.3com.com (8.8.2/8.8.5) id JAA11070
	for tcp-impl@cthulhu.engr.sgi.com; Fri, 29 Aug 1997 09:12:03 -0700 (PDT)
Date: Fri, 29 Aug 1997 09:12:03 -0700 (PDT)
Message-Id: <199708291612.JAA11070@morrito.nsd.3com.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Source quench
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-MD5: BFmd0LlZ8FJAxQlqd/Z5kA==
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi -
I've been looking over an old TCP stack wrt RFC2001, etc. Does anyone have any 
comments on how to treat source quench ICMP messages?  Should they trigger 
congestion avoidance, or is current thinking to ignore them and wait for packet 
drop?

Thanx,
Douglas Wolff
-------------
3Com Corporation
dougw@3com.com
voice (408)764-8186
Fax   (408)764-5002


From owner-tcp-impl@relay.engr.sgi.com  Fri Aug 29 13:49:53 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA27560 for tcp-impl-list; Fri, 29 Aug 1997 13:47:16 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA27534 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 29 Aug 1997 13:47:12 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA10770
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 29 Aug 1997 13:47:03 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id VAA08189; Fri, 29 Aug 1997 21:45:25 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0x4Z9e-0005FjC; Fri, 29 Aug 97 23:04 BST
Message-Id: <m0x4Z9e-0005FjC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Source quench
To: dougw@3com.com (Douglas Wolff)
Date: Fri, 29 Aug 1997 23:04:33 +0100 (BST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199708291612.JAA11070@morrito.nsd.3com.com> from "Douglas Wolff" at Aug 29, 97 09:12:03 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I've been looking over an old TCP stack wrt RFC2001, etc. Does anyone have any 
> comments on how to treat source quench ICMP messages?  Should they trigger 
> congestion avoidance, or is current thinking to ignore them and wait for packet 
> drop?

It seems to work well enough either way. The biggest problem I've had
playing with it has ironically been 3com kit and some older routers that
don't rate limit their source quenches. If you treat each one as a drop
and then you count the drops too (which is what I was doing) you end up
off the far end of the round trip timer very fast.

The other issue is that a lot of people firewall icmp to web servers and
the like, so you have to trust the drop anyway
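
One way to avoid the double counting described above is to react to at
most one congestion signal per interval, whether that signal is a source
quench or a detected drop.  The following is only a sketch under that
assumption: the class name, the halving policy, and the choice of interval
(roughly one round trip) are hypothetical and do not describe what any
shipping stack does.

```python
class CongestionReactor:
    """Reduce cwnd at most once per min_interval seconds."""

    def __init__(self, cwnd, min_interval):
        self.cwnd = cwnd                  # congestion window, in segments
        self.min_interval = min_interval  # e.g. roughly one RTT, in seconds
        self.last_reaction = None         # time of the last reduction

    def on_signal(self, now):
        """Handle a source quench or a detected drop.

        Returns True if cwnd was reduced, False if the signal was
        suppressed because a reduction already happened recently.
        """
        if (self.last_reaction is not None
                and now - self.last_reaction < self.min_interval):
            return False
        self.cwnd = max(1, self.cwnd // 2)
        self.last_reaction = now
        return True
```

A burst of unthrottled source quenches plus the drop itself then costs a
single window reduction rather than one reduction per message.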

Alan


From owner-tcp-impl@relay.engr.sgi.com  Fri Aug 29 14:33:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA09469 for tcp-impl-list; Fri, 29 Aug 1997 14:25:23 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA09410 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 29 Aug 1997 14:25:20 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA21317
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 29 Aug 1997 14:25:19 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id RAA00872; Fri, 29 Aug 1997 17:24:00 -0400 (EDT)
Message-Id: <199708292124.RAA00872@brookfield.ans.net>
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: dougw@3com.com (Douglas Wolff), tcp-impl@cthulhu.engr.sgi.com
Reply-To: curtis@ans.net
Subject: Re: Source quench 
In-reply-to: Your message of "Fri, 29 Aug 1997 23:04:33 BST."
             <m0x4Z9e-0005FjC@lightning.swansea.linux.org.uk> 
Date: Fri, 29 Aug 1997 17:24:00 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <m0x4Z9e-0005FjC@lightning.swansea.linux.org.uk>, Alan Cox writes:
> > I've been looking over an old TCP stack wrt RFC2001, etc. Does anyone have any
> > comments on how to treat source quench ICMP messages?  Should they trigger
> > congestion avoidance, or is current thinking to ignore them and wait for packet
> > drop?
> 
> It seems to work well enough either way. The biggest problem I've had
> playing with it has ironically been 3com kit and some older routers that
> don't rate limit their source quenches. If you treat each one as a drop
> and then you count the drops too (which is what I was doing) you end up
> off the far end of the round trip timer very fast.
> 
> The other issue is that a lot of people firewall icmp to web servers and
> the like, so you have to trust the drop anyway
> 
> Alan
> 


RFC1812 [Page 57]

  4.3.3.3 Source Quench

   A router SHOULD NOT originate ICMP Source Quench messages.  As
   specified in Section [4.3.2], a router that does originate Source
   Quench messages MUST be able to limit the rate at which they are
   generated.

   DISCUSSION
      Research seems to suggest that Source Quench consumes network
      bandwidth but is an ineffective (and unfair) antidote to
      congestion.  See, for example, [INTERNET:9] and [INTERNET:10].
      Section [5.3.6] discusses the current thinking on how routers
      ought to deal with overload and network congestion.

   A router MAY ignore any ICMP Source Quench messages it receives.

There has been discussion about the long time it's been since RFC1122
has been updated, but when it does get updated it will almost certainly
say that "A host MAY ignore any ICMP Source Quench messages it
receives".

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Fri Aug 29 21:36:12 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA07869 for tcp-impl-list; Fri, 29 Aug 1997 21:34:27 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA07862 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 29 Aug 1997 21:34:24 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id VAA08703
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 29 Aug 1997 21:34:23 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id VAA04602; Fri, 29 Aug 1997 21:34:20 -0700 (PDT)
Message-Id: <199708300434.VAA04602@daffy.ee.lbl.gov>
To: curtis@ans.net
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Source quench 
In-reply-to: Your message of Fri, 29 Aug 1997 17:24:00 PDT.
Date: Fri, 29 Aug 1997 21:34:20 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> There has been discussion about the long time its been since RFC1122
> has been updated but when it does get updated it will almost certainly
> say that "A host MAY ignore any ICMP Source Quench messages it
> receives".

I agree - so my leaning is that we don't document any particular source
quench behavior as reflecting an implementation problem.

Since we're chartered to provide input on tweaks for clarifying RFC 1122 et al,
I've added this one to the list (pending further email discussion, of course).

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Sep  4 23:04:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA06674 for tcp-impl-list; Thu, 4 Sep 1997 23:01:16 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA06606 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 4 Sep 1997 23:00:55 -0700
Received: from mist.corpwest.baynetworks.com (screen2r.BayNetworks.COM [134.177.3.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA00290
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 4 Sep 1997 23:00:52 -0700
	env-from (knichols@baynetworks.com)
Received: by mist.corpwest.baynetworks.com (8.8.6/8.8.5)
	id WAA09449; Thu, 4 Sep 1997 22:59:03 -0700 (PDT)
Message-Id: <199709050559.WAA09449@mist.corpwest.baynetworks.com>
X-Mailer: exmh version 2.0delta 6/3/97
To: tcp-impl@cthulhu.engr.sgi.com, end2end-interest@isi.edu
cc: pwarren@gte.com
Subject: simulation results for increased tcp intial window
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 04 Sep 1997 22:59:02 PDT
From: Kathleen Nichols <knichols@baynetworks.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


These are the results of simulations exploring the conditions under which
a larger initial window size (IW) for TCP is a "win" and to determine
what effects, if any, the larger IW might have on other traffic flows
using an IW of 1. This set of simulations was inspired by discussions
at the Munich IETF tcp-impl and tcp-sat meetings. It appeared that some
of the questions being raised could be addressed fairly easily in an
ns-2 simulation. It turned out that the simulation model was easy to
construct, but debugging ns-2's tcp-full implementation took a lot more time.

For ns-2 users: some modifications were made to the base tcp class,
mainly fixes to the timers (tcp base class modifications by Van,
tcp-full modifications made by both Van and myself). The tcp-full code
was modified to use an "application" class and three application
client-server pairs were written: a simple file transfer (ftp), a model
of http1.0 style web connection and a very rough model of http1.1
style web connection. I'll see about making these modified files available
through the "contributed code" link from the ns-2 web page. (so don't bother
me in the short term unless you're a Close Personal Friend.)

The simulated network topology:

     10Mb,100us                             10Mb,100us
     (all 4 links)                          (all 4 links)

C   n2_________                               ______ n6   S
l   n3_________\                             /______ n7	  e
i              \\         1.5Mb,50ms        //		  r
e               n1 ------------------------ n0		  v
n   n4__________//                          \ \_____ n8   e
t   n5__________/                            \______ n9   r
s							  s
              URLs -->		<--- FTP & Web data

Each left hand side node (n2-n5) has four web clients attached to
it, each of which is served by a different web server attached to
one of the nodes on the right hand side (n6-n9). The links to and from those
nodes are at 10 Mbps. The bottleneck link is between n1 and n0. Depending
on the simulation scenario, one or two ftp clients can also be
attached to the left hand side nodes and ftp servers can be attached
to the right hand side nodes. All links are bi-directional, but
only acks, syns, fins, and URLs are flowing from left to right.

Assumptions made in the simulations were that all ftps transferred 1 MB
files and that all web pages had exactly three embedded urls. The web clients
are browsing quite aggressively, requesting a new page after a delay
uniformly randomly distributed between 1 and 5 seconds. This is not meant to
realistically model a single user's web-browsing pattern, but to create a
reasonably heavy traffic load whose individual tcp connections
accurately reflect real web traffic.

The maximum tcp window was set to 11 packets, maximum packet size to
1460 bytes, and buffer sizes were set at 22 packets.
(The ns-2 tcp's require setting window sizes and buffer sizes in number of
packets. In tcp-full some of the internal parameters have been set to be
byte-oriented, but external values must still be set in number of packets.)
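
For scale, the parameters above can be compared with the bandwidth-delay
product of the simulated path.  This is a rough sketch that assumes
"1.5Mb,50ms" means 1.5 Mbit/s with a 50 ms one-way propagation delay,
counts one 100 us access link on each side, and ignores transmission time,
queueing, and header overhead.  The function name is made up for the
example.

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product in bytes for a given rate and round-trip time."""
    return rate_bps * rtt_s / 8


PACKET_BYTES = 1460        # maximum packet size from the text
MAX_WINDOW_PACKETS = 11    # maximum tcp window from the text

# One-way delay: access link + bottleneck + access link (assumed path).
one_way_s = 100e-6 + 50e-3 + 100e-6
rtt_s = 2 * one_way_s

bdp = bdp_bytes(1.5e6, rtt_s)
bdp_packets = bdp / PACKET_BYTES
max_window_bytes = MAX_WINDOW_PACKETS * PACKET_BYTES

print(f"RTT (propagation only): {rtt_s * 1000:.1f} ms")
print(f"BDP: {bdp:.0f} bytes, about {bdp_packets:.1f} packets")
print(f"Maximum window: {max_window_bytes} bytes")
```

Under these assumptions the bandwidth-delay product is a little under 13
packets, so the 11-packet maximum window is slightly smaller than the pipe
and the 22-packet buffer is larger than it.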

The first set of simulation runs was done with 16 web clients and
a number of ftp clients ranging from zero to 8. The IW was varied
from 1 to 4, though the 4 packet case lies beyond what is currently
recommended. The figures of merit used were the median page delay
seen by the web clients and the median file transfer delay seen
by the ftp clients. The simulated run time was rather large, 360 seconds,
to sample a large number of these metrics. (The median values remained
stable for twice that time, so it seemed adequate.)

	Median Web Page Delays (secs) | Median File Transfer Delays
#FTPs	IW=1	IW=2	IW=3	IW=4 | IW=1	IW=2	IW=3	IW=4
------------------------------------ | ----------------------------
 0	0.71	0.58	0.55	0.52 | 
 1	0.81	0.68	0.64	0.62 |  9.1	 9.3	 9.3	 9.4
 4	2.17	1.76	1.56	1.46 | 26.3	27.0	27.1	28.1
 6	2.57	2.11	1.87	1.70 | 39.5	38.3	40.1	40.7
 8	2.80	2.37	2.07	2.02 | 52.2	51.7	52.2	52.1

 percentage improvement in page delays vs number of ftps
#FTPs	IW=1	IW=2	IW=3	IW=4 
------------------------------------ 
 0	-	18	23	27 
 1	-	16	21	23 
 4	-	19	28	33 
 6	-	18	27	34 
 8	-	15	26	28 
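
The percentages above follow from the median web page delays in the first
table as (delay at IW=1 - delay at IW=n) / delay at IW=1, rounded to the
nearest whole percent.  The sketch below recomputes them from the published
delays; the function and variable names are chosen for this example only.

```python
def pct_improvement(baseline, value):
    """Percentage reduction in delay relative to the IW=1 baseline."""
    return round(100.0 * (baseline - value) / baseline)


# Median web page delays (secs) from the drop-tail table, keyed by #FTPs.
# Each list is [IW=1, IW=2, IW=3, IW=4].
web_delay = {
    0: [0.71, 0.58, 0.55, 0.52],
    1: [0.81, 0.68, 0.64, 0.62],
    4: [2.17, 1.76, 1.56, 1.46],
    6: [2.57, 2.11, 1.87, 1.70],
    8: [2.80, 2.37, 2.07, 2.02],
}

improvement = {
    n_ftps: [pct_improvement(delays[0], d) for d in delays[1:]]
    for n_ftps, delays in web_delay.items()
}

for n_ftps, row in improvement.items():
    print(n_ftps, row)
```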

Even though the ftps use the same IW as the webs, the effect is
not significant since there are only about 50 file transfers
completed over the run time of the simulation. When a packet is
dropped, the restart window size used is one packet. Thus it
didn't seem necessary to compare web clients with larger IWs to
ftps with shorter IWs. On the other hand, it is interesting to
mix some webs with shorter windows with those using longer windows.
This experiment doubled the number of web clients to 32. All 32 were
simulated using the same initial window size, first IW=1, then IW=3.
Then the clients were split into two groups of 16 each, one of which
used IW=1 and the other used IW=3.

Median Page Delays (secs)
#Webs	IW=1	IW=3
--------------------
 32	0.75	0.61
 16/16	0.80	0.60	

The first line shows the same result as the earlier data: clients
with IW=3 significantly outperform clients with IW=1.  The second
line shows that running a mixture of IW=3 & IW=1 has a tiny
negative effect on the IW=1 conversations and essentially no
effect on the IW=3 conversations.

Since these simulations were all with http1.0 style web traffic, a
natural question is to ask how results are affected by migration to
http1.1. A rough model of this behavior was simulated by using one
connection to send all of the information from both the primary URL
and the three in-lines. These results:

Med Web Page Delay   | Med FTP Delays	| % web improvement
#FTPs	IW=1	IW=3 | IW=1	IW=3	|  from IW=1 to IW=3
------------------------------------	|--------------
 0	0.57	0.45 | 			|	21
 1	0.64	0.52 |  9.2	 9.5	|	19
 4	1.80	1.31 | 27.0	27.0	|	27
 8	2.26	1.74 | 53.1	54.6	|	23

Although these web clients clearly have better delay properties, they
seem to get about the same percentage delay improvement from going
to the larger IW.

The indications from these results are that increasing the initial
window size to 3 packets (or 4380 bytes) doesn't "hurt" and helps
to improve the perceived performance. These simulations have suggested
some further analyses of the traffic dynamics of the simulated network.
It is also possible to do some further variations on the scenarios
simulated here.

Using ns for the simulations made it possible to explore some
other effects. ns-2 has a built-in RED function for buffer management,
making it a simple matter to rerun the simulations with the RED buffer
management on. With no FTPs there are no (or almost no) dropped packets,
so that case will not differ from those with the drop tail queues.

	Median Web Page Delays (secs) | Median File Transfer Delays
#FTPs	IW=1	IW=2	IW=3	IW=4 | IW=1	IW=2	IW=3	IW=4
------------------------------------ | ----------------------------
 1	0.82	0.69	0.64	0.62 |  9.1	 9.3	 9.4	 9.4
 4	1.31	1.11	1.03	0.98 | 27.8	29.2	29.5	29.3
 6	1.68	1.54	1.48	1.47 | 42.3	43.1	42.8	43.6
 8	2.02	1.91	1.69	1.61 | 55.1	58.7	59.7	51.3

 percentage improvement in page delay
#FTPs	IW=1	IW=2	IW=3	IW=4 
------------------------------------ 
 1	-	16	22	24 
 4	-	15	21	25 
 6	-	 8	12	13 
 8	-	 5	16	20 

There are two interesting aspects to these results. First, for the cases
where there are enough concurrent FTPs to fill the buffers, there is a
larger improvement gained going from drop tail to RED than with the
increased IW, another validation of the usefulness of RED.
The other is that the improvements from larger IWs are smaller with
the RED scenario. Although deploying RED would have a more powerful
impact on the delays seen by small transfers like typical web pages,
increasing the initial window size is still useful.

Packet drop rates did increase with IW, but the change was not significant.
For the drop-tail simulations, the drop rates on the congested link for all
flows range from 0.6-1.0% for 4 FTPs, 1.6-1.9% for 6 FTPs, and 2.4-2.8%
for 8 FTPs.  For the RED scenarios the ranges were 1.8-2.0% for 4 FTPs,
2.9-3.2% for 6 FTPs, and 4.0-4.2% for 8 FTPs. Since the increased drop
rates were accompanied by better performance, it's clear that, for
these low rates, drop rate is not an indicator of user level
performance.

	Kathie
	knichols@baynetworks.com
	(this work benefited from discussions and comments from Van Jacobson)



From owner-tcp-impl@relay.engr.sgi.com  Fri Sep  5 07:51:20 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA11182 for tcp-impl-list; Fri, 5 Sep 1997 07:48:50 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA11176 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 5 Sep 1997 07:48:47 -0700
Received: from olympus.eecs.umich.edu (olympus.eecs.umich.edu [141.213.8.56]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA01965
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 5 Sep 1997 07:48:46 -0700
	env-from (wuchang@eecs.umich.edu)
Received: from olympus.eecs.umich.edu (localhost [127.0.0.1]) by olympus.eecs.umich.edu (8.8.7/8.8.2) with ESMTP id KAA09600; Fri, 5 Sep 1997 10:46:00 -0400 (EDT)
Message-Id: <199709051446.KAA09600@olympus.eecs.umich.edu>
To: Kathleen Nichols <knichols@baynetworks.com>
Subject: Re: simulation results for increased tcp intial window 
In-reply-to: Your message of "Thu, 04 Sep 1997 22:59:02 PDT."
             <199709050559.WAA09449@mist.corpwest.baynetworks.com> 
cc: tcp-impl@cthulhu.engr.sgi.com, end2end-interest@isi.edu
Date: Fri, 05 Sep 1997 10:45:59 -0400
From: Wu-chang Feng <wuchang@eecs.umich.edu>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


  >> Packet drop rates did increase with IW, but the change was 
  >> not significant.

For small scale experiments like this, loss rates won't be
significant.  What about loss rates when there are a large number of
TCP flows such as the experiment(s) in...

  Morris, R., "TCP Behavior with Many Flows", ICNP '97

Having a large IW may exacerbate the large loss rates observed in his
experiments.

Wu

From owner-tcp-impl@relay.engr.sgi.com  Fri Sep  5 10:09:58 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA18644 for tcp-impl-list; Fri, 5 Sep 1997 10:04:13 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA18575 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 5 Sep 1997 10:04:03 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA13844
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 5 Sep 1997 10:04:01 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id NAA10878; Fri, 5 Sep 1997 13:03:18 -0400 (EDT)
Message-Id: <199709051703.NAA10878@brookfield.ans.net>
To: Kathleen Nichols <knichols@baynetworks.com>
cc: tcp-impl@cthulhu.engr.sgi.com, end2end-interest@ISI.EDU, pwarren@gte.com
Reply-To: curtis@ans.net
Subject: Re: simulation results for increased tcp intial window 
In-reply-to: Your message of "Thu, 04 Sep 1997 22:59:02 PDT."
             <199709050559.WAA09449@mist.corpwest.baynetworks.com> 
Date: Fri, 05 Sep 1997 13:03:18 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Kathie,

The objection to IW>1 is that it may have adverse effects on already
congested networks that are dominated by HTTP transfers.  You have not
done simulations on a congested link dominated by HTTP so your results
are not applicable to the assertion that IW>1 may be harmful.

If you limit the rate of requests to one request every 1-5 seconds
after completion and transfers complete in 0.5 to 0.7 seconds, with 16
clients, the link utilizations are very low.  Since each client is
idle for 3 seconds on average, they have a 1/6 duty cycle and 16/6
HTTP transfers can be expected to be active.  If you have 1-8 long
running ftp transfers, then the traffic on the link is dominated by the
ftp.  It is not at all surprising that a very small amount of HTTP
traffic had a negligible effect on the TCPs dominating the traffic and
it is also not surprising that the slightly more aggressive HTTPs did
a little better.
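
The duty cycle arithmetic above can be written out as follows.  The 0.6
second mean transfer time is an assumed midpoint of the 0.5 to 0.7 second
range, the 3 second mean idle time is the midpoint of the 1-5 second
delay, and the function name is made up for this sketch.

```python
def expected_active_transfers(clients, mean_transfer_s, mean_idle_s):
    """Return (duty cycle, expected number of concurrently active clients).

    Each client alternates between transferring and idling, so the
    fraction of time it is active is transfer / (transfer + idle).
    """
    duty = mean_transfer_s / (mean_transfer_s + mean_idle_s)
    return duty, clients * duty


duty, active = expected_active_transfers(
    clients=16, mean_transfer_s=0.6, mean_idle_s=3.0
)
print(f"duty cycle: {duty:.3f}, expected active transfers: {active:.2f}")
```

This reproduces the 1/6 duty cycle and 16/6 (about 2.7) concurrently
active clients.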

One thing you did not mention is the size of the HTTP transfers.  I
don't think you mentioned the queue capacity either.

> Packet drop rates did increase with IW, but the change was not
> significant.  For the drop-tail simulations, the drop rates on the
> congested link for all flows range from 0.6-1.0% for 4 FTPs, 1.6-1.9%
> for 6 FTPs, and 2.4-2.8% for 8 FTPs.  For the RED scenarios the ranges
> were 1.8-2.0% for 4 FTPs, 2.9-3.2% for 6 FTPs, and 4.0-4.2% for 8
> FTPs. Since the increased drop rates were accompanied by better
> performance, it's clear that, for these low rates, drop rate is
> clearly not an indicator of user level performance.

I suspect the drop rate for 0 FTPs was exactly zero and 1 FTP was
close to zero.  These are uncongested.  You also didn't mention the
bottleneck link utilization.  If the link utilization drops with the
increase in loss then this will have an adverse effect on already congested
links (anything that lowers bottleneck utilization is a problem for
already congested links).

Most of the US Internet seems to still be running under 5% loss or
even under 1% loss.  On portions of the Internet, drop rates are
already 5-15%.  I think the US to Europe problems of 25-50% loss are
now a thing of the past.  Portions of the world are living with
underprovisioned networks and higher loss rates  outside the US and
western Europe.

It would be interesting to try this with increasing numbers of HTTP
clients such that the loss rate with no FTP was in the 1% range, in
the 5% range, and in the 15% range.  Then increase IW and see what the
effect is.

While I don't advocate running links at 1% loss or more, we must
consider reality.

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Fri Sep  5 10:16:40 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA22345 for tcp-impl-list; Fri, 5 Sep 1997 10:14:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA22311 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 5 Sep 1997 10:14:08 -0700
Received: from mist.corpwest.baynetworks.com (screen2r.BayNetworks.COM [134.177.3.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA17614
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 5 Sep 1997 10:14:05 -0700
	env-from (knichols@baynetworks.com)
Received: by mist.corpwest.baynetworks.com (8.8.6/8.8.5)
	id KAA10181; Fri, 5 Sep 1997 10:01:26 -0700 (PDT)
Message-Id: <199709051701.KAA10181@mist.corpwest.baynetworks.com>
X-Mailer: exmh version 2.0delta 6/3/97
To: Wu-chang Feng <wuchang@eecs.umich.edu>
cc: Kathleen Nichols <knichols@baynetworks.com>, tcp-impl@cthulhu.engr.sgi.com,
        end2end-interest@isi.edu
Subject: Re: simulation results for increased tcp intial window 
In-reply-to: Your message of "Fri, 05 Sep 1997 10:45:59 EDT."
             <199709051446.KAA09600@olympus.eecs.umich.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 05 Sep 1997 10:01:26 PDT
From: Kathleen Nichols <knichols@baynetworks.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
>   >> Packet drop rates did increase with IW, but the change was 
>   >> not significant.
> 
> For small scale experiments like this, loss rates won't be
> significant.  What about loss rates when there are a large number of
> TCP flows such as the experiment(s) in...
> 
>   Morris, R., "TCP Behavior with Many Flows", ICNP '97
> 
> Having a large IW may exacerbate the large loss rates observed in his
> experiments.
> 
> Wu

The 16 web clients can easily cause 48 simultaneously active
TCP connections along with the FTPs. These values were experimentally
chosen to cause drop rates in the 1-5% range on the "T1 link". Many more
configurations could be tested and I would certainly invite interested
parties to do so and share the results with the list. I would be very
much interested in other studies and I assume most other readers of
these lists would be.

	Kathie


From owner-tcp-impl@relay.engr.sgi.com  Fri Sep  5 11:06:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA13353 for tcp-impl-list; Fri, 5 Sep 1997 11:02:02 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA13336 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 5 Sep 1997 11:01:59 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA04606
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 5 Sep 1997 11:01:57 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id SAA05335; Fri, 5 Sep 1997 18:58:14 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0x72Zx-0005FjC; Fri, 5 Sep 97 18:53 BST
Message-Id: <m0x72Zx-0005FjC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: simulation results for increased tcp intial window
To: curtis@ans.net
Date: Fri, 5 Sep 1997 18:53:57 +0100 (BST)
Cc: knichols@baynetworks.com, tcp-impl@cthulhu.engr.sgi.com,
        end2end-interest@ISI.EDU, pwarren@gte.com
In-Reply-To: <199709051703.NAA10878@brookfield.ans.net> from "Curtis Villamizar" at Sep 5, 97 01:03:18 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> already 5-15%.  I think the US to Europe problems of 25-50% loss are
> now a thing of the past.  Portions of the world are living with
> underprovisioned networks and higher loss rates  outside the US and
> western Europe.

I've been collecting (but not keeping) 15-minute stats from our site to
most US backbones across MAE-East. Loss is typically under 4% during US
night time, rising to 4-15% during US day time, with the loss generally at
the MAE and beyond.

The curious can poll www.cymru.net/cgi-bin/ping-status for a UK view of the US.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Fri Sep  5 11:06:26 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA12528 for tcp-impl-list; Fri, 5 Sep 1997 11:00:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA12463 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 5 Sep 1997 10:59:59 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA03711
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 5 Sep 1997 10:59:58 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id SAA05324; Fri, 5 Sep 1997 18:56:15 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0x72Y2-0005FjC; Fri, 5 Sep 97 18:51 BST
Message-Id: <m0x72Y2-0005FjC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: simulation results for increased tcp intial window
To: knichols@baynetworks.com (Kathleen Nichols)
Date: Fri, 5 Sep 1997 18:51:58 +0100 (BST)
Cc: wuchang@eecs.umich.edu, knichols@baynetworks.com,
        tcp-impl@cthulhu.engr.sgi.com, end2end-interest@ISI.EDU
In-Reply-To: <199709051701.KAA10181@mist.corpwest.baynetworks.com> from "Kathleen Nichols" at Sep 5, 97 10:01:26 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> chosen to cause drop rates in the 1-5% rate on the "T1 link". Many more
> configurations could be tested and I would certainly invite interested
> parties to do so and share the results with the list. I would be very
> much interested in other studies and I assume most other readers of
> these lists would be.

I'm not in a situation with time to do such testing, but for Europe you want
to be modelling 20-30 parallel connections over a 64K line for realistic
views of some sites under load. Also 4-8 over a 28.8 modem (a typical client
loading images aggressively).
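
As a rough illustration of how little each connection gets in those two
scenarios, here is a back-of-the-envelope sketch. It assumes the link is
shared perfectly evenly, which real TCP does not guarantee, and the function
name is purely illustrative:

```python
# Even-split bandwidth per connection for the two scenarios above.
# Assumption: perfectly fair sharing, no header or retransmission overhead.

def fair_share_bps(link_bps, connections):
    """Bits per second available to each connection under an even split."""
    return link_bps / connections

scenarios = [
    ("64K line", 64000, (20, 30)),      # 20-30 parallel connections
    ("28.8 modem", 28800, (4, 8)),      # 4-8 parallel connections
]

for name, bps, (fewest, most) in scenarios:
    low = fair_share_bps(bps, most)     # share with the most connections
    high = fair_share_bps(bps, fewest)  # share with the fewest connections
    print(f"{name}: {low:.0f}-{high:.0f} bit/s per connection")
```

Under that assumption each connection sees only a few kbit/s, so even one
extra segment in the initial window is a noticeable fraction of a second of
link time.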


From owner-tcp-impl@relay.engr.sgi.com  Sun Sep  7 19:20:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA21721 for tcp-impl-list; Sun, 7 Sep 1997 19:17:48 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA21714 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 7 Sep 1997 19:17:45 -0700
Received: from mmlab.snu.ac.kr (mmlab.snu.ac.kr [147.46.114.112]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA02024
	for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 7 Sep 1997 19:17:32 -0700
	env-from (chchoi@mmlab.snu.ac.kr)
Received: (from chchoi@localhost) by mmlab.snu.ac.kr (8.6.12h2/8.6.12) id LAA03892 for tcp-impl@cthulhu.engr.sgi.com; Mon, 8 Sep 1997 11:03:51 +1000
From: Changho Choi <chchoi@mmlab.snu.ac.kr>
Message-Id: <199709080103.LAA03892@mmlab.snu.ac.kr>
Subject: tcp simulation module or suit
To: tcp-impl@cthulhu.engr.sgi.com
Date: Mon, 8 Sep 1997 11:03:49 +0900 (GMT+9:00)
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-2022-kr
Content-Transfer-Encoding: 7bit
Content-Length: 707       
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

hello everyone~!

I would like some information about TCP simulation.
Does anybody know about that?
Where can I get information on TCP simulation programs, modules, suites, and so on?

Thanks in advance for your answers.

-- 
   ___o            ___o            ___o           ___o           ___o
  __ \\ __        __ \\ __        __ \\ __       __ \\ __       __ \\ __
 (*)/  (*)       (*)/  (*)       (*)/  (*)      (*)/  (*)      (*)/  (*)
+-----------------------------------------------------------------------+
     changho choi                
          e-mail: chchoi@mmlab.snu.ac.kr  
 	  web : http://mmlab.snu.ac.kr/~chchoi 
+-----------------------------------------------------------------------+

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 08:45:01 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA21792 for tcp-impl-list; Mon, 8 Sep 1997 08:42:33 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA21774 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 08:42:30 -0700
Received: from ell.ee.lbl.gov (ell.ee.lbl.gov [131.243.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA10606
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 08:42:28 -0700
	env-from (kfall@ee.lbl.gov)
Received: by ell.ee.lbl.gov (8.8.7/8.8.5)
	id IAA18526; Mon, 8 Sep 1997 08:42:07 -0700 (PDT)
From: kfall@ee.lbl.gov (Kevin Fall)
Message-Id: <199709081542.IAA18526@ell.ee.lbl.gov>
To: Changho Choi <chchoi@mmlab.snu.ac.kr>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: tcp simulation module or suit
In-reply-to: Your communique of Mon, 08 Sep 97 11:03:49 U.
             <199709080103.LAA03892@mmlab.snu.ac.kr>
Date: Mon, 08 Sep 97 08:42:06 PDT
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>
> From:  Changho Choi <chchoi@mmlab.snu.ac.kr>
> To:    tcp-impl@cthulhu.engr.sgi.com
> Subject: tcp simulation module or suit
> Date:  Mon, 08 Sep 97 11:03:49 U
>
>      hello everyone~!
> 
> I want to get a information of TCP simulation.
> Is there anybody who knows about that?
> Where can I get the information of TCP simulation program, module, suit, and so on.
> 
> thanks ahead for your answer.
> 
> -- 

you may want to have a look at:
	http://www-mash.cs.berkeley.edu/ns/ns.html

- K

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 09:14:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA29087 for tcp-impl-list; Mon, 8 Sep 1997 09:11:24 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA29077 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 09:11:22 -0700
Received: from callisto.ftel.co.uk (callisto.ftel.co.uk [192.131.79.11]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA19718
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 09:11:19 -0700
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (WQLRQ7sl/ygTuphshHqPeFy4O3srjh76@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/main-solaris.mc@Thu21Aug1997-14:03PM) with SMTP id RAA20000;
	Mon, 8 Sep 1997 17:11:17 +0100 (BST)
Message-ID: <341423A5.42A6@ftel.co.uk>
Date: Mon, 08 Sep 1997 17:11:17 +0100
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
Subject: RSVP Implementation
References: <199709051446.KAA09600@olympus.eecs.umich.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Can anyone please advise on what is happening with RSVP implementations?

Are they happening? Are there prototypes? Are we waiting for
standardisation or what?



Graham Cope

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 10:09:29 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA16064 for tcp-impl-list; Mon, 8 Sep 1997 10:03:15 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA16026 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 10:03:08 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id KAA06622
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 10:03:06 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-26)
	id <AA29689>; Mon, 8 Sep 1997 10:02:54 -0700
Date: Mon, 8 Sep 97 10:04:28 PDT
From: braden@ISI.EDU
Posted-Date: Mon, 8 Sep 97 10:04:28 PDT
Message-Id: <9709081704.AA18390@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA18390>; Mon, 8 Sep 97 10:04:28 PDT
To: tcp-impl@cthulhu.engr.sgi.com, G.Cope@ftel.co.uk
Subject: Re: RSVP Implementation
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


  *> Can anyone please advise on what is happening with RSVP implementations.
  *> 
  *> Are they happening? Are there prototypes? Are we waiting for
  *> standardisation or what?
  *> 
  *> 
  *> 
  *> Graham Cope
  *> 

RSVP implementations are discussed on the rsvp-test@isi.edu mailing list.

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 12:52:55 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA14344 for tcp-impl-list; Mon, 8 Sep 1997 12:47:42 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA14286 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 12:47:35 -0700
Received: from motgate.mot.com (motgate.mot.com [129.188.136.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA17547
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 12:47:29 -0700
	env-from (romano@magoo.rsch.comm.mot.com)
Received: from pobox.mot.com (pobox.mot.com [129.188.137.100]) by motgate.mot.com (8.8.5/8.6.10/MOT-3.8) with ESMTP id OAA27139 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 14:47:27 -0500 (CDT)
Comments: ( Received on motgate.mot.com from client pobox.mot.com, sender romano@magoo.rsch.comm.mot.com )
Received: from il02dns1.comm.mot.com (il02dns1.comm.mot.com [145.1.3.2]) by pobox.mot.com (8.8.5/8.6.10/MOT-3.8) with ESMTP id OAA04535 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 14:47:25 -0500 (CDT)
Received: from magoo.rsch.comm.mot.com (magoo.comm.mot.com [145.1.80.34]) by il02dns1.comm.mot.com (8.7.5/8.7.3) with SMTP id OAA27936 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 14:47:19 -0500 (CDT)
Received: from localhost by magoo.rsch.comm.mot.com (4.1/SMI-4.1)
	id AA14630; Mon, 8 Sep 97 14:47:14 CDT
Message-Id: <9709081947.AA14630@magoo.rsch.comm.mot.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: simulation results for increased tcp intial window 
In-Reply-To: Your message of "Fri, 05 Sep 1997 18:51:58 BST."
             <m0x72Y2-0005FjC@lightning.swansea.linux.org.uk> 
Date: Mon, 08 Sep 1997 14:47:12 -0500
From: Guy Romano <romano@magoo.rsch.comm.mot.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I apologize if this message was sent out twice.

A wireless link should also be considered because of its low bandwidth,
high latency, and high delay variance characteristics.  CDPD with its raw
data rate of 19.2 Kbps and effective user throughput on the order of 9.6 Kbps
could be considered as an example.

I assume that any IW>2 will begin to have a negative impact on TCP performance
on a wireless link.
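
A rough sketch of why the initial burst matters on such a link. The 9.6 Kbps
figure is the effective throughput mentioned above; the 536-byte segment size
and 40 bytes of headers are assumptions for illustration, not CDPD
measurements:

```python
# Serialization delay of the first flight on a slow link.
LINK_BPS = 9600       # effective CDPD user throughput quoted above
MSS_BYTES = 536       # assumption: default TCP MSS
HEADER_BYTES = 40     # assumption: TCP + IPv4 headers, no options

def burst_seconds(iw_segments, link_bps=LINK_BPS):
    """Time to clock iw_segments full-sized packets onto the link."""
    bits = iw_segments * (MSS_BYTES + HEADER_BYTES) * 8
    return bits / link_bps

for iw in (1, 2, 4):
    print(f"IW={iw}: {burst_seconds(iw):.2f} s to serialize the first flight")
```

With these assumed sizes each segment occupies the link for roughly half a
second, so every extra segment in the initial window adds about that much
queueing delay before anything else can get through.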


Guy Romano

> > chosen to cause drop rates in the 1-5% rate on the "T1 link". Many more
> > configurations could be tested and I would certainly invite interested
> > parties to do so and share the results with the list. I would be very
> > much interested in other studies and I assume most other readers of
> > these lists would be.
> 
> Im not in a situation with time to do such testing but for europe you want
> to be modelling 20-30 parallel over a 64K line for realistic views of some
> sites under load. Also 4-8 over a 28.8 modem (typical client loading images
> aggressively)
> 


From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 13:25:35 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA26131 for tcp-impl-list; Mon, 8 Sep 1997 13:17:46 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA25968 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 13:17:02 -0700
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.235]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA28603
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 13:17:01 -0700
	env-from (raj@cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel1.hp.com (8.8.6/8.8.5tis) with SMTP id NAA00626
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 13:17:00 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA16991; Mon, 8 Sep 1997 13:14:28 -0700
Message-Id: <34145CA3.60D9@cup.hp.com>
Date: Mon, 08 Sep 1997 13:14:27 -0700
From: Rick Jones <raj@cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
Subject: what IW is being used today?
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

In regards to all the simulations and suggestions and such, I have a
question.

How many systems on the Internet today strictly adhere to the IW=1? 

I ask because I wonder if we are already gaining operational experience
with the Internet proper with an IW that is effectively 2.

>From everything I have heard, it sounds like the strict adherents are
mostly Solaris 2.something systems. From the dust-up over the IW between
Win95/NT and Solaris, I gather that Win95/NT have an effective IW of 2. 

I know that HP-UX has an effective IW of 2, and I would guess (with no
hard data) that most other commercial Unix offerings have that as well
or we would have also heard of performance interoperability issues
between Win95/NT and those Unixes. I'll not hazard a guess for FreeBSD
and Linux.

I guess a followup question would be what proportion of the "Internet
traffic mass" is comprised of "ants" (Win95 et al) versus "mammals" (the
Unixes).

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 16:17:41 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA28994 for tcp-impl-list; Mon, 8 Sep 1997 16:11:46 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA28963 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 16:11:44 -0700
Received: from motgate.mot.com (motgate.mot.com [129.188.136.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA06306
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 16:11:42 -0700
	env-from (romano@magoo.rsch.comm.mot.com)
Received: from mothost.mot.com (mothost.mot.com [129.188.137.101]) by motgate.mot.com (8.8.5/8.6.10/MOT-3.8) with ESMTP id SAA11946 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 18:11:41 -0500 (CDT)
Comments: ( Received on motgate.mot.com from client mothost.mot.com, sender romano@magoo.rsch.comm.mot.com )
Received: from il02dns1.comm.mot.com (il02dns1.comm.mot.com [145.1.3.2]) by mothost.mot.com (8.8.5/8.6.10/MOT-3.8) with ESMTP id SAA22985 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 18:11:34 -0500 (CDT)
Received: from magoo.rsch.comm.mot.com (magoo.comm.mot.com [145.1.80.34]) by il02dns1.comm.mot.com (8.7.5/8.7.3) with SMTP id SAA05680; Mon, 8 Sep 1997 18:11:36 -0500 (CDT)
Received: from localhost by magoo.rsch.comm.mot.com (4.1/SMI-4.1)
	id AA19037; Mon, 8 Sep 97 18:11:29 CDT
Message-Id: <9709082311.AA19037@magoo.rsch.comm.mot.com>
To: Rick Jones <raj@cup.hp.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today? 
In-Reply-To: Your message of "Mon, 08 Sep 1997 13:14:27 PDT."
             <34145CA3.60D9@cup.hp.com> 
Date: Mon, 08 Sep 1997 18:11:27 -0500
From: Guy Romano <romano@magoo.rsch.comm.mot.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I know that HP-UX has an effective IW of 2, and I would guess (with no
> hard data) that most other commercial Unix offerings have that as well
> or we would have also heard of performance interoperability issues
> between Win95/NT and those Unixes. I'll not hazard a guess for FreeBSD
> and Linux.

I don't believe that a stack with an IW=2 will have interoperability
issues communicating with a stack with IW=1.  The initial congestion
window size is the concern of the sending TCP.  The acking strategy at
the receiving TCP will not change based on the sending TCP's window
size.

Just a data point, OSF1 v3.2c by DEC has IW=1.


Guy

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 17:05:25 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA15051 for tcp-impl-list; Mon, 8 Sep 1997 16:57:52 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA15030 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 16:57:46 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA21920
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 16:57:45 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id QAA24648; Mon, 8 Sep 1997 16:47:48 -0700 (PDT)
Message-Id: <199709082347.QAA24648@daffy.ee.lbl.gov>
To: Rick Jones <raj@cup.hp.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
In-reply-to: Your message of Mon, 08 Sep 1997 13:14:27 PDT.
Date: Mon, 08 Sep 1997 16:47:48 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> How many systems on the Internet today strictly adhere to the IW=1? 

Just to clarify - I take it by "strictly" you mean that they use IW=1 and
don't have any bugs that inadvertently increase IW in some situations,
right?  Because there's the connection-responder-opens-cwnd-on-SYN-ack bug
that most BSD TCPs suffer from, and there's the cwnd-set-to-offered-MSS-oops-
other-side-wants-a-smaller-MSS bug, that also effectively sets IW > 1.
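
The second bug comes down to simple arithmetic, sketched below. The MSS
values (1460 offered, 536 requested by the peer) are assumptions picked for
illustration, and the function is hypothetical rather than taken from any
particular stack:

```python
def effective_iw(cwnd_bytes, send_mss):
    """Full-sized segments the sender may emit before any ACK arrives."""
    return cwnd_bytes // send_mss

offered_mss = 1460   # assumption: MSS this side advertised
peer_mss = 536       # assumption: smaller MSS the other side asked for

# Correct: cwnd is one segment of the size actually being sent.
print(effective_iw(peer_mss, peer_mss))

# Buggy: cwnd was initialized from the offered MSS, but segments are
# built with the peer's smaller MSS, so more than one fits in cwnd.
print(effective_iw(offered_mss, peer_mss))
```

With these numbers the buggy case lets two segments out in the first flight,
so the stack behaves as if IW=2 even though the code intends IW=1.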

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 17:36:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA28836 for tcp-impl-list; Mon, 8 Sep 1997 17:31:27 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA28805 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 17:31:21 -0700
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.235]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA02472
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 17:31:18 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel1.hp.com (8.8.6/8.8.5tis) with SMTP id RAA09689
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 17:31:18 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA21101; Mon, 8 Sep 1997 17:28:44 -0700
Message-Id: <3414983C.6E4A@cup.hp.com>
Date: Mon, 08 Sep 1997 17:28:44 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: Vern Paxson <vern@ee.lbl.gov>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
References: <199709082347.QAA24648@daffy.ee.lbl.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Vern Paxson wrote:
> 
> > How many systems on the Internet today strictly adhere to the IW=1?
> 
> Just to clarify - I take it by "strictly" you mean that they use IW=1 and
> don't have any bugs that inadvertently increase IW in some situations,
> right?  

Yes, that is what I mean. I somewhat loosely used the term "effective
IW" in that regard.

> Because there's the connection-responder-opens-cwnd-on-SYN-ack bug
> that most BSD TCPs suffer from

Now that is an interesting statement - does not the ACK of the SYN or
SYN|ACK imply that a packet has left the network, and isn't the whole
idea conservation of packets?

>, and there's the cwnd-set-to-offered-MSS-oops-
> other-side-wants-a-smaller-MSS bug, that also effectively sets IW > 1.

You might add to the issues list - application sets TCP_NODELAY and
sends a bunch of tinygrams... :)

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 17:36:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA27669 for tcp-impl-list; Mon, 8 Sep 1997 17:27:58 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA27656 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 17:27:56 -0700
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.235]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA01583
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 17:27:55 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel1.hp.com (8.8.6/8.8.5tis) with SMTP id RAA09121
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 17:27:53 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA20497; Mon, 8 Sep 1997 17:25:19 -0700
Message-Id: <3414976E.675B@cup.hp.com>
Date: Mon, 08 Sep 1997 17:25:18 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: Guy Romano <romano@magoo.rsch.comm.mot.com>
Cc: Rick Jones <raj@hpisrdq.cup.hp.com>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
References: <9709082311.AA19037@magoo.rsch.comm.mot.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Guy Romano wrote:
> 
> > I know that HP-UX has an effective IW of 2, and I would guess (with no
> > hard data) that most other commercial Unix offerings have that as well
> > or we would have also heard of performance interoperability issues
> > between Win95/NT and those Unixes. I'll not hazard a guess for FreeBSD
> > and Linux.
> 
> I don't believe that a stack with an IW=2 will have interoperability
> issues communicating with a stack with IW=1.  The initial congestion

That depends on the direction. Clearly, the IW=1 stack in Solaris did
not work well with the IW=2 stack of Win95/NT - that is what I was using
as a guesstimate that most other popular stacks on the net were
effectively IW=2.

> Just a data point, OSF1 v3.2c by DEC has IW=1.

Cool - any guesstimate what the "Internet weight" of OSF1 v3.2c happens
to be?

>From their SPECweb96 disclosure (see www.specbench.org and the second
quarter '97 results) Digital's 4.0D-5 (Rev 697) OS can have the IW
tuned:

OS Notes

    Webserver sysconfigtab tuning:
    proc: max-proc-per-user=1024, maxusers=2048, max-threads-per-user=2048
    socket: sominconn=32765, somaxconn=327675
    inet: tcbhashsize=16384, tcp_cwnd_segments=2

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 19:26:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA25453 for tcp-impl-list; Mon, 8 Sep 1997 19:22:36 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA25438 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 19:22:29 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA05789
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 19:22:26 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id WAA22573; Mon, 8 Sep 1997 22:18:03 -0400 (EDT)
Message-Id: <199709090218.WAA22573@brookfield.ans.net>
To: Rick Jones <raj@cup.hp.com>
cc: tcp-impl@cthulhu.engr.sgi.com
Reply-To: curtis@ans.net
Subject: Re: what IW is being used today? 
In-reply-to: Your message of "Mon, 08 Sep 1997 13:14:27 PDT."
             <34145CA3.60D9@cup.hp.com> 
Date: Mon, 08 Sep 1997 22:18:03 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <34145CA3.60D9@cup.hp.com>, Rick Jones writes:
> In regards to all the simulations and suggestions and such, I have a
> question.
> 
> How many systems on the Internet today strictly adhere to the IW=1? 
> 
> I ask because I wonder if we are already gaining operational experience
> with the Internet proper with an IW that is effectively 2.
> 
> >From everything I have heard, it sounds like the strict adherents are
> mostly Solaris 2.something systems. From the dust-up over the IW between
> Win95/NT and Solaris, I gather that Win95/NT have an effective IW of 2. 
> 
> I know that HP-UX has an effective IW of 2, and I would guess (with no
> hard data) that most other commercial Unix offerings have that as well
> or we would have also heard of performance interoperability issues
> between Win95/NT and those Unixes. I'll not hazard a guess for FreeBSD
> and Linux.

There are no interoperability issues if one end does IW=1 and the
other IW=2.

> I guess a followup question would be what proportion of the "Internet
> traffic mass" is comprised of "ants" (Win95 et al) versus "mammals" (the
> Unixes).
> 
> rick jones


A more important question is what percentage of high volume web
servers implement IW=1.  BSD variants account for a high percentage
mostly *because they perform well*, but also because the PC BSD
variants (BSDI, FreeBSD, etc) are very cost effective (IW=1).  Solaris
and other SysV based Unix (like HP/UX) are also widely used despite
not performing as well (IW=2).  NT is less popular as a web server
since performance is worse yet but still significant due to religious
convictions of some (IW=2?).  For web servers on a WAN performance is
largely a matter of the quality of the TCP implementation.  It seems
like a fairly even split.

It doesn't really matter whether the clients (Win95) implement IW>1.

Some TCPs out there don't bother with slow start at all.  The fact
that that hasn't yet completely killed the Internet is no reason to
declare this a good practice.  Same with IW>1.

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 19:56:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA00382 for tcp-impl-list; Mon, 8 Sep 1997 19:52:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA00371 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 19:52:05 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA14011
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 19:52:04 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id TAA24965; Mon, 8 Sep 1997 19:52:02 -0700 (PDT)
Message-Id: <199709090252.TAA24965@daffy.ee.lbl.gov>
To: curtis@ans.net
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today? 
In-reply-to: Your message of Mon, 08 Sep 1997 22:18:03 PDT.
Date: Mon, 08 Sep 1997 19:52:02 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

(Not speaking in an official capacity.)

> BSD variants account for a high percentage
> mostly *because they perform well*, but also because the PC BSD
> variants (BSDI, FreeBSD, etc) are very cost effective (IW=1).  Solaris
> and other SysV based Unix (like HP/UX) are also widely used despite
> not performing as well (IW=2).  NT is less popular as a web server
> since performance is worse yet but still significant due to religious
> convictions of some (IW=2?).

This is just about all backwards.  BSD-based Web servers have IW=2
due to the bug of taking the SYN-ack as advancing cwnd.  Solaris doesn't
do this, and got nailed in a Web benchmarking study because it has
to wait for the delayed-ack timer to get the first slow-start ack, while
the violators don't.  My traces of NT show it likewise uses IW=1.
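
The effect of the SYN-ack bug can be illustrated with a minimal sketch.
It assumes a responder that starts slow start at one segment and, when
buggy, treats the ACK completing the handshake like an ACK of data.  The
function name and the 1460-byte MSS are illustrative assumptions, not
taken from any implementation or trace.

```python
MSS = 1460  # bytes; illustrative segment size (assumption)


def initial_window_segments(count_handshake_ack: bool, mss: int = MSS) -> int:
    """Return how many full segments the responder may send in its first flight."""
    cwnd = mss  # slow start begins with a congestion window of one segment
    if count_handshake_ack:
        # Buggy behaviour: the ACK of the SYN-ack is counted as if it
        # acknowledged data, so slow start opens cwnd by one more segment.
        cwnd += mss
    return cwnd // mss


print(initial_window_segments(count_handshake_ack=False))  # 1
print(initial_window_segments(count_handshake_ack=True))   # 2
```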

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 20:23:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA05048 for tcp-impl-list; Mon, 8 Sep 1997 20:21:20 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA04913 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 20:20:38 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id UAA20858
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 20:20:36 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id UAA25086; Mon, 8 Sep 1997 20:13:59 -0700 (PDT)
Message-Id: <199709090313.UAA25086@daffy.ee.lbl.gov>
To: Rick Jones <raj@hpisrdq.cup.hp.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
In-reply-to: Your message of Mon, 08 Sep 1997 17:28:44 PDT.
Date: Mon, 08 Sep 1997 20:13:58 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > Because there's the connection-responder-opens-cwnd-on-SYN-ack bug
> > that most BSD TCPs suffer from
> 
> Now that is an interesting statement - does not the ACK of the SYN or
> SYN|ACK imply that a packet has left the network, and isn't the whole
> idea conservation of packets?

I think the whole idea is conservation of *data* packets.

> How about allowing the ACK to increase the window for the number of
> packets ACKed? Yes, it does mean that slow start isn't quite as slow ...

An interesting research question - but out of scope for tcp-impl, as this
is a significant change.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep  8 20:27:00 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA05664 for tcp-impl-list; Mon, 8 Sep 1997 20:24:14 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA05658 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 20:24:12 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id UAA21828
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 8 Sep 1997 20:24:10 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id UAA25156; Mon, 8 Sep 1997 20:24:08 -0700 (PDT)
Message-Id: <199709090324.UAA25156@daffy.ee.lbl.gov>
To: curtis@ans.net
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today? 
In-reply-to: Your message of Mon, 08 Sep 1997 19:52:02 PDT.
Date: Mon, 08 Sep 1997 20:24:08 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Minor correction:

> My traces of NT show it likewise uses IW=1.

They do unless their initial MSS is greater than the MSS that's agreed upon;
same bug as some Reno TCPs have.
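
One plausible reading of this bug, sketched under stated assumptions: cwnd
is set in bytes from the initial MSS before the MSS is agreed upon, and is
not recomputed afterwards.  The mechanism, the function name, and the
sizes used are assumptions for illustration only.

```python
def first_flight_segments(initial_mss: int, negotiated_mss: int) -> int:
    """Segments of the negotiated size that fit in a cwnd sized from the initial MSS."""
    cwnd_bytes = initial_mss  # assumed: cwnd fixed before the MSS is agreed upon
    # A sender can always send at least one segment.
    return max(1, cwnd_bytes // negotiated_mss)


print(first_flight_segments(1460, 1460))  # 1: the sizes agree, so IW=1
print(first_flight_segments(1460, 536))   # 2: initial MSS exceeds the agreed MSS
```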

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 00:26:41 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA07205 for tcp-impl-list; Tue, 9 Sep 1997 00:24:13 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA07200 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 00:24:11 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA13058
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 00:24:08 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id IAA07011; Tue, 9 Sep 1997 08:23:26 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0x8Kbi-0005FjC; Tue, 9 Sep 97 08:21 BST
Message-Id: <m0x8Kbi-0005FjC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: what IW is being used today?
To: romano@magoo.rsch.comm.mot.com (Guy Romano)
Date: Tue, 9 Sep 1997 08:21:05 +0100 (BST)
Cc: raj@cup.hp.com, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <9709082311.AA19037@magoo.rsch.comm.mot.com> from "Guy Romano" at Sep 8, 97 06:11:27 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Just a data point, OSF1 v3.2c by DEC has IW=1.

Linux is intended to have IW=1. I think we have it right ;)

Another thing we need to add to the list of tcp issues appears to be proper
handling of MTU discovery in the absence of ICMP. The number of big name
web sites demonstrating a total lack of clues in configuring their firewall
and hosts (block ICMP, mtu discovery on) suggests we must make correct
behaviour in the presence of failure of mtu discovery a SHOULD.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 04:50:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA07412 for tcp-impl-list; Tue, 9 Sep 1997 04:48:03 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA07406 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 04:48:01 -0700
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id EAA22626
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 04:47:54 -0700
	env-from (mouse@Twig.Rodents.Montreal.QC.CA)
Received: (from mouse@localhost)
	by Twig.Rodents.Montreal.QC.CA (8.8.5/8.8.5) id HAA12063;
	Tue, 9 Sep 1997 07:47:47 -0400 (EDT)
Date: Tue, 9 Sep 1997 07:47:47 -0400 (EDT)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199709091147.HAA12063@Twig.Rodents.Montreal.QC.CA>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Another thing we need to add to the list of tcp issues appears to be
> properly handling of MTU discovery in the presence of no ICMP.  [...]
> suggests we must make correct behaviour in the presence of failure of
> mtu discovery a SHOULD

What _is_ correct behavior in the presence of a peer aggressively
attempting to break the protocols?  Dropping all ICMP strikes me as a bit
like dropping all ACK-only packets - there comes a point where you have
to throw up your hands and say "sorry, the peer is broken".  I'd say we
should come down hard on anyone who drops so aggressively that MTU
discovery can't work; I see no reason to encourage such behavior by
making everybody else jump through hoops to do something near-optimal
when faced with it.

Or does your "correct behavior" mean just "doesn't exhibit a
network-harming failure mode"?  In that case, I agree - but that
applies to _every_ peer; there's no reason to single out those that
aggressively drop ICMP.

> The number of big name web sites demonstrating a total lack of clues
> in configuring their firewall and hosts (block ICMP, mtu discovery
> on) - big names too -

- is no excuse to let them redefine the protocols.

To pick an analogy from a layer higher up the stack, there are lots of
sites with NAT boxes (often also playing firewall).  Such boxes break
FTP when the data connection active side is the "outside" machine (ie,
PORT mode when FTPing out, PASV mode when FTPing in).  But I don't see
people attempting to "fix" FTP because of this.  (Indeed, one such box
I have encountered actually modifies the data passing over the FTP
control connection to kludge around this, which is even uglier than the
original problem.)

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 07:46:18 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA27605 for tcp-impl-list; Tue, 9 Sep 1997 07:43:08 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA27595 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 07:43:05 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id IAA08257 for tcp-impl@cthulhu.engr.sgi.com; Tue, 9 Sep 1997 08:43:02 -0600
Date: Tue, 9 Sep 1997 08:43:02 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199709091443.IAA08257@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: alan@lxorguk.ukuu.org.uk (Alan Cox)
> To: romano@magoo.rsch.comm.mot.com (Guy Romano)
> Cc: raj@cup.hp.com, tcp-impl@cthulhu.engr.sgi.com

> ...
> Another thing we need to add to the list of tcp issues appears to be properly
> handling of MTU discovery in the presence of no ICMP. The number of big name
> web sites demonstrating a total lack of clues in configuring their firewall
> and hosts (block ICMP, mtu discovery on) - big names too - suggests we must
> make correct behaviour in the presence of failure of mtu discovery a
> SHOULD


What do you figure is the right response when an HTTP client sends a
few redundant ACKs and then a FIN?   Isn't that what your "big name"
server with MTU discovery on and ICMP blocked will see?   How is that
different from many other problems, starting with transient unidirectional
blackholes (e.g. routing problems) and continuing with data-dependent
packet loss?


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 09:39:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA24752 for tcp-impl-list; Tue, 9 Sep 1997 09:32:57 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA24711 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 09:32:51 -0700
Received: from motgate.mot.com (motgate.mot.com [129.188.136.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA03279
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 09:32:48 -0700
	env-from (romano@magoo.rsch.comm.mot.com)
Received: from pobox.mot.com (pobox.mot.com [129.188.137.100]) by motgate.mot.com (8.8.5/8.6.10/MOT-3.8) with ESMTP id LAA07809 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 11:32:46 -0500 (CDT)
Comments: ( Received on motgate.mot.com from client pobox.mot.com, sender romano@magoo.rsch.comm.mot.com )
Received: from il02dns1.comm.mot.com (il02dns1.comm.mot.com [145.1.3.2]) by pobox.mot.com (8.8.5/8.6.10/MOT-3.8) with ESMTP id LAA15286 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 11:32:43 -0500 (CDT)
Received: from magoo.rsch.comm.mot.com (magoo.comm.mot.com [145.1.80.34]) by il02dns1.comm.mot.com (8.7.5/8.7.3) with SMTP id LAA04740; Tue, 9 Sep 1997 11:32:28 -0500 (CDT)
Received: from localhost by magoo.rsch.comm.mot.com (4.1/SMI-4.1)
	id AA27118; Tue, 9 Sep 97 11:32:20 CDT
Message-Id: <9709091632.AA27118@magoo.rsch.comm.mot.com>
To: Rick Jones <raj@hpisrdq.cup.hp.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today? 
In-Reply-To: Your message of "Mon, 08 Sep 1997 17:25:18 PDT."
             <3414976E.675B@cup.hp.com> 
Date: Tue, 09 Sep 1997 11:32:18 -0500
From: Guy Romano <romano@magoo.rsch.comm.mot.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Rick Jones wrote:

> Guy Romano wrote:
> > 
> > I don't believe that a stack with an IW=2 will have interoperability
> > issues communicating with a stack with IW=1.  The initial congestion
> 
> That depends on the direction. Clearly, the IW=1 stack in Solaris did
> not work well with the IW=2 stack of Win95/NT - that is what I was using
> as a guesstimate that most other popular stacks on the net were
> effectively IW=2.
> 
I still don't see how this is a problem.  Are you sure the reason
that the Solaris and Win95/NT stacks didn't work well together
was due to the IWs used?  The Solaris stack has at least one
problem that was mentioned a while ago concerning premature
timeouts when the RTT is fairly large.

One output from this group will be an update to RFC2001.  We may, 
if appropriate, increase the cwnd to something larger than 1.  Once
this is done a number of stack vendors will update their stacks to
reflect the new initial cwnd.  So, for a potentially long period of
time, the net will have a number of stacks with IW=1 communicating
with stacks that have IW>1.  If, as you assert, IW=1 stacks do not 
work well with IW>1 stacks then the change to RFC2001 will be 
creating a problem.  

I personally don't believe that this will be a problem.

> > Just a data point, OSF1 v3.2c by DEC has IW=1.
> 
> Cool - any guesstimate what the "Internet weight" of OSF1 v3.2c happens
> to be?
> 
While DEC may not be very popular, they do implement a damn good
TCP stack.


Guy

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 10:44:42 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA20398 for tcp-impl-list; Tue, 9 Sep 1997 10:37:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA20377 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 10:37:35 -0700
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.235]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA27953
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 10:37:34 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel1.hp.com (8.8.6/8.8.5tis) with SMTP id KAA18927
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 10:37:33 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA22328; Tue, 9 Sep 1997 10:34:55 -0700
Message-Id: <341588BE.4108@cup.hp.com>
Date: Tue, 09 Sep 1997 10:34:54 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: Vern Paxson <vern@ee.lbl.gov>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
References: <199709090313.UAA25086@daffy.ee.lbl.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > Now that is an interesting statement - does not the ACK of the SYN or
> > SYN|ACK imply that a packet has left the network, and isn't the whole
> > idea conservation of packets?
> 
> I think the whole idea is conservation of *data* packets.

Well, if I was going to be picky, I'd say that SYN segments consume
sequence number space just like a data segment and call it metadata or
something. 

Instead though I'll ask why the successful passage of a SYN segment
should be any less informative than a data segment. If the intermediate
queues are full or near full when a connection is established, the SYN
(or SYN|ACK) would fill the last slot in the queue just like any data
segment and would presumably preclude the addition of another segment
(data, SYN, or ACK) just like a data segment. 

Further, doesn't a SYN segment's arrival trigger a RED drop of a data
segment just like any other segment - or do the routers look to see what
type of TCP segment it is?

> > How about allowing the ACK to increase the window for the number of
> > packets ACKed? Yes, it does mean that slow start isn't quite as slow ...
> 
> An interesting research question - but out of scope for tcp-impl, as this
> is a significant change.

Isn't that the behaviour described in the original paper? 

Now if you really want to get into a researchy topic - how about
congestion avoidance applied to the sending of ACKs? They can be upward
of 50% of the packets, yes?

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 11:06:49 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA27913 for tcp-impl-list; Tue, 9 Sep 1997 10:57:53 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA27861 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 10:57:47 -0700
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.235]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA04602
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 10:57:46 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel1.hp.com (8.8.6/8.8.5tis) with SMTP id KAA23450
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 10:57:44 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA22333; Tue, 9 Sep 1997 10:55:09 -0700
Message-Id: <34158D7D.57C8@cup.hp.com>
Date: Tue, 09 Sep 1997 10:55:09 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: Guy Romano <romano@magoo.rsch.comm.mot.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
References: <9709091632.AA27118@magoo.rsch.comm.mot.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I still don't see how this is a problem.  Are you sure the reason
> that the Solaris and Win95/NT stacks didn't work well together
> was due to the IWs used?  The Solaris stack has at least one
> problem that was mentioned a while ago concerning premature
> timeouts when the RTT is fairly large.

Consider what happens if you have a web server with strict adherence to
IW=1 talking to a client which is very good about "only ack every two
full-sized segments." The server will send the first segment of the http
response, and will await an ACK from the client. The client will receive
the single segment, but will not ACK immediately since it is only one
MSS-worth of data, and it has nothing else to send-back the other
direction.

So, the URL retrieval has a rather nasty wait for the standalone ACK
interval in the middle of it.

>From what I have read of the descriptions of the performance issue
between Solaris and Win95/NT, that is the most plausible explanation. 

Yes, everything still "works" in that the data eventually gets from one
side to the other. It does not work "efficiently." Since the server
waited for the standalone ACK, I think we actually have one *more*
segment traversing the network than if the server sent two data segments
in the beginning (modulo segment loss).
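
The stall and the extra segment can be reproduced with a rough round-based
sketch.  It assumes a receiver that ACKs every second full-sized segment
at once and otherwise waits for a delayed-ACK timer, a sender that waits
for all ACKs of a flight before sending the next, and no loss.  The 100 ms
RTT and 200 ms timer are illustrative values, and the final ACK is counted
as a standalone packet.

```python
def transfer(n_segments: int, iw: int, rtt_ms: int = 100, delack_ms: int = 200):
    """Return (ms until the last data segment arrives, total packets on the wire)."""
    cwnd, sent, acks, clock = iw, 0, 0, 0
    while True:
        flight = min(cwnd, n_segments - sent)
        sent += flight
        # Each pair of segments is ACKed immediately; an odd trailing
        # segment is ACKed only when the delayed-ACK timer fires.
        paired, leftover = divmod(flight, 2)
        acks_this_round = paired + leftover
        acks += acks_this_round
        if sent == n_segments:
            return clock + rtt_ms // 2, n_segments + acks
        # The next flight starts once the slowest ACK of this one is back.
        clock += rtt_ms + (delack_ms if leftover else 0)
        cwnd += acks_this_round  # slow start: one more segment per ACK


print(transfer(2, iw=1))  # (350, 4)
print(transfer(2, iw=2))  # (50, 3)
```

Under these assumptions a two-segment response costs four packets with
IW=1 and three with IW=2, and the IW=1 case also pays for the timer.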

One could argue then that receivers should immediately ACK the first data
segment they receive, to accommodate the conservative IW, but again, that
is putting more segments into the network, which I would think would
make it *more* likely to have packet loss since the average "load" is
increased by one packet per connection.

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 13:17:00 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA23061 for tcp-impl-list; Tue, 9 Sep 1997 13:13:44 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA23048 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 13:13:42 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA21731
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 13:13:40 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id NAA27263; Tue, 9 Sep 1997 13:07:01 -0700 (PDT)
Message-Id: <199709092007.NAA27263@daffy.ee.lbl.gov>
To: Rick Jones <raj@hpisrdq.cup.hp.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
In-reply-to: Your message of Tue, 09 Sep 1997 10:34:54 PDT.
Date: Tue, 09 Sep 1997 13:07:01 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> If the intermediate
> queues are full or near full when a connection is established, the SYN
> (or SYN|ACK) would fill the last slot in the queue just like any data
> segment and would presumably preclude the addition of another segment
> (data, SYN, or ACK) just like a data segment. 

If I've queued N pure ack (or SYN) segments vs. N 1460 byte data packets,
it takes me about 3% as much time to drain the SYN segments from the
queue as the data packets.  That's a major difference from a network
load perspective.  A data packet leaving the network is a significantly
greater change in network load than a SYN leaving.
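
The 3% figure can be checked with simple arithmetic, assuming 40 bytes of
TCP/IP headers, a 4-byte MSS option on the SYN, 1460 bytes of payload per
data packet, a drain time proportional to packet size, and no link-layer
framing.  All of these are assumptions for illustration.

```python
HEADER_BYTES = 40       # IPv4 + TCP headers without options (assumption)
MSS_OPTION_BYTES = 4    # MSS option carried on a SYN (assumption)
PAYLOAD_BYTES = 1460    # data carried by a full-sized segment

pure_ack = HEADER_BYTES
syn = HEADER_BYTES + MSS_OPTION_BYTES
data = HEADER_BYTES + PAYLOAD_BYTES


def drain_percent(small: int, large: int) -> float:
    """Time to drain N small packets as a percentage of N large ones."""
    return round(100 * small / large, 1)


print(drain_percent(pure_ack, data))  # 2.7
print(drain_percent(syn, data))       # 2.9
```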

I agree that there's a lot of room for arguing/discussing here - but
I think the right list for it is end2end-interest.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 20:07:20 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA28551 for tcp-impl-list; Tue, 9 Sep 1997 20:04:18 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA28541 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 20:04:15 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id UAA18853
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 20:04:14 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id EAA26422; Wed, 10 Sep 1997 04:02:28 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0x8Xcs-0005FvC; Tue, 9 Sep 97 22:15 BST
Message-Id: <m0x8Xcs-0005FvC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: what IW is being used today?
To: vjs@mica.denver.sgi.com (Vernon Schryver)
Date: Tue, 9 Sep 1997 22:15:10 +0100 (BST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199709091443.IAA08257@mica.denver.sgi.com> from "Vernon Schryver" at Sep 9, 97 08:43:02 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> What do you figure is the right response when an HTTP client sends a
> few redundant ACKs and then a FIN?   Isn't that what your "big name"
> server with MTU discovery on and ICMP blocked will see?   How is that
> different from many other problems, starting with transient unidirectional
> blackholes (e.g. routing problems) and continuing with data-dependent
> packet loss?

One approach is to be a bit smarter with MTU discovery - turn discovery off
on retransmits, and if you see consistent cases of "transmit with DF, silence,
transmit without DF, ack", then turn it off for good on that session.
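
A minimal sketch of that heuristic follows.  The class, its method names,
and the threshold of two occurrences are hypothetical; no existing stack
is being described, and the sketch cannot distinguish a black hole from a
misjudged RTT.

```python
class DfHeuristic:
    """Per-session state for deciding whether to keep setting DF (hypothetical)."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold       # suspicious patterns tolerated (assumption)
        self.suspect_events = 0
        self.discovery_enabled = True
        self._df_send_timed_out = False

    def df_for_next_send(self, is_retransmit: bool) -> bool:
        # Discovery is switched off on retransmits, and for good once disabled.
        return self.discovery_enabled and not is_retransmit

    def on_timeout(self, sent_with_df: bool) -> None:
        self._df_send_timed_out = sent_with_df

    def on_ack(self, acked_send_had_df: bool) -> None:
        # Pattern: transmit with DF, silence, transmit without DF, ack.
        if self._df_send_timed_out and not acked_send_had_df:
            self.suspect_events += 1
            if self.suspect_events >= self.threshold:
                self.discovery_enabled = False
        self._df_send_timed_out = False


session = DfHeuristic()
for _ in range(2):
    session.on_timeout(sent_with_df=True)
    session.on_ack(acked_send_had_df=False)
print(session.discovery_enabled)  # False
```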

I'm not sure of a good way to tell that from getting the rtt wrong, however,
and I'd like to hear suggestions. The obvious answer is to educate the big
name sites' firewall configuration folks. So far that's proving to be a
demonstration of the shortage of clueful people on the internet.

Alan

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 20:12:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA29545 for tcp-impl-list; Tue, 9 Sep 1997 20:10:24 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA29540 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 20:10:22 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id UAA19879
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 20:10:20 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id EAA26444; Wed, 10 Sep 1997 04:07:14 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0x8XTP-0005FoC; Tue, 9 Sep 97 22:05 BST
Message-Id: <m0x8XTP-0005FoC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: what IW is being used today?
To: vern@ee.lbl.gov (Vern Paxson)
Date: Tue, 9 Sep 1997 22:05:23 +0100 (BST)
Cc: raj@hpisrdq.cup.hp.com, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199709092007.NAA27263@daffy.ee.lbl.gov> from "Vern Paxson" at Sep 9, 97 01:07:01 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> If I've queued N pure ack (or SYN) segments vs. N 1460 byte data packets,
> it takes me about 3% as much time to drain the SYN segments from the
> queue as the data packets.  That's a major difference from a network

There are other assumptions here to do with how routers buffer - be it
frames, bytes or fixed sized buffers.

> I agree that there's a lot of room for arguing/discussing here - but
> I think the right list for it is end2end-interest.

Definitely - it's way too complex for mere bug resolution.


From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 20:27:28 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA02284 for tcp-impl-list; Tue, 9 Sep 1997 20:25:05 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA02276 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 20:25:02 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id VAA10397 for tcp-impl@cthulhu.engr.sgi.com; Tue, 9 Sep 1997 21:24:57 -0600
Date: Tue, 9 Sep 1997 21:24:57 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199709100324.VAA10397@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: alan@lxorguk.ukuu.org.uk (Alan Cox)
> To: vjs (Vernon Schryver)
> Cc: tcp-impl@cthulhu.engr.sgi.com

> > What do you figure is the right response when an HTTP client sends a
> > few redundant ACKs and then a FIN?   Isn't that what your "big name"
> > server with MTU discovery on and ICMP blocked will see?   How is that
> > different from many other problems, starting with transient unidirectional
> > blackholes (e.g. routing problems) and continuing with data-dependent
> > packet loss?
> 
> One approach is to be a bit smarter with MTU discovery - turn discovery off
> on retransmits and if you see consistent cases of  transmit with DF, silence,
> transmit without DF ack then to turn it off for good on that session.

What does it mean to turn off MTU discovery on a session?  Do you mean
to drop down to some random MSS that is smaller than both of the MSS's
and the MTU's of the TCP peers?  Such as the old ~500, despite the
performance hit?  Or some other ad hoc number such as 1500?

Do you do this on all retransmissions?  Do you do some kind of
slow-increase on the MSS just in case you misdiagnosed the problem?


> I'm not sure of a good way to tell that from getting the rtt wrong, however,
> and I'd like to hear suggestions. The obvious answer is to educate the big
> name sites' firewall configuration folks. So far that's proving to be a
> demonstration of the shortage of clueful people on the internet.

I'm not sure of a good way to tell broken MTU discovery from the
typical broken PC-web-surfer.  I bet a lot of HTTP sessions wind down
into dup-ACK...dup-ACK... as some ISP's terminal server tries to cope
with sick modems and lame PC PPP code.


There is another tactic, "let 'em fix the network."  It's not as if
broken MTU discovery is the biggest or most common problem on the net
today.


I care a little about this problem because of a persistent trickle of
complaints about access to www.sgi.com by a few (but not all distant)
clients with local FDDI rings.  Based on probes around the Internet
from www.sgi.com, things are working fine within a bunch of hops of the
machine.  I suspect there are random routers throughout the Internet
that do not generate ICMP errors.  Or FDDI-Ethernet bridges such as
some models of Cabletron and NPI.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 20:41:31 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id UAA03966 for tcp-impl-list; Tue, 9 Sep 1997 20:39:04 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA03960 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 20:39:01 -0700
Received: from gecko.nas.nasa.gov (gecko.nas.nasa.gov [129.99.34.45]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id UAA25206
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 20:38:58 -0700
	env-from (kml@nas.nasa.gov)
Received: from gecko.nas.nasa.gov (kml@localhost)
	by gecko.nas.nasa.gov (8.8.7/NAS.6.1) with ESMTP id UAA18433; Tue, 9 Sep 1997 20:38:09 -0700 (PDT)
Message-Id: <199709100338.UAA18433@gecko.nas.nasa.gov>
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Path MTU discovery when ICMP can't get through
In-reply-to: Your message of "Tue, 09 Sep 1997 08:21:05 BST."
             <m0x8Kbi-0005FjC@lightning.swansea.linux.org.uk> 
Date: Tue, 09 Sep 1997 20:38:09 -0700
From: "Kevin M. Lahey" <kml@nas.nasa.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

In message <m0x8Kbi-0005FjC@lightning.swansea.linux.org.uk>, Alan Cox writes:
>Another thing we need to add to the list of tcp issues appears to be properly
>handling of MTU discovery in the presence of no ICMP. The number of big name
>web sites demonstrating a total lack of clues in configuring their firewall
>and hosts (block ICMP, mtu discovery on) - big names too - suggests we must
>make correct behaviour in the presence of failure of mtu discovery a
>SHOULD

Is there a standard way to do this?  Just dropping DF after some number
of timeouts sounds lame, but keeping track of the largest segment
successfully acknowledged sounds like it might get ugly.  What is 
the *right* solution?

Thanks,

Kevin

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 21:34:56 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA10000 for tcp-impl-list; Tue, 9 Sep 1997 21:32:35 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA09995 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 21:32:32 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id VAA05428
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 21:32:28 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id XAA11130;
	Tue, 9 Sep 1997 23:32:26 -0500 (CDT)
Date: Tue, 9 Sep 1997 23:32:26 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199709100432.XAA11130@frantic.BSDI.COM>
To: tcp-impl@cthulhu.engr.sgi.com, vjs@mica.denver.sgi.com
Subject: Re: what IW is being used today?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@cthulhu.engr.sgi.com Tue Sep  9 09:48:46 1997
> Date: Tue, 9 Sep 1997 08:43:02 -0600
> From: vjs@mica.denver.sgi.com (Vernon Schryver)
> To: tcp-impl@cthulhu.engr.sgi.com
> Subject: Re: what IW is being used today?
> Precedence: bulk
> 
> > From: alan@lxorguk.ukuu.org.uk (Alan Cox)
> > To: romano@magoo.rsch.comm.mot.com (Guy Romano)
> > Cc: raj@cup.hp.com, tcp-impl@cthulhu.engr.sgi.com
> 
> > ...
> > Another thing we need to add to the list of tcp issues appears to be properly
> > handling of MTU discovery in the presence of no ICMP. The number of big name
> > web sites demonstrating a total lack of clues in configuring their firewall
> > and hosts (block ICMP, mtu discovery on) - big names too - suggests we must
> > make correct behaviour in the presence of failure of mtu discovery a
> > SHOULD
> 
> 
> What do you figure is the right response when an HTTP client sends a
> few redundant ACKs and then a FIN?   Isn't that what your "big name"
> server with MTU discovery on and ICMP blocked will see?   How is that
> different from many other problems, starting with transient unidirectional
> blackholes (e.g. routing problems) and continuing with data-dependent
> packet loss?

The lack of getting back an ICMP message when a packet is dropped
due to it needing to be fragmented and DF being set is what I have
always referred to as "black hole detection".  There used to be routers
that could disable the generation of all ICMP host unreachable messages
to protect old 4.2BSD based hosts, but they went too far, breaking
PMTU Discovery.  RFC 1435 was written to address that issue.  An
over-aggressive firewall that blocks ICMP provides a situation that
is indistinguishable from the routers that didn't generate the ICMP
message.

I've not implemented black hole detection (in part, because if the
routers adhere to RFC 1435, you shouldn't need it), but one method
is fairly straightforward, even if it is a bit painful.

	1) You keep track of connections that have retransmissions.
	2) After a set number of retransmissions, you decide to turn
	   on the black hole detection.
	3) Black hole detection assumes that the packets are being
	   lost due to the DF being set and the ICMP message being
	   blocked.
	4) Go to your internal table of MTUs, and drop the connection
	   down to the next item.
	5) Go back to step 2
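
A minimal sketch of the five steps above in C, not taken from any real
stack.  The names (bh_conn, bh_on_retransmit, BH_RETX_THRESHOLD) and the
threshold of 3 are assumptions made for illustration; the table values
are the plateau list from RFC 1191.

```c
#include <stddef.h>

/* Plateau table in descending order (values from RFC 1191, Table 7-1). */
static const int mtu_table[] = {
    65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68
};

/* Assumed threshold; the text only says "a set number of retransmissions". */
#define BH_RETX_THRESHOLD 3

/* Hypothetical per-connection state for this sketch. */
struct bh_conn {
    int pmtu;   /* current path MTU estimate, in bytes */
    int retx;   /* retransmissions counted since the last step or ACK */
};

/* Largest table entry strictly below mtu, or the table minimum if none. */
static int next_lower_mtu(int mtu)
{
    size_t n = sizeof mtu_table / sizeof mtu_table[0];
    size_t i;

    for (i = 0; i < n; i++)
        if (mtu_table[i] < mtu)
            return mtu_table[i];
    return mtu_table[n - 1];
}

/* Steps 1-5: call once per retransmission.  Returns 1 if pmtu was lowered. */
static int bh_on_retransmit(struct bh_conn *c)
{
    int lower;

    if (++c->retx < BH_RETX_THRESHOLD)
        return 0;               /* step 2: threshold not reached yet */
    c->retx = 0;                /* step 5: start counting again */
    lower = next_lower_mtu(c->pmtu);
    if (lower == c->pmtu)
        return 0;               /* already at the bottom of the table */
    c->pmtu = lower;            /* steps 3-4: assume a black hole, step down */
    return 1;
}

/* Forward progress clears the count (an assumption of this sketch). */
static void bh_on_ack(struct bh_conn *c)
{
    c->retx = 0;
}
```

Starting from a 1500-byte estimate, the third consecutive retransmission
drops the connection to 1492, the next three to 1006, and so on down to
the 68-byte floor.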

You could also just turn off DF after a certain number of
retransmissions, and keep retransmitting.  If you then get
an ACK, assume that we've detected a black hole.  Turn back
on the DF, and drop down to the next smaller size in the MTU
table.

But in any case, it's all based on heuristics that make assumptions
about the lack of response from the remote host, and we can all come
up with situations where this type of heuristic would fail.  Hopefully
failure would just mean sending smaller than optimal data packets.

One could also keep track of the largest packet that has ever been
sent and received, and only do the black hole detection when sending
a packet larger than that value.  Of course, a change in the path
could cause that kind of decision to be inaccurate.
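
A rough sketch of that guard in C, reading "sent and received" as "sent
and acknowledged".  The structure and function names are hypothetical,
and resetting the record on a route change is an assumption added to
cover the path-change caveat.

```c
/* Hypothetical per-destination record for this sketch. */
struct pmtu_state {
    int largest_acked;  /* largest packet (bytes) known to have been ACKed */
};

/* Record a packet size once its data has been acknowledged. */
static void note_acked(struct pmtu_state *s, int pkt_len)
{
    if (pkt_len > s->largest_acked)
        s->largest_acked = pkt_len;
}

/* Assumed hook: forget the record when the route is known to have changed. */
static void note_route_change(struct pmtu_state *s)
{
    s->largest_acked = 0;
}

/*
 * Suspect a black hole only when the retransmission threshold is reached
 * AND the stalled packet is larger than anything already known to pass.
 */
static int should_suspect_black_hole(const struct pmtu_state *s,
                                     int pkt_len, int retx, int threshold)
{
    return retx >= threshold && pkt_len > s->largest_acked;
}
```

With this check, a connection that has already had 1500-byte packets
acknowledged treats repeated loss of another 1500-byte packet as ordinary
loss rather than as evidence of a black hole.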

I never followed through on implementing Black Hole Detection
because it didn't seem to me that the hassles of developing the
code were worth the payback.

			-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 22:05:59 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA13770 for tcp-impl-list; Tue, 9 Sep 1997 22:03:44 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA13764 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 22:03:41 -0700
Received: from lox.sandelman.ottawa.on.ca (lox.sandelman.ottawa.on.ca [205.233.54.146]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id WAA10297
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 22:03:35 -0700
	env-from (mcr@istari.sandelman.ottawa.on.ca)
Received: from istari.sandelman.ottawa.on.ca (istari.sandelman.ottawa.on.ca [205.233.54.136]) by lox.sandelman.ottawa.on.ca (8.7.5/8.7.3) with ESMTP id BAA01941 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 01:11:51 -0400 (EDT)
Received: from istari.sandelman.ottawa.on.ca ([[UNIX: localhost]]) by istari.sandelman.ottawa.on.ca (8.7.5/8.7.3) with ESMTP id BAA13135 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 01:02:50 -0400 (EDT)
Message-Id: <199709100502.BAA13135@istari.sandelman.ottawa.on.ca>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today? 
In-reply-to: Your message of "Tue, 09 Sep 1997 23:32:26 CDT."
             <199709100432.XAA11130@frantic.BSDI.COM> 
Date: Wed, 10 Sep 1997 01:02:45 -0400
From: "Michael C. Richardson" <mcr@sandelman.ottawa.on.ca>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

-----BEGIN PGP SIGNED MESSAGE-----


>>>>> "David" == David Borman <dab@BSDI.COM> writes:
    David> I've not implemented black hole detection (in part, because
    David> if the routers adhere to RFC 1435, you shouldn't need it),
    David> but one method is fairly straight forward, even if it is a
    David> bit painful.

    David> 	1) You keep track of connections that have
    David> retransmissions.  2) After a set number of retransmissions,
    David> you decide to turn on the black hole detection.  3) Black
    David> hole detection assumes that the packets are being lost due
    David> to the DF being set and the ICMP message being blocked.  4)
    David> Go to your internal table of MTUs, and drop the connection
    David> down to the next item.  5) Go back to step 2

  This is an excellent heuristic. Would it work for IPv6 packets? I
think so. I was thinking about something similar, but wasn't so
succinct. 

    David> You could also just turn off DF after a certain number of
    David> retransmissions, and keep retransmitting.  If you then get
    David> an ACK, assume that we've detected a black hole.  Turn back
    David> on the DF, and drop down to the next smaller size in the
    David> MTU table.

  This doesn't work, since the DF bit is always on for v6 packets. 

    David> I never followed through on implementing Black Hole
    David> Detection because it didn't seem to me that the hassles of
    David> developing the code were worth the payback.

  You may not get ICMP's back from many intermediate routers when
IPsec is involved. To put it simply: the IPsec SA doesn't allow
packets from that source to that destination to enter the tunnel.
  This *is* a real life situation, and it doesn't just affect the
routers over which the encrypted packets travel (as I had originally
thought in draft-richardson-ipsec-pmtu-discovery). For example:

                                  /-B
  A---Gw1---Internet---Gw2---R---R
                                  \-C

  Gw1/Gw2 may have an SA that allows subnet/host A to talk to
subnet/host B, but not C. (This is a real security scenario, see 
draft-moskowitz-ipsec-vpn)
  R's are routers. They may perform ingress filtering, so Gw2 is
confident that a packet from B could not have been spoofed by a node
from C. Maybe Gw2 and the router cloud are FDDI, and B's network is
ethernet. So, we'd like to have PMTU. 
  There are more details. Some might say that this is a bad network
design. 

   :!mcr!:            |  Network security programming, currently
   Michael Richardson | on contract with DataFellows F-Secure IPSec
 WWW: <A HREF="http://www.sandelman.ottawa.on.ca/People/Michael_Richardson/Bio.html">mcr@sandelman.ottawa.on.ca</A>. PGP key available.


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: latin1
Comment: Processed by Mailcrypt 3.4, an Emacs/PGP interface

iQB1AwUBNBYp16ZpLyXYhL+BAQGjsgL/YD5wkNpmHlR9v39jc7p9gD5CcmelYJm0
r4HhS8M1A9EfVW9Z8MPyo074KZt73RWI9NHEsO3pAMz9uXmqePn8l01xKgwIfIkF
UQ81YxOjQy5gBxky5dwWSj9RIIQ6rEfk
=VRyZ
-----END PGP SIGNATURE-----

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep  9 23:56:22 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA23625 for tcp-impl-list; Tue, 9 Sep 1997 23:54:03 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA23605 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 23:54:00 -0700
Received: from lestat.nas.nasa.gov (lestat.nas.nasa.gov [129.99.50.29]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA28471
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Sep 1997 23:54:00 -0700
	env-from (thorpej@lestat.nas.nasa.gov)
Received: from localhost (localhost [127.0.0.1]) by lestat.nas.nasa.gov (8.8.6/8.6.12) with SMTP id XAA04007; Tue, 9 Sep 1997 23:46:18 -0700 (PDT)
Message-Id: <199709100646.XAA04007@lestat.nas.nasa.gov>
X-Authentication-Warning: lestat.nas.nasa.gov: localhost [127.0.0.1] didn't use HELO protocol
To: "Kevin M. Lahey" <kml@nas.nasa.gov>
Cc: alan@lxorguk.ukuu.org.uk (Alan Cox), tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Path MTU discovery when ICMP can't get through 
Reply-To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Jason Thorpe <thorpej@nas.nasa.gov>
Date: Tue, 09 Sep 1997 23:46:08 -0700
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Tue, 09 Sep 1997 20:38:09 -0700 
 "Kevin M. Lahey" <kml@nas.nasa.gov> wrote:

[ PMTU in presence of no ICMP ]

 > Is there a standard way to do this?  Just dropping DF after some number
 > of timeouts sounds lame, but keeping track of the largest segment
 > successfully acknowledged sounds like it might get ugly.  What is 
 > the *right* solution?

I think the best approach here is to keep it simple - after your
threshold of timeouts, simply say "Ok, someone along the path is
being lame", and fall back on a `traditional' segment size computation
(such as BSD's MTU-for-local-addresses-else-default-mss hack).  I would
think that this would be a corner case (and thus not worth the effort of
a more complex solution), and maybe the resulting less-than-optimal
performance will convince people to fix their firewalls?

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                            Home: +1 408 866 1912
NAS: M/S 258-6                                       Work: +1 415 604 0935
Moffett Field, CA 94035                             Pager: +1 415 428 6939

From owner-tcp-impl@relay.engr.sgi.com  Wed Sep 10 07:36:13 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA08044 for tcp-impl-list; Wed, 10 Sep 1997 07:33:36 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA08034 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 07:33:32 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id IAA11180 for tcp-impl@cthulhu.engr.sgi.com; Wed, 10 Sep 1997 08:33:29 -0600
Date: Wed, 10 Sep 1997 08:33:29 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199709101433.IAA11180@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: David Borman <dab@BSDI.COM>
> To: tcp-impl@cthulhu.engr.sgi.com, vjs

> ...
> I never followed through on implementing Black Hole Detection
> because it didn't seem to me that the hassles of developing the
> code were worth the payback.

That has my vote.  Let 'em fix their routers instead of bloating
everyone else's kernels.


] To: "Kevin M. Lahey" <kml@nas.nasa.gov>
] Cc: alan@lxorguk.ukuu.org.uk (Alan Cox), tcp-impl@cthulhu.engr.sgi.com
] From: Jason Thorpe <thorpej@nas.nasa.gov>

] ...
] I think the best approach here is to keep it simple - after your
] threshold of timeouts, simply say "Ok, someone along the path is
] being lame", and fall back on a `traditional' segment size computation
] (such as BSD's MTU-for-local-addresses-else-default-mss hack).  I would
] think that this would be a corner case (and thus not worth the effort of
] a more complex solution), and maybe the resulting less-than-optimal
] performance will convince people to fix their firewalls?


Note that the MTU-for-local-addresses-else-default-mss hack is not
relevant.  A pair of BSD systems will have negotiated an MSS at the
beginning of the TCP connection that is no larger than the smaller of
their two local MTUs.

As I said last night, I have fairly good evidence that PMTU Discovery
is occasionally broken not by firewalls near the HTTP server, but
elsewhere.


Vernon Schryver,  vjs@sgi.com



From owner-tcp-impl@relay.engr.sgi.com  Wed Sep 10 08:21:14 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA16386 for tcp-impl-list; Wed, 10 Sep 1997 08:15:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA16372 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 08:15:57 -0700
Received: from motgate.mot.com (motgate.mot.com [129.188.136.100]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA09433
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 08:15:52 -0700
	env-from (romano@magoo.rsch.comm.mot.com)
Received: from pobox.mot.com (pobox.mot.com [129.188.137.100]) by motgate.mot.com (8.8.5/8.6.10/MOT-3.8) with ESMTP id KAA12578; Wed, 10 Sep 1997 10:15:48 -0500 (CDT)
Comments: ( Received on motgate.mot.com from client pobox.mot.com, sender romano@magoo.rsch.comm.mot.com )
Received: from il02dns1.comm.mot.com (il02dns1.comm.mot.com [145.1.3.2]) by pobox.mot.com (8.8.5/8.6.10/MOT-3.8) with ESMTP id KAA05378; Wed, 10 Sep 1997 10:15:29 -0500 (CDT)
Received: from magoo.rsch.comm.mot.com (magoo.comm.mot.com [145.1.80.34]) by il02dns1.comm.mot.com (8.7.5/8.7.3) with SMTP id KAA19667; Wed, 10 Sep 1997 10:15:22 -0500 (CDT)
Received: from localhost by magoo.rsch.comm.mot.com (4.1/SMI-4.1)
	id AA07932; Wed, 10 Sep 97 10:15:21 CDT
Message-Id: <9709101515.AA07932@magoo.rsch.comm.mot.com>
Cc: Rick Jones <raj@hpisrdq.cup.hp.com>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today? 
In-Reply-To: Your message of "Wed, 10 Sep 1997 09:46:04 CDT."
             <9709101446.AA07586@magoo.rsch.comm.mot.com> 
Date: Wed, 10 Sep 1997 10:15:20 -0500
From: Guy Romano <romano@magoo.rsch.comm.mot.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Many apologies, this obviously wasn't meant to go to the
tcp-impl group.  I'm not accustomed to this new mailer yet.

Guy

From owner-tcp-impl@relay.engr.sgi.com  Wed Sep 10 08:36:59 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA20599 for tcp-impl-list; Wed, 10 Sep 1997 08:34:35 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA20593 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 08:34:33 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA18808
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 08:34:27 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1]) by brookfield.ans.net (8.8.5/8.7.3) with ESMTP id LAA29354; Wed, 10 Sep 1997 11:33:42 -0400 (EDT)
Message-Id: <199709101533.LAA29354@brookfield.ans.net>
To: Vern Paxson <vern@ee.lbl.gov>
cc: curtis@ans.net, tcp-impl@cthulhu.engr.sgi.com
Reply-To: curtis@ans.net
Subject: Re: what IW is being used today? 
In-reply-to: Your message of "Mon, 08 Sep 1997 19:52:02 PDT."
             <199709090252.TAA24965@daffy.ee.lbl.gov> 
Date: Wed, 10 Sep 1997 11:33:42 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199709090252.TAA24965@daffy.ee.lbl.gov>, Vern Paxson writes:
> (Not speaking in an official capacity.)
> 
> > BSD variants account for a high percentage
> > mostly *because they perform well*, but also because the PC BSD
> > variants (BSDI, FreeBSD, etc) are very cost effective (IW=1).  Solaris
> > and other SysV based Unix (like HP/UX) are also widely used despite
> > not performing as well (IW=2).  NT is less popular as a web server
> > since performance is worse yet but still significant due to religious
> > convictions of some (IW=2?).
> 
> This is just about all backwards.  BSD-based Web servers have IW=2
> due to the bug of taking the SYN-ack as advancing cwnd.  Solaris doesn't
> do this, and got nailed in a Web benchmarking study because it has
> to wait for the delayed-ack timer to get the first slow-start ack, while
> the violators don't.  My traces of NT show it likewise uses IW=1.
> 
> 		Vern


Thanks for the correction.

Sounds like IW=2 is more important for avoiding problems with delayed
ACK (one clock tick can be 500 msec) than it is for getting the window
fully open in one less RTT.
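
A toy calculation of that effect in C.  It assumes a receiver that
ACKs at once when a second full segment arrives and otherwise waits out
its whole delayed-ACK timer; first_ack_ms and its parameters are
hypothetical names, and real stacks differ in timer granularity.

```c
/*
 * Time in milliseconds until the sender sees the first ACK for data,
 * under the simplified receiver model described above.
 */
static int first_ack_ms(int iw_segments, int rtt_ms, int delack_ms)
{
    /* A lone segment sits until the delayed-ACK timer fires. */
    int wait_ms = (iw_segments >= 2) ? 0 : delack_ms;

    return rtt_ms + wait_ms;
}
```

With a 100 ms RTT and a worst-case 500 ms timer, this model gives 600 ms
to the first ACK for IW=1 and 100 ms for IW=2, which is a far larger gap
than the single RTT saved in opening the window.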

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Wed Sep 10 09:27:09 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA03372 for tcp-impl-list; Wed, 10 Sep 1997 09:24:17 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA03359 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 09:24:15 -0700
Received: from lestat.nas.nasa.gov (lestat.nas.nasa.gov [129.99.50.29]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA04058
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 09:24:10 -0700
	env-from (thorpej@lestat.nas.nasa.gov)
Received: from localhost (localhost [127.0.0.1]) by lestat.nas.nasa.gov (8.8.6/8.6.12) with SMTP id JAA11228 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 09:17:42 -0700 (PDT)
Message-Id: <199709101617.JAA11228@lestat.nas.nasa.gov>
X-Authentication-Warning: lestat.nas.nasa.gov: localhost [127.0.0.1] didn't use HELO protocol
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today? 
Reply-To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Jason Thorpe <thorpej@nas.nasa.gov>
Date: Wed, 10 Sep 1997 09:17:40 -0700
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Wed, 10 Sep 1997 08:33:29 -0600 
 vjs@mica.denver.sgi.com (Vernon Schryver) wrote:

 > Note that the MTU-for-local-addresses-else-default-mss hack is not
 > relevant.  A pair of BSD systems will have negotiated an MSS at the
 > beginning of the TCP connection that is no larger than the smaller of
 > their two local MTUs.

In my mind, it is relevant, in that the size of the segment you want to send
should be the min of the peer's advertised MSS and the discovered path MTU.
If you can't determine the latter, you need to pick some "safe" segment
size which, in traditional BSD systems, is tcp_mssdflt [512] (or the
interface MTU, for local addresses or subnets if subnets-are-local is
enabled).

You're right that it would not be relevant if one were to fall back on
the traditional BSD behavior of _when_ this computation is performed.
However, I'm not suggesting crippling MSS negotiation like traditional
BSD systems do.

I'm suggesting doing a similar computation each time a segment is to be
transmitted (I probably wasn't clear on that); it's not that expensive to
perform.

I think that this is a bit like Dave's Black Hole Detection, but picks a 
quick and easy fallback in lieu of Dave's more complex algorithm.
Pseudo-code:

int
tcp_segsize(connection)
{

	if (path has associated MTU) {
		segsize = route's MTU;
	} else if (connection has a black hole) {
		segsize = default [small] mss;
	} else {
		segsize = interface MTU;
	}

	return (min(peer's advertised MSS, segsize));
}

If done for every segment to be transmitted, this should also do the `right
thing' in the event the path changes.  It might make sense to switch the
order of the "path MTU" and "black hole" checks...
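
For reference, one way the pseudo-code above might look as compilable C.
The struct seg_conn and its fields are hypothetical, and so is the
subtraction of 40 bytes of IP and TCP header (no options) to turn an MTU
into a segment size, which the pseudo-code glosses over.  The 512
default is the tcp_mssdflt value quoted earlier.

```c
#define TCP_MSSDFLT   512   /* "default [small] mss" */
#define IP_TCP_HDRLEN 40    /* assumed: 20-byte IP + 20-byte TCP, no options */

/* Hypothetical per-connection view; a real stack keeps this elsewhere. */
struct seg_conn {
    int peer_mss;    /* MSS advertised by the peer */
    int route_mtu;   /* discovered path MTU, or 0 if the route has none */
    int if_mtu;      /* MTU of the outgoing interface */
    int black_hole;  /* nonzero once the timeout threshold has been hit */
};

/* Segment size for the next transmission, in the order given above. */
static int tcp_segsize(const struct seg_conn *c)
{
    int segsize;

    if (c->route_mtu > 0)
        segsize = c->route_mtu - IP_TCP_HDRLEN;  /* path has an MTU */
    else if (c->black_hole)
        segsize = TCP_MSSDFLT;                   /* quick, safe fallback */
    else
        segsize = c->if_mtu - IP_TCP_HDRLEN;     /* nothing better known */

    return c->peer_mss < segsize ? c->peer_mss : segsize;
}
```

Because nothing is cached between calls, clearing black_hole or updating
route_mtu takes effect on the very next segment.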

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                            Home: +1 408 866 1912
NAS: M/S 258-6                                       Work: +1 415 604 0935
Moffett Field, CA 94035                             Pager: +1 415 428 6939

From owner-tcp-impl@relay.engr.sgi.com  Wed Sep 10 09:48:11 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA10220 for tcp-impl-list; Wed, 10 Sep 1997 09:43:37 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA10197 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 09:43:33 -0700
Received: from alpha.xerox.com (alpha.Xerox.COM [13.1.64.93]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id JAA10070
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 09:43:09 -0700
	env-from (fenner@parc.xerox.com)
Received: from crevenia.parc.xerox.com ([13.2.116.11]) by alpha.xerox.com with SMTP id <54160(5)>; Wed, 10 Sep 1997 09:42:58 PDT
Received: from localhost by crevenia.parc.xerox.com with SMTP id <177486>; Wed, 10 Sep 1997 09:42:29 -0700
To: Jason Thorpe <thorpej@nas.nasa.gov>
cc: "Kevin M. Lahey" <kml@nas.nasa.gov>, alan@lxorguk.ukuu.org.uk (Alan Cox),
        tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Path MTU discovery when ICMP can't get through 
In-reply-to: Your message of "Tue, 09 Sep 97 23:46:08 PDT."
             <199709100646.XAA04007@lestat.nas.nasa.gov> 
Date: Wed, 10 Sep 1997 09:42:20 PDT
From: Bill Fenner <fenner@parc.xerox.com>
Message-Id: <97Sep10.094229pdt.177486@crevenia.parc.xerox.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Jason Thorpe <thorpej@nas.nasa.gov> wrote:
>I think the best approach here is to keep it simple

I agree.  I've been thinking about this for a while, and although the
idea of "kicking" the MTU discovery portion of IP to reduce the MTU
after N TCP retransmissions seems attractive, it seems like there are
too many ways for it to get "false positives" for a normal event like a
route flap or high congestion or high delay.

However, the best way to keep it simple might just be to stop setting
DF.  The "fall-back-to-small-PMTU-value" doesn't necessarily work if
you keep DF set; I observed this problem on a link with an MTU of ~300
a couple of months ago.  (I guess it would work if your "small PMTU value"
was 68, but that's probably not a good idea =)
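
The failure can be put as a two-line model in C.  This is only an
illustration: crosses_link is a hypothetical name, and it assumes the
ICMP "fragmentation needed" message never reaches the sender.

```c
/*
 * 1 if a packet of pkt_len bytes gets across a link with the given MTU,
 * 0 if it is dropped without the sender ever hearing about it.
 */
static int crosses_link(int pkt_len, int link_mtu, int df_set)
{
    if (pkt_len <= link_mtu)
        return 1;       /* fits as-is */
    return !df_set;     /* too big: fragmented if DF is clear, dropped if set */
}
```

With an assumed 576-byte fallback and a 300-byte link, the packet is
still lost while DF stays set and passes (as fragments) once DF is
cleared.  Only a 68-byte fallback is small enough to pass with DF set
on every link.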

  Bill

From owner-tcp-impl@relay.engr.sgi.com  Wed Sep 10 10:37:52 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA29305 for tcp-impl-list; Wed, 10 Sep 1997 10:33:53 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA29293 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 10:33:50 -0700
Received: from lox.sandelman.ottawa.on.ca ([205.233.54.146]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA02879
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 10:33:42 -0700
	env-from (mcr@istari.sandelman.ottawa.on.ca)
Received: from istari.sandelman.ottawa.on.ca (istari.sandelman.ottawa.on.ca [205.233.54.136]) by lox.sandelman.ottawa.on.ca (8.7.5/8.7.3) with ESMTP id NAA05842 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 13:44:25 -0400 (EDT)
Received: from istari.sandelman.ottawa.on.ca ([[UNIX: localhost]]) by istari.sandelman.ottawa.on.ca (8.7.5/8.7.3) with ESMTP id NAA16544 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 13:35:25 -0400 (EDT)
Message-Id: <199709101735.NAA16544@istari.sandelman.ottawa.on.ca>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Path MTU discovery when ICMP can't get through 
In-reply-to: Your message of "Wed, 10 Sep 1997 09:42:20 PDT."
             <97Sep10.094229pdt.177486@crevenia.parc.xerox.com> 
Date: Wed, 10 Sep 1997 13:35:24 -0400
From: "Michael C. Richardson" <mcr@sandelman.ottawa.on.ca>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

-----BEGIN PGP SIGNED MESSAGE-----


>>>>> "Bill" == Bill Fenner <fenner@parc.xerox.com> writes:
    Bill> However, the best way to keep it simple might just be to
    Bill> stop setting DF.  The "fall-back-to-small-PMTU-value"

  Again, this isn't an option for v6.

  Given that any change that is proposed will likely take into v6
deployment (may *cause* v6.. "oh, and TCP is generally faster") it has
to take this into account.

   :!mcr!:            |  Network security programming, currently
   Michael Richardson | on contract with DataFellows F-Secure IPSec
 WWW: <A HREF="http://www.sandelman.ottawa.on.ca/People/Michael_Richardson/Bio.html">mcr@sandelman.ottawa.on.ca</A>. PGP key available.



-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: latin1
Comment: Processed by Mailcrypt 3.4, an Emacs/PGP interface

iQB1AwUBNBbaWqZpLyXYhL+BAQFGeQL5AYAUbwWV6Y6nXSEfYXFq8zd2i8T/J5hA
abHe1o/7v9tpVc0Jk+IkmCEHwa5ToAbabi9Ox9ara18o1Y4cA3k8hpl0qdTOaVoU
WTsWGVLRpniWJY7t+gaHZ70kwqnGRFNe
=Wep9
-----END PGP SIGNATURE-----

From owner-tcp-impl@relay.engr.sgi.com  Wed Sep 10 14:10:57 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA05929 for tcp-impl-list; Wed, 10 Sep 1997 14:07:54 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA05910 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 14:07:50 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA18390
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 10 Sep 1997 14:07:47 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id WAA13866; Wed, 10 Sep 1997 22:05:19 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0x8t8Q-0005FxC; Wed, 10 Sep 97 21:13 BST
Message-Id: <m0x8t8Q-0005FxC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: what IW is being used today?
To: curtis@ans.net
Date: Wed, 10 Sep 1997 21:13:10 +0100 (BST)
Cc: vern@ee.lbl.gov, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199709101533.LAA29354@brookfield.ans.net> from "Curtis Villamizar" at Sep 10, 97 11:33:42 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Sounds like IW=2 is more important for avoiding problems with delayed
> ACK (one clock tick can be 500 msec) than it is for getting the window
> fully open in one less RTT.

So should we fix the delayed ack timer policy on those systems or use IW=2 ?
The former seems to work out on slow networks.

From owner-tcp-impl@relay.engr.sgi.com  Sat Sep 13 13:42:07 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA07831 for tcp-impl-list; Sat, 13 Sep 1997 13:40:48 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA07824 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 13 Sep 1997 13:40:46 -0700
Received: from lrc.di.epfl.ch (lrcsun14.epfl.ch [128.178.156.56]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA23156
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 13 Sep 1997 13:40:41 -0700
	env-from (manthorp@lrc.di.epfl.ch)
Received: (from manthorp@localhost) by lrc.di.epfl.ch (8.6.12/8.6.9) id WAA07516; Sat, 13 Sep 1997 22:40:22 +0200
From: Sam Manthorpe <manthorp@lrc.di.epfl.ch>
Message-Id: <199709132040.WAA07516@lrc.di.epfl.ch>
Subject: Re: tcp simulation module or suit
To: chchoi@mmlab.snu.ac.kr (Changho Choi)
Date: Sat, 13 Sep 1997 22:40:22 +0200 (MET DST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199709080103.LAA03892@mmlab.snu.ac.kr> from "Changho Choi" at Sep 8, 97 11:03:49 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 511       
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hi,

> I want to get a information of TCP simulation.
> Is there anybody who knows about that?

As well as the ns simulator, you can have a look at the
STCP simulator:

    http://lrcwww.epfl.ch/~manthorp/stcp/

Sam.

---------------------------------------------------------------
Sam Manthorpe, Laboratoire de Reseaux de Communications (LRC),
Ecole Polytechnique Federale de Lausanne, 1015 Lausanne, Suisse.
tel:+41 21 693 6749 fax: +41 21 693 6610
web: http://lrcwww.epfl.ch     email: manthorpe@di.epfl.ch

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep 15 03:12:54 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA06193 for tcp-impl-list; Mon, 15 Sep 1997 03:11:37 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA06188 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 15 Sep 1997 03:11:33 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id DAA14997
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 15 Sep 1997 03:11:33 -0700
	env-from (Jerry.Chu@eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id DAA27385; Mon, 15 Sep 1997 03:11:31 -0700
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id DAA18865; Mon, 15 Sep 1997 03:11:28 -0700
Received: from taipei.eng.sun.com (taipei [129.146.86.158])
	by jurassic.eng.sun.com (8.8.7+Sun.Alpha.7/8.8.7) with SMTP id DAA03403;
	Mon, 15 Sep 1997 03:11:29 -0700 (PDT)
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id DAA26153; Mon, 15 Sep 1997 03:07:45 -0700
Date: Mon, 15 Sep 1997 03:07:45 -0700
From: Jerry.Chu@eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199709151007.DAA26153@taipei.eng.sun.com>
To: alan@lxorguk.ukuu.org.uk
Subject: Re: what IW is being used today?
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > Sounds like IW=2 is more important for avoiding problems with delayed
> > ACK (one clock tick can be 500 msec) than it is for getting the window
> > fully open in one less RTT.
> 
> So should we fix the delayed ack timer policy on those systems or use IW=2 ?
> The former seems to work out on slow networks.
> 

That's what Solaris has done for years (not delaying the ack of the first
data segment), and it's one of the reasons why we didn't discover the IW=1 vs
delayed-ack interoperability hiccup sooner (because we don't see this
in Solaris-to-Solaris testing).
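
As a rough illustration of the hiccup described above, here is a small
Python model. It is not taken from any real stack. It assumes a receiver
that ACKs at once when two segments arrive back to back, and otherwise holds
the ACK for the full delayed-ACK timer interval (the worst case, with no
reverse data to piggyback on). The function name and parameters are made up
for this sketch.

```python
def time_to_first_ack_ms(iw_segments, rtt_ms, delack_ms,
                         ack_first_segment_immediately=False):
    """Time from sending the initial window until its first ACK returns.

    Simplified model: no loss, no piggybacked data, and the delayed-ACK
    timer always costs its full interval when it is the only trigger.
    """
    # Two or more segments arriving together trigger an immediate ACK
    # under the usual "ACK every second segment" rule.
    if iw_segments >= 2:
        return rtt_ms
    # A receiver that never delays the ACK of the first data segment
    # (the policy described in the message above) also answers at once.
    if ack_first_segment_immediately:
        return rtt_ms
    # Otherwise a lone segment waits for the delayed-ACK timer to fire.
    return rtt_ms + delack_ms
```

With a 100 ms RTT and a 500 ms timer tick, this model gives 600 ms for IW=1
against a delaying receiver, and 100 ms for either IW=2 or a receiver that
ACKs the first segment immediately.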

We were compelled to add a new tcp tunable "tcp_slow_start_initial" in
Solaris 2.6 in order to circumvent this problem. For a complete story,
check out the following web page:

http://www.sun.com/sun-on-net/performance/tcp.slowstart.html

Jerry

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep 15 09:18:40 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA26030 for tcp-impl-list; Mon, 15 Sep 1997 09:17:08 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA25622 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 15 Sep 1997 09:16:36 -0700
Received: from mail3.microsoft.com (mail3.microsoft.com [131.107.3.23]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA27472
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 15 Sep 1997 09:16:29 -0700
	env-from (peterf@microsoft.com)
Received: by mail3.microsoft.com with Internet Mail Service (5.0.1459.27)
	id <S9TKS1V7>; Mon, 15 Sep 1997 09:17:13 -0700
Message-ID: <40357F94775ECF118C2800805FD46CBD0292E0F4@RED-69-MSG.dns.microsoft.com>
From: Peter Ford <peterf@microsoft.com>
To: "'Jerry.Chu@Eng.Sun.COM'" <Jerry.Chu@Eng.Sun.COM>,
        alan@lxorguk.ukuu.org.uk
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: RE: what IW is being used today?
Date: Mon, 15 Sep 1997 09:16:12 -0700
X-Priority: 3
X-Mailer: Internet Mail Service (5.0.1459.27)
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


There is a lot of empirical evidence that indicates IW=2 works.   There
also does not seem to be evidence that IW=2 is a bad thing.

Sun's stated direction appears to be on the mark given this state of
affairs:

"Sun is actively participating in an effort in IETF to revise TCP
specification to allow more packets to be sent initially. Once the
revision is ratified, Sun will take the appropriate actions to upgrade
Solaris TCP accordingly." 
[http://www.sun.com/sun-on-net/performance/tcp.slowstart.html]




Cheers,
Peter


	-----Original Message-----
	From:	Jerry.Chu@Eng.Sun.COM [SMTP:Jerry.Chu@Eng.Sun.COM]
	Sent:	Monday, September 15, 1997 3:08 AM
	To:	alan@lxorguk.ukuu.org.uk
	Cc:	tcp-impl@cthulhu.engr.sgi.com
	Subject:	Re: what IW is being used today?

	> > Sounds like IW=2 is more important for avoiding problems with delayed
	> > ACK (one clock tick can be 500 msec) than it is for getting the window
	> > fully open in one less RTT.
	> 
	> So should we fix the delayed ack timer policy on those systems or use IW=2 ?
	> The former seems to work out on slow networks.
	> 

	That's what Solaris has done for years (not delaying the ack of the first
	data segment), and it's one of the reasons why we didn't discover the IW=1 vs
	delayed-ack interoperability hiccup sooner (because we don't see this
	in Solaris-to-Solaris testing).

	We were compelled to add a new tcp tunable "tcp_slow_start_initial" in
	Solaris 2.6 in order to circumvent this problem. For a complete story,
	check out the following web page:

	http://www.sun.com/sun-on-net/performance/tcp.slowstart.html

	Jerry

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep 15 12:39:36 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA16940 for tcp-impl-list; Mon, 15 Sep 1997 12:35:48 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA16923 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 15 Sep 1997 12:35:45 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA03248
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 15 Sep 1997 12:35:39 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id UAA22127; Mon, 15 Sep 1997 20:29:57 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xAggU-0005FsC; Mon, 15 Sep 97 20:19 BST
Message-Id: <m0xAggU-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: what IW is being used today?
To: peterf@microsoft.com (Peter Ford)
Date: Mon, 15 Sep 1997 20:19:46 +0100 (BST)
Cc: Jerry.Chu@Eng.Sun.COM, alan@lxorguk.ukuu.org.uk,
        tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <40357F94775ECF118C2800805FD46CBD0292E0F4@RED-69-MSG.dns.microsoft.com> from "Peter Ford" at Sep 15, 97 09:16:12 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> There is a lot of empirical evidence that indicates IW=2 works.   There
> also does not seem to be evidence that IW=2 is a bad thing.

I know only one verified example - the 9600 baud half duplex radio link here.
On that IW=2 sucks badly (but in general stuff sucks pretty badly over it).
I've not found anything else it made worse (28.8,14.4 modem, 64K circuits
and ethernet)

Alan


From owner-tcp-impl@relay.engr.sgi.com  Mon Sep 15 16:19:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA11887 for tcp-impl-list; Mon, 15 Sep 1997 16:16:52 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA11876 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 15 Sep 1997 16:16:50 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA13846
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 15 Sep 1997 16:16:49 -0700
	env-from (Jerry.Chu@eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id QAA10626; Mon, 15 Sep 1997 16:16:47 -0700
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id QAA21100; Mon, 15 Sep 1997 16:16:43 -0700
Received: from taipei.eng.sun.com (taipei [129.146.86.158])
	by jurassic.eng.sun.com (8.8.7+Sun.Alpha.7/8.8.7) with SMTP id QAA20008;
	Mon, 15 Sep 1997 16:16:45 -0700 (PDT)
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id QAA26732; Mon, 15 Sep 1997 16:13:02 -0700
Date: Mon, 15 Sep 1997 16:13:02 -0700
From: Jerry.Chu@eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199709152313.QAA26732@taipei.eng.sun.com>
To: raj@hpisrdq.cup.hp.com
Subject: Re: what IW is being used today?
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> but doesn't sending an immediate ack for the first data segment mean
> *more* packets (ACKs) in flight across the network, and then more congestion?
> 

If all tcp implementations converge on IW=2 at some point, I guess
we can switch back to the standard delayed-ack for the first data segment
to conserve packets.

But I vaguely remember a case where an http server making multiple write()
calls, each consisting of a small chunk of data, bumped into either Nagle or
silly-window avoidance and exposed a similar performance problem w/
delayed-acks. In general I'm more concerned about p-http, where many tcp
performance anomalies may be exposed. Any comments?

Jerry



From owner-tcp-impl@relay.engr.sgi.com  Mon Sep 15 17:39:24 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA09925 for tcp-impl-list; Mon, 15 Sep 1997 17:37:47 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from neteng.engr.sgi.com (neteng.engr.sgi.com [192.26.80.10]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA09907; Mon, 15 Sep 1997 17:37:45 -0700
Received: from localhost (sca@localhost) by neteng.engr.sgi.com (970903.SGI.8.8.7/960327.SGI.AUTOCF) via SMTP id RAA462644; Mon, 15 Sep 1997 17:37:44 -0700 (PDT)
Message-Id: <199709160037.RAA462644@neteng.engr.sgi.com>
To: minutes@ietf.org, tcp-impl@cthulhu.engr.sgi.com
Subject: Munich Minutes
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <462288.874370264.1@neteng.engr.sgi.com>
Date: Mon, 15 Sep 1997 17:37:44 -0700
From: Steve Alexander <sca@neteng.engr.sgi.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Apologies for being so late with these...

-- Steve

The TCP/IP Implementor's Working Group met in Munich, on Monday, August 11th,
at 7:30 PM.

Steve Alexander presented an overview of recent changes that have been
made to the "Known Problems" I-D.  There were two, namely:
	- a discussion of keepalive problems was added
	- the "significance" category now includes a description of
	  environments in which the problem is significant

Steve Parker then gave an overview of the current status of the "Testing Tools"
I-D.  For each tool listed, the document provides information on:
	- name
	- category
	- availability
	- description
	- an overview of how automated the tool is
	- other references

Currently the following tools are described:
	- dummynet, netperf, orchestra, packet shell, tcpanaly, tcptrace,
	  tcplook, treno

Feedback from the audience suggested two additions:
	- sock (Rich Stevens' test program); Kent Malave agreed to provide
	  a description of this
	- SPINS, a protocol verification package

The next item on the agenda was an overview of proposed changes to RFC 2001.
2001 is being revised to allow implementations which implement the required
algorithms in an acceptable manner, but which may differ slightly from
4.3-BSD-Reno, to be considered conformant to RFC 2001.  At the same time, it
is proposed that the initial slow-start window should be increased to two
segments.

There is an internet draft written by Floyd/Allman/Partridge, which discusses
some additional proposed changes.  Craig Partridge gave an overview of terms:
	- IW; initial congestion window
	- RW; restart congestion window after idle
	- LW; congestion window used after a loss

Sally Floyd indicated that more research was needed to ensure that the correct
trade-offs are made between good scenarios and bad.

Craig Partridge asserted that no simulations done to date showed any problems
with raising IW to two segments, but agreed that more cross-traffic simulation
is probably a good idea.

Allyn Romanow pointed out that not all of the changes in the Floyd/Allman/
Partridge draft were being considered for the RFC 2001 update; only increasing
IW to two segments.

Van Jacobson gave a historical overview of the reasons for the IW being one
segment initially, namely a very early ethernet card that could only buffer
a single frame on receive.  Van also pointed out that dropped SYNs were a
large problem with current web traffic.

It was pointed out that if the congestion window oscillated between one segment
and two, this might be less bursty than with four.

Matt Mathis pointed out that the current spec (2001) isn't completely
up-to-date with all of the latest TCP congestion control enhancements.

Craig agreed that more simulation should be done to ensure that fairness
issues are handled correctly.

Van pointed out that timing is really used for congestion control.  He
explained that the RTT est. is not really used to estimate round trip time, but
rather as a clock to determine when it is likely that a packet has left the
network.  Van suggested that using the max rtt might be better than using the
single RTT, which is a lower bound.

Craig Partridge volunteered to write up the RTO algorithm (Karn's) as an RFC,
since it is not currently documented in the RFC series.

Bob Braden mentioned that the reason that a lot of Van's early work is not
explained in RFC 1122 is that the 1122 working group didn't think that they
could explain it as well, and that it provided an incentive to read the paper
and get the whole story.  It was suggested that Van should convert the paper
into an RFC; Van seemed amenable.

After the congestion control discussion ended, a list of outstanding problems
for the "Known Problems" I-D was presented, and again, volunteers were
solicited to contribute text to the I-D.

Bernard Volz mentioned a new problem, namely implementations that only send a
FIN after all outstanding data has been ACKed; this should be added to
the I-D.



A discussion of the IRTF End2end group's position on research vs. engineering
was skipped.  This was because the summary had been posted to the mailing list
but generated little interest.

The meeting then adjourned.

Action items:
	- Kent Malave, write up description of 'sock'
	- Ian Heavens, write up issues around half-duplex close
	- Van Jacobson, possibly write up latest SIGCOMM paper as RFC
	- Craig Partridge, write I-D on Karn's algorithm

-- Steve Alexander, Vern Paxson

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep 15 17:44:08 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA11970 for tcp-impl-list; Mon, 15 Sep 1997 17:42:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA11963 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 15 Sep 1997 17:42:58 -0700
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.235]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA06783
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 15 Sep 1997 17:42:54 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel1.hp.com (8.8.6/8.8.5tis) with SMTP id RAA07664
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 15 Sep 1997 17:42:53 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA01579; Mon, 15 Sep 1997 17:40:06 -0700
Message-Id: <341DD566.5B8D@cup.hp.com>
Date: Mon, 15 Sep 1997 17:40:06 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: Hsiao-keng Jerry Chu <Jerry.Chu@eng.Sun.COM>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
References: <199709152313.QAA26732@taipei.eng.sun.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> But I vaguely remember a case where an http server making multiple write()
> calls, each consisting of a small chunk of data, bumped into either Nagle or
> silly-window avoidance and exposed a similar performance problem w/
> delayed-acks.

That though is a case of a seriously broken http server _application_.
All data that is logically associated should (must) be provided to the
transport in a single call to writev() or sendmsg() if the application
is to expect decent performance.

At least, that is how I got one (maybe two) web server vendors to write
their code :)

When the application provides the data in one call to the transport, and
then the transports (plural) preclude timely delivery through no fault
of the network, that is a broken transport _pairing_.
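
A minimal Python sketch of the single-call advice above, assuming a Unix-like
system where socket.sendmsg() is available. The function name and the
fallback for a partial send are choices made for this illustration, not
anything a particular web server does.

```python
import socket


def send_response(sock, header, body):
    """Hand logically associated data to the transport in one call.

    sendmsg() takes a list of buffers, much like writev() in C, so the
    header and body reach TCP together instead of as two small writes
    that can run into Nagle and delayed ACKs.
    """
    total = len(header) + len(body)
    sent = sock.sendmsg([header, body])
    if sent < total:
        # The kernel accepted only part of the data; push the remainder.
        sock.sendall((header + body)[sent:])
    return total
```

The contrast is with calling sock.send(header) followed by sock.send(body),
where the second small write may sit in the sender until the first one has
been acknowledged.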

rick jones
sometimes those benchmarks do good things...

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep 16 02:04:50 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA07753 for tcp-impl-list; Tue, 16 Sep 1997 02:03:00 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA07738 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 16 Sep 1997 02:02:54 -0700
Received: from callisto.ftel.co.uk (callisto.ftel.co.uk [192.131.79.11]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id CAA01697
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 16 Sep 1997 02:02:52 -0700
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (shsQqCqArkd70HAyE5puRQMkxOCyrtNP@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id KAA18627;
	Tue, 16 Sep 1997 10:02:44 +0100 (BST)
Message-ID: <341E4B33.2CFE@ftel.co.uk>
Date: Tue, 16 Sep 1997 10:02:43 +0100
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Hsiao-keng Jerry Chu <Jerry.Chu@Eng.Sun.COM>
CC: tcp-impl@cthulhu.engr.sgi.com
Subject: Initial ssthresh
References: <199709151007.DAA26153@taipei.eng.sun.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> That's what Solaris has done for years (not delaying the ack of the first
> data segment), and it's one of the reasons why we didn't discover the IW=1 vs
> delayed-ack interoperability hiccup sooner (because we don't see this
> in Solaris-to-Solaris testing).
> 
> We were compelled to add a new tcp tunable "tcp_slow_start_initial" in
> Solaris 2.6 in order to circumvent this problem. For a complete story,
> check out the following web page:
> 
> http://www.sun.com/sun-on-net/performance/tcp.slowstart.html



I have developed a TCP simulation model that works as described in
RFC2001 (+other RFCs). I have observed some strange start-up behaviour
with TCP, which is related to the above, but to do with ssthresh, and
not the initial window, IW.

I have a system with an Ethernet feeding into a buffer (= a bridge or
router) which transmits at slow speed (e.g. 64kbps).
  The window opens quite quickly, fills up the buffer, and delays go up
quickly, resulting in time-outs and re-transmissions. For a few seconds
the system is OK, but then it becomes highly congested, but eventually
settles down.
  The basic problem is that ssthresh is set to 65535 bytes, and hence
the cwnd initially opens (almost) exponentially fast, filling the
upstream buffer quickly.

A solution is to set ssthresh = 1 (MSS). This gives a more controlled
(linear) start-up. But, in a high speed network I guess that such a
change would be detrimental (we want the window to open quickly).
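
To make the two start-up behaviours concrete, here is a small Python sketch
of window growth per round trip. It is a simplification and not the
simulation model mentioned above: windows are counted in segments rather
than bytes, every segment is ACKed individually, and there is no loss or
receiver window limit. The function name is hypothetical.

```python
def cwnd_per_rtt(ssthresh, rounds, iw=1.0):
    """Return the congestion window (in segments) after each round trip."""
    cwnd = float(iw)
    history = [cwnd]
    for _ in range(rounds):
        # One ACK comes back for every whole segment sent this round.
        for _ in range(int(cwnd)):
            if cwnd <= ssthresh:
                cwnd += 1.0          # slow start: one segment per ACK
            else:
                cwnd += 1.0 / cwnd   # congestion avoidance: ~1 per RTT
        history.append(cwnd)
    return history
```

With ssthresh at 44 segments (about 65535 bytes, assuming a 1460-byte MSS)
the window doubles every round trip and reaches 32 segments after five
rounds. With ssthresh = 1 it grows by less than one segment per round trip
after the first and is still below 6 segments after five rounds.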


I have only simulated this, so does anyone have comments on whether this
is:

a) a real effect
b) a bug in my understanding
c) a bug in my model


Graham Cope

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep 16 11:54:51 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA16537 for tcp-impl-list; Tue, 16 Sep 1997 11:52:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA16456 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 16 Sep 1997 11:51:50 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA20841
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 16 Sep 1997 11:51:43 -0700
	env-from (Jerry.Chu@eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id LAA19633; Tue, 16 Sep 1997 11:51:35 -0700
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id LAA11049; Tue, 16 Sep 1997 11:51:33 -0700
Received: from taipei.eng.sun.com (taipei [129.146.86.158])
	by jurassic.eng.sun.com (8.8.7+Sun.Alpha.7/8.8.7) with SMTP id LAA03113;
	Tue, 16 Sep 1997 11:51:32 -0700 (PDT)
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id LAA27569; Tue, 16 Sep 1997 11:47:49 -0700
Date: Tue, 16 Sep 1997 11:47:49 -0700
From: Jerry.Chu@eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199709161847.LAA27569@taipei.eng.sun.com>
To: G.Cope@ftel.co.uk
Subject: Re: Initial ssthresh
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


>  The basic problem is that ssthresh is set to 65535 bytes, and hence
>the cwnd initially opens (almost) exponentially fast, filling the
>upstream buffer quickly.

Many TCP implementations, including Solaris, cache route attributes
like ssthresh so that there is a better understanding of what the
bottle-neck buffer size may be after the first run, and thus a better
initial value to use for ssthresh. If things work out as expected, you
should see less and less initial congestion as you repeat the test.
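
A toy Python sketch of the caching idea, for illustration only. It is not
how Solaris or any other stack implements it; the class, its method names,
and keying the cache by a destination string are all assumptions, and a real
implementation would likely apply extra conditions before trusting a cached
value.

```python
DEFAULT_SSTHRESH = 65535  # bytes; starting value when nothing is cached


class RouteMetricsCache:
    """Remember the last ssthresh seen for each destination."""

    def __init__(self):
        self._ssthresh_by_dest = {}

    def initial_ssthresh(self, dest):
        # A new connection starts from the cached value if one exists.
        return self._ssthresh_by_dest.get(dest, DEFAULT_SSTHRESH)

    def record_on_close(self, dest, final_ssthresh):
        # Save what the closing connection learned about the path.
        self._ssthresh_by_dest[dest] = final_ssthresh
```

In this sketch the first connection to a destination starts with 65535, and
a later one starts from whatever value the earlier connection recorded when
it closed.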

Jerry


From owner-tcp-impl@relay.engr.sgi.com  Tue Sep 16 14:03:04 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA00625 for tcp-impl-list; Tue, 16 Sep 1997 14:01:33 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA00607 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 16 Sep 1997 14:01:30 -0700
Received: from firewall.agranat.com (agranat.com [146.115.131.2]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA28655
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 16 Sep 1997 14:01:27 -0700
	env-from (lawrence@devnix.agranat.com)
Received: from agranat.com (s1 [192.104.71.130]) by firewall.agranat.com (8.6.12/8.6.9) with ESMTP id RAA20991 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 16 Sep 1997 17:01:20 -0400
Received: from devnix.agranat.com (root@devnix.agranat.com [192.104.71.180])
	by agranat.com (8.8.5/8.8.5) with ESMTP id RAA17673
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 16 Sep 1997 17:01:18 -0400
Received: from devnix.agranat.com (lawrence@localhost [127.0.0.1]) by devnix.agranat.com (8.8.5/8.6.9devnix) with ESMTP id RAA21610 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 16 Sep 1997 17:02:35 -0400
Message-Id: <199709162102.RAA21610@devnix.agranat.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Munich Minutes
In-reply-to: <199709160037.RAA462644@neteng.engr.sgi.com>
Date: Tue, 16 Sep 1997 17:02:30 -0400
From: "Scott Lawrence" <lawrence@agranat.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


>>>>> "SA" == Steve Alexander <sca@neteng.engr.sgi.com> writes:

SA> Bernard Volz mentioned a new problem, namely implementations that
SA> only send a FIN after all outstanding data has been ACKed; this
SA> should be added to the I-D.

  Can a summary of this issue be sent to the list as well so that we
  need not wait for the next draft to see it?

--
Scott Lawrence           EmWeb Embedded Server       <lawrence@agranat.com>
Agranat Systems, Inc.        Engineering            http://www.agranat.com/

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep 16 15:07:44 1997
Received: (from majordomo@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA22697 for tcp-impl-list; Tue, 16 Sep 1997 15:06:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA22681 for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 16 Sep 1997 15:05:59 -0700
Received: from alcor.process.com (alcor.process.com [192.42.95.16]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id PAA19595
	for <TCP-IMPL@CTHULHU.ENGR.SGI.COM>; Tue, 16 Sep 1997 15:05:56 -0700
	env-from (VOLZ@PROCESS.COM)
Date:     Tue, 16 Sep 1997 18:05 -0400
From: VOLZ@PROCESS.COM (Bernie Volz)
Message-Id: <009BA65397121686.10B8@PROCESS.COM>
To: lawrence@agranat.com, TCP-IMPL@cthulhu.engr.sgi.com
Subject:  Re: Munich Minutes
X-VMS-To: SMTP%"lawrence@agranat.com"
X-VMS-Cc: TCP-IMPL@CTHULHU.ENGR.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>SA> Bernard Volz mentioned a new problem, namely implementations that
>SA> only send a FIN after all outstanding data has been ACKed; this
>SA> should be added to the I-D.
>
>  Can a summary of this issue be sent to the list as well so that we
>  need not wait for the next draft to see it?
>--
>Scott Lawrence           EmWeb Embedded Server       <lawrence@agranat.com>

Here's a write-up (following the format used in DRAFT-IETF-TCPIMPL-PROB)
on the additional TCP implementation problem that I raised at the TCP
Implementors Working Group session in Munich. This is pretty much what
I sent Craig and Vern in late August.

- Bernie Volz
  Process Software Corporation

-----

Failure to Send FIN Notifications Promptly

Classification
	Performance

Description
	When the application "closes" the connection, TCP should send
	any remaining data and the FIN notification promptly to the
	peer. A few implementations will delay sending (any remaining
	data) and the FIN notification until the connection is idle
	(i.e., all unacknowledged data is acknowledged). This results
	in reduced performance as connection closure is typically
	delayed by the delayed acknowledgement of the peer (which is
	nominally 200 milliseconds).

	The TCP send policy algorithm should include a condition for
	a pending FIN. If there is sufficient TCP window space to
	send the remaining data, it should be sent immediately with
	the FIN indication regardless of how little data is available.
	If no data is remaining to be sent, the FIN should be sent
	immediately.

	Sending of the FIN should never be delayed if all data has
	been sent or can be sent with the FIN. Unacknowledged data
	must not delay sending of the FIN unless the window does not
	allow sending any remaining unsent data.

	Note that a FIN segment should also always have the PSH flag
	set.
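
The send policy described above can be sketched in Python as follows. This
is an illustration only: the function name, its parameters, the dict it
returns, and the simplified Nagle-style check for the non-closing case are
all assumptions, and the sketch ignores the sequence number consumed by the
FIN itself.

```python
def next_segment(unsent, unacked, usable_window, mss, close_pending):
    """Decide what to transmit now; return None to hold off.

    unsent        -- bytes queued but not yet transmitted
    unacked       -- bytes transmitted but not yet acknowledged
    usable_window -- bytes the peer's window still allows
    """
    send_len = min(unsent, usable_window, mss)
    if close_pending and send_len == unsent:
        # All remaining data (possibly none) fits: send it with FIN and
        # PSH right away, no matter how much is still unacknowledged.
        return {"len": send_len, "fin": True, "psh": True}
    if send_len == 0:
        return None  # nothing can be sent, e.g. the window is closed
    if send_len == mss or unacked == 0:
        # Full-sized segment, or an idle connection: send the data.
        return {"len": send_len, "fin": False, "psh": send_len == unsent}
    return None  # small segment with data outstanding: wait for the ACK
```

For example, with 40 octets unsent, 10 octets unacknowledged and a close
pending, this returns a 40-octet segment carrying FIN and PSH, whereas the
same state without a pending close holds the data back.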

Significance
	Can lead to reduced performance especially for heavily used
	services such as HTTP.

Implications
	Reduced performance.

Trace file demonstrating it
	Made using tcpdump (no losses reported):

	10:04:38.68 A > B: S 1031850376:1031850376(0) win 4096
			<mss 1460,wscale 0,eol> (DF)
	10:04:38.71 B > A: S 596916473:596916473(0) ack 1031850377
			win 8760 <mss 1460> (DF)
	10:04:38.73 A > B: . ack 1 win 4096 (DF)
	10:04:41.98 A > B: P 1:4(3) ack 1 win 4096 (DF)
	10:04:42.15 B > A: . ack 4 win 8757 (DF)
	10:04:42.23 A > B: P 4:7(3) ack 1 win 4096 (DF)
	10:04:42.25 B > A: P 1:11(10) ack 7 win 8754 (DF)
	10:04:42.32 A > B: . ack 11 win 4096 (DF)
	10:04:42.33 B > A: P 11:51(40) ack 7 win 8754 (DF)
	10:04:42.51 A > B: . ack 51 win 4096 (DF)
	10:04:42.53 B > A: F 51:51(0) ack 7 win 8754 (DF)
	10:04:42.56 A > B: FP 7:7(0) ack52 win 4096 (DF)
	10:04:42.58 B > A: . ack 8 win 8754 (DF)

	Machine B in the trace above does not send out a FIN
	notification promptly if there is any data outstanding. It
	instead waits for all unacknowledged data to be acknowledged
	before sending the FIN bit. The connection was closed at
	10:04:42.33 after requesting 40 octets to be sent. However,
	the FIN notification wasn't sent until 10:04:42.53 (after the
	(delayed) acknowledgement of the 40 octets of data arrived at
	10:04:42.51).

Trace file demonstrating correct behavior:
	Made using tcpdump (no losses reported):

	10:27:53.85 C > D: S 419744533:419744533(0) win 4096
			<mss 1460,wscale 0,eol> (DF)
	10:27:53.92 D > C: S 10082297:10082297(0) ack 419744534
			win 8760 <mss 1460> (DF)
	10:27:53.95 C > D: . ack 1 win 4096 (DF)
	10:27:54.62 D > C: . ack 4 win 8757 (DF)
	10:27:54.76 C > D: P 4:7(3) ack 1 win 4096 (DF)
	10:27:54.89 D > C: P 1:11(10) ack 7 win 8754 (DF)
	10:27:54.90 D > C: FP 11:51(40) ack7 win 8754 (DF)
	10:27:54.92 C > D: . ack 52 win 4096 (DF)
	10:27:55.01 C > D: FP 7:7(0) ack 52 win 4096 (DF)
	10:27:55.09 D > C: . ack 8 win 8754 (DF)

	Here, Machine D sends the FIN with the 40 octets of data even
	before the original 10 octets have been acknowledged. This is
	correct behavior as it provides for the highest performance.


From owner-tcp-impl@relay.engr.sgi.com  Fri Sep 19 03:32:31 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA14804 for tcp-impl-list; Fri, 19 Sep 1997 03:31:15 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA14799 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 19 Sep 1997 03:31:13 -0700
Received: from fly.cnuce.cnr.it (foda-devel.cnuce.cnr.it [131.114.192.86]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id DAA13983
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 19 Sep 1997 03:31:07 -0700
	env-from (pot@fly.cnuce.cnr.it)
Received: by fly.cnuce.cnr.it (Smail3.1.26.7 #3)
	id m0xC0N5-0004XnC; Fri, 19 Sep 97 12:33 MET
Message-Id: <m0xC0N5-0004XnC@fly.cnuce.cnr.it>
Date: Fri, 19 Sep 97 12:33 MET
From: Francesco Potorti` <F.Potorti@cnuce.cnr.it>
To: VOLZ@PROCESS.COM (Bernie Volz)
CC: tcp-impl@cthulhu.engr.sgi.com, lawrence@agranat.com
In-reply-to: <009BA65397121686.10B8@PROCESS.COM> (VOLZ@PROCESS.COM)
Subject: Re: Munich Minutes
Organization: CNUCE-CNR, Via S.Maria 36, Pisa - Italy +39-50-593211
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I would like to point out a typo and a problem in the traces you sent
together with your text "Failure to Send FIN Notifications Promptly":

Typo:   
   	10:04:42.56 A > B: FP 7:7(0) ack52 win 4096 (DF)
                                     ^^^^^

Problem:
   	10:27:53.85 C > D: S 419744533:419744533(0) win 4096
   			<mss 1460,wscale 0,eol> (DF)
   	10:27:53.92 D > C: S 10082297:10082297(0) ack 419744534
   			win 8760 <mss 1460> (DF)
   	10:27:53.95 C > D: . ack 1 win 4096 (DF)
   	10:27:54.62 D > C: . ack 4 win 8757 (DF)

In the trace above, a packet is missing (the one from station C
that carries bytes 1:4(3) of data, which station D acknowledges with ack 4).

Regards

-- 
Francesco Potorti` (researcher)        Voice:    +39-50-593203
Computer Network Division              Operator: +39-50-593211
CNUCE-CNR, Via Santa Maria 36          Fax:      +39-50-904052
56126 Pisa - Italy                     Email:    F.Potorti@cnuce.cnr.it

From owner-tcp-impl@relay.engr.sgi.com  Mon Sep 22 10:17:24 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA15705 for tcp-impl-list; Mon, 22 Sep 1997 10:15:25 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA15687 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 22 Sep 1997 10:15:24 -0700
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA10981
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 22 Sep 1997 10:15:16 -0700
	env-from (sparker@fstop.Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.25]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id KAA17776 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 22 Sep 1997 10:15:16 -0700
Received: from fstop. by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id KAA16188; Mon, 22 Sep 1997 10:15:14 -0700
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id KAA19811; Mon, 22 Sep 1997 10:15:01 -0700
Message-Id: <199709221715.KAA19811@fstop.>
From: sparker@Eng.Sun.COM
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Munich Minutes 
Date: Mon, 22 Sep 1997 10:15:01 -0700
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


With respect to the TCP testing tools I-D, the minutes mention:

- Feedback from the audience suggested two additions:
- 	- sock (Rich Stevens' test program); Kent Malave agreed to provide
- 	  a description of this

Kent, if you're out there, please drop me a line.

- 	- SPINS, a protocol verification package

So I reviewed this, and I'm not so sure it's helpful to include this.
SPINS cannot be used to test an implementation, only to take a translation
of an implementation, and prove the protocol behaves according to the
assertions posited.

While it's a fine tool, I wonder whether it should be considered relevant,
given that by its nature it cannot test an existing implementation directly.
Or is it worth including separately as a textual footnote?

I'd like input from the group on this.

Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Wed Sep 24 05:53:48 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA08090 for tcp-impl-list; Wed, 24 Sep 1997 05:52:40 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA08070 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 24 Sep 1997 05:52:25 -0700
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id FAA02791
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 24 Sep 1997 05:49:17 -0700
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id SAA11924; Wed, 24 Sep 1997 18:18:51 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA17617; Wed, 24 Sep 97 18:18:49+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id SAA13660;
	Wed, 24 Sep 1997 18:21:42 GMT
Date: Wed, 24 Sep 1997 18:21:42 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: tcp-impl@cthulhu.engr.sgi.com
Cc: chetan@sam.ece.iisc.ernet.in
Subject: TCP on Mobile IP
In-Reply-To: <341E4B33.2CFE@ftel.co.uk>
Message-Id: <Pine.LNX.3.95.970924181659.13446B-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

 
Hello there !

Can anyone tell me if there is a mailing list for TCP on Mobile IP?

Or is there any place where I can get more information about it?

Any help is appreciated.

Thanks in advance
*****************************************************************
chetan . S

    ::::::::::: TREE SAVES THOSE WHO SAVE TREES ::::::::::::


E-mail chetan@protocol.ece.iisc.ernet.in

WEB PAGE - http://pclab.ece.iisc.ernet.in/chetan

Snail mail
 #104 ,East Park Road,
8 th Cross,Malleshwarm,
Bangalore,
Karnataka,India.
pin 560003.

Phone 
	work place
		(080)3092282
	res.
		(080)3349218      
		(080)3347220


From owner-tcp-impl@relay.engr.sgi.com  Fri Sep 26 12:58:16 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA25994 for tcp-impl-list; Fri, 26 Sep 1997 12:57:09 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA25972 for <tcp-impl@engr.sgi.com>; Fri, 26 Sep 1997 12:57:07 -0700
Received: from aware.com (gateway.aware.com [192.80.75.194]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA15466
	for <tcp-impl@engr.sgi.com>; Fri, 26 Sep 1997 12:56:46 -0700
	env-from (thuan@aware.com)
Received: by gateway.aware.com id <11651>; Fri, 26 Sep 1997 15:54:45 -0400
Date: Fri, 26 Sep 1997 15:54:17 -0400
Message-Id: <97Sep26.155445edt.11651@gateway.aware.com>
From: Thuan Tran <thuan@aware.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: tcp-impl@engr.sgi.com
X-Mailer: VM 6.34 under Emacs 19.34.6
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

subscribe


From owner-tcp-impl@relay.engr.sgi.com  Fri Sep 26 14:34:02 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA13262 for tcp-impl-list; Fri, 26 Sep 1997 14:32:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA13240 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 26 Sep 1997 14:32:39 -0700
Received: from monsoon.dial.pipex.net (monsoon.dial.pipex.net [158.43.128.69]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id OAA13994
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 26 Sep 1997 14:32:36 -0700
	env-from (aa296@dial.pipex.com)
Message-Id: <199709262132.OAA13994@sgi.sgi.com>
Received: (qmail 25719 invoked from network); 26 Sep 1997 21:32:33 -0000
Received: from ae248.du.pipex.com (HELO steve) (193.130.244.248)
  by smtp.dial.pipex.com with SMTP; 26 Sep 1997 21:32:33 -0000
Reply-To: <aa296@dial.pipex.com>
From: "Steven Wass" <aa296@dial.pipex.com>
To: <tcp-impl@cthulhu.engr.sgi.com>
Subject: Fw: 
Date: Fri, 26 Sep 1997 22:37:52 +0100
X-MSMail-Priority: Normal
X-Priority: 3
X-Mailer: Microsoft Internet Mail 4.70.1161
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



----------
> From: Thuan Tran <thuan@aware.com>
> To: tcp-impl@cthulhu.engr.sgi.com
> Subject: 
> Date: Friday, September 26, 1997 8:54 PM
> 
> subscribe

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep 30 10:15:09 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA24430 for tcp-impl-list; Tue, 30 Sep 1997 10:13:02 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA24388 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 10:12:59 -0700
Received: from merit.edu (merit.edu [198.108.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA18544
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 10:12:53 -0700
	env-from (wsimpson@greendragon.com)
Received: from Bill.Simpson.DialUp.Mich.Net (pm054-24.dialip.mich.net [141.211.6.162])
	by merit.edu (8.8.7/8.8.5) with SMTP id NAA02317
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 13:12:50 -0400 (EDT)
Date: Tue, 30 Sep 97 16:18:07 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <6596.wsimpson@greendragon.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > peterf@microsoft.com (Peter Ford) opines:
> > There is a lot of empirical evidence that indicates IW=2 works.   There
> > also does not seem to be evidence that IW=2 is a bad thing.
>
> From: alan@lxorguk.ukuu.org.uk (Alan Cox)
> I know only one verified example - the 9600 baud half duplex radio link here.
> On that IW=2 sucks badly (but in general stuff sucks pretty badly over it).
> I've not found anything else it made worse (28.8,14.4 modem, 64K circuits
> and ethernet)
>
I have had similar experience to Alan at 9600 (and slower).  There is
still a fair amount of that around here.  Merit upgraded the 2400
only last year -- I expect it will be a while until all the 9600 will be
replaced.

I have seen IW=2 work badly on 28.8 when talking to SunOS (with POP3).

I consistently get a retransmit on the 4th (and sometimes 3rd) packet
coming back, which is probably the window opening just enough to send 4
datagrams in response to my 1 plus Syn, and then timing out when the
Acks are not fast enough.  I don't have a trace on both ends, so this is
guesswork.

And there is a lot of SunOS still out there.  Folks don't upgrade (at
Sun upgrade prices).  And look at how much W3.1 is still around at
cheaper M$ upgrade prices!

So, I'm more in favor of recommending no delay on the first Ack after
Syn+Ack, rather than IW=2.

How about _MUST_ no delay first Ack after idle, and _MAY_ IW=RW=2?

That will handle performance for new/upgraded systems, while not
harming traffic with old ones unless deliberately configured.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep 30 18:59:38 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA03402 for tcp-impl-list; Tue, 30 Sep 1997 18:58:17 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA03397 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 18:58:11 -0700
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.235]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA04800
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 18:58:10 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel1.hp.com (8.8.6/8.8.5tis) with SMTP id SAA16935
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 18:58:09 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA03714; Tue, 30 Sep 1997 18:54:53 -0700
Message-Id: <3431AD6C.31A6@cup.hp.com>
Date: Tue, 30 Sep 1997 18:54:52 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: William Allen Simpson <wsimpson@greendragon.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
References: <6596.wsimpson@greendragon.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> So, I'm more in favor of recommending no delay on the first Ack after
> Syn+Ack, rather than IW=2.
> 
> How about _MUST_ no delay first Ack after idle, and _MAY_ IW=RW=2?
> 
> That will handle performance for new/upgraded systems, while not
> harming traffic with old ones unless deliberately configured.

Except it will put an "extra" ACK packet on the network for small
"transactions" larger than one MSS. I'd rather not add packets to each
transaction; that just feels like asking for even more congestion -
perhaps even more than the burstiness from IW=2.

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep 30 19:36:09 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA07726 for tcp-impl-list; Tue, 30 Sep 1997 19:34:48 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA07664 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 19:34:40 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA12080
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 19:34:39 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id TAA06796; Tue, 30 Sep 1997 19:34:16 -0700 (PDT)
Message-Id: <199710010234.TAA06796@daffy.ee.lbl.gov>
To: "William Allen Simpson" <wsimpson@greendragon.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
In-reply-to: Your message of Tue, 30 Sep 1997 16:18:07 PST.
Date: Tue, 30 Sep 1997 19:34:15 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

(Not speaking in an IETF capacity)

> I have seen IW=2 work badly on 28.8 when talking to SunOS (with POP3).
> ...
> I consistently get a retransmit on the 4th (and sometimes 3rd) packet

So how is this significantly different from IW=1?  One RTT later, you're
in the same situation, cwnd=2.
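
To make the arithmetic behind that concrete, here is a rough sketch of
idealized slow start.  It assumes every segment is ACKed separately, each
ACK grows cwnd by one segment, and nothing is lost or delayed; the function
name is made up for illustration and does not come from any real stack.

```python
def slow_start_cwnd(initial_window, rtts):
    """Return cwnd (in segments) at the start of each of `rtts` round trips.

    Idealized model: one ACK per segment, +1 segment of cwnd per ACK,
    no loss and no delayed ACKs, so cwnd doubles every round trip.
    """
    cwnd = initial_window
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        cwnd += cwnd  # every segment in flight is ACKed, each ACK adds one
    return history


if __name__ == "__main__":
    print("IW=1:", slow_start_cwnd(1, 5))  # [1, 2, 4, 8, 16]
    print("IW=2:", slow_start_cwnd(2, 5))  # [2, 4, 8, 16, 32]
```

Under these assumptions the IW=2 sequence is just the IW=1 sequence shifted
by one round trip.  Delayed ACKs and a coarse retransmit timer, which are
exactly what the report above involves, are deliberately left out.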

> And there is a lot of SunOS still out there ...

SunOS is Tahoe-derived and behaves in many ways like other BSD
implementations.  So if it was SunOS then what you're describing is
probably widespread.

> How about _MUST_ no delay first Ack after idle, and _MAY_ IW=RW=2?

I'd go with MAY and MAY; or perhaps SHOULD and MAY.  Certainly the intent
for IW=2 is MAY and not MUST.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep 30 22:32:30 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA25374 for tcp-impl-list; Tue, 30 Sep 1997 22:31:07 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA25364 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 22:31:06 -0700
Received: from merit.edu (merit.edu [198.108.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id WAA13349
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 22:31:03 -0700
	env-from (wsimpson@greendragon.com)
Received: from Bill.Simpson.DialUp.Mich.Net (pm035-05.dialip.mich.net [141.211.7.16])
	by merit.edu (8.8.7/8.8.5) with SMTP id BAA12180;
	Wed, 1 Oct 1997 01:30:59 -0400 (EDT)
Date: Wed, 1 Oct 97 03:59:15 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <6608.wsimpson@greendragon.com>
To: Vern Paxson <vern@ee.lbl.gov>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Vern Paxson <vern@ee.lbl.gov>
> > I have seen IW=2 work badly on 28.8 when talking to SunOS (with POP3).
> > ...
> > I consistently get a retransmit on the 4th (and sometimes 3rd) packet
>
> So how is this significantly different from IW=1?  One RTT later, you're
> in the same situation, cwnd=2.
>
For whatever reason, when I tuned the code back to IW=1, the retransmits
went away, even when delaying the first Ack.  I can guess that the SRTT
backs off enough that the initial RTT smoothed with a longer 2nd packet
RTT gives enough delay that the following timeout is long enough.  Just
a guess.

But IW=2 gives noticeably decreased throughput, particularly for POP3,
where each message is only a few MSS.  Unnecessary retransmits are
a real bandwidth hog on a 28.8 link.


> SunOS is Tahoe-derived and behaves in many ways like other BSD
> implementations.  So if it was SunOS then what you're describing is
> probably widespread.
>
"If"?  I talk to this system daily.  I have this tendency to watch the
SRTT bounce around as I FTP or POP3 from anywhere.  Just idle laziness,
I suppose.  Nothing better to do.  ;-(

For jollies, here is a trace:

  1: 141.211.7.31    pm035-20.dialip.mich.net. (1 ms) (1 ms) (1 ms)
  2: 141.211.7.10    pm035-aa.mich.net. (132 ms) (128 ms) (116 ms)
  3: 141.211.7.2     E-CCB-DIAL1.c-CCB2.UMNET.UMICH.EDU. (116 ms) (112 ms) (87 ms)
  4: 198.108.3.1     fdd3-0.michnet1.mich.net. (157 ms) (117 ms) (144 ms)
  5: 192.203.195.5   cpe3-fddi-1.WillowSprings.mci.net. (140 ms) (146 ms) (113 ms)
  6: 166.48.23.253   bordercore2-hssi1-0.WillowSprings.mci.net. (145 ms) (373 ms) (382 ms)
  7: 204.70.4.9      core3.Atlanta.mci.net. (174 ms) *** (156 ms)
  8: *** 206.157.77.50   ast-psi-nap.Atlanta.mci.net. (207 ms) (149 ms)
  9: 38.1.2.5        core.ithaca.ny.nsf.psi.net. (691 ms) (257 ms) (718 ms)
 10: 38.1.25.2       rc2.southeast.us.psi.net. (172 ms) (175 ms) (157 ms)
 11: 38.1.45.195     dublin.oh.southeast.us.psi.net. (176 ms) (221 ms) (172 ms)
 12: 38.1.45.195     dublin.oh.southeast.us.psi.net. (204 ms) (192 ms) (183 ms)
 13: 38.146.112.110  dublin.oh.isdn.psi.net. (190 ms) (195 ms) (215 ms)
 14: 137.175.1.1     mstar-gate.MorningStar.Com. (221 ms) (239 ms) (232 ms)
 15: 137.175.1.2     link-1.MorningStar.Com. (253 ms) (230 ms) (216 ms)
 16: 137.175.2.11    volitans.MorningStar.Com. (431 ms) (237 ms) (212 ms)

Note that the RTTs have quite a range.  That's why I don't trust the
simulations where folks are reporting such nice numbers for IW=2.  They
just don't match the real world experience shown by a simple trace.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32
BSimpson@MorningStar.com
    Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2

From owner-tcp-impl@relay.engr.sgi.com  Tue Sep 30 22:54:20 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA27963 for tcp-impl-list; Tue, 30 Sep 1997 22:51:33 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA27956 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 22:51:31 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id WAA16893
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Sep 1997 22:51:30 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id WAA07061; Tue, 30 Sep 1997 22:51:27 -0700 (PDT)
Message-Id: <199710010551.WAA07061@daffy.ee.lbl.gov>
To: "William Allen Simpson" <wsimpson@greendragon.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: what IW is being used today?
In-reply-to: Your message of Wed, 01 Oct 1997 03:59:15 PST.
Date: Tue, 30 Sep 1997 22:51:27 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> For whatever reason, when I tuned the code back to IW=1, the retransmits
> went away, even when delaying the first Ack.

Hmmmm, it would be interesting to understand this better (in your copious
spare time).  In my thesis I found that the RTT adaptation almost always
worked correctly for BSD-derived TCPs, including SunOS.  But the slowest
path in the study was 56 Kbps.

>   1: 141.211.7.31    pm035-20.dialip.mich.net. (1 ms) (1 ms) (1 ms)
>  ...
>  16: 137.175.2.11    volitans.MorningStar.Com. (431 ms) (237 ms) (212 ms)
> 
> Note that the RTTs have quite a range.  That's why I don't trust the
> simulations where folks are reporting such nice numbers for IW=2.  They
> just don't match the real world experience shown by a simple trace.

Traceroute is not a "simple trace" in terms of RTTs.  There's noise in
generating the ICMP Time Exceeded (this is one of the problems Van tries to
tackle with pathchar) that has nothing to do with the network path.
I certainly agree, though, that sometimes end-to-end packet times have a
large amount of variation.  The key question is whether the current RTT
algorithms, properly implemented, fail to deal with it.  The evidence
of which I'm aware says that the BSD versions of the algorithms are
appropriately conservative - and could be tightened considerably, though
exactly how to do that is still a research area.
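
For readers who want the shape of the estimator being discussed, here is a
floating-point sketch of the Jacobson-style smoothed RTT and mean deviation
calculation.  It is an illustration only: the class name is invented, the
first-sample initialization follows the later RFC 6298 text, and the
BSD-derived stacks actually use scaled integer arithmetic driven by a coarse
clock tick, which this ignores.

```python
class RttEstimator:
    """Smoothed RTT plus mean deviation, RTO = SRTT + 4 * RTTVAR (sketch)."""

    ALPHA = 1 / 8  # gain applied to the SRTT update
    BETA = 1 / 4   # gain applied to the RTTVAR update
    K = 4          # deviation multiplier in the RTO

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def update(self, sample_ms):
        """Fold one RTT sample (milliseconds) in and return the new RTO."""
        if self.srtt is None:
            # First measurement: assumed initialization, see lead-in.
            self.srtt = float(sample_ms)
            self.rttvar = sample_ms / 2.0
        else:
            err = abs(self.srtt - sample_ms)
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * err
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample_ms
        return self.rto()

    def rto(self):
        return self.srtt + self.K * self.rttvar


if __name__ == "__main__":
    est = RttEstimator()
    # Made-up samples with a wide spread, purely for illustration.
    for sample in (431, 237, 212):
        print(sample, "ms ->", est.update(sample), "ms RTO")
```

The point of the deviation term is that a widely varying RTT inflates the
timeout rather than triggering it; whether a given implementation achieves
that on a slow link is the open question above.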

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Oct  2 16:23:56 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA15476 for tcp-impl-list; Thu, 2 Oct 1997 16:20:19 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA15462 for <tcp-impl@engr.sgi.com>; Thu, 2 Oct 1997 16:20:17 -0700
Received: from utmsi (utmsi.zo.utexas.edu [192.138.168.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA19925
	for <tcp-impl@engr.sgi.com>; Thu, 2 Oct 1997 16:20:15 -0700
	env-from (The.University.of.Texas.at.Austin,Marine.Science.Institute,75@utmsi.zo.utexas.edu)
Received: from msi19.zo.utexas.edu by utmsi (5.0/SMI-SVR4)
	id AA13815; Thu, 2 Oct 1997 14:48:40 +0600
Message-Id: <3.0.32.19971002144405.006ddfcc@utmsi.zo.utexas.edu>
X-Sender: afamos@utmsi.zo.utexas.edu
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Thu, 02 Oct 1997 14:44:06 -0500
To: tcp-impl@engr.sgi.com
From: "Anthony F. Amos" <The.University.of.Texas.at.Austin,Marine.Science.Institute,75@utmsi.zo.utexas.edu>
Subject: What is it?
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
content-length: 0
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Nowhere in the discussion of TCP/IP do I find what the acronym TCP/IP
actually means.  Please tell me?  Also do you know what TICP stands for?
Thanks, Tony Amos



From owner-tcp-impl@relay.engr.sgi.com  Fri Oct  3 22:23:43 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA04642 for tcp-impl-list; Fri, 3 Oct 1997 22:22:20 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA04635 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 3 Oct 1997 22:22:18 -0700
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id WAA27898
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 3 Oct 1997 22:21:32 -0700
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id KAA18582; Sat, 4 Oct 1997 10:51:04 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA26133; Sat, 4 Oct 97 10:51:04+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id KAA31017
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 4 Oct 1997 10:53:54 GMT
Date: Sat, 4 Oct 1997 10:53:53 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: packets,segments and windows
Message-Id: <Pine.LNX.3.95.971004104730.30904A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

  
Hello !

Can any one tell me how are PACKETS, SEGMENTS and WINDOWS related
in the TCP implementation.

Any response, brief or elaborate, is much appreciated.
Thanks in advance



E-mail chetan@protocol.ece.iisc.ernet.in

WEB PAGE - http://pclab.ece.iisc.ernet.in/chetan



From owner-tcp-impl@relay.engr.sgi.com  Sat Oct  4 10:09:14 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA19450 for tcp-impl-list; Sat, 4 Oct 1997 10:07:57 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA19444 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 4 Oct 1997 10:07:52 -0700
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA27410
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 4 Oct 1997 10:07:51 -0700
	env-from (craig@aland.bbn.com)
Received: (from craig@localhost) by aland.bbn.com (8.7.1/8.7.1) id KAA07535; Sat, 4 Oct 1997 10:04:39 -0700 (PDT)
Message-Id: <199710041704.KAA07535@aland.bbn.com>
To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: packets,segments and windows 
In-reply-to: Your message of Sat, 04 Oct 97 10:53:53 -0000.
             <Pine.LNX.3.95.971004104730.30904A-100000@protocol.ece.iisc.ernet.in> 
Date: Sat, 04 Oct 97 10:04:39 -0700
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


    Can any one tell me how are PACKETS, SEGMENTS and WINDOWS related
    in the TCP implementation.

Sure.  Much of the question can be answered with simple definitions.

PACKET is a layer 2 (e.g. Ethernet, Token Ring, PPP) data unit.  It is the
formatted header and data that is transmitted over the layer 2 infrastructure.

DATAGRAM is the IP data unit, the formatted IP header and data, transmitted
among hosts and routers.

SEGMENT is the TCP data unit, the formatted TCP pseudo-header and data,
transmitted between applications.

A WINDOW is a flow control mechanism, bounding how much unacknowledged
data can be outstanding at any time.  TCP uses multiple WINDOWs.
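
As a small sketch of how two of those windows combine on the sending side:
the sender may have at most min(congestion window, advertised receive
window) bytes unacknowledged.  The names below are illustrative, loosely
modelled on the usual snd_una/snd_nxt variables, and sequence-number
wraparound is ignored for simplicity.

```python
def usable_window(cwnd, rwnd, snd_una, snd_nxt):
    """Bytes the sender may still transmit right now.

    cwnd    -- congestion window, in bytes
    rwnd    -- window advertised by the receiver, in bytes
    snd_una -- oldest unacknowledged sequence number
    snd_nxt -- next sequence number to be sent
    (Sequence-number wraparound is not handled in this sketch.)
    """
    in_flight = snd_nxt - snd_una
    return max(0, min(cwnd, rwnd) - in_flight)


if __name__ == "__main__":
    # 4380-byte cwnd, 8760-byte receive window, 1460 bytes already in flight
    print(usable_window(4380, 8760, 1000, 2460))  # 2920
```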

Craig

From owner-tcp-impl@relay.engr.sgi.com  Sun Oct  5 08:28:59 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA25234 for tcp-impl-list; Sun, 5 Oct 1997 08:27:42 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA25227 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 5 Oct 1997 08:27:40 -0700
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA17129
	for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 5 Oct 1997 08:25:16 -0700
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id SAA22840; Sun, 5 Oct 1997 18:23:44 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA12102; Sun, 5 Oct 97 18:23:43+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id SAA18595;
	Sun, 5 Oct 1997 18:26:47 GMT
Date: Sun, 5 Oct 1997 18:26:47 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: Craig Partridge <craig@aland.bbn.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: packets,segments and windows 
In-Reply-To: <199710041704.KAA07535@aland.bbn.com>
Message-Id: <Pine.LNX.3.95.971005181855.18440B-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


On Sat, 4 Oct 1997, Craig Partridge wrote:

*>
*>    Can any one tell me how are PACKETS, SEGMENTS and WINDOWS related
*>    in the TCP implementation.
*>
*>Sure.  Much of the question can be answered with simple definitions.
*>
*>PACKET is a layer 2 (e.g. Ethernet, Token Ring, PPP) data unit.  It is the
*>formatted header and data that is transmitted over the layer 2 infrastructure.
*>
*>DATAGRAM is the IP data unit, the formatted IP header and data, transmitted
*>among hosts and routers.
*>
*>SEGMENT is the TCP data unit, the formatted TCP pseudo-header and data,
*>transmitted between applications.
*>
*>A WINDOW is a flow control mechanism, bounding how much unacknowledged
*>data can be outstanding at any time.  TCP uses multiple WINDOWs.
*>
*>Craig
*>

Thanks for your reply.

In fact I wanted to know what happens in this case:

We know that TCP uses byte-stream-oriented windowed flow control. What
happens if the advertised window is exactly MSS+1 or n*MSS+1, where n is an
integer?

Is there any trade-off between delaying the one byte and sending it as
a segment carrying one byte?


From owner-tcp-impl@relay.engr.sgi.com  Mon Oct  6 02:18:51 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA08373 for tcp-impl-list; Mon, 6 Oct 1997 02:17:22 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA08366 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 02:17:16 -0700
Received: from callisto.ftel.co.uk (callisto.ftel.co.uk [192.131.79.11]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id CAA17544
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 02:16:04 -0700
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (yUZ2CKuC1PYMVkjWBnoLSs5oVsbMUrlp@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id KAA16629;
	Mon, 6 Oct 1997 10:12:19 +0100 (BST)
Message-ID: <3438AB73.56D5@ftel.co.uk>
Date: Mon, 06 Oct 1997 10:12:19 +0100
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
CC: Craig Partridge <craig@aland.bbn.com>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: packets,segments and windows
References: <Pine.LNX.3.95.971005181855.18440B-100000@protocol.ece.iisc.ernet.in>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> In fact I wanted to what happens in this case
> 
> We know that TCP is byte stream oriented windowed flow control. What
> happens if the advertising window is exactly MSS+1 or n*MSS+1, n is an
> integer ?


I understand that there is also a minimum segment size, and also
possibly a minimum initial segment size. Can anybody please enlighten me
(us?) on typical values for these in implementations?


> 
> Is there any trade off between delaying for the one byte and sending it as
> one segment with one byte ?


Yes, there is a trade-off based on header overhead.
Optimising this depends on characteristics of the layer above TCP, of which
TCP is in general unaware. One possible exception is 'pushing' data.



Graham Cope

From owner-tcp-impl@relay.engr.sgi.com  Mon Oct  6 07:54:54 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA04470 for tcp-impl-list; Mon, 6 Oct 1997 07:53:36 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA04463 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 07:53:33 -0700
Received: from fwns2.raleigh.ibm.com (fwns2d.raleigh.ibm.com [204.146.167.236]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA15164
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 07:53:29 -0700
	env-from (narten@raleigh.ibm.com)
Received: from rtpmail03.raleigh.ibm.com (rtpmail03.raleigh.ibm.com [9.37.172.47]) by fwns2.raleigh.ibm.com (AIX4.2/UCB 8.7/8.7RTP-FW1.1) with ESMTP id KAA19940; Mon, 6 Oct 1997 10:52:15 -0400 (EDT)
Received: from cichlid.raleigh.ibm.com (cichlid.raleigh.ibm.com [9.37.83.123])
	by rtpmail03.raleigh.ibm.com (8.8.5/8.8.5/RTP-ral-1.1) with SMTP id KAA28228;
	Mon, 6 Oct 1997 10:52:16 -0400
Received: from localhost.raleigh.ibm.com by cichlid.raleigh.ibm.com (AIX 4.1/UCB 5.64/4.03-RAL)
          id AA18024; Mon, 6 Oct 1997 10:52:14 -0400
Message-Id: <9710061452.AA18024@cichlid.raleigh.ibm.com>
To: Craig Partridge <craig@aland.bbn.com>
Cc: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>,
        tcp-impl@cthulhu.engr.sgi.com
Subject: Re: packets,segments and windows 
In-Reply-To: Your message of "Sat, 04 Oct 1997 10:04:39 PDT."
             <199710041704.KAA07535@aland.bbn.com> 
Date: Mon, 06 Oct 1997 10:52:08 -0400
From: Thomas Narten <narten@raleigh.ibm.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>     Can any one tell me how are PACKETS, SEGMENTS and WINDOWS related
>     in the TCP implementation.

> Sure.  Much of the question can be answered with simple definitions.

But like standards, if you don't like the current definitions,  you
can always wait for next year's. :-)

> PACKET is a layer 2 (e.g. Ethernet, Token Ring, PPP) data unit.  It is the
> formatted header and data that is transmitted over the layer 2
> infrastructure.

Hmm. You might find some of the IPv6 folks disagreeing with this,
since they seem to use "packet" everywhere for L3, and try not to use
the term datagram at all.

FWIW, I've always used the term "frame" for an L2 packet, since that
term doesn't seem to be used at higher layers.

> DATAGRAM is the IP data unit, the formatted IP header and data, transmitted
> among hosts and routers.

Hmm, what are UDP datagrams then? :-)

> SEGMENT is the TCP data unit, the formatted TCP pseudo-header and data,
> transmitted between applications.

At least in this case, no one else uses the same terminology.

Thomas

From owner-tcp-impl@relay.engr.sgi.com  Mon Oct  6 08:24:18 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA10362 for tcp-impl-list; Mon, 6 Oct 1997 08:21:28 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA10139 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 08:20:45 -0700
Received: from frantic.BSDI.COM ([205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA22380
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 08:20:39 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id KAA16523;
	Mon, 6 Oct 1997 10:14:17 -0500 (CDT)
Date: Mon, 6 Oct 1997 10:14:17 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199710061514.KAA16523@frantic.BSDI.COM>
To: chetan@protocol.ece.iisc.ernet.in, craig@aland.bbn.com
Subject: Re: packets,segments and windows
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Date: Sun, 5 Oct 1997 18:26:47 +0000 (GMT)
> From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
> To: Craig Partridge <craig@aland.bbn.com>
> Cc: tcp-impl@cthulhu.engr.sgi.com
> Subject: Re: packets,segments and windows 
> ...
> We know that TCP is byte stream oriented windowed flow control. What
> happens if the advertising window is exactly MSS+1 or n*MSS+1, n is an
> integer ?
> 
> Is there any trade off between delaying for the one byte and sending it as
> one segment with one byte ?

If the window (and congestion window) allow all the outstanding data
to be sent, then send it all, even if the last packet only has one
byte of data.  There really is nothing to be saved by deferring the
last byte of data.  And if the other side can't do anything until
it gets all the data, then delaying the last byte just introduces
additional latency.

If there is additional data to send beyond what will fit in the window,
then you should defer sending the trailing tinygram until the window
opens up some more, to allow for better aggregation of the data.

There are, of course, details and exceptions to these cases, but these
make a good rule of thumb.
			-David Borman, dab@bsdi.com
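
As a rough illustration of the rule of thumb above, the sketch below
splits queued data into segments. The function name, the byte-count
arguments and the MSS of 1460 are assumptions made for this example, and
the "details and exceptions" are deliberately ignored.

```python
def plan_segments(queued, window, mss):
    """Return the payload sizes (in bytes) to transmit right now.

    queued: bytes handed to TCP by the application but not yet sent.
    window: usable window, i.e. min(advertised window, congestion window).
    mss:    maximum segment size.
    """
    sendable = min(queued, window)
    full, tail = divmod(sendable, mss)
    sizes = [mss] * full
    # Send the short trailing segment only if it empties the queue;
    # otherwise hold it back so it can be merged with later data.
    if tail and queued <= window:
        sizes.append(tail)
    return sizes


MSS = 1460  # assumed value, typical for Ethernet
print(plan_segments(2 * MSS + 1, 2 * MSS + 1, MSS))  # [1460, 1460, 1]
print(plan_segments(5 * MSS, 2 * MSS + 1, MSS))      # [1460, 1460]
```

In the first call everything fits in the window, so the one-byte segment
goes out. In the second call more data is waiting than the window allows,
so the trailing byte is deferred.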


From owner-tcp-impl@relay.engr.sgi.com  Mon Oct  6 08:44:34 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA16156 for tcp-impl-list; Mon, 6 Oct 1997 08:42:40 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA16107 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 08:42:16 -0700
Received: from mhs.swan.ac.uk (mhs.swan.ac.uk [137.44.1.33]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA28734
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 08:42:06 -0700
	env-from (A.Eberlein@swan.ac.uk)
Received: from eepluto.swan.ac.uk (actually host eepluto) by mhs 
          with SMTP (PP); Mon, 6 Oct 1997 16:31:40 +0100
From: Armin Eberlein <A.Eberlein@swansea.ac.uk>
Date: Mon, 6 Oct 1997 16:43:38 +0100
Message-Id: <7442.199710061543@eepluto.swan.ac.uk>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: RTT measurement resolution in TCP
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



The current resolution of RTT is set at a minimum of 0.5 sec. What is the reason that the resolution of the RTT cannot be increased?

Thanks!

A.Eberlein@swansea.ac.uk

From owner-tcp-impl@relay.engr.sgi.com  Mon Oct  6 12:11:41 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA25231 for tcp-impl-list; Mon, 6 Oct 1997 12:09:51 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA25212 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 12:09:50 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA06090
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 12:09:48 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id UAA16512; Mon, 6 Oct 1997 20:07:41 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xIIDz-0005FrC; Mon, 6 Oct 97 19:49 BST
Message-Id: <m0xIIDz-0005FrC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: RTT measurement resolution in TCP
To: A.Eberlein@swansea.ac.uk (Armin Eberlein)
Date: Mon, 6 Oct 1997 19:49:46 +0100 (BST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <7442.199710061543@eepluto.swan.ac.uk> from "Armin Eberlein" at Oct 6, 97 04:43:38 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The current resolution of RTT is set at a minimum of 0.5 sec. What is the reason that the resolution of the RTT cannot be increased?

Delayed ack


From owner-tcp-impl@relay.engr.sgi.com  Mon Oct  6 17:20:11 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA12560 for tcp-impl-list; Mon, 6 Oct 1997 17:18:08 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA12536 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 17:18:07 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id RAA15058
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 17:18:05 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-29)
	id <AA00595>; Mon, 6 Oct 1997 17:18:02 -0700
Date: Mon, 6 Oct 97 17:17:42 PDT
From: braden@ISI.EDU
Posted-Date: Mon, 6 Oct 97 17:17:42 PDT
Message-Id: <9710070017.AA01290@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA01290>; Mon, 6 Oct 97 17:17:42 PDT
To: craig@aland.bbn.com, narten@raleigh.ibm.com
Subject: Re: packets,segments and windows
Cc: chetan@protocol.ece.iisc.ernet.in, tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Ten years ago we defined all these terms in Section 1.3.3 of Host
Requirements RFC-1122, hoping that we would never have to do it again!

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Mon Oct  6 22:09:23 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA27325 for tcp-impl-list; Mon, 6 Oct 1997 22:06:14 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA27312 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 22:06:08 -0700
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id WAA17707
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 6 Oct 1997 22:05:00 -0700
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id KAA14070; Tue, 7 Oct 1997 10:27:04 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA14959; Tue, 7 Oct 97 10:27:03+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id KAA21566;
	Tue, 7 Oct 1997 10:29:48 GMT
Date: Tue, 7 Oct 1997 10:29:48 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Armin Eberlein <A.Eberlein@swansea.ac.uk>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: RTT measurement resolution in TCP
In-Reply-To: <m0xIIDz-0005FrC@lightning.swansea.linux.org.uk>
Message-Id: <Pine.LNX.3.95.971007102723.21256A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


On Mon, 6 Oct 1997, Alan Cox wrote:

*>> The current resolution of RTT is set at a minimum of 0.5 sec. What is the reason that the resolution of the RTT cannot be increased?
*>
*>Delayed ack
*>
Sir, can you (or anyone on the mailing list) please elaborate on this,
since I am interested in higher-resolution RTT?
Thanks in advance.


From owner-tcp-impl@relay.engr.sgi.com  Tue Oct  7 01:17:42 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA19753 for tcp-impl-list; Tue, 7 Oct 1997 01:16:03 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA19746 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 7 Oct 1997 01:16:00 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id BAA07292
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 7 Oct 1997 01:15:59 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id JAA25662; Tue, 7 Oct 1997 09:13:03 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xITxS-0005FyC; Tue, 7 Oct 97 08:21 BST
Message-Id: <m0xITxS-0005FyC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: RTT measurement resolution in TCP
To: chetan@protocol.ece.iisc.ernet.in (Chetan Kumar)
Date: Tue, 7 Oct 1997 08:21:29 +0100 (BST)
Cc: alan@lxorguk.ukuu.org.uk, A.Eberlein@swansea.ac.uk,
        tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <Pine.LNX.3.95.971007102723.21256A-100000@protocol.ece.iisc.ernet.in> from "Chetan Kumar" at Oct 7, 97 10:29:48 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> *>Delayed ack
> *>
> Sir, can you (or anyone on the mailing list) please elaborate on this,
> since I am interested in higher-resolution RTT?
> Thanks in advance.

Ok. We had Linux trying to do finer grained RTT and learned two things very
fast.

1.	The option to delay acks for up to .5 of a second totally throws the
	RTT estimator when you do this. We got a lot of excess retransmits.
	Trying to second-guess the delayed ack behaviour of the other end is
	a no-no.

2.	The BSD based stacks generally use very low resolution timers for
	kernel operations, so viewed at the finer resolution the packet RTTs
	often slew wildly. (Seen above the .5 second level it all looks quite
	nice.)

Finally we couldn't measure a throughput difference above noise level. We just
had larger tcp windows. Only if there was measurable packet loss would this
be a problem.

Alan
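
The first point can be illustrated with a small sketch. It is not the
Linux code: it assumes a Jacobson-style estimator (gains 1/8 and 1/4,
RTO = srtt + 4 * rttvar), a 10 ms path, an occasional 200 ms delayed ack,
and it treats "sample larger than the current RTO" as a spurious
retransmit. All names and numbers are assumptions for the example.

```python
def count_spurious_timeouts(samples_ms, min_rto_ms):
    """Count RTT samples that arrive after the retransmit timer would fire.

    samples_ms: measured RTTs, including any delayed-ack time.
    min_rto_ms: floor on the RTO (0 models a purely fine-grained timer).
    """
    srtt = None
    rttvar = 0.0
    rto = max(min_rto_ms, 3000.0)  # assumed initial RTO
    spurious = 0
    for sample in samples_ms:
        if sample > rto:
            spurious += 1  # the ack was late, not lost
        if srtt is None:
            srtt = float(sample)
            rttvar = sample / 2.0
        else:
            err = sample - srtt
            srtt += err / 8.0
            rttvar += (abs(err) - rttvar) / 4.0
        rto = max(min_rto_ms, srtt + 4.0 * rttvar)
    return spurious


# Mostly prompt acks, with an occasional ack held back by the receiver.
samples = [10] * 20 + [210] + [10] * 20 + [210]
print(count_spurious_timeouts(samples, 0.0))     # fine-grained RTO
print(count_spurious_timeouts(samples, 1000.0))  # coarse 1 s floor
```

With no floor, the RTO converges to just above 10 ms, so every delayed
ack looks like a loss. With a coarse floor the delayed acks are absorbed.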


From owner-tcp-impl@relay.engr.sgi.com  Tue Oct  7 03:42:30 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA29614 for tcp-impl-list; Tue, 7 Oct 1997 03:39:08 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA29600 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 7 Oct 1997 03:39:02 -0700
Received: from merit.edu (merit.edu [198.108.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id DAA14229
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 7 Oct 1997 03:39:01 -0700
	env-from (wsimpson@greendragon.com)
Received: from Bill.Simpson.DialUp.Mich.Net (pm002-06.dialip.mich.net [141.211.7.142])
	by merit.edu (8.8.7/8.8.5) with SMTP id GAA18932
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 7 Oct 1997 06:38:59 -0400 (EDT)
Date: Tue, 7 Oct 97 10:09:13 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <6641.wsimpson@greendragon.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: packets,segments and windows
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Thomas Narten <narten@raleigh.ibm.com>
> > PACKET is a layer 2 (e.g. Ethernet, Token Ring, PPP) data unit.  It is the
> > formatted header and data that is transmitted over the layer 2
> > infrastructure.
>
> Hmm. You might find some of the IPv6 folks disagreeing with this,
> since they seem to use "packet" everywhere for L3, and try not to use
> the term datagram at all.
>
The folks in IPv6 using the term packet instead of datagram are ignorant.

Partridge is correct.  The terminology is well established over more
than 25 years, in a field that is at most 30 years old.


> FWIW, I've always used the term "frame" for an L2 packet, since that
> term doesn't seem to be used at higher layers.
>
A frame is "physical" -- OSI layer 1.

Sometimes we synthesize the frame in software; but nevertheless, it is
a "physical layer" construct.

I don't understand what this has to do with Lagrange points (L2, L3)?


> > DATAGRAM is the IP data unit, the formatted IP header and data, transmitted
> > among hosts and routers.
>
> Hmm, what are UDP datagrams then? :-)
>
UDP is a particular transport over IP datagrams, where the unit of
transport matches the network unit.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32

From owner-tcp-impl@relay.engr.sgi.com  Tue Oct  7 04:06:16 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA02716 for tcp-impl-list; Tue, 7 Oct 1997 04:04:51 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA02705 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 7 Oct 1997 04:04:49 -0700
Received: from merit.edu (merit.edu [198.108.1.42]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id EAA17915
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 7 Oct 1997 04:04:48 -0700
	env-from (wsimpson@greendragon.com)
Received: from Bill.Simpson.DialUp.Mich.Net (pm012-27.dialip.mich.net [141.211.7.195])
	by merit.edu (8.8.7/8.8.5) with SMTP id HAA19054
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 7 Oct 1997 07:04:46 -0400 (EDT)
Date: Tue, 7 Oct 97 10:50:15 GMT
From: "William Allen Simpson" <wsimpson@greendragon.com>
Message-ID: <6643.wsimpson@greendragon.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: packets,segments and windows
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I wrote my response, and then read Bob's, and then re-read RFC-1122,
having forgotten about it after all these years.  RFC-1122 has a nicer
terminology model than the previous messages on this list, defining the
terms at _interfaces_ between layers.

I used the same model in the RFC-1661 terminology section.

So, I'd amend my previous message to note that a "frame" is delivered at
the interface from link to physical.  That's why we can synthesize it in
software.

WSimpson@UMich.edu
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32

From owner-tcp-impl@relay.engr.sgi.com  Thu Oct  9 08:31:16 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA10034 for tcp-impl-list; Thu, 9 Oct 1997 08:29:39 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA10026 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 9 Oct 1997 08:29:36 -0700
Received: from callisto.ftel.co.uk (callisto.ftel.co.uk [192.131.79.11]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA00872
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 9 Oct 1997 08:29:28 -0700
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (LouSyh6IHV9w0OIbCn4xtpip83S8//1l@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id QAA17455;
	Thu, 9 Oct 1997 16:29:24 +0100 (BST)
Message-ID: <343CF853.427C@ftel.co.uk>
Date: Thu, 09 Oct 1997 16:29:23 +0100
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
CC: Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Resetting of ssthresh according to RFC2001
References: <7442.199710061543@eepluto.swan.ac.uk>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

RFC2001 states that:

"When congestion occurs (indicated by a timeout or the reception of
duplicate ACKs), one-half of the current window size is saved in
ssthresh."

Presumably this means that ssthresh should be reset when congestion
occurs. I have observed (in a simulation) that if congestion occurs then
multiple segments tend to be lost or time out, cwnd is set to MSS after
the first loss, and hence ssthresh is set to 2*MSS. This limits the
total achievable throughput.


Thus, I take this to mean that ssthresh should be reset to half the current
window size when congestion occurs, but should not be reset until
congestion goes away. Is this reasonable? If so, how to determine when
congestion has gone away?


I'm sure this must have been discussed before. If so, any reference?


Thanks in advance...


Graham Cope

From owner-tcp-impl@relay.engr.sgi.com  Thu Oct  9 15:09:23 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA11713 for tcp-impl-list; Thu, 9 Oct 1997 15:07:04 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA11699 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 9 Oct 1997 15:07:01 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA12623
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 9 Oct 1997 15:06:59 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id PAA23469; Thu, 9 Oct 1997 15:06:39 -0700 (PDT)
Message-Id: <199710092206.PAA23469@daffy.ee.lbl.gov>
To: Graham Cope <G.Cope@ftel.co.uk>
Cc: tcp-impl@cthulhu.engr.sgi.com, Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: Resetting of ssthresh according to RFC2001
In-reply-to: Your message of Thu, 09 Oct 1997 16:29:23 PDT.
Date: Thu, 09 Oct 1997 15:06:39 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Presumably this means that ssthresh should be reset when congestion
> occurs.

More exactly, when TCP responds to a loss, by either a timeout retransmission
or beginning a fast retransmit/fast recovery sequence, it first sets ssthresh
based on the value of cwnd in effect at the time of the loss.

> I have observed (in a simulation) that if congestion occurs then
> multiple segments tend to be lost or time out, cwnd is set to MSS after
> the first loss ...

For a timeout retransmission, this is correct behavior.

> ... and hence ssthresh is set to 2*MSS.

This is incorrect.  ssthresh is supposed to be set *first*, to half of the
current value of cwnd.  This happens before cwnd is set to 1 MSS to begin
slow start.

If cwnd was sufficiently large to begin with, then the only normal circumstance
in which ssthresh should be set to 2*MSS is if the retransmitted packet is
itself lost (or if the slow start following the retransmission doesn't get
very far before there's a loss).  In that case, ssthresh (again) gets set
to half of cwnd, which gets rounded up to the minimum value of 2*MSS if
needed.

> Thus, I take this to mean that ssthresh should be reset to half the current
> window size when congestion occurs, but should not be reset until
> congestion goes away ...

There's no explicit notion of congestion going away.  The algorithm is
that ssthresh is always set to cwnd/2 (modulo rounding) when a fast
retransmission begins or a timeout retransmission occurs.  If you have a
timeout retransmission that is itself lost, then yes, ssthresh is set to
MSS*2 - and that's what you want, because the net is still congested, so
you must proceed gingerly.

I've noted this as an area to clarify as we tweak RFC 2001.

		Vern
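
The ordering described above can be sketched as follows. This is a
simplified illustration, not code from any stack: the function names are
made up, and cwnd alone stands in for the "current window" (RFC 2001 uses
the minimum of cwnd and the receiver's advertised window).

```python
def respond_to_timeout(cwnd, mss):
    """Return (ssthresh, cwnd) after a timeout retransmission."""
    # Step 1: remember half of the window in effect when the loss occurred,
    # but never less than two segments.
    ssthresh = max(cwnd // 2, 2 * mss)
    # Step 2: only now collapse cwnd to one segment to begin slow start.
    cwnd = mss
    return ssthresh, cwnd


def respond_to_timeout_wrong_order(cwnd, mss):
    """The mistaken ordering: cwnd is collapsed before ssthresh is taken."""
    cwnd = mss
    ssthresh = max(cwnd // 2, 2 * mss)  # always ends up at 2 * mss
    return ssthresh, cwnd


MSS = 1460  # assumed value
print(respond_to_timeout(16 * MSS, MSS))              # (11680, 1460)
print(respond_to_timeout_wrong_order(16 * MSS, MSS))  # (2920, 1460)
print(respond_to_timeout(MSS, MSS))                   # (2920, 1460)
```

The last call shows the legitimate case: the retransmission itself is
lost while cwnd is still one segment, so ssthresh drops to 2*MSS.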

From owner-tcp-impl@relay.engr.sgi.com  Mon Oct 13 02:59:29 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA17492 for tcp-impl-list; Mon, 13 Oct 1997 02:57:39 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA17487 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 13 Oct 1997 02:57:32 -0700
Received: from callisto.ftel.co.uk (callisto.ftel.co.uk [192.131.79.11]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id CAA25756
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 13 Oct 1997 02:57:17 -0700
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (SZKBz33Qee80IqY+uRJ2hnscywvqURgW@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id KAA14787;
	Mon, 13 Oct 1997 10:57:00 +0100 (BST)
Message-ID: <3441F06B.7861@ftel.co.uk>
Date: Mon, 13 Oct 1997 10:56:59 +0100
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Vern Paxson <vern@ee.lbl.gov>
CC: tcp-impl@cthulhu.engr.sgi.com, Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: Resetting of ssthresh according to RFC2001
References: <199710092206.PAA23469@daffy.ee.lbl.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Vern Paxson wrote:
> 
> > Presumably this means that ssthresh should be reset when congestion
> > occurs.
> 
> More exactly, when TCP responds to a loss, by either a timeout retransmission
> or beginning a fast retransmit/fast recovery sequence, it first sets ssthresh
> based on the value of cwnd in effect at the time of the loss.
> 
> > I have observed (in a simulation) that if congestion occurs then
> > multiple segments tend to be lost or time out, cwnd is set to MSS after
> > the first loss ...
> 
> For a timeout retransmission, this is correct behavior.
> 
> > ... and hence ssthresh is set to 2*MSS.
> 
> This is incorrect.  ssthresh is supposed to be set *first*, to half of the
> current value of cwnd.  This happens before cwnd is set to 1 MSS to begin
> slow start.
> 
> If cwnd was sufficiently large to begin with, then the only normal circumstance
> in which ssthresh should be set to 2*MSS is if the retransmitted packet is
> itself lost (or if the slow start following the retransmission doesn't get
> very far before there's a loss).  In that case, ssthresh (again) gets set
> to half of cwnd, which gets rounded up to the minimum value of 2*MSS if
> needed.
> 
> > Thus, I take this to mean that ssthresh should be reset to half the current
> > window size when congestion occurs, but should not be reset until
> > congestion goes away ...
> 
> There's no explicit notion of congestion going away.  The algorithm is
> that ssthresh is always set to cwnd/2 (modulo rounding) when a fast
> retransmission begins or a timeout retransmission occurs.  If you have a
> timeout retransmission that is itself lost, then yes, ssthresh is set to
> MSS*2 - and that's what you want, because the net is still congested, so
> you must proceed gingerly.
> 


What I am observing in my simulation model is that if a segment gets
delayed, then all the other segments behind it also get delayed. The
timers for all of these expire, and after the timer for the second has
expired cwnd and ssthresh are down to 2*MSS. This is OK for cwnd, but seems
drastic for ssthresh.
  I agree that it would be sensible for ssthresh to be set to 2*MSS
after loss of the retransmitted segment (which happens after at least 1
RTT), but not after timeout of the second segment in a sequence of
segments that experience the same congestion.

   If congestion is a state, then there are multiple events that
correspond to being in that state (i.e. multiple timeouts).


Has this been investigated before (I'm still getting up to speed in this
area)? If not, I think it might be worth doing so since unnecessary
reduction of ssthresh can reduce throughput.



Graham Cope

From owner-tcp-impl@relay.engr.sgi.com  Mon Oct 13 11:35:12 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA19221 for tcp-impl-list; Mon, 13 Oct 1997 11:33:31 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA19212 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 13 Oct 1997 11:33:28 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA29966
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 13 Oct 1997 11:33:27 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id LAA04573; Mon, 13 Oct 1997 11:32:34 -0700 (PDT)
Message-Id: <199710131832.LAA04573@daffy.ee.lbl.gov>
To: Graham Cope <G.Cope@ftel.co.uk>
Cc: tcp-impl@cthulhu.engr.sgi.com, Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: Resetting of ssthresh according to RFC2001
In-reply-to: Your message of Mon, 13 Oct 1997 10:56:59 PDT.
Date: Mon, 13 Oct 1997 11:32:34 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> What I am observing in my simulation model is that if a segment gets
> delayed, then all the other segments behind it also get delayed. The
> timers for all of these expire ...

Aha, now I see the problem.  With BSD-derived TCPs, there's just *one*
timer.  When it expires, the first unacknowledged segment is retransmitted
and the timer is backed off and restarted.  So you don't get this sort of
cascading.  If you have multiple timers, then you need to reset all of them
whenever one expires.
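
A minimal sketch of why the number of timers matters, under the
assumption that every timer expiry triggers the usual loss response
(ssthresh = max(cwnd/2, 2*MSS), then cwnd = MSS). The function name and
parameters are invented for the example, and timer backoff is not
modelled.

```python
def ssthresh_after_stall(cwnd, mss, outstanding, single_timer):
    """ssthresh once every armed timer for one stalled window has fired.

    single_timer=True  models one connection-wide timer: a single expiry.
    single_timer=False models one independent timer per outstanding
                       segment, none of which is reset when another fires.
    """
    expiries = 1 if single_timer else outstanding
    ssthresh = None  # stays None if nothing was outstanding
    for _ in range(expiries):
        ssthresh = max(cwnd // 2, 2 * mss)
        cwnd = mss
    return ssthresh


MSS = 1460  # assumed value
print(ssthresh_after_stall(16 * MSS, MSS, 16, single_timer=True))   # 11680
print(ssthresh_after_stall(16 * MSS, MSS, 16, single_timer=False))  # 2920
```

With per-segment timers the second expiry already sees cwnd = MSS, so
ssthresh collapses to 2*MSS, which is the cascading described in the
quoted simulation.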

You might consider using "ns" for your simulations.  It has TCP modules that
have been worked on a lot to make them fairly accurate.  It's available from:

	http://www-mash.cs.berkeley.edu/ns/

though evidently that machine lost a disk recently and the ns stuff won't
be back on-line for another day or two.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Oct 14 11:39:44 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA12731 for tcp-impl-list; Tue, 14 Oct 1997 11:38:04 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA12680 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 14 Oct 1997 11:37:46 -0700
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA01456
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 14 Oct 1997 11:37:44 -0700
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1])
	by brookfield.ans.net (8.8.5/8.8.5) with ESMTP id OAA07948;
	Tue, 14 Oct 1997 14:37:12 -0400 (EDT)
Message-Id: <199710141837.OAA07948@brookfield.ans.net>
To: Graham Cope <G.Cope@ftel.co.uk>
cc: tcp-impl@cthulhu.engr.sgi.com, Walley Gary <walleygm@btlip10.bt.co.uk>
Reply-To: curtis@ans.net
Subject: Re: Resetting of ssthresh according to RFC2001 
In-reply-to: Your message of "Thu, 09 Oct 1997 16:29:23 BST."
             <343CF853.427C@ftel.co.uk> 
Date: Tue, 14 Oct 1997 14:37:11 -0400
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <343CF853.427C@ftel.co.uk>, Graham Cope writes:
> RFC2001 states that:
> 
> "When congestion occurs (indicated by a timeout or the reception of
> duplicate ACKs), one-half of the current window size is saved in
> ssthresh."
> 
> Presumably this means that ssthresh should be reset when congestion
> occurs. I have observed (in a simulation) that if congestion occurs then
> multiple segments tend to be lost or time out, cwnd is set to MSS after
> the first loss, and hence ssthresh is set to 2*MSS. This limits the
> total achievable throughput.

What simulator?

With Reno, if fast retransmit triggers, you could halve more than once but
cwnd is never set to 1 unless RTO fires.  With newreno the problem of
halving more than once is eliminated if the losses are within the same
RTT.

If RTO triggers instead of fast retransmit there is a chance of
getting another loss in the next few RTTs and getting the hit while
cwnd is ramping up during slow start.
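
The difference in how often the window is halved can be sketched roughly
as below. This is an abstraction, not either algorithm in full: each loss
detection is reduced to a pair (lost sequence number, highest sequence
number sent at that moment), and the `recover` bookkeeping and all names
are assumptions made for the example.

```python
def count_halvings(events, newreno):
    """Count window halvings for a series of fast-retransmit triggers.

    events:  list of (lost_seq, highest_sent) pairs, in detection order.
    newreno: if True, losses within the window of data that was already
             outstanding at the last halving do not halve again.
    """
    recover = -1  # highest sequence sent when the window was last halved
    halvings = 0
    for lost_seq, highest_sent in events:
        if newreno and lost_seq <= recover:
            continue  # same window of data: retransmit, do not halve
        halvings += 1
        recover = highest_sent
    return halvings


# Two losses in one window (segments 10 and 14), then a later loss (30).
events = [(10, 20), (14, 20), (30, 40)]
print(count_halvings(events, newreno=False))  # 3
print(count_halvings(events, newreno=True))   # 2
```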

> Thus, I take this mean that ssthresh should be reset to half the current
> window size when congestion occurs, but should not be reset until
> congestion goes away. Is this reasonable? If so, how to determine when
> congestion has gone away?

No.  Your assumption is not reasonable.  [Do you need references on
fast retransmit, fast recovery and newreno?  If so, look elsewhere in
RFC2001, look at TCP Illustrated Vol. 1, and check the section on
newreno in http://www.iet.unipi.it/~luigi/sack.html.  I lost the URL
to J. Hoe's work and Sally Floyd's work which (I think independently)
proposed to do the same thing, now called newreno.]

> I'm sure this must have been discussed before. If so, any reference?

It has and in great detail.  Are you sure you are doing fast
retransmit and fast recovery correctly in the simulations?  

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Tue Oct 14 15:09:05 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA10680 for tcp-impl-list; Tue, 14 Oct 1997 15:03:48 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA10663 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 14 Oct 1997 15:03:46 -0700
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.219]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA02267
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 14 Oct 1997 15:03:45 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel3.hp.com (8.8.5/8.8.5tis) with SMTP id PAA10383
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 14 Oct 1997 15:03:44 -0700 (PDT)
Received: by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA12082; Tue, 14 Oct 1997 15:00:02 -0700
Message-Id: <9710142200.AA12082@hpisrdq.cup.hp.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Out Of Band and Nagle
Date: Tue, 14 Oct 1997 15:00:02 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Should the sending of urgent data by an application override the Nagle
algorithm? In other words, should the setting of the urgent flag be
functionally the same as "setting" TCP_NODELAY, sending the urgent
data, and then "resetting" TCP_NODELAY to its previous state?

rick jones
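
For concreteness, the application-level equivalent described in the
question might look like the sketch below, using the BSD sockets
interface. The helper name is made up, and whether a stack should behave
this way internally is exactly the open question; this only shows the
three steps.

```python
import socket


def send_urgent_bypassing_nagle(sock, data):
    """Send `data` as urgent data with Nagle temporarily disabled.

    `sock` must be a connected TCP socket. Returns the number of bytes
    accepted by send().
    """
    # "Set" TCP_NODELAY, remembering its previous state.
    previous = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    try:
        # Send the urgent (out-of-band) data.
        return sock.send(data, socket.MSG_OOB)
    finally:
        # "Reset" TCP_NODELAY to whatever it was before.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, previous)
```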

From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 10:59:25 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA26767 for tcp-impl-list; Wed, 15 Oct 1997 10:57:21 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA26737 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 10:57:19 -0700
Received: from harrier.cisco.com (harrier.cisco.com [171.69.1.173]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA06625
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 10:57:18 -0700
	env-from (fred@cisco.com)
Received: from fred-axel-fr.cisco.com (fred-axel-fr.cisco.com [171.69.128.115]) by harrier.cisco.com (8.6.12/8.6.5) with ESMTP id KAA11436; Wed, 15 Oct 1997 10:57:17 -0700
Received: from [171.69.128.118] (fred-hm-dhcp3.cisco.com [171.69.128.118]) by fred-axel-fr.cisco.com (8.6.8+c/CISCO.WS.1.1) with ESMTP id KAA05335; Wed, 15 Oct 1997 10:57:14 -0700
X-Sender: fred@stilton.cisco.com
Message-Id: <v0310282eb06ab3635a3f@[171.69.128.118]>
In-Reply-To: <9710142200.AA12082@hpisrdq.cup.hp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 15 Oct 1997 10:53:08 -0700
To: Rick Jones <raj@hpisrdq.cup.hp.com>
From: Fred Baker <fred@cisco.com>
Subject: Re: Out Of Band and Nagle
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

At 3:00 PM -0700 10/14/97, Rick Jones wrote:
>Should the sending of urgent data by an application override the Nagle
>algorithm? In other words, should the setting of the urgent flag be
>functionally the same as "setting" TCP_NODELAY, sending the urgent
>data, and then "resetting" TCP_NODELAY to its previous state?

probably. "urgent" is generally incompatible with "sit on this for an
indeterminate time period"

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
  "A beautiful idea has a much greater chance of being a correct idea
   than an ugly one."
                     -- Roger Penrose, "The Emperor's New Mind", 1989



From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 11:29:02 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA09462 for tcp-impl-list; Wed, 15 Oct 1997 11:26:35 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA09433 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 11:26:33 -0700
Received: from kalae.kohala.com (kalae.kohala.com [209.75.135.35]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA16111
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 11:26:28 -0700
	env-from (rstevens@kohala.kohala.com)
Received: from kohala.kohala.com (kohala.kohala.com [209.75.135.33])
	by kalae.kohala.com (8.8.5/8.8.5) with ESMTP id LAA16657;
	Wed, 15 Oct 1997 11:26:19 -0700 (MST)
Received: (from rstevens@localhost) by kohala.kohala.com (8.8.5/8.8.3) id LAA16729; Wed, 15 Oct 1997 11:26:19 -0700 (MST)
Message-Id: <199710151826.LAA16729@kohala.kohala.com>
From: rstevens@kohala.com (W. Richard Stevens)
Date: Wed, 15 Oct 1997 11:26:19 -0700
Reply-To: "W. Richard Stevens" <rstevens@kohala.com>
X-Phone: +1 520 297 9416
X-Homepage: http://www.kohala.com/~rstevens
X-Mailer: Mail User's Shell (7.2.6 beta(3) 11/17/96)
To: Rick Jones <raj@hpisrdq.cup.hp.com>
Subject: Re: Out Of Band and Nagle
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

With BSD-derived stacks the answer is yes.  The t_force flag is set
when OOB is sent, causing tcp_output() to skip all its normal checks
(Nagle, etc.).

I'm not sure whether this is "right" or not, but I didn't see anything
in RFC 1122 about this.  My gut feel is that it is OK, since OOB data
should be rare.

Let's be thankful that HTTP has never tried to use OOB data!

        Rich Stevens


From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 12:26:51 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA07374 for tcp-impl-list; Wed, 15 Oct 1997 12:24:37 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA07348 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 12:24:34 -0700
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id MAA03768
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 12:24:33 -0700
	env-from (jt@mentat.com)
Received: from feller.mentat.com ([192.88.122.144]) by mentat.com (4.1/SMI-4.1)
	id AA12548; Wed, 15 Oct 97 12:22:19 PDT
Received: by feller.mentat.com (SMI-8.6/SMI-SVR4)
	id MAA01035; Wed, 15 Oct 1997 12:24:40 -0700
Date: Wed, 15 Oct 1997 12:24:40 -0700
From: jt@mentat.com (Jerry Toporek)
Message-Id: <199710151924.MAA01035@feller.mentat.com>
To: raj@hpisrdq.cup.hp.com, rstevens@kohala.com
Subject: Re: Out Of Band and Nagle
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> With BSD-derived stacks the answer is yes.  The t_force flag is set
> when OOB is sent, causing tcp_output() to skip all its normal checks
> (Nagle, etc.).
> 
> I'm not sure whether this is "right" or not, but I didn't see anything
> in RFC 1122 about this.  My gut feel is that it is OK, since OOB data
> should be rare.

Rich:

The important point, that is not explicit in RFC 1122, is that the URG flag
should be made visible to the receiving side without delay.  Forcing out a
segment with data that would not otherwise be sendable (which may or may not
include the data at the urgent pointer) is a way to accomplish this, but I am
not sure that it is a good idea to provide this back-door mechanism to bypass
all send controls.  I believe that the right thing to do when the sender TCP
is presented with outbound urgent data at a time when the send rules preclude
putting a segment on the wire is to generate a zero-length segment with the
URG flag on and the urgent pointer set.
 
jt


From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 13:11:53 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA19754 for tcp-impl-list; Wed, 15 Oct 1997 13:06:57 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA19736 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 13:06:51 -0700
Received: from zero.aec.at (zero.aec.at [193.170.192.102]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA15081
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 13:06:48 -0700
	env-from (andi@zero.aec.at)
Received: (qmail 9215 invoked by uid 573); 15 Oct 1997 18:59:13 -0000
To: jt@mentat.com (Jerry Toporek)
Cc: raj@hpisrdq.cup.hp.com, rstevens@kohala.com, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Out Of Band and Nagle
References: <199710151924.MAA01035@feller.mentat.com>
From: Andi Kleen <ak@muc.de>
Date: 15 Oct 1997 20:59:12 +0200
In-Reply-To: jt@mentat.com's message of Wed, 15 Oct 1997 12:24:40 -0700
Message-ID: <k2bu0q4zun.fsf@zero.aec.at>
Lines: 28
X-Mailer: Gnus v5.4.41/Emacs 19.34
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

jt@mentat.com (Jerry Toporek) writes:

> > 
> > With BSD-derived stacks the answer is yes.  The t_force flag is set
> > when OOB is sent, causing tcp_output() to skip all its normal checks
> > (Nagle, etc.).
> > 
> > I'm not sure whether this is "right" or not, but I didn't see anything
> > in RFC 1122 about this.  My gut feel is that it is OK, since OOB data
> > should be rare.
> 
> Rich:
> 
> The important point, that is not explicit in RFC 1122, is that the URG flag
> should be made visible to the receiving side without delay.  Forcing out a
> segment with data that would not otherwise be sendable (which may or may not
> include the data at the urgent pointer) is a way to accomplish this, but I am
> not sure that it is a good idea to provide this back-door mechanism to bypass
> all send controls.  I believe that the right thing to do when the sender TCP
> is presented with outbound urgent data at a time when the send rules preclude
> putting a segment on the wire is to generate a zero-length segment with the
> URG flag on and the urgent pointer set.

There are some buggy TCP stacks that crash when they receive a non-SYN, non-ACK
zero-length segment. The Linux stack tries hard to avoid generating these
packets.

-Andi

From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 13:11:59 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA19943 for tcp-impl-list; Wed, 15 Oct 1997 13:07:48 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA19924 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 13:07:43 -0700
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.219]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA15344
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 13:07:43 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel3.hp.com (8.8.5/8.8.5tis) with SMTP id NAA29773
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 13:07:42 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA14260; Wed, 15 Oct 1997 13:03:58 -0700
Message-Id: <344521AE.1AB3@cup.hp.com>
Date: Wed, 15 Oct 1997 13:03:58 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Out Of Band and Nagle
References: <199710151924.MAA01035@feller.mentat.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Jerry Toporek wrote:
> The important point, that is not explicit in RFC 1122, is that the URG flag
> should be made visible to the receiving side without delay.  Forcing out a
> segment with data that would not otherwise be sendable (which may or may not
> include the data at the urgent pointer) is a way to accomplish this, but I am
> not sure that it is a good idea to provide this back-door mechanism to bypass
> all send controls.  I believe that the right thing to do when the sender TCP
> is presented with outbound urgent data at a time when the send rules preclude
> putting a segment on the wire is to generate a zero-length segment with the
> URG flag on and the urgent pointer set.

What is the value of the URG flag without the URG data being present?

Is there any concern about the legacy TCP's processing such a segment
correctly?

Also, how is the proposed behaviour any greater a back door than
TCP_NODELAY, which is a huge, gaping door already?

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 13:48:11 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA00547 for tcp-impl-list; Wed, 15 Oct 1997 13:45:45 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA00464 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 13:45:40 -0700
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA26022
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 13:45:38 -0700
	env-from (jt@mentat.com)
Received: from rock.mentat.com ([192.88.122.136]) by mentat.com (4.1/SMI-4.1)
	id AA13070; Wed, 15 Oct 97 13:40:33 PDT
Date: Wed, 15 Oct 97 13:40:33 PDT
From: jt@mentat.com (Jerry Toporek)
Message-Id: <9710152040.AA13070@mentat.com>
To: ak@muc.de
Subject: Re: Out Of Band and Nagle
Cc: raj@hpisrdq.cup.hp.com, rstevens@kohala.com, tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> There are some buggy TCP stacks that crash when they receive a non-SYN, non-ACK
> zero-length segment. The Linux stack tries hard to avoid generating these
> packets.
> 

Sure...  Fine.  I certainly didn't mean to suggest that the ACK flag would
not be on.  The ACK flag should be on for everything other than an initial
SYN and certain RSTs.

jt


From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 13:54:29 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA02865 for tcp-impl-list; Wed, 15 Oct 1997 13:53:16 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA02852 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 13:53:14 -0700
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA28007
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 13:53:12 -0700
	env-from (jt@mentat.com)
Received: from rock.mentat.com ([192.88.122.136]) by mentat.com (4.1/SMI-4.1)
	id AA13131; Wed, 15 Oct 97 13:51:27 PDT
Date: Wed, 15 Oct 97 13:51:27 PDT
From: jt@mentat.com (Jerry Toporek)
Message-Id: <9710152051.AA13131@mentat.com>
To: tcp-impl@cthulhu.engr.sgi.com, raj@hpisrdq.cup.hp.com
Subject: Re: Out Of Band and Nagle
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> What is the value of the URG flag without the URG data being present?

It notifies the receiving side TCP that urgent mode has been entered.  In a
UNIX implementation, this typically results in a SIGURG being delivered to
the application.  The URG flag will stay on in all subsequent segments until
the urgent point is delivered.  This is all normal operation whenever urgent
data is presented to a sending side TCP that has more unsent data already
queued than can fit in a single segment.
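
The case described above can be sketched as a small Python model. It is a
toy under stated assumptions, not code from any stack: the function name
mark_urgent_segments is hypothetical, and urgent_end is defined here as the
offset just past the last urgent byte. Real stacks differ on whether the
header field points at the last urgent byte or at the byte after it, and the
sketch does not try to settle that. It also ignores the 16-bit size of the
real urgent pointer field.

```python
def mark_urgent_segments(queued, urgent_end, mss):
    """Split `queued` bytes into MSS-sized segments and mark URG.

    Returns a list of (start, length, urg, pointer) tuples, where
    `pointer` is the urgent offset relative to the segment's first
    byte, or None when the segment does not carry URG.
    """
    segments = []
    start = 0
    while start < queued:
        length = min(mss, queued - start)
        # URG stays on until the urgent point has been sent.
        urg = start < urgent_end
        pointer = urgent_end - start if urg else None
        segments.append((start, length, urg, pointer))
        start += length
    return segments
```

With 4000 bytes queued, the urgent point at offset 2500, and an MSS of 1000,
the first three segments carry URG and the fourth does not. The first
segment's pointer (2500) lies beyond that segment's own 1000 bytes, so the
flag arrives before the urgent data itself does.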

> 
> Is there any concern about the legacy TCP's processing such a segment
> correctly?

I don't see why it should result in any concerns that you would not already
have.  RFC 793 is pretty explicit that the URG flag is valid whether or not
there is any data in the segment that first carries the flag.

> 
> Also, how is the proposed behaviour any greater a back door than
> TCP_NODELAY, which is a huge, gaping door already?

TCP_NODELAY overrides the Nagle algorithm.  The BSD behavior, according to
Rich's description, overrides "all its normal checks".  I don't see a good
reason why entering urgent mode should override any data sending rules.  As
I said, the crucial part is to get the urgent flag delivered without delay.

jt


From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 13:55:06 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA03291 for tcp-impl-list; Wed, 15 Oct 1997 13:54:00 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA03268 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 13:53:58 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id OAA13221 for tcp-impl@cthulhu.engr.sgi.com; Wed, 15 Oct 1997 14:53:55 -0600
Date: Wed, 15 Oct 1997 14:53:55 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199710152053.OAA13221@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Out Of Band and Nagle
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

If the Nagle algorithm is keeping you from sending data, then by
definition you do not have much to send, and you have space in the
window to send it.

So why waste the time of the routers and the remote host by sending the
urgent pointer and no data?  You would save everyone cycles and
bandwidth by including the little dab of Nagle-delayed data along with
the urgent pointer.


Vernon Schryver,  vjs@sgi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 14:07:17 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA06436 for tcp-impl-list; Wed, 15 Oct 1997 14:05:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA06400 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 14:04:51 -0700
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA01203
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 14:04:45 -0700
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id QAA04811;
	Wed, 15 Oct 1997 16:02:11 -0500 (CDT)
Date: Wed, 15 Oct 1997 16:02:11 -0500 (CDT)
From: David Borman <dab@BSDI.COM>
Message-Id: <199710152102.QAA04811@frantic.BSDI.COM>
To: raj@hpisrdq.cup.hp.com, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Out Of Band and Nagle
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Rick Jones <raj@hpisrdq.cup.hp.com>
> Subject: Re: Out Of Band and Nagle
> 
> Jerry Toporek wrote:
> > The important point, that is not explicit in RFC 1122, is that the URG flag
> > should be made visible to the receiving side without delay.  Forcing out a
> > segment with data that would not otherwise be sendable (which may or may not
> > include the data at the urgent pointer) is a way to accomplish this, but I am
> > not sure that it is a good idea to provide this back-door mechanism to bypass
> > all send controls.  I believe that the right thing to do when the sender TCP
> > is presented with outbound urgent data at a time when the send rules preclude
> > putting a segment on the wire is to generate a zero-length segment with the
> > URG flag on and the urgent pointer set.
> 
> What is the value of the URG flag without the URG data being present?

TCP does not have urgent data in the sense that seems to be under discussion
here (at least my understanding of what is being said).  TCP has an urgent
pointer and an URG flag.  In TCP the Urgent Data is all the data up to
the Urgent pointer, and when a TCP sees the URG flag, it goes into urgent
mode while processing the data up to the Urgent pointer.  This means that
if an application has pending, unsent data and it sends more data in urgent
mode, all the previous, unsent data becomes urgent data.  In fact, if
the receiving TCP has accepted data that it hasn't yet presented to the
user and it gets an URG, all that unread data becomes "urgent data" to
the application.

Also remember that the urgent pointer is allowed to point beyond the
packet that contains it, and even beyond the TCP window.
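
A rough Python sketch of the receiver's view, under assumptions that are
mine and not taken from any implementation: the function name
urgent_bytes_pending is hypothetical, sequence numbers are plain integers
with no wraparound, urgent_end means the sequence number just past the
urgent data, and rcv_read never exceeds rcv_nxt.

```python
def urgent_bytes_pending(rcv_read, rcv_nxt, urgent_end):
    """Return (buffered, in_flight) byte counts ahead of the urgent point.

    rcv_read   - sequence number of the next byte the application reads
    rcv_nxt    - next sequence number expected from the network
    urgent_end - sequence number just past the urgent data
    """
    if urgent_end <= rcv_read:
        # The application has already read past the urgent point.
        return (0, 0)
    # Data TCP has accepted but not yet handed to the application;
    # all of it up to the urgent point is now "urgent".
    buffered = min(rcv_nxt, urgent_end) - rcv_read
    # Data before the urgent point that has not arrived yet.
    in_flight = max(0, urgent_end - rcv_nxt)
    return (buffered, in_flight)
```

For example, with rcv_read=100, rcv_nxt=400 and urgent_end=900, the 300
bytes already buffered become urgent and another 500 bytes have yet to
arrive, which is the "pointer beyond the packet" case.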

> Is there any concern about the legacy TCP's processing such a segment
> correctly?

>From other mail on this list, it seems so.  But the only hard reason
that a TCP can't send new data is if the window is closed.  And if it
is closed, you can still send one byte of data as a window probe.
So there is no need to send out a zero-length packet just to get out
the urgent notification.

> Also, how is the proposed behaviour any greater a back door than
> TCP_NODELAY, which is a huge, gaping door already?

Setting TCP_NODELAY doesn't short-circuit any of the congestion control
code.  In the BSD stack, on an idle connection the first write will
always go out, no matter how small.  It is successive small writes that
get delayed until an ACK is received or a full segment's worth of data is
accumulated.  Setting TCP_NODELAY says that if another small write is
done while there is un-acked data, don't wait for the ack to come back
before sending the data.

Sending data in urgent mode is the same as turning on TCP_NODELAY
just before doing the write and turning it back off after the write.
One additional thing about sending data in urgent mode is that in
the BSD stack if the window is closed, it will force a one-byte
window probe to be sent (which will have the urgent flag/pointer).
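
The rules described above can be put into a small Python model. This is a
toy sketch of the decision as stated in the text, not a transcription of
tcp_output(): the function name bytes_to_send and its parameters are
hypothetical, the congestion window is not modelled at all, and neither is
the persist timer that would eventually probe a closed window on its own.

```python
def bytes_to_send(unsent, unacked, mss, window, nodelay=False, urgent=False):
    """Return how many bytes the sender puts on the wire right now.

    unsent  - bytes queued but not yet sent
    unacked - bytes sent but not yet acknowledged
    window  - bytes the receiver's advertised window still allows
    """
    if unsent == 0:
        return 0
    if window == 0:
        # Closed window: urgent mode forces a one-byte window probe.
        return 1 if urgent else 0
    sendable = min(unsent, mss, window)
    if sendable == mss:
        return sendable  # a full segment always goes out
    if unacked == 0:
        return sendable  # idle connection: the first small write goes out
    if nodelay or urgent:
        return sendable  # Nagle is bypassed for this write
    return 0  # Nagle: hold the small write until the ACK comes back
```

In this model a 10-byte write behind 5 unacknowledged bytes is held back
(returns 0), goes out at once with nodelay=True or urgent=True (returns 10),
and with a closed window only the urgent case sends anything, a single byte.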

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 14:22:24 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA11216 for tcp-impl-list; Wed, 15 Oct 1997 14:19:10 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA11182 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 14:19:08 -0700
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.219]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA05115
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 14:19:07 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel3.hp.com (8.8.5/8.8.5tis) with SMTP id OAA12517
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 14:19:06 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA14297; Wed, 15 Oct 1997 14:15:22 -0700
Message-Id: <3445326A.1DDA@cup.hp.com>
Date: Wed, 15 Oct 1997 14:15:22 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Out Of Band and Nagle
References: <199710152102.QAA04811@frantic.BSDI.COM>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

David Borman wrote:
> > From: Rick Jones <raj@hpisrdq.cup.hp.com>
> > Also, how is the proposed behaviour any greater a back door than
> > TCP_NODELAY, which is a huge, gaping door already?
> 
> Setting TCP_NODELAY doesn't short-circuit any of the congestion control
> code.  In the BSD stack, on an idle connection the first write will

I thought that one of the tenets of congestion control (the
paper/concept, not necessarily the BSD implementation) was conservation
of _packets_.

> always go out, no matter how small.  It is successive small writes that
> get delayed until an ACK is received or a full segment's worth of data is
> accumulated.  Setting TCP_NODELAY says that if another small write is
> done while there is un-acked data, don't wait for the ack to come back
> before sending the data.
> 
> Sending data in urgent mode is the same as turning on TCP_NODELAY
> just before doing the write and turning it back off after the write.
> One additional thing about sending data in urgent mode is that in
> the BSD stack if the window is closed, it will force a one-byte
> window probe to be sent (which will have the urgent flag/pointer).
> 
>                 -David Borman, dab@bsdi.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 15:09:28 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA26914 for tcp-impl-list; Wed, 15 Oct 1997 15:07:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA26904 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 15:07:39 -0700
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id PAA19213
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 15:07:38 -0700
	env-from (braden@ISI.EDU)
Received: from can.isi.edu by zephyr.isi.edu (5.65c/5.61+local-29)
	id <AA01973>; Wed, 15 Oct 1997 15:07:36 -0700
Date: Wed, 15 Oct 97 15:07:10 PDT
From: braden@ISI.EDU
Posted-Date: Wed, 15 Oct 97 15:07:10 PDT
Message-Id: <9710152207.AA14329@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA14329>; Wed, 15 Oct 97 15:07:10 PDT
To: raj@hpisrdq.cup.hp.com, rstevens@kohala.com, jt@mentat.com
Subject: Re: Out Of Band and Nagle
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

  *> 
  *> Rich:
  *> 
  *> The important point, that is not explicit in RFC 1122, is that the URG flag
  *> should be made visible to the receiving side without delay.

Uhhh... in other words, "Urgent" data is *urgent*.  I think we believed
that people would figure that out... :-)

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 15:34:48 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA04759 for tcp-impl-list; Wed, 15 Oct 1997 15:32:43 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA04753 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 15:32:41 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA27086
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 15:32:40 -0700
	env-from (Erik.Nordmark@eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id PAA27934; Wed, 15 Oct 1997 15:24:56 -0700
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id PAA27027; Wed, 15 Oct 1997 15:24:53 -0700
Received: from bobo.eng.sun.com (bobo [129.146.86.130])
	by jurassic.eng.sun.com (8.8.7+Sun.Alpha.7/8.8.7) with SMTP id PAA21577;
	Wed, 15 Oct 1997 15:24:54 -0700 (PDT)
Date: Wed, 15 Oct 1997 15:23:45 -0700 (PDT)
From: Erik Nordmark <Erik.Nordmark@eng.Sun.COM>
Reply-To: Erik Nordmark <Erik.Nordmark@eng.Sun.COM>
Subject: Re: Out Of Band and Nagle
To: Jerry Toporek <jt@mentat.com>
Cc: raj@hpisrdq.cup.hp.com, rstevens@kohala.com, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: "Your message with ID" <199710151924.MAA01035@feller.mentat.com>
Message-ID: <Roam.SIMC.2.0.6.876954225.3674.nordmark@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Jerry,

> The important point, that is not explicit in RFC 1122, is that the URG flag
> should be made visible to the receiving side without delay.  Forcing out a
> segment with data that would not otherwise be sendable (which may or may not
> include the data at the urgent pointer) is a way to accomplish this, but I am
> not sure that it is a good idea to provide this back-door mechanism to bypass
> all send controls.  I believe that the right thing to do when the sender TCP
> is presented with outbound urgent data at a time when the send rules preclude
> putting a segment on the wire is to generate a zero-length segment with the
> URG flag on and the urgent pointer set.

A zero length segment can not be reliably delivered by TCP since it can not
be acknowledged.

I believe t_force in BSD allows TCP to send one byte i.e. it might not
send the urgent byte but it will at least send a segment that contains
the urgent pointer. This allows the reliable delivery of an urgent notification
to the other application.

A case that does work on BSD derived systems is to have an
application send data until it gets an EWOULDBLOCK (i.e. TCP is flow
controlled) and then send an urgent byte. With BSD derived sender
and receiver this causes an immediate SIGURG on the receiver even though
TCP is flow controlled by the receiving application.

I don't know of any applications that depend on this behavior (because
Solaris 2.0 through 2.5.1 didn't do this) and I know it is somewhat
painful to implement this and BSD SIOCATMARK behavior correctly in
an asynchronous (streams based) protocol stack. But in any case, I haven't seen
anything (except the BSD source :-) that documents this behavior in sufficient
detail.

Would it be useful to nail down this detailed behavior?
A lot of the behavior is tied to the socket API and not to the TCP
protocol specification.

   Erik


From owner-tcp-impl@relay.engr.sgi.com  Wed Oct 15 16:56:21 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAB27980 for tcp-impl-list; Wed, 15 Oct 1997 16:53:16 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA27956 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 16:53:11 -0700
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA15913
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 15 Oct 1997 16:53:09 -0700
	env-from (jt@mentat.com)
Received: from rock.mentat.com ([192.88.122.136]) by mentat.com (4.1/SMI-4.1)
	id AA14117; Wed, 15 Oct 97 16:51:13 PDT
Date: Wed, 15 Oct 97 16:51:13 PDT
From: jt@mentat.com (Jerry Toporek)
Message-Id: <9710152351.AA14117@mentat.com>
To: Erik.Nordmark@eng.Sun.COM
Subject: Re: Out Of Band and Nagle
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> A zero length segment can not be reliably delivered by TCP since it can not
> be acknowledged.
> 
> I believe t_force in BSD allows TCP to send one byte i.e. it might not
> send the urgent byte but it will at least send a segment that contains
> the urgent pointer. This allows the reliable delivery of an urgent notification
> to the other application.

Erik:

Thank you...  That's a very good point.  I withdraw the suggestion...
The point I was trying to make was that immediate visibility of the URG
flag at the receiver is crucial.  As you note, an application could deadlock
if this does not happen.  Therefore, it is important that the flag get delivered
reliably, which will only be true if some data is sent.

> Would it be useful to nail down this detailed behavior?

Yes, I think so.  It may seem obvious that a sender should force a segment
onto the wire when entering urgent mode, but is it obvious that only a single
segment should be forced?  Suppose I note that I can force all the data through
the urgent point onto the wire, because I have enough window, and I don't
feel like it is so bad to exceed the congestion window by a few MSS because
the data is really *urgent* and I want to do the application a favor and
get it all delivered quickly...  This would be bad.  I would prefer to say
that the sender should not override any send rules when entering urgent mode,
but a zero-length segment won't get the job done (reliably).

> A lot of the behavior is tied to the socket API and not to the TCP
> protocol specification.

Yes...  Most of the difficulty that I see with TCP urgent data is as a result
of applications which are not really portable from one sockets implementation
to another.  Artifacts of the implementation are believed to be guarantees,
but don't carry over from one system to another.  That's a whole can of worms
that can't be sorted out here!

jt

From owner-tcp-impl@relay.engr.sgi.com  Thu Oct 16 01:29:30 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA02270 for tcp-impl-list; Thu, 16 Oct 1997 01:28:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA02265 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 01:28:00 -0700
Received: from fly.cnuce.cnr.it (foda-devel.cnuce.cnr.it [131.114.192.86]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id BAA29432
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 01:27:51 -0700
	env-from (pot@fly.cnuce.cnr.it)
Received: by fly.cnuce.cnr.it (Smail3.1.26.7 #3)
	id m0xLlKG-00008sC; Thu, 16 Oct 97 10:30 MET
Message-Id: <m0xLlKG-00008sC@fly.cnuce.cnr.it>
Date: Thu, 16 Oct 97 10:30 MET
From: Francesco Potorti` <F.Potorti@cnuce.cnr.it>
To: jt@mentat.com (Jerry Toporek)
CC: tcp-impl@cthulhu.engr.sgi.com, Erik.Nordmark@eng.Sun.COM
In-reply-to: <9710152351.AA14117@mentat.com> (jt@mentat.com)
Subject: Re: Out Of Band and Nagle
Organization: CNUCE-CNR, Via S.Maria 36, Pisa - Italy +39-50-593211
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

			      Suppose I note that I can force all the
   data through the urgent point onto the wire, because I have enough
   window, and I don't feel like it is so bad to exceed the congestion
   window by a few MSS because the data is really *urgent* and I want
   to do the application a favor and get it all delivered quickly...
   This would be bad.  I would prefer to say that the sender should
   not override any send rules when entering urgent mode, but a
   zero-length segment won't get the job done (reliably).
   
Do you mean that a one-byte segment would do the job?

Wouldn't all the data from that point up to the urgent pointer be
considered urgent data in that case?

-- 
Francesco Potorti` (researcher)        Voice:    +39-50-593203
Computer Network Division              Operator: +39-50-593211
CNUCE-CNR, Via Santa Maria 36          Fax:      +39-50-904052
56126 Pisa - Italy                     Email:    F.Potorti@cnuce.cnr.it

From owner-tcp-impl@relay.engr.sgi.com  Thu Oct 16 08:49:35 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA23324 for tcp-impl-list; Thu, 16 Oct 1997 08:47:37 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA23315 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 08:47:34 -0700
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id IAA23728
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 08:47:31 -0700
	env-from (jt@mentat.com)
Received: from feller.mentat.com ([192.88.122.144]) by mentat.com (4.1/SMI-4.1)
	id AA18548; Thu, 16 Oct 97 08:45:27 PDT
Received: by feller.mentat.com (SMI-8.6/SMI-SVR4)
	id IAA01853; Thu, 16 Oct 1997 08:47:48 -0700
Date: Thu, 16 Oct 1997 08:47:48 -0700
From: jt@mentat.com (Jerry Toporek)
Message-Id: <199710161547.IAA01853@feller.mentat.com>
To: F.Potorti@cnuce.cnr.it
Subject: Re: Out Of Band and Nagle
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> 			      Suppose I note that I can force all the
>    data through the urgent point onto the wire, because I have enough
>    window, and I don't feel like it is so bad to exceed the congestion
>    window by a few MSS because the data is really *urgent* and I want
>    to do the application a favor and get it all delivered quickly...
>    This would be bad.  I would prefer to say that the sender should
>    not override any send rules when entering urgent mode, but a
>    zero-length segment won't get the job done (reliably).
>    
> Do you mean that a one-byte segment would do the job?
> 
> Wouldn't all the data from that point up to the urgent pointer be
> considered urgent data in that case?

Yes, it would.  Does that mean I have to force it all onto the wire without
delay?  Try this example...  48K of unsent data, a 50K send window, a 2920
byte congestion window, and I am waiting for an ACK to open the congestion
window.  I get an urgent mode send.  All 48K+1 bytes are now "urgent data".
It's all in window.  Do you want me to ignore the congestion window and send
it all out immediately?  I don't think so.

jt
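
The arithmetic in the example above can be worked through in a short
sketch. It assumes that "K" means 1024 bytes, that the 50K window is
unused receiver window, that a full congestion window (2920 bytes, two
1460-byte segments) is already in flight, and that the urgent write adds
one byte. All names are illustrative and not taken from any stack.

```python
import math

K = 1024      # assumption: "K" means 1024 bytes
MSS = 1460    # assumption: a cwnd of 2920 is two 1460-byte segments

def sendable(unsent, wnd_avail, cwnd, in_flight, honor_cwnd=True):
    """Bytes that may be sent now; wnd_avail is unused receiver window."""
    limit = wnd_avail
    if honor_cwnd:
        # The congestion window only has room for what is not in flight.
        limit = min(limit, max(0, cwnd - in_flight))
    return min(unsent, limit)

unsent = 48 * K + 1    # 48K queued plus the one urgent byte
wnd_avail = 50 * K     # receiver window still open
cwnd = 2920            # congestion window
in_flight = 2920       # assumption: cwnd is full, sender awaits an ACK

polite = sendable(unsent, wnd_avail, cwnd, in_flight)
greedy = sendable(unsent, wnd_avail, cwnd, in_flight, honor_cwnd=False)
burst_segments = math.ceil(greedy / MSS)
print(polite, greedy, burst_segments)   # 0 49153 34
```

Under these assumptions, honoring cwnd sends nothing until an ACK
arrives, while ignoring it would put a 34-segment burst on the wire.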

From owner-tcp-impl@relay.engr.sgi.com  Thu Oct 16 09:21:06 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA04232 for tcp-impl-list; Thu, 16 Oct 1997 09:18:09 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA04205 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 09:18:07 -0700
Received: from fly.cnuce.cnr.it (foda-devel.cnuce.cnr.it [131.114.192.86]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id JAA04014
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 09:18:05 -0700
	env-from (pot@fly.cnuce.cnr.it)
Received: by fly.cnuce.cnr.it (Smail3.1.26.7 #3)
	id m0xLsew-0001LLC; Thu, 16 Oct 97 18:20 MET
Message-Id: <m0xLsew-0001LLC@fly.cnuce.cnr.it>
Date: Thu, 16 Oct 97 18:20 MET
From: Francesco Potorti` <F.Potorti@cnuce.cnr.it>
To: jt@mentat.com (Jerry Toporek)
CC: tcp-impl@cthulhu.engr.sgi.com
In-reply-to: <199710161547.IAA01853@feller.mentat.com> (jt@mentat.com)
Subject: Re: Out Of Band and Nagle
Organization: CNUCE-CNR, Via S.Maria 36, Pisa - Italy +39-50-593211
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   > 			      Suppose I note that I can force all the
   >    data through the urgent point onto the wire, because I have enough
   >    window, and I don't feel like it is so bad to exceed the congestion
   >    window by a few MSS because the data is really *urgent* and I want
   >    to do the application a favor and get it all delivered quickly...
   >    This would be bad.  I would prefer to say that the sender should
   >    not override any send rules when entering urgent mode, but a
   >    zero-length segment won't get the job done (reliably).
   >    
   > Do you mean that a one-byte segment would do the job?
   > 
   > Wouldn't all the data from that point up to the urgent pointer be
   > considered urgent data in that case?
   
   Yes, it would.  Does that mean I have to force it all onto the wire
   without delay?  

I don't think so.

		   Try this example...  48K of unsent data, a 50K send
   window, a 2920 byte congestion window, and I am waiting for an ACK
   to open the congestion window.  I get an urgent mode send.  All
   48K+1 bytes are now "urgent data".  It's all in window.  Do you
   want me to ignore the congestion window and send it all out
   immediately?  

Who?  Me?  Sure, I wouldn't dare tell you how to send your data :-)

Seriously speaking, I was worried about the possible semantic change.
The user only wants to send one byte of urgent data, and TCP sends all
48KB of data as urgent.  Is this common behaviour on the part of a TCP
stack?  Expected by applications?

-- 
Francesco Potorti` (researcher)        Voice:    +39-50-593203
Computer Network Division              Operator: +39-50-593211
CNUCE-CNR, Via Santa Maria 36          Fax:      +39-50-904052
56126 Pisa - Italy                     Email:    F.Potorti@cnuce.cnr.it

From owner-tcp-impl@relay.engr.sgi.com  Thu Oct 16 10:26:29 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA27796 for tcp-impl-list; Thu, 16 Oct 1997 10:21:52 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA27773 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 10:21:49 -0700
Received: from mentat.com (mentat.com [192.88.122.129]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id KAA25554
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 10:21:48 -0700
	env-from (jt@mentat.com)
Received: from feller.mentat.com ([192.88.122.144]) by mentat.com (4.1/SMI-4.1)
	id AA19103; Thu, 16 Oct 97 10:20:06 PDT
Received: by feller.mentat.com (SMI-8.6/SMI-SVR4)
	id KAA01885; Thu, 16 Oct 1997 10:22:26 -0700
Date: Thu, 16 Oct 1997 10:22:26 -0700
From: jt@mentat.com (Jerry Toporek)
Message-Id: <199710161722.KAA01885@feller.mentat.com>
To: F.Potorti@cnuce.cnr.it
Subject: Re: Out Of Band and Nagle
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> Seriously speaking, I was worried about the possible semantic change.
> The user only wants to send one byte of urgent data, and TCP sends all
> 48KB of data as urgent.  Is this common behaviour on the part of a TCP
> stack?  Expected by applications?

It is required behavior for the TCP.

As for what is expected by the application, that depends on how weirdly
the API (the implementation of the API, to be more exact) has mapped TCP
urgent data into its notion of urgent/expedited/out-of-band data, and on how
well the application writer understands that mapping.

jt
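
To illustrate why all the queued data ends up marked urgent, here is a
simplified sender model. It assumes snd_up is the sequence number just
past the urgent byte, and that every segment sent below snd_up carries
URG with the offset snd_up - seq in its urgent field. The variable names
follow BSD convention, but the function itself is hypothetical and
ignores sequence-number wraparound.

```python
def segments_until_urgent(snd_nxt, snd_up, mss):
    """Return (seq, length, urg_offset) for each segment below snd_up.

    urg_offset is snd_up - seq, the value the segment would carry in
    its urgent field (assumed to fit in 16 bits here).
    """
    segs = []
    seq = snd_nxt
    while seq < snd_up:
        length = min(mss, snd_up - seq)
        segs.append((seq, length, snd_up - seq))
        seq += length
    return segs

# 48K of queued data followed by one urgent byte, MSS of 1460 (assumed).
segs = segments_until_urgent(snd_nxt=0, snd_up=48 * 1024 + 1, mss=1460)
print(len(segs), segs[0], segs[-1])
# 34 (0, 1460, 49153) (48180, 973, 973)
```

Every one of the 34 segments carries a nonzero urgent offset, so the
receiver stays in urgent mode until it has read through all of them.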



From owner-tcp-impl@relay.engr.sgi.com  Thu Oct 16 11:29:09 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA22831 for tcp-impl-list; Thu, 16 Oct 1997 11:27:34 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA22815 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 11:27:32 -0700
Received: from palrel3.hp.com (palrel3.hp.com [156.153.255.219]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA17577
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 11:27:29 -0700
	env-from (raj@hpisrdq.cup.hp.com)
Received: from hpisrdq.cup.hp.com (hpindio.cup.hp.com [15.13.104.185])
	by palrel3.hp.com (8.8.5/8.8.5tis) with SMTP id LAA19477
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 11:27:26 -0700 (PDT)
Received: from hpindio by hpisrdq.cup.hp.com with SMTP
	(1.38.193.4/15.5+IOS 3.20+cup+OMrelay) id AA16369; Thu, 16 Oct 1997 11:23:40 -0700
Message-Id: <34465BAC.1B8D@cup.hp.com>
Date: Thu, 16 Oct 1997 11:23:40 -0700
From: Rick Jones <raj@hpisrdq.cup.hp.com>
Organization: Hewlett-Packard Co.
X-Mailer: Mozilla 3.01 (X11; I; HP-UX A.09.05 9000/715)
Mime-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Out Of Band and Nagle
References: <199710161722.KAA01885@feller.mentat.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Can we consider as separate the issues of bypassing cwnd and bypassing
Nagle?

Based on the discussion thus far, can we agree that there is rough
consensus that the sending of urgent data by an application should
temporarily override the Nagle algorithm?

rick jones
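
A minimal sketch of the proposal, treating the Nagle test separately
from the congestion window check. The function name and parameters are
hypothetical; a real stack would also consult cwnd and the receiver
window before sending, which this test deliberately leaves alone.

```python
def nagle_allows_send(queued, mss, unacked, nodelay=False, urgent=False):
    """Nagle-style test for `queued` bytes waiting to be sent.

    Send at once if a full segment is ready, if nothing is
    unacknowledged, if Nagle is disabled, or (the proposal discussed
    here) if urgent data is pending.
    """
    if queued <= 0:
        return False
    if queued >= mss:
        return True
    if unacked == 0 or nodelay:
        return True
    return urgent

# One small write while earlier data is still unacknowledged:
print(nagle_allows_send(1, 1460, unacked=500))               # False
print(nagle_allows_send(1, 1460, unacked=500, urgent=True))  # True
```

Only the small-segment delay is bypassed; whether the segment may
actually leave is still up to the window checks.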

From owner-tcp-impl@relay.engr.sgi.com  Thu Oct 16 11:55:35 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA03267 for tcp-impl-list; Thu, 16 Oct 1997 11:53:56 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA03256 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 11:53:55 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA00729
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 11:53:53 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id LAA11379; Thu, 16 Oct 1997 11:47:02 -0700 (PDT)
Message-Id: <199710161847.LAA11379@daffy.ee.lbl.gov>
To: Rick Jones <raj@hpisrdq.cup.hp.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Out Of Band and Nagle
In-reply-to: Your message of Thu, 16 Oct 1997 11:23:40 PDT.
Date: Thu, 16 Oct 1997 11:47:02 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Based on the discussion thus far, can we agree that there is rough
> consensus that the sending of urgent data by an application should
> temporarily override the Nagle algorithm?

That's how it appears to me.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Oct 16 18:14:19 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA15660 for tcp-impl-list; Thu, 16 Oct 1997 18:12:12 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA15654 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 18:12:11 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA26138
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 16 Oct 1997 18:12:10 -0700
	env-from (Allyn.Romanow@eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id SAA12724; Thu, 16 Oct 1997 18:11:21 -0700
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id SAA10043; Thu, 16 Oct 1997 18:11:19 -0700
Received: from offshore.eng.sun.com (offshore [129.146.86.134])
	by jurassic.eng.sun.com (8.8.7+Sun.Alpha.7/8.8.7) with SMTP id SAA13191;
	Thu, 16 Oct 1997 18:11:19 -0700 (PDT)
Received: by offshore.eng.sun.com (SMI-8.6/SMI-SVR4)
	id SAA02951; Thu, 16 Oct 1997 18:02:01 -0700
Date: Thu, 16 Oct 1997 18:02:01 -0700
From: Allyn.Romanow@eng.Sun.COM (Allyn Romanow)
Message-Id: <199710170102.SAA02951@offshore.eng.sun.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: TCP SACK
Cc: end2end-interest@isi.edu, tcp-over-satellite@achtung.sp.trw.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Folks,

It's been awhile now that TCP SACK implementations have been available -
see http://www.psc.edu/networking/tcp.html for a list -
and I'm wondering what experience folks have had with the protocol.
What are the behavioral effects on the TCPs using SACK, 
TCPs not using SACK, and the overall effects on the network traffic?

There have been a few studies, (again see Matt Mathis' url above and
Sally Floyd's SACK page http://ftp.ee.lbl.gov/floyd/sacks.html), and
there have been some references to simulation work for use of SACK 
over satellites on the tcpsat list.  
I'm wondering what further experience people might have - ???
   
I've cc'ed the end2end and tcpsat mailing lists, but the intention is to
keep any discussion that might arise on the tcp-impl list, not replicated
on multiple lists.


thanks-
Allyn

From owner-tcp-impl@relay.engr.sgi.com  Fri Oct 17 02:30:30 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA23417 for tcp-impl-list; Fri, 17 Oct 1997 02:28:57 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA23411 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 02:28:54 -0700
Received: from iiic.ethz.ch (rif-stud-a.iiic.ethz.ch [129.132.179.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id CAA12767
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 02:28:46 -0700
	env-from (uhengart@iiic.ethz.ch)
Received: from raf35.iiic.ethz.ch (uhengart@raf35 [129.132.179.105]) by iiic.ethz.ch (8.8.4/8.7.1) with ESMTP id LAA28887 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 11:28:44 +0200 (MET DST)
Received: from localhost (uhengart@localhost) by raf35.iiic.ethz.ch (8.7.1/8.7.1) with ESMTP id LAA14704 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 11:28:42 +0200 (MET DST)
Message-Id: <199710170928.LAA14704@raf35.iiic.ethz.ch>
X-Authentication-Warning: raf35.iiic.ethz.ch: uhengart owned process doing -bs
X-Mailer: exmh version 1.6.7 5/3/96
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP SACK 
In-reply-to: Your message of "Thu, 16 Oct 1997 18:02:01 PDT."
             <199710170102.SAA02951@offshore.eng.sun.com> 
X-URL: http://www.vis.inf.ethz.ch/students/uhengart/
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 17 Oct 1997 11:28:41 +0200
From: Urs Beda Hengartner <uhengart@iiic.ethz.ch>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Thu, 16 Oct 1997 18:02:01 PDT Allyn Romanow writes
 > It's been awhile now that TCP SACK implementations have been available -
 > see http://www.psc.edu/networking/tcp.html for a list -
 > and I'm wondering what experience folks have had with the protocol.
 > What are the behavioral effects on the TCPs using SACK, 
 > TCPs not using SACK, and the overall effects on the network traffic?

We implemented a user-level transport protocol based on UDP with 
congestion control mechanisms similar to TCP's. There is also a version 
that uses FACK [1]. In tests over the Internet, we observed many 
unnecessary retransmissions with FACK. 

These were caused by not clearly separating congestion control from 
data recovery (as proposed in [2]): data recovery relies on congestion 
control being blocked long enough (about half an RTT, because cwnd is 
halved) before it begins to retransmit packets that have not yet been 
selectively acked. But towards the end of a recovery, it very often 
happened that a packet was retransmitted shortly before it was 
selectively acked. 

We also looked at the scoreboard algorithm from [2], where a packet is 
retransmitted only after three acks selectively acking packets with 
higher sequence numbers have been received. There were almost no 
unnecessary retransmissions and the timeout rate didn't increase.

- Urs


[1] M. Mathis and J. Mahdavi, "Forward Acknowledgment: Refining TCP 
Congestion Control," Proceedings of ACM SIGCOMM `96, pp. 281-292, 
August 1996.

[2] M. Mathis and J. Mahdavi, "TCP Rate-Halving with Bounding 
Parameters," Technical Note, October 1996. Obtain via:
http://www.psc.edu/networking/papers/FACKknotes/current/
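
The retransmission rule described above (retransmit a packet only once
three SACKs for higher-numbered packets have been seen) can be sketched
as follows. This is a simplified model and not the algorithm as given
in [2]: it uses whole packet numbers instead of byte ranges, counts one
hit per newly SACKed packet, and the class and method names are
hypothetical.

```python
from collections import defaultdict

class Scoreboard:
    """Count SACKs above each hole; flag a hole after `threshold` hits."""

    def __init__(self, outstanding, threshold=3):
        self.unsacked = set(outstanding)   # packets not yet SACKed
        self.hits = defaultdict(int)       # hole -> SACKs seen above it
        self.threshold = threshold

    def on_sack(self, pkt):
        """Record a SACK for `pkt`; return holes now due for retransmit."""
        if pkt not in self.unsacked:
            return []                      # duplicate SACK, ignore it
        self.unsacked.discard(pkt)
        due = []
        for hole in sorted(self.unsacked):
            if hole < pkt:
                self.hits[hole] += 1
                if self.hits[hole] == self.threshold:
                    due.append(hole)
        return due

# Packets 1..7 are outstanding; packet 1 is lost, 2, 3 and 4 arrive.
sb = Scoreboard(outstanding=range(1, 8))
for p in (2, 3, 4):
    print(p, sb.on_sack(p))
# 2 []
# 3 []
# 4 [1]
```

Waiting for the third hit is what keeps a packet that is merely late,
and about to be SACKed, off the retransmit list.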



From owner-tcp-impl@relay.engr.sgi.com  Fri Oct 17 06:19:05 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA13995 for tcp-impl-list; Fri, 17 Oct 1997 06:17:38 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA13988 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 06:17:36 -0700
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id GAA24165
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 06:17:23 -0700
	env-from (luigi@labinfo.iet.unipi.it)
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id MAA13754; Fri, 17 Oct 1997 12:59:45 +0100
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199710171159.MAA13754@labinfo.iet.unipi.it>
Subject: Re: TCP SACK
To: Allyn.Romanow@eng.sun.com (Allyn Romanow)
Date: Fri, 17 Oct 1997 12:59:45 +0100 (MET)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199710170102.SAA02951@offshore.eng.sun.com> from "Allyn Romanow" at Oct 16, 97 06:01:42 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 2876      
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Folks,
> 
> It's been awhile now that TCP SACK implementations have been available -
> see http://www.psc.edu/networking/tcp.html for a list -
> and I'm wondering what experience folks have had with the protocol.
> What are the behavioral effects on the TCPs using SACK, 
> TCPs not using SACK, and the overall effects on the network traffic?

I doubt that there are enough numbers to draw any conclusion. Last
September I enabled SACK on our www proxy (50-100K requests per day,
for maybe 500..1000 different clients) and I have rarely seen any
access from clients requesting SACK (I log them).

After a few months I disabled SACK (see below), although I still
log incoming SACK options: in the last few days there is only one
client who tries to negotiate the use of SACK (except my own machines
of course).

It is noticeable that only once I have encountered a server with
SACK enabled, and this was the machine of some other guy who also
implemented SACK.

This might have to do with the fact that there are still stacks
around which do not like TCP options and popular servers tend to
disable SACK, RFC1323 and other options which might cause them to
lose clients.

Note that there are other things which have a good impact on traffic
(at least for web-type traffic), such as lowering the threshold
for fast retransmit when there are too few packets in transit. As
an example, here is an excerpt of netstat -p tcp on our www proxy:

    tcp:
        21485459 packets sent
                9066382 data packets (-1186795353 bytes)
                273802 data packets (293684047 bytes) retransmitted
		...
        20507038 packets received
                7930285 acks (for -1243676530 bytes)
                1083212 duplicate acks
		...
		846742 out-of-order packets (387268097 bytes)
		...
        1328417 connections established (including accepts)
        1428952 connections closed (including 390569 drops)
	...
        369509 retransmit timeouts
                57191 retransmit timeout on syn
                297393 retransmit timeout with 0 dup acks
                10908 retransmit timeout with 1 dup acks
                4705 retransmit timeout with 2 dup acks
        86724 fast retransmits
                60120 with 1 dup ack
                14233 with 2 dup ack
                12371 with 3 dup ack
        35570 newreno retrans

from the above you can see that we prevent a lot of timeouts at
the receivers without changing the overall behaviour of TCP.

	Cheers
	Luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________
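
A quick check of the counters in the netstat excerpt above. The figures
are copied from the excerpt; the interpretation is an assumption,
namely that every fast retransmit triggered by fewer than three
duplicate ACKs would otherwise have ended in a retransmit timeout.

```python
timeouts = 369509                    # retransmit timeouts (all kinds)
zero_dup = 297393                    # ... of which with 0 dup acks
fast_1, fast_2, fast_3 = 60120, 14233, 12371
fast_total = 86724                   # fast retransmits

# The per-threshold counts add up to the reported total.
assert fast_1 + fast_2 + fast_3 == fast_total

early = fast_1 + fast_2              # triggered below the usual 3 dup acks
share_early = early / fast_total
avoided = early / (timeouts + early) # assumption: each was a saved timeout
zero_dup_share = zero_dup / timeouts

print(f"early fast retransmits:   {share_early:.1%}")     # 85.7%
print(f"timeouts avoided (est.):  {avoided:.1%}")         # 16.8%
print(f"timeouts with 0 dup acks: {zero_dup_share:.1%}")  # 80.5%
```

On this reading, most fast retransmits on the proxy fired on one or two
duplicate ACKs, and the large majority of the remaining timeouts saw no
duplicate ACK at all.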

From owner-tcp-impl@relay.engr.sgi.com  Fri Oct 17 13:39:30 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA18030 for tcp-impl-list; Fri, 17 Oct 1997 13:36:59 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA18006 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 13:36:56 -0700
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA14380
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 13:36:55 -0700
	env-from (kcpoon@jurassic.eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id NAA00315; Fri, 17 Oct 1997 13:36:05 -0700
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id NAA26122; Fri, 17 Oct 1997 13:36:03 -0700
Received: from shield (shield [129.146.83.81])
	by jurassic.eng.sun.com (8.8.7+Sun.Alpha.7/8.8.7) with SMTP id NAA23366;
	Fri, 17 Oct 1997 13:36:03 -0700 (PDT)
Date: Fri, 17 Oct 1997 13:36:02 -0700 (PDT)
From: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
Reply-To: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
Subject: Re: TCP SACK
To: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Cc: Allyn Romanow <Allyn.Romanow@Eng.Sun.COM>, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: "Your message with ID" <199710171159.MAA13754@labinfo.iet.unipi.it>
Message-ID: <Roam.SIMC.2.0.6.877120562.24836.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> This might have to do with the fact that there are still stacks
> around which do not like TCP options and popular servers tend to
> disable SACK, RFC1323 and other options which might cause them to
> lose clients.

Or there are just very few, if any, servers out there that are capable of
doing SACK, or RFC 1323.

> Note that there are other things which have a good impact on traffic
> (at least for web-type traffic), such as lowering the threshold
> for fast retransmit when there are too few packets in transit. As
> an example, here is an excerpt of netstat -p tcp on our www proxy:

Lowering the threshold may not be safe.  This can lead to a lot of unnecessary
"fast retransmissions."  I think it is safer to do the SACK way.  It is noted
in Fall and Floyd's "Simulation-based Comparisons of Tahoe, Reno, and SACK
TCP" that SACK implementations can make a more intelligent use of the first
and second dup ACKs.  After getting the first and second dup ACKs, SACK info
may indicate that packets have left the network and one or two new packets can
be sent, if window allows.  This in turn can generate more dup ACKs, assuming
the new packets get through, and fast retransmit can then be triggered. 
Actually, existing stacks can just increase cwnd by 1 MSS after getting the
first or second dup ACKs.  This also has the same effect.

							K. Poon.
							kcpoon@eng.sun.com



From owner-tcp-impl@relay.engr.sgi.com  Fri Oct 17 14:49:30 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA11119 for tcp-impl-list; Fri, 17 Oct 1997 14:47:41 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA11106 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 14:47:39 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA08128
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 14:47:36 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id OAA13900; Fri, 17 Oct 1997 14:46:58 -0700 (PDT)
Message-Id: <199710172146.OAA13900@daffy.ee.lbl.gov>
To: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
Cc: Luigi Rizzo <luigi@labinfo.iet.unipi.it>,
        Allyn Romanow <Allyn.Romanow@Eng.Sun.COM>,
        tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP SACK
In-reply-to: Your message of Fri, 17 Oct 1997 13:36:02 PDT.
Date: Fri, 17 Oct 1997 14:46:58 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Actually, existing stacks can just increase cwnd by 1 MSS after getting the
> first or second dup ACKs.  This also has the same effect.

(Providing they are careful to undo the increase later.)
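
One way to make the "undo" automatic is to keep the extra allowance
separate from cwnd, so it can be dropped without touching the
congestion state. The sketch below is hypothetical (names and structure
are assumptions, windows are counted in whole segments) and only
illustrates the bookkeeping.

```python
class Sender:
    """Track a temporary send allowance for the first two dup ACKs."""

    def __init__(self, cwnd_segments):
        self.cwnd = cwnd_segments   # congestion window, in segments
        self.dupacks = 0
        self.extra = 0              # temporary allowance, not part of cwnd

    def on_dup_ack(self):
        """Return the action taken for this duplicate ACK."""
        self.dupacks += 1
        if self.dupacks < 3:
            self.extra += 1         # one new segment per early dup ACK
            return "send_new"
        if self.dupacks == 3:
            self.extra = 0          # undo the allowance before recovery
            return "fast_retransmit"
        return "in_recovery"

    def on_new_ack(self):
        """A normal ACK ends the episode and undoes the allowance."""
        self.dupacks = 0
        self.extra = 0

    def send_limit(self):
        return self.cwnd + self.extra

s = Sender(cwnd_segments=4)
print(s.on_dup_ack(), s.send_limit())   # send_new 5
print(s.on_dup_ack(), s.send_limit())   # send_new 6
print(s.on_dup_ack(), s.send_limit())   # fast_retransmit 4
```

Because cwnd itself never changes here, there is no inflated value left
behind if the duplicate ACKs turn out to be caused by reordering.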

From owner-tcp-impl@relay.engr.sgi.com  Fri Oct 17 23:12:51 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA18560 for tcp-impl-list; Fri, 17 Oct 1997 23:10:22 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA18547 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 23:10:17 -0700
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA18315
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 23:10:10 -0700
	env-from (luigi@labinfo.iet.unipi.it)
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id FAA15282; Sat, 18 Oct 1997 05:53:28 +0100
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199710180453.FAA15282@labinfo.iet.unipi.it>
Subject: Re: TCP SACK
To: kcpoon@jurassic.eng.Sun.COM
Date: Sat, 18 Oct 1997 05:53:28 +0100 (MET)
Cc: Allyn.Romanow@Eng.Sun.COM, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <Roam.SIMC.2.0.6.877120562.24836.kcpoon@jurassic> from "Kacheong Poon" at Oct 17, 97 01:35:43 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 2720      
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > This might have to do with the fact that there are still stacks
> > around which do not like TCP options and popular servers tend to
> > disable SACK, RFC1323 and other options which might cause them to
> > lose clients.
> 
> Or there are just very few, if any, servers out there that are capable of
> doing SACK, or RFC 1323.

For SACK you are probably right, but for RFC1323 support is widespread
in the OSes typically used for servers. Yet I talked once with some people
who run a *big* server running FreeBSD and they explicitly turned
off RFC1323 because some clients were negatively affected by TCP
options. The net is big and full of old equipment...

> > Note that there are other things which have a good impact on traffic
> > (at least for web-type traffic), such as lowering the threshold
> > for fast retransmit when there are too few packets in transit. As
> > an example, here is an excerpt of netstat -p tcp on our www proxy:
> 
> Lowering the threshold may not be safe.  This can lead to a lot of unnecessary
> "fast retransmissions."  I think it is safer to do the SACK way.  It is noted

Any choice you make might not be safe; it all depends on how it
works in practice -- and still, what works now might not work next
year.  E.g. when/if people start using several ISDN channels
in parallel, then the chance of massive reorderings of packets
(being sent in parallel on all channels) becomes so high that even
the 'fast rxmit after 3 dups' becomes risky. Check Vern's thesis for a
description of the phenomenon with 2 channels -- where it looks at the
'packet pair' technique to estimate bandwidth.

This is an area where there is almost no experience in the field. I
just tried to give some data -- by no means conclusive -- which suggest
this (_conditionally_ lowering the fast rxmit threshold) as an
alternative to investigate. My data do not show how many useless rxmits
I made, just that I could reduce the number of timeouts by 15-20%, and
that 75% of the timeouts on my server occur when there is only one
packet in transit.
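
To make the conditional rule concrete, here is a minimal sketch, not taken
from any real stack. It assumes the sender knows how many segments are in
flight; `dupack_threshold` and its parameters are hypothetical names. The
reasoning is that with N segments in flight and one of them lost, at most
N-1 dup acks can come back, so the usual threshold of 3 is unreachable
when N is 3 or less.

```python
DEFAULT_DUPACK_THRESHOLD = 3  # the usual 'fast rxmit after 3 dups'


def dupack_threshold(segments_in_flight, default=DEFAULT_DUPACK_THRESHOLD):
    """Return the dup-ack count that should trigger fast retransmit.

    Hypothetical rule: keep the default when enough segments are in
    flight, otherwise lower it to the most dup acks a single loss could
    still produce (segments_in_flight - 1), but never below 1.
    """
    if segments_in_flight < 1:
        raise ValueError("need at least one segment in flight")
    reachable = segments_in_flight - 1
    return max(1, min(default, reachable))
```

With only one segment in flight no dup ack can arrive at all, so a rule
like this cannot help in that case and the retransmit timer remains the
only recovery.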

Increasing the window on dup acks is safer provided you have more
traffic to transmit. But web traffic is made of short files -- median
and mean 5 and 10K respectively -- so chances are that you will not
have enough data to exploit the increased window size.

	Cheers
	Luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________

From owner-tcp-impl@relay.engr.sgi.com  Sat Oct 18 00:01:46 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA21996 for tcp-impl-list; Fri, 17 Oct 1997 23:59:28 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA21991 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 23:59:27 -0700
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA24931
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 17 Oct 1997 23:59:26 -0700
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.7/8.8.5)
	id XAA14936; Fri, 17 Oct 1997 23:59:20 -0700 (PDT)
Message-Id: <199710180659.XAA14936@daffy.ee.lbl.gov>
To: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Cc: kcpoon@jurassic.eng.Sun.COM, Allyn.Romanow@Eng.Sun.COM,
        tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP SACK
In-reply-to: Your message of Sat, 18 Oct 1997 05:53:28 PDT.
Date: Fri, 17 Oct 1997 23:59:20 PDT
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> E.g. when/if people will start using several ISDN channels
> in parallel, then the chance of massive reorderings of packets
> (being sent in parallel on all channels) becomes so high that even
> the 'fast rxmit after 3 dups' becomes risky. Check Vern's thesis for a
> description of the phenomenon with 2 channels -- where it looks at the
> 'packet pair' technique to estimate bandwidth.

These particular paths *don't* lead to reordering.  The problem I analyzed
is that they don't introduce the usual bottleneck link spacing, so packet
pair completely fails in the presence of such bottlenecks.

Other mechanisms do indeed introduce reordering: router flutter due to load
balancing, and router "lulls" (leading to massive reorderings) are the main
ones I looked at.

> My data do not tell how many useless rxmit
> I made, just that I could reduce the number of timeouts by 15-20%, and
> that 75% of the timeouts on my server occur when there is only one
> packet in transit.

I looked in my thesis at the case of blindly lowering the threshold to
2 dups and found that it leads to a lot more unnecessary retransmissions
(while also avoiding a lot more timeouts).  I haven't yet looked at the
suggestion you made to me a while back, in which the TCP only lowers the
threshold when there are very few packets in flight.
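
As a rough illustration of why a lower threshold costs extra
retransmissions under reordering, the sketch below counts the dup acks a
cumulative-ack receiver would emit for a given arrival order with no loss
at all. The function names and the example arrival order are made up for
illustration; this is not the analysis from the thesis.

```python
def longest_dupack_run(arrivals):
    """Longest run of duplicate ACKs for the given arrival order.

    `arrivals` lists segment numbers (starting at 1) in the order they
    reach the receiver. Every segment that is not the next expected one
    makes the receiver repeat its last cumulative ACK.
    """
    received = set()
    next_expected = 1
    run = longest = 0
    for seg in arrivals:
        received.add(seg)
        if seg == next_expected:
            # The hole is filled: advance past everything already buffered.
            while next_expected in received:
                next_expected += 1
            run = 0
        else:
            run += 1
            longest = max(longest, run)
    return longest


def spurious_fast_retransmit(arrivals, threshold):
    """True if reordering alone (no loss) would reach the dup-ack threshold."""
    return longest_dupack_run(arrivals) >= threshold
```

For the arrival order [2, 3, 1, 4], where segment 1 is overtaken by two
later segments, the receiver emits two dup acks: enough to fire a needless
retransmission with a threshold of 2, but not with the usual 3.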

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Sat Oct 18 05:13:03 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA15642 for tcp-impl-list; Sat, 18 Oct 1997 05:11:36 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA15634 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 18 Oct 1997 05:11:33 -0700
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id FAA02599
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 18 Oct 1997 05:11:27 -0700
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.5-q-beta3/8.7.1) with SMTP id NAA23115; Sat, 18 Oct 1997 13:07:43 +0100
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xMX0l-0005FsC; Sat, 18 Oct 97 12:25 BST
Message-Id: <m0xMX0l-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: TCP SACK
To: luigi@labinfo.iet.unipi.it (Luigi Rizzo)
Date: Sat, 18 Oct 1997 12:25:39 +0100 (BST)
Cc: kcpoon@jurassic.eng.Sun.COM, Allyn.Romanow@Eng.Sun.COM,
        tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199710180453.FAA15282@labinfo.iet.unipi.it> from "Luigi Rizzo" at Oct 18, 97 05:53:28 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > Or there are just very few, if any, servers out there that are capable of
> > doing SACK, or RFC 1323.
> 
> For SACK you are probably right, but for RFC1323 support is widespread
> in OS typically used for servers. Yet I talked once with some people
> who run a *big* server running FreeBSD and they explicitly turned
> off RFC1323 because some clients were negatively affected by TCP
> options. The net is big and full of old equipment...

I anticipate having every option we can do ON for IPv6 and OFF for IPv4
by default. I've given up hoping that IPv4 legacy systems will die off.

> year.  E.g. when/if people will start using several ISDN channels
> in parallel, then the chance of massive reorderings of packets
> (being sent in parallel on all channels) becomes so high that even
> the 'fast rxmit after 3 dups' becomes risky. Check Vern's thesis for a

This depends on your ISDN handling. MPP tends not to reorder as it fragments
frames. Most other systems also do ordering simply because stuff like IPX
suffers most spectacularly in some cases (notably NCP burst mode).

Alan


From owner-tcp-impl@relay.engr.sgi.com  Sat Oct 18 07:43:14 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA24797 for tcp-impl-list; Sat, 18 Oct 1997 07:41:36 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from mica.denver.sgi.com (mica.denver.sgi.com [169.238.67.6]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA24791 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 18 Oct 1997 07:41:33 -0700
Received: (from vjs@localhost) by mica.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id IAA21231 for tcp-impl@cthulhu.engr.sgi.com; Sat, 18 Oct 1997 08:41:27 -0600
Date: Sat, 18 Oct 1997 08:41:27 -0600
From: vjs@mica.denver.sgi.com (Vernon Schryver)
Message-Id: <199710181441.IAA21231@mica.denver.sgi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP SACK
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > year.  E.g. when/if people will start using several ISDN channels
> > in parallel, then the chance of massive reorderings of packets
> > (being sent in parallel on all channels) becomes so high that even
> > the 'fast rxmit after 3 dups' becomes risky. Check Vern's thesis for a
> 
> This depends on your ISDN handling. MPP tends not to reorder as it fragments
> frames. ...

That's an understatement.  One of the design goals of MP (RFC 1717) was
that absolutely no network packets are ever reordered.  Code to deal
with MP fragments is a lot messier than IP reassembly code because of
that requirement.  Any PPP system that fails in any way to maintain the
order of data is broken.  Some systems did not use MP over PPP, but
just sent IP packets at random over a bundle of PPP links, and did
reorder.


Vernon Schryver,  vjs@sgi.com


P.S. "MPP" is meaningless and I hope was a typo.  "MP" is what RFC 1717
 named the IETF PPP multilink protocol.  "MPPP" was the proprietary
 protocol of a big ISDN hub vendor for sending phone numbers for links
 in an MP bundle, since replaced by its proprietary "MP+", which was in
 turn replaced by the IETF boondoggle BACP.  "MLPPP" is the invention
 of trade rag ex-spurt consultants who never read RFCs and don't know
 that the protocol is named "MP".

From owner-tcp-impl@relay.engr.sgi.com  Sat Oct 18 11:55:42 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA14132 for tcp-impl-list; Sat, 18 Oct 1997 11:54:11 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA14113 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 18 Oct 1997 11:54:08 -0700
Received: from lox.sandelman.ottawa.on.ca (lox.sandelman.ottawa.on.ca [205.233.54.146]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA25864
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 18 Oct 1997 11:54:05 -0700
	env-from (mcr@istari.sandelman.ottawa.on.ca)
Received: from istari.sandelman.ottawa.on.ca (istari.sandelman.ottawa.on.ca [205.233.54.136])
	by lox.sandelman.ottawa.on.ca (8.8.7/8.8.7) with ESMTP id OAA07544
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 18 Oct 1997 14:58:26 -0400 (EDT)
Received: from istari.sandelman.ottawa.on.ca ([[UNIX: localhost]]) by istari.sandelman.ottawa.on.ca (8.7.5/8.7.3) with ESMTP id OAA00765 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 18 Oct 1997 14:58:13 -0400 (EDT)
Message-Id: <199710181858.OAA00765@istari.sandelman.ottawa.on.ca>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP SACK 
In-reply-to: Your message of "Sat, 18 Oct 1997 05:53:28 BST."
             <199710180453.FAA15282@labinfo.iet.unipi.it> 
Date: Sat, 18 Oct 1997 14:58:12 -0400
From: "Michael C. Richardson" <mcr@sandelman.ottawa.on.ca>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

-----BEGIN PGP SIGNED MESSAGE-----


>>>>> "Luigi" == Luigi Rizzo <luigi@labinfo.iet.unipi.it> writes:
    Luigi> For SACK you are probably right, but for RFC1323 support is
    Luigi> widespread in OS typically used for servers. Yet I talked
    Luigi> once with some people who run a *big* server running
    Luigi> FreeBSD and they explicitly turned off RFC1323 because some
    Luigi> clients were negatively affected by TCP options. The net is
    Luigi> big and full of old equipment...

  I'm really confused by this. rfc1323 says:

      indicated that both sides understand the extension.  Furthermore,
      an extension option will be sent in a <SYN,ACK> segment only if
      the corresponding option was received in the initial <SYN>
      segment.

  So, if this is a server, it should never initiate a connection. Or
does this have to do with FTP and data ports?
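
  The quoted rule can be sketched as follows. This is a minimal
illustration under stated assumptions, not code from any stack: the option
names are plain strings chosen for the example, and only the two RFC 1323
extensions are modeled.

```python
RFC1323_OPTIONS = ("wscale", "timestamps")  # labels chosen for this sketch


def synack_options(syn_options, enabled_locally):
    """Options a passive opener may echo in its <SYN,ACK>.

    An RFC 1323 extension option is included only if it is enabled
    locally AND was present in the peer's initial <SYN>.
    """
    return [opt for opt in RFC1323_OPTIONS
            if opt in enabled_locally and opt in syn_options]
```

  Under this rule a host that only accepts connections sends these options
only to peers that offered them first.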

   :!mcr!:            |  Network and security consulting/contract programming
   Michael Richardson |   I do IPsec policy code for SSH <http://www.ssh.fi/>
 Personal: <A HREF="http://www.sandelman.ottawa.on.ca/People/Michael_Richardson/Bio.html">mcr@sandelman.ottawa.on.ca</A>. PGP key available.
 Corporate: <A HREF="http://www.sandelman.ottawa.on.ca/SSW/">sales@sandelman.ottawa.on.ca</A>. 


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: latin1
Comment: Processed by Mailcrypt 3.4, an Emacs/PGP interface

iQB1AwUBNEkGw6ZpLyXYhL+BAQHtRQMAunJ8/jqhIHVAcJzlM45sh/ElDt/POD0Y
E0/CKaaDEfKJgpYnzkvrR4mQu28RIIKYZruATXaTk6kmlWMJr9UF3qMd7nyWEVpz
skCSq8NVsJgz2Hu94tonWvo0trYHIb23
=7SIg
-----END PGP SIGNATURE-----

From owner-tcp-impl@relay.engr.sgi.com  Mon Oct 20 12:23:17 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA27807 for tcp-impl-list; Mon, 20 Oct 1997 12:17:26 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA27576 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 20 Oct 1997 12:17:15 -0700
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA03370
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 20 Oct 1997 11:33:49 -0700
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id QAA07777; Mon, 20 Oct 1997 16:06:49 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA29840; Mon, 20 Oct 97 16:06:49+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id QAA08852
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 20 Oct 1997 16:09:41 GMT
Date: Mon, 20 Oct 1997 16:09:41 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Silly Window avoidance
Message-Id: <Pine.LNX.3.95.971020160403.8708A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hello !

I have learned that Linux does not implement the receiver-side silly window
avoidance algorithm, but instead uses a BSD-style algorithm.
Can anyone on the list shed some light on this?

with thanks
chetan . S


E-mail chetan@protocol.ece.iisc.ernet.in

WEB PAGE - http://pclab.ece.iisc.ernet.in/chetan



From owner-tcp-impl@relay.engr.sgi.com  Mon Oct 20 23:20:49 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA04973 for tcp-impl-list; Mon, 20 Oct 1997 23:19:02 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA04951 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 20 Oct 1997 23:18:59 -0700
Received: from roma.axis.se (roma.axis.se [193.13.178.2]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA20298
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 20 Oct 1997 23:18:57 -0700
	env-from (frax@axis.com)
Received: from pcfrax.axis.se (root@pcfrax.axis.se [171.16.4.90])
	by roma.axis.se (8.8.6/8.8.6) with ESMTP id IAA08301;
	Tue, 21 Oct 1997 08:18:33 +0200 (MEST)
Received: from localhost ([127.0.0.1]) by pcfrax.axis.se
	 with smtp (ident frax using rfc1413) id m0xNXf1-000TGvC
	(Debian Smail-3.2 1996-Jul-4 #2); Tue, 21 Oct 1997 08:19:23 +0200 (CEST)
Date: Tue, 21 Oct 1997 08:19:23 +0200 (CEST)
From: Fredrik Ax <frax@axis.com>
Reply-To: Fredrik Ax <frax@axis.com>
To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Silly Window avoidance
In-Reply-To: <Pine.LNX.3.95.971020160403.8708A-100000@protocol.ece.iisc.ernet.in>
Message-ID: <Pine.LNX.3.96.971021080843.12941A-100000@pcfrax.axis.se>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


On Mon, 20 Oct 1997, Chetan Kumar wrote:

>
> Hello !
>

Hi,

> I have learned that Linux does not implement the receiver-side silly window
> avoidance algorithm, but instead uses a BSD-style algorithm.
> Can anyone on the list shed some light on this?
>

I think the appropriate forum for this subject is the Linux Net-
Developers mailing list <netdev@nuclecu.unam.mx>.
To subscribe to the list, send a mail with the phrase "subscribe netdev"
in the body to <Majordomo@nuclecu.unam.mx>.

BUT remember it's a list for developers, please do not
ask general network questions on it. It is meant to be a fast path
for developers to keep in touch with other programmers.


Fredrik Ax, Software Engineer
______________________________________________________________________
AXIS Communications AB			Email: Fredrik.Ax@axis.com
Scheelevägen 16       			Phone: +46 46 270 18 66
S-223 70  LUND, SWEDEN			Fax: +46 46 13 61 30







From owner-tcp-impl@relay.engr.sgi.com  Tue Oct 21 13:41:03 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA22627 for tcp-impl-list; Tue, 21 Oct 1997 13:39:01 -0700
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA22616 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 21 Oct 1997 13:38:57 -0700
Received: from mailman.cs.ucla.edu (Mailman.CS.UCLA.EDU [131.179.128.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA28336
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 21 Oct 1997 13:38:57 -0700
	env-from (bruno@CS.UCLA.EDU)
Received: from condor.cs.ucla.edu (condor.cs.ucla.edu [131.179.160.67])
	by mailman.cs.ucla.edu (UCLACS-3.0) with ESMTP id NAA26487;
	Tue, 21 Oct 1997 13:38:53 -0700 (PDT)
Received: from localhost (bruno@localhost)
	by condor.cs.ucla.edu (8.8.5/UCLACS-3.0) with SMTP id NAA20420;
	Tue, 21 Oct 1997 13:38:52 -0700 (PDT)
X-Authentication-Warning: condor.cs.ucla.edu: bruno owned process doing -bs
Date: Tue, 21 Oct 1997 13:38:51 -0700 (PDT)
From: Hemon Bruno <bruno@CS.UCLA.EDU>
Reply-To: Hemon Bruno <bruno@CS.UCLA.EDU>
To: Allyn Romanow <Allyn.Romanow@eng.Sun.COM>
cc: tcp-impl@cthulhu.engr.sgi.com, end2end-interest@isi.edu,
        tcp-over-satellite@achtung.sp.trw.com,
        Bruyeron Renaud <bruyeron@CS.UCLA.EDU>
Subject: Re: TCP SACK
In-Reply-To: <199710170102.SAA02951@offshore.eng.sun.com>
Message-ID: <Pine.SOL.3.96.971021103822.20101D-100000@condor.cs.ucla.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

We are 2 graduate students working for the UCLA Internet Research Lab. We
have been working on TCP Sack for almost one year now and our project is
in its final phase. We are currently writing a report on our experiments
and it should be ready in one month.

We have a web page for our project that can be found at :
	http://irl.cs.ucla.edu/sack.html
This page is under permanent construction and should be updated soon, but
it already gives a good idea of what we are working on.
When the report is done, it will be available on this page.

The first conclusion of our tests is that, as expected, TCP Sack has, in
most cases, better throughput than TCP Reno. This was confirmed
by experiments in our lab and over the Internet. We have quantified this
improvement for different delays and loss probabilities.
We have run more extensive tests in the case of long delay. We compared
the throughput of TCP Sack, for different loss probabilities, with the
throughput of TCP Reno and the maximum theoretical throughput.
Finally, we studied the negative impact that TCP Sack could have on other
competing connections. Our conclusion is that this negative impact is
low. TCP Sack improves its throughput by using more efficiently the
bandwidth otherwise wasted by TCP Reno, but does not steal an unfair share
of bandwidth from its competitors.

We will keep on posting any important news concerning the development of
our project to the relevant mailing lists. We would also like to know
which research groups are currently working on TCP Sack.


--------------------------------------------------------------------------
TCP Sack Project
Internet Research Lab
Computer Science Department at UCLA

Team    : Renaud Bruyeron   bruyeron@cs.ucla.edu
          Bruno Hemon       bruno@cs.ucla.edu
Advisor : Lixia Zhang       lixia@cs.ucla.edu
--------------------------------------------------------------------------
                 





From owner-tcp-impl@relay.engr.sgi.com  Mon Oct 27 19:28:40 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA02369 for tcp-impl-list; Mon, 27 Oct 1997 19:25:14 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA02335 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 27 Oct 1997 19:25:12 -0800
Received: from mailhost.yahoo.com (mailhost.yahoo.com [205.216.162.34]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA19420
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 27 Oct 1997 19:25:11 -0800
	env-from (jh@yahoo-inc.com)
Received: from borogove.yahoo.com (borogove.yahoo.com [205.216.162.65])
	by mailhost.yahoo.com (8.8.7/8.8.6) with ESMTP id TAA09834
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 27 Oct 1997 19:25:10 -0800 (PST)
Received: from localhost (localhost [127.0.0.1])
	by borogove.yahoo.com (8.8.7/8.8.6) with SMTP id TAA05717
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 27 Oct 1997 19:25:10 -0800 (PST)
Message-Id: <199710280325.TAA05717@borogove.yahoo.com>
X-Authentication-Warning: borogove.yahoo.com: localhost [127.0.0.1] didn't use HELO protocol
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP SACK 
In-reply-to: Vern's message of "Fri, 17 Oct 1997 23:59:20 PDT."
             <199710180659.XAA14936@daffy.ee.lbl.gov> 
Date: Mon, 27 Oct 1997 19:25:10 -0800
From: John Hanley <jh@yahoo-inc.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I looked in my thesis at the case of blindly lowering the threshold to
> 2 dups and found that it leads to a lot more unnecessary retransmissions
> (while also avoiding a lot more timeouts).

Suppose that an implementor was keen to sometimes use a
threshold less than 3.  This can impact congestion avoidance.
At least one way to reduce the impact would be to send a tinygram:
a segment that is a fraction of MSS, but at least 100 bytes, say, so we
keep making some amount of forward progress.  It keeps the ACK clock going.
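
A minimal sketch of the sizing idea, purely for illustration: the
`divisor` of 4 is an arbitrary assumption (how "tiny" is left open here),
and `tinygram_size` is a hypothetical name. Only the 100-byte floor comes
from the text.

```python
MIN_TINYGRAM_BYTES = 100  # the "at least 100 bytes, say" floor


def tinygram_size(mss, divisor=4):
    """Size in bytes of a reduced segment: a fraction of MSS, floored at
    MIN_TINYGRAM_BYTES and never larger than the MSS itself."""
    if mss <= 0 or divisor <= 0:
        raise ValueError("mss and divisor must be positive")
    return min(mss, max(MIN_TINYGRAM_BYTES, mss // divisor))
```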

(Deciding how "tiny" is out of charter.)
(And yes, congestion comes in packet-rate and bit-rate forms... :-(


	Cheers,
	JH

From owner-tcp-impl@relay.engr.sgi.com  Wed Nov  5 23:31:43 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA09895 for tcp-impl-list; Wed, 5 Nov 1997 23:27:34 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA09888 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 5 Nov 1997 23:27:32 -0800
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA18519
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 5 Nov 1997 23:25:53 -0800
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id MAA03863; Thu, 6 Nov 1997 12:54:05 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA13961; Thu, 6 Nov 97 12:54:03+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id MAA05960
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 12:57:07 GMT
Date: Thu, 6 Nov 1997 12:57:07 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: OUT PUT of TCPDUMP
Message-Id: <Pine.LNX.3.95.971106123844.5623A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


The following is the output of tcpdump for a connection between

wlan-> running redhat linux 2.0.30; running netscape to download a file
	from the proxy server.

ibmh8-> IBM AIX Version 4, proxy server.

12:30:19.160434 ibmh8.3128 > wlan.16065: P 6144:6656(512) ack 1 win 16060 (ttl 60, id 3134)
12:30:19.180434 wlan.16065 > ibmh8.3128: . ack 6656 win 31744 (DF) (ttl 64, id 48911)
12:30:19.180434 wlan.16065 > ibmh8.3128: . ack 6656 win 31744 (DF) (ttl 63, id 48911)
12:30:19.200434 ibmh8.3128 > wlan.16065: P 6656:7168(512) ack 1 win 16060 (ttl 60, id 3137)
12:30:19.220434 wlan.16065 > ibmh8.3128: . ack 7168 win 31744 (DF) (ttl 64, id 48912)
12:30:19.220434 wlan.16065 > ibmh8.3128: . ack 7168 win 31744 (DF) (ttl 63, id 48912)
12:30:19.360434 ibmh8.3128 > wlan.16065: P 7168:7680(512) ack 1 win 16060 (ttl 60, id 3148)
12:30:19.380434 wlan.16065 > ibmh8.3128: . ack 7680 win 31744 (DF) (ttl 64, id 48913)
12:30:19.380434 wlan.16065 > ibmh8.3128: . ack 7680 win 31744 (DF) (ttl 63, id 48913)

Now my question: as far as I can see, my 'wlan' is sending two
acknowledgements for every TCP segment it receives from the proxy
server.

Both acknowledgements are sent at the same instant, as I can see from
the timestamp, and the TTL is reduced by one for the second
acknowledgement.

Can anybody explain why this is happening?

 Thanks for any response

chetan . S

    ::::::::::: TREE SAVES THOSE WHO SAVE TREES ::::::::::::


E-mail chetan@protocol.ece.iisc.ernet.in

WEB PAGE - http://pclab.ece.iisc.ernet.in/chetan

Snail mail
 #104 ,East Park Road,
8 th Cross,Malleshwarm,
Bangalore,
Karnataka,India.
pin 560003.

Phone 
	work place
		(080)3092282
	res.
		(080)3349218      
		(080)3347220


From owner-tcp-impl@relay.engr.sgi.com  Thu Nov  6 00:02:17 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA13629 for tcp-impl-list; Wed, 5 Nov 1997 23:57:51 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA13624 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 5 Nov 1997 23:57:49 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA23731
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 5 Nov 1997 23:57:48 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id XAA24528; Wed, 5 Nov 1997 23:57:22 -0800 (PST)
Message-Id: <199711060757.XAA24528@daffy.ee.lbl.gov>
To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: OUT PUT of TCPDUMP
In-reply-to: Your message of Thu, 06 Nov 1997 12:57:07 PST.
Date: Wed, 05 Nov 1997 23:57:22 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 12:30:19.380434 wlan.16065 > ibmh8.3128: . ack 7680 win 31744 (DF) (ttl 64, id 48913)
> 12:30:19.380434 wlan.16065 > ibmh8.3128: . ack 7680 win 31744 (DF) (ttl 63, id 48913)
> 
> Now my question: as far as I can see, my 'wlan' is sending two
> acknowledgements for every TCP segment it receives from the proxy
> server.

Actually, it is just sending one acknowledgement, which you can tell because
the two acks have the same IP ID field.

But that single ack is passing by the packet filter twice.  It is possible
that the ack is getting replicated by the network - I've observed a similar
pattern to what you show, with a trace at the receiver showing each packet
arriving twice.  I never was able to identify the mechanism causing this,
though.
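
The IP ID check can be automated. The sketch below is only an
illustration: it assumes the verbose tcpdump line format shown in this
thread (ending in "(ttl N, id M)"), and `repeated_packets` is a
hypothetical helper name. It groups captured lines by source, destination
and IP ID and reports the TTLs of any packet seen more than once.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 12:30:19.180434 wlan.16065 > ibmh8.3128: . ack 6656 win 31744 (DF) (ttl 64, id 48911)
LINE_RE = re.compile(r"^\S+ (\S+) > (\S+): .*\(ttl (\d+), id (\d+)\)\s*$")


def repeated_packets(lines):
    """Map (src, dst, ip_id) -> list of TTLs for packets captured more than once."""
    seen = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            src, dst, ttl, ip_id = match.groups()
            seen[(src, dst, int(ip_id))].append(int(ttl))
    return {key: ttls for key, ttls in seen.items() if len(ttls) > 1}
```

An entry with two different TTLs for the same ID means one packet was
captured twice, the second time after crossing one more hop.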

> Both acknowledgements are sent at the same instant

That's just an artifact of the limited resolution of the clock used
by the packet filter (which clearly only advances every 20 msec).
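
One way to estimate that clock step from a trace is to take the greatest
common divisor of the gaps between timestamps. This is a sketch under the
assumption that timestamps look like "HH:MM:SS.uuuuuu" with six fractional
digits; the function names are hypothetical. The result is only an upper
bound, since the true tick could be any divisor of it.

```python
from functools import reduce
from math import gcd


def to_microseconds(stamp):
    """Convert an 'HH:MM:SS.uuuuuu' timestamp to integer microseconds."""
    clock, micros = stamp.split(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1_000_000 + int(micros)


def clock_step_us(stamps):
    """Largest step, in microseconds, that divides every gap between timestamps.

    Returns 0 when all timestamps are identical or fewer than two are given.
    """
    ticks = sorted(to_microseconds(s) for s in stamps)
    gaps = [b - a for a, b in zip(ticks, ticks[1:]) if b != a]
    return reduce(gcd, gaps, 0)
```

Applied to the timestamps in the trace that started this thread, every gap
is a multiple of 20000 microseconds.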

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Nov  6 01:42:40 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA28051 for tcp-impl-list; Thu, 6 Nov 1997 01:39:36 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA28046 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 01:39:32 -0800
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id BAA09360
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 01:33:37 -0800
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id OAA08524; Thu, 6 Nov 1997 14:48:28 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA19919; Thu, 6 Nov 97 14:48:27+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id OAA08878;
	Thu, 6 Nov 1997 14:48:33 GMT
Date: Thu, 6 Nov 1997 14:48:33 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: Vern Paxson <vern@ee.lbl.gov>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: OUT PUT of TCPDUMP
In-Reply-To: <199711060757.XAA24528@daffy.ee.lbl.gov>
Message-Id: <Pine.LNX.3.95.971106144523.8729A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



On Wed, 5 Nov 1997, Vern Paxson wrote:

*>> 12:30:19.380434 wlan.16065 > ibmh8.3128: . ack 7680 win 31744 (DF) (ttl 64, id 48913)
*>> 12:30:19.380434 wlan.16065 > ibmh8.3128: . ack 7680 win 31744 (DF) (ttl 63, id 48913)
*>> 
*>Actually, it is just sending one acknowledgement, which you can tell because
*>the two acks have the same IP ID field.
*>
*>But that single ack is passing by the packet filter twice.  It is possible
*>that the ack is getting replicated by the network - I've observed a similar
*>pattern to what you show, with a trace at the receiver showing each packet
*>arriving twice.  I never was able to identify the mechanism causing this,
*>though.
*>
*>		Vern
*>

But what do you say about the TTL? As far as I can see, the TTL is reduced by
one for the second ack.


From owner-tcp-impl@relay.engr.sgi.com  Thu Nov  6 08:00:55 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA07351 for tcp-impl-list; Thu, 6 Nov 1997 07:55:12 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA07309 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 07:55:04 -0800
Received: from tor.securecomputing.com (tor.securecomputing.com [199.71.190.98]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA17932
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 07:55:02 -0800
	env-from (chk@tor.securecomputing.com)
Received: by janus.tor.securecomputing.com id <11649>; Thu, 6 Nov 1997 10:54:24 -0500
Message-Id: <97Nov6.105424est.11649@janus.tor.securecomputing.com>
To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
cc: Vern Paxson <vern@ee.lbl.gov>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: OUT PUT of TCPDUMP 
References: <Pine.LNX.3.95.971106144523.8729A-100000@protocol.ece.iisc.ernet.in>
In-reply-to: chetan's message of "Thu, 06 Nov 1997 09:48:33 -0500".
	 <Pine.LNX.3.95.971106144523.8729A-100000@protocol.ece.iisc.ernet.in> 
From: "C. Harald Koch" <chk@utcc.utoronto.ca>
X-uri: <URL:http://chk.home.ml.org/>
X-Face: )@F:jK?*}hv!eJ}*r*0DD"k8x1.d#i>7`ETe2;hSD2T!:Fh#wu`0pW7lO|Dfe'AbyNy[\Pw
 z'.bAtgTM!+iq2$yXiv4gf<:D*rZ-|f$\YQi7"D"=CG!JB?[^_7v>8Mm;z:NJ7pss)l__Cw+.>xUJ)
 did@Pr9
Date: Thu, 6 Nov 1997 10:53:33 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

In message <Pine.LNX.3.95.971106144523.8729A-100000@protocol.ece.iisc.ernet.in>, Chetan Kumar writes:
> 
> But what do you say about the TTL? As I can see, the TTL is reduced by
> one for the second ack.

tcpdump -e is very useful in these cases; you often find that the outbound
packet is going direct, while the inbound packet is bouncing off a router.
This is usually caused by routing and/or netmask configuration errors on the
remote host (in this case wlan).

-- 
Harald

From owner-tcp-impl@relay.engr.sgi.com  Thu Nov  6 09:00:25 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA21515 for tcp-impl-list; Thu, 6 Nov 1997 08:56:03 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA21501 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 08:56:01 -0800
Received: from olympus.eecs.umich.edu (olympus.eecs.umich.edu [141.213.8.56]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA05017
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 08:56:00 -0800
	env-from (wuchang@eecs.umich.edu)
Received: from eecs.umich.edu (olympus.eecs.umich.edu [141.213.8.56])
	by olympus.eecs.umich.edu (8.8.7/8.8.7) with ESMTP id LAA04947;
	Thu, 6 Nov 1997 11:52:37 -0500 (EST)
Message-ID: <3461F5D4.7F048C92@eecs.umich.edu>
Date: Thu, 06 Nov 1997 11:52:36 -0500
From: Wu-chang Feng <wuchang@eecs.umich.edu>
Organization: University of Michigan
X-Mailer: Mozilla 4.02 [en] (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
CC: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: OUT PUT of TCPDUMP
References: <Pine.LNX.3.95.971106123844.5623A-100000@protocol.ece.iisc.ernet.in>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Now my question is that as I can see my 'wlan' is sending two
> acknowledgement for every tcp segment it is receiving from the proxy
> server.

Are you running tcpdump on the same machine (i.e., wlan)?
I had a similar problem on a token ring network, where
the filter would see the ACK as initially sent out as well as a copy
of the ACK as it came back around the ring.  Running
tcpdump on a different machine on the token ring network
eliminated the problem.  This wouldn't explain the identical
timestamp, however.  The duplicates I saw in the output had
different timestamps.

Wu


From owner-tcp-impl@relay.engr.sgi.com  Thu Nov  6 13:40:58 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA28232 for tcp-impl-list; Thu, 6 Nov 1997 13:33:37 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA28158 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 13:33:29 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA06995
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 13:33:26 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id NAA25814; Thu, 6 Nov 1997 13:32:38 -0800 (PST)
Message-Id: <199711062132.NAA25814@daffy.ee.lbl.gov>
To: "C. Harald Koch" <chk@utcc.utoronto.ca>
Cc: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>,
        tcp-impl@cthulhu.engr.sgi.com
Subject: Re: OUT PUT of TCPDUMP 
In-reply-to: Your message of Thu, 06 Nov 1997 10:53:33 PST.
Date: Thu, 06 Nov 1997 13:32:37 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> tcpdump -e is very useful in these cases ...

Good point!

Here's the trace I mentioned before, as recorded at the sender:

	23:21:27.468911 0:0:c0:e5:54:8e 0:0:c:38:a8:2b ip 566: 
			ABC.7505 > XYZ.7505: . 91137:91649(512)
				ack 1 win 4096 (ttl 60, id 45658)

	23:21:27.469461 0:0:c0:e5:54:8e 0:0:c:38:a8:2b ip 566:
			ABC.7505 > XYZ.7505: . 91649:92161(512)
				ack 1 win 4096 (ttl 60, id 45659)

	23:21:27.473008 0:0:c0:d2:3e:96 0:0:c:38:a8:2b ip 566:
			ABC.7505 > XYZ.7505: . 91137:91649(512)
				ack 1 win 4096 (ttl 59, id 45658)

The sender transmits 91137:91649 with IP ID 45658 and TTL 60.  Soon
after it transmits 91649:92161 with IP ID 45659.  Then the packet filter
sees 91137:91649 again, only this time with TTL 59 (but same ID), and
coming from a different link-layer address (but, surprisingly, headed
to the same link-layer address, so not a simple case of the first-hop
router redirecting it).

Here's the same traffic recorded at the receiver (clocks are not
closely synchronized):

	23:21:27.370635 0:0:c:d:ff:32 8:0:20:23:19:e1 ip 566:
			ABC.7505 > XYZ.7505: . 91137:91649(512)
				ack 1 win 4096 (ttl 52, id 45658)

	23:21:27.373372 0:0:c:d:ff:32 8:0:20:23:19:e1 ip 566:
			ABC.7505 > XYZ.7505: . 91649:92161(512)
				ack 1 win 4096 (ttl 52, id 45659)

	23:21:27.385453 0:0:c:d:ff:32 8:0:20:23:19:e1 ip 566:
			ABC.7505 > XYZ.7505: . 91137:91649(512)
				ack 1 win 4096 (ttl 51, id 45658)

Clearly, the packet has been replicated, as it arrives twice.
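
The check used above (same IP ID seen more than once, with a lower TTL
on the later copy) can be sketched in a few lines of Python.  This is
only an illustration: the tuples are typed in by hand from the
sender-side trace, the name find_replicas is made up for this sketch,
and a real tool would have to parse tcpdump output and allow for the
16-bit IP ID wrapping on long traces.

```python
from collections import defaultdict

# (timestamp, source MAC, TTL, IP ID), copied by hand from the sender-side trace.
records = [
    ("23:21:27.468911", "0:0:c0:e5:54:8e", 60, 45658),
    ("23:21:27.469461", "0:0:c0:e5:54:8e", 60, 45659),
    ("23:21:27.473008", "0:0:c0:d2:3e:96", 59, 45658),
]

def find_replicas(records):
    """Return {ip_id: [(timestamp, src_mac, ttl), ...]} for IDs seen more than once."""
    by_id = defaultdict(list)
    for ts, src_mac, ttl, ip_id in records:
        by_id[ip_id].append((ts, src_mac, ttl))
    return {ip_id: seen for ip_id, seen in by_id.items() if len(seen) > 1}

replicas = find_replicas(records)
for ip_id, seen in replicas.items():
    ttls = [ttl for _, _, ttl in seen]
    print(ip_id, "seen", len(seen), "times with TTLs", ttls)
```

Only ID 45658 is reported, with TTLs 60 and then 59, which matches the
reading of the trace given above.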

Wu-chang Feng writes:

> The duplicates I saw in the output had different timestamps.

I think that's a red herring in the case Chetan's describing - in the
example above, if the clock resolution were 10 msec, then all of the
sender-side packets would've had the same timestamp too.
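
A rough way to see this, assuming a clock that only advances every
10 msec (the tick size and the helper name quantize are assumptions
made for this sketch): the three sender-side timestamps span about
4 msec, so they collapse to one value whenever no tick boundary
happens to fall between them.

```python
# Sender-side timestamps from the trace, as microseconds past 23:21:27.
stamps_us = [468911, 469461, 473008]
TICK_US = 10_000  # assumed clock resolution of 10 msec

span_us = max(stamps_us) - min(stamps_us)
print("span in microseconds:", span_us)

def quantize(t_us, phase_us):
    """Round t_us down to the most recent tick; ticks fall at phase_us + k * TICK_US."""
    return (t_us - phase_us) // TICK_US * TICK_US + phase_us

# Ticks at ...465000, 475000...: all three packets share one timestamp.
print({quantize(t, 5000) for t in stamps_us})
# Ticks at ...460000, 470000...: a boundary splits them into two values.
print({quantize(t, 0) for t in stamps_us})
```

So identical timestamps say little by themselves; they mostly reflect
the clock granularity of the tracing host.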

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Nov  6 13:46:59 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA02585 for tcp-impl-list; Thu, 6 Nov 1997 13:44:01 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA02574 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 13:43:58 -0800
Received: from tor.securecomputing.com (tor.securecomputing.com [199.71.190.98]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA10933
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 13:43:56 -0800
	env-from (chk@tor.securecomputing.com)
Received: by janus.tor.securecomputing.com id <11654>; Thu, 6 Nov 1997 16:43:51 -0500
Message-Id: <97Nov6.164351est.11654@janus.tor.securecomputing.com>
X-Mailer: exmh version 2.0delta 6/3/97
To: Vern Paxson <vern@ee.lbl.gov>
cc: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>,
        tcp-impl@cthulhu.engr.sgi.com
Subject: Re: OUT PUT of TCPDUMP 
References: <199711062132.NAA25814@daffy.ee.lbl.gov>
In-reply-to: vern's message of "Thu, 06 Nov 1997 16:32:37 -0500".
	 <199711062132.NAA25814@daffy.ee.lbl.gov> 
From: "C. Harald Koch" <chk@utcc.utoronto.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 6 Nov 1997 16:43:22 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

In message <199711062132.NAA25814@daffy.ee.lbl.gov>, Vern Paxson writes:

> The sender transmits 91137:91649 with IP ID 45658 and TTL 60.  Soon
> after it transmits 91649:92161 with IP ID 45659.  Then the packet filter
> sees 91137:91649 again, only this time with TTL 59 (but same ID), and
> coming from a different link-layer address (but, surprisingly, headed
> to the same link-layer address, so not a simple case of the first-hop
> router redirecting it).

Some Linux boxes will replicate packets like this if an interface gets set to 
promiscuous mode; I just saw a similar event on our DMZ caused by a Linux 
machine replicating traffic.

-- 
Harald Koch <chk@utcc.utoronto.ca>


From owner-tcp-impl@relay.engr.sgi.com  Thu Nov  6 17:28:06 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA12607 for tcp-impl-list; Thu, 6 Nov 1997 17:24:03 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA12594 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 17:24:01 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA16576
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 6 Nov 1997 17:23:59 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id RAA26767; Thu, 6 Nov 1997 17:23:54 -0800 (PST)
Message-Id: <199711070123.RAA26767@daffy.ee.lbl.gov>
To: "C. Harald Koch" <chk@utcc.utoronto.ca>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: OUT PUT of TCPDUMP 
In-reply-to: Your message of Thu, 06 Nov 1997 16:43:22 PST.
Date: Thu, 06 Nov 1997 17:23:54 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Some Linux boxes will replicate packets like this if an interface gets set to 
> promiscuous mode; I just saw a similar event on our DMZ caused by a Linux 
> machine replicating traffic.

Interesting.

In this case, neither machine was running Linux (they were BSDI 1.1 and
Solaris 2.4).

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Nov  7 17:41:20 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA29730 for tcp-impl-list; Fri, 7 Nov 1997 17:36:54 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA29724 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 7 Nov 1997 17:36:52 -0800
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA09790
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 7 Nov 1997 17:36:04 -0800
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id QAA04816; Fri, 7 Nov 1997 16:46:41 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA16376; Fri, 7 Nov 97 16:46:40+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id QAA09414;
	Fri, 7 Nov 1997 16:49:29 GMT
Date: Fri, 7 Nov 1997 16:49:29 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
Reply-To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: Vern Paxson <vern@ee.lbl.gov>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: OUT PUT of TCPDUMP 
In-Reply-To: <199711062132.NAA25814@daffy.ee.lbl.gov>
Message-Id: <Pine.LNX.3.95.971107163933.9176A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Here is something I got with the -e option in tcpdump:

2:60:8c:2c:fc:86 0:40:5:1d:e9:2e ip 566: ibmh8.3128 > wlan.3367: P
	311296:311808(512) ack 1 win 16060 (ttl 60, id 5084)

0:40:5:1d:e9:2e 0:0:c:19:31:b8 ip 54: wlan.3367 > ibmh8.3128: . ack 311808
	win 31744 (DF) (ttl 64, id 45844)

0:0:c:19:31:b8 2:60:8c:2c:fc:86 ip 60: wlan.3367 > ibmh8.3128: . ack 311808
	win 31744 (DF) (ttl 63, id 45844)


2:60:8c:2c:fc:86 0:40:5:1d:e9:2e ip 566: ibmh8.3128 > wlan.3367: P
	311808:312320(512) ack 1 win 16060 (ttl 60, id 5086)

0:40:5:1d:e9:2e 0:0:c:19:31:b8 ip 54: wlan.3367 > ibmh8.3128: . ack 312320
	win 31744 (DF) (ttl 64, id 45849)

0:0:c:19:31:b8 2:60:8c:2c:fc:86 ip 60: wlan.3367 > ibmh8.3128: . ack 312320
	win 31744 (DF) (ttl 63, id 45849)


In my case it is clearly the first-hop router redirecting. I also
confirmed that 0:0:c:19:31:b8 is the hardware address of the default
router on my 'wlan'. So there is no duplication of packets (?).
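
The reasoning can be written down as a small Python check.  This is a
sketch only: the tuples are typed in by hand from the -e output, and
the function name looks_router_forwarded is made up for illustration.
A second copy counts as the router's forwarded copy when it has the
same IP ID, was sent from the MAC address the first copy was sent to,
and has a TTL exactly one lower.

```python
# (source MAC, destination MAC, TTL, IP ID) for the two copies of the first ack.
first = ("0:40:5:1d:e9:2e", "0:0:c:19:31:b8", 64, 45844)
second = ("0:0:c:19:31:b8", "2:60:8c:2c:fc:86", 63, 45844)

def looks_router_forwarded(a, b):
    """True if frame b looks like frame a after one hop through a's next-hop router."""
    a_src, a_dst, a_ttl, a_id = a
    b_src, b_dst, b_ttl, b_id = b
    return (
        a_id == b_id            # same IP datagram
        and b_src == a_dst      # b was sent by the station a was addressed to
        and b_ttl == a_ttl - 1  # exactly one hop of TTL consumed
    )

print(looks_router_forwarded(first, second))

# The pair from Vern's sender-side trace: the second copy comes from a
# different station but is headed to the same destination MAC as the first.
vern_first = ("0:0:c0:e5:54:8e", "0:0:c:38:a8:2b", 60, 45658)
vern_second = ("0:0:c0:d2:3e:96", "0:0:c:38:a8:2b", 59, 45658)
print(looks_router_forwarded(vern_first, vern_second))
```

The first pair passes the check and the second does not, which is the
difference between this trace and the one Vern posted.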

chetan . S




From owner-tcp-impl@relay.engr.sgi.com  Sat Nov  8 17:24:08 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA01191 for tcp-impl-list; Sat, 8 Nov 1997 17:18:36 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA01182 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 8 Nov 1997 17:18:31 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA29987
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 8 Nov 1997 17:18:29 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (centre.swanlink.ukuu.org.uk [137.44.10.205]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id BAA25153; Sun, 9 Nov 1997 01:12:23 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xUKuv-0005GNC; Sun, 9 Nov 97 00:07 GMT
Message-Id: <m0xUKuv-0005GNC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: OUT PUT of TCPDUMP
To: chk@utcc.utoronto.ca (C. Harald Koch)
Date: Sun, 9 Nov 1997 00:07:52 +0000 (GMT)
Cc: vern@ee.lbl.gov, chetan@protocol.ece.iisc.ernet.in,
        tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <97Nov6.164351est.11654@janus.tor.securecomputing.com> from "C. Harald Koch" at Nov 6, 97 04:43:22 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Some Linux boxes will replicate packets like this if an interface gets set to 
> promiscuous mode; I just saw a similar event on our DMZ caused by a Linux 
> machine replicating traffic.

No.

Linux 2.0.x (probably not 2.0.32 when it appears) will respond to packets
addressed to it but with the wrong MAC address. It will not, however, forward
frames for others that fall into this category.


From owner-tcp-impl@relay.engr.sgi.com  Tue Nov 11 03:20:50 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA02520 for tcp-impl-list; Tue, 11 Nov 1997 03:16:04 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA02512 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 11 Nov 1997 03:16:02 -0800
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id DAA03614
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 11 Nov 1997 03:14:29 -0800
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id QAA08705; Tue, 11 Nov 1997 16:43:52 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA19167; Tue, 11 Nov 97 16:43:50+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id QAA28612
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 11 Nov 1997 16:46:46 GMT
Date: Tue, 11 Nov 1997 16:46:46 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: What does IW=1 mean
Message-Id: <Pine.LNX.3.95.971111164013.28526A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hello !
 
 I know this is a silly question, but I want to get an expert answer
from the list.
 
I know that TCP is a byte-stream protocol. Does IW=1 mean we are going
to have an initial window of 1 byte? If so, just to reach the MSS we have
to wait for 10*rtt. Is this right, or have I gone wrong somewhere?

Thanks for any comments
chetan . S

    ::::::::::: TREE SAVES THOSE WHO SAVE TREES ::::::::::::


E-mail chetan@protocol.ece.iisc.ernet.in

WEB PAGE - http://pclab.ece.iisc.ernet.in/chetan

Snail mail
 #104 ,East Park Road,
8 th Cross,Malleshwarm,
Bangalore,
Karnataka,India.
pin 560003.

Phone 
	work place
		(080)3092282
	res.
		(080)3349218      
		(080)3347220


From owner-tcp-impl@relay.engr.sgi.com  Tue Nov 11 05:45:07 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA14595 for tcp-impl-list; Tue, 11 Nov 1997 05:41:18 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA14590 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 11 Nov 1997 05:41:15 -0800
Received: from zero.aec.at (zero.aec.at [193.170.192.102]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id FAA25997
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 11 Nov 1997 05:41:11 -0800
	env-from (andi@zero.aec.at)
Received: (qmail 19351 invoked by uid 573); 11 Nov 1997 13:40:59 -0000
To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: What does IW=1 mean
References: <Pine.LNX.3.95.971111164013.28526A-100000@protocol.ece.iisc.ernet.in>
From: Andi Kleen <ak@muc.de>
Date: 11 Nov 1997 14:40:59 +0100
In-Reply-To: Chetan Kumar's message of Tue, 11 Nov 1997 16:46:46 +0000 (GMT)
Message-ID: <k2affb35w4.fsf@zero.aec.at>
Lines: 15
X-Mailer: Gnus v5.4.41/Emacs 19.34
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Chetan Kumar <chetan@protocol.ece.iisc.ernet.in> writes:

> Hello !
>  
>  I know this is a silly question, but I want to get an expert answer
> from the list.
>  
> I know that TCP is a byte-stream protocol. Does IW=1 mean we are going
> to have an initial window of 1 byte? If so, just to reach the MSS we have
> to wait for 10*rtt. Is this right, or have I gone wrong somewhere?

I think IW=1 means an initial congestion window of 1 MSS, not 1 byte.
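
To put rough numbers on the difference, here is a small Python sketch.
It assumes an idealized slow start in which the window doubles once per
RTT (no loss, no delayed ACKs); the MSS values are assumptions picked
for illustration, and rtts_to_reach is a made-up helper name.

```python
def rtts_to_reach(start_bytes, target_bytes):
    """Count RTTs for a window to grow from start to target, doubling once per RTT."""
    window, rtts = start_bytes, 0
    while window < target_bytes:
        window *= 2
        rtts += 1
    return rtts

for mss in (512, 1024):  # assumed MSS values, in bytes
    print("MSS", mss,
          "from 1 byte:", rtts_to_reach(1, mss), "RTTs;",
          "from 1 MSS:", rtts_to_reach(mss, mss), "RTTs")
```

Under these assumptions a 1-byte initial window would need 9 or 10 RTTs
just to reach one segment, while IW=1 in units of MSS starts there.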

-Andi
 

From owner-tcp-impl@relay.engr.sgi.com  Wed Nov 12 03:49:38 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA12165 for tcp-impl-list; Wed, 12 Nov 1997 03:42:35 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA12160 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 12 Nov 1997 03:42:33 -0800
Received: from dimail.epfl.ch (dimail.epfl.ch [128.178.79.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id DAA06421
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 12 Nov 1997 03:42:30 -0800
	env-from (Sergio.Valle@studi.epfl.ch)
Received: from disun58.epfl.ch by dimail.epfl.ch (SMI-8.6/EPFL-DI-5.3-S-MX)
	id MAA28467; Wed, 12 Nov 1997 12:40:18 +0100
Received: from localhost by disun58.epfl.ch (SMI-8.6/EPFL-DI-5.3-MX)
	id MAA21569; Wed, 12 Nov 1997 12:40:16 +0100
From: Sergio.Valle@studi.epfl.ch (Sergio Valle)
Message-Id: <199711121140.MAA21569@disun58.epfl.ch>
X-Mailer: exmh version 1.6.6 3/24/96
To: tcp-impl@cthulhu.engr.sgi.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Date: Wed, 12 Nov 1997 12:40:15 +0100
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Bonjour,

I am a student at Swiss Federal Institute of Technology and I must choose
a TCP implementation (public domain) for recoding it in C++ (maybe make
some changes). Are there comparison studies of different versions of TCP?
If anybody has a suggestion please don't hesitate.

Merci beaucoup,

	Sergio Valle , Lausanne 12 Novembre 1997



From owner-tcp-impl@relay.engr.sgi.com  Wed Nov 12 07:17:53 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA02077 for tcp-impl-list; Wed, 12 Nov 1997 07:11:13 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA02067 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 12 Nov 1997 07:11:10 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA14501
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 12 Nov 1997 07:11:09 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-great-packet-bucket-in-the-sky [163.164.160.21] (may be forged)) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id PAA16621; Wed, 12 Nov 1997 15:10:59 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xVeUv-0005FrC; Wed, 12 Nov 97 15:14 GMT
Message-Id: <m0xVeUv-0005FrC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: your mail
To: Sergio.Valle@studi.epfl.ch (Sergio Valle)
Date: Wed, 12 Nov 1997 15:14:28 +0000 (GMT)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199711121140.MAA21569@disun58.epfl.ch> from "Sergio Valle" at Nov 12, 97 12:40:15 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I am a student at Swiss Federal Institute of Technology and I must choose
> a TCP implementation (public domain) for recoding it in C++ (maybe make
> some changes). Are there comparison studies of different versions of TCP?
> If anybody has a suggestion please don't hesitate.

I don't know of any "public domain" TCP stacks. There are several free ones,
and if you wanted to rewrite one in C++ I guess the BSD stack is by far the
most heavily performance-analysed in terms of published papers.




From owner-tcp-impl@relay.engr.sgi.com  Wed Nov 12 09:29:20 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA01167 for tcp-impl-list; Wed, 12 Nov 1997 09:20:41 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA01156 for <tcp-impl@engr.sgi.com>; Wed, 12 Nov 1997 09:20:38 -0800
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA23964
	for <tcp-impl@engr.sgi.com>; Wed, 12 Nov 1997 09:20:36 -0800
	env-from (sparker@fstop.Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.25]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id JAA18729 for <tcp-impl@engr.sgi.com>; Wed, 12 Nov 1997 09:22:27 -0800
Received: from fstop. (fstop.Eng.Sun.COM [192.9.204.16])
	by Eng.Sun.COM (SMI-8.6/SMI-5.3) with SMTP id JAA12090
	for <tcp-impl@engr.sgi.com>; Wed, 12 Nov 1997 09:20:20 -0800
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id JAA02008; Wed, 12 Nov 1997 09:20:15 -0800
Message-Id: <199711121720.JAA02008@fstop.>
From: sparker@Eng.Sun.COM
To: tcp-impl@engr.sgi.com
Subject: New draft of TCP tools coming soon...
Date: Wed, 12 Nov 1997 09:20:15 -0800
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


The new draft incorporates the following changes of substance:

* The 'dbs' tool is added.  This tool, by Yukio Murayama, is something
like 'ttcp' except that it orchestrates multiple transfers between N hosts,
driven by a script file.  See http://www.ai3.net/products/dbs.

* In an attempt to clarify the scope of the document I've added to the
introduction:

	This document lists only tools which can evaluate one or more
	implementations, and which can provide some specific results
	which describe or evaluate the TCP being tested.

(Wordsmithing suggestions here are welcome.)

This was motivated by evaluating SPIN and Richard Stevens' sock program
for inclusion.  I felt neither should be included, and here's why.

SPIN cannot operate on an implementation.  Although a very cool tool, it
requires you transliterate the protocol machine into the validation
language.  At this point you are no longer testing the implementation
but an interpretation of it.

The sock program was nominated, basically, because it has lots of knobs,
some of which affect the TCP behavior.  However, I felt that since sock has
little if any way to detect whether the TCP beneath it actually behaved
in any particular way, it isn't per se much of a TCP test tool.  The value
(I gather from other members of the group who expressed interest in
listing 'sock') is mainly that it allows you to twiddle lots of controls
at the socket level.  You still have to sit there with at least a snooping
tool to tell anything.

(And I suppose while we're at it, I've left out snooping tools because I
consider them too obvious and, because they are strictly passive, not much
of a test either.)

So this message is by way of soliciting any final comments before I rev the
draft later this week.  Although I've made some comments to the list
about the above, this is a summary of all but typographic changes.

Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Wed Nov 12 14:17:46 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA05688 for tcp-impl-list; Wed, 12 Nov 1997 14:09:45 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA05657 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 12 Nov 1997 14:09:40 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA02033
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 12 Nov 1997 14:09:31 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id OAA13406; Wed, 12 Nov 1997 14:09:27 -0800 (PST)
Message-Id: <199711122209.OAA13406@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: tentative scheduling for Washington DC IETF
Date: Wed, 12 Nov 1997 14:09:27 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

We've been scheduled for:

	Monday, December 8 at 1930-2200

Currently, it appears the schedule will not be available via multicast.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Nov 12 14:47:45 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA18491 for tcp-impl-list; Wed, 12 Nov 1997 14:39:53 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA16079 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 12 Nov 1997 14:34:22 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA09938
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 12 Nov 1997 14:34:13 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id OAA13590; Wed, 12 Nov 1997 14:34:04 -0800 (PST)
Message-Id: <199711122234.OAA13590@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: tentative scheduling for Washington DC IETF
In-reply-to: Your message of Wed, 12 Nov 1997 14:09:27 PST.
Date: Wed, 12 Nov 1997 14:34:04 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Currently, it appears the schedule will not be available via multicast.

Oops - that's of course supposed to say that the *session* will not be
available via multicast.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Nov 12 21:19:11 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA01213 for tcp-impl-list; Wed, 12 Nov 1997 21:13:56 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA01205 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 12 Nov 1997 21:13:53 -0800
Received: from mailhub.Stanford.EDU (mailhub.Stanford.EDU [171.64.14.35]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id VAA14543
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 12 Nov 1997 21:13:51 -0800
	env-from (aaa@stanford.edu)
Received: from saga16.Stanford.EDU (saga16.Stanford.EDU [171.64.15.146])
	by mailhub.Stanford.EDU (8.8.7/8.8.7/L) with SMTP id VAA10208;
	Wed, 12 Nov 1997 21:13:46 -0800 (PST)
Date: Wed, 12 Nov 1997 21:13:45 -0800 (PST)
From: "Amr A. Awadallah" <aaa@stanford.edu>
To: Sergio Valle <Sergio.Valle@studi.epfl.ch>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: your mail
In-Reply-To: <199711121140.MAA21569@disun58.epfl.ch>
Message-ID: <Pine.GSO.3.96.971112210818.17509B-100000@saga16.Stanford.EDU>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I am a student at Swiss Federal Institute of Technology and I must choose
> a TCP implementation (public domain) for recoding it in C++ (maybe make

You can get the FreeBSD version of netinet from:

ftp://ftp.FreeBSD.ORG/pub/FreeBSD/FreeBSD-stable/src/sys/netinet/

also  ns2 (network simulator 2) has a very good C++ implementation of
TCP:

http://www-mash.cs.berkeley.edu/ns/

> some changes). Are there comparison studies of different versions of TCP ?
> If anybody has a suggestion please don't hesitate. 

 Some comparisons may  be found at:

  http://www-mash.cs.berkeley.edu/ns/ns-research.html

-- Amr


From owner-tcp-impl@relay.engr.sgi.com  Sat Nov 15 19:42:14 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA07865 for tcp-impl-list; Sat, 15 Nov 1997 19:32:07 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA07842 for <tcp-impl@engr.sgi.com>; Sat, 15 Nov 1997 19:32:06 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA15497
	for <tcp-impl@engr.sgi.com>; Sat, 15 Nov 1997 19:32:02 -0800
	env-from (aron@cs.rice.edu)
Received: from mrsclaus.cs.rice.edu (mrsclaus.cs.rice.edu [128.42.1.108]) by cs.rice.edu (8.8.5/8.7.1) with ESMTP id VAA11927 for <tcp-impl@engr.sgi.com>; Sat, 15 Nov 1997 21:32:00 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Received: (from aron@localhost) by mrsclaus.cs.rice.edu (8.8.5/8.7.3) id VAA14978 for tcp-impl@engr.sgi.com; Sat, 15 Nov 1997 21:31:59 -0600 (CST)
Message-Id: <199711160331.VAA14978@mrsclaus.cs.rice.edu>
Subject: delayed ACKs in TCP
To: tcp-impl@engr.sgi.com
Date: Sat, 15 Nov 1997 21:31:58 -0600 (CST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi,
	a TCP receiver normally transmits an ACK after receiving 2MSS worth
of data. Otherwise the ACK is delayed till the delayed ACK timer fires. 
The work in [1] shows that the timestamp option in TCP results in the
TCP receiver sending an ACK for every three segments that are received. 
The work in [2] points out some other unfortunate interactions between the
delayed ack algorithm in TCP and HTTP traffic. 

	I'm wondering about the effect of changing the TCP algorithms so as to
send an ACK after every 2 segments (irrespective of the amount of data
contained in them). This would also prevent some of the unfortunate
interactions between Nagle's algorithm at the sender and the delayed ack 
algorithm at the receiver. Further, the presence of TCP options (like the 
timestamp option) would not affect the transmission of an ACK.
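
The arithmetic behind the "every three segments" observation can be
sketched as follows. This is only an illustration: it assumes the receiver
counts received bytes against a threshold of 2*MSS, that the MSS is 1460
bytes, and that the timestamp option occupies 12 bytes of each segment.
The function name and parameters are made up for this example.

```python
def segments_until_ack(mss, option_bytes, threshold_segments=2):
    """Count full-sized segments needed before a byte-counting receiver ACKs.

    Assumption: the receiver ACKs once it holds threshold_segments * mss
    bytes of unacknowledged data, and each segment carries
    mss - option_bytes bytes of payload.
    """
    payload = mss - option_bytes
    threshold = threshold_segments * mss
    received = 0
    count = 0
    while received < threshold:
        received += payload
        count += 1
    return count


# Without options, two full segments reach the 2*MSS threshold.
no_options = segments_until_ack(1460, 0)
# With 12 bytes of timestamp option, two segments fall 24 bytes short.
with_timestamps = segments_until_ack(1460, 12)
```

Counting segments instead of bytes, as proposed above, would make both
cases ACK after the second segment.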




1. L. Brakmo and L. Peterson. "Performance Problems in 4.4BSD TCP", ACM
   Computer Communication Review, Oct 1995.

2. J. Heidemann. "Performance Interactions Between P-HTTP and TCP 
   Implementations", ACM Computer Communication Review, Apr 1997. 



- Mohit Aron
  aron@cs.rice.edu

From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 17 10:27:43 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA01330 for tcp-impl-list; Mon, 17 Nov 1997 10:16:27 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA01238 for <tcp-impl@engr.sgi.com>; Mon, 17 Nov 1997 10:16:19 -0800
Received: from owl.ee.lbl.gov (owl.ee.lbl.gov [131.243.1.50]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA18798
	for <tcp-impl@engr.sgi.com>; Mon, 17 Nov 1997 08:47:39 -0800
	env-from (floyd@ee.lbl.gov)
Received: by owl.ee.lbl.gov (8.8.8/8.8.5)
	id IAA04052; Mon, 17 Nov 1997 08:47:38 -0800 (PST)
Message-Id: <199711171647.IAA04052@owl.ee.lbl.gov>
To: tcp-impl@engr.sgi.com
cc: kkrama@research.att.com
Subject: Internet Draft on Explicit Congestion Notification (ECN) 
Date: Mon, 17 Nov 1997 08:47:38 PST
From: Sally Floyd <floyd@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

K. K. Ramakrishnan and I have submitted an Internet draft on "A
Proposal to add Explicit Congestion Notification (ECN) to IPv6 and to
TCP" today.  A copy of the draft is attached.

We would welcome comments.  While the proposal involves the domain
of several working groups (ipng, tcpimp, etc.), and we are therefore sending
this announcement to the mailing lists of several working groups,
we are assuming that general discussion will happen on the
end2end-interest mailing list.  (Subscription information for
end2end-interest: "http://www.irtf.org/irtf/charters/end2end.htm".)

The first step is for ipng to decide whether or not to add the
appropriate bits to the IPv6 header.  But if that happens, then I
believe that the second step is discussion of the required changes to
TCP in tcpimp.

Thanks very much,
Sally Floyd and K. K. Ramakrishnan.

----------------------------------------------------------------------


Internet Engineering Task Force                       K. K. Ramakrishnan
INTERNET DRAFT                                        AT&T Labs Research
draft-kksjf-ecn-00.txt                                       Sally Floyd
                                                                    LBNL
                                                           November 1997
                                                      Expires:  May 1998



A Proposal to add Explicit Congestion Notification (ECN) to IPv6 and to TCP



                          Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To view the entire list of current Internet-Drafts, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

Abstract

   This note describes a proposed addition of ECN (Explicit Congestion
   Notification) to IPv6 and to TCP.  First we describe TCP's use of
   packet drops as an indication of congestion.  Next we argue that with
   the addition of active queue management (e.g., RED) to the Internet
   infrastructure, where routers detect congestion before the queue
   overflows, routers are no longer limited to packet drops as an
   indication of congestion, but could instead set an ECN bit in the
   packet header, for ECN-capable transport protocols.  We describe when
   the ECN bit would be set in the routers, and describe what
   modifications would be needed to TCP to make it ECN-capable.
   Modifications to other transport protocols (e.g., unreliable unicast
   or multicast, reliable multicast, other reliable unicast transport
   protocols) could be considered as those protocols advance through the
   standards process.



Ramakrishnan and Floyd       Informational                      [Page 1]


draft-kksjf-ecn     Addition of ECN to IPv6 and TCP        November 1997


   TCP's congestion control and avoidance algorithms are based on the
   notion that the network is a black-box [Jacobson88, Jacobson90].  The
   network's state of congestion or otherwise is determined by end-
   systems probing for the network state, by gradually increasing the
   load on the network (by increasing the window of packets that are
   outstanding in the network) until the network becomes congested and a
   packet is lost.  Treating the network as a "black-box" and treating
   loss as an indication of congestion in the network is appropriate for
   pure best-effort data carried by TCP that has little or no
   sensitivity to delay or loss of individual packets.  In addition,
   TCP's congestion management algorithms have techniques built-in (such
   as fast retransmit and fast recovery) to minimize the impact of
   losses from a throughput perspective.

   However, these mechanisms are not intended to help applications that
   are in fact sensitive to the delay or loss of one or more individual
   packets.  Interactive traffic such as telnet, web-browsing, and
   transfer of audio and video data ("real-audio" and "real-video") can
   be sensitive to packet losses (for unreliable data delivery such as
   UDP) or to the increased latency of the packet caused by the need to
   retransmit the packet after a loss (for reliable data delivery such
   as TCP).

   Since TCP determines the appropriate congestion window to use by
   gradually increasing the window size until it experiences a dropped
   packet, this causes the queues at the bottleneck router to build up.
   With most packet drop policies at the router that are not sensitive
   to the load placed by each individual flow, this means that some of
   the packets of latency-sensitive flows are going to be dropped.
   Active queue management mechanisms that detect congestion before the
   queue overflows, and provide an indication of this congestion to TCP,
   are desirable because they avoid some bad properties of dropping on
   queue overflow, especially with drop-tail schemes.  Drop tail
   introduces synchronization of loss across multiple flows which is
   undesirable.  Indicating incipient congestion means that TCP does not
   have to increase its window size up to the point where a router's
   buffer is filled up. This can reduce queuing delays and avoid
   synchronization, which are desirable characteristics.

2. Random Early Detection (RED)

   Random Early Detection (RED) is a mechanism for active queue
   management that has been proposed to detect incipient congestion
   [FJ93], and is currently being deployed in the Internet backbone
   [RED-ietf-draft].  Although RED is meant to be a general mechanism
   using one of several alternatives for congestion indication, in the
   current environment of the Internet RED is restricted to using packet
   drops as a mechanism for congestion indication.  By dropping packets
   based on the average queue length exceeding a threshold, rather than
   only when the queue overflows, RED maintains the average queue at a
   smaller level, and improves the delay experienced by the flows.
   However, when RED drops packets before the queue actually overflows,
   RED is not forced by memory limitations to discard the packet.  RED
   could set an Explicit Congestion Notification bit in the packet
   header instead of dropping the packet, if such a bit was provided in
   the IP header and understood by the transport protocol.  The use of
   the Explicit Congestion Notification bit would allow the receiver(s)
   to receive the packet, avoiding the potential for excessive delays
   due to retransmissions after packet losses.

3. Explicit Congestion Notification

   We propose that the Internet provide a congestion indication for
   incipient congestion (as in RED and earlier work [RJ90]) where the
   notification can sometimes be through marking packets rather than
   dropping them.  This would require an ECN field in the IP header with
   two bits.  The ECN-Capable bit would be set by the data sender to
   indicate an ECN-capable transport protocol.  The ECN bit would be set
   by the router to indicate congestion to the end nodes. ([Floyd94]
   outlines a scheme where a single bit could be overloaded to serve the
   function of both the ECN-Capable bit and the ECN bit, but the two-bit
   scheme is more straightforward to explain). We expect that routers
   would provide the congestion indication on incipient congestion as
   indicated by the average queue size, using the RED algorithms
   suggested in [FJ93, RED-ietf-draft].  Routers that have a packet
   arriving at a full queue would drop the packet, just as they do now.

   The congestion control algorithms followed at the end-systems would
   be essentially the same as the congestion control response to a
   *single* dropped packet, for a transport protocol where a dropped
   packet is used as an indication of congestion.  For TCP in
   particular, the source TCP would halve its congestion window "cwnd"
   in response to an ECN indication received by the data receiver.
   However, this action is done only once per window of data (i.e., at
   most once per roundtrip time), to avoid reacting multiple times to
   multiple indications of congestion within a roundtrip time.

4. Proposed Algorithm at the Router

   We describe the proposed algorithm at the router in the context of
   current router implementations.  We assume that the router is capable
   of implementing the probability computation for RED and uses a pure
   packet drop mechanism (e.g., drop from front, drop from tail, or
   random drop) whenever a packet arrives at a full queue.

   When the router's buffer is not yet full and the router is prepared
   to drop a packet to inform end nodes of incipient congestion, the
   router should first check to see if the ECN-Capable bit is set in
   that packet's IP header.  If so, then instead of dropping the packet,
   the router could instead set the ECN bit in the IP header.  When more
   severe congestion has occurred and the router's queue is full, then
   the router has no choice but to drop some packet when a new packet
   arrives.

   The router determines it is congested if the AVERAGE length of any of
   its queues where packets are waiting to be processed or transmitted
   exceeds a threshold. We believe that the router should use the ECN
   bit to notify that it is congested only when the *average* queue
   length, rather than the instantaneous queue length, exceeds a
   threshold.

   There are potentially several alternatives for estimating the average
   queue length and marking the ECN bit. Since there is considerable
   effort involved already in implementing RED, we believe it is best to
   leverage these efforts for ECN as well.  One potential mechanism for
   the averaging and marking is to perform functions similar to RED
   queue management: RED uses an exponential moving average of the queue
   size.  When the average queue size goes above a lower threshold,
   packets are marked with a probability of marking that increases with
   the average queue size.  (Packets that are not ECN-capable are
   dropped instead of marked.) When the average queue size gets up to or
   above a high threshold, all incoming packets should be dropped
   (assuming that the router intends to control the average queue size
   even in the presence of unresponsive traffic).
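
   The averaging and marking steps described above can be sketched as
   follows.  This is a simplified illustration, not the RED algorithm
   of [FJ93] in full: the class name, the threshold values, the
   averaging weight, and the linear marking probability are all
   assumptions chosen for the example, and the per-packet count
   adjustment used by RED is omitted.

```python
import random


class RedEcnQueue:
    """Sketch of RED-style averaging with ECN marking (illustrative values)."""

    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1,
                 weight=0.002, capacity=30, rng=None):
        self.min_th = min_th      # lower threshold on the average queue size
        self.max_th = max_th      # high threshold on the average queue size
        self.max_p = max_p        # marking probability just below max_th
        self.weight = weight      # weight of the exponential moving average
        self.capacity = capacity  # physical queue limit, in packets
        self.avg = 0.0            # average queue size
        self.queue_len = 0        # instantaneous queue size
        self.rng = rng or random.Random()

    def mark_probability(self):
        """Probability of marking, increasing with the average queue size."""
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def on_arrival(self, ecn_capable):
        """Return "enqueue", "mark" (enqueue with ECN bit set), or "drop"."""
        # Update the exponential moving average of the queue size.
        self.avg = (1 - self.weight) * self.avg + self.weight * self.queue_len
        if self.queue_len >= self.capacity:
            return "drop"         # full queue: no choice but to drop
        if self.avg >= self.max_th:
            return "drop"         # above the high threshold: drop everything
        if self.rng.random() < self.mark_probability():
            if not ecn_capable:
                return "drop"     # non-ECN-capable packets are dropped instead
            self.queue_len += 1
            return "mark"
        self.queue_len += 1
        return "enqueue"

    def on_departure(self):
        """A packet leaves the queue."""
        if self.queue_len > 0:
            self.queue_len -= 1
```

   Note that the decision is driven by the average queue size; the
   instantaneous queue length matters only for the full-queue case.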

   It is anticipated that when all of the source end-systems participate
   in TCP's congestion management mechanisms or other compatible
   congestion control, and respond to ECN by reducing their offered
   load, packet losses would be relatively infrequent.  Packet losses in
   this case would occur primarily during transients and in the presence
   of non-cooperating entities.

   When a packet is received by a router with the ECN bit set indicating
   that congestion was encountered upstream, then the bit is left
   unchanged, and the packet transmitted as usual.

5. Support from the Transport Protocol

   ECN requires support from the transport protocol, in addition to the
   ECN field in the IPv6 packet header.  For TCP, ECN requires two new
   mechanisms:  negotiation between the endpoints during setup to
   determine if they are both ECN-capable, and an ECN-Notify bit in the
   TCP header so that the data receiver can inform the data sender when
   a packet has been received with the ECN bit set.  The support
   required from other transport protocols is likely to be different,
   particularly for unreliable or reliable multicast transport protocols,
   and will have to be determined as other transport protocols are
   brought to the IETF for standardization.  The following sections
   describe in detail the proposed TCP use of ECN.  This is also
   described in [Floyd94].  We assume that the source TCP uses the
   current set of congestion control algorithms of Slow-start, Fast
   Retransmit and Fast Recovery [RFC2001].

5.1. TCP Initialization

   Initially, the source and destination TCPs exchange the desire and/or
   capability to use ECN in the TCP connection setup phase.  As a result
   of the negotiation, the TCP sender indicates using the ECN-Capable
   bit in the IPv6 header that the transport is capable and willing to
   participate in ECN.  This will indicate to the routers that they may
   mark packets with the ECN bit, if they would like to use that as a
   method of congestion notification. If the TCP connection does not
   wish to use ECN notification, the sending TCP sets the ECN-Capable
   bit equal to 0 (i.e., not set), and the TCP receiver ignores the ECN
   bit in received packets.

5.2. The TCP Sender

   For a connection that expects to use ECN, packets are transmitted
   with the ECN-Capable bit set in the IP header (set to a "1").  If the
   sender receives a TCP acknowledgement with the ECN-Notify bit set in
   the TCP header, then the sender knows that congestion was encountered
   in the network on the path from the sender to the receiver.  The
   indication of congestion should be treated just as a congestion loss
   in non-ECN-Capable TCP. That is, the TCP source halves the congestion
   window "cwnd" and reduces the slow start threshold "ssthresh".  The
   sending TCP does NOT increase the congestion window in response to
   the receipt of an ACK packet with the ECN-Notify bit set.  However, a
   very important difference is that TCP does not react to ECN
   congestion indications more than once every window of data (or more
   loosely, more than once every round-trip time). If a response to the
   ECN-Notify bit was made over the last round-trip time, based on the
   window of packets, then the sending TCP doesn't respond to any
   further ECN messages. If at time "t", the source TCP reacted to an
   ECN, then it notes the packets that are outstanding at that time and
   have not yet been acknowledged. Until all these packets are
   acknowledged, say at time "u", the source TCP does not react to
   another ECN indication of congestion.

   In addition, when a TCP sender receives duplicate acks during the
   time interval between "t" and "u", it does not reduce the congestion
   window.  The result is that decreases in the congestion window occur
   at most once per roundtrip time.
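
   The once-per-window rule for ECN indications can be sketched as
   follows.  This is a minimal illustration under stated assumptions:
   windows and sequence numbers are counted in whole segments, the
   names (EcnSender, recover) are hypothetical, ssthresh is set to half
   the old window with a floor of two segments, and an ACK that
   completes the window noted at time "t" is taken as time "u" before
   its own ECN-Notify bit is examined.  Normal window growth and the
   handling of duplicate acks are omitted.

```python
class EcnSender:
    """Sketch of a TCP sender reacting to ECN-Notify at most once per window."""

    def __init__(self, cwnd=16, ssthresh=64):
        self.cwnd = cwnd          # congestion window, in segments
        self.ssthresh = ssthresh  # slow start threshold, in segments
        self.snd_una = 0          # oldest unacknowledged segment
        self.snd_nxt = 0          # next segment to be sent
        self.recover = None       # snd_nxt noted at time "t", None when idle

    def send(self, segments):
        """Record that new segments were transmitted."""
        self.snd_nxt += segments

    def on_ack(self, ack, ecn_notify):
        """Process a cumulative ACK; return True if the window was reduced."""
        self.snd_una = max(self.snd_una, ack)
        # Time "u": everything outstanding at time "t" is now acknowledged.
        if self.recover is not None and self.snd_una >= self.recover:
            self.recover = None
        if not ecn_notify:
            return False          # normal window growth would happen here
        if self.recover is not None:
            return False          # already reacted within this window
        # React as to a single congestion loss: halve the window.
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = self.ssthresh
        self.recover = self.snd_nxt
        return True               # no window increase for this ACK
```

   With a window of 16 segments outstanding, the first ACK carrying
   ECN-Notify reduces cwnd to 8; further ECN-Notify ACKs are ignored
   until all 16 segments have been acknowledged.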

   When the TCP sender receives a packet with the ECN-Notify bit set,
   and therefore reduces its congestion window, the sender does not need
   to slow-start (as is done in Tahoe TCP in response to a packet drop)
   or to stop sending packets for a period of time to allow the queue to
   dissipate (as is done by Reno TCP for roughly half a round-trip time
   in response to a packet drop).  The ECN-Notify bit being set does not
   indicate the urgent transient congestion state of a buffer overflow.
   Incoming acknowledgements will still arrive to "clock out" outgoing
   packets when allowed by the congestion window.

   TCP follows existing algorithms for sending data packets in response
   to incoming ACKs, multiple duplicate acknowledgements, or retransmit
   timeouts [RFC2001].

5.3. The TCP Receiver

   At the destination end-system, when TCP receives a packet with the
   ECN bit set in the IP header, TCP sets the ECN-Notify bit in the TCP
   header in the returning ACK packet.  We do not provide here any
   notion of destination congestion, because this is already being
   indicated in the receiver's advertised window.

   The destination TCP continues to perform the duplicate ACK procedure
   already specified - to generate a duplicate ACK when an out-of-
   sequence packet is received.

   If there is any ACK withholding implemented, as in current TCP
   implementations where the TCP receiver often sends an ACK for two
   arriving data packets, then the TCP destination will send the OR of
   all the ECN bits of packets that the ACK is acknowledging. That is,
   if any packet is received with the ECN bit set, then the ACK carries
   the ECN-Notify bit set.
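
   The OR rule for withheld ACKs can be sketched as follows.  The
   class and field names are hypothetical, the receiver is assumed to
   ACK every second data packet, and the delayed-ACK timer is
   represented only by a flush() call.

```python
class EcnReceiver:
    """Sketch of a TCP receiver that ORs ECN bits across a delayed ACK."""

    def __init__(self, ack_every=2):
        self.ack_every = ack_every  # data packets covered by one ACK (assumed)
        self.pending = 0            # data packets received but not yet ACKed
        self.ecn_seen = False       # OR of ECN bits over the pending packets

    def on_data(self, ecn_bit):
        """Handle one in-order data packet; return an ACK dict or None."""
        self.pending += 1
        self.ecn_seen = self.ecn_seen or bool(ecn_bit)
        if self.pending >= self.ack_every:
            return self.flush()
        return None

    def flush(self):
        """Emit the ACK now, e.g. when the delayed-ACK timer fires."""
        if self.pending == 0:
            return None
        ack = {"acked_packets": self.pending, "ecn_notify": self.ecn_seen}
        self.pending = 0
        self.ecn_seen = False
        return ack
```

   If only the first of two packets arrives with the ECN bit set, the
   single ACK covering both still carries ECN-Notify.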

5.4. Congestion on the ACK-path

   For the current generation of TCP congestion control algorithms, pure
   acknowledgement packets (i.e., packets that do not contain any
   accompanying data) should be sent with the ECN-capable bit off.
   Current TCP receivers have no mechanisms for reducing traffic on the
   ACK-path in response to congestion notification.  Mechanisms for
   responding to congestion on the ACK-path are left as an area
   for future research.  (One simple possibility would be for the sender
   to reduce its congestion window when it receives a pure ACK packet
   with the ECN bit set). For current TCP implementations, a single
   dropped ACK generally has only a very small effect on the TCP's
   sending rate.


6. Summary of changes required in IPv6 and TCP

   Two bits need to be specified in the IPv6 header, the ECN-Capable bit
   and the ECN bit.  The ECN-Capable bit set to "0" indicates that the
   transport protocol will ignore the ECN bit.  This is the default
   value.  The ECN-Capable bit set to "1" indicates that the transport
   protocol is willing and able to participate in ECN.

   The default value for the ECN bit is "0".  The router sets the ECN
   bit to "1" to indicate congestion to the end nodes.  The ECN bit in a
   packet header should never be reset by a router from "1" to "0".

   TCP requires two changes, a negotiation phase during setup to
   determine if both end nodes are ECN-capable, and a bit in the TCP
   header (possibly one of the "reserved" bits in the TCP flags field)
   as an ECN-Notify bit so that the receiver can inform the sender of a
   packet received with the ECN bit set.

7. Non-relationship to ATM's EFCI indicator or Frame Relay's FECN

   Since these ATM and Frame Relay mechanisms typically have been
   defined without any notion of average queue size as the basis for
   concluding that there is congestion, we believe that they provide a
   very noisy signal. The interpretation we have here for ECN is NOT the
   appropriate reaction for such a noisy signal of congestion
   notification. It is our belief that such mechanisms would be phased
   out over time within the ATM network.  However, if the routers that
   interface to the ATM network have a way of maintaining the average
   queue at the interface, and use it to come to a conclusion that the
   ATM subnet is congested or otherwise, they may use the ECN
   notification that is defined here.

8. Non-compliance by the End Nodes

   We believe that, for the most part, the fairness properties of TCP
   will not be changed with the introduction of ECN.

   A key issue concerns the vulnerability of ECN to non-compliant end-
   nodes (i.e., end nodes that set the ECN-capable bit in packets, but
   do not respond to the ECN bit itself).  These concerns exist even in
   non-ECN environments.  An end-node could "turn off congestion
   control" by not reducing its congestion window in response to packet
   drops.  We recognize that this is a concern for the current Internet.
   It has been argued that routers will have to deploy mechanisms to
   detect and differentially treat packets from non-compliant flows.  It
   is likely that techniques such as end-to-end per-flow scheduling and
   isolation of one flow from another, potentially accompanied by end-
   to-end reservations, could mitigate such effects. Such isolation
   mechanisms could remove some of the more egregious effects of non-
   compliance.

   However, even in networks just restricted to packet losses as an
   indication of congestion, several methods have been proposed to
   identify and treat non-compliant or unresponsive flows.  These
   mechanisms would be equally applicable for identifying flows that do
   not respond to ECN.  If anything, routers would have a slightly
   easier time identifying flows that do not respond to ECN.  For
   example, routers can observe packets arriving at the router with the
   ECN bit set, as well as keeping note of packets that have the ECN bit
   set at that router itself.

   It has been argued that dropping packets in itself may be considered
   a deterrent for non-compliance.  However, we believe that the packet
   drop rates are likely to be reasonably low in environments where ECN
   is deployed.  The reduction in load due to packet drops to deal with
   non-compliant nodes is likely to be small.  The control of congestion
   is more likely to come from end-nodes reacting to congestion - either
   from responding to dropped packets or ECN Notify indications and
   halving the window.  ECN should be used at a router when the average
   queue size is below some high threshold; when the average queue size
   exceeds the high threshold, and therefore packet drop/marking rates
   are higher, our recommendation is that routers drop packets, rather
   than setting the ECN bit in packet headers.  Thus, in scenarios with
   low packet drop rates, the fact that the congestion control
   indications are in the form of packet drops rather than ECN bits does
   not significantly change the negative consequences on the compliant
   flows because of some flow "turning off" congestion control.

   We also do not believe that packet dropping itself is an effective
   deterrent for non-compliance.  Many flows that retransmit dropped
   packets could have an incentive to maintain or even increase their
   sending rate in response to packet drops, rather than decreasing
   their sending rate, in the absence of mechanisms at the router to
   provide a deterrent against such behavior.  For example, flows
   that use unreliable transport protocols could simply increase their
   use of FEC in response to an increased packet drop rate, and might
   choose increased FEC and no congestion control.  We believe that the
   effect of packet dropping as a deterrence for non-compliance with
   congestion control mechanisms is quite small.  The possibility of
   non-compliant flows does not offer a compelling reason not to deploy
   ECN.

9. Additional Considerations

   Some care is required to handle the ECN and ECN-Capable bits
   appropriately when packets are encapsulated and un-encapsulated for
   tunnels.  When the router at the end of the tunnel decapsulates the
   packet, then the ECN bit in the encapsulating ('outside') header
   should be ORed with the ECN bit in the encapsulated ('inside') header
   that remains.  Basically, a 1 in the encapsulating header should be
   copied into the encapsulated header.
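
   The decapsulation rule can be sketched as follows.  The function
   name is hypothetical and each ECN bit is represented as the integer
   0 or 1; the handling of the ECN-Capable bit at tunnel endpoints is
   not covered by this sketch.

```python
def decapsulate_ecn(outer_ecn, inner_ecn):
    """Return the ECN bit of the inner header after decapsulation.

    The bit from the encapsulating (outer) header is ORed with the bit
    in the encapsulated (inner) header, so a 1 set by a router inside
    the tunnel is never lost and a 1 set before the tunnel is kept.
    """
    return outer_ecn | inner_ecn


# The four possible combinations of (outer, inner) bits.
results = {(o, i): decapsulate_ecn(o, i) for o in (0, 1) for i in (0, 1)}
```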

   An additional issue concerns packets that have the ECN bit set at one
   router, and are later dropped at another router.  For the proposed
   use for ECN in this paper (that is, for data packets for TCP), this
   is not a concern, because end nodes detect dropped data packets, and
   the congestion response of the end nodes to a dropped data packet is
   at least as strong as the congestion response to a packet received
   with the ECN bit set.  This issue will have to be addressed if ECN
   and ECN-Capable bits are used on pure ACK packets, because in current
   implementations of TCP the drop of an ACK packet is not explicitly
   detected by the end nodes.

   If a packet with the ECN bit set is later dropped due to corruption
   (bit errors), the end node should still invoke congestion control,
   just as TCP would today in response to a dropped data packet.  This
   issue would also have to be addressed in future proposals for
   distinguishing between packets dropped due to corruption and packets
   dropped due to congestion.

10. Conclusions

   Given the current effort to implement RED, we believe this is the
   right time for router vendors to examine how to also implement
   congestion avoidance mechanisms that do not depend on packet drops
   alone.  With the growth of applications and transports that are
   sensitive to delay and loss of a single packet, depending on packet
   loss as a normal congestion notification mechanism appears to be
   insufficient (or at the very least, non-optimal).

REFERENCES

   [FJ93] Floyd, S., and Jacobson, V., "Random Early Detection gateways
   for Congestion Avoidance", IEEE/ACM Transactions on Networking, V.1
   N.4, August 1993, p. 397-413.  URL
   "ftp://ftp.ee.lbl.gov/papers/early.pdf".

   [Floyd94] Floyd, S., "TCP and Explicit Congestion Notification", ACM
   Computer Communication Review, V. 24 N. 5, October 1994, p. 10-23.
   URL "ftp://ftp.ee.lbl.gov/papers/tcp_ecn.4.ps.Z".

   [Floyd97] Floyd, S., and Fall, K., "Router Mechanisms to Support
   End-to-End Congestion Control", Technical report, February 1997.  URL
   "ftp://ftp.ee.lbl.gov/papers/collapse.ps".

   [FRED] Lin, D., and Morris, R., "Dynamics of Random Early Detection",
   SIGCOMM '97, September 1997.  URL
   "http://www.inria.fr/rodeo/sigcomm97/program.html#ab078".

   [Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
   ACM SIGCOMM '88, pp. 314-329.  URL
   "ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z".

   [Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance
   Algorithm", Message to end2end-interest mailing list, April 1990.
   URL "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

   [RED-ietf-draft] B. Braden, D. Clark, J. Crowcroft, B. Davie, S.
   Deering, D. Estrin, S. Floyd, V. Jacobson, G. Minshall, C. Partridge,
   L. Peterson, K. Ramakrishnan, S. Shenker, J. Wroclawski, L. Zhang,
   "Recommendations on Queue Management and Congestion Avoidance in the
   Internet", Internet draft draft-irtf-e2e-queue-mgt-00.txt, March 25,
   1997.

   [RFC2001] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
   Retransmit, and Fast Recovery Algorithms", RFC 2001, January 1997.

   [RJ90] K. K. Ramakrishnan and Raj Jain, "A Binary Feedback Scheme for
   Congestion Avoidance in Computer Networks", ACM Transactions on
   Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.

SECURITY CONSIDERATIONS

   Security issues are not discussed in this document.

AUTHORS' ADDRESSES


   K. K. Ramakrishnan
   AT&T Labs. Research
   Phone: +1 (973) 360-8766
   Email: kkrama@research.att.com
   URL: http://www.research.att.com/info/kkrama

   Sally Floyd
   Lawrence Berkeley National Laboratory
   Phone: +1 (510) 486-7518
   Email: floyd@ee.lbl.gov
   URL: http://www-nrg.ee.lbl.gov/floyd/


   This draft was created in November 1997.
   It expires May 1998.

From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 17 16:34:11 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA00580 for tcp-impl-list; Mon, 17 Nov 1997 16:23:49 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA00554 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 17 Nov 1997 16:23:44 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA09187
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 17 Nov 1997 16:23:43 -0800
	env-from (braden@ISI.EDU)
From: braden@ISI.EDU
Received: from can.isi.edu (can.isi.edu [128.9.160.148])
	by zephyr.isi.edu (8.8.7/8.8.6) with SMTP id QAA29080
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 17 Nov 1997 16:23:36 -0800 (PST)
Date: Mon, 17 Nov 97 16:23:12 PST
Posted-Date: Mon, 17 Nov 97 16:23:12 PST
Message-Id: <9711180023.AA02282@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA02282>; Mon, 17 Nov 97 16:23:12 PST
To: tcp-impl@cthulhu.engr.sgi.com
Subject: mailing lists
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


It appears that someone has put end2end-interest@isi.edu on the
tcp-impl mailing list.  That is a way-bad idea; please undo it.

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 17 16:34:11 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA01600 for tcp-impl-list; Mon, 17 Nov 1997 16:26:20 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA01588 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 17 Nov 1997 16:26:17 -0800
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA09873
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 17 Nov 1997 16:26:16 -0800
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu (rum.isi.edu [128.9.192.237])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id QAA19201;
	Mon, 17 Nov 1997 16:26:11 -0800 (PST)
Date: Tue, 18 Nov 1997 00:26:10 GMT
Posted-Date: Tue, 18 Nov 1997 00:26:10 GMT
Message-Id: <199711180026.AAA22612@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-4)
	id <AAA22612>; Tue, 18 Nov 1997 00:26:10 GMT
To: tcp-impl@cthulhu.engr.sgi.com, aron@cs.rice.edu
Subject: Re: delayed ACKs in TCP
X-Sun-Charset: US-ASCII
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From end2end-interest-owner@ISI.EDU Mon Nov 17 16:01:06 1997
> From: Mohit Aron <aron@cs.rice.edu>
> Subject: delayed ACKs in TCP
> To: tcp-impl@cthulhu.engr.sgi.com
> Date: Sat, 15 Nov 1997 21:31:58 -0600 (CST)
> X-Mailer: ELM [version 2.4 PL25]
> MIME-Version: 1.0
> Content-Transfer-Encoding: 7bit
> Precedence: bulk
> X-Lines: 29
> Status: RO
> 
> Hi,
> 	a TCP receiver normally transmits an ACK after receiving 2MSS worth
> of data. Otherwise the ACK is delayed till the delayed ACK timer fires. 
> The work in [1] shows that the timestamp option in TCP results in the
> TCP receiver sending an ACK for every three segments that are received. 
> 	I'm wondering about the effect of changing the TCP algorithms so as to
> send an ACK after every 2 segments (irrespective of the amount of data
> contained in them). This would also prevent some of the unfortunate
> interactions between the Nagle's algorithm at the sender and the delayed ack 
> algorithm at the receiver. Further, the presence of TCP options (like the 
> timestamp option) would not affect the transmission of an ACK.

The unfortunate interactions appear to be the result of a difference in
design; one algorithm measures 'segments' (windowing on the send side),
and the other measures 'full segments (MSS)' (delayed ACK on the
receive side).
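
To make the mismatch concrete, here is a minimal sketch of the receive-side
arithmetic. It assumes an MSS of 1460 bytes and a 12-byte timestamp option
(10 bytes plus padding), and it ignores the delayed-ACK timer. The function
name and parameters are illustrative only and are not taken from any TCP
implementation.

```python
MSS = 1460              # assumed MSS in bytes (typical for Ethernet)
TIMESTAMP_OPTION = 12   # assumed size of the timestamp option with padding


def segments_until_ack(payload_per_segment, mss=MSS, count_bytes=True):
    """Return how many back-to-back segments arrive before the receiver ACKs.

    count_bytes=True models a receiver that ACKs once 2*MSS bytes of data
    have arrived; count_bytes=False models one that ACKs every 2 segments.
    The delayed-ACK timer is deliberately left out of this toy model.
    """
    if payload_per_segment <= 0:
        raise ValueError("payload_per_segment must be positive")
    if not count_bytes:
        return 2
    received = 0
    count = 0
    while received < 2 * mss:
        received += payload_per_segment
        count += 1
    return count


full = segments_until_ack(MSS)                          # no options
with_ts = segments_until_ack(MSS - TIMESTAMP_OPTION)    # 1448-byte payloads
per_segment = segments_until_ack(MSS - TIMESTAMP_OPTION, count_bytes=False)
print(full, with_ts, per_segment)  # prints: 2 3 2
```

With 1448-byte payloads, two segments carry 2896 bytes, which is short of the
2920-byte threshold, so the ACK waits for a third segment. Counting segments
instead of bytes removes that dependence on the option size.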

Presumably, you are therefore proposing to unify the definition
of 'segment' in both algorithms. This would clearly be a win, but
it seems that modifying the send window to measure MSS's, rather
than segments, would be more effective and closer to the original 
intent...

No?

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 17 16:35:07 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA04495 for tcp-impl-list; Mon, 17 Nov 1997 16:33:13 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA04422 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 17 Nov 1997 16:33:07 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA11468
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 17 Nov 1997 16:33:06 -0800
	env-from (aron@cs.rice.edu)
Received: from mrsclaus.cs.rice.edu (mrsclaus.cs.rice.edu [128.42.1.108]) by cs.rice.edu (8.8.5/8.7.1) with ESMTP id SAA27387; Mon, 17 Nov 1997 18:33:04 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Received: (from aron@localhost) by mrsclaus.cs.rice.edu (8.8.5/8.7.3) id SAA18315; Mon, 17 Nov 1997 18:33:03 -0600 (CST)
Message-Id: <199711180033.SAA18315@mrsclaus.cs.rice.edu>
Subject: Re: delayed ACKs in TCP
To: touch@ISI.EDU
Date: Mon, 17 Nov 1997 18:33:02 -0600 (CST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199711180026.AAA22612@rum.isi.edu> from "touch@ISI.EDU" at Nov 18, 97 00:26:10 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> The unfortunate interactions appear to be the result of a difference in
> design; one algorithm measures 'segments' (windowing on the send side),
> and the other measures 'full segments (MSS)' (delayed ACK on the
> receive side).
> 
> Presumably, you are therefore proposing to unify the definition
> of 'segment' in both algorithms. This would clearly be a win, but
> it seems that modifying the send window to measure MSS's, rather
> than segments, would be more effective and closer to the original 
> intent...
> 



I'm not sure I understand this fully, but how would making changes on the
send side help when the receiver is determined to ACK only upon seeing 
2MSS worth of data? The TCP options limit how much actual data can be put 
in a 'segment'.




- Mohit Aron
  aron@cs.rice.edu

From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 17 16:59:03 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA11767 for tcp-impl-list; Mon, 17 Nov 1997 16:48:54 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA11724 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 17 Nov 1997 16:48:45 -0800
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA15471
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 17 Nov 1997 16:48:43 -0800
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu (rum.isi.edu [128.9.192.237])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id QAA19618;
	Mon, 17 Nov 1997 16:48:42 -0800 (PST)
Date: Tue, 18 Nov 1997 00:48:42 GMT
Posted-Date: Tue, 18 Nov 1997 00:48:42 GMT
Message-Id: <199711180048.AAA23716@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-4)
	id <AAA23716>; Tue, 18 Nov 1997 00:48:42 GMT
To: touch@ISI.EDU, aron@cs.rice.edu
Subject: Re: delayed ACKs in TCP
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From aron@cs.rice.edu Mon Nov 17 16:33:10 1997
> From: Mohit Aron <aron@cs.rice.edu>
> > 
> > The unfortunate interactions appear to be the result of a difference in
> > design; one algorithm measures 'segments' (windowing on the send side),
> > and the other measures 'full segments (MSS)' (delayed ACK on the
> > receive side).
> > 
> > Presumably, you are therefore proposing to unify the definition
> > of 'segment' in both algorithms. This would clearly be a win, but
> > it seems that modifying the send window to measure MSS's, rather
> > than segments, would be more effective and closer to the original 
> > intent...
> 
> I'm not sure I understand this fully, but how would making changes on the
> send side help when the receiver is determined to ACK only upon seeing 
> 2MSS worth of data. The TCP options limit how much actual data can be put 
> in a 'segment'.

Maybe - it would allow the sender to send as many segments as it takes
to exhaust its right-to-send, which would be in terms of MSS's.

This would avoid the stalling that occurs when the sender has
emitted K segments, but not K MSS's...
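
A toy model of this suggestion, under the assumption that the sender's
right-to-send is K segments, that each segment carries 1448 bytes because of
a 12-byte option, and that the receiver ACKs only after 2*MSS bytes. The
function `emit` and its parameters are hypothetical and only illustrate the
difference between budgeting in segments and budgeting in bytes.

```python
def emit(window_segments, payload, mss=1460, count_bytes=False):
    """Return the payload sizes a sender emits before its window is used up.

    count_bytes=False: the window is a count of segments, whatever their size.
    count_bytes=True:  the window is window_segments * mss bytes, so the
                       sender keeps going until that byte budget is spent.
    """
    if window_segments <= 0 or payload <= 0:
        raise ValueError("window_segments and payload must be positive")
    if not count_bytes:
        return [payload] * window_segments
    budget = window_segments * mss
    sizes = []
    while budget > 0:
        size = min(payload, budget)
        sizes.append(size)
        budget -= size
    return sizes


ack_threshold = 2 * 1460
by_segments = emit(2, 1448)                  # [1448, 1448]
by_bytes = emit(2, 1448, count_bytes=True)   # [1448, 1448, 24]
print(sum(by_segments) >= ack_threshold)     # prints: False (receiver waits)
print(sum(by_bytes) >= ack_threshold)        # prints: True  (receiver ACKs)
```

Whether the small trailing segment in the byte-budget case is desirable is a
separate question that this sketch does not address.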

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Tue Nov 18 00:15:57 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA19501 for tcp-impl-list; Tue, 18 Nov 1997 00:09:20 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA19495 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 18 Nov 1997 00:09:18 -0800
Received: from melimelo.enst-bretagne.fr (melimelo.enst-bretagne.fr [192.108.115.36]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA13623
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 18 Nov 1997 00:09:13 -0800
	env-from (Hossam.Afifi@enst-bretagne.fr)
Received: from rsm.rennes.enst-bretagne.fr (rsm.rennes.enst-bretagne.fr [192.44.77.1])
	by melimelo.enst-bretagne.fr (8.8.8/8.8.8) with ESMTP id JAA11616;
	Tue, 18 Nov 1997 09:08:58 +0100
Received: from albemuth (albemuth.rennes.enst-bretagne.fr [193.52.74.199])
	by rsm.rennes.enst-bretagne.fr (8.8.8/8.8.8) with SMTP id JAA26063;
	Tue, 18 Nov 1997 09:08:29 +0100 (MET)
Message-ID: <34714CFD.33A3@rennes.enst-bretagne.fr>
Date: Tue, 18 Nov 1997 09:08:29 +0100
From: Hossam Afifi <Hossam.Afifi@enst-bretagne.fr>
X-Mailer: Mozilla 3.0 (X11; I; SunOS 5.5 sun4m)
MIME-Version: 1.0
To: Mohit Aron <aron@cs.rice.edu>
CC: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: delayed ACKs in TCP
References: <199711160331.VAA14978@mrsclaus.cs.rice.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Mohit:

 We have a paper (in French...) that evaluates the effect of dynamic
 delayed ACK evaluation, especially for ADSL. It is based on another
 paper from ICC'97.

 If you can read French, it is on my homepage; otherwise,
 wait until we translate it.

Hossam

From owner-tcp-impl@relay.engr.sgi.com  Wed Nov 19 02:33:07 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA28620 for tcp-impl-list; Wed, 19 Nov 1997 02:27:09 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA28603 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 19 Nov 1997 02:27:04 -0800
Received: from callisto.ftel.co.uk (callisto.ftel.co.uk [192.131.79.11]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id CAA13429
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 19 Nov 1997 02:26:55 -0800
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (PAU1rcRDrjF0nk2VG+wvp5EUTpjmZFqG@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id KAA03871;
	Wed, 19 Nov 1997 10:26:51 GMT
Message-ID: <3472BEEA.7E02@ftel.co.uk>
Date: Wed, 19 Nov 1997 10:26:50 +0000
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.6 sun4u)
MIME-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
Subject: TCP over L2TP
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I've just read the L2TP draft, and its time-out mechanism is based on that
of TCP.

Can anyone give me references to any simulation work that's been done on
the interaction between TCP implementations and proposed L2TP
implementations?
   I suspect that there is a danger of multiple retransmissions at
multiple layers of the protocol stack.


Thanks in advance,


Graham Cope

From owner-tcp-impl@relay.engr.sgi.com  Wed Nov 19 08:00:42 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA06623 for tcp-impl-list; Wed, 19 Nov 1997 07:48:43 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA06608 for <tcp-impl@engr.sgi.com>; Wed, 19 Nov 1997 07:48:42 -0800
Received: from ns.ietf.org (ietf.org [132.151.1.19]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA15664
	for <tcp-impl@engr.sgi.com>; Wed, 19 Nov 1997 07:48:38 -0800
	env-from (cclark@cnri.reston.va.us)
Received: from CNRI.Reston.VA.US (localhost [127.0.0.1])
	by ns.ietf.org (8.8.7/8.8.7a) with ESMTP id KAA28884;
	Wed, 19 Nov 1997 10:48:34 -0500 (EST)
Message-Id: <199711191548.KAA28884@ns.ietf.org>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce@ns.ietf.org
Cc: tcp-impl@engr.sgi.com
From: Internet-Drafts@ns.ietf.org
Reply-to: Internet-Drafts@ns.ietf.org
Subject: I-D ACTION:draft-ietf-tcpimpl-tools-01.txt
Date: Wed, 19 Nov 1997 10:48:33 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

--NextPart

A Revised Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Implementation Working Group of the IETF.

	Title		: Some Testing Tools for TCP Implementors
	Author(s)	: S. Parker, C. Schmechel
	Filename	: draft-ietf-tcpimpl-tools-01.txt
	Pages		: 14
	Date		: 18-Nov-97
	
       Available tools for testing TCP implementations are catalogued by
       this memo.  Hopefully disseminating this information will
       encourage those responsible for building and maintaining TCP to
       make the best use of available tests.  The type of testing the
       tool provides, the type of tests it is capable of doing, and its
       availability is enumerated.  This document lists only tools which
       can evaluate one or more TCP implementations, and which can
       provide some specific results which describe or evaluate the TCP
       being tested.

Internet-Drafts are available by anonymous FTP.  Login with the username
"anonymous" and a password of your e-mail address.  After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-tcpimpl-tools-01.txt".
A URL for the Internet-Draft is:
ftp://ds.internic.net/internet-drafts/draft-ietf-tcpimpl-tools-01.txt

Internet-Drafts directories are located at:

	Africa:	ftp.is.co.za
	
	Europe: ftp.nordu.net
		ftp.nis.garr.it
			
	Pacific Rim: munnari.oz.au
	
	US East Coast: ds.internic.net
	
	US West Coast: ftp.isi.edu

Internet-Drafts are also available by mail.

Send a message to:	mailserv@ds.internic.net.  In the body type:
	"FILE /internet-drafts/draft-ietf-tcpimpl-tools-01.txt".
	
NOTE:	The mail server at ds.internic.net can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type: Message/External-body;
	access-type="mail-server";
	server="mailserv@ds.internic.net"

Content-Type: text/plain
Content-ID:	<19971118153233.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-tools-01.txt

--OtherAccess
Content-Type: Message/External-body;
	name="draft-ietf-tcpimpl-tools-01.txt";
	site="ds.internic.net";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID:	<19971118153233.I-D@ietf.org>

--OtherAccess--

--NextPart--




From owner-tcp-impl@relay.engr.sgi.com  Thu Nov 20 02:55:15 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id CAA18147 for tcp-impl-list; Thu, 20 Nov 1997 02:42:21 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA18138 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 02:42:16 -0800
Received: from pec.etri.re.kr ([129.254.201.2]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id CAA06821
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 02:42:14 -0800
	env-from (qkim@pec.etri.re.kr)
Received: from p_qkim.etri.re.kr by pec.etri.re.kr (8.6.9H1/8.6.4)
	id TAA16157; Thu, 20 Nov 1997 19:39:41 +0900
Posted-Date: Thu, 20 Nov 1997 19:39:41 +0900
Message-Id: <3.0.5.32.19971120193839.00b4e5b0@pec.etri.re.kr>
MIME-Version: 1.0
X-Sender: qkim@pec.etri.re.kr
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.5 (32)
Date: Thu, 20 Nov 1997 19:38:39 +0900
To: tcp-impl@cthulhu.engr.sgi.com
From: Yong-Woon Kim <qkim@pec.etri.re.kr>
Subject: [Q] socket address vs. socket interface
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I'm confused by the two uses of the word 'socket'.

According to RFC 793:

To allow for many processes within a single host to use communication 
facilities simultaneously, TCP provides a set of ports. A host machine 
can support multiple ports, and each port has its own unique port number 
within the host. A socket is formed from a concatenation of the port 
number and a network address. The socket is a unique identifier 
throughout all networks connected together.

A pair of sockets specifies the two end points and uniquely identifies 
each connection. That is, a socket may be simultaneously used in multiple 
connections. Therefore a server is capable of handling many clients at 
the same time. The server's unique socket address is accessed 
simultaneously by all of its clients. Since data segments for 
a particular transport connection are always identified by both network 
addresses and both ports, it is easy for a server to keep track of multiple 
client connections.

The binding of ports to processes is handled independently by each host. 
However, it proves useful to attach frequently used processes to fixed 
sockets which are made known to the public. These services can then be 
accessed through the known addresses called well-known ports.

Therefore a socket number can be associated with multiple remote sockets.

------

But according to the socket interface, 

result = socket (pf, type, protocol)


My questions are 

1. Is the above socket number the 'result' value?

2. Does the word 'socket' have the same meaning in both cases?

3. When a server port is shared by multiple connections, 
   do the corresponding server sockets, as seen through the socket
   interface, have the same 'result' value or different ones?

Please clear up my confusion.



From owner-tcp-impl@relay.engr.sgi.com  Thu Nov 20 07:36:03 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA20591 for tcp-impl-list; Thu, 20 Nov 1997 07:25:17 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA20567 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 07:25:15 -0800
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA02459
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 07:25:12 -0800
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id JAA26664;
	Thu, 20 Nov 1997 09:25:07 -0600 (CST)
Date: Thu, 20 Nov 1997 09:25:07 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199711201525.JAA26664@frantic.BSDI.COM>
To: qkim@pec.etri.re.kr, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: [Q] socket address vs. socket interface
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Date: Thu, 20 Nov 1997 19:38:39 +0900
> From: Yong-Woon Kim <qkim@pec.etri.re.kr>
> Subject: [Q] socket address vs. socket interface
> 
> I'm confusing the same word of socket.
> ...
> My questions are 
> 
> 1. the above socket number is the 'result' value?

The socket() system call creates a communications endpoint which is
associated with a file descriptor, and that descriptor is the 'result'.

> 2. do the same word of 'socket' have the same semantics?

No.  In RFC 793 "socket" is a label placed on one end of a TCP connection,
and thus refers to just the local address/port combination.

The file descriptor, or socket, returned by socket() can be bound via
bind() to assign the local address and/or port, and the remote address/port
is specified using connect().  Each socket represents at most one TCP
connection.  If you have multiple TCP connections with the same local
address/port combination, they will each have a unique socket.

> 3. when a server port is shared by multiple connections, 
>    do corresponding server sockets in view of socket interface have
>    the same 'result' value? or different 'result' values?

Different values.  The 'result' value is the file descriptor, and is
unique within a single process.  So, if a server process is processing
multiple connections at the same time, it will have a unique file
descriptor for each socket that it is processing.
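
The point can be checked with a short program. The sketch below is one way
to do it, assuming a loopback interface is available. It uses Python's
standard socket module rather than the C interface discussed above, and the
helper name `demo_shared_port` is made up for this illustration.

```python
import socket


def demo_shared_port(n_clients=2):
    """Accept n_clients loopback connections on one port and report, for each
    accepted socket, its file descriptor and its local and remote addresses."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clients, accepted = [], []
    try:
        listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a port
        listener.listen(n_clients)
        listen_addr = listener.getsockname()
        for _ in range(n_clients):
            clients.append(socket.create_connection(listen_addr))
            conn, _peer = listener.accept()  # one new socket per connection
            accepted.append(conn)
        return {
            "listen_addr": listen_addr,
            "listener_fd": listener.fileno(),
            "fds": [s.fileno() for s in accepted],
            "local_addrs": [s.getsockname() for s in accepted],
            "peer_addrs": [s.getpeername() for s in accepted],
        }
    finally:
        for s in clients + accepted + [listener]:
            s.close()


info = demo_shared_port(3)
print(info["listen_addr"], info["fds"], info["peer_addrs"])
```

Every accepted socket reports the same local address and port (the RFC 793
sense of "socket"), while each has its own descriptor and its own remote
address and port.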

> Please fix my confusions.

I hope this helps.

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Thu Nov 20 07:36:08 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA21150 for tcp-impl-list; Thu, 20 Nov 1997 07:28:27 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA21141 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 07:28:26 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA03363
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 07:28:23 -0800
	env-from (aron@cs.rice.edu)
Received: from noel.cs.rice.edu (noel.cs.rice.edu [128.42.1.136]) by cs.rice.edu (8.8.5/8.7.1) with ESMTP id JAA00855; Thu, 20 Nov 1997 09:28:17 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Received: (from aron@localhost) by noel.cs.rice.edu (8.8.5/8.7.5) id JAA26408; Thu, 20 Nov 1997 09:28:16 -0600 (CST)
Message-Id: <199711201528.JAA26408@noel.cs.rice.edu>
Subject: Re: [Q] socket address vs. socket interface
To: qkim@pec.etri.re.kr (Yong-Woon Kim)
Date: Thu, 20 Nov 1997 09:28:15 -0600 (CST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <3.0.5.32.19971120193839.00b4e5b0@pec.etri.re.kr> from "Yong-Woon Kim" at Nov 20, 97 07:38:39 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



> But according to the socket interface, 
> 
> result = socket (pf, type, protocol)
> 
> 
> My questions are 
> 
> 1. the above socket number is the 'result' value?
> 

Yes.


> 2. do the same word of 'socket' have the same semantics?
> 

I don't understand what you mean here.

> 3. when a server port is shared by multiple connections, 
>    do corresponding server sockets in view of socket interface have
>    the same 'result' value? or different 'result' values?
> 

The different server sockets are created by the 'accept' system call
when a connection request arrives. So if you're talking about
the integer values stored in these socket descriptors - yes, they are 
different for each socket.

If you had bothered to write a short program to test these out, you would've
found the answers easily enough. I also don't think that your questions
were appropriate for this list. 




- Mohit

From owner-tcp-impl@relay.engr.sgi.com  Thu Nov 20 10:34:05 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA09605 for tcp-impl-list; Thu, 20 Nov 1997 10:21:42 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA09565 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 10:21:37 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA24860
	for <tcp-impl@relay.engr.sgi.com>; Thu, 20 Nov 1997 10:21:32 -0800
	env-from (aron@cs.rice.edu)
Received: from mrsclaus.cs.rice.edu (mrsclaus.cs.rice.edu [128.42.1.108]) by cs.rice.edu (8.8.5/8.7.1) with ESMTP id MAA11979; Thu, 20 Nov 1997 12:21:11 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Received: (from aron@localhost) by mrsclaus.cs.rice.edu (8.8.5/8.7.3) id MAA21360; Thu, 20 Nov 1997 12:21:10 -0600 (CST)
Message-Id: <199711201821.MAA21360@mrsclaus.cs.rice.edu>
Subject: detecting loss of retransmitted packets in TCP
To: tcp-impl@cthulhu.engr.sgi.com, end2end-interest@isi.edu
Date: Thu, 20 Nov 1997 12:21:10 -0600 (CST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi,
	I have a question concerning TCP behaviour. Current TCP implementations
resort to a timeout when a retransmitted packet is lost. Consider the following
behaviour for TCP Reno:

Sender transmits packets 1, 2, ... 10. Packet 1 gets lost. The retransmission
of packet 1 upon getting 3 duplicate ACKs also gets lost. The duplicate ACKs
generated due to packets 5 - 10 cause the sender to send new packets (11
- 15). These packets would generate duplicate ACKs again in response to which
the sender would send packets 16 - 20.  This would go on till the advertised
window is exhausted and then the sender waits for a retransmission timeout.

There are some other observations that can be made here wrt congestion control.
If any of the new packets also get lost, the sender receives correspondingly
fewer duplicate ACKs and therefore sends correspondingly fewer new segments.
Thus further losses during fast recovery cause the injected data to be reduced
"linearly" and not "multiplicatively". Secondly, the long wait for a
retransmission timeout can be detrimental to performance on high bandwidth
networks.
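
The "linear" versus "multiplicative" contrast can be put into numbers with a
toy model. It takes the simplified description above at face value: every
new segment that survives produces one duplicate ACK, and every duplicate
ACK releases exactly one new segment. Real Reno window inflation is more
involved, and both function names here are hypothetical.

```python
def next_round_linear(sent, lost):
    """New segments released in the next round when each surviving segment
    yields one duplicate ACK and each duplicate ACK releases one segment."""
    if not 0 <= lost <= sent:
        raise ValueError("lost must be between 0 and sent")
    return sent - lost


def next_round_halved(sent):
    """For comparison: a multiplicative decrease that halves the sending
    amount after a loss."""
    return sent // 2


# Six new segments sent during fast recovery, one of them lost:
print(next_round_linear(6, 1))   # prints: 5
print(next_round_halved(6))      # prints: 3
```

One loss removes one segment from the next round in the linear case, whereas
a multiplicative response would cut the round in half.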

The sender can easily detect loss of a retransmitted segment by counting the
number of duplicate ACKs received (if more than 9 are received, then the
sender can assume that the retransmitted packet was also lost). The sender
can immediately go over to slow start rather than waiting for a retransmission
timeout. The purpose of a retransmission timeout - clearing packets in the
network - can also be achieved by slow-start.

I'd be glad to know opinions about this.




- Mohit Aron
  aron@cs.rice.edu

From owner-tcp-impl@relay.engr.sgi.com  Thu Nov 20 11:50:36 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA10788 for tcp-impl-list; Thu, 20 Nov 1997 11:43:16 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA10584 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 11:42:59 -0800
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA23099
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 11:42:55 -0800
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id NAA26980;
	Thu, 20 Nov 1997 13:43:17 -0600 (CST)
Date: Thu, 20 Nov 1997 13:43:17 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199711201943.NAA26980@frantic.BSDI.COM>
To: aron@cs.rice.edu, end2end-interest@ISI.EDU, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: detecting loss of retransmitted packets in TCP
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Mohit Aron <aron@cs.rice.edu>
> Subject: detecting loss of retransmitted packets in TCP
> Date: Thu, 20 Nov 1997 12:21:10 -0600 (CST)
> ...
> Sender transmits packets 1, 2, ... 10. Packet 1 gets lost. The retransmission
> of packet 1 upon getting 3 duplicate ACKs also gets lost. The duplicate ACKs
> generated due to packets 5 - 10 cause the sender to send new packets (11
> - 15). These packets would generate duplicate ACKs again in response to which
> the sender would send packets 16 - 20.  This would go on till the advertised
> window is exhausted and then the sender waits for a retransmission timeout.
> ...
> The sender can easily detect loss of a retransmitted segment by counting the
> number of duplicate ACKs received (if more than 9 are received, then the
> sender can assume that the retransmitted packet was also lost). The sender
> can immediately go over to slow start rather than waiting for a retransmission
> timeout. The purpose of a retransmission timeout - clearing packets in the
> network - can also be achieved by slow-start.

A few quick thoughts:

 o You'd need to count 12, not 9 duplicate acks before retransmitting
   the retransmitted packet.  Just as it takes 3 duplicate acks to
   trigger fast retransmit to account for reordering, you'd need to
   allow 3 additional duplicate acks to be sure that the retransmitted
   packet didn't just get reordered.

 o Clearing packets in the network is a side-effect of doing a
   retransmission timeout, not the reason for it.  The reason for
   the retransmission timer is to get the data flowing when it has
   stopped due to loss.

 o Is this just a theoretical question, or do you have a real-life
   problem where retransmitted packets are being dropped on a
   regular basis?
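
The counting rule in the first point can be sketched as follows. This is
only an illustration: the class and function names are hypothetical, and
triggering exactly on the 12th duplicate ACK is an assumption about how
the threshold would be applied to the 10-packet example.

```python
DUPACK_THRESHOLD = 3  # duplicate ACKs that trigger fast retransmit


def retransmit_loss_limit(segments_after_hole, reorder_margin=DUPACK_THRESHOLD):
    """Duplicate ACKs after which the retransmission is presumed lost.

    segments_after_hole: segments in flight behind the lost one
    (9 in the example: packets 2..10).
    reorder_margin: extra duplicate ACKs allowed for reordering.
    """
    return segments_after_hole + reorder_margin


class DupAckCounter:
    """Hypothetical per-connection counter, reset whenever new data is ACKed."""

    def __init__(self, segments_after_hole):
        self.count = 0
        self.limit = retransmit_loss_limit(segments_after_hole)

    def on_dup_ack(self):
        self.count += 1
        if self.count == DUPACK_THRESHOLD:
            return "fast_retransmit"
        if self.count == self.limit:
            return "retransmit_again"
        return None


counter = DupAckCounter(segments_after_hole=9)
actions = [counter.on_dup_ack() for _ in range(12)]
```

With 9 segments behind the hole, the sketch asks for the second
retransmission on the 12th duplicate ACK rather than the 10th.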

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Thu Nov 20 13:41:12 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA13724 for tcp-impl-list; Thu, 20 Nov 1997 13:34:51 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA13693 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 13:34:45 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA28650
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 13:34:43 -0800
	env-from (aron@cs.rice.edu)
Received: from mrsclaus.cs.rice.edu (mrsclaus.cs.rice.edu [128.42.1.108]) by cs.rice.edu (8.8.5/8.7.1) with ESMTP id PAA26237; Thu, 20 Nov 1997 15:34:23 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Received: (from aron@localhost) by mrsclaus.cs.rice.edu (8.8.5/8.7.3) id PAA21760; Thu, 20 Nov 1997 15:34:22 -0600 (CST)
Message-Id: <199711202134.PAA21760@mrsclaus.cs.rice.edu>
Subject: Re: detecting loss of retransmitted packets in TCP
To: dab@BSDI.COM (David Borman)
Date: Thu, 20 Nov 1997 15:34:21 -0600 (CST)
Cc: aron@cs.rice.edu, end2end-interest@ISI.EDU, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199711201943.NAA26980@frantic.BSDI.COM> from "David Borman" at Nov 20, 97 01:43:17 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
>  o Is this just a theoretical question, or do you have a real-life
>    problem where retransmitted packets are being dropped on a
>    regular basis?
> 


I admit my question is more theoretical than being based upon any actual
observation. But I would imagine that with TCP implementations like SACK and
new-Reno which try to recover all segments lost in an RTT, the loss of 
retransmitted packets might occur more often than it occurred in Reno.




- Mohit Aron
  aron@cs.rice.edu

From owner-tcp-impl@relay.engr.sgi.com  Thu Nov 20 15:39:08 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA23930 for tcp-impl-list; Thu, 20 Nov 1997 15:31:58 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA23904 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 20 Nov 1997 15:31:52 -0800
Received: from simon.cs.cornell.edu (SIMON.CS.CORNELL.EDU [128.84.154.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA06967
	for <tcp-impl@relay.engr.sgi.com>; Thu, 20 Nov 1997 15:31:49 -0800
	env-from (skeshav@CS.Cornell.EDU)
Received: from cloyd.cs.cornell.edu (CLOYD.CS.CORNELL.EDU [128.84.227.15])
	by simon.cs.cornell.edu (8.8.5/8.8.5/R-1.8) with ESMTP id SAA19653;
	Thu, 20 Nov 1997 18:31:42 -0500 (EST)
Received: from dar (DHCP22.CS.CORNELL.EDU [128.84.248.153])
	by cloyd.cs.cornell.edu (8.8.5/8.8.5/M-1.9) with ESMTP id SAA20832;
	Thu, 20 Nov 1997 18:31:40 -0500 (EST)
Message-ID: <3474C84F.E02D2959@cs.cornell.edu>
Date: Thu, 20 Nov 1997 18:31:27 -0500
From: "S. Keshav" <skeshav@CS.Cornell.EDU>
X-Mailer: Mozilla 4.01 [en] (WinNT; I)
MIME-Version: 1.0
To: Mohit Aron <aron@cs.rice.edu>
CC: tcp-impl@cthulhu.engr.sgi.com, end2end-interest@ISI.EDU
Subject: Re: detecting loss of retransmitted packets in TCP
X-Priority: 3 (Normal)
References: <199711201821.MAA21360@mrsclaus.cs.rice.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

You can detect the loss of a retransmitted packet if the cumulative
ack does not increase one RTT after sending the retransmission.
I use this trick in the packet-pair protocol (for details, see
my paper on SMART in Infocom '97, also available from my
home page below).
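
A minimal sketch of the rule stated above, not of the SMART or
packet-pair code itself: the class name, the millisecond clock, and the
use of a single fixed RTT estimate are all assumptions made for
illustration, and sequence-number wraparound is ignored.

```python
class RetransmitWatch:
    """Flags a retransmission as lost if the cumulative ACK has not
    advanced one RTT after the retransmission was sent."""

    def __init__(self, rtt_ms):
        self.rtt_ms = rtt_ms
        self.pending = None  # (cumulative ACK at retransmit time, deadline)

    def on_retransmit(self, now_ms, cumulative_ack):
        self.pending = (cumulative_ack, now_ms + self.rtt_ms)

    def on_ack(self, cumulative_ack):
        # Any advance of the cumulative ACK means the hole was filled.
        if self.pending is not None and cumulative_ack > self.pending[0]:
            self.pending = None

    def retransmission_lost(self, now_ms):
        return self.pending is not None and now_ms >= self.pending[1]


watch = RetransmitWatch(rtt_ms=100)
watch.on_retransmit(now_ms=0, cumulative_ack=1000)
watch.on_ack(1000)                       # duplicate ACK, no progress
lost_early = watch.retransmission_lost(now_ms=50)
lost_late = watch.retransmission_lost(now_ms=100)
```

A real sender would use its smoothed RTT estimate and would have to
decide what to do once the check fires; neither is covered here.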

-- 
keshav

Dept. Computer Science, 4130 Upson, Cornell U. Ithaca NY 14853 
Tel. 607.255.5395 Fax x4428 http://www.cs.cornell.edu/home/skeshav

From owner-tcp-impl@relay.engr.sgi.com  Fri Nov 21 03:20:18 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA29999 for tcp-impl-list; Fri, 21 Nov 1997 03:11:15 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA29984 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 21 Nov 1997 03:11:13 -0800
Received: from callisto.ftel.co.uk (callisto.ftel.co.uk [192.131.79.11]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id DAA24852
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 21 Nov 1997 03:10:39 -0800
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (pCw12P0r2BuGPW69aMhGZd5yprx+x0HK@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id LAA08870;
	Fri, 21 Nov 1997 11:10:35 GMT
Message-ID: <34756C29.3723@ftel.co.uk>
Date: Fri, 21 Nov 1997 11:10:33 +0000
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.6 sun4u)
MIME-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: detecting loss of retransmitted packets in TCP
References: <199711201821.MAA21360@mrsclaus.cs.rice.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Mohit Aron wrote:
> 
> Hi,
>         I have a question concerning TCP behaviour. Current TCP implementations
> resort to a timeout when a retransmitted packet is lost. Consider the following
> behaviour for TCP Reno:
> 
> Sender transmits packets 1, 2, ... 10. Packet 1 gets lost. The retransmission
> of packet 1 upon getting 3 duplicate ACKs also gets lost. The duplicate ACKs
> generated due to packets 5 - 10 cause the sender to send new packets (11
> - 15). 


I don't believe that this will happen, if based on taking rfc 2001's
description of fast recovery literally (see P4).

This is because cwnd is set to ssthresh + 3*seg_size, after ssthresh has
been halved. As I've coded it, this usually causes my simulation model
to stop sending, because the halving of ssthresh is not compensated by
adding 2*seg_size. If it had taken the old, un-halved ssthresh, it would
be OK.



Graham Cope

From owner-tcp-impl@relay.engr.sgi.com  Fri Nov 21 05:47:31 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA12170 for tcp-impl-list; Fri, 21 Nov 1997 05:41:38 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA12161 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 21 Nov 1997 05:41:36 -0800
Received: from mercury.spider.com (mercury.spider.com [194.217.109.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id FAA19115
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 21 Nov 1997 05:41:29 -0800
	env-from (ian@spider.com)
Received: from asimov.spider.com (asimov.spider.com [194.217.109.66]) by mercury.spider.com (8.8.3/8.8.3) with SMTP id NAA12396 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 21 Nov 1997 13:41:17 GMT
Received: from malatesta. by asimov.spider.com (SMI-8.6/SMI-SVR4)
	id NAA16988; Fri, 21 Nov 1997 13:40:51 GMT
Received: by malatesta. (SMI-8.6/SMI-SVR4)
	id NAA06505; Fri, 21 Nov 1997 13:40:50 GMT
Date: Fri, 21 Nov 1997 13:40:50 GMT
From: ian@spider.com (Ian Heavens)
Message-Id: <199711211340.NAA06505@malatesta.>
X-Mailer: Mail User's Shell (7.2.6 beta(2) 2/29/96)
To: tcp-impl@cthulhu.engr.sgi.com
Subject: RSTs and Half Duplex Close bug
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


I sent this to Vern for the Known Problems I-D

cheers

ian


Failure to send a RST after Half Duplex Close       
 
   Category
        Reliability?
 
   Description

	RFC 1122 4.2.2.13 mandates that TCP SHOULD send a RST if data is 
	received after "half duplex close", i.e. if it cannot be delivered to 
	the application.  Failure to do so can lead to permanently hung TCP 
	connections and has been demonstrated when HTTP clients abort 
	connections, common when users move on to a new page before the 
	current page has finished downloading.  The HTTP client closes by 
	transmitting a FIN while the server is transmitting images, text, etc.
	The server TCP receives the FIN,  but the application does not close 
	until all data has been queued for transmission (typically, a write() 
	call blocks), and the server does not transmit a FIN until all the data	
	has been transmitted.  The window decreases to zero, since it cannot 
	pass the data to the application, and the server sends probe segments.  
	The client acknowledges the probe segments with a zero window. As mandated 
	in RFC1122 4.2.2.17, the probe segments are transmitted forever.  Server 
	connection state remains in CLOSE_WAIT, and eventually server processes 
	are exhausted.

	Note that there are two bugs.  First, probe segments should be ignored 
	if the window can never subsequently increase.  Second, a RST should 
	be sent when data is received after half duplex close.  Fixing the 
	first bug, but not the second, results in the probe segments eventually
	timing out the connection, but the server remains in CLOSE_WAIT for a 
	significant and unnecessary period.

Significance
        Serious
 
Implications
        Web servers require frequent rebooting
 
Relevant RFCs
  	 RFC 1122 sections 4.2.2.13 and 4.2.2.17

Trace file demonstrating the problem
	Made using an unknown network analyser
 
	client.1391 > server.8080: S 0:1(0) ack: 0 win: 2000 <mss: 5b4>
	server.8080 > client.1391: SA 8c01:8c02(0) ack: 1 win: 8000 <mss:100>
	client.1391 > server.8080: PA 
	client.1391 > server.8080: PA 1:1c2(1c1) ack: 8c02 win: 2000
	server.8080 > client.1391: [DF] PA 8c02:8cde(dc) ack: 1c2 win: 8000
	server.8080 > client.1391: [DF] A 8cde:9292(5b4) ack: 1c2 win: 8000
	server.8080 > client.1391: [DF] A 9292:9846(5b4) ack: 1c2 win: 8000
	server.8080 > client.1391: [DF] A 9846:9dfa(5b4) ack: 1c2 win: 8000
	client.1391 > server.8080: PA 
	server.8080 > client.1391: [DF] A 9dfa:a3ae(5b4) ack: 1c2 win: 8000
	server.8080 > client.1391: [DF] A a3ae:a962(5b4) ack: 1c2 win: 8000
	server.8080 > client.1391: [DF] A a962:af16(5b4) ack: 1c2 win: 8000
	server.8080 > client.1391: [DF] A af16:b4ca(5b4) ack: 1c2 win: 8000
	client.1391 > server.8080: PA 
	server.8080 > client.1391: [DF] A b4ca:ba7e(5b4) ack: 1c2 win: 8000
	server.8080 > client.1391: [DF] A b4ca:ba7e(5b4) ack: 1c2 win: 8000
	client.1391 > server.8080: PA 
	server.8080 > client.1391: [DF] A ba7e:bdfa(37c) ack: 1c2 win: 8000
	client.1391 > server.8080: PA 
	server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c2 win: 8000
	client.1391 > server.8080: PA 
	
	[ HTTP client aborts and enters FIN_WAIT_1 ]

	client.1391 > server.8080: FPA 

	[ server ACKs the FIN and enters CLOSE_WAIT ]
	
	server.8080 > client.1391: [DF] A 

	[ client enters FIN_WAIT_2 ]

	server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000

	[ server continues to try to send its data ]

	client.1391 > server.8080: PA < window = 0 >
	server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000
	client.1391 > server.8080: PA < window = 0 >
	server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000
	client.1391 > server.8080: PA < window = 0 >
	server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000
	client.1391 > server.8080: PA < window = 0 >
	server.8080 > client.1391: [DF] A bdfa:bdfb(1) ack: 1c3 win: 8000
	client.1391 > server.8080: PA < window = 0 >

	[ .. repeat ad nauseam .. ]

Trace file demonstrating correct behaviour
	Made using unknown network analyser 

	0.000 client > server D=80 S=59500 Syn Seq=337 Len=0 Win=8760
	0.084 server > client D=59500 S=80 Syn Ack=338 Seq=80153 Len=0 Win=8760
	0.000 client > server D=80 S=59500 Ack=80154 Seq=338 Len=0 Win=8760

	 [..  normal data omitted ..]

	0.000 client > server D=80 S=59500 Ack=14559 Seq=596 Len=0 Win=8760
	0.009 server > client D=59500 S=80 Ack=596 Seq=114559 Len=1460 Win=8760
 
	[.. client closes connection ..]

	0.003 client > server D=80 S=59500 Fin Seq=596 Len=0 Win=8760
	0.045 server > client D=59500 S=80 Ack=597 Seq=116019 Len=1460 Win=8760
 
	[.. client sends RST (RFC1122 4.2.2.13) ]

	0.000 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
	0.010 server > client D=59500 S=80 Ack=597 Seq=117479 Len=1460 Win=8760
	0.000 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
	0.030 server > client D=59500 S=80 Ack=597 Seq=118939 Len=1460 Win=8760
	0.000 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
	0.005 server > client D=59500 S=80 Ack=597 Seq=120399 Len=892 Win=8760
	0.000 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
	0.012 server > client D=59500 S=80 Ack=597 Seq=121291 Len=1460 Win=8760
	0.000 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
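
The receive-side rule from RFC 1122 4.2.2.13 that the description relies
on can be sketched as below. This is a simplified illustration, not any
stack's actual code: the State enum, the on_segment function and the
app_closed flag are hypothetical names, and only the states relevant to
the example are modelled.

```python
from enum import Enum, auto


class State(Enum):
    ESTABLISHED = auto()
    FIN_WAIT_1 = auto()
    FIN_WAIT_2 = auto()
    CLOSED = auto()


def on_segment(state, app_closed, payload_len):
    """Return (new_state, reply) for an arriving in-window segment.

    app_closed is True when the application has called CLOSE and can no
    longer read, as opposed to shutting down only its sending side.
    """
    closing = state in (State.FIN_WAIT_1, State.FIN_WAIT_2)
    if closing and app_closed and payload_len > 0:
        # The data cannot be delivered: reset rather than advertise a
        # zero window and answer probes indefinitely.
        return State.CLOSED, "RST"
    return state, ("ACK" if payload_len > 0 else None)


result_closed = on_segment(State.FIN_WAIT_2, app_closed=True, payload_len=1460)
result_half = on_segment(State.FIN_WAIT_2, app_closed=False, payload_len=1460)
```

The second call shows the ordinary half-close case, where the
application can still read and the data is simply acknowledged.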


From owner-tcp-impl@relay.engr.sgi.com  Fri Nov 21 11:30:28 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA05844 for tcp-impl-list; Fri, 21 Nov 1997 11:21:32 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA05828 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 21 Nov 1997 11:21:30 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA23080
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 21 Nov 1997 11:21:28 -0800
	env-from (aron@cs.rice.edu)
Received: from klio.cs.rice.edu (klio.cs.rice.edu [128.42.1.78]) by cs.rice.edu (8.8.5/8.7.1) with ESMTP id NAA11434; Fri, 21 Nov 1997 13:21:25 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Received: (from aron@localhost) by klio.cs.rice.edu (8.8.5/8.7.5) id NAA05247; Fri, 21 Nov 1997 13:21:24 -0600 (CST)
Message-Id: <199711211921.NAA05247@klio.cs.rice.edu>
Subject: Re: detecting loss of retransmitted packets in TCP
To: G.Cope@ftel.co.uk (Graham Cope)
Date: Fri, 21 Nov 1997 13:21:22 -0600 (CST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <34756C29.3723@ftel.co.uk> from "Graham Cope" at Nov 21, 97 11:10:33 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> > Sender transmits packets 1, 2, ... 10. Packet 1 gets lost. The retransmission
> > of packet 1 upon getting 3 duplicate ACKs also gets lost. The duplicate ACKs
> > generated due to packets 5 - 10 cause the sender to send new packets (11
> > - 15). 
> 
> 
> I don't believe that this will happen, if based on taking rfc 2001's
> description of fast recovery literally (see P4).
> 
> This is because cwnd is set to ssthresh + 3*seg_size, after ssthresh has
> been halved. As I've coded it, this usually causes my simulation model
> to stop sending, because the halving of ssthresh is not compensated by
> adding 2*seg_size. If it had taken the old, un-halved ssthresh, it would
> be OK.
> 



This is completely incorrect. The primary reason behind fast recovery is
to send half the segments that were sent before detecting the loss. Just
to satisfy you, here's how it will happen:

Congestion window before detecting loss = 10
ssthresh after detecting loss = 10/2 = 5
congestion window immediately after detecting loss = 5+3 = 8
Amount of unacknowledged data in the network = 10



Congestion window is now increased upon getting every duplicate ACK. As soon
as it becomes greater than 10 (amount of unacknowledged data), the sender would
start sending again.
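
The arithmetic above can be stepped through with a small sketch. It
works in whole segments, takes the fast recovery steps of RFC 2001
literally, and assumes that one further duplicate ACK arrives for each
of packets 5 - 10; the function name and the dup_acks parameter are
hypothetical, and a different way of tallying the ACKs changes how many
new segments go out.

```python
def fast_recovery_trace(cwnd_before, dup_acks):
    """Step Reno fast recovery, in units of segments.

    Returns (ssthresh, steps), where each step is (cwnd, can_send) after
    one more duplicate ACK beyond the three that triggered recovery.
    """
    flight = cwnd_before            # unacknowledged segments in the network
    ssthresh = cwnd_before // 2     # halved on detecting the loss
    cwnd = ssthresh + 3             # inflated by the three duplicate ACKs
    steps = []
    for _ in range(dup_acks):
        cwnd += 1                   # each further duplicate ACK inflates cwnd
        can_send = cwnd > flight    # a new segment fits only if cwnd exceeds flight
        if can_send:
            flight += 1
        steps.append((cwnd, can_send))
    return ssthresh, steps


ssthresh, steps = fast_recovery_trace(cwnd_before=10, dup_acks=6)
new_segments = sum(1 for _, can_send in steps if can_send)
```

Under these assumptions the first two duplicate ACKs release nothing,
because cwnd has not yet passed the 10 outstanding segments, and each
later one releases a single new segment.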






- Mohit

From owner-tcp-impl@relay.engr.sgi.com  Fri Nov 21 15:55:20 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA05017 for tcp-impl-list; Fri, 21 Nov 1997 15:50:10 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA04895 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 21 Nov 1997 15:50:04 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA11791
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 21 Nov 1997 15:50:03 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id PAA04132; Fri, 21 Nov 1997 15:49:59 -0800 (PST)
Message-Id: <199711212349.PAA04132@daffy.ee.lbl.gov>
To: ian@spider.com (Ian Heavens)
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: RSTs and Half Duplex Close bug
In-reply-to: Your message of Fri, 21 Nov 1997 13:40:50 PST.
Date: Fri, 21 Nov 1997 15:49:59 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I've included a version of this in the I-D, which I submitted earlier today.
It should show up on the list some time next week.

> 	[.. client sends RST (RFC1122 4.2.2.13) ]
> 
> 	0.000 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
> 	0.010 server > client D=59500 S=80 Ack=597 Seq=117479 Len=1460 Win=8760
> 	0.000 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
> 	0.030 server > client D=59500 S=80 Ack=597 Seq=118939 Len=1460 Win=8760

How come these RSTs aren't tearing down the connection?

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Nov 21 18:44:12 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA16747 for tcp-impl-list; Fri, 21 Nov 1997 18:39:39 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA16738 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 21 Nov 1997 18:39:36 -0800
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA25338
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 21 Nov 1997 18:39:32 -0800
	env-from (sparker@fstop.Eng.Sun.COM)
Received: from Eng.Sun.COM ([129.146.1.25]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id SAA04004; Fri, 21 Nov 1997 18:38:57 -0800
Received: from fstop. ([192.9.204.16])
	by Eng.Sun.COM (SMI-8.6/SMI-5.3) with SMTP id SAA09197;
	Fri, 21 Nov 1997 18:07:30 -0800
Received: from fstop by fstop. (SMI-8.6/SMI-SVR4)
	id SAA02999; Fri, 21 Nov 1997 18:07:03 -0800
Message-Id: <199711220207.SAA02999@fstop.>
From: sparker@Eng.Sun.COM
To: Vern Paxson <vern@ee.lbl.gov>
cc: ian@spider.com (Ian Heavens), tcp-impl@cthulhu.engr.sgi.com
Subject: Re: RSTs and Half Duplex Close bug 
Date: Fri, 21 Nov 1997 18:07:03 -0800
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


- > 	[.. client sends RST (RFC1122 4.2.2.13) ]
- > 
- > 	0.000 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
- > 	0.010 server > client D=59500 S=80 Ack=597 Seq=117479 Len=1460 Win=8760
- > 	0.000 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
- > 	0.030 server > client D=59500 S=80 Ack=597 Seq=118939 Len=1460 Win=8760
- 
- How come these RSTs aren't tearing down the connection?

They probably are; however, tearing down a connection doesn't usually nuke
packets currently queued on the interface.  I suspect both segments shown
here were already queued either on the host's interface, or in the
network somewhere.

Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 24 00:10:42 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA25402 for tcp-impl-list; Mon, 24 Nov 1997 00:00:04 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA25371 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 23 Nov 1997 23:59:56 -0800
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA25123
	for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 23 Nov 1997 23:59:55 -0800
	env-from (Jerry.Chu@eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id XAA02443; Sun, 23 Nov 1997 23:59:52 -0800
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id XAA00339; Sun, 23 Nov 1997 23:59:51 -0800
Received: from taipei.eng.sun.com (taipei [129.146.86.158])
	by jurassic.eng.sun.com (8.8.8+Sun.Beta.4/8.8.8) with SMTP id XAA04192;
	Sun, 23 Nov 1997 23:59:50 -0800 (PST)
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id XAA06758; Sun, 23 Nov 1997 23:59:02 -0800
Date: Sun, 23 Nov 1997 23:59:02 -0800
From: Jerry.Chu@eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199711240759.XAA06758@taipei.eng.sun.com>
To: ian@spider.com
Subject: Re: RSTs and Half Duplex Close bug
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Another common "bug" in many TCP implementations is that after
the server side is reset, neither end holds a TIME_WAIT state to
safeguard against old, lingering packets in the network.

Just how seriously should we treat the 2MSL requirement? :-(.

Jerry


From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 24 00:30:59 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA29066 for tcp-impl-list; Mon, 24 Nov 1997 00:21:53 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA29014 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 00:21:40 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA29282
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 00:21:39 -0800
	env-from (touch@ISI.EDU)
Received: from tau-i.isi.edu (tau-i.isi.edu [128.9.102.3])
	by zephyr.isi.edu (8.8.7/8.8.6) with SMTP id AAA23772;
	Mon, 24 Nov 1997 00:21:32 -0800 (PST)
Message-Id: <2.2.32.19971124082049.006ebf38@zephyr.isi.edu>
X-Sender: touch@zephyr.isi.edu
X-Mailer: Windows Eudora Pro Version 2.2 (32)
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 24 Nov 1997 00:20:49 -0800
To: Jerry.Chu@Eng.Sun.COM (Hsiao-keng Jerry Chu), ian@spider.com
From: Joe Touch <touch@ISI.EDU>
Subject: Re: RSTs and Half Duplex Close bug
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

At 11:59 PM 11/23/97 -0800, Hsiao-keng Jerry Chu wrote:
>Another common "bug" in many TCP implementations is that after
>the server side is reset, neither end holds a TIME_WAIT state to
>safeguard against old, lingering packets in the network.
>
>Just how seriously should we treat the 2MSL requirement? :-(.

That depends on who you believe you're making decisions for.

When you change YOUR 2MSL, you affect not only the data you
are willing to accept from anyone else, but also the data
that you are willing to let anyone who talks to you accept
as if from your connection.

I.e., changing the MSL changes only the interaction between you
and parties you talk with, but either side can end up accepting
data for connections that have expired.

Another consequence is that new connections can be aborted 
prematurely. 

Both of these issues are discussed in a summary of premature
TIME_WAIT release hazards, which we recently submitted for
the bug list draft.

Joe


From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 24 02:06:00 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA19902 for tcp-impl-list; Mon, 24 Nov 1997 01:53:33 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA19885 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 01:53:27 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id BAA13622
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 01:53:25 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-great-packet-bucket-in-the-sky [163.164.160.21] (may be forged)) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id JAA20252; Mon, 24 Nov 1997 09:53:16 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xZvAK-0005FtC; Mon, 24 Nov 97 09:50 GMT
Message-Id: <m0xZvAK-0005FtC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: RSTs and Half Duplex Close bug
To: Jerry.Chu@Eng.Sun.COM (Hsiao-keng Jerry Chu)
Date: Mon, 24 Nov 1997 09:50:52 +0000 (GMT)
Cc: ian@spider.com, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199711240759.XAA06758@taipei.eng.sun.com> from "Hsiao-keng Jerry Chu" at Nov 23, 97 11:59:02 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Another common "bug" in many TCP implementations is that after
> the server side is reset, neither end holds a TIME_WAIT state to
> safeguard against old, lingering packets in the network.
> 
> Just how seriously should we treat the 2MSL requirement? :-(.

Depends if you care about data corruption and sessions jamming up or not.
Some of this (one of the simpler real cases) is dealt with in RFC1337.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 24 03:48:55 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA09564 for tcp-impl-list; Mon, 24 Nov 1997 03:43:30 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA09525 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 03:43:16 -0800
Received: from callisto.ftel.co.uk (callisto.ftel.co.uk [192.131.79.11]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id DAA00300
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 03:43:13 -0800
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (9RKt3deRnyBXe8UxfRkcCgHVytxf3cQO@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id LAA23152;
	Mon, 24 Nov 1997 11:42:35 GMT
Message-ID: <3479682A.5705@ftel.co.uk>
Date: Mon, 24 Nov 1997 11:42:34 +0000
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.6 sun4u)
MIME-Version: 1.0
To: Mohit Aron <aron@cs.rice.edu>
CC: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: detecting loss of retransmitted packets in TCP
References: <199711211921.NAA05247@klio.cs.rice.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Mohit Aron wrote:
> 
> > > Sender transmits packets 1, 2, ... 10. Packet 1 gets lost. The retransmission
> > > of packet 1 upon getting 3 duplicate ACKs also gets lost. The duplicate ACKs
> > > generated due to packets 5 - 10 cause the sender to send new packets (11
> > > - 15).
> >
> >
> > I don't believe that this will happen, if based on taking rfc 2001's
> > description of fast recovery literally (see P4).
> >
> > This is because cwnd is set to ssthresh + 3*seg_size, after ssthresh has
> > been halved. As I've coded it, this usually causes my simulation model
> > to stop sending, because the halving of ssthresh is not compensated by
> > adding 2*seg_size. If it had taken the old, un-halved ssthresh, it would
> > be OK.
> >
> 
> This is completely incorrect. The primary reason behind fast recovery is
> to send half the segments that were sent before detecting the loss. Just
> to satisfy you, here's how it will happen:
> 
> Congestion window before detecting loss = 10
> ssthresh after detecting loss = 10/2 = 5
> congestion window immediately after detecting loss = 5+3 = 8
> Amount of unacknowledged data in the network = 10

Thus ACKs for segment 5-10 will push the window to 13, and hence
segments 11-13 will be sent, and not 11-15 as in your last e-mail. The
duplicate ACKs for these would then cause segments 14-16 to be sent,
etc.
   Subsequent loss would, I agree, give a further linear reduction. As to
your supposition about going straight into slow-start, I must think
about it.

> 
> Congestion window is now increased upon getting every duplicate ACK. As soon
> as it becomes greater than 10 (amount of unacknowledged data), the sender would
> start sending again.


Thanks for the clarification.

I think describing my previous reply as 'completely incorrect' is a
little harsh. In order to 'start sending again', it must first 'stop'.

In your above example TCP does in fact stop sending, until either:
a) Two more ACKs arrive, which should happen fairly soon provided no
other segments were lost, or
b) An ACK for a presumed lost (but possibly delayed) segment arrives, in
which case the window moves forward, cwnd gets reset to ssthresh, etc.

Although I admit that such a 'stop' might better be termed a (very
temporary) stall, which would only really be detrimental on high BW
networks (as you state).



Graham Cope

From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 24 04:16:31 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA13561 for tcp-impl-list; Mon, 24 Nov 1997 04:02:40 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA13545 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 04:02:37 -0800
Received: from mercury.spider.com (mercury.spider.com [194.217.109.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id EAA02878
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 04:02:34 -0800
	env-from (ian@spider.com)
Received: from asimov.spider.com (asimov.spider.com [194.217.109.66]) by mercury.spider.com (8.8.3/8.8.3) with SMTP id MAA20582; Mon, 24 Nov 1997 12:02:31 GMT
Received: from malatesta by asimov.spider.com (SMI-8.6/SMI-SVR4)
	id MAA13956; Mon, 24 Nov 1997 12:02:01 GMT
Message-ID: <34796CB9.3BF0@spider.com>
Date: Mon, 24 Nov 1997 12:02:01 +0000
From: Ian Heavens <ian@spider.com>
Organization: Spider Software Ltd.
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: sparker@Eng.Sun.COM
CC: Vern Paxson <vern@ee.lbl.gov>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: RSTs and Half Duplex Close bug
References: <199711220207.SAA02999@fstop.>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

sparker@Eng.Sun.COM wrote:
> 
> - >     [.. client sends RST (RFC1122 4.2.2.13) ]
> - >
> - >     0.000 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
> - >     0.010 server > client D=59500 S=80 Ack=597 Seq=117479 Len=1460 Win=8760
> - >     0.000 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
> - >     0.030 server > client D=59500 S=80 Ack=597 Seq=118939 Len=1460 Win=8760
> -
> - How come these RSTs aren't tearing down the connection?
> 
> It probably is, however tearing down a connection doesn't usually nuke
> packets currently queued on the interface.  I suspect both segments shown
> here were already queued either on the host's interface, or in the
> network somewhere.


The client was offering a window of 8760, so it's possible that all
the subsequent segments were in transit when the RST was transmitted.

By the way, this is what makes the lack of TIME-WAIT after RST more
dangerous : it is very likely that up to an entire window of data
will be delivered after the connection has closed down, whereas 
after a FIN exchange data is unlikely to be delivered.

ian

Ian Heavens, Spider Software Ltd., http://www.spider.com/
8 John's Place, Leith, Edinburgh EH6 7EL. 
Tel +44 131 475 7015 fax. +44 131 475 7001  ian@spider.com

From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 24 07:39:30 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA02370 for tcp-impl-list; Mon, 24 Nov 1997 07:29:04 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA02314 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 07:28:53 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA11120
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 07:28:51 -0800
	env-from (aron@cs.rice.edu)
Received: from noel.cs.rice.edu (noel.cs.rice.edu [128.42.1.136]) by cs.rice.edu (8.8.5/8.7.1) with ESMTP id JAA21209; Mon, 24 Nov 1997 09:28:50 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Received: (from aron@localhost) by noel.cs.rice.edu (8.8.5/8.7.5) id JAA09617; Mon, 24 Nov 1997 09:28:49 -0600 (CST)
Message-Id: <199711241528.JAA09617@noel.cs.rice.edu>
Subject: Re: detecting loss of retransmitted packets in TCP
To: G.Cope@ftel.co.uk (Graham Cope)
Date: Mon, 24 Nov 1997 09:28:49 -0600 (CST)
Cc: aron@cs.rice.edu, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <3479682A.5705@ftel.co.uk> from "Graham Cope" at Nov 24, 97 11:42:34 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> Thus ACKs for segment 5-10 will push the window to 13, and hence
> segments 11-13 will be sent, and not 11-15 as in your last e-mail. The
> duplicate ACKs for these would then cause segments 14-16 to be sent,
> etc.


Actually ACKs for segments 5-10 will push the window to 14 and thus 11-14
would be sent. Anyway, the purpose of my mail was not to show these details.
It was to illustrate the point that I was trying to make.


> Although I admit that such a 'stop' might better be termed a (very
> temporary) stall, which would only really be detrimental on high BW
> networks (as you state).
> 
> 

I didn't state that this temporary stall is the one that is detrimental. 
The stall that will result while waiting for the rexmt timeout is the one
that is detrimental.




- Mohit

From owner-tcp-impl@relay.engr.sgi.com  Mon Nov 24 15:30:24 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA18297 for tcp-impl-list; Mon, 24 Nov 1997 15:18:11 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA18271 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 15:18:09 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA05981
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 24 Nov 1997 15:18:07 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id PAA08340; Mon, 24 Nov 1997 15:18:03 -0800 (PST)
Message-Id: <199711242318.PAA08340@daffy.ee.lbl.gov>
To: Ian Heavens <ian@spider.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: RSTs and Half Duplex Close bug
In-reply-to: Your message of Mon, 24 Nov 1997 12:02:01 PST.
Date: Mon, 24 Nov 1997 15:18:02 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The client was offering a window of 8760, so it's possible that all
> the subsequent segments were in transit when the RST was transmitted.
> 
> By the way, this is what makes the lack of TIME-WAIT after RST more
> dangerous : it is very likely that up to an entire window of data
> will be delivered after the connection has closed down, whereas 
> after a FIN exchange data is unlikely to be delivered.

I'm confused by this comment.  If by delivered you mean to the application,
then this seems unlikely - it requires that a new connection is very quickly
established with the same ephemeral port and insufficient ISN advance.
I guess the combination of a proxy that reuses ephemeral ports plus bad luck
with randomized ISN generation could do this; I wouldn't consider that
"very likely", though.

If by delivered you mean transmitted over the network, then since it's
already in flight, a FIN exchange wouldn't prevent that, right?

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Nov 25 07:40:00 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA02169 for tcp-impl-list; Tue, 25 Nov 1997 07:27:53 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA02164 for <tcp-impl@engr.sgi.com>; Tue, 25 Nov 1997 07:27:50 -0800
Received: from ns.ietf.org (ietf.org [132.151.1.19]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA06591
	for <tcp-impl@engr.sgi.com>; Tue, 25 Nov 1997 07:27:42 -0800
	env-from (cclark@cnri.reston.va.us)
Received: from CNRI.Reston.VA.US (localhost [127.0.0.1])
	by ns.ietf.org (8.8.7/8.8.7a) with ESMTP id KAA10013;
	Tue, 25 Nov 1997 10:27:36 -0500 (EST)
Message-Id: <199711251527.KAA10013@ns.ietf.org>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce@ns.ietf.org
Cc: tcp-impl@engr.sgi.com
From: Internet-Drafts@ns.ietf.org
Reply-to: Internet-Drafts@ns.ietf.org
Subject: I-D ACTION:draft-ietf-tcpimpl-prob-02.txt
Date: Tue, 25 Nov 1997 10:27:36 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

--NextPart

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Implementation Working Group of the IETF.

	Title		: Known TCP Implementation Problems
	Author(s)	: B. Volz, I. Heavens, S. Dawson, 
                          V. Paxson, M. Allman
	Filename	: draft-ietf-tcpimpl-prob-02.txt
	Pages		: 29
	Date		: 24-Nov-97
	
   This memo catalogs a number of  known  TCP  implementation  problems.
   The  goal  in  doing  so  is  to  improve  conditions in the existing
   Internet by enhancing the quality of current TCP/IP  implementations.
   It  is  hoped  that  both  performance  and correctness issues can be
   resolved by making implementors  aware  of  the  problems  and  their
   solutions.   In  the  long term, it is hoped that this will provide a
   reduction  in  unnecessary  traffic  on  the  network,  the  rate  of
   connection  failures  due  to  protocol  errors,  and load on network
   servers due to time spent processing  both  unsuccessful  connections
   and  retransmitted  data.   This will help to ensure the stability of
   the global Internet.

Internet-Drafts are available by anonymous FTP.  Login with the username
"anonymous" and a password of your e-mail address.  After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-tcpimpl-prob-02.txt".
A URL for the Internet-Draft is:
ftp://ds.internic.net/internet-drafts/draft-ietf-tcpimpl-prob-02.txt

Internet-Drafts directories are located at:

	Africa:	ftp.is.co.za
	
	Europe: ftp.nordu.net
		ftp.nis.garr.it
			
	Pacific Rim: munnari.oz.au
	
	US East Coast: ds.internic.net
	
	US West Coast: ftp.isi.edu

Internet-Drafts are also available by mail.

Send a message to:	mailserv@ds.internic.net.  In the body type:
	"FILE /internet-drafts/draft-ietf-tcpimpl-prob-02.txt".
	
NOTE:	The mail server at ds.internic.net can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type: Message/External-body;
	access-type="mail-server";
	server="mailserv@ds.internic.net"

Content-Type: text/plain
Content-ID:	<19971124155101.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-prob-02.txt

--OtherAccess
Content-Type: Message/External-body;
	name="draft-ietf-tcpimpl-prob-02.txt";
	site="ds.internic.net";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID:	<19971124155101.I-D@ietf.org>

--OtherAccess--

--NextPart--



From owner-tcp-impl@relay.engr.sgi.com  Sun Nov 30 16:26:04 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA04499 for tcp-impl-list; Sun, 30 Nov 1997 16:12:53 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA04494 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 30 Nov 1997 16:12:47 -0800
Received: from cleopatra.ultra.net (cleopatra.ultra.net [199.232.56.35]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA26273
	for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 30 Nov 1997 16:12:45 -0800
	env-from (backman@ultranet.com)
Received: from boss (d3.dial-3.ltn.ma.ultra.net [146.115.41.67]) by cleopatra.ultra.net (8.8.5/ult1.05) with SMTP id TAA29723; Sun, 30 Nov 1997 19:12:40 -0500 (EST)
Reply-To: "Larry Backman" <backman@ultranet.com>
From: "Larry Backman" <backman@ultranet.com>
To: <backman@cleopatra.ultra.net>
Cc: <tcp-impl@cthulhu.engr.sgi.com>
Subject: Checksum travails - many documents, many RFC's, no single source
Date: Sun, 30 Nov 1997 19:10:06 -0500
Message-ID: <01bcfded$7a62c0e0$a67c7f80@boss>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 4.71.1712.3
X-MimeOLE: Produced By Microsoft MimeOLE V4.71.1712.3
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Marooned on a desert island, so to speak, I found myself with broken code
to incrementally update a TCP checksum and no reference code to check
against. The last time I debugged a TCP checksum bug was at least 8
years back.  I still remembered the theory, but found that the details
eluded me.  What did I checksum: the header, header and data, header
and data and something else?
Off I went to the RFCs, many of them in fact.  RFC 793 explains what
to checksum and does so reasonably clearly.  RFC 1122 says you *must*
compute a checksum.  RFC 1071 has great math on the checksum, has good
source code on how to do a checksum and mentions incremental update,
but does not have source code for it.  RFC 1141 has theory and math as
to how to implement the incremental update. RFC 1624 is more theory and
math about incremental update of the checksum and supplants RFC 1141.

Then there is RFC 1631 (Network address translator) which mentions
incremental update and provides source code for an incremental update
that almost works.  Not to mention RFC 1936 which is a wonderfully
detailed explanation (with logic code included!) of how to implement
a checksum in hardware.

What a confusing mess.  And I know my way around many of the checksum
issues and know where to look and what to believe (I think...).

It seems to me, having experienced this first hand, that it would be a
good thing to draw all the checksum issues together into a single
implementation note which indicates who and what to believe.

In terms of clarity I'd like to see a source code example added for
incremental update which supersedes the broken source in RFC 1631.
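
As an illustration of the kind of example asked for above, here is a
minimal sketch of the incremental update using equation 3 of RFC 1624,
HC' = ~(~HC + ~m + m'), where HC is the old checksum, m the old value of
a 16-bit field and m' its new value.  The function names are made up
for this note and do not come from any RFC, and Python is used only for
brevity.  The values in the demonstration are those of the worked
example in RFC 1624.

```python
def ones_complement_sum(words):
    """One's-complement sum of 16-bit words, with end-around carry."""
    total = sum(w & 0xFFFF for w in words)
    while total >> 16:
        # Fold any carry out of bit 15 back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(words):
    """Full Internet checksum: complement of the one's-complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF


def incremental_update(old_cksum, old_word, new_word):
    """RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m')."""
    return ~ones_complement_sum([~old_cksum, ~old_word, new_word]) & 0xFFFF


if __name__ == "__main__":
    # Sum of the unchanged words is 0xCD7A; the field changes
    # from 0x5555 to 0x3285.
    old = checksum([0xCD7A, 0x5555])
    new = incremental_update(old, 0x5555, 0x3285)
    print(hex(old), hex(new))   # prints: 0xdd2f 0x0
```

The result can be checked against a full recomputation over the changed
words, which is a cheap sanity test for any incremental routine.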

L.


From owner-tcp-impl@relay.engr.sgi.com  Mon Dec  1 09:53:05 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA07885 for tcp-impl-list; Mon, 1 Dec 1997 09:42:47 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA07846 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 1 Dec 1997 09:42:41 -0800
Received: from mercury.spider.com (mercury.spider.com [194.217.109.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA06786
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 1 Dec 1997 09:42:29 -0800
	env-from (ian@spider.com)
Received: from asimov.spider.com (asimov.spider.com [194.217.109.66]) by mercury.spider.com (8.8.3/8.8.3) with SMTP id RAA14163; Mon, 1 Dec 1997 17:42:17 GMT
Received: from malatesta by asimov.spider.com (SMI-8.6/SMI-SVR4)
	id RAA26503; Mon, 1 Dec 1997 17:41:32 GMT
Message-ID: <3482F6CB.3C9A@spider.com>
Date: Mon, 01 Dec 1997 17:41:31 +0000
From: Ian Heavens <ian@spider.com>
Organization: Spider Software Ltd.
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Vern Paxson <vern@ee.lbl.gov>
CC: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: RSTs and Half Duplex Close bug
References: <199711242318.PAA08340@daffy.ee.lbl.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Vern Paxson wrote:
> 
> > The client was offering a window of 8760, so it's possible that all
> > the subsequent segments were in transit when the RST was transmitted.
> >
> > By the way, this is what makes the lack of TIME-WAIT after RST more
> > dangerous : it is very likely that up to an entire window of data
> > will be delivered after the connection has closed down, whereas
> > after a FIN exchange data is unlikely to be delivered.
> 
> I'm confused by this comment.  If by delivered you mean to the application,
> then this seems unlikely - it requires that a new connection is very quickly
> established with the same ephemeral port and insufficient ISN advance.
> I guess the combination of a proxy that reuses ephemeral ports plus bad luck
> with randomized ISN generation could do this; I wouldn't consider that
> "very likely", though.
> 
> If by delivered you mean transmitted over the network, then since it's
> already in flight, a FIN exchange wouldn't prevent that, right?
> 


Sorry about the delay in replying.  I meant "delivered to the peer TCP".
I agree that delivery to an application is unlikely.  I was making the
point that after a RST, one is much more likely to get data than after
a FIN exchange.  So RSTs are more dangerous than reducing the MSL after
FIN exchange; if you worry about the latter, you might want to worry
about RSTs, unless RST use is very unlikely.  

The current mechanisms rely on no data arriving after the connection
closes...even if it does, the ISN and port sequence numbers provide
further protection.  So the dangers are relatively low.  I guess the
issue is whether they are less than the probability of data corruption
which does not modify the checksum.

ian



-- 
Ian Heavens, Spider Software Ltd., http://www.spider.com/
8 John's Place, Leith, Edinburgh EH6 7EL. 
Tel +44 131 475 7015 fax. +44 131 475 7001  ian@spider.com

From owner-tcp-impl@relay.engr.sgi.com  Mon Dec  1 23:05:57 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id WAA05379 for tcp-impl-list; Mon, 1 Dec 1997 22:52:07 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id WAA05356 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 1 Dec 1997 22:51:57 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id WAA01153
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 1 Dec 1997 22:51:56 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id WAA20591; Mon, 1 Dec 1997 22:51:51 -0800 (PST)
Message-Id: <199712020651.WAA20591@daffy.ee.lbl.gov>
To: Ian Heavens <ian@spider.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: RSTs and Half Duplex Close bug
In-reply-to: Your message of Mon, 01 Dec 1997 17:41:31 PST.
Date: Mon, 01 Dec 1997 22:51:51 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> ... I meant "delivered to the peer TCP".
> I agree that delivery to an application is unlikely.  I was making the
> point that after a RST, one is much more likely to get data than after
> a FIN exchange.

I still don't see why this is.  It takes half an RTT (roughly) for either
the RST or the first FIN to travel to the other peer.  At that point, in
either case the peer should stop sending new data.  So I don't see why one
is much more likely to get data with a RST.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Dec  2 07:39:11 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA05273 for tcp-impl-list; Tue, 2 Dec 1997 07:24:44 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA05268 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 2 Dec 1997 07:24:43 -0800
Received: from mercury.spider.com (mercury.spider.com [194.217.109.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA27175
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 2 Dec 1997 07:24:37 -0800
	env-from (ian@spider.com)
Received: from asimov.spider.com (asimov.spider.com [194.217.109.66]) by mercury.spider.com (8.8.3/8.8.3) with SMTP id PAA17439; Tue, 2 Dec 1997 15:24:33 GMT
Received: from malatesta by asimov.spider.com (SMI-8.6/SMI-SVR4)
	id PAA02399; Tue, 2 Dec 1997 15:23:49 GMT
Message-ID: <34842802.665B@spider.com>
Date: Tue, 02 Dec 1997 15:23:46 +0000
From: Ian Heavens <ian@spider.com>
Organization: Spider Software Ltd.
X-Mailer: Mozilla 3.01 (X11; I; SunOS 5.5.1 sun4u)
MIME-Version: 1.0
To: Vern Paxson <vern@ee.lbl.gov>
CC: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: RSTs and Half Duplex Close bug
References: <199712020651.WAA20591@daffy.ee.lbl.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Vern Paxson wrote:
> 
> > ... I meant "delivered to the peer TCP".
> > I agree that delivery to an application is unlikely.  I was making the
> > point that after a RST, one is much more likely to get data than after
> > a FIN exchange.
> 
> I still don't see why this is.  It takes half an RTT (roughly) for either
> the RST or the first FIN to travel to the other peer.  At that point, in
> either case the peer should stop sending new data.  So I don't see why one
> is much more likely to get data with a RST.

It's only a problem if data arrives in CLOSED state.

For the RST case, while the RST is in transit, up to a window of
data may arrive in CLOSED state, since it may have already left
the peer before the RST arrives.

For the FIN case, a window of data may be in transit - but it will
arrive in FIN_WAIT_1.  The other end can continue transmitting until
it sends a FIN (this data will arrive in FIN_WAIT_2).  To arrive in
TIME-WAIT, the data has to be reordered with the second FIN, or
duplicated in the network, or have been lost and retransmitted.
(if the peer sends a FIN before it receives our FIN, the data arrives in
CLOSING or LAST-ACK state at one or both peers).  So even if TIME-WAIT
state is omitted, the chance of data arriving in CLOSED state at one
end or the other is small - roughly the probability of a segment being
reordered, lost or duplicated.  

I might have got the above analysis not quite right, but I think
that's roughly how it works.  After a RST, you often get a lot of
segments in CLOSED state (e.g. the correct implementation of half
duplex close in the Known Bugs I-D).  After a FIN, a single segment
is unlikely to arrive in TIME-WAIT (or CLOSED state if TIME-WAIT is
omitted).

Then the chance of actual data corruption is factored by the chance
of a new application opening with the port/sequence number space
overlapping that of one or more of the segments that arrived in
CLOSED state, which makes it a low probability event for either case.
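
To make the case analysis above concrete, here is a rough sketch of
which state the end that closes first is in when a late data segment
arrives.  It is a simplified model of the RFC 793 state diagram for
that end only (the passive side's LAST-ACK is not modelled), and the
function and parameter names are invented for illustration.

```python
def closer_state(close_kind, fin_acked=False, peer_fin_seen=False,
                 time_wait=True):
    """State of the closing end when a late data segment arrives.

    close_kind    -- "RST" (abortive close) or "FIN" (orderly close)
    fin_acked     -- our FIN has been acknowledged by the peer
    peer_fin_seen -- the peer's FIN has already arrived
    time_wait     -- False models an implementation that omits TIME-WAIT
    """
    if close_kind == "RST":
        return "CLOSED"            # no lingering state after a reset
    if close_kind != "FIN":
        raise ValueError("close_kind must be 'RST' or 'FIN'")
    if not fin_acked and not peer_fin_seen:
        return "FIN_WAIT_1"
    if fin_acked and not peer_fin_seen:
        return "FIN_WAIT_2"
    if peer_fin_seen and not fin_acked:
        return "CLOSING"           # simultaneous close
    return "TIME_WAIT" if time_wait else "CLOSED"


if __name__ == "__main__":
    # A window of data already in flight when we close:
    print(closer_state("RST"))                  # CLOSED
    print(closer_state("FIN"))                  # FIN_WAIT_1
    # Data that arrives only after both FINs have been exchanged:
    print(closer_state("FIN", True, True))      # TIME_WAIT
```

In this model the in-flight window lands in CLOSED only for the RST
case; for the FIN case a segment has to arrive after the whole FIN
exchange to find TIME_WAIT (or CLOSED, if TIME-WAIT is omitted).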

Awaiting corrections...

cheers

ian


Ian Heavens, Spider Software Ltd., http://www.spider.com/
8 John's Place, Leith, Edinburgh EH6 7EL. 
Tel +44 131 475 7015 fax. +44 131 475 7001  ian@spider.com

From owner-tcp-impl@relay.engr.sgi.com  Fri Dec  5 00:31:27 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA09631 for tcp-impl-list; Fri, 5 Dec 1997 00:27:54 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA09626; Fri, 5 Dec 1997 00:27:51 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA27824; Fri, 5 Dec 1997 00:27:50 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id AAA28917; Fri, 5 Dec 1997 00:27:50 -0800 (PST)
Message-Id: <199712050827.AAA28917@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: agenda for next Monday's TCPIMPL WG meeting
Cc: sca@refugee.engr.sgi.com, Allyn.Romanow@Eng.Sun.COM, agenda@ietf.org
Date: Fri, 05 Dec 1997 00:27:50 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Here's the proposed agenda for next Monday's meeting.  (Sorry to send it
out so late.)  We have a bunch of quick items, followed by a number of
presentations (and discussion) of increasing the initial congestion window 
to 2 or more segments.  Since we have a 2.5 hour slot, there's still
quite a bit of room for other items - send email ASAP if you have one.


1.  additions to Known Problems I-D: Vern Paxson (10 min)
2.  revisions to Testing Tools I-D: Steve Parker (5 min)
3.  porting Packet Shell to libpcap: Steve Parker (10 min)
4.  TIME_WAIT problems: Joe Touch (5 min)
5.  slow-start restart bug: Joe Touch (5 min)
6.  checksum document (5 min)
7.  call for volunteers (5 min)
8.  initial slow-start (50+ min)
	Kedar Poduri / Kathie Nichols
	Tim Shepard
	Mark Allman
	Sally Floyd

From owner-tcp-impl@relay.engr.sgi.com  Tue Dec  9 10:47:08 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA14559 for tcp-impl-list; Tue, 9 Dec 1997 10:39:39 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA14508 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Dec 1997 10:39:30 -0800
Received: from zero.aec.at (zero.aec.at [193.170.192.102]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id KAA00098
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 9 Dec 1997 10:39:17 -0800
	env-from (andi@zero.aec.at)
Received: (qmail 30558 invoked by uid 573); 9 Dec 1997 18:38:48 -0000
Message-ID: <19971209183848.30557.qmail@zero.aec.at>
To: tcp-impl@cthulhu.engr.sgi.com
cc: alan@lxorguk.ukuu.org.uk, netdev@nuclecu.unam.mx
Subject: Sending FINs into zero windows.
Date: Tue, 09 Dec 1997 19:38:48 +0100
From: Andi Kleen <andi@zero.aec.at>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hello,

I'm currently trying to fix a TCP communication problem between an
HP JetDirect based printer and a Linux 2.0.30 box. It seems the 
application side of the Linux box closes the connection, but it can't
send the FIN out because the printer announced a zero window.  This
causes Linux to send a RST on the next ACK from the printer, which 
causes the printer to abort the print job. 

Here is tcpdump output for illustration:

16:18:50.053007 dinf05.1067 > prigbc.9100: P 421889:422913(1024) ack 139 win 31744 (DF)
16:18:50.053007 dinf05.1067 > prigbc.9100: P 422913:423937(1024) ack 139 win 31744 (DF)
16:18:50.053007 dinf05.1067 > prigbc.9100: P 423937:424961(1024) ack 139 win 31744 (DF)
16:18:50.333007 prigbc.9100 > dinf05.1067: . ack 424961 win 720
16:18:51.003007 dinf05.1067 > prigbc.9100: P 424961:425681(720) ack 139 win 31744
16:18:51.023007 prigbc.9100 > dinf05.1067: . 139:162(23) ack 425681 win 0
16:18:51.023007 dinf05.1067 > prigbc.9100: R 819166584:819166584(0) win 0

I looked at the 4.4BSD source code and it seems to just send out
window probes with the FIN bit set in this situation. What is the
current wisdom to fix the problem?
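
For illustration, the behaviour described above for 4.4BSD (probe the
closed window with the FIN bit set, rather than reset) can be sketched
as a small decision function.  This is a simplified model written for
this note, not code from BSD or Linux; the names Action and next_action
are hypothetical.

```python
from enum import Enum


class Action(Enum):
    SEND_DATA = "send data"
    SEND_FIN = "send FIN"
    PROBE = "zero-window probe"
    PROBE_WITH_FIN = "zero-window probe carrying FIN"
    IDLE = "nothing to send"


def next_action(peer_window, unsent_bytes, fin_pending):
    """What the sender does next; it never answers a zero window with RST.

    peer_window  -- window last advertised by the peer, in bytes
    unsent_bytes -- data still queued by the application
    fin_pending  -- the application has closed, FIN not yet sent
    """
    if peer_window > 0:
        if unsent_bytes > 0:
            return Action.SEND_DATA
        return Action.SEND_FIN if fin_pending else Action.IDLE
    # Window is closed: keep the connection and probe until it reopens.
    if unsent_bytes > 0:
        return Action.PROBE
    return Action.PROBE_WITH_FIN if fin_pending else Action.IDLE


if __name__ == "__main__":
    # The situation in the trace: all data sent, close pending, win 0.
    print(next_action(0, 0, True).value)
    # prints: zero-window probe carrying FIN
```

The point of the sketch is only that the pending FIN stays queued and
rides on the probes; the connection is not aborted because the window
happens to be zero.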

-Andi


From owner-tcp-impl@relay.engr.sgi.com  Tue Dec 23 02:50:05 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA25372 for tcp-impl-list; Tue, 23 Dec 1997 01:59:43 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA25367 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 23 Dec 1997 01:59:41 -0800
Received: from callisto.ftel.co.uk (callisto.ftel.co.uk [192.131.79.11]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id BAA24523
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 23 Dec 1997 01:59:34 -0800
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (kLnWUGCGcQMrArOEpMT7iu7mJC1PB80x@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id JAA17238;
	Tue, 23 Dec 1997 09:57:51 GMT
Message-ID: <349F8B1E.6587@ftel.co.uk>
Date: Tue, 23 Dec 1997 09:57:50 +0000
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.6 sun4u)
MIME-Version: 1.0
To: Vern Paxson <vern@ee.lbl.gov>
CC: tcp-impl@cthulhu.engr.sgi.com, Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: When to reset TCP's timer?
References: <199710092206.PAA23469@daffy.ee.lbl.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I've noted this as an area to clarify as we tweak RFC 2001.


...I have another problem as I try to understand TCP implementations.


After a fast retransmit, should the RTT timer be cancelled and reset
(and do implementations actually do so)?  Resetting it would avoid
entering slow-start even though the lost segment has already been
retransmitted.


I do believe that RFC 2001 is in general weak in the area of interactions
between fast retransmit, fast recovery and slow-start.


As a general point I read with concern your paper on why we don't know
how to simulate the Internet. One point was the fact (already partially
known to me, as I follow this list), that a number of implemented
versions of TCP were different.
  This begs the question as to which is the 'correct' one?

Maybe the ns simulator code is the only 'correct' version.



Graham

From owner-tcp-impl@relay.engr.sgi.com  Tue Dec 23 09:53:50 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA10258 for tcp-impl-list; Tue, 23 Dec 1997 09:02:41 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA10251 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 23 Dec 1997 09:02:38 -0800
Received: from ell.ee.lbl.gov (ell.ee.lbl.gov [131.243.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA01877
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 23 Dec 1997 09:02:36 -0800
	env-from (kfall@ee.lbl.gov)
Received: by ell.ee.lbl.gov (8.8.8/8.8.5)
	id JAA13511; Tue, 23 Dec 1997 09:00:47 -0800 (PST)
From: kfall@ee.lbl.gov (Kevin Fall)
Message-Id: <199712231700.JAA13511@ell.ee.lbl.gov>
To: Graham Cope <G.Cope@ftel.co.uk>
cc: Vern Paxson <vern@ee.lbl.gov>, tcp-impl@cthulhu.engr.sgi.com,
        Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: When to reset TCP's timer?
In-reply-to: Your communique of Tue, 23 Dec 97 09:57:50 GMT.
             <349F8B1E.6587@ftel.co.uk>
Date: Tue, 23 Dec 97 09:00:47 PST
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
> Maybe the ns simulator code is the only 'correct' version.

I have to say this makes me a bit nervous.  There are currently
2 major flavors of tcp in the simulator.  The first one (earlier one)
is more abstract, and models primarily the operation of congestion
control on a per-packet basis.  The newer one is closer to the BSD
implementation(s), providing bidirectional connections on a per-byte
basis and a richer state machine, but neither of them contains all
the details of a real tcp (e.g. things like RSTs, urgent pointers,
zero-window probes, and 2MSL wait are, at present, intentionally
absent from the ns version).

- K

From owner-tcp-impl@relay.engr.sgi.com  Tue Dec 23 09:55:42 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA11766 for tcp-impl-list; Tue, 23 Dec 1997 09:09:17 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA11755 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 23 Dec 1997 09:09:15 -0800
Received: from callisto.ftel.co.uk (callisto.ftel.co.uk [192.131.79.11]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA04361
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 23 Dec 1997 09:09:13 -0800
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (nIYBoYovyRHzwdZzYAfzK9JRhXWUdWXZ@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id RAA26794;
	Tue, 23 Dec 1997 17:08:29 GMT
Message-ID: <349FF00D.417C@ftel.co.uk>
Date: Tue, 23 Dec 1997 17:08:29 +0000
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.6 sun4u)
MIME-Version: 1.0
To: Kevin Fall <kfall@ee.lbl.gov>
CC: Vern Paxson <vern@ee.lbl.gov>, tcp-impl@cthulhu.engr.sgi.com,
        Archer N <N.Archer@ftel.co.uk>, McCulloch A <A.McCulloch@ftel.co.uk>,
        Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: When to reset TCP's timer?
References: <199712231700.JAA13511@ell.ee.lbl.gov>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Kevin Fall wrote:
> 
> >
> > Maybe the ns simulator code is the only 'correct' version.
> 
> I have to say this makes me a bit nervous.  There are currently
> 2 major flavors of tcp in the simulator.  The first one (earlier one)
> is more abstract, and models primarily the operation of congestion
> control on a per-packet basis.  The newer one is closer to the BSD
> implementation(s), providing bidirectional connections on a per-byte
> basis and a richer state machine, but neither of them contains all
> the details of a real tcp (e.g. things like RSTs, urgent pointers,
> zero-window probes, and 2MSL wait are, at present, intentionally
> absent from the ns version).
> 

In which case (latest?) BSD versions are the only 'correct' ones? In
which case, are RFCs unambiguously in line with it?

This still leaves me with a fundamental problem. If I'm trying to design
broadband networks, and am trying to work out the interaction between
TCP and ATM, how can I have any certainty at all that the TCP-based
applications will behave in any way that I predict?
   ... again, refer to Vern Paxson's and Sally Floyd's paper.



> - K

From owner-tcp-impl@relay.engr.sgi.com  Tue Dec 23 10:49:23 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA23077 for tcp-impl-list; Tue, 23 Dec 1997 09:57:54 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA23068 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 23 Dec 1997 09:57:53 -0800
Received: from owl.ee.lbl.gov (owl.ee.lbl.gov [131.243.1.50]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA18819
	for <tcp-impl@relay.engr.sgi.com>; Tue, 23 Dec 1997 09:57:50 -0800
	env-from (floyd@ee.lbl.gov)
Received: by owl.ee.lbl.gov (8.8.8/8.8.5)
	id JAA25647; Tue, 23 Dec 1997 09:57:12 -0800 (PST)
Message-Id: <199712231757.JAA25647@owl.ee.lbl.gov>
To: Mohit Aron <aron@cs.rice.edu>
cc: tcp-impl@cthulhu.engr.sgi.com, end2end-interest@ISI.EDU
Subject: Re: detecting loss of retransmitted packets in TCP
Date: Tue, 23 Dec 1997 09:57:11 PST
From: Sally Floyd <floyd@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>	I have a question concerning TCP behaviour. Current TCP implementations
>resort to a timeout when a retransmitted packet is lost. 
...
>The sender can easily detect loss of a retransmitted segment by counting the
>number of duplicate ACKs received (if more than 9 are received, then the
>sender can assume that the retransmitted packet was also lost). 
...
>The purpose of a retransmission timeout - clearing packets in the
>network - can also be achieved by slow-start.

>I'd be glad to know opinions about this.

My own opinion in the past has been that it is best to be conservative
in the face of retransmitted packets that are themselves dropped.  This
issue has also come up in the context of SACK TCP.  With SACK TCP, it
is considerably easier to detect retransmitted packets that are
themselves dropped, and several SACK implementations eliminate the
retransmit timeout in this case.

As you indicate, the choices for reacting to a retransmitted packet
that is itself dropped include the following:
(1) waiting for a retransmit timeout;
(2) re-retransmitting one packet and slow-starting;
(3) re-retransmitting dropped packets and sending new packets, subject 
  to cutting the congestion window in half for each window of data in
  which one or more packets are dropped, and subject to the constraint
  that outgoing packets need to be clocked by incoming dup acks.

I don't know which of these three is best for the network, in a
scenario of sudden heavy congestion when all TCPs are using the same
algorithm.  Quite possibly it is not a critical issue one way or
another, though (2) and (3) represent rather different choices.  But in
general, I think it is good to explicitly think of TCP's retransmit
timers as effectively one of TCP's congestion control mechanisms, for
times of heavy congestion (e.g., when a retransmitted packet is itself
dropped?, or when the congestion window is too small for three dup acks
to be received after a packet loss), to understand how they affect TCP
dynamics in times of high congestion, and to understand which of these
effects we should be careful to retain.
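
The three choices listed above can be sketched as a small state update.
This is only an illustration under stated assumptions: the names
(Reaction, Sender, on_lost_retransmission) are hypothetical, windows are
counted in whole segments, and the floor of two segments on the halved
window is borrowed from the usual slow-start threshold rule rather than
from anything stated in this message.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Reaction(Enum):
    WAIT_FOR_RTO = 1           # choice (1)
    RESEND_AND_SLOW_START = 2  # choice (2)
    HALVE_AND_ACK_CLOCK = 3    # choice (3)


@dataclass
class Sender:
    cwnd: int      # congestion window, in segments
    ssthresh: int  # slow-start threshold, in segments


def on_lost_retransmission(s: Sender, reaction: Reaction) -> Tuple[Sender, str]:
    """Return the sender state right after detecting that a retransmitted
    segment was itself dropped, plus a short description of what is sent."""
    half = max(s.cwnd // 2, 2)  # assumed floor of two segments
    if reaction is Reaction.WAIT_FOR_RTO:
        # Nothing changes yet; the sender stays silent until the timer fires.
        return Sender(s.cwnd, s.ssthresh), "idle until the retransmit timer fires"
    if reaction is Reaction.RESEND_AND_SLOW_START:
        # Resend one segment immediately and restart from a one-segment window.
        return Sender(1, half), "resend one segment, then slow-start"
    # Choice (3): halve the window again and keep sending, one segment
    # per incoming duplicate ACK.
    return Sender(half, half), "resend and send new data, clocked by dup acks"
```

With a window of 16 segments, choice (2) drops to a one-segment window
while choice (3) continues at 8, which is the difference in
aggressiveness the paragraph above is weighing.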

- Sally



From owner-tcp-impl@relay.engr.sgi.com  Wed Dec 24 00:59:35 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA01980 for tcp-impl-list; Wed, 24 Dec 1997 00:16:54 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA01972 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 24 Dec 1997 00:16:51 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA14984
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 24 Dec 1997 00:16:50 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id AAA00983; Wed, 24 Dec 1997 00:16:00 -0800 (PST)
Message-Id: <199712240816.AAA00983@daffy.ee.lbl.gov>
To: Graham Cope <G.Cope@ftel.co.uk>
Cc: Kevin Fall <kfall@ee.lbl.gov>, tcp-impl@cthulhu.engr.sgi.com,
        Archer N <N.Archer@ftel.co.uk>, McCulloch A <A.McCulloch@ftel.co.uk>,
        Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: When to reset TCP's timer?
In-reply-to: Your message of Tue, 23 Dec 1997 17:08:29 PST.
Date: Wed, 24 Dec 1997 00:16:00 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Kevin Fall wrote:
> > ... The newer one is closer to the BSD
> > implementation(s), providing bidirectional connections on a per-byte
> ...
> In which case (latest?) BSD versions are the only 'correct' ones?

I don't think Kevin was making that claim; just that the more realistic
TCP available with ns happens to be one that's close to the BSD lineage.

> In which case, are RFCs unambiguously in line with it?

When they aren't, then that's right up tcp-impl's alley, as an
implementation (or, much more rarely, a spec) problem.

One of the shortcomings of RFC 2001 is that it's defined in terms very
closely matching the BSD implementation.  This instead needs to be
widened somewhat, to accommodate other implementations that behave
in accord with TCP congestion principles, but perhaps not with
exactly how BSD TCPs implement them.  From a practical perspective,
these differences are generally minor; but worth allowing for, so that
other implementations are not unjustly held to be out of compliance.

> This still leaves me with a fundamental problem. If I'm trying to design
> broadband networks, and am trying to work out the interaction between
> TCP and ATM, how can I have any certainty at all that the TCP-based
> applications will behave in any way that I predict?

Um, yes.  Well, you try to understand what general TCP mechanisms have
what general effects on your studies, and if you can show that minor
variations in these mechanisms don't have much effect in your analysis,
then you're okay.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Dec 24 00:59:37 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA01214 for tcp-impl-list; Wed, 24 Dec 1997 00:09:44 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA01154 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 24 Dec 1997 00:09:42 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA14135
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 24 Dec 1997 00:09:41 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id AAA00966; Wed, 24 Dec 1997 00:08:59 -0800 (PST)
Message-Id: <199712240808.AAA00966@daffy.ee.lbl.gov>
To: Graham Cope <G.Cope@ftel.co.uk>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: When to reset TCP's timer?
In-reply-to: Your message of Tue, 23 Dec 1997 09:57:50 PST.
Date: Wed, 24 Dec 1997 00:08:59 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> After a fast retransmit, should/is the RTT timer be cancelled and reset?

Seems the answer is clearly yes.  The point of the timer is to keep track
of how much time has elapsed since the earliest un-acked segment was sent,
and since it was just sent, the timer should start over.
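
A minimal sketch of that rule, assuming a hypothetical RetransmitTimer
class with times given in seconds (neither the class nor the function
name comes from any real stack):

```python
class RetransmitTimer:
    """Tracks when the earliest un-acked segment should be considered lost."""

    def __init__(self, rto):
        self.rto = rto        # retransmission timeout, in seconds
        self.deadline = None  # absolute expiry time, or None if not running

    def start(self, now):
        # (Re)arm the timer relative to the current time.
        self.deadline = now + self.rto

    def expired(self, now):
        return self.deadline is not None and now >= self.deadline


def on_fast_retransmit(timer, now):
    # The earliest un-acked segment has just been sent again, so the
    # elapsed time being tracked starts over from this moment.
    timer.start(now)
```

For example, with a 3 second timeout started at t=0 and a fast
retransmit at t=2.5, the timer now expires at t=5.5 rather than t=3.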

>   This begs the question as to which is the 'correct' one?

Well, maybe the answer to this is RFC 793/1122/etc (modulo cleaning them up
a bit, esp. 2001).  More on this in my next message.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Dec 26 07:49:50 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA15428 for tcp-impl-list; Fri, 26 Dec 1997 07:00:01 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA15417 for <tcp-impl@engr.sgi.com>; Fri, 26 Dec 1997 06:59:34 -0800
Received: from dkr.rapide-pana.com (dkr.rapide-pana.com [57.197.0.2]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id GAA29901
	for <tcp-impl@engr.sgi.com>; Fri, 26 Dec 1997 06:59:29 -0800
	env-from (mor@rapide-pana.com)
Received: from [207.50.230.58] by dkr.rapide-pana.com
  (SMTPD32-3.00) id A44F176200B2; Fri Dec 26 14:50:55 1997
Message-ID: <345208BF.3970@rapide-pana.com>
Date: Sat, 25 Oct 1997 14:57:03 +0000
From: Mor Ndiaye Mbaye <mor@rapide-pana.com>
X-Mailer: Mozilla 3.01-NSCP  (Win95; I)
MIME-Version: 1.0
To: tcp-impl@engr.sgi.com
Subject: S.O.S.
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hello,
I am a systems engineer in computer science and I am preparing for the
Microsoft Certified Professional exams. Essentially, I am preparing for
certifications in "Internetworking with Microsoft TCP/IP", "Network
essentials", "Windows 95", and all modules of Microsoft Windows NT, and I
would like documents for these courses. Could you help me?
Thank you 
Soon!!!!!!!!!!


From owner-tcp-impl@relay.engr.sgi.com  Tue Dec 30 21:49:09 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id VAA16207 for tcp-impl-list; Tue, 30 Dec 1997 21:46:30 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA16202 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Dec 1997 21:46:29 -0800
Received: from irp-view4.cisco.com (irp-view4.cisco.com [171.69.63.16]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id VAA20024
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 30 Dec 1997 21:46:28 -0800
	env-from (jenny@cisco.com)
Received: (jenny@localhost) by irp-view4.cisco.com (8.8.5-Cisco.2-SunOS.5.5.1.sun4/8.6.5) id VAA10257; Tue, 30 Dec 1997 21:45:53 -0800 (PST)
From: "Jenny Y. Yuan" <jenny@cisco.com>
Message-Id: <199712310545.VAA10257@irp-view4.cisco.com>
Subject: pure ACK with seq one more over receive window
To: tcp-impl@cthulhu.engr.sgi.com
Date: Tue, 30 Dec 1997 21:45:52 -0800 (PST)
Cc: jenny@cisco.com (Jenny Y. Yuan)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



Hello all,

I have a question about the right thing to do when a TCP 
receive window is full and the next pure ACK received has a sequence number 
that's one byte past the right edge of the receive window.

Does the checking in RFC 793 apply only to segments with data:

  When data is received the following comparisons are needed:

    [... deleted...]

    RCV.NXT+RCV.WND-1 = last sequence number expected on an incoming
        segment, and is the right or upper edge of the receive window

Or does it apply to pure ACK segments as well? In my case, the pure ACK
segment received had sequence number RCV.NXT+RCV.WND. 

Thanks very much,
Jenny Yuan

From owner-tcp-impl@relay.engr.sgi.com  Wed Dec 31 08:45:37 1997
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA15402 for tcp-impl-list; Wed, 31 Dec 1997 08:42:42 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA15397 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 31 Dec 1997 08:42:40 -0800
Received: from aland.bbn.com (aland.bbn.com [204.162.9.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA22026
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 31 Dec 1997 08:42:23 -0800
	env-from (craig@aland.bbn.com)
Received: from aland.bbn.com (localhost [127.0.0.1])
	by aland.bbn.com (8.8.7/8.8.7) with ESMTP id IAA17297;
	Wed, 31 Dec 1997 08:41:47 -0800 (PST)
	(envelope-from craig@aland.bbn.com)
Message-Id: <199712311641.IAA17297@aland.bbn.com>
To: "Jenny Y. Yuan" <jenny@cisco.com>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: pure ACK with seq one more over receive window 
In-reply-to: Your message of "Tue, 30 Dec 1997 21:45:52 PST."
             <199712310545.VAA10257@irp-view4.cisco.com> 
Date: Wed, 31 Dec 1997 08:41:46 -0800
From: Craig Partridge <craig@aland.bbn.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199712310545.VAA10257@irp-view4.cisco.com>, "Jenny Y. Yuan" writes:

I think a little more context is needed here.

Which end's window is full and which end is receiving the pure ACK?

The RFC 793 checking applies to all segments, but the interpretation of
what might be wrong here depends on detailed context.
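
For reference, the four-case acceptability test in RFC 793 can be
written out as below. This is a sketch of a literal reading of that
table, with hypothetical function names and 32-bit wraparound handled
by modular subtraction. It does not include the special allowance
RFC 793 makes for valid ACKs, URGs and RSTs when the window is zero.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def in_window(seq, rcv_nxt, rcv_wnd):
    # True when RCV.NXT =< seq < RCV.NXT+RCV.WND, modulo 2**32.
    return (seq - rcv_nxt) % MOD < rcv_wnd


def segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    if seg_len == 0:
        if rcv_wnd == 0:
            # Zero-length segment, zero window: must sit exactly at RCV.NXT.
            return seg_seq == rcv_nxt
        # Zero-length segment, open window.
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        # Data cannot be accepted into a zero window.
        return False
    # Data segment: acceptable if its first or last byte is in the window.
    last = (seg_seq + seg_len - 1) % MOD
    return in_window(seg_seq, rcv_nxt, rcv_wnd) or in_window(last, rcv_nxt, rcv_wnd)
```

Under this reading, a pure ACK with SEG.SEQ = RCV.NXT+RCV.WND fails the
test when RCV.WND is greater than zero, but passes when RCV.WND is zero
(because then RCV.NXT+RCV.WND is just RCV.NXT), which is one reason the
context matters.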

Thanks!

Craig

From owner-tcp-impl@relay.engr.sgi.com  Fri Jan  2 01:35:51 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA18512 for tcp-impl-list; Fri, 2 Jan 1998 01:33:12 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA18507 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 2 Jan 1998 01:33:10 -0800
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id BAA26949
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 2 Jan 1998 01:31:42 -0800
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id OAA19655; Fri, 2 Jan 1998 14:56:01 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA19845; Fri, 2 Jan 98 14:55:58+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id OAA22196
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 2 Jan 1998 14:59:07 GMT
Date: Fri, 2 Jan 1998 14:59:07 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Is there a revised rfc2001 ? 
Message-Id: <Pine.LNX.3.95.980102145504.21706A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

 
Greetings !!

I think a long time back Vern started preparing a revised version of
RFC 2001. Is it ready now? Is it available on the net?



chetan . S

E-mail chetan@protocol.ece.iisc.ernet.in

WEB PAGE - http://pclab.ece.iisc.ernet.in/chetan

Phone 
	work place
		(080)3092282
	res.
		(080)3349218      
		(080)3347220


From owner-tcp-impl@relay.engr.sgi.com  Tue Jan  6 09:44:10 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA28950 for tcp-impl-list; Tue, 6 Jan 1998 09:37:35 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA28929 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 6 Jan 1998 09:37:33 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA15269
	for <tcp-impl@relay.engr.sgi.com>; Tue, 6 Jan 1998 09:37:31 -0800
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1])
	by brookfield.ans.net (8.8.5/8.8.5) with ESMTP id MAA11137;
	Tue, 6 Jan 1998 12:37:16 -0500 (EST)
Message-Id: <199801061737.MAA11137@brookfield.ans.net>
To: Sally Floyd <floyd@ee.lbl.gov>
cc: Mohit Aron <aron@cs.rice.edu>, tcp-impl@cthulhu.engr.sgi.com,
        end2end-interest@ISI.EDU
Reply-To: curtis@ans.net
Subject: Re: detecting loss of retransmitted packets in TCP 
In-reply-to: Your message of "Tue, 23 Dec 1997 09:57:11 PST."
             <199712231757.JAA25647@owl.ee.lbl.gov> 
Date: Tue, 06 Jan 1998 12:37:15 -0500
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199712231757.JAA25647@owl.ee.lbl.gov>, Sally Floyd writes:
> 
> As you indicate, the choices for reacting to a retransmitted packet
> that is itself dropped include the following:
> (1) waiting for a retransmit timeout;
> (2) re-retransmitting one packet and slow-starting;
> (3) re-retransmitting dropped packets and sending new packets, subject 
>   to cutting the congestion window in half for each window of data in
>   which one or more packets are dropped, and subject to the constraint
>   that outgoing packets need to be clocked by incoming dup acks.
> 
> I don't know which of these three is best for the network, in a


If you do anything but 3, the backoff can be dramatic enough that
there is a period of link underutilization with FIFO and drop tail.

The idea of TCP (IMHO) is to maximize goodput without being so
aggressive as to risk congestion collapse and without causing
excessive delay.  (Where excessive is any more delay than is needed to
keep the link full - note that delay is a secondary consideration.)

If you want lower delay or delay variation you go to SFQ and use only your
fair share (not using TCP) or go with WFQ, CBQ, or some other scheme more
sophisticated than FIFO and make a reservation and use more
than your fair share.

This doesn't mean that 1) or 2) is wrong; it's just that it is currently
thought that they won't yield quite as high aggregate goodput.

This would probably be a moot point if RED were widely deployed and
drops were spaced out by more than an RTT rather than bunched.  In
that case 3) just gets hosts past that messy time when routers don't
behave quite as well as we'd like them to despite our best efforts at
arm twisting the router (and rack modem, terminal server, etc) vendors.

Curtis

From owner-tcp-impl@relay.engr.sgi.com  Tue Jan  6 19:52:06 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id TAA01628 for tcp-impl-list; Tue, 6 Jan 1998 19:49:20 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id TAA01623 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 6 Jan 1998 19:49:18 -0800
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id TAA07931
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 6 Jan 1998 19:47:20 -0800
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id MAA26040; Tue, 6 Jan 1998 12:01:20 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA00345; Tue, 6 Jan 98 10:45:11+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id KAA02304
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 6 Jan 1998 10:48:28 GMT
Date: Tue, 6 Jan 1998 10:48:27 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Documentations for Delayed Ack  
Message-Id: <Pine.LNX.3.95.980106104746.2275A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



Hello !

I would like to know where I can get some documentation on the delayed ACK
algorithm.

with thanks
chetan . S

E-mail chetan@protocol.ece.iisc.ernet.in

WEB PAGE - http://pclab.ece.iisc.ernet.in/chetan

Phone 
	work place
		(080)3092282
	res.
		(080)3349218      
		(080)3347220





From owner-tcp-impl@relay.engr.sgi.com  Wed Jan  7 14:12:27 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA00855 for tcp-impl-list; Wed, 7 Jan 1998 14:10:45 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA28520 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 14:05:06 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA25873
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 14:05:06 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id OAA18332; Wed, 7 Jan 1998 14:05:00 -0800 (PST)
Message-Id: <199801072205.OAA18332@daffy.ee.lbl.gov>
To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Is there a revised rfc2001 ? 
In-reply-to: Your message of Tue, 06 Jan 1998 10:37:25 PST.
Date: Wed, 07 Jan 1998 14:05:00 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I think a long time back Vern started preparing a revised version of
> RFC 2001. Is it ready now? Is it available on the net?

Active revision of the text hasn't begun yet.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Jan  7 15:22:49 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA24254 for tcp-impl-list; Wed, 7 Jan 1998 15:19:14 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA24221 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 15:19:09 -0800
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA21104
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 15:16:36 -0800
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id PAA11344; Wed, 7 Jan 1998 15:02:11 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA04274; Wed, 7 Jan 98 13:48:21+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id NAA27974;
	Wed, 7 Jan 1998 13:51:39 GMT
Date: Wed, 7 Jan 1998 13:51:39 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
Reply-To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: linux-net@vger.rutgers.edu
Cc: tcp-impl@cthulhu.engr.sgi.com, alan@cymru.net,
        "K.N.S.Reddy" <reddy@protocol.ece.iisc.ernet.in>
Subject: Problem on TCP implementation on Linux 2.0.30
Message-Id: <Pine.LNX.3.95.980107131800.27481A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

  
Greetings !
 
	I am running Red Hat Linux 4.2, kernel 2.0.30. I have done a series of
experiments on my local LAN with Linux hosts (all hosts run 2.0.30).

I found the following results (very briefly):

A->B

A and B are Ethernet hosts on the same network. The transmission is good; I
do not find any packets retransmitted. (I use tcpdump to snoop the packets
on the sender host and tcptrace by Shawn Ostermann to analyze the results.)

A->B->C

Here A and C are on different networks and B is the gateway (though all
of them are connected to a common hub, I use subnet masking to get two
networks). All are Ethernet hosts. In this case I see around
100% retransmission (almost every packet is retransmitted) and of course
the TCP throughput drops.

A->B->C->D

In this case A and D are wireless hosts (2 Mbps, MTU 1500) and B and C are
gateways. B and C are connected through Ethernet (10 Mbps); both of these
hosts are on my local LAN. In this case I see around 120 to 130%
retransmission and the throughput decreases accordingly.

In all the above experiments the medium was found to be error free and
packet loss was minimal or almost nil.

These experiments led to the conclusion that in Linux, if the two end hosts
are on different networks, TCP does not behave normally: it makes many
unnecessary retransmissions and under-utilizes the link capacity.

Has anyone looked into this case? I am trying to isolate the problem and
make a proper analysis. If anyone has anything to say on this I will
accept it with thanks.

Please send comments to my personal e-mail since I am not on the list.

Thanks for any comments
chetan . S

E-mail chetan@protocol.ece.iisc.ernet.in

WEB PAGE - http://pclab.ece.iisc.ernet.in/chetan

Phone 
	work place
		(080)3092282
	res.
		(080)3349218      
		(080)3347220




From owner-tcp-impl@relay.engr.sgi.com  Wed Jan  7 15:22:57 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA22593 for tcp-impl-list; Wed, 7 Jan 1998 15:14:15 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA22537 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 15:14:04 -0800
Received: from moe.rice.edu (moe.rice.edu [128.42.5.4]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA20158
	for <tcp-impl@relay.engr.sgi.com>; Wed, 7 Jan 1998 15:14:02 -0800
	env-from (aron@cs.rice.edu)
From: aron@cs.rice.edu
Received: from noel.cs.rice.edu (noel.cs.rice.edu [128.42.1.136])
	by moe.rice.edu (8.8.5/8.8.5) with ESMTP id RAA19182
	for <tcp-impl@relay.engr.sgi.com>; Wed, 7 Jan 1998 17:14:01 -0600 (CST)
Received: (from aron@localhost) by noel.cs.rice.edu (8.8.5/8.7.5) id RAA22312 for tcp-impl@relay.engr.sgi.com; Wed, 7 Jan 1998 17:10:27 -0600 (CST)
Message-Id: <199801072310.RAA22312@noel.cs.rice.edu>
Subject: Clock granularity
To: tcp-impl@cthulhu.engr.sgi.com (TCP Implementor's List)
Date: Wed, 7 Jan 1998 17:09:57 -0600 (CST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi,
	does any commercial Operating System use a finer clock granularity than
500ms for scheduling the retransmission timeouts in TCP ? Thanks,



- Mohit

From owner-tcp-impl@relay.engr.sgi.com  Wed Jan  7 15:47:13 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA01644 for tcp-impl-list; Wed, 7 Jan 1998 15:39:30 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA01630 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 15:39:28 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA27690
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 15:39:26 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id XAA10477; Wed, 7 Jan 1998 23:39:22 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xq5PP-0005FsC; Thu, 8 Jan 98 00:01 GMT
Message-Id: <m0xq5PP-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Clock granularity
To: aron@cs.rice.edu
Date: Thu, 8 Jan 1998 00:01:14 +0000 (GMT)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199801072310.RAA22312@noel.cs.rice.edu> from "aron@cs.rice.edu" at Jan 7, 98 05:09:57 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 	does any commercial Operating System use a finer clock granularity than
> 500ms for scheduling the retransmission timeouts in TCP ? Thanks,

Linux uses 100Hz granularity simply because that's the natural timer
granularity we have. We clamp our RTTs to never drop under .1 of a second
to cope with the 500ms problem other stacks have. (Note we have it too, in
effect, as a 10ms limitation.)
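
The effect of tick size and a lower clamp on a timeout can be
illustrated with a little integer arithmetic. This is a sketch only:
the function name, the round-up-to-the-next-tick behaviour, and the
example values are assumptions for illustration, not a description of
the Linux code.

```python
def timeout_in_ticks(timeout_ms, tick_ms, floor_ms=0):
    """Number of clock ticks needed to cover a timeout, after applying
    an optional lower clamp. All values are integer milliseconds."""
    clamped = max(timeout_ms, floor_ms)
    # Ceiling division: a timer cannot fire between ticks, so round up.
    return -(-clamped // tick_ms)


# 100Hz clock (10 ms tick) with a 100 ms clamp: a 30 ms estimate
# becomes 10 ticks, i.e. 100 ms.
linux_like = timeout_in_ticks(30, tick_ms=10, floor_ms=100)

# 500 ms tick and no clamp: the same 30 ms estimate becomes 1 tick,
# i.e. 500 ms.
coarse = timeout_in_ticks(30, tick_ms=500)
```

The same estimate therefore turns into quite different real timeouts
depending on the clock, which is the granularity question being asked.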

Alan


From owner-tcp-impl@relay.engr.sgi.com  Wed Jan  7 15:47:18 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA02255 for tcp-impl-list; Wed, 7 Jan 1998 15:41:56 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA02248 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 15:41:55 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA28480
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 15:41:53 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id XAA10466; Wed, 7 Jan 1998 23:37:14 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xq5NL-0005FsC; Wed, 7 Jan 98 23:59 GMT
Message-Id: <m0xq5NL-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Problem on TCP implementation on Linux 2.0.30
To: chetan@protocol.ece.iisc.ernet.in
Date: Wed, 7 Jan 1998 23:59:07 +0000 (GMT)
Cc: linux-net@vger.rutgers.edu, tcp-impl@cthulhu.engr.sgi.com, alan@cymru.net,
        reddy@protocol.ece.iisc.ernet.in
In-Reply-To: <Pine.LNX.3.95.980107131800.27481A-100000@protocol.ece.iisc.ernet.in> from "Chetan Kumar" at Jan 7, 98 01:51:39 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> A->B->C
> 
> Where A & C are on the different network and B is the gateway { though all
> of them are connected to a common hub I use subnet-masking to get two
> network}, all are Ethernet hosts In this case I see that there is around
> 100% retransmission ( almost every packet is retransmitted) and of course
> the TCP throughput reduces. 

What is the collision rate on the LAN at this point, and have you allowed
for the fact that your snooper will hear the packets twice and fool the
analyser if you aren't careful?

> These experiments resulted in conclusion that in Linux if the two end host
> are on different network the TCP do not behave normally. Making many
> unnecessary retransmission and under-utilizing the link capacity. 

That doesn't fit with other traces, but if you've got a case it will be
interesting to find out why you see problems. If you are seeing the
packets twice then the radio one would still seem a bit high.

Alan

From owner-tcp-impl@relay.engr.sgi.com  Wed Jan  7 16:31:12 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA15225 for tcp-impl-list; Wed, 7 Jan 1998 16:25:23 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA15213 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 16:25:21 -0800
Received: from ntrlink.hq.interlink.com (ntrlink.hq.interlink.com [138.42.128.44]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA11443
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 16:25:18 -0800
	env-from (fab@md.interlink.com)
Received: from fab.md.interlink.com by ntrlink.hq.interlink.com (8.8.5/SMI-SVR4)
	id QAA13965; Wed, 7 Jan 1998 16:31:47 -0800 (PST)
Received: by fab.md.interlink.com (SMI-8.6/SMI-SVR4)
	id TAA26224; Wed, 7 Jan 1998 19:27:29 -0500
Date: Wed, 7 Jan 1998 19:27:29 -0500
From: Fred Bohle  <fab@md.interlink.com>
Message-Id: <199801080027.TAA26224@fab.md.interlink.com>
To: aron@cs.rice.edu, alan@lxorguk.ukuu.org.uk
Subject: Re: Clock granularity
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



> > 	does any commercial Operating System use a finer clock granularity than
> > 500ms for scheduling the retransmission timeouts in TCP ? Thanks,

Our next release of MVS TCPAccess (5.2) will have a configurable clock granularity,
from 10ms to 1 sec., with a recommended value of 200ms.  Also configurable is the
retransmit minimum time, with a recommended value of 500ms.


Fred
------------------------------------------------------------------------
Fred Bohle			EMAIL: fab@interlink.com
Interlink Computer Sciences	AT&T : 410-992-7750 x314
9250 Rumsey Road, Suite 200     Home : 410-643-6720
Columbia, MD 21045-1946         WWW  : www.interlink.com
------------------------------------------------------------------------

From owner-tcp-impl@relay.engr.sgi.com  Wed Jan  7 17:00:10 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA24600 for tcp-impl-list; Wed, 7 Jan 1998 16:53:38 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA24588 for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 7 Jan 1998 16:53:37 -0800
Received: from alcor.process.com (alcor.process.com [192.42.95.16]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA19108
	for <TCP-IMPL@CTHULHU.ENGR.SGI.COM>; Wed, 7 Jan 1998 16:53:36 -0800
	env-from (VOLZ@PROCESS.COM)
Date:     Wed, 7 Jan 1998 19:53 -0400
From: VOLZ@PROCESS.COM (Bernie Volz)
Message-Id: <009BFF2E63735C16.6429@PROCESS.COM>
To: aron@cs.rice.edu, TCP-IMPL@cthulhu.engr.sgi.com
Subject:  RE: Clock granularity
X-VMS-To: SMTP%"aron@cs.rice.edu"
X-VMS-Cc: TCP-IMPL@CTHULHU.ENGR.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>Hi,
>	does any commercial Operating System use a finer clock granularity than
>500ms for scheduling the retransmission timeouts in TCP ? Thanks,

TCPware for OpenVMS uses 200ms for scheduling retransmissions (clock
granularity) but ships with a default minimum retransmission interval
of 600ms (it is settable).

- Bernie Volz
  Process Software

From owner-tcp-impl@relay.engr.sgi.com  Thu Jan  8 05:03:02 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA18382 for tcp-impl-list; Thu, 8 Jan 1998 04:53:01 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA18372 for <tcp-impl@engr.sgi.com>; Thu, 8 Jan 1998 04:52:58 -0800
Received: from mail.computextos.com.pe (computextos.com.pe [200.10.66.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id EAA07039
	for <tcp-impl@engr.sgi.com>; Thu, 8 Jan 1998 04:52:55 -0800
	env-from (PRONAMACHCS@computextos.com.pe)
Received: from computextos.com.pe ([200.37.46.23])
          by mail.computextos.com.pe (Post.Office MTA v3.1 release PO203a
          ID# 0-38109U2500L250S0) with ESMTP id AAA169
          for <tcp-impl@engr.sgi.com>; Thu, 8 Jan 1998 07:51:48 -0500
Message-ID: <34B4DB0D.52040661@computextos.com.pe>
Date: Thu, 08 Jan 1998 07:56:29 -0600
From: PRONAMACHCS@computextos.com.pe (PRONAMACHCS)
Organization: pronamachcs
X-Mailer: Mozilla 4.03 [en] (Win95; I)
MIME-Version: 1.0
To: tcp-impl@engr.sgi.com
Subject: I NEED HELP
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hello everyone, I am a student at the Universidad Nacional de Cajamarca, in
Peru


From owner-tcp-impl@relay.engr.sgi.com  Thu Jan  8 07:27:44 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA07165 for tcp-impl-list; Thu, 8 Jan 1998 07:17:35 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA07150 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 8 Jan 1998 07:17:33 -0800
Received: from picard.cs.ohiou.edu (picard.cs.ohiou.edu [132.235.3.128]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA08514
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 8 Jan 1998 07:17:31 -0800
	env-from (sdo@picard.cs.ohiou.edu)
Received: from picard.cs.ohiou.edu by picard.cs.ohiou.edu (8.8.5/1.930630)
	id PAA24912; Thu, 8 Jan 1998 15:16:39 GMT
Message-Id: <199801081516.PAA24912@picard.cs.ohiou.edu>
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
From: "Shawn Ostermann" <sdo@picard.cs.OhioU.Edu>
cc: chetan@protocol.ece.iisc.ernet.in, linux-net@vger.rutgers.edu,
        tcp-impl@cthulhu.engr.sgi.com, alan@cymru.net,
        reddy@protocol.ece.iisc.ernet.in
Subject: Re: Problem on TCP implementation on Linux 2.0.30 
Date: Thu, 08 Jan 1998 10:16:39 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



> > A->B->C
> > 
> > Where A & C are on the different network and B is the gateway { though all
> > of them are connected to a common hub I use subnet-masking to get two
> > network}, all are Ethernet hosts In this case I see that there is around
> > 100% retransmission ( almost every packet is retransmitted) and of course
> > the TCP throughput reduces. 
> 
> What is the collision rate on the lan at this point, and have you allowed
> for the fact your snooper will hear the packets twice and fool the analyser
> if you arent careful

Good point Alan.  I've asked him for a copy of this trace file to look
at (perhaps I can get the analyzer to be smarter about this).  If your
assumption is correct (and depending on the capacity of the hub), it
could be that the mere presence of twice as many segments causes the
TCP throughput reduction too.
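
One way an analyzer could be "smarter about this" is sketched below. It is
only an illustration, not how any existing analyzer works: the Seg record
and the function name are made up, and it assumes that a segment forwarded
by the gateway keeps its IP ID and has its TTL lowered by exactly one, while
a genuine retransmission is sent with a new IP ID.

```python
from collections import namedtuple

# Hypothetical per-packet record extracted from a trace.
Seg = namedtuple("Seg", "src dst sport dport seq length ip_id ttl")


def count_retransmissions(segments):
    """Count data segments whose sequence range was seen before, skipping
    copies that look like the same datagram after one forwarding hop."""
    seen = {}  # (flow, seq, length) -> list of (ip_id, ttl) already observed
    retransmissions = 0
    for s in segments:
        if s.length == 0:
            continue  # pure ACKs carry no data to retransmit
        key = (s.src, s.dst, s.sport, s.dport, s.seq, s.length)
        earlier = seen.setdefault(key, [])
        forwarded_copy = any(ip_id == s.ip_id and ttl - 1 == s.ttl
                             for ip_id, ttl in earlier)
        if earlier and not forwarded_copy:
            retransmissions += 1
        earlier.append((s.ip_id, s.ttl))
    return retransmissions


trace = [
    Seg("A", "C", 1025, 80, 1, 100, 7, 64),    # A -> B on the hub
    Seg("A", "C", 1025, 80, 1, 100, 7, 63),    # same datagram, B -> C
    Seg("A", "C", 1025, 80, 101, 100, 8, 64),
    Seg("A", "C", 1025, 80, 101, 100, 8, 63),
    Seg("A", "C", 1025, 80, 101, 100, 9, 64),  # genuine retransmission
    Seg("A", "C", 1025, 80, 101, 100, 9, 63),  # its forwarded copy
]
print(count_retransmissions(trace))
```

A counter that keys on sequence number alone would report four
retransmissions for this trace instead of one.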

Shawn
-------------------------------------------------------------------------
   Dr. Shawn Ostermann  -  Assistant Professor  -  Ohio University
      140 Morton Hall, Ohio University, Athens, Ohio  45701-2979
 ostermann@cs.ohiou.edu -- FAX: (740)593-0406 -- Voice: (740)593-1242
    http://ace.cs.ohiou.edu/~osterman   http://jarok.cs.ohiou.edu


From owner-tcp-impl@relay.engr.sgi.com  Thu Jan  8 17:41:35 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA21095 for tcp-impl-list; Thu, 8 Jan 1998 17:26:44 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from momserv.denver.sgi.com (momserv.denver.sgi.com [169.238.64.2]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA21090 for <tcp-impl@engr.sgi.com>; Thu, 8 Jan 1998 17:26:42 -0800
Received: from okoboji.denver.sgi.com by momserv.denver.sgi.com via ESMTP (951211.SGI.8.6.12.PATCH1502/951211.SGI.AUTO)
	for <@momserv.denver.sgi.com:tcp-impl@engr.sgi.com> id SAA28971; Thu, 8 Jan 1998 18:26:41 -0700
Received: (from beck@localhost) by okoboji.denver.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id SAA21327 for tcp-impl@engr.sgi.com; Thu, 8 Jan 1998 18:26:40 -0700
From: "Fred R. Beck" <beck@okoboji.denver.sgi.com>
Message-Id: <9801081826.ZM21325@okoboji.denver.sgi.com>
Date: Thu, 8 Jan 1998 18:26:40 -0700
X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail)
To: tcp-impl@engr.sgi.com
Subject: TCP Question
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Networking Gurus,

Does anyone know if we do, or will support "Selective Acknowledgment"
(or SACK) [RFC 2018] in our TCP stack?

An enquiring customer wants to know.

Thanx,

	-fred

-- 

========================================================================
Fred R. Beck
Government Systems - Systems Engineer
Silicon Graphics, Inc.          (303) 796-0022
5975 S. Quebec St.  #220        (303) 796-0438 (FAX)
Englewood, Colorado  80111      beck@denver.sgi.com

          "As my pappy always said, 'Only fools are sure!'"
========================================================================

From owner-tcp-impl@relay.engr.sgi.com  Thu Jan  8 18:02:04 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA28106 for tcp-impl-list; Thu, 8 Jan 1998 17:48:37 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA28101 for <tcp-impl@engr.sgi.com>; Thu, 8 Jan 1998 17:48:35 -0800
Received: from lintjr.cisco.com (lintjr.cisco.com [171.68.10.78]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA27547
	for <tcp-impl@engr.sgi.com>; Thu, 8 Jan 1998 17:48:34 -0800
	env-from (ferguson@cisco.com)
Received: from big-dawgs.cisco.com (herndon-dhcp-53.cisco.com [171.68.53.53]) by lintjr.cisco.com (8.8.5/CISCO.SERVER.1.2) with SMTP id RAA19514 for <tcp-impl@engr.sgi.com>; Thu, 8 Jan 1998 17:48:32 -0800 (PST)
Message-Id: <3.0.5.32.19980108204831.00837180@lint.cisco.com>
X-Sender: pferguso@lint.cisco.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.5 (32)
Date: Thu, 08 Jan 1998 20:48:31 -0500
To: tcp-impl@engr.sgi.com
From: Paul Ferguson <ferguson@cisco.com>
Subject: Re: TCP Question
In-Reply-To: <9801081826.ZM21325@okoboji.denver.sgi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Gosh, it's nice to know that our SE's aren't the only ones
who mistakenly send messages to an IETF mailing list.  ;-)

- paul

At 06:26 PM 1/8/98 -0700, Fred R. Beck wrote:

>Networking Gurus,
>
>Does anyone know if we do, or will support "Selective Acknowledgment"
>(or SACK) [RFC 2018] in our TCP stack?
>
>An enquiring customer wants to know.
>
>Thanx,
>
>	-fred
>
>-- 
>
>========================================================================
>Fred R. Beck
>Government Systems - Systems Engineer
>Silicon Graphics, Inc.          (303) 796-0022
>5975 S. Quebec St.  #220        (303) 796-0438 (FAX)
>Englewood, Colorado  80111      beck@denver.sgi.com
>
>          "As my pappy always said, 'Only fools are sure!'"
>========================================================================
>
>

From owner-tcp-impl@relay.engr.sgi.com  Fri Jan  9 07:29:41 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA08514 for tcp-impl-list; Fri, 9 Jan 1998 07:17:44 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA08503 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 07:17:42 -0800
Received: from postoffice.Reston.mci.net (postoffice.Reston.mci.net [204.70.128.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA05840
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 07:17:41 -0800
	env-from (gmiller@mci.net)
Received: from mci.net (ale [166.45.4.49])
	by postoffice.Reston.mci.net (8.8.5/8.8.5) with ESMTP id KAA16171
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 10:17:35 -0500 (EST)
Message-Id: <199801091517.KAA16171@postoffice.Reston.mci.net>
X-Mailer: exmh version 1.6.9 8/22/96
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP Question 
In-reply-to: Your message of "Thu, 08 Jan 1998 20:48:31 EST."
             <3.0.5.32.19980108204831.00837180@lint.cisco.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 09 Jan 1998 10:17:34 -0500
From: Greg Miller <gmiller@mci.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Now that the question has been asked though... :-)

there are probably folks on this list (including me) who'd like to know the 
answer. My understanding is that SACK is not supported as of 6.2. Anyone know 
the schedule for SACK in IRIX?

Greg

>Gosh, it's nice to know that our SE's aren't the only ones
>who mistakenly send messages to an IETF mailing list.  ;-)
>
>- paul
>
>At 06:26 PM 1/8/98 -0700, Fred R. Beck wrote:
>
>>Networking Gurus,
>>
>>Does anyone know if we do, or will support "Selective Acknowledgment"
>>(or SACK) [RFC 2018] in our TCP stack?
>>
>>An enquiring customer wants to know.
>>
>>Thanx,
>>
>>	-fred
>>
>>-- 
>>
>>========================================================================
>>Fred R. Beck
>>Government Systems - Systems Engineer
>>Silicon Graphics, Inc.          (303) 796-0022
>>5975 S. Quebec St.  #220        (303) 796-0438 (FAX)
>>Englewood, Colorado  80111      beck@denver.sgi.com
>>
>>          "As my pappy always said, 'Only fools are sure!'"
>>========================================================================
>>
>>



From owner-tcp-impl@relay.engr.sgi.com  Fri Jan  9 10:27:44 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA17927 for tcp-impl-list; Fri, 9 Jan 1998 10:15:13 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA17858 for <tcp-impl@engr.sgi.com>; Fri, 9 Jan 1998 10:15:04 -0800
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA00608
	for <tcp-impl@engr.sgi.com>; Fri, 9 Jan 1998 10:15:03 -0800
	env-from (davidm@napali.hpl.hp.com)
Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33])
	by hplms26.hpl.hp.com (8.8.6/8.8.6 HPLabs Relay) with ESMTP id KAA14694;
	Fri, 9 Jan 1998 10:15:05 -0800 (PST)
Received: from napali.hpl.hp.com (davidm@napali.hpl.hp.com [15.4.89.123])
	by hplms2.hpl.hp.com (8.8.6/8.8.6 HPLabs Hub) with ESMTP id KAA20248;
	Fri, 9 Jan 1998 10:14:58 -0800 (PST)
Received: (from davidm@localhost)
	by napali.hpl.hp.com (8.8.7/8.8.7) id KAA03091;
	Fri, 9 Jan 1998 10:14:56 -0800
Date: Fri, 9 Jan 1998 10:14:56 -0800
Message-Id: <199801091814.KAA03091@napali.hpl.hp.com>
From: David Mosberger <davidm@hpl.hp.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: tcp-impl@engr.sgi.com
Subject: discrepancy in TIME_WAIT state handling
X-Mailer: VM 6.33 under Emacs 20.2.1
Reply-To: davidm@hpl.hp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Yesterday I found a serious performance problem that may occur when
connecting a client on a box running HP-UX 10.20 to a server running
NT 4.0 (w/service pack 3 installed).  It turns out that the TCP in NT
4.0 appears to handle a special case in the TIME_WAIT state
differently from any BSD-derived TCP I have seen so far.

Let me first recap the behavior of BSD-derived TCPs: when a connection
is in TIME_WAIT state and a SYN segment is received with a sequence
number that is higher than the last received sequence number for the
connection in TIME_WAIT state, then the existing connection is dropped
immediately and the SYN is considered to be part of a new TCP
connection.

The behavior of TCP as implemented in NT 4.0 SP3 (and possibly other
versions of NT and Windows 95) appears to be different: when a new SYN
segment is received for a connection in TIME_WAIT state, NT
essentially ignores the SYN and simply sends back the last ACK for the
existing connection.
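
The two behaviors described above can be put side by side in a small
sketch. The function and constant names are hypothetical and the code
models only the decision as described in this message, not either vendor's
implementation. The sequence numbers in the example are the ones from the
trace in this message.

```python
def seq_gt(a, b):
    """True if sequence number a is after b in 32-bit modular arithmetic."""
    d = (a - b) & 0xFFFFFFFF
    return 0 < d < 0x80000000


NEW_CONNECTION = "drop TIME_WAIT state, treat SYN as a new connection"
ACK_ONLY = "re-send last ACK, stay in TIME_WAIT"


def on_syn_in_time_wait(policy, syn_seq, last_rcv_seq):
    """Decide what to do with a SYN that arrives in TIME_WAIT.

    policy       -- "bsd" or "nt" (labels used only in this sketch)
    syn_seq      -- sequence number carried by the new SYN
    last_rcv_seq -- last sequence number received on the old connection
    """
    if policy == "bsd" and seq_gt(syn_seq, last_rcv_seq):
        return NEW_CONNECTION
    return ACK_ONLY


print(on_syn_in_time_wait("bsd", 192256000, 192192068))
print(on_syn_in_time_wait("nt", 192256000, 192192068))
```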

Both the BSD and the NT behavior are fine as long as they're not
mixed.  However, when trying to talk from a BSD-derived box to an NT
box, serious performance problems may occur if the BSD box gets
unlucky and happens to reuse a port number within the TIME_WAIT period
of the NT box.  The tcpdump trace below illustrates the problem (the
output has been edited for easier reading; the original trace is
available by request):

------------------------------------------------------
first connection:

42.55 hpux.1025 > nt40.80: S 192192000:192192000(0) win 16384
42.55 nt40.80 > hpux.1025: S 251663359:251663359(0) ack 192192001 win 8760
42.55 hpux.1025 > nt40.80: . ack 251663360 win 16384 (DF)
42.55 hpux.1025 > nt40.80: P 192192001:192192067(66) ack 251663360 win 16384
42.56 nt40.80 > hpux.1025: P 251663360:251664583(1223) ack 192192067 win 8694
42.56 nt40.80 > hpux.1025: F 251664583:251664583(0) ack 192192067 win 8694
42.56 hpux.1025 > nt40.80: . ack 251664584 win 15161 (DF)
42.56 hpux.1025 > nt40.80: F 192192067:192192067(0) ack 251664584 win 16384
42.56 nt40.80 > hpux.1025: . ack 192192068 win 8694 (DF)

second connection:

42.57 hpux.1025 > nt40.80: S 192256000:192256000(0) win 16384 <mss 1460> (DF)
42.57 nt40.80 > hpux.1025: . ack 192192068 win 8694 (DF)
42.57 hpux.1025 > nt40.80: R 192192068:192192068(0) win 16384
------------------------------------------------------

As the trace shows, a TCP connection exists between 42.55 seconds and
42.56 seconds between ports 1025 and 80.  At the end of this
connection, the NT 4.0 box (nt40) is in TIME_WAIT state as the FINs in
both directions have been sent and acknowledged.  Then, at 42.57
seconds, the hpux box attempts to create a new connection with the old
port numbers (since BSD-derived TCP implementations consider this all
right).  nt40 responds to this SYN packet by re-sending the last ACK
for the first connection.  The hpux box in turn responds with a RESET
since it was expecting a SYN segment from the nt40 box.  This
SYN/ACK/RESET game repeats itself until the nt40 box moves the first
connection out of the TIME_WAIT state, at which point the connection
establishment proceeds normally.

To summarize, the apparent discrepancy between the TIME_WAIT state
handling in BSD-derived TCPs and NT 4.0 TCP may result in serious and
hard-to-detect performance degradation because TCP connections may be
delayed for up to the duration of the TIME_WAIT period (i.e., 1-2
minutes for most systems).  Note that this scenario is not academic: a
busy client that has most of its TCP port numbers in use could quite
easily run into this problem.  The problem is hard to detect because
it occurs only intermittently (when the BSD-derived box gets
"unlucky") and because the connection establishment only gets delayed
(albeit by a fairly large amount of time).
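
A rough sketch of how long such a delay could last is given below. The
retry schedule (3 second initial timeout, doubling on each retry, 12
attempts) is an assumption chosen for illustration and was not measured on
HP-UX or any other stack; the function name is hypothetical as well.

```python
def connect_delay_s(time_wait_left_s, initial_rto_s=3.0, backoff=2.0,
                    max_tries=12):
    """Time from the first SYN until the first SYN that is sent after the
    peer has left TIME_WAIT, or None if the attempts run out first."""
    t = 0.0              # time at which the current SYN is sent
    rto = initial_rto_s  # wait before the next SYN retransmission
    for _ in range(max_tries):
        if t >= time_wait_left_s:
            return t     # this SYN is no longer answered with the old ACK
        t += rto
        rto *= backoff
    return None


# SYNs go out at 0, 3, 9, 21, 45, 93, ... seconds under these assumptions.
print(connect_delay_s(60.0))
```

Under these assumptions the delay is set by the SYN retry schedule as much
as by the TIME_WAIT period itself, since the peer's state is only probed
when a retransmitted SYN happens to go out.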

I'm interested to hear other people's opinion on what the correct
behavior for this case is and suggestions on how to fix the current
situation.

I'd like to thank Vern Paxson and Rick Jones for some initial
discussions on this problem.

	--david
-- 
David Mosberger; HP Labs; 1501 Page Mill Rd MS 1U17; Palo Alto, CA 94304-1126
davidm@hpl.hp.com          voice: (650) 236-2575          fax: (650) 857-5100

From owner-tcp-impl@relay.engr.sgi.com  Fri Jan  9 13:30:18 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA14341 for tcp-impl-list; Fri, 9 Jan 1998 13:19:11 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA14321 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 13:19:08 -0800
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA28974
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 13:19:07 -0800
	env-from (kcpoon@jurassic.eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id NAA13704; Fri, 9 Jan 1998 13:12:00 -0800
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id NAA15185; Fri, 9 Jan 1998 13:11:57 -0800
Received: from shield (shield [129.146.85.114])
	by jurassic.eng.sun.com (8.8.8+Sun+sa+re+hr/8.8.8) with SMTP id NAA08786;
	Fri, 9 Jan 1998 13:11:57 -0800 (PST)
Date: Fri, 9 Jan 1998 13:11:57 -0800 (PST)
From: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
Reply-To: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
Subject: Re: discrepancy in TIME_WAIT state handling
To: davidm@hpl.hp.com
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: "Your message with ID" <199801091814.KAA03091@napali.hpl.hp.com>
Message-ID: <Roam.SIMC.2.0.6.884380317.1060.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I'm interested to hear other people's opinion on what the correct
> behavior for this case is and suggestions on how to fix the current
> situation.

In RFC 1122, 4.2.2.13, 

            When a connection is closed actively, it MUST linger in
            TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
            However, it MAY accept a new SYN from the remote TCP to
            reopen the connection directly from TIME-WAIT state, if it:

            (1)  assigns its initial sequence number for the new
                 connection to be larger than the largest sequence
                 number it used on the previous connection incarnation,
                 and

            (2)  returns to TIME-WAIT state if the SYN turns out to be
                 an old duplicate.

So BSD's behaviour is a MAY.  I think NT's behaviour is correct, as described
in RFC 793, page 70,

        If the SYN is not in the window this step would not be reached
        and an ack would have been sent in the first step (sequence
        number check).

As you said, this scenario can happen quite often.  I'd suggest we recommend
it to be a best practise.  Does anyone know of other implementations which
behave like NT's TCP stack?
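
Condition (1) of the RFC 1122 text quoted above can be sketched as follows.
This is only an illustration: the function name is made up, and stepping to
last_seq_used + 1 is the smallest value that satisfies the condition, not
what any particular stack does.

```python
MOD = 1 << 32  # TCP sequence numbers are 32 bits wide


def next_isn(candidate_isn, last_seq_used):
    """Return an ISN that is after last_seq_used in sequence space.

    candidate_isn -- ISN the stack would otherwise have picked
    last_seq_used -- largest sequence number this host sent on the
                     previous incarnation of the connection
    """
    ahead = (candidate_isn - last_seq_used) % MOD
    if 0 < ahead < MOD // 2:
        return candidate_isn          # already larger, keep it
    return (last_seq_used + 1) % MOD  # otherwise step just past the old one


print(next_isn(100, 50))           # candidate is already larger
print(next_isn(10, 50))            # candidate is too small
print(next_isn(5, 0xFFFFFFFF))     # larger once wraparound is considered
```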

BTW, I guess this scenario should also happen with NT to NT connections.  Or
does NT have a special mechanism, like not reusing the same port, to deal with
it?  I heard that NT 5.0's TCP stack has many changes.  Does it follow the
same BSD behaviour in this case?

							K. Poon.
							kcpoon@eng.sun.com




From owner-tcp-impl@relay.engr.sgi.com  Fri Jan  9 14:10:57 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA26466 for tcp-impl-list; Fri, 9 Jan 1998 13:58:05 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA26458 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 13:58:03 -0800
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA11069
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 13:58:03 -0800
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu (rum.isi.edu [128.9.192.237])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id NAA27762;
	Fri, 9 Jan 1998 13:55:22 -0800 (PST)
Date: Fri, 9 Jan 1998 21:55:22 GMT
Posted-Date: Fri, 9 Jan 1998 21:55:22 GMT
Message-Id: <199801092155.VAA02239@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-4)
	id <VAA02239>; Fri, 9 Jan 1998 21:55:22 GMT
To: tcp-impl@cthulhu.engr.sgi.com, davidm@hpl.hp.com
Subject: Re: discrepancy in TIME_WAIT state handling
X-Sun-Charset: US-ASCII
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Date: Fri, 9 Jan 1998 10:14:56 -0800
> From: David Mosberger <davidm@hpl.hp.com>
> Content-Transfer-Encoding: 7bit
> To: tcp-impl@cthulhu.engr.sgi.com
> Subject: discrepancy in TIME_WAIT state handling
> 
...
 
> Both the BSD and the NT behavior are fine as long as they're not
> mixed.  However, when trying to talk from a BSD-derived box to an NT
> box, serious performance problems may occur if the BSD box gets
> unlucky and happens to reuse a port number within the TIME_WAIT period
> of the NT box.  The tcpdump trace below illustrates the problem (the
> output has been edited for easier reading; the original trace is
> available by request):
> 
...
> 
> As the trace shows, a TCP connection exist between 42.55 seconds and
> 42.56 seconds between ports 1025 and 80.  At the end of this
> connection, the NT 4.0 box (nt40) is in TIME_WAIT state as the FINs in
> both directions have been sent and acknowledged.  Then, at 42.57
> seconds, the hpux box attempts to create a new connection with the old
> port numbers (since BSD-derived TCP implementations consider this all
> right).  nt40 responds to this SYN packet by re-sending the last ACK
> for the first connection.  The hpux box in turn responds with a RESET
> since it was expecting a SYN segment from the nt40 box.  This
> SYN/ACK/RESET game repeats itself until the nt40 box moves the first
> connection out of the TIME_WAIT state, at which point the connection
> establishment proceeds normally.

The NT is in TIME_WAIT.

The HP sends the SYN with old port numbers.

The NT responds with an ACK, and does nothing else.

The HP, receiving an ACK, sends a RST.
	it is the HP's responsibility not to reuse 
	the port numbers for 2*MSL from this time

The NT receives the RST, goes to LISTEN and deletes the TCB, and returns.

At this point, the HP would be in error if it issued a new SYN with the old port numbers,
if it did so before the 2*MSL timeout.

(If I didn't miss something)...

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Jan  9 17:40:45 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA04839 for tcp-impl-list; Fri, 9 Jan 1998 17:28:48 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA04826 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 17:28:43 -0800
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA22311
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 17:28:41 -0800
	env-from (kcpoon@jurassic.eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id RAA07505; Fri, 9 Jan 1998 17:28:08 -0800
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id RAA04663; Fri, 9 Jan 1998 17:28:06 -0800
Received: from shield (shield [129.146.85.114])
	by jurassic.eng.sun.com (8.8.8+Sun+sa+re+hr/8.8.8) with SMTP id RAA09177;
	Fri, 9 Jan 1998 17:28:06 -0800 (PST)
Date: Fri, 9 Jan 1998 17:28:05 -0800 (PST)
From: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
Reply-To: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
Subject: Re: discrepancy in TIME_WAIT state handling
To: touch@ISI.EDU
Cc: tcp-impl@cthulhu.engr.sgi.com, davidm@hpl.hp.com
In-Reply-To: "Your message with ID" <199801092155.VAA02239@rum.isi.edu>
Message-ID: <Roam.SIMC.2.0.6.884395685.24655.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The HP, receiving an ACK, sends a RST.
> 	it is the HP's responsibility not to reuse 
> 	the port numbers for 2*MSL from this time

Unfortunately, NT does the active close, that's why it is in TIME-WAIT state. 
So applications on HP can reuse the same port as HP's TCP does not know about
the previous connection.  The previous connection's TCB is deleted when it
changes to CLOSED state.

> The NT receives the RST, goes to LISTEN and deletes the TCB, and returns.

As the seq num of the RST from HP is outside the window, it is ignored.  So NT
will not terminate the TIME-WAIT TCB.

							K. Poon.
							kcpoon@eng.sun.com



From owner-tcp-impl@relay.engr.sgi.com  Fri Jan  9 18:03:11 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA09409 for tcp-impl-list; Fri, 9 Jan 1998 17:48:51 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA09405 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 17:48:49 -0800
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA26886
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 17:48:48 -0800
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu (rum.isi.edu [128.9.192.237])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id RAA00944;
	Fri, 9 Jan 1998 17:46:15 -0800 (PST)
Date: Sat, 10 Jan 1998 01:46:13 GMT
Posted-Date: Sat, 10 Jan 1998 01:46:13 GMT
Message-Id: <199801100146.BAA02403@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-4)
	id <BAA02403>; Sat, 10 Jan 1998 01:46:13 GMT
To: tcp-impl@cthulhu.engr.sgi.com, davidm@hpl.hp.com, touch@ISI.EDU
Subject: Re: discrepancy in TIME_WAIT state handling
X-Sun-Charset: US-ASCII
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Steps now numbered:


Step #1:
> The NT is in TIME_WAIT.

Step #2: 
> The HP sends the SYN with old port numbers.

Step #3:
> The NT responds with an ACK, and does nothing else.

Step #4:
> The HP, receiving an ACK, sends a RST.
> 	it is the HP's responsibility not to reuse 
> 	the port numbers for 2*MSL from this time

Step #5:
> The NT receives the RST, goes to LISTEN and deletes the TCB, and returns.

> At this point, the HP would be in error if it issued a new SYN with the old port numbers,
> if it did so before the 2*MSL timeout.

Step #6 (new - this is the part that starts the cycle back to step #2, if it would occur
	before 2*MSL after step #4)

The HP sends the SYN with the old port numbers.


--------------------------------------------

> Date: Fri, 9 Jan 1998 17:28:05 -0800 (PST)
> From: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
...
> > The HP, receiving an ACK, sends a RST.
> > 	it is the HP's responsibility not to reuse 
> > 	the port numbers for 2*MSL from this time
> 
> Unfortunately, NT does the active close, that's why it is in TIME-WAIT state. 
> So applications on HP can reuse the same port as HP's TCP does not know about
> the previous connection.  The previous connection's TCB is deleted when it
> changes to CLOSED state

That's the reason the HP sends the SYN with the old port numbers (step #2). That
initial use is fine. After the HP sends the RST (step #4) it should not later
resend a SYN with the same port numbers (step #6, added)

> > The NT receives the RST, goes to LISTEN and deletes the TCB, and returns.
> 
> As the seq num of the RST from HP is outside the window, it is ignored.  So NT
> will not terminate the TIME-WAIT TCB.

The RST should be in response to the ACK from the NT.

(Wouldn't it have the seq number from the ACK, not from the original SYN?)

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Fri Jan  9 18:22:50 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA14292 for tcp-impl-list; Fri, 9 Jan 1998 18:10:36 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA14284 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 18:10:31 -0800
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA02177
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 18:10:31 -0800
	env-from (davidm@napali.hpl.hp.com)
Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33])
	by hplms26.hpl.hp.com (8.8.6/8.8.6 HPLabs Relay) with ESMTP id SAA27476;
	Fri, 9 Jan 1998 18:10:34 -0800 (PST)
Received: from napali.hpl.hp.com (davidm@napali.hpl.hp.com [15.4.89.123])
	by hplms2.hpl.hp.com (8.8.6/8.8.6 HPLabs Hub) with ESMTP id SAA13180;
	Fri, 9 Jan 1998 18:10:28 -0800 (PST)
Received: (from davidm@localhost)
	by napali.hpl.hp.com (8.8.7/8.8.7) id SAA04047;
	Fri, 9 Jan 1998 18:10:25 -0800
Date: Fri, 9 Jan 1998 18:10:25 -0800
Message-Id: <199801100210.SAA04047@napali.hpl.hp.com>
From: David Mosberger <davidm@hpl.hp.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: discrepancy in TIME_WAIT state handling
In-Reply-To: <Roam.SIMC.2.0.6.884395685.24655.kcpoon@jurassic>
References: <199801092155.VAA02239@rum.isi.edu>
	<Roam.SIMC.2.0.6.884395685.24655.kcpoon@jurassic>
X-Mailer: VM 6.33 under Emacs 20.2.1
Reply-To: davidm@hpl.hp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>>>>> On Fri, 9 Jan 1998 17:28:05 -0800 (PST), Kacheong Poon <kcpoon@jurassic.eng.Sun.COM> said:

  >> The NT receives the RST, goes to LISTEN and deletes the TCB, and
  >> returns.

  Kacheong> As the seq num of the RST from HP is outside the window,
  Kacheong> it is ignored.  So NT will not terminate the TIME-WAIT
  Kacheong> TCB.

Oops, I think I misinterpreted my trace as far as what happens after
HP sent the RST.  In the trace, the second connection goes through
after roughly 3 seconds.  I first assumed that this was due to a
(mis-)configured NT box with 2*MSL=3 seconds.  However, looking at the
trace more carefully, I found that the first retransmission of the SYN
of the second connection succeeds, so it may indeed be that NT had
deleted the TCB in response to HP's RST.  With this explanation, the 3
second delay is simply due to HP's retransmission timeout.

So the sequence of events for the second connection is:

   1. HP sends SYN
   2. NT responds with ACK for first connection
   3. HP sends RST
   4. HP waits for approx. 3 seconds
   5. HP resends SYN
   6. NT responds with SYN ACK
   7. ...normal connection processing continues...
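
The sequence above can be replayed with a toy model. This is only a sketch
under stated assumptions: the class and function names, the sequence numbers
and the 3 second SYN retransmission timeout are all illustrative, and the
exact-match test on the RST stands in for a real window check.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flags: str          # "SYN", "ACK", "RST" or "SYN|ACK"
    seq: int = 0
    ack: int = 0


class TimeWaitPeer:
    """Toy model of the side that did the active close (the NT box in the trace)."""

    def __init__(self, rcv_nxt, snd_nxt, new_iss=5000):
        self.state = "TIME_WAIT"
        self.rcv_nxt = rcv_nxt      # next sequence number expected on the old connection
        self.snd_nxt = snd_nxt      # next sequence number to send on the old connection
        self.new_iss = new_iss      # hypothetical ISS for a new connection

    def receive(self, seg):
        if self.state == "TIME_WAIT":
            if seg.flags == "RST":
                # Simplification: accept only an exact match instead of a window check.
                if seg.seq == self.rcv_nxt:
                    self.state = "LISTEN"   # TCB deleted
                return None
            # Anything else, including a new SYN, is answered with an ACK
            # for the old connection.
            return Segment("ACK", seq=self.snd_nxt, ack=self.rcv_nxt)
        if self.state == "LISTEN" and seg.flags == "SYN":
            self.state = "SYN_RCVD"
            return Segment("SYN|ACK", seq=self.new_iss, ack=seg.seq + 1)
        return None


def replay(syn_rto=3.0, max_tries=4):
    """Return a log of (time, sender, flags) for the HP's connection attempt."""
    peer = TimeWaitPeer(rcv_nxt=1000, snd_nxt=2000)
    syn = Segment("SYN", seq=7000)
    log, now = [], 0.0
    for _ in range(max_tries):
        log.append((now, "HP", "SYN"))
        reply = peer.receive(syn)
        if reply is None:
            break
        log.append((now, "NT", reply.flags))
        if reply.flags == "SYN|ACK":
            break
        # A bare ACK is unacceptable in SYN-SENT: reset it, taking SEQ from its ACK field.
        log.append((now, "HP", "RST"))
        peer.receive(Segment("RST", seq=reply.ack))
        now += syn_rto              # wait for the SYN retransmission timer
    return log


for entry in replay():
    print(entry)
```

In this model the only delay is the one retransmission timeout between the
two SYNs; no 2*MSL wait appears anywhere.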

	--david

From owner-tcp-impl@relay.engr.sgi.com  Fri Jan  9 18:55:20 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA22184 for tcp-impl-list; Fri, 9 Jan 1998 18:44:03 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA22179 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 18:44:01 -0800
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id SAA08747
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 9 Jan 1998 18:44:00 -0800
	env-from (kcpoon@jurassic.eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id SAA14809; Fri, 9 Jan 1998 18:43:29 -0800
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id SAA08292; Fri, 9 Jan 1998 18:43:27 -0800
Received: from shield (shield [129.146.85.114])
	by jurassic.eng.sun.com (8.8.8+Sun+sa+re+hr/8.8.8) with SMTP id SAA15199;
	Fri, 9 Jan 1998 18:43:27 -0800 (PST)
Date: Fri, 9 Jan 1998 18:43:26 -0800 (PST)
From: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
Reply-To: Kacheong Poon <kcpoon@jurassic.eng.Sun.COM>
Subject: Re: discrepancy in TIME_WAIT state handling
To: touch@ISI.EDU
Cc: tcp-impl@cthulhu.engr.sgi.com, davidm@hpl.hp.com
In-Reply-To: "Your message with ID" <199801100146.BAA02403@rum.isi.edu>
Message-ID: <Roam.SIMC.2.0.6.884400206.14432.kcpoon@jurassic>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> That's the reason the HP sends the SYN with the old port numbers (step #2).
> That initial use is fine. After the HP sends the RST (step #4) it should not
> later resend a SYN with the same port numbers (step #6, added)

The second SYN is a retransmission of the first SYN.  The ACK from NT does not
terminate SYN-SENT state.  So HP's TCP is still waiting for the SYN|ACK.

> The RST should be in response to the ACK from the NT.
> 
> (Wouldn't it have the seq number from the ACK, not from the original SYN?)

I was wrong.  You are correct.  The seq num should be equal to seg ack.  So
the RST should terminate the TIME-WAIT TCB in NT.  So the second SYN should
be accepted.  There is no 2*MSL wait period.  The waiting period is just a
retransmission timeout.  It is not that bad (-:
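
A small sketch of the two RFC 793 rules involved here: how a reset picks its
sequence number, and how the receiver decides whether a segment's sequence
number is inside its window. The function names, the window size and the
sequence numbers are illustrative assumptions, not taken from the trace.

```python
MOD = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around


def rst_for(seg_has_ack, seg_seq, seg_ack, seg_len):
    """Reset generated in reply to a segment, following RFC 793.

    If the offending segment carries an ACK, the reset takes its SEQ from
    that ACK field. Otherwise SEQ is zero and the reset acknowledges the
    segment (seg_len counts SYN and FIN as one each).
    """
    if seg_has_ack:
        return {"seq": seg_ack, "ack": None}
    return {"seq": 0, "ack": (seg_seq + seg_len) % MOD}


def in_window(seq, rcv_nxt, rcv_wnd):
    """True if RCV.NXT <= seq < RCV.NXT + RCV.WND, modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


# The TIME-WAIT side expects 1000 next and ACKs accordingly.
rst = rst_for(seg_has_ack=True, seg_seq=2000, seg_ack=1000, seg_len=0)
print(rst["seq"], in_window(rst["seq"], rcv_nxt=1000, rcv_wnd=8192))
# A reset numbered from the new SYN's ISS would normally miss the window.
print(in_window(500000, rcv_nxt=1000, rcv_wnd=8192))
```

Because the ACK from the TIME-WAIT side carries its own RCV.NXT, a reset
built from that ACK field lands exactly on the left edge of the window.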

							K. Poon.
							kcpoon@eng.sun.com



From owner-tcp-impl@relay.engr.sgi.com  Sat Jan 10 13:13:52 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA22327 for tcp-impl-list; Sat, 10 Jan 1998 13:00:39 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA22319 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 10 Jan 1998 13:00:38 -0800
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA21067
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 10 Jan 1998 13:00:37 -0800
	env-from (Jerry.Chu@eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id NAA20522; Sat, 10 Jan 1998 13:00:36 -0800
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id NAA05879; Sat, 10 Jan 1998 13:00:33 -0800
Received: from taipei.eng.sun.com (taipei [129.146.86.158])
	by jurassic.eng.sun.com (8.8.8+Sun+sa+re+hr/8.8.8) with SMTP id NAA08551;
	Sat, 10 Jan 1998 13:00:29 -0800 (PST)
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id MAA13844; Sat, 10 Jan 1998 12:59:08 -0800
Date: Sat, 10 Jan 1998 12:59:08 -0800
From: Jerry.Chu@eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199801102059.MAA13844@taipei.eng.sun.com>
To: aron@cs.rice.edu
Subject: Re: Clock granularity
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Solaris 2 uses the system timeout() function directly to schedule
each retransmission. This not only means it has a higher granularity
(same as the system clock at 100Hz by default), but also it's not
a heart-heat timer like most other BSD derived implementations have.
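
A rough sketch of the difference, assuming a 500ms global tick for the
BSD-style counter timer and a 10ms clock for the per-timer callout (100Hz).
Both function names are made up for illustration and neither is the Solaris
or BSD code.

```python
def heartbeat_fire_time(now_ms, ticks, tick_ms=500):
    """Counter-based timer: decremented on every global tick, fires at zero.

    The first decrement happens at the first tick strictly after now_ms,
    so the timer fires on the ticks-th tick boundary after it was set.
    """
    first_tick = now_ms // tick_ms + 1
    return (first_tick + ticks - 1) * tick_ms


def callout_fire_time(now_ms, timeout_ms, clock_ms=10):
    """Per-timer callout: fires on the first clock interrupt at or after the deadline."""
    deadline = now_ms + timeout_ms
    return -(-deadline // clock_ms) * clock_ms   # integer ceiling


# A nominal 1000ms timeout (2 ticks of 500ms), armed at t=499ms.
armed = 499
print(heartbeat_fire_time(armed, ticks=2) - armed)           # actual delay in ms
print(callout_fire_time(armed, timeout_ms=1000) - armed)     # actual delay in ms
```

With the counter scheme a timer of n ticks fires anywhere from just over
n-1 ticks to n ticks after it is set; with the callout the error is bounded
by one clock interval.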

Jerry

> From owner-tcp-impl@cthulhu.engr.sgi.com  Wed Jan  7 15:23:33 1998
> From: aron@cs.rice.edu
> Subject: Clock granularity
> To: tcp-impl@cthulhu.engr.sgi.com (TCP Implementor's List)
> Date: Wed, 7 Jan 1998 17:09:57 -0600 (CST)
> MIME-Version: 1.0
> Content-Transfer-Encoding: 7bit
> 
> Hi,
> 	does any commercial Operating System use a finer clock granularity than
> 500ms for scheduling the retransmission timeouts in TCP ? Thanks,
> 
> 
> 
> - Mohit
> 

From owner-tcp-impl@relay.engr.sgi.com  Sat Jan 10 13:49:20 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA28039 for tcp-impl-list; Sat, 10 Jan 1998 13:39:50 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA28034 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 10 Jan 1998 13:39:48 -0800
Received: from fly.cnuce.cnr.it (foda-devel.cnuce.cnr.it [131.114.192.86]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id NAA27204
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 10 Jan 1998 13:39:47 -0800
	env-from (pot@fly.cnuce.cnr.it)
Received: by fly.cnuce.cnr.it (Smail3.1.26.7 #3)
	id m0xr8gz-00032AC; Sat, 10 Jan 98 22:43 MET
Message-Id: <m0xr8gz-00032AC@fly.cnuce.cnr.it>
Date: Sat, 10 Jan 98 22:43 MET
From: Francesco Potorti` <F.Potorti@cnuce.cnr.it>
To: Jerry.Chu@eng.Sun.COM (Hsiao-keng Jerry Chu)
CC: tcp-impl@cthulhu.engr.sgi.com, aron@cs.rice.edu
In-reply-to: <199801102059.MAA13844@taipei.eng.sun.com> (Jerry.Chu@eng.Sun.COM)
Subject: Re: Clock granularity
Organization: CNUCE-CNR, Via S.Maria 36, Pisa - Italy +39-50-593211
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Solaris 2 uses the system timeout() function directly to schedule
   each retransmission. This not only means it has a higher granularity
   (same as the system clock at 100Hz by default), but also it's not
   a heart-heat timer like most other BSD derived implementations have.
   
heart-heat?

From owner-tcp-impl@relay.engr.sgi.com  Sat Jan 10 14:20:51 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA02433 for tcp-impl-list; Sat, 10 Jan 1998 14:05:28 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from refugee.engr.sgi.com (fddi-refugee.engr.sgi.com [192.26.75.26]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA02380; Sat, 10 Jan 1998 14:05:22 -0800
Received: from refugee.engr.sgi.com (localhost [127.0.0.1]) by refugee.engr.sgi.com (971110.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id OAA02891; Sat, 10 Jan 1998 14:05:20 -0800 (PST)
Message-Id: <199801102205.OAA02891@refugee.engr.sgi.com>
To: aron@cs.rice.edu
Cc: tcp-impl@cthulhu.engr.sgi.com (TCP Implementor's List)
Subject: Re: Clock granularity 
In-reply-to: Message from aron@cs.rice.edu of 7 Jan 1998 17:09:57 CST
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sat, 10 Jan 1998 14:05:20 -0800
From: Steve Alexander <sca@refugee.engr.sgi.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

aron@cs.rice.edu writes:
>	does any commercial Operating System use a finer clock granularity than
>500ms for scheduling the retransmission timeouts in TCP ? Thanks,

IRIX 6.4 (and 6.5, which is not yet available in stores) use 200ms granularity
for the retransmit timer.

-- Steve


From owner-tcp-impl@relay.engr.sgi.com  Sat Jan 10 16:08:41 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA14636 for tcp-impl-list; Sat, 10 Jan 1998 15:54:01 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA14628 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 10 Jan 1998 15:53:51 -0800
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA17719
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 10 Jan 1998 15:53:50 -0800
	env-from (Jerry.Chu@eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id PAA28265; Sat, 10 Jan 1998 15:53:50 -0800
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id PAA09310; Sat, 10 Jan 1998 15:53:47 -0800
Received: from taipei.eng.sun.com (taipei [129.146.86.158])
	by jurassic.eng.sun.com (8.8.8+Sun+sa+re+hr/8.8.8) with SMTP id PAA12454;
	Sat, 10 Jan 1998 15:53:48 -0800 (PST)
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id PAA13986; Sat, 10 Jan 1998 15:52:26 -0800
Date: Sat, 10 Jan 1998 15:52:26 -0800
From: Jerry.Chu@eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199801102352.PAA13986@taipei.eng.sun.com>
To: davidm@hpl.hp.com, touch@ISI.EDU
Subject: Re: discrepancy in TIME_WAIT state handling
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>Both the BSD and the NT behavior are fine as long as they're not
>mixed.  However, when trying to talk from a BSD-derived box to an NT
>box, serious performance problems may occur if the BSD box gets
>unlucky and happens to reuse a port number within the TIME_WAIT period
>of the NT box.

Unix utilities like rsh rely on the BSD behavior to work in UNIX <->
UNIX communication. So it's not just being unlucky, i.e. this case
happens all the time in Solaris when one logs out and rlogins back in
to the same machine within 2MSL time.

>The HP, receiving an ACK, sends a RST.
>	it is the HP's responsibility not to reuse 
>	the port numbers for 2*MSL from this time

Hmmm, I don't think this is described in any TCP specs, or is commonly
implemented, although it seems like one way of fixing the TWA hazard
below.

>The NT receives the RST, goes to LISTEN and deletes the TCB, and returns.

This is the typical TIME-WAIT assassination hazard. See RFC 1337. An easy
fix is to not truncate the TIME-WAIT TCB, i.e. to ignore RST segments
while in TIME-WAIT.

>Oops, I think I misinterpreted my trace as far as what happens after
>HP sent the RST.  In the trace, the second connection goes through
>after roughly 3 seconds.  I first assumed that this was due to a
>(mis-)configured NT box with 2*MSL=3 seconds.  However, looking at the
>trace more carefully, I found that the first retransmission of the SYN
>of the second connection succeeds, so it may indeed be that NT had
>deleted the TCB in response to HP's RST.

So this is strictly a performance concern, not a violation of the spec
(except the TWA hazard, which I wonder if anyone is actively addressing).
We've had a slightly different problem in this area that went unnoticed
for months because it only happened when one rlogins back to the
same machine. The additional 3 seconds were taken as a sluggish machine...

Jerry

From owner-tcp-impl@relay.engr.sgi.com  Sun Jan 11 00:03:42 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA27481 for tcp-impl-list; Sat, 10 Jan 1998 23:50:13 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA27417 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 10 Jan 1998 23:50:04 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA26427
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 10 Jan 1998 23:50:04 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id XAA25734; Sat, 10 Jan 1998 23:50:02 -0800 (PST)
Message-Id: <199801110750.XAA25734@daffy.ee.lbl.gov>
To: Jerry.Chu@Eng.Sun.COM (Hsiao-keng Jerry Chu)
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: discrepancy in TIME_WAIT state handling
In-reply-to: Your message of Sat, 10 Jan 1998 15:52:26 PST.
Date: Sat, 10 Jan 1998 23:50:02 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Unix utilities like rsh relies on the BSD behavior to work in UNIX <->
> UNIX communication. So it's not just being unlucky, i.e. this case
> happens all the time in Solaris when one logout and rlogin back in
> the same machine within 2MSL time.

Doesn't rsh just use a different ephemeral port and avoid the problem
entirely?

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Sun Jan 11 00:40:43 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id AAA02799 for tcp-impl-list; Sun, 11 Jan 1998 00:26:24 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id AAA02793 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 11 Jan 1998 00:26:18 -0800
Received: from scanner.worldgate.com (scanner.worldgate.com [198.161.84.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id AAA02344
	for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 11 Jan 1998 00:26:17 -0800
	env-from (marcs@znep.com)
Received: from znep.com (uucp@localhost)
	by scanner.worldgate.com (8.8.7/8.8.7) with UUCP id BAA21424;
	Sun, 11 Jan 1998 01:26:11 -0700 (MST)
Received: from localhost (marcs@localhost) by alive.znep.com (8.7.5/8.7.3) with SMTP id BAA08605; Sun, 11 Jan 1998 01:24:53 -0700 (MST)
Date: Sun, 11 Jan 1998 01:24:53 -0700 (MST)
From: Marc Slemko <marcs@znep.com>
To: Vern Paxson <vern@ee.lbl.gov>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: discrepancy in TIME_WAIT state handling
In-Reply-To: <199801110750.XAA25734@daffy.ee.lbl.gov>
Message-ID: <Pine.BSF.3.95.980111011619.3955F-100000@alive.znep.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Sat, 10 Jan 1998, Vern Paxson wrote:

> > Unix utilities like rsh relies on the BSD behavior to work in UNIX <->
> > UNIX communication. So it's not just being unlucky, i.e. this case
> > happens all the time in Solaris when one logout and rlogin back in
> > the same machine within 2MSL time.
> 
> Doesn't rsh just use a different ephemeral port and avoid the problem
> entirely?

No.  On most systems, it starts at IPPORT_RESERVED-1 (or something
similar) and works down until it finds an unused port.

On some systems it may, but that isn't traditional behaviour.

You can argue that applications with this behaviour (i.e. needless
deterministic picking of the port) are broken; however, it is common.

I often see a similar problem with ssh where I am connected via a dialup
connection, the link goes down, ssh exits from a network read error, then
after reconnecting I can't connect at all with ssh to the same server
because the connection is still established on the server.  This goes on
until I can make ssh pick a different port on my side (eg. running two
copies at once) or the connection times out on the server (eg. keepalive,
some output to cause it to send data).  This problem, of course, isn't
helped by the TIME_WAIT behaviour being discussed because the connection
is still established. 

The obvious solution would appear to be making clients randomize local
ports better.  Doesn't eliminate problems of this type, but reduces them
so you won't repeatedly fail.
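
A minimal sketch of what "randomize better" could look like: start the scan
at a random port instead of always at the top. The function name is
hypothetical, the 512..1023 range is an assumption about the usual reserved
range, and the in_use set stands in for real bind(2) failures.

```python
import random


def pick_local_port(in_use, low=512, high=1023, rng=random):
    """Start at a random port in [low, high] and scan the range once, wrapping."""
    span = high - low + 1
    start = rng.randrange(span)
    for offset in range(span):
        port = low + (start + offset) % span
        if port not in in_use:
            return port
    return None     # every port in the range is taken


# Two consecutive picks usually differ, so a stale connection on one
# port does not block every retry.
print(pick_local_port(set()), pick_local_port(set()))
```

A collision with a stale port is still possible, just unlikely to repeat on
the next attempt.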


From owner-tcp-impl@relay.engr.sgi.com  Sun Jan 11 05:17:29 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA27184 for tcp-impl-list; Sun, 11 Jan 1998 05:02:34 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA27179 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 11 Jan 1998 05:02:32 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id FAA06579
	for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 11 Jan 1998 05:02:31 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id NAA17955; Sun, 11 Jan 1998 13:02:10 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xrNOo-0005FsC; Sun, 11 Jan 98 13:25 GMT
Message-Id: <m0xrNOo-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: discrepancy in TIME_WAIT state handling
To: vern@ee.lbl.gov (Vern Paxson)
Date: Sun, 11 Jan 1998 13:25:58 +0000 (GMT)
Cc: Jerry.Chu@Eng.Sun.COM, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199801110750.XAA25734@daffy.ee.lbl.gov> from "Vern Paxson" at Jan 10, 98 11:50:02 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> > happens all the time in Solaris when one logout and rlogin back in
> > the same machine within 2MSL time.
> 
> Doesn't rsh just use a different ephemeral port and avoid the problem
> entirely?

Not in the normal libc's. It tries from the same base each time. I ended up
modifying the older Linux libc because of that and because we didn't support
the BSD "128,000 further on" sequence hack mess.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Sun Jan 11 10:36:08 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA20937 for tcp-impl-list; Sun, 11 Jan 1998 10:29:11 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA20919 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 11 Jan 1998 10:29:08 -0800
Received: from lox.sandelman.ottawa.on.ca (lox.sandelman.ottawa.on.ca [205.233.54.146]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA20810
	for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 11 Jan 1998 10:29:02 -0800
	env-from (mcr@istari.sandelman.ottawa.on.ca)
Received: from istari.sandelman.ottawa.on.ca (istari.sandelman.ottawa.on.ca [205.233.54.136])
	by lox.sandelman.ottawa.on.ca (8.8.7/8.8.7) with ESMTP id NAA07773
	for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 11 Jan 1998 13:32:43 -0500 (EST)
Received: from istari.sandelman.ottawa.on.ca ([[UNIX: localhost]]) by istari.sandelman.ottawa.on.ca (8.8.7/8.7.3) with ESMTP id NAA04270 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 11 Jan 1998 13:29:41 -0500 (EST)
Message-Id: <199801111829.NAA04270@istari.sandelman.ottawa.on.ca>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: discrepancy in TIME_WAIT state handling 
In-reply-to: Your message of "Sat, 10 Jan 1998 23:50:02 PST."
             <199801110750.XAA25734@daffy.ee.lbl.gov> 
Date: Sun, 11 Jan 1998 13:29:41 -0500
From: "Michael C. Richardson" <mcr@sandelman.ottawa.on.ca>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


>>>>> "Vern" == Vern Paxson <vern@ee.lbl.gov> writes:
    >> Unix utilities like rsh relies on the BSD behavior to work in UNIX <->
    >> UNIX communication. So it's not just being unlucky, i.e. this case
    >> happens all the time in Solaris when one logout and rlogin back in the
    >> same machine within 2MSL time.

    Vern> Doesn't rsh just use a different ephemeral port and avoid the
    Vern> problem entirely?

  RSH uses bindresvport(), which just does:
	for (i = 1023; i > 0; i--) {
		try to bind(2) local port i; stop at the first success.
	}

  So, whether or not the local machine has a previous socket in TIME_WAIT
depends on which end did the active close.
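
  A sketch of that dependency, with the bind(2) call replaced by a lookup
in a set. The function names are hypothetical, and it is an assumption of
this model that bind fails for a port still held by a local TIME_WAIT
socket.

```python
def first_free_descending(can_bind, start=1023, stop=1):
    """Scan downwards from start and return the first port that binds."""
    for port in range(start, stop - 1, -1):
        if can_bind(port):
            return port
    return None


def next_client_port(local_time_wait):
    """Port the next rsh-style client would get, given local TIME_WAIT ports."""
    return first_free_descending(lambda port: port not in local_time_wait)


# Remote end did the active close: nothing lingers locally, so the same
# port is handed out again at once and can hit the remote TIME_WAIT.
print(next_client_port(set()))
# Local end did the active close: the old port is skipped.
print(next_client_port({1023}))
```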

   :!mcr!:            |  Sandelman Software Works Corporation, Ottawa, ON  
   Michael Richardson |Network and security consulting and contract programming
 Personal: <A HREF="http://www.sandelman.ottawa.on.ca/People/Michael_Richardson/Bio.html">mcr@sandelman.ottawa.on.ca</A>. PGP key available.
 Corporate: <A HREF="http://www.sandelman.ottawa.on.ca/SSW/">sales@sandelman.ottawa.on.ca</A>. 





From owner-tcp-impl@relay.engr.sgi.com  Sun Jan 11 23:27:43 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA03122 for tcp-impl-list; Sun, 11 Jan 1998 23:18:38 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA03095; Sun, 11 Jan 1998 23:18:24 -0800
Received: from desolation.CS.Berkeley.EDU (desolation.CS.Berkeley.EDU [128.32.33.142]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA20073; Sun, 11 Jan 1998 23:18:23 -0800
	env-from (hari@desolation.CS.Berkeley.EDU)
Received: from desolation.CS.Berkeley.EDU (hari@localhost) by desolation.CS.Berkeley.EDU (8.8.3/8.8.2) with ESMTP id XAA07241; Sun, 11 Jan 1998 23:14:45 -0800 (PST)
From: Hari Balakrishnan <hari@desolation.CS.Berkeley.EDU>
Message-Id: <199801120714.XAA07241@desolation.CS.Berkeley.EDU>
X-Mailer: exmh version 2.0zeta 7/24/97
Reply-To: Hari Balakrishnan <hari@cs.berkeley.edu>
X-url: http://www.cs.berkeley.edu/~hari
To: Steve Alexander <sca@refugee.engr.sgi.com>
cc: aron@cs.rice.edu, tcp-impl@cthulhu.engr.sgi.com (TCP Implementor's List)
Subject: Re: Clock granularity 
In-reply-to: Your message of "Sat, 10 Jan 1998 14:05:20 PST."
             <199801102205.OAA02891@refugee.engr.sgi.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sun, 11 Jan 1998 23:14:44 -0800
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>>> Steve Alexander said:
 > aron@cs.rice.edu writes:
 > >	does any commercial Operating System use a finer clock granularity than
 > >500ms for scheduling the retransmission timeouts in TCP ? Thanks,
 > 
 > IRIX 6.4 (and 6.5, which is not yet available in stores) use 200ms granularity
 > for the retransmit timer.

Has the maximum delayed ack duration at the receiver been changed from 500ms to something <= 200ms?  If not, then wouldn't this (unilateral) sender modification lead to spurious retransmissions, especially when the connection's rto is less than 500ms?

For example, if you have only 1 outstanding segment in the window that has already reached a "500ms-delayed-ack" receiver, then this would lead to an unnecessary retransmission, because that receiver would delay its ack.  If you had a larger window and all but the first segment were lost, the same problem would occur again.  

I don't know if there are any "500ms-delack" receivers out there, but the Host Requirements RFC (to my knowledge) doesn't preclude them.  Maybe the limit should be explicitly changed to 200ms (or smaller) in the RFC.  This would reduce the probability of a spurious retransmission.
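
The single-segment case can be written down as a one-line check. This is a simplified sketch: the function name is made up, and it assumes one outstanding segment, no loss, and a receiver that holds its ACK for the full delayed-ACK interval.

```python
def is_spurious_timeout(rto_ms, rtt_ms, delack_ms):
    """True if the retransmit timer expires before the delayed ACK can arrive."""
    ack_arrival_ms = rtt_ms + delack_ms   # path round trip plus the receiver's hold time
    return rto_ms < ack_arrival_ms


# 50ms path, sender RTO of 400ms.
print(is_spurious_timeout(rto_ms=400, rtt_ms=50, delack_ms=500))  # 500ms-delack receiver
print(is_spurious_timeout(rto_ms=400, rtt_ms=50, delack_ms=200))  # 200ms-delack receiver
```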

-- Hari.



From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 08:14:52 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA02592 for tcp-impl-list; Mon, 12 Jan 1998 08:00:52 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA02576 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 08:00:47 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA21309
	for <tcp-impl@relay.engr.sgi.com>; Mon, 12 Jan 1998 08:00:46 -0800
	env-from (aron@cs.rice.edu)
Received: (from aron@localhost)
          by cs.rice.edu (8.8.5/8.8.4)
	  id KAA11396; Mon, 12 Jan 1998 10:00:43 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Message-Id: <199801121600.KAA11396@cs.rice.edu>
Subject: delayed ACKs
To: tcp-impl@cthulhu.engr.sgi.com (TCP Implementor's List)
Date: Mon, 12 Jan 1998 10:00:43 -0600 (CST)
Cc: aron@cs.rice.edu (Mohit Aron)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi,
	 is there some work that demonstrates how important it is to delay 
the ACKs in the Internet ? 

The disadvantages of delaying the ACKs appear to be the following:

i) The slow-start and congestion avoidance at the sender get slower as these
   do not increase the window based on the bytes acknowledged, but rather on
   the number of ACKs received (which are halved).
ii) A retransmission timeout of less than 200ms cannot be used as it would 
   lead to spurious retransmissions when the receiver delays the ACKs for
   200ms (at least for the BSD implementations).
iii) When TCP options are used (such as the timestamp option), a TCP receiver 
   would ACK every 3 segments as shown in [1].
iv) Some other unfortunate interactions due to delayed ACKs are demonstrated in
    [2].
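
Point i) can be illustrated with a toy slow-start model. It is only a sketch
under simplifying assumptions: no loss, the window grows by one segment per
ACK received, and a leftover odd segment is acknowledged within the same
round trip. The function name and parameters are hypothetical.

```python
def slow_start_cwnd(rtts, segments_per_ack=1, initial_cwnd=1):
    """Congestion window (in segments) at the start of each round trip."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rtts):
        acks = -(-cwnd // segments_per_ack)   # integer ceiling of cwnd / segments_per_ack
        cwnd += acks                          # one extra segment per ACK received
        history.append(cwnd)
    return history


print(slow_start_cwnd(5, segments_per_ack=1))   # an ACK for every segment
print(slow_start_cwnd(5, segments_per_ack=2))   # an ACK for every other segment
```

In this model the window doubles per round trip when every segment is
acknowledged, and grows by roughly a factor of 1.5 when every other segment
is.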


The advantages of using delayed ACKs can be enumerated as follows:

i) Enables the piggybacking of ACKs on any data segments that might be sent
   by the receiver.
ii) Reduces the number of ACKs in the network as well as reduces the number of
    ACKs that the sender needs to process.


As most of the traffic on the Internet is web traffic, it is questionable how
beneficial i) is. Most of the HTTP traffic consists of short requests by
clients that are served by servers - so there's not much opportunity for
piggybacking here. FTP starts up two different TCP connections for the two
sides of data transfer, so there aren't many piggybacking opportunities for
FTP traffic either. I'm not aware of any work that demonstrates how important
ii) is. At least reducing the ACKs in the network doesn't appear to provide much
benefit as ACKs are typically quite small in size as compared to data segments.






1. L. Brakmo and L. Peterson. Performance Problems in 4.4BSD TCP, ACM CCR, 
   Oct 1995.

2. J. Heidemann. Performance Interactions between P-HTTP and TCP
   Implementations, ACM CCR, Apr 1997.

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 08:38:26 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA10335 for tcp-impl-list; Mon, 12 Jan 1998 08:36:53 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA10295 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 08:36:44 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA27149
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 08:36:43 -0800
	env-from (aron@cs.rice.edu)
Received: (from aron@localhost)
          by cs.rice.edu (8.8.5/8.8.4)
	  id KAA12404; Mon, 12 Jan 1998 10:36:29 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Message-Id: <199801121636.KAA12404@cs.rice.edu>
Subject: Re: delayed ACKs
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
Date: Mon, 12 Jan 1998 10:36:29 -0600 (CST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <m0xrn7u-0005FsC@lightning.swansea.linux.org.uk> from "Alan Cox" at Jan 12, 98 04:54:13 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> Try a telnet session over a low speed radio link and you'll understand why
> its needed. 
> 



So should all other kinds of traffic suffer on account of telnet traffic
that doesn't even form a significant component of either Internet traffic or
even the TCP connections on the Internet ?





- Mohit

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 08:38:26 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA08888 for tcp-impl-list; Mon, 12 Jan 1998 08:30:14 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA08877 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 08:30:13 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA19293
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 08:30:07 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id QAA07437; Mon, 12 Jan 1998 16:29:50 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xrn7u-0005FsC; Mon, 12 Jan 98 16:54 GMT
Message-Id: <m0xrn7u-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: delayed ACKs
To: aron@cs.rice.edu (Mohit Aron)
Date: Mon, 12 Jan 1998 16:54:13 +0000 (GMT)
Cc: tcp-impl@cthulhu.engr.sgi.com, aron@cs.rice.edu
In-Reply-To: <199801121600.KAA11396@cs.rice.edu> from "Mohit Aron" at Jan 12, 98 10:00:43 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> piggybacking here. FTP starts up two different TCP connections for the two
> sides of data transfer, so there aren't many piggybacking opportunities for
> FTP traffic either. I'm not aware of any work that demonstrates how important
> ii) is. At least, reducing the ACKs in the network doesn't appear to provide
> much benefit, as ACKs are typically quite small compared to data segments.

Try a telnet session over a low speed radio link and you'll understand why
it's needed.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 08:53:39 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA11379 for tcp-impl-list; Mon, 12 Jan 1998 08:40:48 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA11359 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 08:40:43 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA01252
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 08:40:41 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id QAA07650; Mon, 12 Jan 1998 16:40:38 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xrnIP-0005FsC; Mon, 12 Jan 98 17:05 GMT
Message-Id: <m0xrnIP-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: delayed ACKs
To: aron@cs.rice.edu (Mohit Aron)
Date: Mon, 12 Jan 1998 17:05:04 +0000 (GMT)
Cc: alan@lxorguk.ukuu.org.uk, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199801121636.KAA12404@cs.rice.edu> from "Mohit Aron" at Jan 12, 98 10:36:29 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> So should all other kinds of traffic suffer on account of telnet traffic
> that doesn't even form a significant component of either Internet traffic or
> even the TCP connections on the Internet ?

I can see no real evidence they do suffer. Furthermore, any ack is implicitly
a delayed ack viewed on anything but a local LAN level because it's been
jittered by routers, rerouted via Australia and all the other things that
happen to packets.

Secondly your argument makes the oft repeated flawed assumption that 
"traffic by volume" equates to "importance and usage by user base".

Alan


From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 08:53:35 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA13606 for tcp-impl-list; Mon, 12 Jan 1998 08:49:39 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA13579 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 08:49:37 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA10861
	for <tcp-impl@relay.engr.sgi.com>; Mon, 12 Jan 1998 08:49:36 -0800
	env-from (aron@cs.rice.edu)
Received: (from aron@localhost)
          by cs.rice.edu (8.8.5/8.8.4)
	  id KAA12791; Mon, 12 Jan 1998 10:49:26 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Message-Id: <199801121649.KAA12791@cs.rice.edu>
Subject: Re: delayed ACKs
To: alan@lxorguk.ukuu.org.uk (Alan Cox)
Date: Mon, 12 Jan 1998 10:49:26 -0600 (CST)
Cc: tcp-impl@cthulhu.engr.sgi.com (TCP Implementor's List)
In-Reply-To: <m0xrnIP-0005FsC@lightning.swansea.linux.org.uk> from "Alan Cox" at Jan 12, 98 05:05:04 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> I can see no real evidence they do suffer. Furthermore, any ack is implicitly
> a delayed ack viewed on anything but a local LAN level because it's been
> jittered by routers, rerouted via Australia and all the other things that
> happen to packets.
> 

Do all packets happen to get rerouted via Australia ? There's a lot of work
that demonstrates the slowness of slow-start for web traffic. Delayed ACKs
further slow down slow-start. What other real evidence would you ask for ?
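
[Illustration: a toy model of the claim above, not taken from any of the
cited work. It assumes the sender adds one segment to its congestion window
per ACK received, that a full window is sent each round trip, and that the
receiver ACKs every segs_per_ack segments. Losses, the delayed-ACK timer and
receiver window limits are ignored; the function name and parameters are
made up for this sketch.]

```python
import math


def rtts_to_reach(target_segments: int, segs_per_ack: int, cwnd: int = 1) -> int:
    """Count round trips until cwnd (in segments) reaches target_segments.

    Toy slow-start model: each round trip the sender transmits cwnd
    segments, the receiver returns one ACK per segs_per_ack segments
    (rounded up), and the sender grows cwnd by one segment per ACK.
    """
    rtts = 0
    while cwnd < target_segments:
        acks = math.ceil(cwnd / segs_per_ack)
        cwnd += acks
        rtts += 1
    return rtts


for k in (1, 2):
    print(f"ACK every {k} segment(s): {rtts_to_reach(64, k)} round trips to reach 64 segments")
```

Under this model the window doubles per round trip with an ACK for every
segment (6 round trips to reach 64 segments) and grows by roughly 1.5x per
round trip with an ACK for every second segment (10 round trips).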


> Secondly your argument makes the oft repeated flawed assumption that 
> "traffic by volume" equates to "importance and usage by user base".
> 


For one thing I also said that the telnet traffic doesn't form a significant
percentage of the number of TCP connections - so my argument doesn't consider
volume only. Secondly, your argument in favour of delayed ACKs holds for
'telnet traffic over low speed radio links'. What percentage of the telnet
traffic on the Internet goes over low speed radio links ?




- Mohit

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 09:08:38 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA16985 for tcp-impl-list; Mon, 12 Jan 1998 09:03:17 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA16972 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 09:03:14 -0800
Received: from assateague.lerc.nasa.gov (assateague.lerc.nasa.gov [139.88.35.25]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA26747
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 09:03:09 -0800
	env-from (mallman@guns.lerc.nasa.gov)
Received: from guns.lerc.nasa.gov by assateague.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id MAA16969; Mon, 12 Jan 1998 12:02:43 -0500 (EST)
Received: from guns by guns.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-local)
        id MAA01110; Mon, 12 Jan 1998 12:02:42 -0500 (EST)
Message-Id: <199801121702.MAA01110@guns.lerc.nasa.gov>
To: Mohit Aron <aron@cs.rice.edu>
Cc: tcp-impl@cthulhu.engr.sgi.com
From: Mark Allman <mallman@lerc.nasa.gov>
Reply-To: mallman@lerc.nasa.gov
Subject: Re: delayed ACKs 
Organization: Late Night Hackers, NASA LeRC, Cleveland, Ohio
Song-of-the-Day: TV Dinner
Date: Mon, 12 Jan 1998 12:02:42 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


I have been experimenting with ACK generation and use lately as
well...  Just a couple more data points....

1.  Delayed ACKs have been shown to improve throughput in bulk
    transfers in some situations.  In fact, using an ACK interval
    larger than 2 increases throughput even more, in some cases.
    See my old officemate's thesis for more specifics...

    http://jarok.cs.ohiou.edu/papers/johnson-thesis.ps

2.  The Berkeley folks showed that in asymmetric networks ACKing
    each incoming segment can cause the reverse link to become
    saturated before the forward link.  This can limit throughput.
    In fact, if I remember correctly, they outlined situations in
    which the required ACK interval was larger than 2.  I am sure
    the paper is available on-line, but I don't have a pointer to it
    handy.  (Try looking at http://daedalus.cs.berkeley.edu).

allman


---
http://gigahertz.lerc.nasa.gov/~mallman/

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 09:08:37 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA14412 for tcp-impl-list; Mon, 12 Jan 1998 08:52:03 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA14381 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 08:51:54 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA13835
	for <tcp-impl@relay.engr.sgi.com>; Mon, 12 Jan 1998 08:51:50 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id QAA07904; Mon, 12 Jan 1998 16:51:47 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xrnTB-0005FsC; Mon, 12 Jan 98 17:16 GMT
Message-Id: <m0xrnTB-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: delayed ACKs
To: aron@cs.rice.edu (Mohit Aron)
Date: Mon, 12 Jan 1998 17:16:13 +0000 (GMT)
Cc: alan@lxorguk.ukuu.org.uk, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199801121649.KAA12791@cs.rice.edu> from "Mohit Aron" at Jan 12, 98 10:49:26 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> percentage of the number of TCP connections - so my argument doesn't consider
> volume only. Secondly, your argument in favour of delayed ACKs holds for
> 'telnet traffic over low speed radio links'. What percentage of the telnet
> traffic on the Internet goes over low speed radio links ?

Enough that I still remember why IP is used - because it handles all cases
well rather than a few cases superbly.

IFF you want to handle cases where delayed ack is obviously not beneficial
then stick NODELACK in as a proposed tcp option. Alternatively take a look
at the Linux OS where we do adaptive ack delays to minimise the bad things
that happen with the conventional implementation.


From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 09:28:09 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA20531 for tcp-impl-list; Mon, 12 Jan 1998 09:13:56 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA20494 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 09:13:51 -0800
Received: from firewall.agranat.com (agranat.com [146.115.131.2]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA14042
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 09:13:49 -0800
	env-from (lawrence@agranat.com)
Received: from agranat.com (alice [192.104.71.130]) by firewall.agranat.com (8.6.12/8.6.9) with ESMTP id MAA17671; Mon, 12 Jan 1998 12:13:47 -0500
Received: from localhost (lawrence@localhost)
	by agranat.com (8.8.5/8.8.5) with SMTP id MAA25028;
	Mon, 12 Jan 1998 12:13:46 -0500
Date: Mon, 12 Jan 1998 12:13:46 -0500 (EST)
From: Scott Lawrence <lawrence@agranat.com>
To: Mohit Aron <aron@cs.rice.edu>
cc: "TCP Implementor's List" <tcp-impl@cthulhu.engr.sgi.com>
Subject: Re: delayed ACKs
In-Reply-To: <199801121600.KAA11396@cs.rice.edu>
Message-ID: <Pine.LNX.3.96.980112121032.24532C-100000@alice.agranat.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


On Mon, 12 Jan 1998, Mohit Aron wrote:

> The advantages of using delayed ACKs can be enumerated as follows:
> 
> i) Enables the piggybacking of ACKs on any data segments that might be sent
>    by the receiver.
>[...]
> As most of the traffic on the Internet is web traffic, it is questionable how
> beneficial is i). Most of the HTTP traffic consists of short requests by
> clients that are served by servers - so there's not much opportunity for
> piggybacking here.

With HTTP/1.1 (going to Draft Standard shortly and already deployed by one
major browser and a number of servers) multiple requests and responses may
be pipelined on a single TCP connection.  Delayed ACKs will once again be
an important improvement - the fact that earlier versions of HTTP did not
have this feature was a bug in the design of HTTP, not a reason to change
the deferred ACKs design.



From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 09:59:32 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA02640 for tcp-impl-list; Mon, 12 Jan 1998 09:50:26 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA02625 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 09:50:24 -0800
Received: from diabolo.upc.es (diabolo.upc.es [147.83.2.2]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA10171
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 09:44:35 -0800
	env-from (jlinares@mat.upc.es)
Received: from mat.upc.es (mat.upc.es [147.83.39.3])
	by diabolo.upc.es (8.8.6/8.8.6) with ESMTP id SAA02729;
	Mon, 12 Jan 1998 18:28:01 +0100 (MET)
Received: from maite150.upc.es (maite150 [147.83.39.150]) by mat.upc.es (8.7.6/8.7.3) with SMTP id SAA15793; Mon, 12 Jan 1998 18:26:14 GMT
Received: from maite150 by maite150.upc.es (SMI-8.6/SMI-SVR4)
	id SAA29899; Mon, 12 Jan 1998 18:25:13 GMT
Message-ID: <34BA6008.51C6@mat.upc.es>
Date: Mon, 12 Jan 1998 18:25:12 +0000
From: Jaume Linares <jlinares@mat.upc.es>
X-Mailer: Mozilla 3.0 (X11; I; SunOS 5.5.1 sun4m)
MIME-Version: 1.0
To: Mohit Aron <aron@cs.rice.edu>
CC: "TCP Implementor's List" <tcp-impl@cthulhu.engr.sgi.com>
Subject: Re: delayed ACKs
References: <199801121600.KAA11396@cs.rice.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi,

	Delayed acks could be interesting when there is an asymmetric link and
the bandwidth in the reverse direction is smaller than in the forward
direction (for example in wireless cable modem networks, direct broadcast
satellite networks and Asymmetric Digital Subscriber Line (ADSL)). In
that case there is a limitation of the throughput achievable due to the
transmission of acks, and it's necessary to decrease the number of acks.
See paper "The Effects of Asymmetry on TCP Performance" at
http://daedalus.cs.berkeley.edu/

Jaume


Mohit Aron wrote:
> 
> Hi,
>          is there some work that demonstrates how important it is to delay
> the ACKs in the Internet ?
> 
> The disadvantages of delaying the ACKs appear to be the following:
> 
> i) The slow-start and congestion avoidance at the sender get slower as these
>    do not increase the window based on the bytes acknowledged, but rather on
>    the number of ACKs received (which are halved).
> ii) A retransmission timeout of less than 200ms cannot be used as it would
>    lead to spurious retransmissions when the receiver delays the ACKs for
>    200ms (at least for the BSD implementations).
> iii) When TCP options are used (such as the timestamp option), a TCP receiver
>    would ACK every 3 segments as shown in [1].
> iv) Some other unfortunate interactions due to delayed ACKs are demonstrated in
>     [2].
> 
> The advantages of using delayed ACKs can be enumerated as follows:
> 
> i) Enables the piggybacking of ACKs on any data segments that might be sent
>    by the receiver.
> ii) Reduces the number of ACKs in the network as well as reduces the number of
>     ACKs that the sender needs to process.
> 
> As most of the traffic on the Internet is web traffic, it is questionable how
> beneficial is i). Most of the HTTP traffic consists of short requests by
> clients that are served by servers - so there's not much opportunity for
> piggybacking here. FTP starts up two different TCP connections for the two
> sides of data transfer, so there aren't many piggybacking opportunities for
> FTP traffic either. I'm not aware of any work that demonstrates how important
> ii) is. At least, reducing the ACKs in the network doesn't appear to provide
> much benefit, as ACKs are typically quite small compared to data segments.
> 
> 1. L. Brakmo and L. Peterson. Performance Problems in 4.4BSD TCP, ACM CCR,
>    Oct 1995.
> 
> 2. J. Heidemann. Performance Interactions between P-HTTP and TCP
>    Implementations, ACM CCR, Apr 1997.

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 10:13:45 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA06881 for tcp-impl-list; Mon, 12 Jan 1998 10:02:16 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA06867 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:02:14 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA16264
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:02:13 -0800
	env-from (aron@cs.rice.edu)
Received: (from aron@localhost)
          by cs.rice.edu (8.8.5/8.8.4)
	  id LAA14761; Mon, 12 Jan 1998 11:53:38 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Message-Id: <199801121753.LAA14761@cs.rice.edu>
Subject: Re: delayed ACKs
To: jlinares@mat.upc.es (Jaume Linares)
Date: Mon, 12 Jan 1998 11:53:37 -0600 (CST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <34BA6008.51C6@mat.upc.es> from "Jaume Linares" at Jan 12, 98 06:25:12 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> 	Delayed acks could be interesting when there is an asymmetric link and
> the bandwidth in the reverse direction is smaller than in the forward
> direction (for example in wireless cable modem networks, direct broadcast
> satellite networks and Asymmetric Digital Subscriber Line (ADSL)). In
> that case there is a limitation of the throughput achievable due to the
> transmission of acks, and it's necessary to decrease the number of acks.
> See paper "The Effects of Asymmetry on TCP Performance" at
> http://daedalus.cs.berkeley.edu/
> 


Yes, but in such a situation, one would go for using some sort of 'ack
filtering' method (also suggested in the paper). If ACKs would be filtered,
then whether or not the receiver delays ACKs is anyway not a concern.




- Mohit

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 10:13:49 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA08949 for tcp-impl-list; Mon, 12 Jan 1998 10:07:54 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA08935 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:07:51 -0800
Received: from databus.databus.com (databus.databus.com [198.186.154.34]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id KAA18584
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:07:50 -0800
	env-from (barney@databus.databus.com)
From: Barney Wolff <barney@databus.com>
To: <tcp-impl@cthulhu.engr.sgi.com>
Date: Mon, 12 Jan 1998 13:05 EST
Subject: Re: delayed ACKs
Content-Length: 235
Content-Type: text/plain
Message-ID: <34ba5bf20.12a8@databus.databus.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Delayed ACKs do bad things with slow-start and with the Nagle algorithm.
At a minimum, I'd like to see a way, without changing the protocol, to
know when it's wise to delay an ACK and when it's not.

Barney Wolff  <barney@databus.com>

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 10:27:21 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA12306 for tcp-impl-list; Mon, 12 Jan 1998 10:15:44 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA12297 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:15:42 -0800
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA21388
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:15:41 -0800
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu (rum.isi.edu [128.9.192.237])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id KAA00736;
	Mon, 12 Jan 1998 10:15:39 -0800 (PST)
Date: Mon, 12 Jan 1998 18:15:38 GMT
Posted-Date: Mon, 12 Jan 1998 18:15:38 GMT
Message-Id: <199801121815.SAA04157@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-4)
	id <SAA04157>; Mon, 12 Jan 1998 18:15:38 GMT
To: aron@cs.rice.edu, jlinares@mat.upc.es
Subject: Re: delayed ACKs
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Jaume Linares <jlinares@mat.upc.es>
> X-Mailer: Mozilla 3.0 (X11; I; SunOS 5.5.1 sun4m)
> CC: "TCP Implementor's List" <tcp-impl@cthulhu.engr.sgi.com>
> Subject: Re: delayed ACKs
...
> 	Delayed acks could be interesting when there is an asymmetric link and
> the bandwidth in the reverse direction is smaller than in the forward
> direction (for example in wireless cable modem networks, direct broadcast
> satellite networks and Asymmetric Digital Subscriber Line (ADSL)).


ACKs occur every 1 or 2 segments, and are 40 bytes long.
Given even minimal data segments (typ. 552 bytes - 512 data + header),
that's a ratio of 14:1 to 28:1 needed for maximum forward-channel bandwidth.

Are wireless cable modems and ADSL more lopsided than that?
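
[Illustration: the arithmetic behind the 14:1 and 28:1 figures, as a short
sketch. The 40-byte ACK and 552-byte segment sizes are the ones quoted
above; the function name is made up here, and link-layer framing is
ignored.]

```python
ACK_BYTES = 40       # bare TCP/IP ACK, as quoted above
SEGMENT_BYTES = 552  # 512 bytes of data + 40 bytes of header


def max_asymmetry(segments_per_ack: int) -> float:
    """Forward:reverse bandwidth ratio at which ACKs just fill the reverse link."""
    return segments_per_ack * SEGMENT_BYTES / ACK_BYTES


for n in (1, 2):
    print(f"ACK every {n} segment(s): about {max_asymmetry(n):.0f}:1")
```

A link more lopsided than the printed ratio would run out of reverse
bandwidth for ACKs before the forward direction is full.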

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 10:27:25 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA11642 for tcp-impl-list; Mon, 12 Jan 1998 10:14:15 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA11627 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:14:13 -0800
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA20928
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:14:11 -0800
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id MAA00264;
	Mon, 12 Jan 1998 12:15:25 -0600 (CST)
Date: Mon, 12 Jan 1998 12:15:25 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199801121815.MAA00264@frantic.BSDI.COM>
To: alan@lxorguk.ukuu.org.uk, aron@cs.rice.edu
Subject: Re: delayed ACKs
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From owner-tcp-impl@cthulhu.engr.sgi.com Mon Jan 12 10:56:48 1998
> From: Mohit Aron <aron@cs.rice.edu>
> Subject: Re: delayed ACKs
> To: alan@lxorguk.ukuu.org.uk (Alan Cox)
> Date: Mon, 12 Jan 1998 10:49:26 -0600 (CST)
> Cc: tcp-impl@cthulhu.engr.sgi.com (TCP Implementor's List)
> 
> 
> > 
> > I can see no real evidence they do suffer. Furthermore, any ack is implicitly
> > a delayed ack viewed on anything but a local LAN level because it's been
> > jittered by routers, rerouted via Australia and all the other things that
> > happen to packets.
> > 
> 
> Do all packets happen to get rerouted via Australia ? There's a lot of work
> that demonstrates the slowness of slow-start for web traffic. Delayed ACKs
> further slow down slow-start. What other real evidence would you ask for ?

Then address the interaction between slow-start and delayed ACKs.
There are things that can be done to improve web traffic in the face
of slow-start and delayed ACKs, without getting rid of delayed ACKs.
In BSD/OS we specifically addressed web performance, and as part of
that effort we were able to eliminate most of the artificial delays
without eliminating either slow-start or delayed ACKs.  (See
http://www.bsdi.com/press/19960827 for the press release.)

			-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 10:52:02 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA23446 for tcp-impl-list; Mon, 12 Jan 1998 10:41:34 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA23429 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:41:33 -0800
Received: from onet2.cup.hp.com (onet2.cup.hp.com [15.255.208.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA03702
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:41:31 -0800
	env-from (raj@cup.hp.com)
Received: from cup.hp.com (loiter.cup.hp.com [15.13.104.252])
	by onet2.cup.hp.com (8.8.6/8.8.6/LJ-00) with ESMTP id KAA04815
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:41:29 -0800 (PST)
Message-ID: <34BA63D8.83025CEB@cup.hp.com>
Date: Mon, 12 Jan 1998 10:41:28 -0800
From: Rick Jones <raj@cup.hp.com>
Organization: HP 9000 Network Performance
X-Mailer: Mozilla 4.03 [en] (X11; I; HP-UX B.10.20 9000/735)
MIME-Version: 1.0
To: "TCP Implementor's List" <tcp-impl@cthulhu.engr.sgi.com>
Subject: Re: delayed ACKs
References: <199801121649.KAA12791@cs.rice.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Mohit Aron wrote:
> 
> >
> > I can see no real evidence they do suffer. Furthermore, any ack is implicitly
> > a delayed ack viewed on anything but a local LAN level because it's been
> > jittered by routers, rerouted via Australia and all the other things that
> > happen to packets.
> >
> 
> Do all packets happen to get rerouted via Australia ? There's a lot of work

No, but TCP must assume it could happen. I prefer to route my packets
through Pluto myself... :)

> that demonstrates the slowness of slow-start for web traffic. Delayed ACKs
> further slow down slow-start. What other real evidence would you ask for ?

I thought that the discussions on IW=2 (or should that be IW>1?) were
addressing that issue with delayed ACKs?

That being the case, most of the rest of the time, the problems I have
seen with applications (such as web servers) have been that they are not
making "appropriate" calls to send their data - for instance, sending the
http headers and the http data in separate calls to send() instead of
one call to writev() or something.

Put another way, all "logically associated" (I am sure there is a better
term) data should be presented to the transport at the same time.
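
[Illustration: a minimal sketch of handing headers and body to the
transport in one call. It uses Python's socket.sendmsg(), which takes a
list of buffers in the same spirit as writev(); it assumes a Unix-like
system, and the send_response() helper and the sample response are made up
for this example.]

```python
import socket


def send_response(sock: socket.socket, headers: bytes, body: bytes) -> None:
    """Present headers and body to the transport in a single gather write."""
    buffers = [headers, body]
    total = len(headers) + len(body)
    sent = sock.sendmsg(buffers)
    if sent < total:
        # Short write: push out whatever the first call did not accept.
        sock.sendall(b"".join(buffers)[sent:])


# Small self-contained demo over a local socket pair.
a, b = socket.socketpair()
send_response(a, b"HTTP/1.0 200 OK\r\n\r\n", b"hello")
a.close()
received = b"".join(iter(lambda: b.recv(4096), b""))
b.close()
print(received)
```

With two separate send() calls the stack may emit the headers as their own
small segment and then wait on an ACK before sending the rest; a single
gather write gives it the whole response to segment at once.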

> 
> > Secondly your argument makes the oft repeated flawed assumption that
> > "traffic by volume" equates to "importance and usage by user base".
> >
> 
> For one thing I also said that the telnet traffic doesn't form a significant
> percentage of the number of TCP connections - so my argument doesn't consider
> volume only. Secondly, your argument in favour of delayed ACKs holds for
> 'telnet traffic over low speed radio links'. What percentage of the telnet
> traffic on the Internet goes over low speed radio links ?

Another consideration is that the delta in processing time between a
data packet and an ACK is not necessarily that large. If one were to
stop delaying ACKs it would certainly increase the CPU overhead on the
end systems. I do not know what the delta is for the "network" but
always figure that if a router queue is measured in packets, one packet
occupies a slot in that queue just as well as another.

rick jones

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 10:52:04 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA24277 for tcp-impl-list; Mon, 12 Jan 1998 10:43:52 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA24268 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:43:51 -0800
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA04342
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:43:49 -0800
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu (rum.isi.edu [128.9.192.237])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id KAA01159;
	Mon, 12 Jan 1998 10:43:46 -0800 (PST)
Date: Mon, 12 Jan 1998 18:43:45 GMT
Posted-Date: Mon, 12 Jan 1998 18:43:45 GMT
Message-Id: <199801121843.SAA04176@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-4)
	id <SAA04176>; Mon, 12 Jan 1998 18:43:45 GMT
To: touch@ISI.EDU, hkruse1@ohiou.edu
Subject: Re: delayed ACKs
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From hkruse1@ohiou.edu Mon Jan 12 10:39:45 1998
> From: Hans Kruse <hkruse1@ohiou.edu>
> Subject: Re: delayed ACKs
> 
> >ACKs occur every 1 or 2 segments, and are 40 bytes long.
> >Given even minimal data segments (typ. 552 bytes - 512 data + header),
> >that's a ratio of 14:1 to 28:1 needed for maximum forward-channel bandwidth.
> >
> >Are wireless cable modems and ADSL more lopsided than that?
> 
> DirecPC uses up to (roughly) 1Mbps forward, 28.8kbps (modem) backwards; 33:1.

Good reasons to buy 33.6 (30:1, very close) or 56K modems (18:1, fine)
... not necessarily to change the protocol :-)
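
[Illustration: a rough check of the modem figures, assuming a 1 Mbit/s
forward channel carrying 552-byte segments and 40-byte ACKs as in the
earlier message. The function name is made up, and modem compression,
header compression and link framing are all ignored.]

```python
ACK_BITS = 40 * 8
SEGMENT_BITS = 552 * 8
FORWARD_BPS = 1_000_000  # "roughly 1Mbps" forward channel


def ack_load_bps(forward_bps: float, segments_per_ack: int) -> float:
    """Reverse-channel bit rate consumed by ACKs when the forward link is full."""
    segments_per_second = forward_bps / SEGMENT_BITS
    return segments_per_second / segments_per_ack * ACK_BITS


for reverse_bps in (28_800, 33_600, 56_000):
    for n in (1, 2):
        load = ack_load_bps(FORWARD_BPS, n)
        verdict = "fits" if load <= reverse_bps else "exceeds the reverse link"
        print(f"{reverse_bps} bit/s modem, ACK every {n}: {load:.0f} bit/s of ACKs, {verdict}")
```

Under these assumptions an ACK for every second segment needs a little over
36 kbit/s, which is slightly more than a 33.6k modem offers and well within
a 56k one.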

??

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 10:52:03 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA22801 for tcp-impl-list; Mon, 12 Jan 1998 10:39:55 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA22760 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:39:50 -0800
Received: from oak.cats.ohiou.edu (oak.cats.ohiou.edu [132.235.8.44]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA02790
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:39:49 -0800
	env-from (hkruse1@ohiou.edu)
Received: from [132.235.75.17] (kruse.tcom.ohiou.edu [132.235.75.17])
	by oak.cats.ohiou.edu (8.8.8/8.8.8) with ESMTP id NAA30760;
	Mon, 12 Jan 1998 13:39:40 -0500 (EST)
X-Sender: kruse@oak.cats.ohiou.edu
Message-Id: <v030078c7b0e0138ac179@[132.235.75.17]>
In-Reply-To: <199801121815.SAA04157@rum.isi.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 12 Jan 1998 13:40:13 -0500
To: touch@ISI.EDU
From: Hans Kruse <hkruse1@ohiou.edu>
Subject: Re: delayed ACKs
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>ACKs occur every 1 or 2 segments, and are 40 bytes long.
>Given even minimal data segments (typ. 552 bytes - 512 data + header),
>that's a ratio of 14:1 to 28:1 needed for maximum forward-channel bandwidth.
>
>Are wireless cable modems and ADSL more lopsided than that?

DirecPC uses up to (roughly) 1Mbps forward, 28.8kbps (modem) backwards; 33:1.

I think ADSL is usually about 10:1 or so; cable modems may be
implementation-specific.


Hans Kruse, Associate Professor and Director
McClure School of Communication Systems Management, Ohio University
9 S. College Street
Athens, OH 45701
614-593-4891 voice,  614-593-4889 fax,  hkruse1@ohiou.edu



From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 11:06:51 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA00592 for tcp-impl-list; Mon, 12 Jan 1998 10:59:45 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA00575 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:59:44 -0800
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA10592
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:59:43 -0800
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu (rum.isi.edu [128.9.192.237])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id KAA01324;
	Mon, 12 Jan 1998 10:59:41 -0800 (PST)
Date: Mon, 12 Jan 1998 18:59:40 GMT
Posted-Date: Mon, 12 Jan 1998 18:59:40 GMT
Message-Id: <199801121859.SAA04188@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-4)
	id <SAA04188>; Mon, 12 Jan 1998 18:59:40 GMT
To: aron@cs.rice.edu, lawrence@agranat.com
Subject: Re: delayed ACKs
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> With HTTP/1.1 (going to Draft Standard shortly and already deployed by one
> major browser and a number of servers) multiple requests and responses may
> be pipelined on a single TCP connection.  Delayed ACKs will once again be
> an important improvement - the fact that earlier versions of HTTP did not
> have this feature was a bug in the design of HTTP, not a reason to change
> the deferred ACKs design.

Even in 1.1, the request:response ratio is still very low. It's still one
request per response, which means that, most of the time, the request
channel is still comparatively idle. 

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 11:06:54 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA01354 for tcp-impl-list; Mon, 12 Jan 1998 11:01:10 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA01348 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:01:09 -0800
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA11127
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:01:07 -0800
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu (rum.isi.edu [128.9.192.237])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id KAA01263;
	Mon, 12 Jan 1998 10:57:06 -0800 (PST)
Date: Mon, 12 Jan 1998 18:57:06 GMT
Posted-Date: Mon, 12 Jan 1998 18:57:06 GMT
Message-Id: <199801121857.SAA04185@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-4)
	id <SAA04185>; Mon, 12 Jan 1998 18:57:06 GMT
To: tcp-impl@cthulhu.engr.sgi.com, raj@cup.hp.com
Subject: Re: delayed ACKs
X-Sun-Charset: US-ASCII
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Rick Jones <raj@cup.hp.com>
> Organization: HP 9000 Network Performance
> To: "TCP Implementor's List" <tcp-impl@cthulhu.engr.sgi.com>
> 
...
> That being the case, most of the rest of the time, the problems I have
> seen with applications (such as web servers) have been that they are not
> making "appropriate" calls to send their data - for instance, sending the
> http headers and the http data in separate calls to send() instead of
> one call to writev() or something. 

There are other problems - not disabling Nagle, not correcting
socket buffer sizes, etc...
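
As a rough illustration of the two adjustments named above, here is a
minimal Python sketch using the standard socket module. The function name
tune_socket and the 64 KB buffer sizes are assumptions made for the
example, not values taken from this thread; the kernel is free to round
or cap whatever is requested.

```python
import socket

def tune_socket(sock, sndbuf=65536, rcvbuf=65536):
    """Disable Nagle and request explicit socket buffer sizes.

    The default sizes here are illustrative assumptions only.
    """
    # TCP_NODELAY turns the Nagle algorithm off for this connection.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Ask for specific send and receive buffer sizes; the kernel may adjust them.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    # Report what the kernel actually applied.
    return {
        "nodelay": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY),
        "sndbuf": sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        "rcvbuf": sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    }

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print(tune_socket(s))
    finally:
        s.close()
```

The options are set before connect() or listen() here, which is the usual
place for buffer sizes that are meant to influence the advertised window.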

> Put another way, all "logically associated" (I am sure there is a better
> term) data should be presented to the transport at the same time.

That's an artifact of leaving Nagle's algorithm on (by default), and
trying to anticipate it at the application layer.

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 11:06:51 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA28238 for tcp-impl-list; Mon, 12 Jan 1998 10:53:08 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA28205 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:53:06 -0800
Received: from diable.upc.es (diable.upc.es [147.83.98.7]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA08109
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:53:02 -0800
	env-from (jlinares@mat.upc.es)
Received: from mat.upc.es (mat.upc.es [147.83.39.3])
	by diable.upc.es (8.8.6/8.8.6) with ESMTP id TAA29165;
	Mon, 12 Jan 1998 19:42:27 +0100 (MET)
Received: from maite150.upc.es (maite150 [147.83.39.150]) by mat.upc.es (8.7.6/8.7.3) with SMTP id TAA16882; Mon, 12 Jan 1998 19:40:39 GMT
Received: from maite150 by maite150.upc.es (SMI-8.6/SMI-SVR4)
	id TAA00032; Mon, 12 Jan 1998 19:39:39 GMT
Message-ID: <34BA717A.5691@mat.upc.es>
Date: Mon, 12 Jan 1998 19:39:38 +0000
From: Jaume Linares <jlinares@mat.upc.es>
X-Mailer: Mozilla 3.0 (X11; I; SunOS 5.5.1 sun4m)
MIME-Version: 1.0
To: touch@isi.edu
CC: aron@cs.rice.edu, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: delayed ACKs
References: <199801121815.SAA04157@rum.isi.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

touch@isi.edu wrote:
> ACKs occur every 1 or 2 segments, and are 40 bytes long.
> Given even minimal data segments (typ. 552 bytes - 512 data + header),
> that's a ratio of 14:1 to 28:1 needed for maximum forward-channel bandwidth.
> 
> Are wireless cable modems and ADSL more lopsided than that?
>
> Joe

But you must also look at the bandwidth in each direction. For example,
the wireless cable modem from Hybrid Networks Inc. has a forward bandwidth
of 10 Mbps. If we suppose an asymmetric link with a reverse channel of
28.8 Kbps (for example, a dialup phone line), the bandwidth ratio is
10 Mbps / 28.8 Kbps (347:1). Then the real ratio is 347/28 (12:1). So I
will receive 12 packets while I send just 1 ack. If I don't delay the
ack by 12 packets, the reverse direction will limit the forward direction.
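
The arithmetic above can be checked with a short Python sketch. It assumes
the figures quoted in this thread (552-byte data segments, 40-byte ACKs,
10 Mbps forward, 28.8 kbps reverse); the function names are made up for
the illustration. Under those assumptions the factor of about 12 is
relative to acking every second segment, so the same numbers suggest
roughly one ACK per 25 or so data segments before the reverse channel
stops being the limit.

```python
import math

DATA_SEGMENT_BYTES = 552   # 512 data + 40 header, as quoted in the thread
ACK_BYTES = 40             # bare ACK, as quoted in the thread

def reverse_overload(forward_bps, reverse_bps, segments_per_ack):
    """How many times the ACK stream exceeds the reverse channel capacity
    when the forward channel runs at full rate (values > 1 mean overload)."""
    bandwidth_ratio = forward_bps / reverse_bps
    size_ratio = DATA_SEGMENT_BYTES * segments_per_ack / ACK_BYTES
    return bandwidth_ratio / size_ratio

def min_segments_per_ack(forward_bps, reverse_bps):
    """Smallest number of data segments each ACK must cover so that the
    ACK stream fits in the reverse channel."""
    bandwidth_ratio = forward_bps / reverse_bps
    return math.ceil(bandwidth_ratio * ACK_BYTES / DATA_SEGMENT_BYTES)

if __name__ == "__main__":
    fwd, rev = 10_000_000, 28_800
    print("bandwidth ratio:", round(fwd / rev))
    print("overload, ack every 2 segments:", round(reverse_overload(fwd, rev, 2), 1))
    print("segments per ack needed:", min_segments_per_ack(fwd, rev))
```

This ignores link-layer framing and header compression on the dialup
side, either of which would shift the numbers.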

Jaume

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 11:06:54 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA28168 for tcp-impl-list; Mon, 12 Jan 1998 10:53:01 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA28163 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:53:00 -0800
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA08090
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 10:52:59 -0800
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu (rum.isi.edu [128.9.192.237])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id KAA01231;
	Mon, 12 Jan 1998 10:52:56 -0800 (PST)
Date: Mon, 12 Jan 1998 18:52:55 GMT
Posted-Date: Mon, 12 Jan 1998 18:52:55 GMT
Message-Id: <199801121852.SAA04182@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-4)
	id <SAA04182>; Mon, 12 Jan 1998 18:52:55 GMT
To: touch@ISI.EDU, jlinares@mat.upc.es
Subject: Re: delayed ACKs
Cc: aron@cs.rice.edu, tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Jaume Linares <jlinares@mat.upc.es>
> 
> touch@isi.edu wrote:
> > ACKs occur every 1 or 2 segments, and are 40 bytes long.
> > Given even minimal data segments (typ. 552 bytes - 512 data + header),
> > that's a ratio of 14:1 to 28:1 needed for maximum forward-channel bandwidth.
> > 
> > Are wireless cable modems and ADSL more lopsided than that?
> >
> > Joe
> 
> But you must also look at the bandwidth in each direction. For example,
> the wireless cable modem from Hybrid Networks Inc. has a forward bandwidth
> of 10 Mbps. If we suppose an asymmetric link with a reverse channel of
> 28.8 Kbps (for example, a dialup phone line), the bandwidth ratio is
> 10 Mbps / 28.8 Kbps (347:1). Then the real ratio is 347/28 (12:1). So I
> will receive 12 packets while I send just 1 ack. If I don't delay the
> ack by 12 packets, the reverse direction will limit the forward direction.

Sure. Except that the 10 Mbps is not the effective shared channel
capacity back to the rest of the Internet, let alone the bottleneck
bandwidth thereafter. Hybrid might have made an engineering decision to
mux 100 users onto one 10 Mbps uplink, and decided that a 28.8
backchannel was sufficient.

The problem appears to be one of advertised link rate vs. 
what the engineers thought you would get. TCP would not be
the place to address that issue, if so...

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 11:12:49 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA02769 for tcp-impl-list; Mon, 12 Jan 1998 11:04:39 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA02760 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:04:38 -0800
Received: from www10.w3.org (www10.w3.org [18.23.0.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA12657
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:04:37 -0800
	env-from (frystyk@w3.org)
Received: from batman.w3.org (dialup-243.lcs.mit.edu [18.23.2.43]) by www10.w3.org (8.8.5/8.7.3) with SMTP id OAA18411; Mon, 12 Jan 1998 14:04:30 -0500 (EST)
X-Authentication-Warning: www10.w3.org: Host dialup-243.lcs.mit.edu [18.23.2.43] claimed to be batman.w3.org
Message-Id: <3.0.3.32.19980112140101.006e8e98@localhost>
X-Sender: frystyk@localhost
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.3 (32)
Date: Mon, 12 Jan 1998 14:01:01 -0500
To: Scott Lawrence <lawrence@agranat.com>, Mohit Aron <aron@cs.rice.edu>
From: Henrik Frystyk Nielsen <frystyk@w3.org>
Subject: Re: delayed ACKs
Cc: "TCP Implementor's List" <tcp-impl@cthulhu.engr.sgi.com>
In-Reply-To: <Pine.LNX.3.96.980112121032.24532C-100000@alice.agranat.com
 >
References: <199801121600.KAA11396@cs.rice.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

At 12:13 01/12/98 -0500, Scott Lawrence wrote:
>With HTTP/1.1 (going to Draft Standard shortly and already deployed by one
>major browser and a number of servers) multiple requests and responses may
>be pipelined on a single TCP connection.  Delayed ACKs will once again be
>an important improvement - the fact that earlier versions of HTTP did not
>have this feature was a bug in the design of HTTP, not a reason to change
>the deferred ACKs design.

yes indeed!

Mohit, if you wanna see some examples of how an HTTP/1.1 robot downloads
1575 files on a single TCP connection using pipelining and smart output
buffering then take a look at 

	http://www.w3.org/Protocols/HTTP/Performance/#HTTP_TCP

This is all part of HTTP/1.1 - as Scott nicely points out - it is time to
revisit the presumptions of how HTTP affects TCP.

Henrik

--
Henrik Frystyk Nielsen,
World Wide Web Consortium
http://www.w3.org/People/Frystyk

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 11:13:00 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA04876 for tcp-impl-list; Mon, 12 Jan 1998 11:09:24 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA04845 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:09:22 -0800
Received: from diabolo.upc.es (diabolo.upc.es [147.83.2.2]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA13229
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:05:51 -0800
	env-from (jlinares@mat.upc.es)
Received: from mat.upc.es (mat.upc.es [147.83.39.3])
	by diabolo.upc.es (8.8.6/8.8.6) with ESMTP id TAA04098;
	Mon, 12 Jan 1998 19:56:25 +0100 (MET)
Received: from maite150.upc.es (maite150 [147.83.39.150]) by mat.upc.es (8.7.6/8.7.3) with SMTP id TAA17148; Mon, 12 Jan 1998 19:54:37 GMT
Received: from maite150 by maite150.upc.es (SMI-8.6/SMI-SVR4)
	id TAA00039; Mon, 12 Jan 1998 19:53:37 GMT
Message-ID: <34BA74C0.18A5@mat.upc.es>
Date: Mon, 12 Jan 1998 19:53:36 +0000
From: Jaume Linares <jlinares@mat.upc.es>
X-Mailer: Mozilla 3.0 (X11; I; SunOS 5.5.1 sun4m)
MIME-Version: 1.0
To: Mohit Aron <aron@cs.rice.edu>
CC: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: delayed ACKs
References: <199801121753.LAA14761@cs.rice.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Mohit Aron wrote:
> Yes, but in such a situation, one would go for using some sort of 'ack
> filtering' method (also suggested in the paper). If ACKs would be filtered,
> then whether or not the receiver delays ACKs is anyway not a concern.
> 
> - Mohit

You are right. Ack filtering is an alternative way to solve that
situation. Then you do not have to change the receiver; the
modifications must be made at the router connected to the constrained
link. However, the sender will see the incoming acks as delayed acks
(each ack acknowledges more than 2 packets and is received with a
delay). So it can be seen as an implementation of 'delayed acks' that
has been done at the router and not at the receiver.

Jaume

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 11:20:12 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA06522 for tcp-impl-list; Mon, 12 Jan 1998 11:13:39 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA06506 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:13:37 -0800
Received: from onet2.cup.hp.com (onet2.cup.hp.com [15.255.208.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA17068
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:13:36 -0800
	env-from (raj@cup.hp.com)
Received: from cup.hp.com (loiter.cup.hp.com [15.13.104.252])
	by onet2.cup.hp.com (8.8.6/8.8.6/LJ-00) with ESMTP id LAA06992
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:13:34 -0800 (PST)
Message-ID: <34BA6B5E.E4CA501B@cup.hp.com>
Date: Mon, 12 Jan 1998 11:13:34 -0800
From: Rick Jones <raj@cup.hp.com>
Organization: HP 9000 Network Performance
X-Mailer: Mozilla 4.03 [en] (X11; I; HP-UX B.10.20 9000/735)
MIME-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: delayed ACKs
References: <199801121857.SAA04185@rum.isi.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

touch@ISI.EDU wrote:
> 
> > From: Rick Jones <raj@cup.hp.com>
> > Organization: HP 9000 Network Performance
> > To: "TCP Implementor's List" <tcp-impl@cthulhu.engr.sgi.com>
> >
> ...
> > That being the case, most of the rest of the time, the problems I have
> > seen with applications (such as web servers) have been that they are not
> > making "appropriate" calls to send their data - for instance, sending the
> > http headers and the http data in separate calls to send() instead of
> > one call to writev() or something.
> 
> There are other problems - not disabling Nagle, not correcting
> socket buffer sizes, etc...

I agree about the socket buffer sizes, but I'm not following you on
Nagle - why should a web server disable Nagle if it is presenting the
http headers and data to the transport at the same time?
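
For readers who want to see what "presenting the headers and data at the
same time" looks like in practice, here is a minimal Python sketch. It
uses socket.sendmsg(), a gather write in the spirit of writev(), and is
limited to Unix-like systems. The helper name send_response and the
sample response are assumptions for the illustration, and a local
socketpair stands in for a real TCP connection.

```python
import socket

def send_response(sock, headers, body):
    """Hand headers and body to the transport in one gather-write call,
    rather than two separate send() calls.

    Returns the number of bytes accepted; a real server would loop on
    short writes.
    """
    return sock.sendmsg([headers, body])

if __name__ == "__main__":
    # A connected local pair stands in for a client/server TCP connection.
    server, client = socket.socketpair()
    try:
        headers = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\n"
        body = b"hello"
        sent = send_response(server, headers, body)
        print(sent, client.recv(4096))
    finally:
        server.close()
        client.close()
```

Because the stack sees both buffers in one call, it can place them in the
same segment when they fit, whether or not Nagle is enabled.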

> > Put another way, all "logically associated" (I am sure there is a better
> > term) data should be presented to the transport at the same time.
> 
> That's an artifact of leaving Nagle's algorithm on (by default), and
> trying to anticipate it at the application layer.

I would have thought that presenting as much data to the transport was
goodness in and of itself and did not imply applications trying to
anticipate an implementation specific?

OK, sitting here typing I think I may have figured out one area where you
might consider disabling Nagle. Were you alluding to pipelined requests?
I could see where three http requests sent on a single connection could
have the second and third delayed if they were smaller than MSS/2 (I
guess that generalizes to MSS/(N-1)?)

That being the case, should Nagle be disabled, or would it be better to
have the application tell the transport that there will be no more data
coming for a while - "fflush" for a TCP connection? 

Same question for the server.

rick jones

Even with Nagle on, I would think that pipelined requests "degenerate"
into nothing worse than one at a time, with the added connection
overhead, and sometimes have more than one at a time, so it is still
goodness, right?

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 11:54:42 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA18269 for tcp-impl-list; Mon, 12 Jan 1998 11:43:38 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA18252 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:43:33 -0800
Received: from ns3.harborcom.net (ns3.harborcom.net [206.158.4.7]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id LAA27773
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:43:32 -0800
	env-from (jmb@freebsd.org)
Received: from hub.freebsd.org [204.216.27.18] 
	by ns3.harborcom.net with esmtp (Exim 1.73 #1)
	id 0xrpla-0004RF-00; Mon, 12 Jan 1998 14:43:22 -0500
Received: (from jmb@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id LAA19030;
          Mon, 12 Jan 1998 11:43:20 -0800 (PST)
          (envelope-from jmb)
From: "Jonathan M. Bresler" <jmb@freebsd.org>
Message-Id: <199801121943.LAA19030@hub.freebsd.org>
Subject: Re: delayed ACKs
To: frystyk@w3.org (Henrik Frystyk Nielsen)
Date: Mon, 12 Jan 1998 11:43:20 -0800 (PST)
Cc: lawrence@agranat.com, aron@cs.rice.edu, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <3.0.3.32.19980112140101.006e8e98@localhost> from "Henrik Frystyk Nielsen" at Jan 12, 98 02:01:01 pm
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Henrik Frystyk Nielsen wrote:
> 
> At 12:13 01/12/98 -0500, Scott Lawrence wrote:
> >With HTTP/1.1 (going to Draft Standard shortly and already deployed by one
> >major browser and a number of servers) multiple requests and responses may
> >be pipelined on a single TCP connection.  Delayed ACKs will once again be
> >an important improvement - the fact that earlier versions of HTTP did not
> >have this feature was a bug in the design of HTTP, not a reason to change
> >the deferred ACKs design.
> 
> yes indeed!
> 
> Mohit, if you wanna see some examples of how an HTTP/1.1 robot downloads
> 1575 files on a single TCP connection using pipelining and smart output
> buffering then take a look at 
> 
> 	http://www.w3.org/Protocols/HTTP/Performance/#HTTP_TCP
> 
> This is all part of HTTP/1.1 - as Scott nicely points out - it is time to
> revisit the presumptions of how HTTP affects TCP.

	HTTP may be an excellent reason to revive T/TCP.
	transfer data in as little as three packets per connection
	with an initial window of 4kB.

	small data connections are handled quickly.
	larger data transfers move through slow-start into congestion
	avoidance quickly.

jmb

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 11:54:44 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA20126 for tcp-impl-list; Mon, 12 Jan 1998 11:49:05 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA20082 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:49:00 -0800
Received: from frantic.BSDI.COM (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA29590
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:48:55 -0800
	env-from (dab@frantic.BSDI.COM)
Received: (from dab@localhost)
	by frantic.BSDI.COM (8.8.5/8.8.5) id NAA00430
	for tcp-impl@cthulhu.engr.sgi.com; Mon, 12 Jan 1998 13:50:19 -0600 (CST)
Date: Mon, 12 Jan 1998 13:50:19 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199801121950.NAA00430@frantic.BSDI.COM>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: delayed ACKs
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Date: Mon, 12 Jan 1998 11:13:34 -0800
> From: Rick Jones <raj@cup.hp.com>
> ...
> OK, sitting here typing I think I may have figured out one area where you
> might consider disabling Nagle. Were you alluding to pipelined requests?
> I could see where three http requests sent on a single connection could
> have the second and third delayed if they were smaller than MSS/2 (I
> guess that generalizes to MSS/(N-1)?)
> 
> That being the case, should Nagle be disabled, or would it be better to
> have the application tell the transport that there will be no more data
> coming for a while - "fflush" for a TCP connection? 
> 
> Same question for the server.
> 
> rick jones
> 
> Even with Nagle on, I would think that pipelined requests "degenerate"
> into nothing worse than one at a time, with the added connection
> overhead, and sometimes have more than one at a time, so it is still
> goodness, right?

There is no need to disable the Nagle algorithm.  With pipelined
requests, the first one will be sent immediately, and the next two
buffered, because of Nagle.  But the first request is going to cause
some data to come back, so the delayed ACK will be piggy-backed on
the data.  When the ACK is received, there is no outstanding TCP data
to send, so the TCP should then kick out the next two requests in a
single packet.

Everything should work just fine.

The case where the interaction between Nagle and delayed ACKs
becomes a problem is when the receiver can't do anything until
it gets the data that hasn't been sent yet, deferred by the Nagle
algorithm.  The Nagle code is waiting for the ACK, and there is
no return data on which to piggy-back the delayed ACK, so you
wait for the delayed ACK timer to go off.
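
The pipelined case described above can be traced with a toy sender model.
This is a simplified Python sketch, not any stack's actual code: the class
name NagleSender, the 1460-byte MSS, and the 100-byte requests are
assumptions for the illustration, and windows, timers, and retransmission
are left out. The only rule modelled is the Nagle test: a less-than-MSS
segment is held while any sent data is still unacknowledged.

```python
class NagleSender:
    """Toy model of a TCP sender applying only the Nagle rule."""

    def __init__(self, mss):
        self.mss = mss
        self.unacked = 0     # bytes sent but not yet acknowledged
        self.pending = b""   # bytes written by the application, not yet sent
        self.sent = []       # segments put on the wire, in order

    def write(self, data):
        """Application write: queue the data and send what Nagle allows."""
        self.pending += data
        self._try_send()

    def ack_all(self):
        """An ACK covering everything outstanding has arrived."""
        self.unacked = 0
        self._try_send()

    def _try_send(self):
        while self.pending:
            full_segment = len(self.pending) >= self.mss
            if not full_segment and self.unacked > 0:
                return  # Nagle: hold small data until outstanding data is ACKed
            segment = self.pending[:self.mss]
            self.pending = self.pending[self.mss:]
            self.sent.append(segment)
            self.unacked += len(segment)

if __name__ == "__main__":
    tcp = NagleSender(mss=1460)
    for _ in range(3):
        tcp.write(b"x" * 100)          # three small pipelined requests
    print([len(s) for s in tcp.sent])  # only the first request has gone out
    tcp.ack_all()                      # ACK arrives with the first response
    print([len(s) for s in tcp.sent])  # the other two leave in one segment
```

In the problem case, ack_all() is simply not called until the receiver's
delayed-ACK timer fires, so the held data waits for that timer.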

			-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 12:09:38 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA21882 for tcp-impl-list; Mon, 12 Jan 1998 11:54:52 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA21873 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:54:50 -0800
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA01491
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 11:54:49 -0800
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu (rum.isi.edu [128.9.192.237])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id LAA02490;
	Mon, 12 Jan 1998 11:51:02 -0800 (PST)
Date: Mon, 12 Jan 1998 19:51:01 GMT
Posted-Date: Mon, 12 Jan 1998 19:51:01 GMT
Message-Id: <199801121951.TAA04462@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-4)
	id <TAA04462>; Mon, 12 Jan 1998 19:51:01 GMT
To: tcp-impl@cthulhu.engr.sgi.com, raj@cup.hp.com
Subject: Re: delayed ACKs
X-Sun-Charset: US-ASCII
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> From: Rick Jones <raj@cup.hp.com>
> Organization: HP 9000 Network Performance
> Status: RO
> 
> > There are other problems - not disabling Nagle, not correcting
> > socket buffer sizes, etc...
...
> OK, sitting here typing I think I may have figured out one area where you
> might consider disabling Nagle. Were you alluding to pipelined requests?
> I could see where three http requests sent on a single connection could
> have the second and third delayed if they were smaller than MSS/2 (I
> guess that generalizes to MSS/(N-1)?)

Nagle is designed to aggregate traffic and minimize header overhead
for telnet-style connections. 

> That being the case, should Nagle be disabled, or would it be better to
> have the application tell the transport that there will be no more data
> coming for a while - "fflush" for a TCP connection? 

Might be better - if it existed. The PSH bit does this, somewhat.
The problem is that PSH is overloaded with "end of socket write", 
in BSD.

> Even with Nagle on, I would think that pipelined requests "degenerate"
> into nothing worse than one at a time, with the added connection
> overhead, and sometimes have more than one at a time, so it is still
> goodness, right?

Nagle wants to delay things in the hope of future aggregation;
if there is nothing in the future, this causes delays...

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 12:26:24 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA29100 for tcp-impl-list; Mon, 12 Jan 1998 12:13:38 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA29092 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 12:13:36 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA07721
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 12:13:34 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id UAA11177; Mon, 12 Jan 1998 20:13:11 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0xrqc8-0005FsC; Mon, 12 Jan 98 20:37 GMT
Message-Id: <m0xrqc8-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: delayed ACKs
To: dab@bsdi.com (David Borman)
Date: Mon, 12 Jan 1998 20:37:38 +0000 (GMT)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199801121950.NAA00430@frantic.BSDI.COM> from "David Borman" at Jan 12, 98 01:50:19 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The case where the interaction between Nagle and delayed ACKs
> becomes a problem is when the receiver can't do anything until
> it gets the data that hasn't been sent yet, deferred by the Nagle
> algorithm.  The Nagle code is waiting for the ACK, and there is
> no return data on which to piggy-back the delayed ACK, so you
> wait for the delayed ACK timer to go off.

However, 4.4 BSD explicitly checks and does not delay an ack for a
smaller-than-MTU-sized frame. So this appears to be less of a problem
anyway, if I'm reading my copy of Stevens right.

Alan


From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 12:37:38 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA03940 for tcp-impl-list; Mon, 12 Jan 1998 12:28:33 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA03905 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 12:28:24 -0800
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA12176
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 12:28:23 -0800
	env-from (braden@ISI.EDU)
From: braden@ISI.EDU
Received: from can.isi.edu (can.isi.edu [128.9.160.148])
	by zephyr.isi.edu (8.8.7/8.8.6) with SMTP id MAA15174;
	Mon, 12 Jan 1998 12:28:18 -0800 (PST)
Date: Mon, 12 Jan 98 12:27:52 PST
Posted-Date: Mon, 12 Jan 98 12:27:52 PST
Message-Id: <9801122027.AA03361@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA03361>; Mon, 12 Jan 98 12:27:52 PST
To: frystyk@w3.org, jmb@freebsd.org
Subject: Re: delayed ACKs
Cc: lawrence@agranat.com, aron@cs.rice.edu, tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


  *> 
  *> 	HTTP may be an excellent reason to revive T/TCP.

Revive?  More like, breathe life into... :-)

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 13:04:28 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA12210 for tcp-impl-list; Mon, 12 Jan 1998 12:51:10 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA12183 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 12:51:04 -0800
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA18990
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 12:51:03 -0800
	env-from (Jerry.Chu@eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id MAA07248; Mon, 12 Jan 1998 12:51:02 -0800
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id MAA06547; Mon, 12 Jan 1998 12:48:54 -0800
Received: from taipei.eng.sun.com (taipei [129.146.86.158])
	by jurassic.eng.sun.com (8.8.8+Sun+sa+re+hr/8.8.8) with SMTP id MAA00389;
	Mon, 12 Jan 1998 12:45:26 -0800 (PST)
Received: by taipei.eng.sun.com (SMI-8.6/SMI-SVR4)
	id MAA15051; Mon, 12 Jan 1998 12:44:01 -0800
Date: Mon, 12 Jan 1998 12:44:01 -0800
From: Jerry.Chu@eng.Sun.COM (Hsiao-keng Jerry Chu)
Message-Id: <199801122044.MAA15051@taipei.eng.sun.com>
To: aron@cs.rice.edu
Subject: Re: delayed ACKs
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>I'm not aware of any work that demonstrates how important is
>ii). At least reducing the ACKs in the network doesn't appear to provide much
>benefit as ACKs are typically quite small in size as compared to data segments.

It's not so much of a saving in network bandwidth as in host processing.
We have data showing delayed-ack significantly reduces the sender's CPU
time over a high-speed LAN environment due to fewer interrupts.

>i) The slow-start and congestion avoidance at the sender get slower as these
>   do not increase the window based on the bytes acknowledged, but rather on
>   the number of ACKs received (which are halved).

This can be remedied by increasing the cwnd according to bytes-acked, not
# of acks, which I proposed a year ago, but it wasn't received well...
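
A minimal sketch of the difference, assuming a 1460-byte segment and a
receiver that ACKs every second full segment. The function name and the
MSS value are illustrative only and are not taken from any stack; the
byte-counted variant here is uncapped, which a real implementation
would probably not want.

```python
MSS = 1460  # assumed segment size in bytes


def slow_start_growth(acks, per_byte):
    """Return cwnd (bytes) after slow start processes a list of ACKs.

    acks: bytes newly acknowledged by each arriving ACK.
    per_byte: True grows cwnd by the bytes acked,
              False grows cwnd by one MSS per ACK received.
    """
    cwnd = MSS
    for acked in acks:
        cwnd += acked if per_byte else MSS
    return cwnd


# Delayed ACKs: each of the four ACKs covers two full segments.
delayed = [2 * MSS] * 4
print(slow_start_growth(delayed, per_byte=False) // MSS)  # 5
print(slow_start_growth(delayed, per_byte=True) // MSS)   # 9
```

With ACK counting the window reaches 5 segments after eight segments of
data are acknowledged; with byte counting it reaches 9, the same as if
every segment had been ACKed individually.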

Jerry Chu
Internet Engineering
SunSoft

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 13:32:48 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA19982 for tcp-impl-list; Mon, 12 Jan 1998 13:20:00 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA19955 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 13:19:51 -0800
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA28135
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 13:19:50 -0800
	env-from (touch@ISI.EDU)
From: touch@ISI.EDU
Received: from rum.isi.edu (rum.isi.edu [128.9.192.237])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id NAA03746;
	Mon, 12 Jan 1998 13:19:49 -0800 (PST)
Date: Mon, 12 Jan 1998 21:19:48 GMT
Posted-Date: Mon, 12 Jan 1998 21:19:48 GMT
Message-Id: <199801122119.VAA04946@rum.isi.edu>
Received: by rum.isi.edu (SMI-8.6/4.0.3-4)
	id <VAA04946>; Mon, 12 Jan 1998 21:19:48 GMT
To: aron@cs.rice.edu, Jerry.Chu@Eng.Sun.COM
Subject: Re: delayed ACKs
Cc: tcp-impl@cthulhu.engr.sgi.com
X-Sun-Charset: US-ASCII
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> >i) The slow-start and congestion avoidance at the sender get slower as these
> >   do not increase the window based on the bytes acknowledged, but rather on
> >   the number of ACKs received (which are halved).
> 
> This can be remedied by increasing the cwnd according to bytes-acked, not
> # of acks, which I proposed a year ago, but it wasn't received well...
> 
> Jerry Chu
> Internet Engineering

I don't know if we ever got back to these, but there are two
related issues:

	- CWND increase by # bytes ACK'd only (as above)

	- delayed-ACK when any two segments received
		not wait for two full segments


The general notion is that there are conflicting uses of the
following terms:

	- segment

	- full segment

Both above fall into this category (increase by full segment, not actual;
wait for full segments, not actual). Are there any others??

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 14:58:05 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA18154 for tcp-impl-list; Mon, 12 Jan 1998 14:47:47 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA18136 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 14:47:41 -0800
Received: from brookfield.ans.net (brookfield-ef0.brookfield.ans.net [204.148.1.20]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA27017
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 14:47:40 -0800
	env-from (curtis@brookfield.ans.net)
Received: from brookfield.ans.net (localhost.brookfield.ans.net [127.0.0.1])
	by brookfield.ans.net (8.8.5/8.8.5) with ESMTP id RAA06621;
	Mon, 12 Jan 1998 17:47:17 -0500 (EST)
Message-Id: <199801122247.RAA06621@brookfield.ans.net>
To: Mohit Aron <aron@cs.rice.edu>
cc: jlinares@mat.upc.es (Jaume Linares), tcp-impl@cthulhu.engr.sgi.com
Reply-To: curtis@ans.net
Subject: Re: delayed ACKs 
In-reply-to: Your message of "Mon, 12 Jan 1998 11:53:37 CST."
             <199801121753.LAA14761@cs.rice.edu> 
Date: Mon, 12 Jan 1998 17:47:17 -0500
From: Curtis Villamizar <curtis@brookfield.ans.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In message <199801121753.LAA14761@cs.rice.edu>, Mohit Aron writes:
> 
> > 	Delayed acks could be interesting when there is an asymmetric link and
> > the bandwidth in the reverse direction is smaller than the forward
> > direction (for example in wireless cable modem networks, direct broadcast
> > satellite networks and Asymmetric Digital Subscriber Loop (ADSL)). In
> > that case there is a limitation of the throughput achievable due to the
> > transmission of acks, and it's necessary to decrease the amount of acks.
> > See paper "The Effects of Asymmetry on TCP Performance" at
> > http://daedalus.cs.berkeley.edu/
> 
> Yes, but in such a situation, one would go for using some sort of 'ack
> filtering' method (also suggested in the paper). If ACKs would be filtered,
> then whether or not the receiver delays ACKs is anyway not a concern.
> 
> - Mohit


This sort of filtering isn't even close to feasible at high speed so
it makes much more sense for the end systems to do it.  End systems do
TCP.  Routers in the middle forward IP packets and try to avoid
looking inside them.

Curtis


From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 15:23:01 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA26000 for tcp-impl-list; Mon, 12 Jan 1998 15:09:43 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA25987 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 15:09:42 -0800
Received: from scanner.worldgate.com (scanner.worldgate.com [198.161.84.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA04540
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 15:09:36 -0800
	env-from (marcs@znep.com)
Received: from znep.com (uucp@localhost)
	by scanner.worldgate.com (8.8.7/8.8.7) with UUCP id QAA03369;
	Mon, 12 Jan 1998 16:08:04 -0700 (MST)
Received: from localhost (marcs@localhost) by alive.znep.com (8.7.5/8.7.3) with SMTP id QAA05519; Mon, 12 Jan 1998 16:06:35 -0700 (MST)
Date: Mon, 12 Jan 1998 16:06:35 -0700 (MST)
From: Marc Slemko <marcs@znep.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: delayed ACKs
In-Reply-To: <m0xrqc8-0005FsC@lightning.swansea.linux.org.uk>
Message-ID: <Pine.BSF.3.95.980112155909.5045A-100000@alive.znep.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Mon, 12 Jan 1998, Alan Cox wrote:

> > The case where the interaction between Nagle and delayed ACKs
> > becomes a problem is when the receiver can't do anything until
> > it gets the data that hasn't been sent yet, deferred by the Nagle
> > algorithm.  The Nagle code is waiting for the ACK, and there is
> > no return data on which to piggy-back the delayed ACK, so you
> > wait for the delayed ACK timer to go off.
> 
> However 4.4 BSD explicitly checks and does not delay an ack for a 
> smaller than MTU sized frame. So this appears less of a problem anyway
> if I'm reading my copy of Stevens right

I don't know exactly what you are looking at, but unless I am missing
something, it doesn't behave that way for me...

In fact, just the other day, I ran into a cute problem doing some simple
web server benchmarking.  I was getting 5 requests/sec.  That number alone
made me suspicious; 200ms*5 = 1s.

What was happening is that a BSD bug (cf. TCP/IP Illustrated vol. 3, section
14.11) was causing the client (FreeBSD 2.2) to put a write() of between..
erm...  101 and 208 bytes into two mbufs instead of an mbuf cluster.  This means it was
sent in two packets.  Slow start wasn't the issue since I was using
persistent connections, but Nagle on the client combined with delayed ack
from the server was causing the client to delay sending the second packet
in the request until the ack arrived for the first packet, 200ms later. 
Hence almost exactly 5 requests/sec. 

The server was also FreeBSD 2.2; unless something has changed from 4.4BSD
there, it does delay acks for frames smaller than the MTU. 

I guess I can be thankful that most HTTP requests are more bloated than
208 bytes or the client could disable Nagle to work around it.  Slow-start
still could pose a problem though.
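
For reference, a rough sketch of the arithmetic and of the Nagle
workaround, written against Python's standard socket module rather than
the benchmark client actually used. The 200 ms figure is the delayed-ACK
timer reported above; the helper name is made up for illustration.

```python
import socket

DELAYED_ACK_TIMER = 0.2  # seconds, the 200 ms delay reported above

# If every request stalls for one delayed-ACK timeout, this is the ceiling.
requests_per_second = 1 / DELAYED_ACK_TIMER


def make_nodelay_socket():
    """Create a TCP socket with the Nagle algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


sock = make_nodelay_socket()
# A non-zero value means small writes are sent without waiting for an ACK.
nodelay_on = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

With TCP_NODELAY set, the second small packet of the request no longer
waits for the ACK of the first, so the server's delayed ACK stops being
the bottleneck.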


From owner-tcp-impl@relay.engr.sgi.com  Mon Jan 12 16:29:07 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA20759 for tcp-impl-list; Mon, 12 Jan 1998 16:17:52 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA20750 for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 16:17:50 -0800
Received: from mailgate2.aist-nara.ac.jp (fse4.aist-nara.ac.jp [163.221.76.12]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA24013
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 12 Jan 1998 16:17:47 -0800
	env-from (yukio-m@is.aist-nara.ac.jp)
Received: from sgi054.aist-nara.ac.jp (sgi054.aist-nara.ac.jp [163.221.74.141]) by mailgate2.aist-nara.ac.jp (8.8.5+2.7Wbeta5/3.5Wpl5/NAIST/GATE-2.2) with ESMTP id JAA09875 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 13 Jan 1998 09:17:42 +0900 (JST)
Received: from localhost (localhost [127.0.0.1]) by sgi054.aist-nara.ac.jp (8.8.4+2.7Wbeta4/3.5Wpl5/NAIST/2.0) with ESMTP id AAA23432 for tcp-impl@cthulhu.engr.sgi.com; Tue, 13 Jan 1998 00:17:37 GMT
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: delayed ACKs
In-Reply-To: Your message of "Mon, 12 Jan 1998 10:00:43 -0600 (CST)"
References: <199801121600.KAA11396@cs.rice.edu>
X-Mailer: Mew version 1.70 on Emacs 19.28.551 / Mule 2.3
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Tue, 13 Jan 1998 09:17:36 +0900
Message-ID: <23429.884650656@sgi054.aist-nara.ac.jp>
From: Yukio Murayama (=?ISO-2022-JP?B?GyRCQjw7Mzh4Sl0bKEI=?= ) <yukio-m@is.aist-nara.ac.jp>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi,

I have been working on TCP problems caused by the interaction between
sender-side controls (Nagle's algorithm, slow start, path MTU discovery)
and a receiver-side control (delayed ACK).

http://shika.aist-nara.ac.jp/member/yukio-m/papers/icoin12.ps

I think delayed ACK is a very useful mechanism for reducing
traffic and CPU load.

// Yukio Murayama yukio-m@is.aist-nara.ac.jp
// Graduate School of Information Science,
// Nara Institute of Science and Technology.

From owner-tcp-impl@relay.engr.sgi.com  Tue Jan 13 16:22:05 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA10130 for tcp-impl-list; Tue, 13 Jan 1998 16:10:30 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA10063 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 13 Jan 1998 16:10:24 -0800
Received: from enterprise.hybrid.com (enterprise.hybrid.com [166.117.10.2]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA10869
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 13 Jan 1998 16:10:23 -0800
	env-from (subir@hybrid.com)
Received: by enterprise.hybrid.com with SMTP (Microsoft Exchange Server Internet Mail Connector Version 4.0.995.52)
	id <01BD203C.F97C2030@enterprise.hybrid.com>; Tue, 13 Jan 1998 16:04:49 -0800
Message-ID: <c=US%a=_%p=hybrid%l=ENTERPRISE-980114000448Z-10194@enterprise.hybrid.com>
From: Subir Varma <subir@hybrid.com>
To: "'Mohit Aron'" <aron@cs.rice.edu>, "'curtis@ans.net'" <curtis@ans.net>
Cc: "'jlinares@mat.upc.es'" <jlinares@mat.upc.es>,
        "'tcp-impl@cthulhu.engr.sgi.com'" <tcp-impl@cthulhu.engr.sgi.com>
Subject: RE: delayed ACKs 
Date: Tue, 13 Jan 1998 16:04:48 -0800
X-Mailer:  Microsoft Exchange Server Internet Mail Connector Version 4.0.995.52
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hybrid Networks has been incorporating an algorithm that does ACK
filtering over slow upstream links into its cable and wireless modems
for at least two years. From this point of view we are neutral on the
subject of whether delayed ACKs are implemented in the end system or
not, even in highly asymmetric systems.

In general, trying to delay ACKs in order to combat asymmetry is more
difficult in the end system, since the amount of ACK delay that needs
to be introduced depends upon the specific upstream and downstream link
speeds and the resulting asymmetry factor, and this information is more
readily available at the modem, as compared to the end system.

Subir Varma
Hybrid Networks

>----------
>From: 	Curtis Villamizar[SMTP:curtis@brookfield.ans.net]
>Sent: 	Monday, January 12, 1998 11:47 PM
>To: 	Mohit Aron
>Cc: 	jlinares@mat.upc.es; tcp-impl@cthulhu.engr.sgi.com
>Subject: 	Re: delayed ACKs 
>
>
>In message <199801121753.LAA14761@cs.rice.edu>, Mohit Aron writes:
>> 
>> > 	Delayed acks could be interesting when there is an asymmetric link and
>> > the bandwidth in the reverse direction is smaller than the forward
>> > direction (for example in wireless cable modem networks, direct broadcast
>> > satellite networks and Asymmetric Digital Subscriber Loop (ADSL)). In
>> > that case there is a limitation of the throughput achievable due to the
>> > transmission of acks, and it's necessary to decrease the amount of acks.
>> > See paper "The Effects of Asymmetry on TCP Performance" at
>> > http://daedalus.cs.berkeley.edu/
>> 
>> Yes, but in such a situation, one would go for using some sort of 'ack
>> filtering' method (also suggested in the paper). If ACKs would be filtered,
>> then whether or not the receiver delays ACKs is anyway not a concern.
>> 
>> - Mohit
>
>
>This sort of filtering isn't even close to feasible at high speed so
>it makes much more sense for the end systems to do it.  End systems do
>TCP.  Routers in the middle forward IP packets and try to avoid
>looking inside them.
>
>Curtis
>
>

From owner-tcp-impl@relay.engr.sgi.com  Wed Jan 14 08:51:25 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id IAA15385 for tcp-impl-list; Wed, 14 Jan 1998 08:38:06 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id IAA15369 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 14 Jan 1998 08:38:01 -0800
Received: from diable.upc.es (diable.upc.es [147.83.98.7]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id IAA01248
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 14 Jan 1998 08:37:19 -0800
	env-from (jlinares@mat.upc.es)
Received: from mat.upc.es (mat.upc.es [147.83.39.3])
	by diable.upc.es (8.8.6/8.8.6) with ESMTP id RAA16100;
	Wed, 14 Jan 1998 17:34:13 +0100 (MET)
Received: from maite123.upc.es (maite123 [147.83.39.123]) by mat.upc.es (8.7.6/8.7.3) with SMTP id RAA23318; Wed, 14 Jan 1998 17:32:23 GMT
Received: from maite123 by maite123.upc.es (SMI-8.6/SMI-SVR4)
	id RAA04367; Wed, 14 Jan 1998 17:39:30 GMT
Message-ID: <34BCF852.744B@mat.upc.es>
Date: Wed, 14 Jan 1998 17:39:30 +0000
From: Jaume Linares <jlinares@mat.upc.es>
X-Mailer: Mozilla 3.0 (X11; I; SunOS 5.5.1 sun4m)
MIME-Version: 1.0
To: Subir Varma <subir@hybrid.com>
CC: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: delayed ACKs
References: <c=US%a=_%p=hybrid%l=ENTERPRISE-980114000448Z-10194@enterprise.hybrid.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Possible advantages of delayed acks over ack filtering:

1) There is no unnecessary transmission of acks (spending resources)
that are going to be filtered at the router connected to the asymmetric
link.

2) If we consider that the dupacks are not delayed in the end system,
there won't be problems with fast retransmit/fast recovery algorithms,
which are triggered by 3 dupacks. Filtering the dupacks at the
router may keep the sender from reaching the dupack threshold.
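
Point 2 can be illustrated with a deliberately naive filter that keeps
only the newest ACK waiting in the upstream queue. This is a hypothetical
model for the sake of the argument, not the algorithm used by any
particular modem or router.

```python
DUPACK_THRESHOLD = 3  # fast retransmit trigger mentioned above


def naive_ack_filter(queued_acks):
    """Keep only the newest queued ACK, since cumulative ACKs supersede."""
    return queued_acks[-1:]


def count_dupacks(acks, last_acked):
    """Count arriving ACKs that repeat an already acknowledged number."""
    return sum(1 for ack in acks if ack == last_acked)


# The sender has already seen ACK 1000 once; three dupacks are queued
# behind the slow reverse link.
queued = [1000, 1000, 1000]
unfiltered = count_dupacks(queued, last_acked=1000)
filtered = count_dupacks(naive_ack_filter(queued), last_acked=1000)
print(unfiltered >= DUPACK_THRESHOLD)  # True
print(filtered >= DUPACK_THRESHOLD)    # False
```

In this model the sender sees one dupack instead of three, so fast
retransmit is never triggered and recovery falls back to a timeout.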

Both algorithms cause sender burstiness. How do you solve that problem?

Thanks in advance.

Jaume



Subir Varma wrote:
> 
> Hybrid Networks has been incorporating an algorithm that does ACK
> filtering over slow upstream links, into its cable and wireless
> modems for at least two years. From this point of view we are neutral on
> the subject of whether delayed ACKs
> are implemented in the end system or not, even in highly asymmetric
> systems.
> In general, trying to delay ACKs in order to combat asymmetry is more
> difficult in the end system, since the
> amount of ACK delay that needs to be introduced depends upon the
> specific upstream and downstream link speeds and the resulting asymmetry
> factor, and this
> information is more readily available at the modem, as compared to the
> end system.
> 
> Subir Varma
> Hybrid Networks

From owner-tcp-impl@relay.engr.sgi.com  Fri Jan 16 15:00:26 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id OAA18112 for tcp-impl-list; Fri, 16 Jan 1998 14:49:19 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id OAA18093 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 16 Jan 1998 14:49:13 -0800
Received: from desolation.CS.Berkeley.EDU (desolation.CS.Berkeley.EDU [128.32.33.142]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id OAA05611
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 16 Jan 1998 14:49:12 -0800
	env-from (padmanab@desolation.CS.Berkeley.EDU)
Received: from desolation.CS.Berkeley.EDU (padmanab@localhost) by desolation.CS.Berkeley.EDU (8.8.3/8.8.2) with ESMTP id OAA04216; Fri, 16 Jan 1998 14:45:17 -0800 (PST)
Message-Id: <199801162245.OAA04216@desolation.CS.Berkeley.EDU>
X-Mailer: exmh version 1.6.9 8/22/96
X-Face: 8oz'i+bl`|5PbRnbf:lhb^%e[KkX6s2O+~WXUjjyZy3<eONU1x6ko7NU'ZWaahSA9@z67tI
 imt6ht.__nYj:ufI1]z(DMC4k*hEJO=Y3iihd[[ZDHRV<%<Gl,tJqvQ`2h*[FyU&7F=>Ew*s3R1@]D
 {~a]r4V]),Mlwru>UYa+!f7aeLD3),v{_U3S*(e/Os}3N7*+U+#;5\W0!-U+zs&>c/Gb2FH/|KZ*Li
 eMcCH0X~${-18~JhYDf3Dc}H1,F<V
X-url: http://www.cs.berkeley.edu/~padmanab
Reply-To: Venkat Padmanabhan <padmanab@cs.berkeley.edu>
From: Venkat Padmanabhan <padmanab@cs.berkeley.edu>
To: Jaume Linares <jlinares@mat.upc.es>
cc: Subir Varma <subir@hybrid.com>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: delayed ACKs 
In-reply-to: Your message of "Wed, 14 Jan 1998 17:39:30 GMT."
             <34BCF852.744B@mat.upc.es> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 16 Jan 1998 14:45:17 -0800
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> Possible advantages of delayed acks over ack filtering:
> 
> 1) There is no unnecessary transmission of acks (spending resources)
> that are going to be filtered at the router connected to the asymmetric
> link.
> 
> 2) If we consider that the dupacks are not delayed in the end system,
> there won't be problems with fast retransmit/fast recovery algorithms,
> which are triggered by 3 dupacks. Filtering the dupacks at the
> router may keep the sender from reaching the dupack threshold.
> 
> Both algorithms cause sender burstiness. How do you solve that problem?
> 

A few of us at Berkeley (Hari Balakrishnan, Randy Katz and myself)
investigated this problem and several solutions using a wireless cable 
modem network from Hybrid, Inc. We considered ack filtering as
well as a generalized version of delayed acks (which we called 
"ack congestion control"). The idea was that ack congestion control
could be triggered by RED-like marking of ack packets, which then
gets reflected back to the receiver in later data packets.

We also considered the problem of source burstiness and designed a
simple timer-based scheme to break up potentially large bursts into
smaller ones. We have done an implementation of this in the BSD/OS 3.0
kernel, and it works quite well.

In our investigations, we also found that the same techniques help
improve performance in packet-radio networks such as the Ricochet
network from Metricom, Inc.

These techniques, together with simulation results, are
described in our Mobicom '97 paper, which is available from:

  http://http.cs.berkeley.edu/~padmanab/papers/mobicom97.ps
  http://http.cs.berkeley.edu/~padmanab/papers/mobicom97.ps.gz

-Venkat




From owner-tcp-impl@relay.engr.sgi.com  Tue Jan 27 11:58:28 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA22233 for tcp-impl-list; Tue, 27 Jan 1998 11:51:34 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA22200 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 27 Jan 1998 11:51:28 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA01643
	for <tcp-impl@relay.engr.sgi.com>; Tue, 27 Jan 1998 11:51:27 -0800
	env-from (aron@cs.rice.edu)
Received: (from aron@localhost)
          by cs.rice.edu (8.8.5/8.8.4)
	  id NAA00054 for tcp-impl@relay.engr.sgi.com; Tue, 27 Jan 1998 13:51:26 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Message-Id: <199801271951.NAA00054@cs.rice.edu>
Subject: New Reno
To: tcp-impl@cthulhu.engr.sgi.com (TCP Implementor's List)
Date: Tue, 27 Jan 1998 13:51:26 -0600 (CST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi,
	I have a couple of questions regarding New-Reno.

a) Does TCP New-Reno incorporate the initial ssthresh prediction mentioned in
   [1] ? 

b) In recovering from multiple-packet losses, two different schemes have been
   suggested in [1] and [2]. The method in [2] recovers N losses in a window
   in N RTTs. The method in [1] recovers losses faster using slow-start and 
   takes less time than N RTTs. Which of these schemes has been considered
   appropriate for New-Reno ?


- Mohit



1. J. C. Hoe. Improving the Start-up behaviour of a Congestion Control Scheme
   for TCP. In Proceedings of SIGCOMM '96.

2. J. C. Hoe. Startup Dynamics of TCP's Congestion Control and Avoidance 
   Schemes. Master's Thesis, MIT, 1995.

From owner-tcp-impl@relay.engr.sgi.com  Thu Jan 29 17:46:26 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA29555 for tcp-impl-list; Thu, 29 Jan 1998 17:38:28 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA29490 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 29 Jan 1998 17:38:19 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA17404
	for <tcp-impl@relay.engr.sgi.com>; Thu, 29 Jan 1998 17:38:17 -0800
	env-from (aron@cs.rice.edu)
Received: from mrsclaus.cs.rice.edu (mrsclaus.cs.rice.edu [128.42.1.108])
          by cs.rice.edu (8.8.5/8.8.4) with ESMTP
	  id TAA15587 for <tcp-impl@relay.engr.sgi.com>; Thu, 29 Jan 1998 19:38:10 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Received: (from aron@localhost) by mrsclaus.cs.rice.edu (8.8.5/8.7.3) id TAA15599 for tcp-impl@relay.engr.sgi.com; Thu, 29 Jan 1998 19:38:09 -0600 (CST)
Message-Id: <199801300138.TAA15599@mrsclaus.cs.rice.edu>
Subject: TCP problem
To: tcp-impl@cthulhu.engr.sgi.com (TCP Implementor's List)
Date: Thu, 29 Jan 1998 19:38:09 -0600 (CST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi,
	current BSD implementations set the ssthresh value to half the value of
the congestion window upon a timeout. Suppose a timeout happens due to a
retransmitted segment getting lost (or all its ACKs getting lost). As the
congestion window is normally inflated during fast recovery, the ssthresh value
upon a timeout will be set to 1/2 the inflated congestion window.  In
particular, if the retransmitted segment gets lost, then the congestion window
will keep getting inflated (due to duplicate ACKs) during fast recovery till it
hits the advertised window. The ssthresh would then be set to 1/2 the
advertised window upon a timeout!

I think implementations should check whether TCP is in fast recovery and 
then set the ssthresh accordingly upon a timeout. A timeout in fast recovery
should simply halve the ssthresh value IMHO.
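
A small sketch of the two rules, under stated assumptions: a 1460-byte
segment, a two-segment floor on ssthresh, and a made-up function name.
It models the behaviour described above, not the actual BSD source.

```python
MSS = 1460  # assumed segment size in bytes


def ssthresh_on_timeout(cwnd, adv_wnd, ssthresh, in_fast_recovery,
                        check_recovery):
    """Return the ssthresh (bytes) chosen at a retransmission timeout."""
    floor = 2 * MSS  # assumed lower bound of two segments
    if check_recovery and in_fast_recovery:
        # Proposed rule: ignore the inflated cwnd and halve ssthresh.
        return max(ssthresh // 2, floor)
    # Described rule: half of the current (possibly inflated) window.
    return max(min(cwnd, adv_wnd) // 2, floor)


# cwnd was 16 segments at the loss, so fast retransmit set ssthresh to 8.
# Duplicate ACKs then inflated cwnd up to a 32-segment advertised window.
adv_wnd = 32 * MSS
inflated_cwnd = 32 * MSS
old = ssthresh_on_timeout(inflated_cwnd, adv_wnd, 8 * MSS, True, False)
new = ssthresh_on_timeout(inflated_cwnd, adv_wnd, 8 * MSS, True, True)
print(old // MSS, new // MSS)  # 16 4
```

In this example the unchecked rule leaves ssthresh at 16 segments, equal
to the window in use when the loss occurred, while the checked rule
brings it down to 4.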



- Mohit Aron
  aron@cs.rice.edu

From owner-tcp-impl@relay.engr.sgi.com  Mon Feb  2 09:54:48 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA05549 for tcp-impl-list; Mon, 2 Feb 1998 09:45:01 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA05538; Mon, 2 Feb 1998 09:44:59 -0800
Received: from bossette.engr.sgi.com (tree.engr.sgi.com [150.166.61.12]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA05974; Mon, 2 Feb 1998 09:44:58 -0800
	env-from (sm@bossette.engr.sgi.com)
Received: (from sm@localhost) by bossette.engr.sgi.com (971110.SGI.8.8.8/970903.SGI.AUTOCF) id JAA68051; Mon, 2 Feb 1998 09:44:52 -0800 (PST)
From: sm@bossette.engr.sgi.com (Sam Manthorpe)
Message-Id: <199802021744.JAA68051@bossette.engr.sgi.com>
Subject: Re: TCP problem
In-Reply-To: <199801300138.TAA15599@mrsclaus.cs.rice.edu> from Mohit Aron at "Jan 29, 98 07:38:09 pm"
To: aron@cs.rice.edu (Mohit Aron)
Date: Mon, 2 Feb 1998 09:44:51 -0800 (PST)
Cc: tcp-impl@cthulhu.engr.sgi.com
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hi,

> 	current BSD implementations set the ssthresh value to half the value of
> the congestion window upon a timeout. Suppose a timeout happens due to a
> retransmitted segment getting lost (or all its ACKs getting lost). As the
> congestion window is normally inflated during fast recovery, the ssthresh value
> upon a timeout will be set to 1/2 the inflated congestion window.  
> In particular, if the retransmitted segment gets lost, then the 
> congestion window will keep getting inflated (due to duplicate ACKs) 
> during fast recovery till it hits the advertised window. The ssthresh 
> would then be set to 1/2 the advertised window upon a timeout!

The retransmission timer should be switched off once the fast-retransmit
phase is entered (e.g. line 869 of netinet/tcp_input.c of 4.4BSD), so
this problem should not occur, as far as I can see.

Sam.

------------------------------------------------------------
Sam Manthorpe, SGI.  tel: (650) 933-2856 fax: (650) 932-1788

From owner-tcp-impl@relay.engr.sgi.com  Mon Feb  2 10:03:13 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA11392 for tcp-impl-list; Mon, 2 Feb 1998 09:59:52 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA11375; Mon, 2 Feb 1998 09:59:50 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA11302; Mon, 2 Feb 1998 09:59:49 -0800
	env-from (aron@cs.rice.edu)
Received: from mrsclaus.cs.rice.edu (mrsclaus.cs.rice.edu [128.42.1.108])
          by cs.rice.edu (8.8.5/8.8.4) with ESMTP
	  id LAA10810; Mon, 2 Feb 1998 11:59:48 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Received: (from aron@localhost) by mrsclaus.cs.rice.edu (8.8.5/8.7.3) id LAA23514; Mon, 2 Feb 1998 11:59:47 -0600 (CST)
Message-Id: <199802021759.LAA23514@mrsclaus.cs.rice.edu>
Subject: Re: TCP problem
To: sm@bossette.engr.sgi.com (Sam Manthorpe)
Date: Mon, 2 Feb 1998 11:59:46 -0600 (CST)
Cc: aron@cs.rice.edu, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199802021744.JAA68051@bossette.engr.sgi.com> from "Sam Manthorpe" at Feb 2, 98 09:44:51 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> The retransmission timer should be switched off once the fast-retransmit
> phase is entered (e.g. line 869 of netinet/tcp_input.c of 4.4BSD), so
> this problem should not occur, as far as I can see.
> 


This is incorrect. What if the retransmitted packet as well as all other
ACKs in transit get lost ? There'll be no timer to tell TCP about this
situation and the connection would remain in this state forever.



- Mohit Aron
  aron@cs.rice.edu

From owner-tcp-impl@relay.engr.sgi.com  Mon Feb  2 10:47:38 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA24736 for tcp-impl-list; Mon, 2 Feb 1998 10:37:30 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA24725; Mon, 2 Feb 1998 10:37:28 -0800
Received: from bossette.engr.sgi.com (tree.engr.sgi.com [150.166.61.12]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA24909; Mon, 2 Feb 1998 10:37:27 -0800
	env-from (sm@bossette.engr.sgi.com)
Received: (from sm@localhost) by bossette.engr.sgi.com (971110.SGI.8.8.8/970903.SGI.AUTOCF) id KAA68786; Mon, 2 Feb 1998 10:37:26 -0800 (PST)
From: sm@bossette.engr.sgi.com (Sam Manthorpe)
Message-Id: <199802021837.KAA68786@bossette.engr.sgi.com>
Subject: Re: TCP problem
In-Reply-To: <199802021759.LAA23514@mrsclaus.cs.rice.edu> from Mohit Aron at "Feb 2, 98 11:59:46 am"
To: aron@cs.rice.edu (Mohit Aron)
Date: Mon, 2 Feb 1998 10:37:25 -0800 (PST)
Cc: aron@cs.rice.edu, tcp-impl@cthulhu.engr.sgi.com
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hi,

> > The retransmission timer should be switched off once the fast-retransmit
> > phase is entered (e.g. line 869 of netinet/tcp_input.c of 4.4BSD), so
> > this problem should not occur, as far as I can see.
> > 
> This is incorrect. What if the retransmitted packet as well as all other
> ACKs in transit get lost ? There'll be no timer to tell TCP about this
> situation and the connection would remain in this state forever.

Yes, you're right, my mistake.

Sam.

------------------------------------------------------------
Sam Manthorpe, SGI.  tel: (650) 933-2856 fax: (650) 932-1788

From owner-tcp-impl@relay.engr.sgi.com  Tue Feb  3 03:16:05 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id DAA15191 for tcp-impl-list; Tue, 3 Feb 1998 03:07:24 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id DAA15166 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 3 Feb 1998 03:07:18 -0800
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id DAA21033
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 3 Feb 1998 03:06:05 -0800
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id QAA04722; Tue, 3 Feb 1998 16:27:52 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA23371; Tue, 3 Feb 98 16:26:55+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id QAA19938;
	Tue, 3 Feb 1998 16:30:26 GMT
Date: Tue, 3 Feb 1998 16:30:26 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: Mohit Aron <aron@cs.rice.edu>
Cc: "TCP Implementor's List" <tcp-impl@cthulhu.engr.sgi.com>,
        "K.N.S.Reddy" <reddy@protocol.ece.iisc.ernet.in>
Subject: Re: TCP problem
In-Reply-To: <199801300138.TAA15599@mrsclaus.cs.rice.edu>
Message-Id: <Pine.LNX.3.95.980203162034.18943B-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


On Thu, 29 Jan 1998, Mohit Aron wrote:

*>Hi,
*>	current BSD implementations set the ssthresh value to half the value of
*>the congestion window upon a timeout. Suppose a timeout happens due to a
*>retransmitted segment getting lost (or all its ACKs getting lost). As the
*>congestion window is normally inflated during fast recovery, the ssthresh value
*>upon a timeout will be set to 1/2 the inflated congestion window.  In
*>particular, if the retransmitted segment gets lost, then the congestion window
*>will keep getting inflated (due to duplicate ACKs) during fast recovery till it

If this were the case (the congestion window getting inflated), then the
network is capable of taking packets (the dupacks tell me this), and it is
only a transient that the retransmitted packet got lost. So we can still
continue in fast recovery mode. Maybe one can retransmit the retransmitted
packet even after the timeout, if one still sees the congestion window
getting inflated. Did I miss anything, Mohit?

Chetan S


From owner-tcp-impl@relay.engr.sgi.com  Tue Feb  3 06:13:08 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA07537 for tcp-impl-list; Tue, 3 Feb 1998 05:55:28 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA07531 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 3 Feb 1998 05:55:26 -0800
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id FAA23518
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 3 Feb 1998 05:55:26 -0800
	env-from (aron@cs.rice.edu)
Received: (from aron@localhost)
          by cs.rice.edu (8.8.5/8.8.4)
	  id HAA02775; Tue, 3 Feb 1998 07:54:18 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Message-Id: <199802031354.HAA02775@cs.rice.edu>
Subject: Re: TCP problem
To: chetan@protocol.ece.iisc.ernet.in (Chetan Kumar)
Date: Tue, 3 Feb 1998 07:54:18 -0600 (CST)
Cc: tcp-impl@cthulhu.engr.sgi.com, reddy@protocol.ece.iisc.ernet.in
In-Reply-To: <Pine.LNX.3.95.980203162034.18943B-100000@protocol.ece.iisc.ernet.in> from "Chetan Kumar" at Feb 3, 98 04:30:26 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> If this were to be the case ( congestion window getting inflated) then
> network is capable of taking packets (dupacks tell me this ), and it only
> a transiant that re-tansmitted packet got lost. So we can still continue
> in fast recovery mode. May be one can retransmit the re-tansmitted packet
> even after the timeout, if one is still sees that congestion window
> getting inflated. Did I missed any thing Mohit ?
> 
> Chetan S
> 


It is generally accepted that the right behaviour after a timeout is to
cut the congestion window down to 1 segment and do a slow-start. My concern
primarily involves the setting of ssthresh in this situation.
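
The ssthresh concern can be made concrete with a small sketch. It assumes
the rule described earlier in the thread (ssthresh is set to half the
congestion window on a timeout) and counts in whole segments. The function
names and the deflate_first variant are hypothetical, introduced only for
illustration; they are not taken from any particular stack, and the receiver
window is ignored.

```python
# All quantities are in whole segments for readability.

def enter_fast_recovery(cwnd):
    """Third dup ACK: halve into ssthresh, then inflate cwnd by 3 segments."""
    ssthresh = max(cwnd // 2, 2)
    return ssthresh + 3, ssthresh


def on_timeout(cwnd, ssthresh, in_fast_recovery, deflate_first=False):
    """Retransmission timeout: return (new_cwnd, new_ssthresh).

    deflate_first is a hypothetical variant, not taken from any stack: it
    undoes the fast recovery inflation before halving.
    """
    if deflate_first and in_fast_recovery:
        cwnd = ssthresh
    return 1, max(cwnd // 2, 2)


cwnd, ssthresh = enter_fast_recovery(8)   # cwnd = 7, ssthresh = 4
cwnd += 10                                # ten more dup ACKs inflate cwnd to 17
plain = on_timeout(cwnd, ssthresh, True)
deflated = on_timeout(cwnd, ssthresh, True, deflate_first=True)
print(plain, deflated)
```

In this example the plain rule leaves ssthresh at 8 segments, the same window
that was in use before the first loss, so the earlier congestion signal is
effectively forgotten. The hypothetical variant ends up at 2 segments.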




- Mohit

From owner-tcp-impl@relay.engr.sgi.com  Tue Feb  3 18:31:46 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA03335 for tcp-impl-list; Tue, 3 Feb 1998 18:24:01 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA03319 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 3 Feb 1998 18:23:55 -0800
Received: from mail2.geo.net (mail2.geo.net [166.90.101.12]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id SAA21465
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 3 Feb 1998 18:23:55 -0800
	env-from (kwe@geo.net)
Message-Id: <199802040223.SAA21465@sgi.sgi.com>
Received: (qmail 12456 invoked from network); 4 Feb 1998 02:23:53 -0000
Received: from kent.geo.net (207.90.136.95)
  by mail2.geo.net with SMTP; 4 Feb 1998 02:23:53 -0000
X-Sender: kwe@zeus.geo.net
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0
Date: Tue, 03 Feb 1998 18:11:13 -0800
To: tcp-impl@cthulhu.engr.sgi.com
From: "Kent W. England" <kwe@geo.net>
Subject: Sitara
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Folks;

Everyone wants to do something about the world wide wait, including fiddling
with TCP. You may have heard of Sitara ( http://www.sitara.net ).

Sitara is selling shim software that is supposed to speed web page delivery
up to 3x by replacing the http/tcp code in the browser and server.

Some of the things they say they are doing include:

streamlined handshaking (by combining http-request with tcp-handshake?)
efficient loss recovery (resend lost segment only)
single, persistent connection (ala http 1.1?)
control of network congestion and data flow (replace tcp slow-start with ???)

I've been watching Sitara for a while and their client and server software
is now available for download. I was wondering if any of you had had a
chance to look at the things they are doing to see if they are
tcp-compliant or might be one of those "overly-aggressive" applications.

--Kent


From owner-tcp-impl@relay.engr.sgi.com  Wed Feb  4 00:01:33 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA25197 for tcp-impl-list; Tue, 3 Feb 1998 23:53:45 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA25186; Tue, 3 Feb 1998 23:53:44 -0800
Received: from bossette.engr.sgi.com (tree.engr.sgi.com [150.166.61.12]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA28266; Tue, 3 Feb 1998 23:53:43 -0800
	env-from (sm@bossette.engr.sgi.com)
Received: (from sm@localhost) by bossette.engr.sgi.com (971110.SGI.8.8.8/970903.SGI.AUTOCF) id XAA74884; Tue, 3 Feb 1998 23:53:37 -0800 (PST)
From: sm@bossette.engr.sgi.com (Sam Manthorpe)
Message-Id: <199802040753.XAA74884@bossette.engr.sgi.com>
Subject: Re: Sitara
In-Reply-To: <199802040223.SAA21465@sgi.sgi.com> from "Kent W. England" at "Feb 3, 98 06:11:13 pm"
To: kwe@geo.net (Kent W. England)
Date: Tue, 3 Feb 1998 23:53:36 -0800 (PST)
Cc: tcp-impl@cthulhu.engr.sgi.com
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hi,

> streamlined handshaking (by combining http-request with tcp-handshake?)

I guess they mean sending the first data packet together with the ACK of
the server's SYN-ACK (the third segment of the handshake).  This is allowed
by the spec but most systems don't do it.

> efficient loss recovery (resend lost segment only)

Do they mean SACK?

Sam.
------------------------------------------------------------
Sam Manthorpe, SGI.  tel: (650) 933-2856 fax: (650) 932-1788

From owner-tcp-impl@relay.engr.sgi.com  Tue Feb 10 16:50:09 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA20193 for tcp-impl-list; Tue, 10 Feb 1998 16:43:59 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA20180 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 10 Feb 1998 16:43:57 -0800
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id QAA15038
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 10 Feb 1998 16:41:02 -0800
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id LAA28143; Tue, 10 Feb 1998 11:48:42 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA26276; Tue, 10 Feb 98 11:48:40+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id LAA13529
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 10 Feb 1998 11:52:01 GMT
Date: Tue, 10 Feb 1998 11:52:01 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Doubt in fast recovery algo..
Message-Id: <Pine.LNX.3.95.980210111635.12735A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Greetings,

	
I.) In fast recovery, according to RFC 2001, after receiving 3 dupacks and
setting ssthresh to cwind/2 and inflating cwind to cwind+3, TCP output
will transmit the lost segment, and on the 4th dupack TCP output will
transmit with the new cwind, after incrementing the cwind. This will
continue.

Now my doubt is: if the retransmitted segment is lost, then we have to
count 3 dupacks and go back once again to step 1 of the fast recovery
algorithm, right?
 
II.) Next, once an ACK arrives that acknowledges new data, TCP will have to
set cwind to ssthresh, and then TCP should go to congestion avoidance.

The point to be noted here is that once congestion is indicated by 3
dupacks, we are not reducing the rate at which we increment cwind, although
cwind itself is reduced to half; the reason is that 3 dupacks tell us more
than just that a packet is lost. But on the other hand, when a new ACK
arrives, cwind is set to ssthresh and the rate of increase of cwind is
reduced to linear, only once per round-trip time. I do not see any
justification for this.
I would say not to reduce cwind to ssthresh, since the network is still
capable of accepting packets.
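
For reference, the steps under discussion can be sketched roughly as
follows. This is a minimal model of the fast retransmit / fast recovery
steps as RFC 2001 describes them, not code from any implementation: the
class name, the 1000-byte MSS and the initial ssthresh are assumptions, and
the receiver window is ignored. Note that RFC 2001 sets the window to
ssthresh plus 3 segments on the third dupack, rather than to the old window
plus 3.

```python
MSS = 1000  # bytes; an illustrative segment size


class RenoSender:
    """Toy sender tracking only cwnd, ssthresh and the dup ACK count."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ssthresh = 65535
        self.dupacks = 0
        self.in_recovery = False
        self.retransmits = 0

    def on_dupack(self):
        self.dupacks += 1
        if self.dupacks == 3:
            # Step 1: halve into ssthresh, retransmit, inflate by 3 segments.
            self.ssthresh = max(self.cwnd // 2, 2 * MSS)
            self.retransmits += 1
            self.cwnd = self.ssthresh + 3 * MSS
            self.in_recovery = True
        elif self.dupacks > 3:
            # Step 2: each further dup ACK means one more segment left the net.
            self.cwnd += MSS

    def on_new_ack(self):
        self.dupacks = 0
        if self.in_recovery:
            # Step 3: deflate the window and leave fast recovery.
            # Simplification: this ACK does not also grow the window.
            self.cwnd = self.ssthresh
            self.in_recovery = False
        elif self.cwnd <= self.ssthresh:
            self.cwnd += MSS                      # slow start
        else:
            self.cwnd += MSS * MSS // self.cwnd   # congestion avoidance


sender = RenoSender(cwnd=10 * MSS)
for _ in range(4):
    sender.on_dupack()
print(sender.cwnd, sender.ssthresh)   # window while still in fast recovery
sender.on_new_ack()
print(sender.cwnd)                    # window after deflation
```

The sketch makes the two growth rates visible: one full segment per dupack
while in fast recovery, and roughly one segment per round-trip time once the
window has been deflated to ssthresh.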


with thanks
Chetan S


From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 11 01:10:00 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA22059 for tcp-impl-list; Wed, 11 Feb 1998 01:05:06 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA22022 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 11 Feb 1998 01:05:00 -0800
Received: from relay-2.ftel.co.uk (relay-2.ftel.co.uk [192.65.220.25]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id BAA05685
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 11 Feb 1998 01:04:58 -0800
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (nzZlHAnWOQh8qE4nqzIoVUjsTDEn9kN6@callisto.ftel.co.uk [172.16.2.14])
	by relay-2.ftel.co.uk (8.8.7/8.8.7/Revision:1.35) with ESMTP id JAA04341
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 11 Feb 1998 09:04:54 GMT
Received: from callisto.ftel.co.uk (25EqlsqASwpAwcMO1c6eUAeErVzD8Psv@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id JAA26425;
	Wed, 11 Feb 1998 09:04:24 GMT
Message-ID: <34E16997.6165@ftel.co.uk>
Date: Wed, 11 Feb 1998 09:04:23 +0000
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.6 sun4u)
MIME-Version: 1.0
To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
CC: tcp-impl@cthulhu.engr.sgi.com, Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: Doubt in fast recovery algo..
References: <Pine.LNX.3.95.980210111635.12735A-100000@protocol.ece.iisc.ernet.in>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Chetan Kumar wrote:
> 
> Greetings,
> 
> 
> I.) I fast recovery, according to rfc2001, after receiving 3 dupacks and
> setting ssthresh to cwind/2 and inflating cwind to cwind+3, TCP output
> will transmit the lost segment and on the 4th dupack TCP output will
> transmit with the new cwind, after increments the cwind. This will
> continue.
> 
> Now my doubt is if the retransmitted segment is lost.. then  we have to
> count for 3 dupacks, and once again back to step 1 of the fast recovery
> algorithm right ?

Sorry, no.

Let us ignore delayed ACKs. Suppose that there are 10 segments in
transit, the first of which gets lost. The next 3 (I think NOT the 4th)
generate dup ACKs, and the lost one gets retransmitted. Suppose that the
sender goes back to step 1 again. Then after 3 (or 4?) more of the
original 10 arrive, we get fast retransmit again.

I guess there are 2 approaches.

Only go to step 1 after receiving 9 (in this example) dup ACKs for the
rest of the data that was in transit. This requires a bit of counting
(note that it does not need knowledge of the delayed ACK algorithm,
since all out-of-sequence segments should generate ACKs). Also, fast
recovery should be speeding ahead for the other 6 dup ACKs.
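
The difference between going straight back to step 1 and counting first can
be sketched as below. This is only an illustration of the 10-segment
example: the function and parameter names are hypothetical, and data sent
during fast recovery (which would generate further dup ACKs) is ignored.

```python
def count_fast_retransmits(dupacks_arriving, outstanding_after_loss,
                           rearm_immediately):
    """Count how many fast retransmits a stream of dup ACKs triggers.

    rearm_immediately=True models going back to step 1 right after the
    retransmission. False models the counting approach: dup ACKs still
    expected from data sent before the retransmission are not counted
    toward a new fast retransmit.
    """
    retransmits = 0
    dup_count = 0
    ignore = 0  # dup ACKs still expected from the original flight
    for _ in range(dupacks_arriving):
        if ignore > 0:
            ignore -= 1          # fast recovery would inflate cwnd here
            continue
        dup_count += 1
        if dup_count == 3:
            retransmits += 1
            dup_count = 0
            if not rearm_immediately:
                ignore = outstanding_after_loss - 3
    return retransmits


# 10 segments in transit, the first lost: the other 9 each cause a dup ACK.
naive = count_fast_retransmits(9, 9, rearm_immediately=True)
counted = count_fast_retransmits(9, 9, rearm_immediately=False)
print(naive, counted)
```

In this model the immediate re-arm retransmits the same segment three times
for a single loss, while the counting approach retransmits once and treats
the remaining 6 dup ACKs as fast recovery credit.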

The second approach is to admit that Fast Recovery is now confused, and if
the retransmitted segment is lost, then rely on time-out / slow-start.

I'm not a SACK guru, but SACK might offer a 3rd method of recovery.


> 
> II.)Next once an  ACK arrives that acknowledges new data, TCP will have to
> set cwind to ssthresh and then the TCP should go to congestion avoidance.
> 
> The point to be noted here is once congestion is indicated by 3 dupacks we
> are not reducing reducing the rate at which we are incrementing the
> cwind, although the cwind itself is reduced to half,  the reason for this 3
> dupacks tells more then just a packet is lost. 

In particular, it tells us that at least 3 other segments got through.

But on the other hand when
> new ack arrives cwind is set to ssthresh and the rate of increase of cwind
> is reduced more linear and only for every round trip time. I do not see
> any justification for this.

The assumption is that a segment was lost due to congestion, hence it
should slow down. If congestion was not the cause (e.g. a bit error on
a  DSL line), then slowing down is not necessary. But how does TCP know
the cause?

Again, SACK gurus might have another approach.


> I would say not to reduce the cwind to ssthresh, since network still
> capable of accepting the packets.

That is the key question. Is it still capable? Other segments got
through, but at least one didn't.



Graham Cope

From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 11 04:59:52 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id EAA11669 for tcp-impl-list; Wed, 11 Feb 1998 04:46:09 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id EAA11628 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 11 Feb 1998 04:46:03 -0800
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id EAA13955
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 11 Feb 1998 04:39:55 -0800
	env-from (chetan@protocol.ece.iisc.ernet.in)
Received: from ece.iisc.ernet.in by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id SAA24152; Wed, 11 Feb 1998 18:06:50 +0530
Received: from protocol.ece.iisc.ernet.in by ece.iisc.ernet.in (4.1/SMI-4.1)
	id AA20293; Wed, 11 Feb 98 18:06:48+0530
Received: from localhost (chetan@localhost)
	by protocol.ece.iisc.ernet.in (8.8.5/8.8.5) with SMTP id SAA09278;
	Wed, 11 Feb 1998 18:10:09 GMT
Date: Wed, 11 Feb 1998 18:10:09 +0000 (GMT)
From: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
To: Graham Cope <G.Cope@ftel.co.uk>
Cc: tcp-impl@cthulhu.engr.sgi.com, Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: Doubt in fast recovery algo..
In-Reply-To: <34E16997.6165@ftel.co.uk>
Message-Id: <Pine.LNX.3.95.980211175635.8692A-100000@protocol.ece.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



On Wed, 11 Feb 1998, Graham Cope wrote:

*>Chetan Kumar wrote:

*>> 
*>> Greetings,
*>> 
*>> 
*>
*>Sorry, no.
*>
*>Let us ignore delayed ACKs. Suppose that there are 10 segments in
*>transit, the first of which gets lost. The next 3 (I think NOT the 4th)

Hi,
	I meant that after TCP receives 3 dupacks, it will enter fast recovery,
and the 4th dupack should cause TCP to start transmitting new data with the
new cwind.

*>generate dup ACKs, and the lost one gets retransmitted. Suppose that the
*>sender goes back to step 1 again. Then after 3 (or 4?) more from the
*>original 10 arrive, then we get fast retransmit again.
*>
*>I guess there are 2 approached.
*>
*>Only go to step 1 after receiving 9 (in this example) dup ACKs for the

!! Why 9 dupack ? !!

*>rest of the data that was in transit. This requires a bit counting,
*>(note that it does not need knowledge of the delayed ACK algorithm,
*>since all out of sequence segments should generate ACKs). Also fast
*>recovery should be speeding ahead for the other 6 dup ACKs.
*>
*>Second appraoch is to admit that Fast Recovery is now confused, and if
*>the retransmitted segment is lost, then rely on time-out / slow-start.

But this will cause TCP to slow down. 

Say, as you said, there are 10 packets in transit and the first of them
gets lost. The next 3 packets will generate dupacks, and TCP output
will transmit the missing packet. But at the same time packets 5, 6 and 7
will cause 3 more dupacks (because dupacks should not be delayed, per
RFC 1122), and we may have to go to slow start (as you said, Fast Recovery
is now confused).


*>
*>I'm bot a SACK guru, but this might have a 3rd method of recovery.
*>
*>
*>> 
*>> II.)Next once an  ACK arrives that acknowledges new data, TCP will have to
*>> set cwind to ssthresh and then the TCP should go to congestion avoidance.
*>> 
*>> The point to be noted here is once congestion is indicated by 3 dupacks we
*>> are not reducing reducing the rate at which we are incrementing the
*>> cwind, although the cwind itself is reduced to half,  the reason for this 3
*>> dupacks tells more then just a packet is lost. 
*>
*>In particular, at tells us that at least 3 other segments got through.
*>
*>But on the other hand when
*>> new ack arrives cwind is set to ssthresh and the rate of increase of cwind
*>> is reduced more linear and only for every round trip time. I do not see
*>> any justification for this.
*>
*>The assumption is that a segment was lost due to congestion, hence it
*>should slow down. If congestion was not the cause (e.g. a bit error on
*>a  DSL line), then slowing down is not necessary. But how does TCP know
*>the cause?
*>
*>Again SACK guru's might have another approach.
*>
*>
*>> I would say not to reduce the cwind to ssthresh, since network still
*>> capable of accepting the packets.
*>
*>That is the key question. Is it still capable? Other segments got
*>through, but at least one didn't.
*>
*>
*>
*>Graham Cope
*>


From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 11 05:35:12 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id FAA16028 for tcp-impl-list; Wed, 11 Feb 1998 05:24:19 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA16023 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 11 Feb 1998 05:24:17 -0800
Received: from relay-2.ftel.co.uk (relay-2.ftel.co.uk [192.65.220.25]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id FAA21598
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 11 Feb 1998 05:23:46 -0800
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (bseHnhv4BIZuLFoCUgYFGHfK3emaF/i0@callisto.ftel.co.uk [172.16.2.14])
	by relay-2.ftel.co.uk (8.8.7/8.8.7/Revision:1.35) with ESMTP id NAA05049
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 11 Feb 1998 13:23:37 GMT
Received: from callisto.ftel.co.uk (lFm/iQGViGoZ5kUWZrE1coH3hH9JNOM4@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id NAA13109;
	Wed, 11 Feb 1998 13:23:19 GMT
Message-ID: <34E1A643.565@ftel.co.uk>
Date: Wed, 11 Feb 1998 13:23:15 +0000
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.6 sun4u)
MIME-Version: 1.0
To: Chetan Kumar <chetan@protocol.ece.iisc.ernet.in>
CC: tcp-impl@cthulhu.engr.sgi.com, Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: Doubt in fast recovery algo..
References: <Pine.LNX.3.95.980211175635.8692A-100000@protocol.ece.iisc.ernet.in>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Hi,
>         I meant after TCP receive 3 dupack, it will enter fast recovery
> and 4th ack should cause TCP to start transmission of new data with new
> cnwind
> 
> *>generate dup ACKs, and the lost one gets retransmitted. Suppose that the
> *>sender goes back to step 1 again. Then after 3 (or 4?) more from the
> *>original 10 arrive, then we get fast retransmit again.
> *>
> *>I guess there are 2 approached.
> *>
> *>Only go to step 1 after receiving 9 (in this example) dup ACKs for the
> 
> !! Why 9 dupack ? !!

In my example there are 10 segments in-transit. The first is lost, and
the next 9 generate dup ACKs.

> 
> *>rest of the data that was in transit. This requires a bit counting,
> *>(note that it does not need knowledge of the delayed ACK algorithm,
> *>since all out of sequence segments should generate ACKs). Also fast
> *>recovery should be speeding ahead for the other 6 dup ACKs.
> *>
> *>Second appraoch is to admit that Fast Recovery is now confused, and if
> *>the retransmitted segment is lost, then rely on time-out / slow-start.
> 
> But this will cause TCP to slow down.

Unfortunately yes. Losing 2 segments in the same window, or indeed
losing a retransmitted segment, does confuse 'ordinary' TCPs, from which
the only 'safe' way to recover is a timeout.

> 
> Say assume as U said there are 10 packets in transit and first of which
> will get lost. So now next 3 packets will generate dupack, and TCP output
> will transmit the missing packet. But at the same time packet 5, 6 and 7
> will cause 3 more dupack( because dupack's should not be delayed rfc1122)
> and we may have to go to slowstart ( as you said Fast Recovery is now
> confused,).


I think that there are some 'smart' solutions to this (I'll try to find
references, but Sally Floyd's work probably discusses it). But a smart
solution for the above case might fail in a different case.
  Life, and protocols, are often a compromise.




Graham

From owner-tcp-impl@relay.engr.sgi.com  Tue Feb 17 11:24:18 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA21071 for tcp-impl-list; Tue, 17 Feb 1998 11:20:54 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA21046 for <TCP-IMPL@ENGR.sgi.com>; Tue, 17 Feb 1998 11:20:49 -0800
Received: from alcor.process.com (alcor.process.com [192.42.95.16]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id LAA07680
	for <TCP-IMPL@ENGR.SGI.COM>; Tue, 17 Feb 1998 11:20:47 -0800
	env-from (VOLZ@PROCESS.COM)
Date:     Tue, 17 Feb 1998 14:20 -0500
From: VOLZ@PROCESS.COM (Bernie Volz)
Message-Id: <009C1F37AEFBFBD2.2E27@PROCESS.COM>
To: TCP-IMPL@ENGR.SGI.COM
Subject:  New problem for TCPIMPL "known Problems" I-D?
X-VMS-To: TCP-IMPL@ENGR.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi:

Here's another "problem" to consider adding to the TCPIMPL "Known
Problems" I-D. Vern suggested I ask the list for feedback as to whether
this should be added and whether people had any comments in general
regarding it.

If you have suggestions for improving the text associated with it, I'm
very interested.

- Bernie Volz
  Process Software

3.x. Failure to open window on close

Classification
	Resource management

Description
	When an application closes a connection in such a way that it
	can no longer read any received data, the TCP must ensure that
	it discards this data and opens the receive window if it was
	not open.

	It may be open for discussion as to whether the TCP should
	simply abort the connection, by sending a RST, in this case (as
	it likely would should any additional data be received).
	Aborting the connection might provide an indication to the peer
	that data has been discarded.
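
	The intended receiver behavior can be sketched as below. This is a
	toy model under stated assumptions, not code from any TCP: the
	class, its methods and the returned strings are hypothetical, and
	the discard_unread flag exists only to contrast the behavior
	described above with the failure.

```python
class Receiver:
    """Toy receive side of a connection with a fixed-size socket buffer."""

    def __init__(self, buf_size=4096):
        self.buf_size = buf_size
        self.queued = 0       # bytes received but not yet read
        self.closed = False   # application can no longer read

    def window(self):
        return self.buf_size - self.queued

    def on_data(self, nbytes):
        if self.window() == 0:
            # Zero window: the data (e.g. a persist probe) is not accepted.
            return "ACK win 0"
        if self.closed:
            # Nobody is left to read the data: reset the connection.
            return "RST"
        self.queued += min(nbytes, self.window())
        return f"ACK win {self.window()}"

    def close(self, discard_unread=True):
        self.closed = True
        if discard_unread:
            self.queued = 0   # drop unread data, which reopens the window
        return f"FIN win {self.window()}"


good, bad = Receiver(), Receiver()
good.on_data(4096)
bad.on_data(4096)
print(good.close(), good.on_data(1460))
print(bad.close(discard_unread=False), bad.on_data(1))
```

	In this model the receiver that discards unread data sends its
	FIN with an open window and resets the connection when more data
	arrives, while the other keeps answering persist probes with a
	zero window indefinitely.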

Significance
	Failure to open the window or reset the connection can lead
	to permanently hung TCP connections. Further, these connections
	will consume resources - processing time, memory, and network
	bandwidth. Note that since the window is closed, the peer TCP
	will be doing periodic persists.

Implications
	Failure to open the window or reset the connection can lead
	to permanently hung TCP connections. Further, these connections
	will consume resources.

Trace file demonstrating it:
	Made using tcpdump:

	13:11:46.04 A > B: S 458659166:458659166(0) win 4096
	                    <mss 1460,wscale 0,eol> (DF)
	13:11:46.04 B > A: S 792320000:792320000(0) ack 458659167
	                    win 4096
	13:11:46.04 A > B: . ack 1 win 4096 (DF)
	13:11:55.80 A > B: . 1:513(512) ack 1 win 4096 (DF)
	13:11:55.80 A > B: . 513:1025(512) ack 1 win 4096 (DF)
	13:11:55.83 B > A: . ack 1025 win 3072
	13:11:55.84 A > B: . 1025:1537(512) ack 1 win 4096 (DF)
	13:11:55.84 A > B: . 1537:2049(512) ack 1 win 4096 (DF)
	13:11:55.85 A > B: . 2049:2561(512) ack 1 win 4096 (DF)
	13:11:56.03 B > A: . ack 2561 win 1536
	13:11:56.05 A > B: . 2561:3073(512) ack 1 win 4096 (DF)
	13:11:56.06 A > B: . 3073:3585(512) ack 1 win 4096 (DF)
	13:11:56.06 A > B: . 3585:4097(512) ack 1 win 4096 (DF)
	13:11:56.23 B > A: . ack 4097 win 0
	13:11:58.16 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
	13:11:58.16 B > A: . ack 4097 win 0
	13:12:00.16 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
	13:12:00.16 B > A: . ack 4097 win 0
	13:12:02.16 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
	13:12:02.16 B > A: . ack 4097 win 0
	13:12:05.37 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
	13:12:05.37 B > A: . ack 4097 win 0
	13:12:06.36 B > A: F 1:1(0) ack 4097 win 0
	13:12:06.37 A > B: . ack 2 win 4096 (DF)
	13:12:11.78 A > B: . 4096:4097(1) ack 2 win 4096 (DF)
	13:12:11.78 B > A: . ack 4097 win 0
	13:12:24.59 A > B: . 4096:4097(1) ack 2 win 4096 (DF)
	13:12:24.60 B > A: . ack 4097 win 0
	13:12:50.22 A > B: . 4096:4097(1) ack 2 win 4096 (DF)
	13:12:50.22 B > A: . ack 4097 win 0

	Machine B in the trace above does not drop received data when
	the socket is "closed" by the application (in this case, the
	application process was terminated). This occurred at
	approximately 13:12:06.36 and resulted in the FIN being sent
	in response to the close. However, because there is no longer an
	application to deliver the data to, the TCP should have dropped
	all of the received data and reopened the window, which it
	failed to do.

	Note: The persistence probes done by Machine A are the older
	style (resending the last octet to solicit an ACK).

	Note: Machine B also does not set the PSH bit when sending the
	FIN (as is recommended).

Trace file demonstrating correct behavior
	Made using tcpdump:

	13:48:29.24 C > D: S 73445554:73445554(0) win 4096
	                    <mss 1460,wscale 0,eol> (DF)
	13:48:29.24 D > C: S 36050296:36050296(0) ack 73445555
	                    win 4096 <mss 1460,wscale 0,eol> (DF)
	13:48:29.25 C > D: . ack 1 win 4096 (DF)
	13:48:30.78 C > D: . 1:1461(1460) ack 1 win 4096 (DF)
	13:48:30.79 C > D: . 1461:2921(1460) ack 1 win 4096 (DF)
	13:48:30.80 D > C: . ack 2921 win 1176 (DF)
	13:48:32.75 C > D: . 2921:4097(1176) ack 1 win 4096 (DF)
	13:48:32.82 D > C: . ack 4097 win 0 (DF)
	13:48:34.76 C > D: . 4096:4097(1) ack 1 win 4096 (DF)
	13:48:34.84 D > C: . ack 4097 win 0 (DF)
	13:48:36.34 D > C: FP 1:1(0) ack 4097 win 4096 (DF)
	13:48:36.34 C > D: . 4097:5557(1460) ack 2 win 4096 (DF)
	13:48:36.34 D > C: R 36050298:36050298(0) win 24576
	13:48:36.34 C > D: . 5557:7017(1460) ack 2 win 4096 (DF)
	13:48:36.34 D > C: R 36050298:36050298(0) win 24576

	In this trace, the application process is terminated on Machine
	D at approximately 13:48:36.34 and it sends the FIN with the
	window opened again (since it properly discarded the previously
	received data). Machine C promptly sends more data, causing 
	Machine D to reset the connection since it can not deliver the
	data to the application.

	Note: The persistence probes done by Machine C are the older
	style (resending the last octet to solicit an ACK).

From owner-tcp-impl@relay.engr.sgi.com  Tue Feb 17 11:50:21 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA03237 for tcp-impl-list; Tue, 17 Feb 1998 11:48:36 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA03213 for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 11:48:34 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA18240
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 11:48:30 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id TAA22432; Tue, 17 Feb 1998 19:48:15 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0y4t3Q-0005FsC; Tue, 17 Feb 98 19:51 GMT
Message-Id: <m0y4t3Q-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
To: VOLZ@PROCESS.COM (Bernie Volz)
Date: Tue, 17 Feb 1998 19:51:44 +0000 (GMT)
Cc: TCP-IMPL@cthulhu.engr.sgi.com
In-Reply-To: <009C1F37AEFBFBD2.2E27@PROCESS.COM> from "Bernie Volz" at Feb 17, 98 02:20:00 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Description
> 	When an application closes a connection in such a way that it
> 	can no longer read any received data, the TCP must assure that
> 	it discards this data and opens the receive window if it was
> 	not open.

Why ?  Suppose I have no resources left to commit to the close ? I agree
it's good manners to do so.

> 	Failure to open the window or reset the connection can lead
> 	to permanently hung TCP connections. Further, these connections

Indeed.. the client software should at some point choose to time out
at the application or user level. Or the peer tcp could eventually stop.

A subtle variant of this between Linux and some printers was reported to
the list. These printers drop the window to 0 once all the queued data is
delivered, which prevents us from sending the FIN since there is no
sequence space for it.

> 	Note: The persistence probes done by Machine C are the older
> 	style (resending the last octet to solicit an ACK).

That reminds me there's another one I now only occasionally see - the
machine soliciting ack by sending 1 byte ahead of window to leave a hole.
That's also buggy ;)



From owner-tcp-impl@relay.engr.sgi.com  Tue Feb 17 11:59:10 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA06172 for tcp-impl-list; Tue, 17 Feb 1998 11:57:51 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA06149 for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 11:57:49 -0800
Received: from palrel1.hp.com (palrel1.hp.com [156.153.255.242]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA21651
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 11:57:48 -0800
	env-from (raj@cup.hp.com)
Received: from loiter.cup.hp.com (root@loiter.cup.hp.com [15.13.104.252])
	by palrel1.hp.com (8.8.6/8.8.5tis) with ESMTP id LAA08417
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 11:57:30 -0800 (PST)
Received: from cup.hp.com (raj@loiter [15.13.104.252]) by loiter.cup.hp.com with ESMTP (8.8.6/8.7.3 TIS Messaging 5.0) id LAA07197 for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 11:55:44 -0800 (PST)
Message-ID: <34E9EB3F.DF509353@cup.hp.com>
Date: Tue, 17 Feb 1998 11:55:43 -0800
From: Rick Jones <raj@cup.hp.com>
Organization: HP 9000 Network Performance
X-Mailer: Mozilla 4.03 [en] (X11; I; HP-UX B.10.20 9000/735)
MIME-Version: 1.0
To: TCP-IMPL@cthulhu.engr.sgi.com
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
References: <009C1F37AEFBFBD2.2E27@PROCESS.COM>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 
>         In this trace, the application process is terminated on Machine
>         D at approximately 13:48:36.34 and it sends the FIN with the
>         window opened again (since it properly discarded the previously
>         received data). Machine C promptly sends more data, causing
>         Machine D to reset the connection since it can not deliver the
>         data to the application.

Is it stated someplace that a window update should not be taken as an
indication that the data has been received into the application? It
"feels" as though there is an implicit assumption that if the window
advances, data has been presented to the application - for instance,
from 1122, section 4.2.2.14

"When the application program subsequently consumes the data and
increases the available receive buffer space again..."

Would it be better to reset upon a window probe?

Or is this an area where we say that an application needs its own
app-level protocol for what was and was not received?

rick jones
-- 
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, or post, but please do not do both...

From owner-tcp-impl@relay.engr.sgi.com  Tue Feb 17 12:20:33 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA13342 for tcp-impl-list; Tue, 17 Feb 1998 12:18:54 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA13285 for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 12:18:47 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA28662
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 12:18:42 -0800
	env-from (mouse@Twig.Rodents.Montreal.QC.CA)
Received: (from mouse@localhost)
	by Twig.Rodents.Montreal.QC.CA (8.8.5/8.8.5) id PAA18614;
	Tue, 17 Feb 1998 15:18:39 -0500 (EST)
Date: Tue, 17 Feb 1998 15:18:39 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199802172018.PAA18614@Twig.Rodents.Montreal.QC.CA>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
To: TCP-IMPL@cthulhu.engr.sgi.com
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>> 	When an application closes a connection in such a way that it
>> 	can no longer read any received data, the TCP must assure that
>> 	it discards [any buffered] data and opens the receive window if
>> 	it was not open.
> Why ?

To prevent hung states, such as were described. :-)

> Suppose I have no resources left to commit to the close ?

What resources would the described behavior require?  Just because the
window is open doesn't mean you have to commit resources to buffering
received data in circumstances - like these - where it's never going to
be possible for anyone to read that data.

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Tue Feb 17 12:51:58 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA23651 for tcp-impl-list; Tue, 17 Feb 1998 12:49:33 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA23621 for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 12:49:28 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA09405
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 12:49:16 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id UAA23507; Tue, 17 Feb 1998 20:49:12 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0y4u0R-0005FsC; Tue, 17 Feb 98 20:52 GMT
Message-Id: <m0y4u0R-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
To: mouse@Rodents.Montreal.QC.CA (der Mouse)
Date: Tue, 17 Feb 1998 20:52:43 +0000 (GMT)
Cc: TCP-IMPL@cthulhu.engr.sgi.com
In-Reply-To: <199802172018.PAA18614@Twig.Rodents.Montreal.QC.CA> from "der Mouse" at Feb 17, 98 03:18:39 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> >> 	When an application closes a connection in such a way that it
> >> 	can no longer read any received data, the TCP must assure that
> >> 	it discards [any buffered] data and opens the receive window if
> >> 	it was not open.
> > Why ?
> To prevent hung states, such as were described. :-)

That would seem to be an application decision. 

> > Suppose I have no resources left to commit to the close ?
> What resources would the described behavior require?  Just because the
> window is open doesn't mean you have to commit resources to buffering
> received data in circumstances - like these - where it's never going to
> be possible for anyone to read that data.

Maybe I'm an embedded controller and my only buffer bigger than an ACK is now
committed to a different job. (Yes, I'm playing devil's advocate here.) I'm
not arguing with it being "good practice". I just question whether it is a
MUST.

Likewise, I'd support it being a good-practice item with an explanation in
any final document. Sorry - I should have been clearer on that.


From owner-tcp-impl@relay.engr.sgi.com  Tue Feb 17 15:25:16 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA18955 for tcp-impl-list; Tue, 17 Feb 1998 15:20:39 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id PAA18947 for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 15:20:38 -0800
Received: from frantic.bsdi.com (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id PAA00691
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 15:20:36 -0800
	env-from (dab@frantic.bsdi.com)
Received: (from dab@localhost)
	by frantic.bsdi.com (8.8.8/8.8.8) id RAA00625;
	Tue, 17 Feb 1998 17:14:58 -0600 (CST)
Date: Tue, 17 Feb 1998 17:14:58 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199802172314.RAA00625@frantic.bsdi.com>
To: raj@cup.hp.com, TCP-IMPL@cthulhu.engr.sgi.com
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I agree with Rick Jones's comment:

> Would it be better to reset upon a window probe?

The sending TCP should be sending a window probe, with a new byte
of data.  The receiving TCP, upon getting new data for a connection
that the application has shut down, should be generating a RST to
reset the TCP connection.  (If the receiving TCP tosses the out-of-window
data and generates an ACK instead of a RST, then that is the problem
that should be fixed.)

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Tue Feb 17 18:01:47 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA28957 for tcp-impl-list; Tue, 17 Feb 1998 17:57:31 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id RAA28934 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 17:57:25 -0800
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id RAA02435
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 17:09:26 -0800
	env-from (davidm@napali.hpl.hp.com)
Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33])
	by hplms26.hpl.hp.com (8.8.6/8.8.6 HPLabs Relay) with ESMTP id RAA14933
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 17:09:35 -0800 (PST)
Received: from napali.hpl.hp.com (root@napali.hpl.hp.com [15.4.89.123])
	by hplms2.hpl.hp.com (8.8.6/8.8.6 HPLabs Hub) with ESMTP id RAA18645
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 17:09:23 -0800 (PST)
Received: (from davidm@localhost)
	by napali.hpl.hp.com (8.8.7/8.8.7) id QAA06937;
	Tue, 17 Feb 1998 16:49:06 -0800
Date: Tue, 17 Feb 1998 16:49:06 -0800
Message-Id: <199802180049.QAA06937@napali.hpl.hp.com>
From: David Mosberger <davidm@hpl.hp.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: question about Nagle algorithm
Reply-to: davidm@hpl.hp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

RFC896 says:

[Solution 1:]
  The solution is to inhibit the sending of new TCP  segments  when
  new  outgoing  data  arrives  from  the  user  if  any previously
  transmitted data on the connection remains unacknowledged.

an alternative would be:

[Solution 2:]
  The solution is to inhibit the sending of new small TCP segments
  when new outgoing data arrives from the user if the previously
  transmitted data on the connection was small and remains
  unacknowledged (small == less than max. segment size).

It seems to me that solution 2 more directly addresses Nagle's concern
while disturbing other traffic less.  E.g., consider the case of
sending an HTTP reply that has a size of 1.5*mss.  Nagle's algorithm
has the effect of sending the first segment immediately but delaying
the sending of the second (half-full) segment until the first segment
has been acknowledged.  This clearly is unacceptable for any
application that uses request/response style communication over a
single TCP connection.
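
To make the 1.5*mss example concrete, here is a minimal sketch of the two
rules as send predicates. All names are hypothetical, the whole reply is
assumed to be handed to TCP in one write on an idle connection, and window
and congestion limits are ignored. Solution 2 is read here as "no earlier
small segment is still outstanding", which is one interpretation of the
wording above.

```python
def may_send_solution1(unacked_bytes):
    # Solution 1: hold new data while anything at all is unacknowledged.
    return unacked_bytes == 0


def may_send_solution2(seg_len, mss, small_unacked):
    # Solution 2: full-sized segments always go; a small segment goes only
    # if no earlier small segment is still unacknowledged.
    return seg_len >= mss or not small_unacked


def immediate_segments(reply_len, mss, rule):
    """Count segments of one reply that go out before any ACK returns."""
    sent = 0
    unacked = 0
    small_unacked = False
    remaining = reply_len
    while remaining > 0:
        seg = min(mss, remaining)
        if rule == 1:
            ok = may_send_solution1(unacked)
        else:
            ok = may_send_solution2(seg, mss, small_unacked)
        if not ok:
            break  # the rest waits for an ACK
        sent += 1
        unacked += seg
        small_unacked = small_unacked or seg < mss
        remaining -= seg
    return sent
```

For a 2190-octet reply with an mss of 1460, immediate_segments(2190, 1460, 1)
returns 1 (the half-full segment waits for an ACK), while
immediate_segments(2190, 1460, 2) returns 2.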

Solution 2 would avoid this problem but is obviously a little harder
to implement.  But apart from that, I don't see anything that's
obviously wrong with it.  Of course, one could argue that it's easy
enough to disable Nagle, so there might be no point in implementing
Solution 2 but, on the other hand, why should we make TCP harder to
use than (reasonably) necessary?  It's interesting to note that the
description of Nagle's algorithm in Stevens' TCP/IP Illustrated, Vol 1
reads:

	[The Nagle] algorithm says that a TCP connection can have only
	one outstanding small segment that has not yet been
	acknowledged.

This is much closer to solution 2 than solution 1.

So, apart from implementation complexity/speed issues, are there any
reasons not to use Solution 2?

	--david

From owner-tcp-impl@relay.engr.sgi.com  Tue Feb 17 18:33:43 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA13680 for tcp-impl-list; Tue, 17 Feb 1998 18:31:45 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA01630 for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 16:09:39 -0800
Received: from alcor.process.com (alcor.process.com [192.42.95.16]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA16077
	for <TCP-IMPL@CTHULHU.ENGR.SGI.COM>; Tue, 17 Feb 1998 16:09:38 -0800
	env-from (VOLZ@PROCESS.COM)
Date:     Tue, 17 Feb 1998 19:05 -0500
From: VOLZ@PROCESS.COM (Bernie Volz)
Message-Id: <009C1F5F8B80E8AC.2E25@PROCESS.COM>
To: raj@cup.hp.com, TCP-IMPL@cthulhu.engr.sgi.com
Subject:  Re: New problem for TCPIMPL "known Problems" I-D?
X-VMS-To: SMTP%"raj@cup.hp.com"
X-VMS-Cc: TCP-IMPL@CTHULHU.ENGR.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>>         In this trace, the application process is terminated on Machine
>>         D at approximately 13:48:36.34 and it sends the FIN with the
>>         window opened again (since it properly discarded the previously
>>         received data). Machine C promptly sends more data, causing
>>         Machine D to reset the connection since it can not deliver the
>>         data to the application.
>
>Is it stated someplace that a window update should not be taken as an
>indication that the data has been received into the application? It
>"feels" as though there is an implicit assumption that if the window
>advances, data has been presented to the application - for instance,
>from 1122, section 4.2.2.14
>
>"When the application program subsequently consumes the data and
>increases the available receive buffer space again..."

Not sure why you introduce this. There is no assumption about an ACK
meaning data has been received by an application. My point regarding
this problem is that if the application has CLOSED the socket in such a
way that it indicates "I am no longer interested in reading further data
from the socket" (such as if the application crashes or, in UNIX, does a
close on the socket), the TCP stack MUST assure that it takes steps to
either reset the connection or discard any previously received data and
open the window to allow the connection to properly close (since if the
window is closed, the peer can't deliver any more data and send the FIN
to close its end of the connection - hanging the connection).

>Would it be better to reset upon a window probe?

I don't believe so. I believe it best to either open the window (and
reset the connection if more data is received) or just reset the
connection immediately.

>Or is this an area where we say that an application needs its own
>app-level protocol for what was and was not received?

I would say that applications that need this might. A lot of apps may
not if they use typical command/reply sequences or other mechanisms.

- Bernie Volz

From owner-tcp-impl@relay.engr.sgi.com  Tue Feb 17 18:54:52 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA19045 for tcp-impl-list; Tue, 17 Feb 1998 18:52:16 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id SAA19028 for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 17 Feb 1998 18:52:14 -0800
Received: from alcor.process.com (alcor.process.com [192.42.95.16]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id QAA18481
	for <TCP-IMPL@CTHULHU.ENGR.SGI.COM>; Tue, 17 Feb 1998 16:18:01 -0800
	env-from (VOLZ@PROCESS.COM)
Date:     Tue, 17 Feb 1998 19:14 -0500
From: VOLZ@PROCESS.COM (Bernie Volz)
Message-Id: <009C1F60C733B7DE.2E25@PROCESS.COM>
To: dab@BSDI.COM, RAJ@CUP.HP.COM, TCP-IMPL@cthulhu.engr.sgi.com
Subject:  Re: New problem for TCPIMPL "known Problems" I-D?
X-VMS-To: SMTP%"dab@BSDI.COM"
X-VMS-Cc: RAJ@CUP.HP.COM, TCP-IMPL@CTHULHU.ENGR.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>I agree with Rick Jones's comment:
>
>> Would it be better to reset upon a window probe?
>
>The sending TCP should be sending a window probe, with a new byte
>of data.  The receiving TCP, upon getting new data for a connection
>that the application has shut down, should be generating a RST to
>reset the TCP connection.  (If the receiving TCP tosses the out-of-window
>data and generates an ACK instead of a RST, then that is the problem
>that should be fixed.)
>
>		-David Borman, dab@bsdi.com

I would rather see the "receiving TCP" open the window when it detects
this case and reset the connection if any new data was received
(even if that data was received because of a window probe). I believe
this is what BSD based stacks typically do. I don't see any advantage to
waiting for a probe before opening the window (and discarding the
previously received data).

The only other option that may make sense to discuss is whether the
"receiving TCP" just resets the connection when there is data on its
receive queue when the application "closes" the socket (in such a way
that it can no longer read that data). I don't have a strong opinion on
whether this is better (we could argue that it isn't because most
implementations open the window and reset if more data is received).

- Bernie Volz
  Process Software

From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 18 07:56:22 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA19476 for tcp-impl-list; Wed, 18 Feb 1998 07:54:03 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA19469 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 07:54:01 -0800
Received: from igw3.watson.ibm.com (igw3.watson.ibm.com [198.81.209.18]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA20222
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 07:54:00 -0800
	env-from (nahum@watson.ibm.com)
Received: from mailhub.watson.ibm.com (mailhub.watson.ibm.com [9.2.250.97]) by igw3.watson.ibm.com (8.8.7/07-11-97) with ESMTP id KAA27126; Wed, 18 Feb 1998 10:51:24 -0500
Received: from meghana.watson.ibm.com (meghana.watson.ibm.com [9.2.22.32]) by mailhub.watson.ibm.com (8.8.7/07-14-97) with SMTP id KAA52174; Wed, 18 Feb 1998 10:51:24 -0500
Received: by meghana.watson.ibm.com (AIX 4.1/UCB 5.64/6/25/96)
          id AA28848; Wed, 18 Feb 1998 10:51:03 -0500
From: Erich Nahum <nahum@watson.ibm.com>
Message-Id: <9802181551.AA28848@meghana.watson.ibm.com>
Subject: Re: question about Nagle algorithm
To: davidm@hpl.hp.com
Date: Wed, 18 Feb 1998 10:51:03 -0500 (EST)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199802180049.QAA06937@napali.hpl.hp.com> from "David Mosberger" at Feb 17, 98 04:49:06 pm
Reply-To: nahum@watson.ibm.com (Erich M. Nahum)
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

David Mosberger writes:
> 
> [Solution 1:]
>   The solution is to inhibit the sending of new TCP  segments  when
>   new  outgoing  data  arrives  from  the  user  if  any previously
>   transmitted data on the connection remains unacknowledged.
> 
> [Solution 2:]
>   The solution is to inhibit the sending of new small TCP segments
>   when new outgoing data arrives from the user if the previously
>   transmitted data on the connection was small and remains
>   unacknowledged (small == less than max. segment size).

Hey David,

I believe the BSD derived TCP's follow solution 2 above; certainly
AIX does.  The only difference between 1 and 2 as you describe
them above seems to be the size of the segment in question.  Solution 
1 forces TCP to behave in something similar to a stop-and-wait flow 
control protocol, i.e., even if the segment is a full MTU in size, wait
for ACKs to come back.  The spirit of the Nagle algorithm is to
reduce the use of small packets.  Solution 2 means `have at most
one outstanding small packet per conversation.'

> It seems to me that solution 2 more directly addresses Nagle's concern
> while disturbing other traffic less.  E.g., consider the case of
> sending an HTTP reply that has a size of 1.5*mss.  Nagle's algorithm
> has the effect of sending the first segment immediately but delaying
> the sending of the second (half-full) segment until the first segment
> has been acknowledged.  This clearly is unacceptable for any
> application that uses request/response style communication over a
> single TCP connection.

John Heidemann has a good writeup on this very issue that appeared
in CCR last year.  It turns out Nagle isn't really an issue with HTTP 1.0
traffic, since closing the connection forces the data out with the
FIN.  However, it does affect HTTP 1.1 traffic, or 1.0 using persistent
connections, where the connection stays open.

> It's interesting to note that the
> description of Nagle's algorithm in Stevens' TCP/IP Illustrated, Vol 1
> reads:
> 
> 	[The Nagle] algorithm says that a TCP connection can have only
> 	one outstanding small segment that has not yet been
> 	acknowledged.
> 
> This is much closer to solution 2 than solution 1.

As I mentioned, this seems to be what BSD does.

> So, apart from implementation complexity/speed issues, are there any
> reasons not to use Solution 2?

Is there a big difference between the two solutions in terms of
implementation complexity or speed?  I don't see it.

There is a related issue about how disabling Nagle on each connection
is a performance issue, since it's on the fast path for serving an
HTTP request, but that doesn't sound like what you're talking about.

-Erich

-- 
Erich M. Nahum                  IBM T.J. Watson Research Center
Networking Research             P.O. Box 704
nahum@watson.ibm.com            Yorktown Heights NY 10598

From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 18 09:37:41 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA20567 for tcp-impl-list; Wed, 18 Feb 1998 09:34:30 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA20558 for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 09:34:29 -0800
Received: from palona3.hp.com (palrel3.hp.com [156.153.255.226]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA26152
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 09:34:28 -0800
	env-from (raj@cup.hp.com)
Received: from loiter.cup.hp.com (root@loiter.cup.hp.com [15.13.104.252])
	by palona3.hp.com (8.8.5/8.8.5tis) with ESMTP id JAA04393
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 09:34:26 -0800 (PST)
Received: from cup.hp.com (raj@loiter [15.13.104.252]) by loiter.cup.hp.com with ESMTP (8.8.6/8.7.3 TIS Messaging 5.0) id JAA08572 for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 09:34:25 -0800 (PST)
Message-ID: <34EB1BA0.69D6B4C6@cup.hp.com>
Date: Wed, 18 Feb 1998 09:34:24 -0800
From: Rick Jones <raj@cup.hp.com>
Organization: HP 9000 Network Performance
X-Mailer: Mozilla 4.03 [en] (X11; I; HP-UX B.10.20 9000/735)
MIME-Version: 1.0
To: TCP-IMPL@cthulhu.engr.sgi.com
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
References: <009C1F60C733B7DE.2E25@PROCESS.COM>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Bernie Volz wrote:
> I would rather see the "receiving TCP" open the window when it detects
> this case and resetting the connection if any new data was received

Opening the window has a semantic implication that does not apply if the
data is simply dumped.

> The only other option that may make sense to discuss is whether the
> "receiving TCP" just resets the connection when there is data on its
> receive queue when the application "closes" the socket (in such a way
> that it can no longer read that data). I don't have a strong opinion on
> whether this is better (we could argue that it isn't because most
> implementations open the window and reset if more data is received).

I think that this approach would be better because it does not give the
mistaken impression that data has been given to the application.

I still wonder though if this is not really an application issue.

rick jones
-- 
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, or post, but please do not do both...

From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 18 09:50:26 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA25944 for tcp-impl-list; Wed, 18 Feb 1998 09:48:29 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA25901 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 09:48:23 -0800
Received: from frantic.bsdi.com (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA02051
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 09:48:16 -0800
	env-from (dab@frantic.bsdi.com)
Received: (from dab@localhost)
	by frantic.bsdi.com (8.8.8/8.8.8) id LAA01820
	for tcp-impl@cthulhu.engr.sgi.com; Wed, 18 Feb 1998 11:46:27 -0600 (CST)
Date: Wed, 18 Feb 1998 11:46:27 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199802181746.LAA01820@frantic.bsdi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Bernie Volz writes:

> I would rather see the "receiving TCP" open the window when it detects
> this case and resetting the connection if any new data was received
> (even if that data was received because of a window probe). I believe
> this is what BSD based stacks typically do. I don't see any advantage to
> waiting for a probe before opening the window (and discarding the
> previously received data).

Yes, new data should cause an RST to be generated, but there is no
need to require that the window be opened.  A window probe *is* new
data, it just happens to be one byte beyond the end of the current
window.  BSD stacks generate a RST when receiving a window probe for
a connection that the user has closed.  (It trims any old data off
the front of the packet, checks for new data for a closed connection
and generates a RST if there is any, and then it trims any data off
the end of the packet beyond the edge of the window.)
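
The ordering described above can be sketched roughly as follows. This is
not BSD code; the function and parameter names are made up for
illustration, sequence-number wraparound is ignored, and only the data
length of the segment is modelled.

```python
def process_segment(seq, length, rcv_nxt, rcv_wnd, user_closed):
    """Return 'RST', 'ACK' or 'ACCEPT' for an incoming data segment."""
    # 1. Trim octets that were already received off the front.
    dup = min(length, max(0, rcv_nxt - seq))
    seq += dup
    length -= dup

    # 2. Any remaining (new) data for a connection the user has closed
    #    draws a RST, before the window is even considered.
    if user_closed and length > 0:
        return "RST"

    # 3. Trim octets beyond the right edge of the receive window.
    in_window = max(0, min(length, rcv_nxt + rcv_wnd - seq))
    return "ACCEPT" if in_window > 0 else "ACK"
```

With rcv_nxt = 4097 and a zero window on a closed connection, an old-style
probe (seq 4096, one octet) is trimmed to nothing in step 1 and only draws
an ACK, whereas a probe carrying one new octet (seq 4097) reaches step 2
and draws a RST.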

I'd say that the real problem in the cited examples is that the sending
host is not correctly generating window probes:

>       Note: The persistence probes done by Machine A are the older
>       style (resending the last octet to solicit an ACK).

RFC 793 is not clear that you should generate a RST when receiving data
for a closed connection when all the new data is beyond the end of the
window. From page 69:

        If the RCV.WND is zero, no segments will be acceptable, but
        special allowance should be made to accept valid ACKs, URGs and
        RSTs.
      
        If an incoming segment is not acceptable, an acknowledgment
        should be sent in reply (unless the RST bit is set, if so drop
        the segment and return):

But in RFC 1122, page 88, it states:

            A host MAY implement a "half-duplex" TCP close sequence, so
            that an application that has called CLOSE cannot continue to
            read data from the connection.  If such a host issues a
            CLOSE call while received data is still pending in TCP, or
            if new data is received after CLOSE is called, its TCP 
            SHOULD send a RST to show that data was lost.

So that seems to clarify it.
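
A minimal model of that RFC 1122 rule, assuming a receiver that only
tracks how many octets are unread. The class and method names are
hypothetical and the model leaves out everything else about TCP state.

```python
class HalfDuplexReceiver:
    """Toy receiver that implements a half-duplex close."""

    def __init__(self):
        self.pending = 0      # octets received but not yet read
        self.closed = False   # True once the application has called CLOSE

    def deliver(self, nbytes):
        """Handle nbytes of in-order data; return 'RST' if the data is lost."""
        if self.closed:
            # New data after CLOSE can never be read: signal the loss.
            return "RST" if nbytes > 0 else None
        self.pending += nbytes
        return None

    def close(self):
        """Application CLOSE; return 'RST' if unread data is discarded."""
        self.closed = True
        lost, self.pending = self.pending, 0
        return "RST" if lost > 0 else None
```

A receiver holding 4096 unread octets returns 'RST' from close(); one
closed with nothing pending returns None from close() but 'RST' from any
later deliver() that carries data.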

> The only other option that may make sense to discuss is whether the
> "receiving TCP" just resets the connection when there is data on its
> receive queue when the application "closes" the socket (in such a way
> that it can no longer read that data). I don't have a strong opinion on
> whether this is better (we could argue that it isn't because most
> implementations open the window and reset if more data is received).

According to the preceding quote from RFC 1122, it is already
documented that a host should generate a RST in this case.  I'm
not sure if BSD does this (generate a RST when the connection is
closed with un-read data); from a brief look at the code I can't
quickly confirm this one way or the other.

			-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 18 10:13:51 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA04033 for tcp-impl-list; Wed, 18 Feb 1998 10:10:16 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA04023 for <TCP-IMPL@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 10:10:14 -0800
Received: from alcor.process.com (alcor.process.com [192.42.95.16]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id KAA11029
	for <TCP-IMPL@CTHULHU.ENGR.SGI.COM>; Wed, 18 Feb 1998 10:10:13 -0800
	env-from (VOLZ@PROCESS.COM)
Date:     Wed, 18 Feb 1998 13:05 -0500
From: VOLZ@PROCESS.COM (Bernie Volz)
Message-Id: <009C1FF65526D2BD.2E27@PROCESS.COM>
To: raj@cup.hp.com, TCP-IMPL@cthulhu.engr.sgi.com
Subject:  Re: New problem for TCPIMPL "known Problems" I-D?
X-VMS-To: SMTP%"raj@cup.hp.com"
X-VMS-Cc: TCP-IMPL@CTHULHU.ENGR.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Rick Jones wrote:
>Bernie Volz wrote:
>> I would rather see the "receiving TCP" open the window when it detects
>> this case and resetting the connection if any new data was received
>
>Opening the window has a semantic implication that does not apply if the
>data is simply dumped.

Agreed. But not opening the window isn't very productive.

>
>> The only other option that may make sense to discuss is whether the
>> "receiving TCP" just resets the connection when there is data on its
>> receive queue when the application "closes" the socket (in such a way
>> that it can no longer read that data). I don't have a strong opinion on
>> whether this is better (we could argue that it isn't because most
>> implementations open the window and reset if more data is received).
>
>I think that this approach would be better because it does not give the
>mistaken impression that data has been given to the application.

This does have that attraction - one end of the connection would be told
that something is wrong because of the reset.

>I still wonder though if this is not really an application issue.

It is a stack issue since the TCP stack does have to deal with this. If
it doesn't handle this correctly, the connection can remain in the hung
state forever - certainly, the sender application can start a timer and
give up if that timer expires, but that is not robust since the receiver
application may be blocked from processing for a "long" time and the
sender can't tell the difference.

- Bernie Volz

From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 18 11:03:08 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA24815 for tcp-impl-list; Wed, 18 Feb 1998 11:01:10 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA24801 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:01:08 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA02882
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:01:04 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id TAA13319; Wed, 18 Feb 1998 19:00:48 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0y5EnW-0005FsC; Wed, 18 Feb 98 19:04 GMT
Message-Id: <m0y5EnW-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
To: dab@bsdi.com (David Borman)
Date: Wed, 18 Feb 1998 19:04:46 +0000 (GMT)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199802181746.LAA01820@frantic.bsdi.com> from "David Borman" at Feb 18, 98 11:46:27 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Yes, new data should cause an RST to be generated, but there is no
> need to require that the window be opened.  A window probe *is* new
> data, it just happens to be one byte beyond the end of the current
> window.  BSD stacks generate a RST when receiving a window probe for
> a connection that the user has closed.  (It trims any old data off
> the front of the packet, checks for new data for a closed connection
> and generates a RST if there is any, and then it trims any data off
> the end of the packet beyond the edge of the window.)
> 
> I'd say that the real problem in the cited examples is that the sending
> host is not correctly generating window probes:

Actually the reset on zero window probe problem can be bad. The following
transaction can occur on some stacks doing keepalive and is broken there
as well as when probing.



		send data		ack window (n)
		send data (n)		ack window (0)
					shutdown for rx
					[compute mode]
		keepalive n+1
					RST 

and the reset tears down both sides of the connection. For a zero window
probe this is ok so long as the stack never attempts to probe a zero window
unless it has queued data. A window that is 0 with no queue causes no
probe and no harm. This prevents us doing any "opportunistic" window probe.

I'm not sure if that has any performance impact. One for the queue theory
bods.

Alan



From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 18 11:22:08 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA01180 for tcp-impl-list; Wed, 18 Feb 1998 11:17:22 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA01140 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:17:17 -0800
Received: from frantic.bsdi.com (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA09902
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:17:15 -0800
	env-from (dab@frantic.bsdi.com)
Received: (from dab@localhost)
	by frantic.bsdi.com (8.8.8/8.8.8) id NAA02036;
	Wed, 18 Feb 1998 13:15:01 -0600 (CST)
Date: Wed, 18 Feb 1998 13:15:01 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199802181915.NAA02036@frantic.bsdi.com>
To: alan@lxorguk.ukuu.org.uk
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Alan Cox writes:

> Actually the reset on zero window probe problem can be bad. The following
> transaction can occur on some stacks doing keepalive and is broken there
> as well as when probing.
> 
> 
> 
> 		send data		ack window (n)
> 		send data (n)		ack window (0)
> 					shutdown for rx
> 					[compute mode]
> 		keepalive n+1
> 					RST 
> 
> and the reset tears down both sides of the connection. For a zero window

Your example doesn't look right to me.  A reset on a window probe
happens because new data is received that cannot be delivered to
the application.  But a keepalive does not contain new data.  It
typically has SEG.SEQ = SND.NXT-1, and may or may not contain one
garbage byte (RFC 1122, section 4.2.3.6).  Your example implies
that new data is being delivered in a keep-alive.
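
The distinction can be sketched roughly as follows.  This is only an
illustration, not code from any stack: the function names and the
rx_closed flag are made up for the example, and it assumes the receiver
simply counts how many bytes of a segment lie at or beyond RCV.NXT.

```python
MOD = 1 << 32  # TCP sequence numbers are 32 bits wide and wrap around


def seq_diff(a, b):
    """Signed distance a - b in 32-bit sequence space."""
    d = (a - b) % MOD
    return d - MOD if d >= MOD // 2 else d


def new_bytes(rcv_nxt, seg_seq, seg_len):
    """Count the bytes of a segment that lie at or beyond rcv_nxt."""
    already_seen = seq_diff(rcv_nxt, seg_seq)
    if already_seen <= 0:
        return seg_len  # the whole segment is new
    return max(0, seg_len - already_seen)  # old data trimmed off the front


def should_reset(rx_closed, rcv_nxt, seg_seq, seg_len):
    """Hypothetical policy: RST only if new data arrives after the close."""
    return rx_closed and new_bytes(rcv_nxt, seg_seq, seg_len) > 0
```

With RCV.NXT = 1000, a keepalive at sequence 999 carrying one garbage
byte has no new bytes, while a window probe at sequence 1000 carrying
one byte has one, so only the probe would draw a RST under this sketch.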

> probe this is ok so long as the stack never attempts to probe a zero window
> unless it has queued data. A window that is 0 with no queue causes no

You can't generate a zero window probe unless you have data to send,
since the probe contains the next byte of data.

> probe and no harm. This prevents us doing any "opportunistic" window probe.

I'm not sure what you mean by "opportunistic" window probes.

			-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 18 11:23:27 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA03760 for tcp-impl-list; Wed, 18 Feb 1998 11:22:09 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA03715 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:22:03 -0800
Received: from Twig.Rodents.Montreal.QC.CA (Twig.Rodents.Montreal.QC.CA [132.206.78.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA11845
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:22:00 -0800
	env-from (mouse@Twig.Rodents.Montreal.QC.CA)
Received: (from mouse@localhost)
	by Twig.Rodents.Montreal.QC.CA (8.8.5/8.8.5) id OAA25631;
	Wed, 18 Feb 1998 14:21:55 -0500 (EST)
Date: Wed, 18 Feb 1998 14:21:55 -0500 (EST)
From: der Mouse  <mouse@Rodents.Montreal.QC.CA>
Message-Id: <199802181921.OAA25631@Twig.Rodents.Montreal.QC.CA>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Actually the reset on zero window probe problem can be bad. [...]

> 		send data		ack window (n)
> 		send data (n)		ack window (0)
> 					shutdown for rx
> 					[compute mode]
> 		keepalive n+1
> 					RST 

> and the reset tears down both sides of the connection.  For a zero
> window probe this is ok so long as the stack never attempts to probe
> a zero window unless it has queued data.

IMO this is not OK even then.  If we call the two machines in your
example A (left column) and B (right column), then when B shuts down
for receive, this fact should be pushed back to the application on A
when a write is attempted (eg, EPIPE/SIGPIPE under BSD), but I can't
see any excuse for tearing down the B-to-A half of the connection as
well.  I don't think an RST should be generated unless the connection
is completely gone on B's end - ie, there is no further possibility of
anyone on that end doing *any* I/O on it.

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 18 11:28:15 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA06657 for tcp-impl-list; Wed, 18 Feb 1998 11:26:48 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA06596 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:26:43 -0800
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA13869
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:26:38 -0800
	env-from (alan@lxorguk.ukuu.org.uk)
Received: from lightning.swansea.linux.org.uk (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id TAA13754; Wed, 18 Feb 1998 19:26:35 GMT
Received: by lightning.swansea.linux.org.uk (Smail3.1.29.1 #2)
	id m0y5FCW-0005FsC; Wed, 18 Feb 98 19:30 GMT
Message-Id: <m0y5FCW-0005FsC@lightning.swansea.linux.org.uk>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
To: dab@BSDI.COM (David Borman)
Date: Wed, 18 Feb 1998 19:30:36 +0000 (GMT)
Cc: alan@lxorguk.ukuu.org.uk, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199802181915.NAA02036@frantic.bsdi.com> from "David Borman" at Feb 18, 98 01:15:01 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Your example doesn't look right to me.  A reset on a window probe
> happens because new data is received that cannot be delivered to
> the application.  But a keepalive does not contain new data.  It
> typically has SEG.SEQ = SND.NXT-1, and may or may not contain one
> garbage byte (RFC 1122, section 4.2.3.6).  Your example implies
> that new data is being delivered in a keep-alive.

4.2BSD and derived stacks do this, but not 4.3/4.4. Regrettably I'm
still getting the old Linux trace of the above talking to prehistoric kit
and, even worse, newish kit with 4.2 stacks.

> > probe this is ok so long as the stack never attempts to probe a zero window
> > unless it has queued data. A window that is 0 with no queue causes no
> 
> You can't generate a zero window probe unless you have data to send,
> since the probe contains the next byte of data.
> 
> > probe and no harm. This prevents us doing any "opportunistic" window probe.
> 
> I'm not sure what you mean by "opportunistic" window probes.

Sending old data to produce an ACK response from the other end to see if
the window has opened again, as a way of reducing the number of packets
exchanged going from stalled, 0 window, no data -> transmitting data.


From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 18 11:28:36 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA06891 for tcp-impl-list; Wed, 18 Feb 1998 11:27:19 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA06874 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:27:17 -0800
Received: from hplms26.hpl.hp.com (hplms26.hpl.hp.com [15.255.168.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA14103
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:27:17 -0800
	env-from (davidm@napali.hpl.hp.com)
Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33])
	by hplms26.hpl.hp.com (8.8.6/8.8.6 HPLabs Relay) with ESMTP id LAA12380;
	Wed, 18 Feb 1998 11:27:28 -0800 (PST)
Received: from napali.hpl.hp.com (davidm@napali.hpl.hp.com [15.4.89.123])
	by hplms2.hpl.hp.com (8.8.6/8.8.6 HPLabs Hub) with ESMTP id KAA05008;
	Wed, 18 Feb 1998 10:37:45 -0800 (PST)
Received: (from davidm@localhost)
	by napali.hpl.hp.com (8.8.7/8.8.7) id KAA08535;
	Wed, 18 Feb 1998 10:37:44 -0800
Date: Wed, 18 Feb 1998 10:37:44 -0800
Message-Id: <199802181837.KAA08535@napali.hpl.hp.com>
From: David Mosberger <davidm@hpl.hp.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: nahum@watson.ibm.com (Erich M. Nahum)
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: question about Nagle algorithm
In-Reply-To: <9802181551.AA28848@meghana.watson.ibm.com>
References: <199802180049.QAA06937@napali.hpl.hp.com>
	<9802181551.AA28848@meghana.watson.ibm.com>
X-Mailer: VM 6.33 under Emacs 20.2.1
Reply-To: davidm@hpl.hp.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi Erich,

>>>>> On Wed, 18 Feb 1998 10:51:03 -0500 (EST), Erich Nahum <nahum@watson.ibm.com> said:

  Erich> David Mosberger writes:
  >> [Solution 1:] The solution is to inhibit the sending of new TCP
  >> segments when new outgoing data arrives from the user if any
  >> previously transmitted data on the connection remains
  >> unacknowledged.
  >> 
  >> [Solution 2:] The solution is to inhibit the sending of new small
  >> TCP segments when new outgoing data arrives from the user if the
  >> previously transmitted data on the connection was small and
  >> remains unacknowledged (small == less than max. segment size).

  Erich> I believe the BSD derived TCP's follow solution 2 above;
  Erich> certainly AIX does.  The only difference between 1 and 2 as
  Erich> you describe them above seems to be the size of the segment
  Erich> in question.  Solution 1 forces TCP to behave in something
  Erich> similar to a stop-and-wait flow control protocol, i.e., even
  Erich> if the segment is a full MTU size wait for ACKs to come back.
  Erich> The spirit of the Nagle algorithm is to reduce the use of
  Erich> small packets.  Solution 2 means `have at most one
  Erich> outstanding small packet per conversation.'

I think the BSD implementation is more accurately described as: "no
small segments when there is any data in flight".  Solution 2 would be
"no small segments when the previously sent and unacknowledged segment
was small".
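
To make the three rules easier to compare, here is a rough sketch of
each as a send decision.  It is a simplification under stated
assumptions: the policy labels, the function name and its parameters
are invented for the example, "small" means shorter than the MSS, and
the other conditions a real stack checks (idle connection, TCP_NODELAY,
and so on) are left out.

```python
def may_send(policy, seg_len, mss, unacked_bytes, last_unacked_was_small):
    """Return True if a segment of seg_len bytes may be sent now.

    unacked_bytes          -- bytes sent but not yet acknowledged
    last_unacked_was_small -- True if the most recently sent segment was
                              small and is still unacknowledged
    """
    small = seg_len < mss
    if policy == "solution1":
        # No new segment at all while anything remains unacknowledged.
        return unacked_bytes == 0
    if policy == "solution2":
        # No new small segment while the previous small one is unacked.
        return not (small and last_unacked_was_small)
    if policy == "bsd":
        # No small segment while any data is in flight.
        return not (small and unacked_bytes > 0)
    raise ValueError("unknown policy: %r" % (policy,))
```

The rules only disagree when something is in flight: a full-sized
segment is held back by Solution 1 alone, and a small segment sent
behind a full-sized one is allowed only by Solution 2.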

However, the issue has become kind of moot because I found a realistic
case where the behavior of Solution 1 would produce better results
than Solution 2 (I won't bore anyone with the details), so there is no
clear advantage to Solution 2.

Thanks,

	--david
-- 
David Mosberger; HP Labs; 1501 Page Mill Rd MS 1U17; Palo Alto, CA 94304-1126
davidm@hpl.hp.com          voice: (650) 236-2575          fax: (650) 857-5100

From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 18 12:01:45 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA21750 for tcp-impl-list; Wed, 18 Feb 1998 11:57:53 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id LAA21732 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:57:51 -0800
Received: from frantic.bsdi.com (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id LAA25021
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 18 Feb 1998 11:57:47 -0800
	env-from (dab@frantic.bsdi.com)
Received: (from dab@localhost)
	by frantic.bsdi.com (8.8.8/8.8.8) id NAA02128;
	Wed, 18 Feb 1998 13:54:20 -0600 (CST)
Date: Wed, 18 Feb 1998 13:54:20 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199802181954.NAA02128@frantic.bsdi.com>
To: mouse@Rodents.Montreal.QC.CA, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

der Mouse writes:

> IMO this is not OK even then.  If we call the two machines in your
> example A (left column) and B (right column), then when B shuts down
> for receive, this fact should be pushed back to the application on A
> when a write is attempted (eg, EPIPE/SIGPIPE under BSD), but I can't
> see any excuse for tearing down the B-to-A half of the connection as
> well.  I don't think an RST should be generated unless the connection
> is completely gone on B's end - ie, there is no further possibility of
> anyone on that end doing *any* I/O on it.

But the only way to push back to A is to generate a RST.  We don't want
to ACK the new data, because that implies that the data was accepted.
But then you'll never be able to accept the FIN, so you'll never be
able to have a graceful close, so why delay the pain?

TCP doesn't know what the application's intentions are, or how critical
that undelivered data is to the application.  Having TCP try to second
guess the application and allow a damaged connection to continue may
cause more harm than good; there is just no way for TCP to know that.
The best it can do is to let the applications know as soon as possible
that an abnormal situation now exists, and the mechanism for that in
this situation is a RST.

			-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Thu Feb 19 13:05:03 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id NAA08593 for tcp-impl-list; Thu, 19 Feb 1998 13:01:59 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA08585 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 19 Feb 1998 13:01:57 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA15977
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 19 Feb 1998 13:01:56 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id NAA16489; Thu, 19 Feb 1998 13:01:55 -0800 (PST)
Message-Id: <199802192101.NAA16489@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: New problem for TCPIMPL "known Problems" I-D?
Date: Thu, 19 Feb 1998 13:01:55 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Here's a summary of the thread as I see it:

	- Two things are broken in Bernie's trace.  First, host B's TCP
	  should have sent a RST on the close, since it has data it
	  can't deliver, per page 88 of RFC 1122 (thanks Dave!).

	- Second, host A's zero-window probing is broken.  It is sending
	  old, acked data rather than new data.

	- Had it sent new data, presumably this would've elicited a RST
	  and the resource management problem would go away.

So if that's right, it seems the problem description should be amended
to describe sending the RST as the correct behavior, and to also point
out that A's zero-window probe is broken.
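
As a rough sketch of the two behaviors described as correct here (an
illustration only; the function names are made up and nothing below is
taken from an actual stack):

```python
def on_close(unread_bytes):
    """Segment host B sends when the application closes the socket.

    Data still sitting on the receive queue can never be delivered,
    so the close is signalled with a RST rather than a FIN.
    """
    return "RST" if unread_bytes > 0 else "FIN"


def zero_window_probe(snd_nxt, send_queue):
    """Return (seq, payload) for host A's probe, or None.

    The probe carries the next byte of new data, so it cannot be built
    when nothing is queued.  Re-sending old, already acked data is the
    broken behavior.
    """
    if not send_queue:
        return None
    return (snd_nxt, send_queue[:1])
```

For example, zero_window_probe(5000, b"hello") gives (5000, b"h"),
which is new data to the receiver and would elicit the RST.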

Comments?

		Vern


One side-comment - Rick mentioned:
 
> Is it stated someplace that a window update should not be taken as an
> indication that the data has been received into the application? If

I think this is true - it should not be presumed to say anything about
delivering data to the application.  Consider that if you make a SO_RCVBUF
setsockopt() call to increase the receive buffer, the TCP may very well
send a window update advertising the new space.
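
For reference, the call in question looks roughly like this from an
application (a minimal sketch; the helper name is made up, and whether
a window update actually goes out on the wire afterwards is up to the
stack):

```python
import socket


def grow_rcvbuf(sock, nbytes):
    """Ask for a larger receive buffer; return (before, after) sizes."""
    before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    # The kernel may round or clamp the request, so read the value back.
    after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return before, after
```

Nothing here reads from the socket, which is the point: the advertised
window can grow without any data having reached the application.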

From owner-tcp-impl@relay.engr.sgi.com  Wed Feb 25 10:35:07 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id KAA10053 for tcp-impl-list; Wed, 25 Feb 1998 10:31:00 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA09995 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Feb 1998 10:30:53 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id KAA05996
	for <tcp-impl@relay.engr.SGI.COM>; Wed, 25 Feb 1998 10:30:52 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id KAA00477; Wed, 25 Feb 1998 10:30:51 -0800 (PST)
Message-Id: <199802251830.KAA00477@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: TCP-IMPL meeting at Los Angeles IETF
Date: Wed, 25 Feb 1998 10:30:51 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

We've been scheduled for:

        Monday, March 30 at 0930-1130 (opposite adsl, hubmib, udlr, smime)

If you have suggestions for agenda topics, please send them along (either
to me or to the list).

                Vern

From owner-tcp-impl@relay.engr.sgi.com  Thu Feb 26 06:31:28 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id GAA10472 for tcp-impl-list; Thu, 26 Feb 1998 06:29:47 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id GAA10463 for <tcp-impl@engr.sgi.com>; Thu, 26 Feb 1998 06:29:46 -0800
Received: from ns.ietf.org (ietf.org [132.151.1.19]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id GAA15556
	for <tcp-impl@engr.sgi.com>; Thu, 26 Feb 1998 06:29:42 -0800
	env-from (cclark@cnri.reston.va.us)
Received: from CNRI.Reston.VA.US (localhost [127.0.0.1])
	by ns.ietf.org (8.8.7/8.8.7a) with ESMTP id JAA02653;
	Thu, 26 Feb 1998 09:29:37 -0500 (EST)
Message-Id: <199802261429.JAA02653@ns.ietf.org>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce@ns.ietf.org
Cc: tcp-impl@engr.sgi.com
From: Internet-Drafts@ns.ietf.org
Reply-to: Internet-Drafts@ns.ietf.org
Subject: I-D ACTION:draft-ietf-tcpimpl-poduri-00.txt
Date: Thu, 26 Feb 1998 09:29:37 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

--NextPart

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Implementation Working Group of the IETF.

	Title		: Simulation Studies of Increased 
                          Initial TCP Window Size
	Author(s)	: K. Nichols, K. Poduri
	Filename	: draft-ietf-tcpimpl-poduri-00.txt
	Pages		: 6
	Date		: 25-Feb-98
	
An increase in the permissible initial window size of a TCP connection,
from one segment to three or four segments, has been under discussion in
the tcp-impl working group. This document covers some simulation studies of
the effects of increasing the initial window size of TCP. Both long-lived
TCP connections (file transfers) and short-lived web-browsing style
connections were modeled. The simulations were performed using the publicly
available ns-2 simulator and our custom models and files are also
available.


Internet-Drafts are available by anonymous FTP.  Login with the username
"anonymous" and a password of your e-mail address.  After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-tcpimpl-poduri-00.txt".
A URL for the Internet-Draft is:
ftp://ftp.ietf.org/internet-drafts/draft-ietf-tcpimpl-poduri-00.txt

Internet-Drafts directories are located at:

	Africa:	ftp.is.co.za
	
	Europe: ftp.nordu.net
		ftp.nis.garr.it
			
	Pacific Rim: munnari.oz.au
	
	US East Coast: ds.internic.net
	
	US West Coast: ftp.isi.edu

Internet-Drafts are also available by mail.

Send a message to:	mailserv@ds.internic.net.  In the body type:
	"FILE /internet-drafts/draft-ietf-tcpimpl-poduri-00.txt".
	
NOTE:	The mail server at ds.internic.net can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type: Message/External-body;
	access-type="mail-server";
	server="mailserv@ds.internic.net"

Content-Type: text/plain
Content-ID:	<19980225155030.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-poduri-00.txt

--OtherAccess
Content-Type: Message/External-body;
	name="draft-ietf-tcpimpl-poduri-00.txt";
	site="ftp.ietf.org";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID:	<19980225155030.I-D@ietf.org>

--OtherAccess--

--NextPart--



From owner-tcp-impl@relay.engr.sgi.com  Thu Feb 26 07:40:45 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA23744 for tcp-impl-list; Thu, 26 Feb 1998 07:38:39 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA23730 for <tcp-impl@engr.sgi.com>; Thu, 26 Feb 1998 07:38:38 -0800
Received: from axl01it.ntc.nokia.com (axl01it.ntc.nokia.com [131.228.118.232]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA02572
	for <tcp-impl@engr.sgi.com>; Thu, 26 Feb 1998 07:38:36 -0800
	env-from (ghani@NASBPD01BS.ntc.nokia.com)
Received: from miller.ntc.nokia.com (miller.ntc.nokia.com [192.100.105.20]) by axl01it.ntc.nokia.com (8.8.5/8.6.9) with ESMTP id RAA26189 for <tcp-impl@engr.sgi.com>; Thu, 26 Feb 1998 17:36:57 +0200 (EET)
Received: from rhino.ntc.nokia.com by miller.ntc.nokia.com with ESMTP
	(1.39.111.2/16.2) id AA258307314; Thu, 26 Feb 1998 10:35:14 -0500
Received: from Microsoft Mail (PU Serial #1991)
  by rhino.ntc.nokia.com (PostalUnion/SMTP(tm) v2.1.9c for Windows NT(tm))
  id AA-1998Feb26.103100.1991.166698; Thu, 26 Feb 1998 10:34:52 -0500
From: ghani@NASBPD01BS.ntc.nokia.com (Ghani Nasir NRC/Boston)
To: tcp-impl@engr.sgi.com (tcpimp)
Message-Id: <1998Feb26.103100.1991.166698@rhino.ntc.nokia.com>
X-Mailer: Microsoft Mail via PostalUnion/SMTP for Windows NT
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: quoted-printable
Organization: Nokia Telecommunications
Date: Thu, 26 Feb 1998 10:34:52 -0500
Subject: "Acceptable" ACK's
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



Hi,

I had a question regarding "acceptable" ACK numbers.  In Stevens, Vol. 2,
on p. 808,
it states that such an ACK is one that has a value such that:

          snd_una < ack_num <= snd_max.
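
The acceptability test above can be sketched as follows, assuming the
usual modulo-2^32 sequence comparison.  The helper names are chosen for
this example only, and the sketch says nothing about how old ACKs
should be treated.

```python
def seq_lt(a, b):
    """True if a precedes b in 32-bit sequence space."""
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000


def seq_leq(a, b):
    """True if a precedes or equals b in 32-bit sequence space."""
    return a == b or seq_lt(a, b)


def ack_is_acceptable(snd_una, ack_num, snd_max):
    """snd_una < ack_num <= snd_max, with sequence wraparound."""
    return seq_lt(snd_una, ack_num) and seq_leq(ack_num, snd_max)
```

With snd_una = 100 and snd_max = 200, ack_num = 101 passes the test,
while ack_num = 100 and ack_num = 99 (the snd_una-1 case) do not.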

I derive from this that the ACK processing should therefore
mainly apply to such ACK values.  So what happens, say, if I get an ack_num
value of ack_num=snd_una-1?  This means that the ACK has already been
acknowledged.
Specifically, there are two things confusing me.

1)  First of all, if multiple such ACK's arrive, should they trigger the
fast-retransmit/recovery algorithms?  My initial hunch is no, since they do
not apply to unacknowledged data.

2) Secondly, do such ACK's increase the congestion window?  In other words,
do you STILL perform the (+t_maxseg) increment in slow start or the
(+t_maxseg*t_maxseg/cw+cw/8) increment in congestion avoidance if
ack_num<snd_una?  Should the decision to do this be consistent with the
answer to question 1) above?  If so, why/why not?

I checked the BSD Unix code, but was unable to clearly answer the above.
Any response or information would be much appreciated.
Regards
Nasir Ghani,
Nokia, Boston, USA

BTW....could anyone who responds please also copy any responses to me,
just in case I may not be on the mailing list.....thanks!



From owner-tcp-impl@relay.engr.sgi.com  Thu Feb 26 07:46:26 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id HAA25248 for tcp-impl-list; Thu, 26 Feb 1998 07:45:08 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA25206 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Feb 1998 07:45:06 -0800
Received: from relay-3.ftel.co.uk (relay-3.ftel.co.uk [192.65.220.26]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id HAA04347
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Feb 1998 07:45:00 -0800
	env-from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (2VvvKkiWVPWQ2hLBjJSipALUktpIChqf@callisto.ftel.co.uk [172.16.2.14])
	by relay-3.ftel.co.uk (8.8.7/8.8.7/Revision:1.35) with ESMTP id PAA18755
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Feb 1998 15:44:29 GMT
Received: from callisto.ftel.co.uk (R72+AMOTWnawVFWehQZ8i/gKi3ipffdf@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id PAA14839;
	Thu, 26 Feb 1998 15:44:27 GMT
Message-ID: <34F58DDA.5759@ftel.co.uk>
Date: Thu, 26 Feb 1998 15:44:26 +0000
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.6 sun4u)
MIME-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
CC: Forgham A <A.Forgham@ftel.co.uk>, McGaw Patrick <P.McGaw@ftel.co.uk>,
        Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: I-D ACTION:draft-ietf-tcpimpl-poduri-00.txt
References: <199802261429.JAA02653@ns.ietf.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Some immediate reactions to the paper (possibly going over old ground):


1/ What is important, the IW as measured in segments, or the IW as
measured in bytes?


2/ Is not the ssthresh also open to optimisation (ref: paper by Janey
Hoe)? 


3/ The issue of embedded URLs (end of section 3) begs a number of
questions concerning the extent to which TCP should be 'application
aware'. Purists would argue that it should have no awareness, and just
be robust (but not necessarily optimal) with respect to anything that
applications throw at it.



Graham Cope

From owner-tcp-impl@relay.engr.sgi.com  Sat Feb 28 01:06:46 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id BAA21232 for tcp-impl-list; Sat, 28 Feb 1998 01:05:18 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id BAA21227 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 28 Feb 1998 01:05:14 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id BAA03083
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 28 Feb 1998 01:05:14 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id AAA07766; Sat, 28 Feb 1998 00:55:01 -0800 (PST)
Message-Id: <199802280855.AAA07766@daffy.ee.lbl.gov>
To: Graham Cope <G.Cope@ftel.co.uk>
Cc: tcp-impl@cthulhu.engr.sgi.com, Forgham A <A.Forgham@ftel.co.uk>,
        McGaw Patrick <P.McGaw@ftel.co.uk>,
        Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Re: I-D ACTION:draft-ietf-tcpimpl-poduri-00.txt
In-reply-to: Your message of Thu, 26 Feb 1998 15:44:26 PST.
Date: Sat, 28 Feb 1998 00:55:01 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> 1/ What is important, the IW as measured in segments, or the IW as
> measured in bytes?

These are often used interchangeably, and I don't see a significant
difference between using one or the other, providing that we're careful
in doing the congestion avoidance bookkeeping.

> 2/ Is not the ssthresh also open to optimisation (ref: paper by Janey
> Hoe)? 

This remains a research issue.  Janey's paper doesn't spell out how the
ssthresh estimation is to be done, and without guidance there, we need to
wait before thinking about perhaps incorporating it into the congestion
control standard.

> 3/ The issue of embedded URLs (end of section 3) begs a number of
> questions concerning the extent to which TCP should be 'application
> aware'. Purists would argue that it should have no awareness, and just
> be robust (but not necessarily optimal) with respect to anything that
> applications throw at it.

Definitely!  You may be misinterpreting the text.  It's just saying that
the simulations had multiple connections beginning in parallel because
that's often what happens with embedded URLs.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Sat Feb 28 09:58:22 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id JAA00805 for tcp-impl-list; Sat, 28 Feb 1998 09:56:48 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA00800 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 28 Feb 1998 09:56:46 -0800
Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA14918
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 28 Feb 1998 09:56:45 -0800
	env-from (touch@ISI.EDU)
Received: from rum.isi.edu (rum-s.isi.edu [128.9.192.237])
	by tnt.isi.edu (8.8.7/8.8.6) with ESMTP id JAA22471;
	Sat, 28 Feb 1998 09:50:03 -0800 (PST)
From: Joe Touch <touch@ISI.EDU>
Received: (from touch@localhost)
	by rum.isi.edu (8.8.7/8.8.6) id JAA12641;
	Sat, 28 Feb 1998 09:50:02 -0800 (PST)
Date: Sat, 28 Feb 1998 09:50:02 -0800 (PST)
Message-Id: <199802281750.JAA12641@rum.isi.edu>
To: G.Cope@ftel.co.uk, vern@ee.lbl.gov
Subject: Re: I-D ACTION:draft-ietf-tcpimpl-poduri-00.txt
Cc: A.Forgham@ftel.co.uk, P.McGaw@ftel.co.uk, tcp-impl@cthulhu.engr.sgi.com,
        walleygm@btlip10.bt.co.uk
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Subject: Re: I-D ACTION:draft-ietf-tcpimpl-poduri-00.txt
> Date: Sat, 28 Feb 1998 00:55:01 PST
> From: Vern Paxson <vern@ee.lbl.gov>
> 
> > 1/ What is important, the IW as measured in segments, or the IW as
> > measured in bytes?
> 
> These are often used interchangeably, and I don't see a significant
> difference between using one or the other, providing that we're careful
> in doing the congestion avoidance bookkeeping.

This is a problem in other areas of implementation.
There are three terms used - bytes, segments, and full segments.

See John Heidemann's CCR paper on how these terms can interact to
stall the feedback - e.g., limiting the sender to two outstanding (any
size) segments, and forcing the receiver to wait for two FULL segments
or a timeout.

"being careful" may not be sufficient to avoid these kinds of
unintended behaviors. Is it feasible to encourage a single term?
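
The interaction described above can be sketched with a toy model. This
is only an illustration: the function names, the "two outstanding
segments" sender rule, and the "two full segments" receiver rule are
assumptions taken from the wording of this message, not from any real
stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed receiver rule: ACK at once only after two FULL-sized
 * segments have arrived; otherwise wait for the delayed-ACK timer. */
bool receiver_acks_at_once(const int *seg_len, size_t n, int mss)
{
    int full = 0;

    assert(mss > 0);
    for (size_t i = 0; i < n; i++)
        if (seg_len[i] >= mss)
            full++;
    return full >= 2;
}

/* Assumed sender rule: at most two segments outstanding, of ANY size. */
bool sender_may_send(size_t outstanding)
{
    return outstanding < 2;
}
```

With two small segments outstanding, the sender may not send and the
receiver does not ACK at once, so both sides wait for a timer.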

Joe
----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Sat Feb 28 12:01:57 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id MAA13608 for tcp-impl-list; Sat, 28 Feb 1998 12:00:14 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA13603 for <tcp-impl@engr.sgi.com>; Sat, 28 Feb 1998 12:00:12 -0800
Received: from pinot.eecs.harvard.edu (pinot.eecs.harvard.edu [140.247.60.65]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id MAA02411
	for <tcp-impl@engr.sgi.com>; Sat, 28 Feb 1998 12:00:11 -0800
	env-from (dong@eecs.harvard.edu)
Received: from localhost (localhost [127.0.0.1]) by pinot.eecs.harvard.edu (8.6.12/8.6.12) with SMTP id PAA04565 for tcp-impl@engr.sgi.com; Sat, 28 Feb 1998 15:00:10 -0500
Message-Id: <199802282000.PAA04565@pinot.eecs.harvard.edu>
X-Authentication-Warning: pinot.eecs.harvard.edu: Host localhost didn't use HELO protocol
To: tcp-impl@engr.sgi.com
Subject: is this a rxt timer bug?
Date: Sat, 28 Feb 98 15:00:10 -0500
From: Dong Lin <dong@eecs.harvard.edu>
X-Mts: smtp
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


The exponential backoff pointer tp->t_rxtshift is currently cleared
inside tcp_input.c:tcp_xmit_timer when a new RTT sample is
collected. This implies that positive retransmission ACKs cannot reset
rxtshift.

As a result, TCP can quit the connection within 13 seconds after 13
timeouts for 13 different lost packets under the following
circumstances:
 1. disabled timestamps
 2. each loss causes a timeout
 3. no lost retransmissions
 4. one loss in each window (so rxtshift is never cleared)

I encountered this case with 90% probability when the loss rate is
higher than 15%.

The obvious fix is to reset rxtshift for all positive ACKs.
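
The scenario and the proposed fix can be sketched as a small model.
This is an illustration only, not the BSD code: the struct, the
function names, and the cap of 12 backoffs are assumptions made for
the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the retransmit backoff counter. */
#define MODEL_MAXRXTSHIFT 12   /* assumed cap: give up once exceeded */

struct rxt_model {
    int  rxtshift;   /* number of consecutive backoffs */
    bool dropped;    /* set once the model gives up on the connection */
};

/* Retransmission timeout: back off, and give up past the cap. */
static void model_timeout(struct rxt_model *m)
{
    if (++m->rxtshift > MODEL_MAXRXTSHIFT)
        m->dropped = true;
}

/* ACK for new data.  rtt_sampled is false when the ACK yields no RTT
 * sample (e.g. it covers a retransmission and timestamps are off). */
static void model_ack(struct rxt_model *m, bool rtt_sampled,
                      bool reset_on_any_ack)
{
    if (rtt_sampled || reset_on_any_ack)
        m->rxtshift = 0;
}

/* Run `losses` rounds of: timeout, then ACK of the retransmission.
 * Returns true if the connection survives. */
bool model_run(int losses, bool reset_on_any_ack)
{
    struct rxt_model m = {0, false};

    assert(losses >= 0);
    for (int i = 0; i < losses && !m.dropped; i++) {
        model_timeout(&m);
        model_ack(&m, false, reset_on_any_ack);
    }
    return !m.dropped;
}
```

In this model, 13 isolated losses drop the connection when only RTT
samples clear the counter, and do not when every ACK for new data
clears it.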

Dong Lin

From owner-tcp-impl@relay.engr.sgi.com  Sat Feb 28 23:23:53 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id XAA13784 for tcp-impl-list; Sat, 28 Feb 1998 23:22:12 -0800
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id XAA13739 for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 28 Feb 1998 23:22:04 -0800
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id XAA16727
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 28 Feb 1998 23:22:01 -0800
	env-from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id XAA08988; Sat, 28 Feb 1998 23:21:58 -0800 (PST)
Message-Id: <199803010721.XAA08988@daffy.ee.lbl.gov>
To: Joe Touch <touch@ISI.EDU>
Cc: G.Cope@ftel.co.uk, A.Forgham@ftel.co.uk, P.McGaw@ftel.co.uk,
        tcp-impl@cthulhu.engr.sgi.com, walleygm@btlip10.bt.co.uk
Subject: Re: I-D ACTION:draft-ietf-tcpimpl-poduri-00.txt
In-reply-to: Your message of Sat, 28 Feb 1998 09:50:02 PST.
Date: Sat, 28 Feb 1998 23:21:58 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> There are three terms used - bytes, segments, and full segments.
> ...
> "being careful" may not be sufficient to avoid these kinds of
> unintended behaviors. Is it feasible to encourage a single term?

Seems rather than picking a single term, we should make sure that the
revision of RFC 2001 defines each of these terms carefully (if it or some
other RFC doesn't already), and then use them accordingly.  Because I don't
think we can get away from sometimes needing to use each of the different
terms.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar  4 03:34:25 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id DAA561856 for tcp-impl-list; Wed, 4 Mar 1998 03:32:38 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id DAA565463 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 4 Mar 1998 03:32:35 -0800 (PST)
Received: from relay-2.ftel.co.uk (relay-2.ftel.co.uk [192.65.220.25]) by sgi.sgi.com (980304.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id DAA14792
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 4 Mar 1998 03:32:13 -0800 (PST)
	mail_from (G.Cope@ftel.co.uk)
Received: from callisto.ftel.co.uk (XOOLKwYyeIfUSod4TbIX0Xh23u2/ClRO@callisto.ftel.co.uk [172.16.2.14])
	by relay-2.ftel.co.uk (8.8.7/8.8.7/Revision:1.35) with ESMTP id LAA28755
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 4 Mar 1998 11:32:00 GMT
Received: from callisto.ftel.co.uk (x03ddLXeUYle7dzduoPT6aSWclA4jPoi@localhost.ftel.co.uk [127.0.0.1])
	by callisto.ftel.co.uk (8.8.7/8.8.7/Revision:1.32) with SMTP id LAA13696;
	Wed, 4 Mar 1998 11:31:58 GMT
Message-ID: <34FD3BAC.1314@ftel.co.uk>
Date: Wed, 04 Mar 1998 11:31:56 +0000
From: Graham Cope <G.Cope@ftel.co.uk>
Organization: Fujitsu Telecommunications Europe Ltd
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.6 sun4u)
MIME-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
CC: McGaw Patrick <P.McGaw@ftel.co.uk>,
        Walley Gary <walleygm@btlip10.bt.co.uk>
Subject: Time-out / FR interaction
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Consider the following scenario....

Segments 1,2,3,4,5,6 are transmitted within a window.

Segment 1 arrives safely and is ACKed.

Segment 2 gets lost.

Segments 3, 4 and 5 get delayed. Before they arrive (each triggering an
ACK, since out-of-sequence data is received), segment 2 times out.

So, segment 2 gets retransmitted (due to time-out).

Then 3 ACKs arrive for segments 3,4 and 5.

The sender enters fast retransmit (or does it?).
  If so, which is the 'most likely lost segment', for re-sending (1 or
2)?
  Should it also do congestion avoidance?
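
One way to reason about the scenario is a toy model of duplicate-ACK
counting. It is a sketch under stated assumptions, not a description of
any particular stack: the names are hypothetical, the threshold of 3 is
assumed, ACK values are segment numbers (the next segment the receiver
expects), and the model ignores the retransmission timer entirely, so
it does not settle whether fast retransmit should be entered after a
time-out.

```c
#include <assert.h>

#define MODEL_DUPACK_THRESH 3   /* assumed fast-retransmit threshold */

struct fr_model {
    int snd_una;   /* lowest unacknowledged segment number */
    int dupacks;   /* consecutive duplicate ACKs seen */
};

/* Process one cumulative ACK.  Returns the segment number to
 * fast-retransmit, or 0 if nothing is retransmitted. */
int fr_model_ack(struct fr_model *s, int ack)
{
    assert(s != 0);
    if (ack > s->snd_una) {          /* new data acknowledged */
        s->snd_una = ack;
        s->dupacks = 0;
        return 0;
    }
    if (ack == s->snd_una && ++s->dupacks == MODEL_DUPACK_THRESH)
        return s->snd_una;           /* first unacknowledged segment */
    return 0;
}
```

Under these assumptions, after segment 1 is ACKed the three duplicate
ACKs all ask for segment 2, so segment 2 is the one that would be
resent.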



Graham

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar  6 10:29:19 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id KAA1221433 for tcp-impl-list; Fri, 6 Mar 1998 10:27:23 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id KAA1254991 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 6 Mar 1998 10:27:21 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980305.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id KAA15910
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Mar 1998 10:27:20 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id KAA23129; Fri, 6 Mar 1998 10:27:15 -0800 (PST)
Message-Id: <199803061827.KAA23129@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: tcp-impl minutes or lack thereof
Cc: allyn@mci.net
Date: Fri, 06 Mar 1998 10:27:15 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Unfortunately we've had a minutes failure for the DC meeting, meaning that
the official minutes are not likely to be forthcoming.  So if any of you
have notes from the meeting you care to pass along, I'll collate them into
some sort of summary to pass along to the list, so we have context for the
LA meeting.

Speaking of the LA meeting, we need someone to volunteer to take official
notes.  If you're willing, that'd be great, please let me know.

	Thanks,

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar  6 11:48:35 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id LAA1285205 for tcp-impl-list; Fri, 6 Mar 1998 11:45:50 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id LAA1293052 for <tcp-impl@engr.sgi.com>; Fri, 6 Mar 1998 11:45:48 -0800 (PST)
Received: from ns.ietf.org (ietf.org [132.151.1.19]) by sgi.sgi.com (980305.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id LAA18682
	for <tcp-impl@engr.sgi.com>; Fri, 6 Mar 1998 11:45:47 -0800 (PST)
	mail_from (cclark@cnri.reston.va.us)
Received: from CNRI.Reston.VA.US (localhost [127.0.0.1])
	by ns.ietf.org (8.8.7/8.8.7a) with ESMTP id OAA20480;
	Fri, 6 Mar 1998 14:45:42 -0500 (EST)
Message-Id: <199803061945.OAA20480@ns.ietf.org>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce:;@cnri.reston.va.us
Cc: tcp-impl@engr.sgi.com
From: Internet-Drafts@ns.ietf.org
Reply-to: Internet-Drafts@ns.ietf.org
Subject: I-D ACTION:draft-ietf-tcpimpl-tools-02.txt
Date: Fri, 06 Mar 1998 14:45:42 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

--NextPart

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Implementation Working Group of the IETF.

	Title		: Some Testing Tools for TCP Implementors
	Author(s)	: S. Parker, C. Schmechel
	Filename	: draft-ietf-tcpimpl-tools-02.txt
	Pages		: 14
	Date		: 05-Mar-98
	
       Available tools for testing TCP implementations are catalogued by
       this memo.  Hopefully disseminating this information will
       encourage those responsible for building and maintaining TCP to
       make the best use of available tests.  The type of testing the
       tool provides, the type of tests it is capable of doing, and its
       availability is enumerated.  This document lists only tools which
       can evaluate one or more TCP implementations, or which can provide
       some specific results which describe or evaluate the TCP being
       tested.

Internet-Drafts are available by anonymous FTP.  Login with the username
"anonymous" and a password of your e-mail address.  After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-tcpimpl-tools-02.txt".
A URL for the Internet-Draft is:
ftp://ftp.ietf.org/internet-drafts/draft-ietf-tcpimpl-tools-02.txt

Internet-Drafts directories are located at:

	Africa:	ftp.is.co.za
	
	Europe: ftp.nordu.net
		ftp.nis.garr.it
			
	Pacific Rim: munnari.oz.au
	
	US East Coast: ds.internic.net
	
	US West Coast: ftp.isi.edu

Internet-Drafts are also available by mail.

Send a message to:	mailserv@ietf.org.  In the body type:
	"FILE /internet-drafts/draft-ietf-tcpimpl-tools-02.txt".
	
NOTE:	The mail server at ietf.org can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type: Message/External-body;
	access-type="mail-server";
	server="mailserv@ietf.org"

Content-Type: text/plain
Content-ID:	<19980305142541.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-tools-02.txt

--OtherAccess
Content-Type: Message/External-body;
	name="draft-ietf-tcpimpl-tools-02.txt";
	site="ftp.ietf.org";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID:	<19980305142541.I-D@ietf.org>

--OtherAccess--

--NextPart--



From owner-tcp-impl@relay.engr.sgi.com  Fri Mar  6 12:19:05 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id MAA1301536 for tcp-impl-list; Fri, 6 Mar 1998 12:16:59 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id MAA1293314 for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 6 Mar 1998 12:16:57 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980305.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id MAA01446
	for <tcp-impl@relay.engr.SGI.COM>; Fri, 6 Mar 1998 12:16:55 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id MAA23950; Fri, 6 Mar 1998 12:16:51 -0800 (PST)
Message-Id: <199803062016.MAA23950@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Last Call for Testing Tools I-D
Cc: allyn@mci.net, sparker@Eng.Sun.COM, cschmec@Eng.Sun.COM
Date: Fri, 06 Mar 1998 12:16:51 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

After the LA IETF, we intend to submit the Testing Tools I-D:

	ftp://ftp.ietf.org/internet-drafts/draft-ietf-tcpimpl-tools-02.txt

to the IESG for publication as an Informational RFC.  At the meeting we
will conduct a "last call" to determine if there is significant dissension.
If you have comments/objections, it would be helpful to first develop them
on the mailing list now, rather than waiting for the meeting, so we can see
what can be discussed & ironed out between now and the meeting.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Sun Mar  8 23:37:29 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id XAA2104651 for tcp-impl-list; Sun, 8 Mar 1998 23:36:04 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id XAA1998139 for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 8 Mar 1998 23:36:02 -0800 (PST)
Received: from philos.philosys.de (philos.philosys.de [193.100.254.1]) by sgi.sgi.com (980308.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id XAA05159
	for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 8 Mar 1998 23:35:59 -0800 (PST)
	mail_from (Uwe.Girlich@philosys.de)
Received: (from girlich@localhost)
	by philos.philosys.de (8.8.8/8.8.8) id IAA03831
	for tcp-impl@cthulhu.engr.sgi.com; Mon, 9 Mar 1998 08:38:40 +0100 (MET)
From: Uwe Girlich <Uwe.Girlich@philosys.de>
Message-Id: <199803090738.IAA03831@philos.philosys.de>
Subject: TCP timer problem
To: tcp-impl@cthulhu.engr.sgi.com
Date: Mon, 9 Mar 1998 08:38:39 +0100 (MET)
X-Mailer: ELM [version 2.4 PL22]
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hello all,

I'm currently working on TCP security-related changes on SINIX (Siemens 
Nixdorf UNIX) systems. I'm relatively new to the TCP subject and hope you can
help me:

In BSD-derived TCP implementations the connection-establishment timer and
the keep-alive timer share the same counter, timer[TCPT_KEEP]. This is
fine in itself, but when a single packet with the SYN bit set arrives on a
connecting socket (one waiting for the first response after sending the
active-open SYN), the timer is set to the usual tcp_keepidle value of 2
hours (see Stevens 2, tcp_input.c, line 339:
tp->t_timer[TCPT_KEEP] = tcp_keepidle;), even though the connection isn't
established yet! This happens because the timer is set to this value after
every packet received.

I think (I may be wrong) that this is bad behaviour, because if the SYN
packet (sent without ACK, to prevent the change to the TCPS_ESTABLISHED
state) was an attack and nothing more arrives on this connection, the port
is dead for the next two hours.

The same problem arises when someone telnets from A to a computer B behind a
wrongly configured firewall (a customer of mine had this problem recently):
The first SYN (from A) comes in (wrong behaviour of the firewall), the SYN/ACK
(from B) gets filtered, so A resends its SYN packet and B (now in SYN_SENT)
resets the timer to tcp_keepidle and telnet is dead on B for the next 2 hours.

I think the timer should be reset to tcp_keepidle only if the state is at
least TCPS_ESTABLISHED. At the two places where the state changes to
TCPS_ESTABLISHED (passive and active open) the timer must be set as well.
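
The proposed change could look roughly like the sketch below. It is a
simplified stand-in, not the BSD source: the type, the function names,
and the tick values (150 ticks for establishment, 14400 ticks for two
hours of idle time) are assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical, simplified states; order matters for the >= test. */
enum model_state { MODEL_SYN_SENT, MODEL_SYN_RECEIVED, MODEL_ESTABLISHED };

#define MODEL_KEEPINIT 150     /* assumed establishment timeout, ticks */
#define MODEL_KEEPIDLE 14400   /* assumed keep-alive idle time, ticks */

struct model_tcpcb {
    enum model_state state;
    int keep_timer;   /* shared establishment / keep-alive counter */
};

/* Called for every segment received on the connection. */
void model_segment_received(struct model_tcpcb *tp)
{
    assert(tp != 0);
    /* Proposed guard: only an established connection restarts the
     * idle timer, so a stray SYN cannot stretch the establishment
     * timeout to two hours. */
    if (tp->state >= MODEL_ESTABLISHED)
        tp->keep_timer = MODEL_KEEPIDLE;
}

/* Called at both transitions to ESTABLISHED (active and passive). */
void model_enter_established(struct model_tcpcb *tp)
{
    assert(tp != 0);
    tp->state = MODEL_ESTABLISHED;
    tp->keep_timer = MODEL_KEEPIDLE;
}
```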

Bye, 
Uwe Girlich, Uwe.Girlich@philosys.de


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 10 09:22:33 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id JAA1662514 for tcp-impl-list; Tue, 10 Mar 1998 09:20:14 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id JAA2727270 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 10 Mar 1998 09:20:12 -0800 (PST)
Received: from ALPHA1.RESTON.MCI.NET (alpha1.Reston.mci.net [204.70.128.80]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id JAA29229
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 10 Mar 1998 09:20:11 -0800 (PST)
	mail_from (allyn@MCI.NET)
Received: from allyn .reston.mci.net ([204.71.238.98])
 by ALPHA1.RESTON.MCI.NET (PMDF V5.1-10 #8388)
 with SMTP id <01IUI1T1B8JK001VZ8@ALPHA1.RESTON.MCI.NET> for
 tcp-impl@cthulhu.engr.sgi.com; Tue, 10 Mar 1998 12:20:08 EST
Date: Tue, 10 Mar 1998 12:19:48 -0500 (Eastern Standard Time)
From: allyn romanow <allyn@MCI.NET>
Subject: New co-chair, Mark Allman
To: tcp-impl@cthulhu.engr.sgi.com
Cc: vern@ee.lbl.gov
Message-id: <SIMEON.9803101248.O@allyn.reston.mci.net>
MIME-version: 1.0
X-Mailer: Simeon for Win32 Version 4.1.5 Build (42)
Content-type: TEXT/PLAIN; CHARSET=US-ASCII
X-Authentication: none
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


The subject says it all. Welcome to Mark Allman as the new 
co-chair! Steve Alexander had to step down due to other 
commitments.

----------------------
allyn romanow
allyn@mci.net


From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 11 10:07:15 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id KAA3246771 for tcp-impl-list; Wed, 11 Mar 1998 10:04:22 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id KAA3242896 for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 11 Mar 1998 10:04:20 -0800 (PST)
Received: from ntrlink.hq.interlink.com (ntrlink.hq.interlink.com [138.42.128.44]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id KAA10681
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 11 Mar 1998 10:04:18 -0800 (PST)
	mail_from (fab@md.interlink.com)
Received: from fab.md.interlink.com by ntrlink.hq.interlink.com (8.8.5/SMI-SVR4)
	id KAA15091; Wed, 11 Mar 1998 10:11:52 -0800 (PST)
Received: by fab.md.interlink.com (SMI-8.6/SMI-SVR4)
	id NAA09593; Wed, 11 Mar 1998 13:07:10 -0500
Date: Wed, 11 Mar 1998 13:07:10 -0500
From: Fred Bohle  <fab@md.interlink.com>
Message-Id: <199803111807.NAA09593@fab.md.interlink.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: TCP RST validation
X-Sun-Charset: US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Greetings,

	I am having a problem reconciling RFC 793 (TCP) with
Unix code, as described in Stevens & Wright, TCP/IP Illustrated,
Volume 2.  The RFC says, on P. 37, at the end of section 3.4:

     Reset Processing
   
     In all states except SYN-SENT, all reset (RST) segments are validated
     by checking their SEQ-fields.  A reset is valid if its sequence number
     is in the window.  ...

But in Stevens, Figure 28-30, Corrections for lines 646-676,
(corrected drop code) the comments say:

/*
 * Send an ACK to resynchronize and drop any data.
 * But keep on processing for RST or ACK.
 */ 

which the code does.  This leads me to believe that an RST would
be accepted no matter what the sequence number was.  That can't
be right.

	Is this a bug in the code, or a deliberate violation of
the RFC?  Or have I missed something?


	We have a situation where an RST packet comes in that has
a sequence number which is out of the window (left of the current
number by a lot), and the code processes the RST.  But the RFC
says not to process it.  I wanted some input before I started making
code changes...
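
For reference, the window test that the RFC text describes can be
sketched as below. This is a minimal illustration, assuming 32-bit
sequence numbers compared modulo 2^32; the function names are
hypothetical and are not taken from the Stevens code. The zero-window
case follows the RFC's acceptability rule for zero-length segments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a < b" on 32-bit sequence numbers. */
static bool seq_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* True if RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND.  With a zero
 * window, only SEG.SEQ == RCV.NXT is accepted. */
bool rst_seq_in_window(uint32_t seg_seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    if (rcv_wnd == 0)
        return seg_seq == rcv_nxt;
    return !seq_lt(seg_seq, rcv_nxt) && seq_lt(seg_seq, rcv_nxt + rcv_wnd);
}
```

An RST whose sequence number lies well to the left of RCV.NXT fails
this test and would be ignored under the RFC wording quoted above.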

Fred

------------------------------------------------------------------------
Fred Bohle			EMAIL: fab@interlink.com
Interlink Computer Sciences	AT&T : 410-992-7750 x314
9250 Rumsey Road, Suite 200     Home : 410-643-6720
Columbia, MD 21045-1946         WWW  : www.interlink.com
------------------------------------------------------------------------


From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 12 16:54:23 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id QAA2248133 for tcp-impl-list; Thu, 12 Mar 1998 16:14:47 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id QAA3888125 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 12 Mar 1998 16:14:45 -0800 (PST)
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id QAA16000
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 12 Mar 1998 16:14:44 -0800 (PST)
	mail_from (Chris.Schmechel@eng.Sun.COM)
Received: from sunmail1.Sun.COM ([129.145.1.2]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id QAA15179 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 12 Mar 1998 16:14:44 -0800
Received: from jurassic.eng.sun.com by sunmail1.Sun.COM (SMI-8.6/SMI-4.1)
	id QAA14768; Thu, 12 Mar 1998 16:14:41 -0800
Received: from mont-blanc (mont-blanc [129.146.86.183])
	by jurassic.eng.sun.com (8.9.0.Beta1+Sun/8.9.0.Beta1) with SMTP id QAA04453;
	Thu, 12 Mar 1998 16:14:41 -0800 (PST)
Message-Id: <199803130014.QAA04453@jurassic.eng.sun.com>
Date: Thu, 12 Mar 1998 16:13:51 -0800 (PST)
From: Chris Schmechel <Chris.Schmechel@eng.Sun.COM>
Reply-To: Chris Schmechel <Chris.Schmechel@eng.Sun.COM>
Subject: Re: Last Call for Testing Tools I-D
To: tcp-impl@cthulhu.engr.sgi.com
Cc: cschmec@eng.Sun.COM
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Content-MD5: iu5ZMrHi1hmjyuZ1Db2cgA==
X-Mailer: dtmail 1.3.0 CDE Version 1.3_6 SunOS 5.7 sun4u sparc 
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hi -

It was pointed out that I need to finish the "Security Considerations"
section in the Testing Tools I-D.  I've come up with the following points
to cover:

	- Some of the tools could be used to create rogue packets or
	  denial-of-service attacks, etc.
	- Some of the tools require changes to the kernel
	- Some of the tools require root privileges to execute
	- None of the listed tools evaluate security in any way or form
	- You are trusting code that you have fetched from some perhaps
	  untrustworthy remote site (maybe the code has a Trojan in it)

So I would like to open up the discussion for comments on the above
points.  Suggested wording is also welcome.

Thanks,

-Chris Schmechel
 <Chris.Schmechel@Eng.Sun.COM>
 


From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 12 21:56:05 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id VAA4115045 for tcp-impl-list; Thu, 12 Mar 1998 21:54:35 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id VAA4016878 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 12 Mar 1998 21:54:33 -0800 (PST)
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id VAA25277
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 12 Mar 1998 21:54:31 -0800 (PST)
	mail_from (luigi@labinfo.iet.unipi.it)
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id FAA14424; Fri, 13 Mar 1998 05:26:00 +0100
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199803130426.FAA14424@labinfo.iet.unipi.it>
Subject: Re: Last Call for Testing Tools I-D
To: Chris.Schmechel@ENG.SUN.COM
Date: Fri, 13 Mar 1998 05:26:00 +0100 (MET)
Cc: tcp-impl@cthulhu.engr.sgi.com, cschmec@ENG.SUN.COM
In-Reply-To: <199803130014.QAA04453@jurassic.eng.sun.com> from "Chris Schmechel" at Mar 12, 98 04:13:32 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Hi -
> 
> It was pointed out that I need to finish the "Security Considerations"
> section in the Testing Tools I-D.  I've come up with the following points
> to cover:
> 
> 	- Some of the tools could be used to create rogue packets or
> 	  denial-of-service attacks, etc.
> 	- Some of the tools require changes to the kernel
> 	- Some of the tools require root privileges to execute
> 	- None of the listed tools evaluate security in any way or form
> 	- You are trusting code that you have fetched from some perhaps
> 	  untrustworthy remote site (maybe the code has a Trojan in it)

all true but other than possibly pointing out the above features
(some of them presumably common to most software) i don't see what
else should be done.

	cheers
	luigi
-----------------------------+--------------------------------------
Luigi Rizzo                  |  Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it    |  Universita' di Pisa
tel: +39-50-568533           |  via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522           |  http://www.iet.unipi.it/~luigi/
_____________________________|______________________________________

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 12 22:15:29 1998
Received: (from majordomo-owner@localhost) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id WAA4070576 for tcp-impl-list; Thu, 12 Mar 1998 22:12:57 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id WAA4126042 for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 12 Mar 1998 22:12:51 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id WAA00078
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 12 Mar 1998 22:12:51 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id WAA12800; Thu, 12 Mar 1998 22:12:45 -0800 (PST)
Message-Id: <199803130612.WAA12800@daffy.ee.lbl.gov>
To: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: Last Call for Testing Tools I-D
In-reply-to: Your message of Fri, 13 Mar 1998 05:26:00 PST.
Date: Thu, 12 Mar 1998 22:12:45 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> all true but other than possibly pointing out the above features
> (some of them presumably common to most software) i don't see what
> else should be done.

That's the point - the Security Considerations section documents
possible security issues.  What Chris is looking for are (1) other
security considerations to add to the discussion, (2) comments on
whether the ones listed are appropriate/correct/etc., and (3) wording
for describing them better and in more detail.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 13 05:41:19 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id FAA4191727
	for tcp-impl-list;
	Fri, 13 Mar 1998 05:38:46 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id FAA4212411
	for <tcp-impl@engr.sgi.com>;
	Fri, 13 Mar 1998 05:38:44 -0800 (PST)
Received: from ns.ietf.org (ietf.org [132.151.1.19]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id FAA29103
	for <tcp-impl@engr.sgi.com>; Fri, 13 Mar 1998 05:38:43 -0800 (PST)
	mail_from (cclark@cnri.reston.va.us)
Received: from CNRI.Reston.VA.US (localhost [127.0.0.1])
	by ns.ietf.org (8.8.7/8.8.7a) with ESMTP id IAA27953;
	Fri, 13 Mar 1998 08:38:39 -0500 (EST)
Message-Id: <199803131338.IAA27953@ns.ietf.org>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce:;@cnri.reston.va.us
Cc: tcp-impl@engr.sgi.com
From: Internet-Drafts@ns.ietf.org
Reply-to: Internet-Drafts@ns.ietf.org
Subject: I-D ACTION:draft-ietf-tcpimpl-needdoc-00.txt
Date: Fri, 13 Mar 1998 08:38:39 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

--NextPart

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Implementation Working Group of the IETF.

	Title		: TCP Implementation Problems 
                          That Need To Be Documented
	Author(s)	: V. Paxson, M. Allman
	Filename	: draft-ietf-tcpimpl-needdoc-00.txt
	Pages		: 5
	Date		: 12-Mar-98
	
   The TCP-IMPL working group has documented a number of TCP
   implementation problems [PADHV98].  However, a significant number
   still have not been fully described and documented in the form used
   in [PADHV98].  This memo briefly describes a number of these,
   including commentary as to the authors' opinions regarding the
   importance of documenting the problem.  This memo is *not* intended
   to ever see light as an RFC of some form; its sole function is to
   facilitate working group discussion of which problems are more
   pressing to document than others, and to aid in arriving at a
   decision as to when [PADHV98] will be sufficiently complete to merit
   its publication as an Informational RFC.

   We divide the descriptions into ''serious'' problems, meaning those
   we think should be included in [PADHV98] prior to its publication;
   ''security'' problems, which might not be viewed as implementation
   problems per se, but represent significant security problems of which
   TCP implementors should be aware; and ''less serious'' problems, those
   that, if the working group fails to find volunteers to document them,
   should not hold up [PADHV98].
 
   It might be worthwhile to separate the security problems out into
   their own document.  We particularly solicit working group input on
   this subject.


Internet-Drafts are available by anonymous FTP.  Login with the username
"anonymous" and a password of your e-mail address.  After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-tcpimpl-needdoc-00.txt".
A URL for the Internet-Draft is:
ftp://ftp.ietf.org/internet-drafts/draft-ietf-tcpimpl-needdoc-00.txt

Internet-Drafts directories are located at:

	Africa:	ftp.is.co.za
	
	Europe: ftp.nordu.net
		ftp.nis.garr.it
			
	Pacific Rim: munnari.oz.au
	
	US East Coast: ds.internic.net
	
	US West Coast: ftp.isi.edu

Internet-Drafts are also available by mail.

Send a message to:	mailserv@ietf.org.  In the body type:
	"FILE /internet-drafts/draft-ietf-tcpimpl-needdoc-00.txt".
	
NOTE:	The mail server at ietf.org can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type: Message/External-body;
	access-type="mail-server";
	server="mailserv@ietf.org"

Content-Type: text/plain
Content-ID:	<19980312103847.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-needdoc-00.txt

--OtherAccess
Content-Type: Message/External-body;
	name="draft-ietf-tcpimpl-needdoc-00.txt";
	site="ftp.ietf.org";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID:	<19980312103847.I-D@ietf.org>

--OtherAccess--

--NextPart--



From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 13 05:41:19 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id FAA4091953
	for tcp-impl-list;
	Fri, 13 Mar 1998 05:39:05 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id FAA4219671
	for <tcp-impl@engr.sgi.com>;
	Fri, 13 Mar 1998 05:39:04 -0800 (PST)
Received: from ns.ietf.org (ietf.org [132.151.1.19]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id FAA29170
	for <tcp-impl@engr.sgi.com>; Fri, 13 Mar 1998 05:39:03 -0800 (PST)
	mail_from (cclark@cnri.reston.va.us)
Received: from CNRI.Reston.VA.US (localhost [127.0.0.1])
	by ns.ietf.org (8.8.7/8.8.7a) with ESMTP id IAA27999;
	Fri, 13 Mar 1998 08:39:00 -0500 (EST)
Message-Id: <199803131339.IAA27999@ns.ietf.org>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; Boundary="NextPart"
To: IETF-Announce:;@cnri.reston.va.us
Cc: tcp-impl@engr.sgi.com
From: Internet-Drafts@ns.ietf.org
Reply-to: Internet-Drafts@ns.ietf.org
Subject: I-D ACTION:draft-ietf-tcpimpl-prob-03.txt
Date: Fri, 13 Mar 1998 08:38:59 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

--NextPart

A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the TCP Implementation Working Group of the IETF.

	Title		: Known TCP Implementation Problems
	Author(s)	: B. Volz, I. Heavens, S. Dawson, V. Paxson, M. Allman
	Filename	: draft-ietf-tcpimpl-prob-03.txt
	Pages		: 34
	Date		: 12-Mar-98
	
   This memo catalogs a number of known TCP implementation problems.
   The goal in doing so is to improve conditions in the existing
   Internet by enhancing the quality of current TCP/IP implementations.
   It is hoped that both performance and correctness issues can be
   resolved by making implementors aware of the problems and their
   solutions.  In the long term, it is hoped that this will provide a
   reduction in unnecessary traffic on the network, the rate of
   connection failures due to protocol errors, and load on network
   servers due to time spent processing both unsuccessful connections
   and retransmitted data.  This will help to ensure the stability of
   the global Internet.

Internet-Drafts are available by anonymous FTP.  Login with the username
"anonymous" and a password of your e-mail address.  After logging in,
type "cd internet-drafts" and then
	"get draft-ietf-tcpimpl-prob-03.txt".
A URL for the Internet-Draft is:
ftp://ftp.ietf.org/internet-drafts/draft-ietf-tcpimpl-prob-03.txt

Internet-Drafts directories are located at:

	Africa:	ftp.is.co.za
	
	Europe: ftp.nordu.net
		ftp.nis.garr.it
			
	Pacific Rim: munnari.oz.au
	
	US East Coast: ds.internic.net
	
	US West Coast: ftp.isi.edu

Internet-Drafts are also available by mail.

Send a message to:	mailserv@ietf.org.  In the body type:
	"FILE /internet-drafts/draft-ietf-tcpimpl-prob-03.txt".
	
NOTE:	The mail server at ietf.org can return the document in
	MIME-encoded form by using the "mpack" utility.  To use this
	feature, insert the command "ENCODING mime" before the "FILE"
	command.  To decode the response(s), you will need "munpack" or
	a MIME-compliant mail reader.  Different MIME-compliant mail readers
	exhibit different behavior, especially when dealing with
	"multipart" MIME messages (i.e. documents which have been split
	up into multiple messages), so check your local documentation on
	how to manipulate these messages.
		
		
Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.

--NextPart
Content-Type: Multipart/Alternative; Boundary="OtherAccess"

--OtherAccess
Content-Type: Message/External-body;
	access-type="mail-server";
	server="mailserv@ietf.org"

Content-Type: text/plain
Content-ID:	<19980312110729.I-D@ietf.org>

ENCODING mime
FILE /internet-drafts/draft-ietf-tcpimpl-prob-03.txt

--OtherAccess
Content-Type: Message/External-body;
	name="draft-ietf-tcpimpl-prob-03.txt";
	site="ftp.ietf.org";
	access-type="anon-ftp";
	directory="internet-drafts"

Content-Type: text/plain
Content-ID:	<19980312110729.I-D@ietf.org>

--OtherAccess--

--NextPart--



From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 13 09:43:25 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id JAA4373450
	for tcp-impl-list;
	Fri, 13 Mar 1998 09:40:00 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id JAA4361564
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Fri, 13 Mar 1998 09:39:55 -0800 (PST)
Received: from zero.aec.at (zero.aec.at [193.170.192.102]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id JAA22361
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 13 Mar 1998 09:39:53 -0800 (PST)
	mail_from (andi@zero.aec.at)
Received: (qmail 21986 invoked by uid 573); 13 Mar 1998 17:41:48 -0000
To: Fred Bohle <fab@md.interlink.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP RST validation
References: <199803111807.NAA09593@fab.md.interlink.com>
From: Andi Kleen <ak@muc.de>
Date: 13 Mar 1998 18:41:47 +0100
In-Reply-To: Fred Bohle's message of Wed, 11 Mar 1998 13:07:10 -0500
Message-ID: <k2iupisd2c.fsf@zero.aec.at>
Lines: 21
X-Mailer: Gnus v5.4.41/Emacs 19.34
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Fred Bohle  <fab@md.interlink.com> writes:
> 
> 	Is this a bug in the code, or a deliberate violation of
> the RFC?  Or have I missed something?

IMHO it is a bug. It allows nice DoS attacks (the well known 'nuke').
I think FreeBSD has fixed this.

> 
> 	We have a situation where an RST packet comes in that has
> a sequence number which is out of the window (left of the current
> number by a lot), and the code processes the RST.  But the RFC
> says not to process it.  I wanted some input before I started making
> code changes...

I fixed this case in the Linux 2.1 code. So far nobody has complained.
But there is a funny effect when the window gets very small or drops to
zero; that was fixed by checking against SEQ <= rst <= SEQ+min(window,32768).
This attempts to make sure that a legal RST always gets through. 
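
As a rough illustration of the check described above, here is a minimal
Python sketch. It takes the formula literally as written in this message
and is not the actual Linux 2.1 code. The names rst_acceptable, rcv_nxt,
rcv_wnd and cap are hypothetical, and the modulo-2^32 comparison is an
added assumption to cope with sequence number wraparound.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits wide and wrap around


def rst_acceptable(rst_seq, rcv_nxt, rcv_wnd, cap=32768):
    """Return True if rst_seq lies in [rcv_nxt, rcv_nxt + min(rcv_wnd, cap)].

    This is the test "SEQ <= rst <= SEQ+min(window,32768)" from the message,
    evaluated modulo 2**32 (the wraparound handling is an assumption).
    """
    span = min(rcv_wnd, cap)
    # Distance of the RST's sequence number to the right of rcv_nxt.
    # A sequence number left of rcv_nxt shows up as a huge distance.
    offset = (rst_seq - rcv_nxt) % SEQ_MOD
    return offset <= span
```

With rcv_nxt = 1000 and a window of 4096, a RST carrying sequence number
999 (just left of the window) is rejected, while 1000 through 5096 are
accepted.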

-Andi

From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 13 17:05:34 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id RAA4578233
	for tcp-impl-list;
	Fri, 13 Mar 1998 17:02:08 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id RAA4601585
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Fri, 13 Mar 1998 17:02:07 -0800 (PST)
Received: from mail4.microsoft.com (mail4.microsoft.com [131.107.3.29]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id RAA20555
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 13 Mar 1998 17:02:07 -0800 (PST)
	mail_from (peterf@microsoft.com)
Received: by INET-04-IMC with Internet Mail Service (5.5.1960.3)
	id <GWTCTH15>; Fri, 13 Mar 1998 17:02:07 -0800
Message-ID: <8D8EF175E72CD111805800805F3198EE03925CD7@red-msg-46.dns.microsoft.com>
From: Peter Ford <peterf@microsoft.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: RE: I-D ACTION:draft-ietf-tcpimpl-prob-03.txt
Date: Fri, 13 Mar 1998 17:01:52 -0800
X-Mailer: Internet Mail Service (5.5.1960.3)
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


When will the working group document be changed to reflect some of the
current thinking on the number of initial packets to send, such as that
referenced in http://www-nrg.ee.lbl.gov/floyd/tcp_init_win.html?

cheers, peter



From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 13 20:23:32 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id UAA4671064
	for tcp-impl-list;
	Fri, 13 Mar 1998 20:20:33 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id UAA4485477
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Fri, 13 Mar 1998 20:20:32 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id UAA10295
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 13 Mar 1998 20:20:31 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id UAA15626; Fri, 13 Mar 1998 20:19:11 -0800 (PST)
Message-Id: <199803140419.UAA15626@daffy.ee.lbl.gov>
To: Peter Ford <peterf@microsoft.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: I-D ACTION:draft-ietf-tcpimpl-prob-03.txt
In-reply-to: Your message of Fri, 13 Mar 1998 17:01:52 PST.
Date: Fri, 13 Mar 1998 20:19:11 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> When will the working group document be changed to  reflect some of the
> current thinking on the number of initial packets to send such as those
> referenced in http://www-nrg.ee.lbl.gov/floyd/tcp_init_win.html

This will happen as we develop our revision of RFC 2001.  At the last WG
meeting, we attained rough consensus to allow an initial window of two
segments.  Over the next few weeks on the mailing list, and then at LA,
we will see whether we also have rough consensus for the change proposed in
draft-floyd-incr-init-win-01.txt, which was submitted this morning.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 14 11:50:49 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id LAA4669362
	for tcp-impl-list;
	Sat, 14 Mar 1998 11:49:17 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id LAA4700752
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 14 Mar 1998 11:49:15 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id LAA16297
	for <tcp-impl@relay.engr.SGI.COM>; Sat, 14 Mar 1998 11:49:14 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id LAA16479; Sat, 14 Mar 1998 11:49:14 -0800 (PST)
Message-Id: <199803141949.LAA16479@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: I-D ACTION:draft-ietf-tcpimpl-prob-03.txt
In-reply-to: Your message of Fri, 13 Mar 1998 08:38:59 PST.
Date: Sat, 14 Mar 1998 11:49:14 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Here are the significant diffs between the latest version and the previous
one.  One new problem was added, "Failure to RST on close with data pending",
per the discussion on the mailing list a couple of weeks ago.  The other
changes are clarifications or correctness tweaks.

		Vern


@@ -126,7 +129,9 @@
 Why the problem is viewed as a problem.
 .IP "Relevant RFCs" 5
 Brief discussion of the RFCs with respect to which the problem is viewed
-as an implementation error.
+as an implementation error.  These RFCs often qualify behavior using
+terms such as MUST, SHOULD, MAY, and others written capitalized.
+See RFC 2119 for the exact interpretation of these terms.
 .IP "Trace file demonstrating the problem" 5
 One or more ASCII trace files demonstrating the problem, if applicable.
 These may in the future be replaced with URLs to on-line traces.
@@ -1118,26 +1131,24 @@
 10:27:55.09 D > C: . ack 8 win 8754 (DF)
 .fi
 
+Here, Machine D sends a FIN with 40 bytes of data even
+before the original 10 octets have been acknowledged. This is
+correct behavior as it provides for the highest performance.
+
 .IP References 5
 This problem is documented in [Dawson97].
 
 .IP "How to detect" 5
 For implementations manifesting this problem, it shows up on a packet
-trace.  If the connection is left idle, the keepalive probes will
-arrive closer together than the two hour minimum.
+trace.
 
-Here, Machine D sends a FIN with 40 bytes of data even
-before the original 10 octets have been acknowledged. This is
-correct behavoir as it provides for the highest performance.
@@ -1146,8 +1157,8 @@
 
 .IP Significance 5
 Potentially serious for TCP endpoints that manage large numbers
-of connections, due to exhaustion of memory available for managing
-connection state.
+of connections, due to exhaustion of memory and/or process slots
+available for managing connection state.
 
 .IP Implications 5
 Failure to send the RST can lead to permanently hung TCP
@@ -1263,10 +1272,189 @@
 client > server D=80 S=59500 Rst Seq=597 Len=0 Win=0
 .fi
 
+"client" sends a number of RSTs, one in response to each incoming packet
+from "server".  One might wonder why "server" keeps sending data packets
+after it has received a RST from "client"; the explanation is that "server"
+had already transmitted all five of the data packets before receiving
+the first RST from "client", so it is too late to avoid transmitting them.
+
 .IP "How to detect" 5
 The problem can be detected by inspecting packet traces of a
 large, interrupted bulk transfer.
 
+.Pb
+.IP "Name of Problem" 5
+Failure to RST on close with data pending
+
+.IP Classification 5
+Resource management
+
+.IP Description 5
+When an application closes a connection in such a way that it
+can no longer read any received data, the TCP SHOULD, per
+section 4.2.2.13 of RFC 1122, send a RST if there is any unread
+received data, or if any new data is received. A TCP that fails
+to do so exhibits "Failure to RST on close with data pending".
+
+Note that, for some TCPs, this situation can be caused by an
+application "crashing" while a peer is sending data.
+
+We have observed a number of TCPs that exhibit this problem.
+The problem is less serious if any subsequent data sent to the
+now-closed connection endpoint elicits a RST (see illustration
+below).
+
+.IP Significance 5
+This problem is most significant for endpoints that engage
+in large numbers of connections, as their ability to do so
+will be curtailed as they leak away resources.
+
+.IP Implications 5
+Failure to reset the connection can lead to permanently hung
+connections, in which the remote endpoint takes no further action
+to tear down the connection because it is waiting on the local TCP
+to first take some action.  This is particularly the case if the
+local TCP also allows the advertised window to go to zero, and
+fails to tear down the connection when the remote TCP engages in
+"persist" probes (see example below).
+
+.IP "Relevant RFCs" 5
+RFC 1122 section 4.2.2.13.  Also, 4.2.2.17 for the zero-window probing
+discussion below.
+
+.IP "Trace file demonstrating it" 5
+Made using tcpdump.  No drop information available.
+
+.nf
+13:11:46.04 A > B: S 458659166:458659166(0) win 4096
+                    <mss 1460,wscale 0,eol> (DF)
+13:11:46.04 B > A: S 792320000:792320000(0) ack 458659167
+                    win 4096
+13:11:46.04 A > B: . ack 1 win 4096 (DF)
+13:11.55.80 A > B: . 1:513(512) ack 1 win 4096 (DF)
+13:11.55.80 A > B: . 513:1025(512) ack 1 win 4096 (DF)
+13:11:55.83 B > A: . ack 1025 win 3072
+13:11.55.84 A > B: . 1025:1537(512) ack 1 win 4096 (DF)
+13:11.55.84 A > B: . 1537:2049(512) ack 1 win 4096 (DF)
+13:11.55.85 A > B: . 2049:2561(512) ack 1 win 4096 (DF)
+13:11:56.03 B > A: . ack 2561 win 1536
+13:11.56.05 A > B: . 2561:3073(512) ack 1 win 4096 (DF)
+13:11.56.06 A > B: . 3073:3585(512) ack 1 win 4096 (DF)
+13:11.56.06 A > B: . 3585:4097(512) ack 1 win 4096 (DF)
+13:11:56.23 B > A: . ack 4097 win 0
+13:11:58.16 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
+13:11:58.16 B > A: . ack 4097 win 0
+13:12:00.16 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
+13:12:00.16 B > A: . ack 4097 win 0
+13:12:02.16 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
+13:12:02.16 B > A: . ack 4097 win 0
+13:12:05.37 A > B: . 4096:4097(1) ack 1 win 4096 (DF)
+13:12:05.37 B > A: . ack 4097 win 0
+13:12:06.36 B > A: F 1:1(0) ack 4097 win 0
+13:12:06.37 A > B: . ack 2 win 4096 (DF)
+13:12:11.78 A > B: . 4096:4097(1) ack 2 win 4096 (DF)
+13:12:11.78 B > A: . ack 4097 win 0
+13:12:24.59 A > B: . 4096:4097(1) ack 2 win 4096 (DF)
+13:12:24.60 B > A: . ack 4097 win 0
+13:12:50.22 A > B: . 4096:4097(1) ack 2 win 4096 (DF)
+13:12:50.22 B > A: . ack 4097 win 0
+.fi
+
+Machine B in the trace above does not drop received data when
+the socket is "closed" by the application (in this case, the
+application process was terminated). This occurred at
+approximately 13:12:06.36 and resulted in the FIN being sent
+in response to the close. However, because there is no longer an
+application to deliver the data to, the TCP should have instead
+sent a RST.
+
+Note: Machine A's zero-window probing is also broken.  It is
+resending old data, rather than new data. Section 3.7 in RFC 793
+and Section 4.2.2.17 in RFC 1122 discuss zero-window probing.
+
+.IP "Trace file demonstrating better behavior" 5
+Made using tcpdump.  No drop information available.
+
+Better, but still not fully correct, behavior, per the discussion below.
+We show this behavior because it has been observed for a number of
+different TCP implementations.
+
+.nf
+13:48:29.24 C > D: S 73445554:73445554(0) win 4096
+                    <mss 1460,wscale 0,eol> (DF)
+13:48:29.24 D > C: S 36050296:36050296(0) ack 73445555
+                    win 4096 <mss 1460,wscale 0,eol> (DF)
+13:48:29.25 C > D: . ack 1 win 4096 (DF)
+13:48:30.78 C > D: . 1:1461(1460) ack 1 win 4096 (DF)
+13:48:30.79 C > D: . 1461:2921(1460) ack 1 win 4096 (DF)
+13:48:30.80 D > C: . ack 2921 win 1176 (DF)
+13:48:32.75 C > D: . 2921:4097(1176) ack 1 win 4096 (DF)
+13:48:32.82 D > C: . ack 4097 win 0 (DF)
+13:48:34.76 C > D: . 4096:4097(1) ack 1 win 4096 (DF)
+13:48:34.84 D > C: . ack 4097 win 0 (DF)
+13:48:36.34 D > C: FP 1:1(0) ack 4097 win 4096 (DF)
+13:48:36.34 C > D: . 4097:5557(1460) ack 2 win 4096 (DF)
+13:48:36.34 D > C: R 36050298:36050298(0) win 24576
+13:48:36.34 C > D: . 5557:7017(1460) ack 2 win 4096 (DF)
+13:48:36.34 D > C: R 36050298:36050298(0) win 24576
+.fi
+
+In this trace, the application process is terminated on Machine
+D at approximately 13:48:36.34.  Its TCP sends the FIN with the
+window opened again (since it discarded the previously received
+data).  Machine C promptly sends more data, causing Machine D to
+reset the connection since it cannot deliver the data to the
+application. Ideally, Machine D SHOULD send a RST instead of
+dropping the data and re-opening the receive window.
+
+Note: Machine C's zero-window probing is broken, the same
+as in the example above.
+
+
+.IP "Trace file demonstrating correct behavior" 5
+Made using tcpdump.  No losses reported.
+
+.nf
+14:12:02.19 E > F: S 1143360000:1143360000(0) win 4096
+14:12:02.19 F > E: S 1002988443:1002988443(0) ack 1143360001
+                    win 4096 <mss 1460> (DF)
+14:12:02.19 E > F: . ack 1 win 4096
+14:12:10.43 E > F: . 1:513(512) ack 1 win 4096
+14:12:10.61 F > E: . ack 513 win 3584 (DF)
+14:12:10.61 E > F: . 513:1025(512) ack 1 win 4096
+14:12:10.61 E > F: . 1025:1537(512) ack 1 win 4096
+14:12:10.81 F > E: . ack 1537 win 2560 (DF)
+14:12:10.81 E > F: . 1537:2049(512) ack 1 win 4096
+14:12:10.81 E > F: . 2049:2561(512) ack 1 win 4096
+14:12:10.81 E > F: . 2561:3073(512) ack 1 win 4096
+14:12:11.01 F > E: . ack 3073 win 1024 (DF)
+14:12:11.01 E > F: . 3073:3585(512) ack 1 win 4096
+14:12:11.01 E > F: . 3585:4097(512) ack 1 win 4096
+14:12:11.21 F > E: . ack 4097 win 0 (DF)
+14:12:15.88 E > F: . 4097:4098(1) ack 1 win 4096
+14:12:16.06 F > E: . ack 4097 win 0 (DF)
+14:12:20.88 E > F: . 4097:4098(1) ack 1 win 4096
+14:12:20.91 F > E: . ack 4097 win 0 (DF)
+14:12:21.94 F > E: R 1002988444:1002988444(0) win 4096
+.fi
+
+When the application terminates at 14:12:21.94, F immediately
+sends a RST.
+
+Note: Machine E's zero-window probing is (finally) correct.
+
+.IP "How to detect" 5
+The problem can often be detected by inspecting packet traces of a
+transfer in which the receiving application terminates abnormally.
+When doing so, there can be an ambiguity (if only looking at the trace)
+as to whether the receiving TCP did indeed have unread data that
+it could now no longer deliver.  To provoke this to happen, it may
+help to suspend the receiving application so that it fails to consume
+any data, eventually exhausting the advertised window.  At this point,
+since the advertised window is zero, we know that the receiving TCP has
+undelivered data buffered up.  Terminating the application process then
+should suffice to test the correctness of the TCP's behavior.
+
 .Pa
 Security Considerations
 
@@ -1319,6 +1507,9 @@
 .IP [RFC1122] 5
 R. Braden, Editor, "Requirements for Internet Hosts -- Communication Layers,"
 Oct. 1989.
+.IP [RFC2119] 5
+S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels,"
+Mar. 1997.
 .IP [Dawson97] 5
 S. Dawson, F. Jahanian, and T. Mitton, "Experiments on Six Commercial
 TCP Implementations Using a Software Fault Injection Tool," to appear
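
For readers following the new "Failure to RST on close with data pending"
entry in the diff above, the close-time rule it cites from RFC 1122
section 4.2.2.13 can be sketched as a small decision function. This is an
illustrative sketch only, not code from any TCP implementation; the
function names and the string return values are hypothetical.

```python
def segment_on_close(unread_bytes):
    """Pick the segment a TCP sends when the application closes the
    connection in a way that prevents it from reading any more data.

    unread_bytes: received data still buffered and not yet read.
    """
    # Unread data can never be delivered now, so the peer is told with a
    # RST rather than a FIN (the SHOULD in RFC 1122, 4.2.2.13).
    return "RST" if unread_bytes > 0 else "FIN"


def segment_on_data_after_close(new_bytes):
    """Pick the response to data that arrives after such a close."""
    # New data is equally undeliverable, so it also elicits a RST.
    return "RST" if new_bytes > 0 else None
```

In the first trace, Machine B closes with a full 4096-byte receive buffer
and still sends a FIN; under this rule it would have sent a RST, as
Machine F does in the third trace.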

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 14 11:52:54 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id LAA3371198
	for tcp-impl-list;
	Sat, 14 Mar 1998 11:52:43 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id LAA4679858
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 14 Mar 1998 11:52:42 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id LAA16880
	for <tcp-impl@relay.engr.SGI.COM>; Sat, 14 Mar 1998 11:52:41 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id LAA16508; Sat, 14 Mar 1998 11:52:41 -0800 (PST)
Message-Id: <199803141952.LAA16508@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: I-D ACTION:draft-ietf-tcpimpl-needdoc-00.txt
Date: Sat, 14 Mar 1998 11:52:40 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Here is the full text of the I-D.  Mark & I put it together as a way to
focus working group discussion on what problems we still need to document,
and with what priority; and whether we should spin off a separate "security
issues" document.  We welcome comments on (1) problems we're missing,
(2) disagreements about the tentative priorities we gave these, (3) the
separate security doc issue, (4) volunteers to write up any of these, and
(5) anything else related to this I-D.

	Thanks,

		Vern






Network Working Group                                          M. Allman
Internet Draft                                                 V. Paxson
Expiration Date: September 1998                               March 1998



         TCP Implementation Problems That Need To Be Documented
                  <draft-ietf-tcpimpl-needdoc-00.txt>


1. Status of this Memo

   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months, and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet Drafts as reference
   material or to cite them other than as ``work in progress''.

   To learn the current status of any Internet Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet Drafts shadow
   directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.


2. Introduction

   The TCP-IMPL working group has documented a number of TCP
   implementation problems [PADHV98].  However, a significant number
   still have not been fully described and documented in the form used
   in [PADHV98].  This memo briefly describes a number of these,
   including commentary as to the authors' opinions regarding the
   importance of documenting the problem.  This memo is *not* intended
   to ever see light as an RFC of some form; its sole function is to
   facilitate working group discussion of which problems are more
   pressing to document than others, and to aid in arriving at a
   decision as to when [PADHV98] will be sufficiently complete to merit
   its publication as an Informational RFC.

   We divide the descriptions into "serious" problems, meaning those we



Allman/Paxson                                                   [Page 1]





ID       TCP Implementation Problems That Need To Be Documented  March 1998


   think should be included in [PADHV98] prior to its publication;
   "security" problems, which might not be viewed as implementation
   problems per se, but represent significant security problems of which
   TCP implementors should be aware; and "less serious" problems, those
   that, if the working group fails to find volunteers to document them,
   should not hold up [PADHV98].

   It might be worthwhile to separate the security problems out into
   their own document.  We particularly solicit working group input on
   this subject.


3. Serious Problems


   Initial RTO too low
        The retransmission timeout is supposed to be initialized to 3
        seconds, per RFC 1122, 4.2.3.1.  Some TCPs initialize it to a
        much lower value, around 200 msec.  For paths with RTTs greater
        than this value, the initial data packets will always be
        retransmitted, usually unnecessarily.  Consequently, (1) the
        connection immediately enters into congestion avoidance with the
        smallest possible value for ssthresh, and (2) the RTT computed
        for it will be discarded due to application of Karn's algorithm,
        hence the RTO may fail to adapt to the long RTT, resulting in
        further needless retransmissions.  Both of these add up to
        miserable performance.

        We view this problem as quite serious from a performance
        perspective; not so serious from a network stability perspective.
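
        As a rough illustration (a minimal sketch, not code from any TCP),
        the following Python checks when the first segment is retransmitted
        needlessly.  It assumes a loss-free path on which the first ack
        arrives exactly one RTT after the segment is sent; the function
        names and the sample RTT values are hypothetical, while the 3 second
        and 200 msec figures come from the text above.

```python
def first_segment_spuriously_retransmitted(initial_rto_s, path_rtt_s):
    """True if the retransmit timer fires before the first ack can arrive.

    Assumption: no loss, and the ack arrives exactly one RTT after sending.
    """
    return initial_rto_s < path_rtt_s


# Sample path RTTs in seconds (illustrative values, not measurements).
SAMPLE_RTTS = [0.05, 0.3, 0.6, 2.0]


def spurious_rtts(initial_rto_s, rtts=SAMPLE_RTTS):
    """Return the sample RTTs for which the first segment is resent needlessly."""
    return [rtt for rtt in rtts
            if first_segment_spuriously_retransmitted(initial_rto_s, rtt)]
```

        With a 3 second initial RTO none of the sample paths retransmits
        needlessly; with 200 msec, every sample path whose RTT exceeds
        200 msec does.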


   CWND uninitialized
        Some TCPs under some circumstances fail to properly initialize
        cwnd, setting it instead to a very large value.  This leads to
        massive bursts upon startup.  In particular, this has been
        observed in some Reno-derived TCPs when, upon initiating a
        connection, the remote peer does not include an MSS option in its
        SYN ack.

        This problem is fairly serious from a network stability
        perspective.  It would be more serious if the circumstances
        leading to it were more common.  It will often have a deleterious
        effect on performance, too, since a large burst often leads to
        multiple losses and, consequently, retransmission timeouts and
        reduction of ssthresh to a small value.





   Failure of window deflation due to header prediction
        As documented in Brakmo/Peterson's CCR paper, some TCPs will
        fail to deflate cwnd after fast recovery because the incoming
        acks match the header prediction test, which omits the window
        deflation code.

        This problem is fairly serious from a network stability
        perspective: it means that during a time of congestion the TCP
        effectively fails to back off its transmission rate as much as
        it should.


   Retransmission sends 2 packets
        Some TCPs miscompute the amount of data in a segment when using
        the per-segment timestamp option, and during retransmission
        consequently send two packets, one nearly full-sized and one
        tinygram.

        This problem is somewhat serious because it injects twice the
        number of packets into the network as necessary, during a time
        when the network is under stress.
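
        The text does not say exactly how the miscount arises.  The Python
        sketch below shows one assumed way it could produce the two-packet
        pattern: the retransmission code asks for a full MSS of data while
        each packet has 12 bytes less room because of the timestamp option.
        The function name and the 1460-byte MSS are hypothetical.

```python
TIMESTAMP_OPTION_LEN = 12  # 10-byte timestamp option padded to a 4-byte boundary


def split_into_packets(data_len, mss, option_len):
    """Split data_len bytes into packets that each have mss - option_len of room."""
    room = mss - option_len
    sizes = []
    while data_len > 0:
        take = min(room, data_len)
        sizes.append(take)
        data_len -= take
    return sizes
```

        Under these assumptions, asking for 1460 bytes with an MSS of 1460
        yields a 1448-byte packet followed by a 12-byte tinygram, whereas
        asking for 1448 bytes yields a single packet.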


4. Security Problems


Predictable initial sequence number
     TCPs that generate predictable initial sequence numbers are much
     more vulnerable to "IP spoofing" attacks than those that generate
     difficult-to-predict ISNs.


Ameliorating SYN flooding
     A nasty denial-of-service attack sometimes observed in the Internet
     concerns "SYN flooding", in which the attacker sends a high-volume
     stream of initial SYN packets to the target machine, often using
     bogus IP source addresses.  These create large volumes of
     connection state on the target, and can completely fill "listen"
     queues, preventing legitimate connection attempts from completing.
     There are techniques, however, for hardening a TCP to resist this
     attack.


The Land attack
     A "Land" attack consists of sending a bogus SYN packet to a host
     that contains the same source and destination addresses and the
     same source and destination ports.  Some TCPs, upon receiving such
     a packet, crash or enter into infinite loops.
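
     The text above does not prescribe a defense.  One possible input
     check is sketched here in Python, under the assumption that the
     stack simply discards such segments before any connection state is
     touched.  The Segment type and the function name are hypothetical.

```python
from collections import namedtuple

# Hypothetical, minimal view of an incoming segment's addressing fields.
Segment = namedtuple("Segment", "src_addr src_port dst_addr dst_port syn")


def is_land_segment(seg):
    """True for a SYN whose source address/port equal its destination address/port."""
    return bool(seg.syn
                and seg.src_addr == seg.dst_addr
                and seg.src_port == seg.dst_port)
```

     A receive path could call is_land_segment() early and drop the
     segment when it returns True; whether dropping is the right policy
     in every environment is an assumption of this sketch.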




5. Less Serious Problems


Failure to set PSH when send buffer drains
     If a TCP does not set PSH when it has no more data to send, then
     the data receiver may fail to deliver the data to the application
     in a timely fashion, because it is waiting for the next PSH flag
     before doing so.


Failure of window deflation due to fencepost error
     As documented in Brakmo/Peterson's CCR paper, some TCPs fail to
     deflate cwnd after fast recovery because the TCP's test for whether
     it is in fast recovery contains a fencepost error.  The test checks
     for whether more than 3 dup acks have been received, rather than
     whether 3 or more have been received.

     Not so serious because relatively rare - only has any effect if the
     fencepost is hit exactly.
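
     The two forms of the test can be compared directly.  This is a
     minimal Python sketch with hypothetical function names, not code
     from any TCP; the threshold of 3 dup acks is taken from the text
     above.

```python
DUPACK_THRESHOLD = 3  # number of dup acks that triggers fast retransmit


def in_fast_recovery_buggy(dupacks):
    """Fencepost error: requires MORE than 3 dup acks."""
    return dupacks > DUPACK_THRESHOLD


def in_fast_recovery_fixed(dupacks):
    """Intended test: 3 or more dup acks."""
    return dupacks >= DUPACK_THRESHOLD


def disagreements(max_dupacks=10):
    """Dup ack counts for which the two tests give different answers."""
    return [n for n in range(max_dupacks + 1)
            if in_fast_recovery_buggy(n) != in_fast_recovery_fixed(n)]
```

     The two tests disagree only when exactly 3 dup acks have been
     received, which is why the problem shows up relatively rarely.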


Failure to ack above-sequence data
     When above-sequence data arrives, the receiving TCP should generate
     a duplicate ack.  TCPs that fail to do so will often impair
     performance because the connections they participate in will always
     suffer timeout retransmissions upon loss, instead of taking
     advantage of fast retransmit/fast recovery.


6. Security Considerations

   Three of the problems discussed above relate directly to addressing
   security concerns: "predictable initial sequence number",
   "ameliorating SYN flooding", and "the Land attack".  All three are
   considered to be in the "Serious" category, meriting inclusion in
   [PADHV98] prior to its publication, or publication in a separate
   document devoted to security problems.


7. References


[PADHV98]
     V. Paxson (editor), M. Allman, S. Dawson, I. Heavens and B. Volz,
     "Known TCP Implementation Problems," Feb. 1998.






8. Authors' Addresses

   Mark Allman <mallman@lerc.nasa.gov>
   NASA Lewis Research Center/Sterling Software
   21000 Brookpark Road
   MS 54-2
   Cleveland, OH 44135
   USA

   Phone: +1 216/433-6586

   Vern Paxson <vern@ee.lbl.gov>
   Network Research Group
   Lawrence Berkeley National Laboratory
   Berkeley, CA 94720
   USA
   Phone: +1 510/486-7504

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 14 13:11:57 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id NAA4910220
	for tcp-impl-list;
	Sat, 14 Mar 1998 13:09:38 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id NAA4894592
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 14 Mar 1998 13:09:36 -0800 (PST)
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id NAA00706
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 14 Mar 1998 13:09:20 -0800 (PST)
	mail_from (alan@lxorguk.ukuu.org.uk)
Received: from the-village.bc.nu (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id VAA24053; Sat, 14 Mar 1998 21:08:58 GMT
Received: by the-village.bc.nu (Smail3.1.29.1 #2)
	id m0yDyCi-000V5kC; Sat, 14 Mar 98 21:10 GMT
Message-Id: <m0yDyCi-000V5kC@the-village.bc.nu>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: I-D ACTION:draft-ietf-tcpimpl-needdoc-00.txt
To: vern@ee.lbl.gov (Vern Paxson)
Date: Sat, 14 Mar 1998 21:10:52 +0000 (GMT)
Cc: tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199803141952.LAA16508@daffy.ee.lbl.gov> from "Vern Paxson" at Mar 14, 98 11:52:40 am
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> issues" document.  We welcome comments on (1) problems we're missing,

(1) is missing two items

Security one

Handling of spoofed MTU discovery. A TCP stack needs to verify the TCP
sequence space on returned ICMP "don't fragment" frames. Without this an
attacker can force a segment size of 68 bytes and perpetrate a DoS
attack against the endpoints. This also helps avoid ICMP attacks.
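
A minimal Python sketch of such a check follows. It assumes the ICMP error
quotes enough of the offending TCP header to recover its sequence number,
and that the stack tracks snd_una and snd_nxt for the connection; all
function and parameter names are hypothetical. The check only raises the
bar: an attacker who can guess an in-flight sequence number still gets
through.

```python
SEQ_MOD = 1 << 32   # TCP sequence numbers wrap modulo 2**32
MIN_PATH_MTU = 68   # RFC 1191: never reduce the path MTU estimate below 68 octets


def seq_in_flight(quoted_seq, snd_una, snd_nxt):
    """True if quoted_seq lies in [snd_una, snd_nxt), allowing for wraparound."""
    outstanding = (snd_nxt - snd_una) % SEQ_MOD
    offset = (quoted_seq - snd_una) % SEQ_MOD
    return offset < outstanding


def mtu_after_icmp(current_mtu, next_hop_mtu, quoted_seq, snd_una, snd_nxt):
    """Return the path MTU to use after an ICMP "fragmentation needed" error."""
    if not seq_in_flight(quoted_seq, snd_una, snd_nxt):
        return current_mtu   # does not match data we have outstanding: ignore
    if next_hop_mtu < MIN_PATH_MTU:
        return current_mtu   # implausible value: ignored in this sketch
    return min(current_mtu, next_hop_mtu)
```

A real stack would also need to handle routers that report a next-hop MTU
of zero, which this sketch simply ignores.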


Covering For Incompeten(ce|ts) Item.

A TCP should switch off MTU discovery on repeatedly retransmitted frames
to see if dropping MTU discovery causes successful transmission. Without this,
incorrectly configured paths and many firewalls will cause mysterious end
user problems. The correct answer is to upgrade all the people who perpetrated
bad firewall products, but this is regrettably not possible.


Alan


From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 14 23:09:43 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id XAA5065645
	for tcp-impl-list;
	Sat, 14 Mar 1998 23:07:37 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id XAA5085861
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 14 Mar 1998 23:07:31 -0800 (PST)
Received: from dm.cobaltmicro.com (dm.cobaltmicro.com [209.133.34.35]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id XAA08872
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 14 Mar 1998 23:07:31 -0800 (PST)
	mail_from (davem@dm.cobaltmicro.com)
Received: (from davem@localhost)
	by dm.cobaltmicro.com (8.8.7/8.8.7) id XAA00488;
	Sat, 14 Mar 1998 23:03:44 -0800
Date: Sat, 14 Mar 1998 23:03:44 -0800
Message-Id: <199803150703.XAA00488@dm.cobaltmicro.com>
From: "David S. Miller" <davem@dm.cobaltmicro.com>
To: vern@ee.lbl.gov
CC: tcp-impl@cthulhu.engr.sgi.com
In-reply-to: <199803141952.LAA16508@daffy.ee.lbl.gov> (message from Vern
	Paxson on Sat, 14 Mar 1998 11:52:40 PST)
Subject: Re: I-D ACTION:draft-ietf-tcpimpl-needdoc-00.txt
References:  <199803141952.LAA16508@daffy.ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Sat, 14 Mar 1998 11:52:40 PST
   From: Vern Paxson <vern@ee.lbl.gov>

   Here is the full text of the I-D.  Mark & I put it together as a
   way to focus working group discussion on what problems we still
   need to document, and with what priority; and whether we should
   spin off a separate "security issues" document.  We welcome
   comments on (1) problems we're missing, (2) disagreements about the
   tentative priorities we gave these, (3) the separate security doc
   issue, (4) volunteers to write up any of these, and (5) anything
   else related to this I-D.

Brakmo/Peterson's CCR paper should be added to the references section.

Later,
David S. Miller
davem@dm.cobaltmicro.com

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 16 07:35:29 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id HAA2201807
	for tcp-impl-list;
	Mon, 16 Mar 1998 07:33:26 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id HAA5596679
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Mon, 16 Mar 1998 07:33:23 -0800 (PST)
Received: from assateague.lerc.nasa.gov (assateague.lerc.nasa.gov [139.88.35.25]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id HAA17911
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 16 Mar 1998 07:33:22 -0800 (PST)
	mail_from (mallman@guns.lerc.nasa.gov)
Received: from guns.lerc.nasa.gov by assateague.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id KAA15782; Mon, 16 Mar 1998 10:33:21 -0500 (EST)
Received: from guns by guns.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-local)
        id KAA23718; Mon, 16 Mar 1998 10:33:20 -0500 (EST)
Message-Id: <199803161533.KAA23718@guns.lerc.nasa.gov>
To: Vern Paxson <vern@ee.lbl.gov>
From: Mark Allman <mallman@lerc.nasa.gov>
Reply-To: mallman@lerc.nasa.gov
cc: Peter Ford <peterf@microsoft.com>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: I-D ACTION:draft-ietf-tcpimpl-prob-03.txt 
Organization: Late Night Hackers, NASA LeRC, Cleveland, Ohio
Song-of-the-Day: People Are Strange
Date: Mon, 16 Mar 1998 10:33:20 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> This will happen as we develop our revision of RFC 2001.  At the
> last WG meeting, we attained rough consensus to allow an initial
> window of two segments.  Over the next few weeks on the mailing
> list, and then at LA, we will see whether we also have rough
> consensus for the change proposed in
> draft-floyd-incr-init-win-01.txt, which was submitted this
> morning.

For the impatient, you can grab a copy of the new version from:

http://gigahertz.lerc.nasa.gov/~mallman/papers/draft-floyd-incr-init-win-01.txt

before it appears in the I-D archive.

allman

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 18 13:34:05 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id NAA1022979
	for tcp-impl-list;
	Wed, 18 Mar 1998 13:31:39 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id NAA1019561
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 18 Mar 1998 13:31:37 -0800 (PST)
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id NAA23712
	for <tcp-impl@relay.engr.sgi.com>; Wed, 18 Mar 1998 13:31:36 -0800 (PST)
	mail_from (aron@cs.rice.edu)
Received: (from aron@localhost)
          by cs.rice.edu (8.8.5/8.8.4)
	  id PAA26336 for tcp-impl@relay.engr.sgi.com; Wed, 18 Mar 1998 15:31:35 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Message-Id: <199803182131.PAA26336@cs.rice.edu>
Subject: measuring initial ssthresh
To: tcp-impl@cthulhu.engr.sgi.com (TCP Implementor's List)
Date: Wed, 18 Mar 1998 15:31:35 -0600 (CST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Hi,
	has anyone tried to estimate the initial value of ssthresh in real
networks as suggested by Janey Hoe's SIGCOMM '96 paper? My concern is that
in the presence of delayed ACKs in TCP (which is something that wasn't
considered in that paper), the ssthresh value can be badly underestimated.
	The method suggested in that paper requires measuring the inter-arrival
times of three closely spaced ACKs. But if the TCP receiver delays the ACKs,
the sender cannot conclusively determine whether the inter-arrival time of
any two ACKs is affected by the jitter introduced due to the delay. In fact
three closely spaced ACKs would require the receiver to receive six closely
spaced segments (assuming it ACKs every two segments) before the spacing
between the ACKs is not affected by jitter. But then the six closely spaced
segments should have been sent in response to ACKs that weren't affected
by jitter, which looks like a chicken-and-egg problem.
	So the only definite way of getting 3 closely spaced ACKs is to send 6
back to back segments at connection startup - which is something that isn't
acceptable.




- Mohit Aron
  aron@cs.rice.edu


From owner-tcp-impl@relay.engr.sgi.com  Fri Mar 20 08:17:37 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id IAA1750867
	for tcp-impl-list;
	Fri, 20 Mar 1998 08:15:16 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id IAA1703220
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Fri, 20 Mar 1998 08:15:14 -0800 (PST)
Received: from italy.psc.edu (italy.psc.edu [128.182.61.156]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id IAA15391
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 20 Mar 1998 08:15:10 -0800 (PST)
	mail_from (semke@psc.edu)
Received: from localhost (semke@localhost)
	by italy.psc.edu (8.8.6/8.8.6/psc) with SMTP id LAA14643
	for <tcp-impl@cthulhu.engr.sgi.com>; Fri, 20 Mar 1998 11:14:59 -0500 (EST)
Message-Id: <199803201614.LAA14643@italy.psc.edu>
X-Authentication-Warning: italy.psc.edu: semke@localhost didn't use HELO protocol
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: I-D ACTION:draft-ietf-tcpimpl-prob-03.txt
Date: Fri, 20 Mar 1998 11:14:59 -0500
From: Jeff Semke <semke@psc.edu>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


If anyone is interested, here is a starting point for a writeup on the
bug in which some BSD-based senders increase cwnd by 
segsize*segsize/cwnd + segsize/8.

I'd be interested in hearing if people on the list think it should be
included in the next revision of the I-D.

I have not been able to quickly obtain clean tcpdump traces that illustrate
the symptom, although in theory it should be simple.  If anyone else
is willing or able to collect traces, it would be appreciated.

-Jeff

------------------------------------------------------------------------------

Name of Problem
     Incorrect increase of the congestion window by a fraction of a segment

Classification
     Congestion control

Description
     RFC 1122 section 4.2.2.15 states that TCP MUST implement Van
     Jacobson's "congestion avoidance" algorithm [Jacobson88].  This
     algorithm calls for increasing the congestion window, cwnd, by
     segsize*segsize/cwnd for each ACK received for new data [RFC 2001].
     This has the effect of increasing cwnd by one segment in each
     round trip time.

     Some TCP implementations add an additional fraction of a segment
     (typically segsize/8) to cwnd for each ACK received for new data
     [Stevens94, Wright95].  These implementations exhibit "Incorrect
     increase of the congestion window by a fraction of a segment".

Significance
     In congested environments, may be detrimental to the performance
     of other connections and to the connection itself.

Implications
     Incorrect increase of the congestion window by a fraction of a
     segment allows a TCP to more aggressively open its congestion
     window, increasing the loss rate experienced by all connections
     sharing a bottleneck with the aggressive TCP.  In particular,
     Reno TCP senders with this bug suffer significant performance 
     degradation through drop-tail routers when using sufficiently
     large windows.

Relevant RFCs
     RFC 1122 requires the use of the "congestion avoidance" algorithm.
     RFC 2001 describes the "congestion avoidance" algorithm in detail.

How to detect
     The problem can be detected by closely examining packet traces
     taken near the sender.  During congestion avoidance, cwnd will
     increase by an additional segment upon the receipt of [typically]
     eight acknowledgements without a loss.  This increase is in
     addition to the one segment increase per round trip time (or
     two round trip times if the receiver is using delayed ACKs).

     Furthermore, graphs of the sequence number vs. time, taken
     from packet traces, are normally linear during congestion avoidance.
     When viewing packet traces of transfers from senders exhibiting
     this problem, the graphs appear parabolic instead of linear.

     And finally, the traces will show that, at sufficiently large
     windows, nearly every loss event results in a timeout since the
     additional increase in cwnd causes drop-tail queues to overflow
     by more than the prescribed single-packet increase.

How to fix
     This problem may be corrected by removing the "+ segsize/8" term
     from the code that increases cwnd each time an ACK of new data
     is received.
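
     The effect of the extra term can be seen in a small simulation.
     This is a rough Python sketch, not code from any TCP: it assumes a
     loss-free connection in congestion avoidance with one ACK per full
     segment in flight (no delayed ACKs), and the function names, the
     1460-byte segsize and the 10-segment starting window are
     hypothetical.

```python
def cwnd_after(rtts, segsize=1460, start_segments=10, extra_term=False):
    """Congestion window in bytes after `rtts` loss-free round trips."""
    cwnd = float(start_segments * segsize)
    for _ in range(rtts):
        acks_this_rtt = int(cwnd // segsize)   # one ACK per full segment in flight
        for _ in range(acks_this_rtt):
            cwnd += segsize * segsize / cwnd   # congestion avoidance increase
            if extra_term:
                cwnd += segsize / 8            # the erroneous extra fraction
    return cwnd


def growth_in_segments(rtts, extra_term, segsize=1460, start_segments=10):
    """Segments gained over `rtts` round trips."""
    final = cwnd_after(rtts, segsize, start_segments, extra_term)
    return final / segsize - start_segments
```

     Without the extra term the window gains at most one segment per
     round trip.  With it, each round trip adds roughly a further
     eighth of the current window, so the growth accelerates as the
     window gets larger.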

References

[Stevens94]
      W. Richard Stevens, "TCP/IP Illustrated, Volume 1", Addison-Wesley
      Publishing Company, Reading, Massachusetts, 1994.

[Wright95]
      Gary R. Wright and W. Richard Stevens, "TCP/IP Illustrated, Volume 2",
      Addison-Wesley Publishing Company, Reading, Massachusetts, 1995.







From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 13:48:12 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id NAA1761561
	for tcp-impl-list;
	Sat, 21 Mar 1998 13:46:28 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id NAA2162142
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 13:46:26 -0800 (PST)
Received: from gecko.nas.nasa.gov (gecko.nas.nasa.gov [129.99.34.45]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id NAA15978
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 21 Mar 1998 13:46:25 -0800 (PST)
	mail_from (kml@gecko.nas.nasa.gov)
Received: from gecko.nas.nasa.gov (kml@localhost)
	by gecko.nas.nasa.gov (8.8.7/NAS8.8.7) with ESMTP id NAA13683
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 21 Mar 1998 13:46:25 -0800 (PST)
Message-Id: <199803212146.NAA13683@gecko.nas.nasa.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: TCP problems with IP options and path MTU discovery
Date: Sat, 21 Mar 1998 13:46:24 -0800
From: "Kevin M. Lahey" <kml@nas.nasa.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Bill Fenner and I discovered an interesting TCP problem recently
involving the interaction of TCP segment size and IP options.
Using netcat (ftp://coast.cs.purdue.edu:/pub/tools/unix/netcat/nc110.tgz)
I found that I could run bulk transfers and interactive sessions 
between two hosts on my test network.  However, when I tried to
use source routing, I found that I could connect and type 
interactively with no problem, but bulk transfers would just freeze.
Checking this out with tcpdump, I could see that the three-way
handshake would complete, but no more packets would flow.

Bill pointed out that this could be happening because TCP passes
as large a packet as possible down to IP with DF set, whereupon IP
adds in the IP options, making the packet too large.  Since DF is
set, it can't fragment the packet, and even if IP notified the
upper layer, chances are that the path MTU discovery code did
not include IP options in its calculations.  I certainly observed
something like this in NetBSD.
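
The arithmetic behind this can be sketched in a few lines of Python. This
is only an illustration under assumed numbers (a 1500-byte path MTU and
8 bytes of IP options for the source route); the function names are
hypothetical and not taken from any stack.

```python
IP_HEADER_MIN = 20    # IPv4 header without options
TCP_HEADER_MIN = 20   # TCP header without options


def max_tcp_payload(path_mtu, ip_options_len=0, tcp_options_len=0):
    """Largest TCP payload that fits in one unfragmented datagram."""
    return (path_mtu - IP_HEADER_MIN - ip_options_len
            - TCP_HEADER_MIN - tcp_options_len)


def datagram_len(payload, ip_options_len=0, tcp_options_len=0):
    """Total IP datagram length for a given TCP payload."""
    return (IP_HEADER_MIN + ip_options_len
            + TCP_HEADER_MIN + tcp_options_len + payload)
```

If TCP sizes its segment with max_tcp_payload(1500) and IP then adds
8 bytes of options, datagram_len() comes to 1508, which cannot be sent
with DF set. Passing ip_options_len=8 to max_tcp_payload() as well brings
the datagram back to 1500 bytes.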

While IP options seem to be gradually disappearing, I still thought that
this was an interesting enough bug to pass along, especially
as I found it in all but one (Solaris 2.5.1) of the six different
OSes I checked...

Thanks,

Kevin Lahey
kml@nas.nasa.gov

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 14:01:50 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id OAA2128855
	for tcp-impl-list;
	Sat, 21 Mar 1998 14:00:10 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id OAA2188805
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 14:00:08 -0800 (PST)
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id OAA18027
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 21 Mar 1998 14:00:07 -0800 (PST)
	mail_from (braden@ISI.EDU)
From: braden@ISI.EDU
Received: from can.isi.edu (can.isi.edu [128.9.160.148])
	by zephyr.isi.edu (8.8.7/8.8.6) with SMTP id OAA21518;
	Sat, 21 Mar 1998 14:00:05 -0800 (PST)
Date: Sat, 21 Mar 98 13:59:35 PST
Posted-Date: Sat, 21 Mar 98 13:59:35 PST
Message-Id: <9803212159.AA04327@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA04327>; Sat, 21 Mar 98 13:59:35 PST
To: tcp-impl@cthulhu.engr.sgi.com, kml@nas.nasa.gov
Subject: Re: TCP problems with IP options and path MTU discovery
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


There are unfortunately a lot of TCP implementations that did
not take seriously the sections of Host Requirements having
to do with MSS.

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 16:03:21 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id QAA2229796
	for tcp-impl-list;
	Sat, 21 Mar 1998 16:01:05 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id QAA2238877
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 16:01:03 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id QAA10675
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 21 Mar 1998 16:01:02 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id QAA05655; Sat, 21 Mar 1998 16:01:01 -0800 (PST)
Message-Id: <199803220001.QAA05655@daffy.ee.lbl.gov>
To: "Kevin M. Lahey" <kml@nas.nasa.gov>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery
In-reply-to: Your message of Sat, 21 Mar 1998 13:46:24 PST.
Date: Sat, 21 Mar 1998 16:01:01 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> However, when I tried to
> use source routing, I found that I could connect and type 
> interactively with no problem, but bulk transfers would just freeze.
> Checking this out with tcpdump, I could see that the three-way
> handshake would complete, but no more packets would flow.

I'm confused - why doesn't the sender keep trying progressively
smaller segment sizes until it finds one that works?  It may not
be optimal because it's not figuring the IP header size correctly,
but it ought to find *something*, shouldn't it?

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 16:36:47 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id QAA2193566
	for tcp-impl-list;
	Sat, 21 Mar 1998 16:34:38 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id QAA2226870
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 16:34:35 -0800 (PST)
Received: from lestat.nas.nasa.gov (lestat.nas.nasa.gov [129.99.50.29]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id QAA16937
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 21 Mar 1998 16:34:35 -0800 (PST)
	mail_from (thorpej@lestat.nas.nasa.gov)
Received: from localhost (localhost [127.0.0.1]) by lestat.nas.nasa.gov (8.8.8/8.6.12) with SMTP id QAA23607; Sat, 21 Mar 1998 16:24:13 -0800 (PST)
Message-Id: <199803220024.QAA23607@lestat.nas.nasa.gov>
X-Authentication-Warning: lestat.nas.nasa.gov: localhost [127.0.0.1] didn't use HELO protocol
To: Vern Paxson <vern@ee.lbl.gov>
Cc: "Kevin M. Lahey" <kml@nas.nasa.gov>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
Reply-To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Jason Thorpe <thorpej@nas.nasa.gov>
Date: Sat, 21 Mar 1998 16:24:12 -0800
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Sat, 21 Mar 1998 16:01:01 PST 
 Vern Paxson <vern@ee.lbl.gov> wrote:

 > I'm confused - why doesn't the sender keep trying progressively
 > smaller segment sizes until it finds one that works?  It may not
 > be optimal because it's not figuring the IP header size correctly,
 > but it ought to find *something*, shouldn't it?

I guess it depends on whether or not the TCP code is getting an error
message back.  I could see it stepping through the set of smaller segments
if it got an EMSGSIZE back from IP, but if IP isn't returning such an
error, it's not really any different than black hole syndrome (which
is a ball of hair to deal with, as has been discussed here before..)

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                            Home: +1 408 866 1912
NAS: M/S 258-5                                       Work: +1 650 604 0935
Moffett Field, CA 94035                             Pager: +1 415 428 6939

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 19:54:04 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id TAA2273797
	for tcp-impl-list;
	Sat, 21 Mar 1998 19:52:06 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id TAA2269462
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 19:52:04 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id TAA17048
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 21 Mar 1998 19:52:03 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id TAA05940; Sat, 21 Mar 1998 19:52:02 -0800 (PST)
Message-Id: <199803220352.TAA05940@daffy.ee.lbl.gov>
To: Jason Thorpe <thorpej@nas.nasa.gov>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
In-reply-to: Your message of Sat, 21 Mar 1998 16:24:12 PST.
Date: Sat, 21 Mar 1998 19:52:01 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>  Vern Paxson <vern@ee.lbl.gov> wrote:
> 
>  > I'm confused - why doesn't the sender keep trying progressively
>  > smaller segment sizes until it finds one that works?  It may not
>  > be optimal because it's not figuring the IP header size correctly,
>  > but it ought to find *something*, shouldn't it?
> 
> I guess it depends on whether or not the TCP code is getting an error
> message back.

So I'm still confused.  Either the TCP gets error messages back, in
which case MTU discovery should work (though perhaps picking a suboptimal
size because the size of the IP header isn't correctly determined); or
it doesn't get error messages back, in which case MTU discovery is busted
regardless of whether or not there are IP options.  I still don't
see a mechanism for causing the problem that Kevin described.  What am
I missing?

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 20:06:54 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id UAA2279349
	for tcp-impl-list;
	Sat, 21 Mar 1998 20:05:16 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id UAA2266324
	for <TCP-IMPL@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 20:05:14 -0800 (PST)
Received: from alcor.process.com (alcor.process.com [192.42.95.16]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id UAA18818
	for <TCP-IMPL@CTHULHU.ENGR.SGI.COM>; Sat, 21 Mar 1998 20:05:13 -0800 (PST)
	mail_from (VOLZ@PROCESS.COM)
Date:     Sat, 21 Mar 1998 23:05 -0500
From: VOLZ@PROCESS.COM (Bernie Volz)
Message-Id: <009C38A642515EE7.41E9@PROCESS.COM>
To: vern@ee.lbl.gov, TCP-IMPL@cthulhu.engr.sgi.com
Subject:  Re: TCP problems with IP options and path MTU discovery
X-VMS-To: SMTP%"vern@ee.lbl.gov"
X-VMS-Cc: TCP-IMPL@CTHULHU.ENGR.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>>  Vern Paxson <vern@ee.lbl.gov> wrote:
>> 
>>  > I'm confused - why doesn't the sender keep trying progressively
>>  > smaller segment sizes until it finds one that works?  It may not
>>  > be optimal because it's not figuring the IP header size correctly,
>>  > but it ought to find *something*, shouldn't it?
>> 
>> I guess it depends on whether or not the TCP code is getting an error
>> message back.
>
>So I'm still confused.  Either the TCP gets error messages back, in
>which case MTU discovery should work (though perhaps picking a suboptimal
>size because the size of the IP header isn't correctly determined); or
>it doesn't get error messages back, in which case MTU discovery is busted
>regardless of whether or not there are IP options.  I still don't
>see a mechanism for causing the problem that Kevin described.  What am
>I missing?
>
>		Vern

It's probably because the local IP doesn't "send" an ICMP error
(packet too big) signal up the stack (to TCP) such that it drops the
packet size to the next lower value.

- Bernie Volz

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 20:10:28 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id UAA2278244
	for tcp-impl-list;
	Sat, 21 Mar 1998 20:10:16 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id UAA2286127
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 20:10:14 -0800 (PST)
Received: from scanner.worldgate.com (scanner.worldgate.com [198.161.84.3]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id UAA20767
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 21 Mar 1998 20:10:13 -0800 (PST)
	mail_from (marcs@znep.com)
Received: from znep.com (uucp@localhost)
	by scanner.worldgate.com (8.8.7/8.8.7) with UUCP id VAA02299;
	Sat, 21 Mar 1998 21:10:06 -0700 (MST)
Received: from localhost (marcs@localhost) by alive.znep.com (8.7.5/8.7.3) with SMTP id VAA07862; Sat, 21 Mar 1998 21:09:45 -0700 (MST)
Date: Sat, 21 Mar 1998 21:09:45 -0700 (MST)
From: Marc Slemko <marcs@znep.com>
To: Vern Paxson <vern@ee.lbl.gov>
cc: Jason Thorpe <thorpej@nas.nasa.gov>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
In-Reply-To: <199803220352.TAA05940@daffy.ee.lbl.gov>
Message-ID: <Pine.BSF.3.95.980321210539.4890Q-100000@alive.znep.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Sat, 21 Mar 1998, Vern Paxson wrote:

> >  Vern Paxson <vern@ee.lbl.gov> wrote:
> > 
> >  > I'm confused - why doesn't the sender keep trying progressively
> >  > smaller segment sizes until it finds one that works?  It may not
> >  > be optimal because it's not figuring the IP header size correctly,
> >  > but it ought to find *something*, shouldn't it?
> > 
> > I guess it depends on whether or not the TCP code is getting an error
> > message back.
> 
> So I'm still confused.  Either the TCP gets error messages back, in
> which case MTU discovery should work (though perhaps picking a suboptimal
> size because the size of the IP header isn't correctly determined); or
> it doesn't get error messages back, in which case MTU discovery is busted
> regardless of whether or not there are IP options.  I still don't
> see a mechanism for causing the problem that Kevin described.  What am
> I missing?

What if it is getting the ICMP fragmentation required, but since it has a
next-hop MTU in the ICMP message, it assumes that number is correct and
doesn't even bother checking to see if the previous segment was already
using that size?

This is similar to if you had a router lying about the next-hop MTU.
Probably a good implementation practice to not blindly listen, but verify
that it makes sense in the context of previous segments.
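
A minimal sketch of that kind of sanity check, in C.  The helper name
pmtu_accept_hint(), its parameters, and the halving fallback are all
hypothetical and not taken from any real stack; the only idea it
illustrates is "accept the advertised next-hop MTU only if it is smaller
than the datagram that provoked the error".

```c
#include <assert.h>
#include <stddef.h>

#define PMTU_MIN 68	/* every IPv4 link must carry at least 68 bytes (RFC 791) */

/*
 * Hypothetical helper: pick the next path MTU estimate after an ICMP
 * "fragmentation needed" error.
 *   sent_len - total IP length of the datagram that triggered the error
 *   hint_mtu - next-hop MTU carried in the ICMP message (0 if absent)
 */
static size_t
pmtu_accept_hint(size_t sent_len, size_t hint_mtu)
{
	size_t next;

	assert(sent_len >= PMTU_MIN);

	/* A usable hint is strictly smaller than what we just sent. */
	if (hint_mtu >= PMTU_MIN && hint_mtu < sent_len)
		return hint_mtu;

	/*
	 * Hint absent or implausible (e.g. equal to the size already in
	 * use): shrink on our own.  Halving is an arbitrary choice made
	 * for this sketch.
	 */
	next = sent_len / 2;
	return next < PMTU_MIN ? PMTU_MIN : next;
}
```

With a 1500-byte datagram, a hint of 1006 is accepted as-is, while a
hint of 1500 (the size that just failed) is ignored and the estimate
drops to 750 instead of repeating the same size.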

(I haven't looked at code to see if this appears to be what happens in any
implementations, just taking a wild guess...)



From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 20:11:14 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id UAA2281761
	for tcp-impl-list;
	Sat, 21 Mar 1998 20:11:08 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id UAA2268918
	for <TCP-IMPL@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 20:11:06 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id UAA20814
	for <TCP-IMPL@CTHULHU.ENGR.SGI.COM>; Sat, 21 Mar 1998 20:11:05 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id UAA05987; Sat, 21 Mar 1998 20:11:02 -0800 (PST)
Message-Id: <199803220411.UAA05987@daffy.ee.lbl.gov>
To: VOLZ@PROCESS.COM (Bernie Volz)
Cc: TCP-IMPL@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery
In-reply-to: Your message of Sat, 21 Mar 1998 23:05:00 PST.
Date: Sat, 21 Mar 1998 20:11:02 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> It's probably because the local IP doesn't "send" an ICMP error
> (packet too big) signal up the stack (to TCP) such that it drops the
> packet size to the next lower value.

But why would it not do this if-and-only-if an IP option is used??

I just groveled through the 4.4BSD source and couldn't find anything in
there that would lead to the ICMP_UNREACH_NEEDFRAG not being propagated
if IP options are used, but being propagated if they aren't.

If I understand the original description of the problem, it's that MTU
discovery stops working if-and-only-if an IP option is used (source
routing; maybe others?).  And I still don't see how this can be!

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 20:25:08 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id UAA2288698
	for tcp-impl-list;
	Sat, 21 Mar 1998 20:23:04 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id UAA2291169
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 20:23:02 -0800 (PST)
Received: from lestat.nas.nasa.gov (lestat.nas.nasa.gov [129.99.50.29]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id UAA22743
	for <tcp-impl@cthulhu.engr.sgi.com>; Sat, 21 Mar 1998 20:23:02 -0800 (PST)
	mail_from (thorpej@lestat.nas.nasa.gov)
Received: from localhost (localhost [127.0.0.1]) by lestat.nas.nasa.gov (8.8.8/8.6.12) with SMTP id UAA24378; Sat, 21 Mar 1998 20:12:41 -0800 (PST)
Message-Id: <199803220412.UAA24378@lestat.nas.nasa.gov>
X-Authentication-Warning: lestat.nas.nasa.gov: localhost [127.0.0.1] didn't use HELO protocol
To: Vern Paxson <vern@ee.lbl.gov>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
Reply-To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Jason Thorpe <thorpej@nas.nasa.gov>
Date: Sat, 21 Mar 1998 20:12:40 -0800
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Sat, 21 Mar 1998 19:52:01 PST 
 Vern Paxson <vern@ee.lbl.gov> wrote:

 > > I guess it depends on whether or not the TCP code is getting an error
 > > message back.
 > 
 > So I'm still confused.  Either the TCP gets error messages back, in
 > which case MTU discovery should work (though perhaps picking a suboptimal
 > size because the size of the IP header isn't correctly determined); or
 > it doesn't get error messages back, in which case MTU discovery is busted
 > regardless of whether or not there are IP options.  I still don't
 > see a mechanism for causing the problem that Kevin described.  What am
 > I missing?

The missing piece is that most PMTU implementations (at least the ones
that I've used) don't deal with black holes very well ("Hm, no ACKs... Is
the host down, or is someone blocking ICMP?").  In the case where IP:

	(1) doesn't transmit, and

	(2) doesn't return an error to indicate that the message was
	    too big

...this looks exactly like a black hole as far as TCP is concerned.

I'm not really making an excuse for the bug, just pointing out one reason
it's so easy to spot in some cases (working black hole discovery would
make this harder to notice, because packets would eventually go out).

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                            Home: +1 408 866 1912
NAS: M/S 258-5                                       Work: +1 650 604 0935
Moffett Field, CA 94035                             Pager: +1 415 428 6939

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 20:26:34 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id UAA2273880
	for tcp-impl-list;
	Sat, 21 Mar 1998 20:26:30 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id UAA2261846
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 20:26:29 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id UAA23360
	for <tcp-impl@relay.engr.SGI.COM>; Sat, 21 Mar 1998 20:26:28 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id UAA06084; Sat, 21 Mar 1998 20:26:27 -0800 (PST)
Message-Id: <199803220426.UAA06084@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
In-reply-to: Your message of Sat, 21 Mar 1998 21:09:45 PST.
Date: Sat, 21 Mar 1998 20:26:27 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Okay, thanks to Craig Partridge (via private mail):

> First hop MTU.
> 
> TCP passes segment to IP, which adds options, which passes to link layer
> an invalid, overlarge, link layer datagram.

... I now understand.  The problem is that in this particular case, the
local stack isn't generating the ICMP_UNREACH_NEEDFRAG that it needs
to generate and hand up to the local TCP.  (So this is a bug in its IP
implementation.)

This certainly seems worth documenting, and I've noted it as such.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 20:37:57 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id UAA2285839
	for tcp-impl-list;
	Sat, 21 Mar 1998 20:36:15 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id UAA2296376
	for <TCP-IMPL@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 20:36:13 -0800 (PST)
Received: from lestat.nas.nasa.gov (lestat.nas.nasa.gov [129.99.50.29]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id UAA24788
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Sat, 21 Mar 1998 20:36:13 -0800 (PST)
	mail_from (thorpej@lestat.nas.nasa.gov)
Received: from localhost (localhost [127.0.0.1]) by lestat.nas.nasa.gov (8.8.8/8.6.12) with SMTP id UAA24429; Sat, 21 Mar 1998 20:25:50 -0800 (PST)
Message-Id: <199803220425.UAA24429@lestat.nas.nasa.gov>
X-Authentication-Warning: lestat.nas.nasa.gov: localhost [127.0.0.1] didn't use HELO protocol
To: Vern Paxson <vern@ee.lbl.gov>
Cc: VOLZ@process.com (Bernie Volz), TCP-IMPL@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
Reply-To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Jason Thorpe <thorpej@nas.nasa.gov>
Date: Sat, 21 Mar 1998 20:25:49 -0800
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Sat, 21 Mar 1998 20:11:02 PST 
 Vern Paxson <vern@ee.lbl.gov> wrote:

 > But why would it not do this if-and-only-if an IP option is used??
 > 
 > I just groveled through the 4.4BSD source and couldn't find anything in
 > there that would lead to the ICMP_UNREACH_NEEDFRAG not being propagated
 > if IP options are used, but being propagated if they aren't.
 > 
 > If I understand the original description of the problem, it's that MTU
 > discovery stops working if-and-only-if an IP option is used (source
 > routing; maybe others?).  And I still don't see how this can be!

Right... and if the PMTU engine uses the MTU in the ICMP message to
size the next segment (which is, BTW, the MTU it would have used to size
the segment in the first place, since it's the host's own outgoing
interface), it's going to lose because IP is going to tack the option on,
which TCP didn't account for.

I think Bernie's comment makes perfect sense :-)

Kevin - Have you determined if needs-frag is sent back to ourselves or
if it's black hole-like?

Vern - I think everyone here is in agreement that PMTU is busted in
this case, but the fact that Kevin was able to observe it on a number
of different operating systems made it worth mentioning here...  Maybe
this is worth adding to known-bugs?

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                            Home: +1 408 866 1912
NAS: M/S 258-5                                       Work: +1 650 604 0935
Moffett Field, CA 94035                             Pager: +1 415 428 6939

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 21:13:39 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id VAA2288847
	for tcp-impl-list;
	Sat, 21 Mar 1998 21:12:05 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id VAA2286042
	for <TCP-IMPL@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 21:12:02 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id VAA00599
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Sat, 21 Mar 1998 21:12:02 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id VAA06186; Sat, 21 Mar 1998 21:12:00 -0800 (PST)
Message-Id: <199803220512.VAA06186@daffy.ee.lbl.gov>
To: Jason Thorpe <thorpej@nas.nasa.gov>
Cc: TCP-IMPL@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
In-reply-to: Your message of Sat, 21 Mar 1998 20:25:49 PST.
Date: Sat, 21 Mar 1998 21:12:00 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Right... and if the PMTU engine uses the MTU in the ICMP message to
> size the next segment (which is, BTW, the MTU it would have used to size
> the segment in the first place, since it's the host's own outgoing
> interface), it's going to lose because IP is going to tack the option on,
> which TCP didn't account for.

To wit:

	mss = rt->rt_rmx.rmx_mtu - sizeof(struct tcpiphdr);

This from tcp_mtudisc() in tcp_subr.c.  It then carefully adjusts for TCP
options, but doesn't do anything regarding IP options (and rt->rt_rmx.rmx_mtu
has been set to the interface MTU, with no adjustment for IP options either).

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Sat Mar 21 21:21:12 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id VAA2308766
	for tcp-impl-list;
	Sat, 21 Mar 1998 21:19:46 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id VAA2300207
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sat, 21 Mar 1998 21:19:43 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id VAA01549
	for <tcp-impl@relay.engr.SGI.COM>; Sat, 21 Mar 1998 21:19:42 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id VAA06235; Sat, 21 Mar 1998 21:19:42 -0800 (PST)
Message-Id: <199803220519.VAA06235@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
In-reply-to: Your message of Sat, 21 Mar 1998 20:26:27 PST.
Date: Sat, 21 Mar 1998 21:19:42 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> ... I now understand.  The problem is that in this particular case, the
> local stack isn't generating the ICMP_UNREACH_NEEDFRAG that it needs
> to generate and hand up to the local TCP.  (So this is a bug in its IP
> implementation.)

Sigh.  Let me try one more time, on the remote chance that perhaps at least
one of the 400 other readers of this list is as befuddled about this issue
as I've been.

The local IP *does* generate the ICMP_UNREACH_NEEDFRAG (I verified this in
the 4.4 BSD sources).  But when it does, it sets the "correct" MTU value to
use to be that of the associated interface, as it knows that value too.  It
then hands this up to the local TCP, which deducts from that correct value
the overhead of the TCP/IP headers, along with any TCP options - but forgets
to deduct for IP options.  So it winds up using a value that, with all the
TCP/IP options added in, is in fact too large for the interface.  It sets
DF on this, and the whole process repeats - nothing ever goes out.  (I wonder
if you also get a pretty nifty livelock that eats up lots of CPU looping
between the IP and TCP layers?)
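
The arithmetic can be sketched in a few lines of C.  This is an
illustration under stated assumptions, not the BSD code: the function
names are made up, and the fixed 40-byte figure stands in for
sizeof(struct tcpiphdr) (20-byte IP header plus 20-byte TCP header).

```c
#include <assert.h>

#define TCPIP_HDR_LEN 40	/* IP header + TCP header, no options of either kind */

/* MSS derived the way the quoted tcp_mtudisc() line does it. */
static int
mss_from_mtu(int if_mtu)
{
	return if_mtu - TCPIP_HDR_LEN;
}

/*
 * On-wire length of a full-sized segment when TCP trims the payload
 * for its own options but not for IP options.
 */
static int
full_packet_len(int if_mtu, int tcp_optlen, int ip_optlen)
{
	int payload = mss_from_mtu(if_mtu) - tcp_optlen;

	assert(payload > 0);
	/* Works out to if_mtu + ip_optlen: the IP options are pure overshoot. */
	return TCPIP_HDR_LEN + tcp_optlen + ip_optlen + payload;
}

/* Nonzero if the packet fits the interface without fragmentation. */
static int
fits_interface(int if_mtu, int tcp_optlen, int ip_optlen)
{
	return full_packet_len(if_mtu, tcp_optlen, ip_optlen) <= if_mtu;
}
```

For a 1500-byte interface with 12 bytes of TCP options, the packet is
exactly 1500 bytes with no IP options, but 1508 bytes with an 8-byte IP
option.  Since the MTU reported back is still 1500, recomputing the MSS
yields the same oversized packet every time.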

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Sun Mar 22 23:43:33 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id XAA2598925
	for tcp-impl-list;
	Sun, 22 Mar 1998 23:41:59 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id XAA2593469
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sun, 22 Mar 1998 23:41:58 -0800 (PST)
Received: from alpha.xerox.com (alpha.Xerox.COM [13.1.64.93]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id XAA21619
	for <tcp-impl@cthulhu.engr.sgi.com>; Sun, 22 Mar 1998 23:41:57 -0800 (PST)
	mail_from (fenner@parc.xerox.com)
Received: from crevenia.parc.xerox.com ([13.2.116.11]) by alpha.xerox.com with SMTP id <51916(4)>; Sun, 22 Mar 1998 23:41:55 PST
Received: from localhost by crevenia.parc.xerox.com with SMTP id <177482>; Sun, 22 Mar 1998 23:41:46 -0800
To: Vern Paxson <vern@ee.lbl.gov>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
In-reply-to: Your message of "Sat, 21 Mar 98 21:19:42 PST."
             <199803220519.VAA06235@daffy.ee.lbl.gov> 
Date: Sun, 22 Mar 1998 23:41:39 PST
From: Bill Fenner <fenner@parc.xerox.com>
Message-Id: <98Mar22.234146pst.177482@crevenia.parc.xerox.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Vern Paxson <vern@ee.lbl.gov> wrote:
>The local IP *does* generate the ICMP_UNREACH_NEEDFRAG (I verified this in
>the 4.4 BSD sources).

Actually, I believe that ip_output returns EMSGSIZE to tcp_output.

        /* 
         * Too large for interface; fragment if possible.
         * Must be able to put at least 8 bytes per fragment.
         */     
        if (ip->ip_off & IP_DF) {
                error = EMSGSIZE;

The problem, as stated before, is simply that TCP uses the IP MTU minus
the size of standard TCP and IP headers, minus the size of any TCP
options, to calculate the allowable size of a given segment.  The size
of any IP options needs to be included in this calculation.

        /*              
         * Adjust data length if insertion of options will
         * bump the packet length beyond the t_maxseg length.
         */
        if (len > tp->t_maxseg - optlen) {

An additional adjustment of the length to handle IP options is required.

(Don't forget that IP options can be added and removed in the middle of
a connection, so it's not necessarily sufficient to base the MSS on the
IP options at a given point in time)

  Bill

From owner-tcp-impl@relay.engr.sgi.com  Sun Mar 22 23:45:31 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id XAA2613667
	for tcp-impl-list;
	Sun, 22 Mar 1998 23:45:27 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id XAA2607012
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Sun, 22 Mar 1998 23:45:25 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id XAA22126
	for <tcp-impl@relay.engr.SGI.COM>; Sun, 22 Mar 1998 23:45:25 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id XAA07871; Sun, 22 Mar 1998 23:45:20 -0800 (PST)
Message-Id: <199803230745.XAA07871@daffy.ee.lbl.gov>
To: Bill Fenner <fenner@parc.xerox.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
In-reply-to: Your message of Sun, 22 Mar 1998 23:41:39 PST.
Date: Sun, 22 Mar 1998 23:45:20 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Vern Paxson <vern@ee.lbl.gov> wrote:
> >The local IP *does* generate the ICMP_UNREACH_NEEDFRAG (I verified this in
> >the 4.4 BSD sources).
> 
> Actually, I believe that ip_output returns EMSGSIZE to tcp_output.

You're of course right - I was being (very) sloppy with my phrasing.

> (Don't forget that IP options can be added and removed in the middle of
> a connection, so it's not necessarily sufficient to base the MSS on the
> IP options at a given point in time)

Which ones would those be?

(I realize they *can* be added/removed - but in practice, will this occur?)

	Thanks,

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 23 00:40:10 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id AAA2619021
	for tcp-impl-list;
	Mon, 23 Mar 1998 00:38:23 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id AAA2618126
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Mon, 23 Mar 1998 00:38:21 -0800 (PST)
Received: from alpha.xerox.com (alpha.Xerox.COM [13.1.64.93]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id AAA02534
	for <tcp-impl@relay.engr.sgi.com>; Mon, 23 Mar 1998 00:38:20 -0800 (PST)
	mail_from (fenner@parc.xerox.com)
Received: from crevenia.parc.xerox.com ([13.2.116.11]) by alpha.xerox.com with SMTP id <52058(2)>; Mon, 23 Mar 1998 00:38:16 PST
Received: from localhost by crevenia.parc.xerox.com with SMTP id <177482>; Mon, 23 Mar 1998 00:38:05 -0800
To: Vern Paxson <vern@ee.lbl.gov>
cc: Bill Fenner <fenner@parc.xerox.com>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
In-reply-to: Your message of "Sun, 22 Mar 98 23:45:20 PST."
             <199803230745.XAA07871@daffy.ee.lbl.gov> 
Date: Mon, 23 Mar 1998 00:37:52 PST
From: Bill Fenner <fenner@parc.xerox.com>
Message-Id: <98Mar23.003805pst.177482@crevenia.parc.xerox.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Vern Paxson <vern@ee.lbl.gov> wrote:
>(I realize they *can* be added/removed - but in practice, will this occur?)

Who knows?  I don't know of any current applications that do this, but
that doesn't preclude it from being allowed.

Bob Braden's reference to Host Requirements points out that this issue
was addressed nearly 10 years ago, and that people simply failed to
implement it; RFC1122 section 4.2.2.6 has the formula

   Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize

in it (where TCPhdrsize includes the TCP header plus options).  So
RFC1122 agrees that the calculation should be per packet.
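
As a rough illustration, the formula transcribes directly into C.  The
function and parameter names below are just spellings of the RFC's
terms chosen for this sketch, not identifiers from any existing stack.

```c
#include <assert.h>

/*
 * Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize
 *
 *   send_mss       - MSS received from the peer (or the default, 536)
 *   mms_s          - largest transport message IP will accept for sending
 *   tcp_hdr_size   - TCP header including TCP options (at least 20)
 *   ip_option_size - IP options that will go on this packet
 */
static int
eff_snd_mss(int send_mss, int mms_s, int tcp_hdr_size, int ip_option_size)
{
	int limit;

	assert(tcp_hdr_size >= 20 && ip_option_size >= 0);

	limit = send_mss + 20;
	if (mms_s < limit)
		limit = mms_s;
	return limit - tcp_hdr_size - ip_option_size;
}
```

Taking SendMSS = 1460 and MMS_S = 1480 (a 1500-byte MTU less the 20-byte
IP header), a 32-byte TCP header gives 1448 bytes of data with no IP
options and 1440 bytes with 8 bytes of IP options.  Because both header
terms are inputs, the result is naturally recomputed for each packet.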

  Bill

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 23 00:43:14 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id AAA2614831
	for tcp-impl-list;
	Mon, 23 Mar 1998 00:43:10 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id AAA2620064
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Mon, 23 Mar 1998 00:43:09 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id AAA03247
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 23 Mar 1998 00:43:08 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id AAA08636; Mon, 23 Mar 1998 00:43:07 -0800 (PST)
Message-Id: <199803230843.AAA08636@daffy.ee.lbl.gov>
To: Bill Fenner <fenner@parc.xerox.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
In-reply-to: Your message of Sun, 22 Mar 1998 23:41:39 PST.
Date: Mon, 23 Mar 1998 00:43:07 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> An additional adjustment of the length to handle IP options is required.
> 
> (Don't forget that IP options can be added and removed in the middle of
> a connection, so it's not necessarily sufficient to base the MSS on the
> IP options at a given point in time)

Hmmmm, I wonder about the degree to which CSLIP complicates all this still
further.  If you really want to squeeze every last byte out of your
allowable payload, then you need to know how well the IP/TCP headers
will compress.  So it seems clear that at some point you should punt
and just pick a value that is for sure safe.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 23 10:12:00 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id KAA2757231
	for tcp-impl-list;
	Mon, 23 Mar 1998 10:08:06 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id KAA2762952
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Mon, 23 Mar 1998 10:08:04 -0800 (PST)
Received: from alpha.xerox.com (alpha.Xerox.COM [13.1.64.93]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id KAA16805
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 23 Mar 1998 10:08:03 -0800 (PST)
	mail_from (fenner@parc.xerox.com)
Received: from crevenia.parc.xerox.com ([13.2.116.11]) by alpha.xerox.com with SMTP id <52855(2)>; Mon, 23 Mar 1998 10:08:01 PST
Received: from localhost by crevenia.parc.xerox.com with SMTP id <177482>; Mon, 23 Mar 1998 10:07:54 -0800
To: Vern Paxson <vern@ee.lbl.gov>
cc: Bill Fenner <fenner@parc.xerox.com>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
In-reply-to: Your message of "Mon, 23 Mar 98 00:43:07 PST."
             <199803230843.AAA08636@daffy.ee.lbl.gov> 
Date: Mon, 23 Mar 1998 10:07:42 PST
From: Bill Fenner <fenner@parc.xerox.com>
Message-Id: <98Mar23.100754pst.177482@crevenia.parc.xerox.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Vern Paxson <vern@ee.lbl.gov> wrote:
>I wonder about the degree to which CSLIP complicates all this still further.

I think guessing an effective MTU based upon compression is going too
far.  TCP is already adjusting the amount of data that it's preparing
to put into a packet based upon the length of the TCP options that it's
putting on that packet; around line 321 of the 4.4-Lite2 tcp_output.c
we see

        /*
         * Adjust data length if insertion of options will
         * bump the packet length beyond the t_maxseg length.
         */
        if (len > tp->t_maxseg - optlen) {
                len = tp->t_maxseg - optlen;

The fix is simply to compute "ipoptlen", and change this code to

        if (len > tp->t_maxseg - optlen - ipoptlen) {
                len = tp->t_maxseg - optlen - ipoptlen;

You don't even have to change the comment.  This then brings this bit
of code into compliance with RFC1122 (always a worthy goal).
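
A standalone sketch of the adjusted check, for anyone who wants to try
the numbers outside the kernel.  The helper names are hypothetical, and
how a real stack finds the raw option length (for example from the
options stored with the connection) is left as an assumption; the one
firm fact used is that IP options are padded to a 4-byte boundary and
cannot exceed 40 bytes.

```c
#include <assert.h>

/* Space the IP options occupy on the wire: padded to a 32-bit boundary. */
static int
ip_options_len(int raw_optlen)
{
	assert(raw_optlen >= 0 && raw_optlen <= 40);
	return (raw_optlen + 3) & ~3;
}

/*
 * Same test as the tcp_output.c fragment, with ipoptlen subtracted too.
 *   len      - amount of data TCP would like to send
 *   t_maxseg - segment size limit before any options
 *   optlen   - TCP options on this segment
 *   ipoptlen - IP options on this packet (already padded)
 */
static long
clamp_segment_len(long len, int t_maxseg, int optlen, int ipoptlen)
{
	long room = (long)t_maxseg - optlen - ipoptlen;

	if (len > room)
		len = room;
	return len;
}
```

With t_maxseg = 1460, 12 bytes of TCP options and a 7-byte IP option
(padded to 8), a 4000-byte send is trimmed to 1440 bytes, so the packet
comes to exactly 1500 bytes on the wire.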

  Bill

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 23 10:21:01 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id KAA2756129
	for tcp-impl-list;
	Mon, 23 Mar 1998 10:17:37 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id KAA2668040
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Mon, 23 Mar 1998 10:17:35 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id KAA21370
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 23 Mar 1998 10:17:32 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id KAA09908; Mon, 23 Mar 1998 10:17:29 -0800 (PST)
Message-Id: <199803231817.KAA09908@daffy.ee.lbl.gov>
To: Bill Fenner <fenner@parc.xerox.com>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: TCP problems with IP options and path MTU discovery 
In-reply-to: Your message of Mon, 23 Mar 1998 10:07:42 PST.
Date: Mon, 23 Mar 1998 10:17:29 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> The fix is simply to compute "ipoptlen", and change this code to
> 
>         if (len > tp->t_maxseg - optlen - ipoptlen) {
>                 len = tp->t_maxseg - optlen - ipoptlen;

Right.  Hopefully, it's dead easy to compute ipoptlen.  (Haven't looked
into this.)

> You don't even have to change the comment.

An excellent property of any bug fix!

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 23 15:46:36 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id PAA2907646
	for tcp-impl-list;
	Mon, 23 Mar 1998 15:40:32 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id PAA2897237
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Mon, 23 Mar 1998 15:40:27 -0800 (PST)
Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id PAA24951
	for <tcp-impl@cthulhu.engr.sgi.com>; Mon, 23 Mar 1998 15:40:25 -0800 (PST)
	mail_from (touch@ISI.EDU)
Received: from rum.isi.edu (rum-s.isi.edu [128.9.192.237])
	by tnt.isi.edu (8.8.7/8.8.6) with ESMTP id PAA15668;
	Mon, 23 Mar 1998 15:40:24 -0800 (PST)
From: Joe Touch <touch@ISI.EDU>
Received: (from touch@localhost)
	by rum.isi.edu (8.8.7/8.8.6) id PAA09496;
	Mon, 23 Mar 1998 15:40:23 -0800 (PST)
Date: Mon, 23 Mar 1998 15:40:23 -0800 (PST)
Message-Id: <199803232340.PAA09496@rum.isi.edu>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Slow-start restart - discussion of fixes
Cc: touch@ISI.EDU
X-auto-sig-adder-by: faber@isi.edu
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Hi,

At the last IETF, I presented a discussion of the TCP slow-start
restart failure for HTTP-style request/responses over persistent
connections.

The following is a draft summarizing the problem and potential
solutions, which is being submitted. If there is sufficient interest,
we have graphs describing the implementations and comparing them which
can be discussed at the LA meeting.

Joe (and Amy and John)

PS - the draft does not yet have an official name, but will be
available at:

	http://www.isi.edu/touch/pubs/draft-xxx.txt


-----------------------------------------------------------------------


INTERNET-DRAFT                     Amy Hughes, Joe Touch, John Heidemann
draft-xxx.txt                                                        ISI
                                                          March 30, 1998
                                                 Expires: Sept. 30, 1998

              Issues in TCP Slow-Start Restart After Idle

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

   The distribution of this document is unlimited.

Abstract

   This draft discusses variations in the TCP 'slow-start restart' (SSR)
   algorithm, and the unintended failure of some variations to properly
   restart in some environments. SSR is intended to avoid line-rate
   bursts after idle periods, where TCP accumulates permission to send
   in the form of ACKs, but does not consume that permission
   immediately. SSR's original "restart after send is idle" is commonly
   implemented as "restart after receive is idle". The latter
   unintentionally fails to restart for bidirectional connections where
   the sender's burst is triggered by a reverse-path data packet, such
   as in persistent HTTP. Both the former and latter are shown to permit
   bursts in other circumstances. Three solutions are discussed, and
   their implementations evaluated.

   This document is a product of the LSAM project at ISI.  Comments are
   solicited and should be addressed to the authors.

Introduction

   Slow-Start Restart (SSR) describes one TCP behavior to respond to
   long sending pauses in an open connection.  When a sender becomes
   idle, the normal ack-clocking mechanism which regulates traffic is no



Expires Sept. 30, 1998                                          [Page 1]

Hughes, et al.             Restart After Idle             March 30, 1998


   longer present and the sender may introduce a burst of packets into
   the network as large as the current congestion window (CWND).  Such a
   burst may be too large for the intermediate routers to handle and may
   be too large for the receiver to handle at one time as well.

   A send timer was first proposed [JK90] to detect idle sending
   periods; the recommended response is to close the congestion window
   and perform a new slow-start.  However, a footnote to this first
   proposed solution noted that send/receive symmetry on the channel
   meant that a receive timer could be used instead to achieve the same
   results.  As this second solution takes advantage of a timer that is
   already required (to detect packet loss) it was implemented by
   Jacobson and Karels.  This solution has been repeated in
   implementations which derive from their work.

   Bursty connections, such as the persistent connections required in
   HTTP/1.1 [FGMFB97] have been found to interact in meaningful ways
   with SSR [Hei97].  In fact, it was discovered that SSR never occurs with
   HTTP/1.1 [Poo97].  This is because a new request will reset the
   receive timer (as suggested in the footnote in [JK90]) and the
   sending pause will not be detected [Tou97].

   Further, both timer solutions depend on the retransmit timeout (RTO)
   and cannot detect send pauses that are shorter than this duration.
   In such cases, the sender may transmit a burst as large as the full
   congestion window.

Burst detection.

   There are several ways of determining whether a connection is at risk
   of sending a burst of packets into the channel.  We will discuss each
   method below, from the least radical to the most radical.

 Receive Timer:
   The use of a receive timer is the most common burst detection method.
   It is attractive because it is simple and makes use of an existing
   timer.  However, a receive timer does not properly detect bursts in
   HTTP/1.1 because the timer is cancelled when the request packet is
   received.  Further, when the connection is idle for less than a full
   RTO, a burst cannot be detected.  Such a burst can happen when the
   connection is "nearly idle" or when acks are lost or reordered.

 Send Timer:
   A send timer is the reciprocal solution to using a receive timer.
   While it requires a new timestamp field to be maintained, it clearly
   detects send pauses and corrects the problem presented by HTTP/1.1.
   However, as with the receive timer, it cannot detect bursts that
   could happen before a full RTO.
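
   The difference between the two timers can be sketched in C.  This is
   a minimal illustration, not code from any particular stack: the
   struct and its field names (last_rcv, last_snd, rto) are assumed
   here for the example, and times are in arbitrary ticks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state; field names are illustrative only. */
struct conn {
    uint32_t snd_una;   /* oldest unacknowledged sequence number */
    uint32_t snd_max;   /* highest sequence number sent so far */
    uint32_t last_rcv;  /* time the last segment was received (ticks) */
    uint32_t last_snd;  /* time the last segment was sent (ticks) */
    uint32_t rto;       /* retransmit timeout (ticks) */
};

/* Receive Timer: restart only if nothing was received for a full RTO. */
bool idle_by_receive_timer(const struct conn *c, uint32_t now)
{
    assert(c != NULL);
    return c->snd_max == c->snd_una && (now - c->last_rcv) >= c->rto;
}

/* Send Timer: restart only if nothing was sent for a full RTO. */
bool idle_by_send_timer(const struct conn *c, uint32_t now)
{
    assert(c != NULL);
    return c->snd_max == c->snd_una && (now - c->last_snd) >= c->rto;
}
```

   In the persistent-HTTP case, a request that has just arrived makes
   (now - last_rcv) close to zero, so the receive-timer test reports
   "not idle" even after a long sending pause, while the send-timer
   test still fires.  Neither test fires for pauses shorter than rto.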



 Packet Counting:
   An alternative method examines the unused portion of the congestion
   window to determine if the capacity to burst exists.  This method is
   simple, it uses existing information to make its decision, and it
   solves both the HTTP/1.1 problem and the RTO problem.  In
   addition, it addresses the problem that needs to be solved (bursts)
   instead of a specific circumstance where the problem could happen
   (send pauses).  However, where timer detection avoids defining a
   burst (it defines idle periods instead), here a burst must be defined
   before it can be detected.  One possible definition is the situation
   where the available portion of the sending window is some proportion
   of the entire congestion window, say 50%.  Another definition places
   a numerical limit on the available portion of the congestion window,
   say 4 or CWND-1 packets.
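
   The two candidate burst definitions can be sketched as follows.
   This is only an illustration under stated assumptions: windows are
   counted in packets, the function names are invented for the
   example, and the exact comparisons (>= for the proportion, > for
   the count) are one possible reading of the definitions above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unused (available) portion of the congestion window, in packets. */
uint32_t unused_window(uint32_t cwnd, uint32_t outstanding)
{
    return cwnd > outstanding ? cwnd - outstanding : 0;
}

/* Proportional definition: the unused share is at least `percent`
 * of the whole congestion window (e.g. percent = 50). */
bool burst_by_proportion(uint32_t cwnd, uint32_t outstanding, uint32_t percent)
{
    assert(percent <= 100);
    if (cwnd == 0)
        return false;
    return (uint64_t)unused_window(cwnd, outstanding) * 100
           >= (uint64_t)cwnd * percent;
}

/* Absolute definition: more than `limit` packets are unused (e.g. 4). */
bool burst_by_count(uint32_t cwnd, uint32_t outstanding, uint32_t limit)
{
    return unused_window(cwnd, outstanding) > limit;
}
```

   Both tests use only state that TCP already maintains, which is the
   property claimed for Packet Counting above.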

Burst Response

   Once a burst is detected, there are several different ways to take
   action.  The different possibilities are listed below, again from
   least to most radical.

 Full Restart:
   One response is the original slow-start restart: reduce the
   congestion window to one packet and re-enter slow-start.  This was the
   solution proposed by J&K.  This is a very conservative response and
   it defeats most of the speedup that HTTP/1.1 provides [HOT97].
   Current proposals [FAP97] have suggested increasing the initial
   window from 1 packet to 4 packets.  Further, depending on the method
   of burst detection, Full Restart can be far more punitive than it
   should be.  Coupled with a timer, full restart is most likely to
   respond to a completely empty congestion window.  Coupled with Packet
   Counting, the response could close the window too far, even smaller
   than the amount of outstanding data.

 Window Limiting:
   This is a modified version of Full Restart which solves the problem
   created by using Packet Counting to detect bursts.  With this type of
   response, the congestion window is reduced to the amount of
   outstanding data plus the slow-start initial window (1, 2, or 4).  It
   works exactly like Full Restart in the idle case, but is successful
   at controlling bursts in an active connection.  Further, in an active
   connection, it effectively implements a leaky bucket of the initial
   window size for the accumulation of send opportunity based on the
   receipt of acks.  This solution is fairly conservative, especially as
   it defaults to Full Restart, but more importantly, sending
   opportunity is simply lost if not used, and is not available for
   paced output.  Also, it forces negative congestion feedback on the
   congestion window.

 Burst Size Limitation:
   When a burst is detected, its effects are limited: the sender may not
   send more than a preset number of packets into the network.  It
   is less conservative than the first two responses in that it does not
   affect the size of the congestion window, and it is simple to
   implement: count the packets as they are sent and stop when the
   limit is reached.  Whether to wait for an ack or some
   other signal to resume sending is an implementation detail.  Lastly,
   this burst response can be performed after each ack or with each
   send. The behavior is slightly different in each case.
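
   A minimal sketch of the counting step, assuming windows measured in
   packets and a hypothetical helper name (packets_to_send); it is not
   taken from ns or from any existing stack.  Note that cwnd itself is
   left untouched; only the number of packets emitted per call is
   capped.

```c
#include <assert.h>
#include <stdint.h>

/* Number of packets the sender may emit in one output call.
 * cwnd, outstanding and queued are in packets; max_burst is the
 * preset burst limit.  The congestion window is not modified. */
uint32_t packets_to_send(uint32_t cwnd, uint32_t outstanding,
                         uint32_t queued, uint32_t max_burst)
{
    uint32_t allowed;

    assert(max_burst > 0);
    allowed = cwnd > outstanding ? cwnd - outstanding : 0;
    if (allowed > queued)       /* cannot send more than is waiting */
        allowed = queued;
    if (allowed > max_burst)    /* Burst Size Limitation */
        allowed = max_burst;
    return allowed;
}
```

   If this function is called once per send, several calls made back
   to back can still emit more than max_burst packets in a row; if it
   is applied once per ack, they cannot.  This is the difference in
   behavior mentioned above.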

 Pacing:
   When a burst is detected, packets are dribbled into the network until
   the sender starts receiving acks and normal maintenance can be
   resumed [VH97].  This solution is very easy on the network and scales
   well in cases of high bandwidth-delay product.  However, it requires
   a new timer, and parameter tuning requires more research.

Implemented Solutions

   Now we will examine combinations of the different detection and
   response methods presented above.  Each of the solutions below
   has been implemented in some form.

 BSD Implementation (Jacobson and Karels)
   The most common implementation uses a receive timer coupled with Full
   Restart.  This is the implementation that causes the interaction
   problems with HTTP/1.1.  The obvious alternative is to implement a
   send timer as originally intended and use Full Restart.  There are
   several drawbacks to this solution.  First, a send timer adds
   additional state and serves no purpose other than to correct the
   bursting behavior after send pauses.  Second, forcing a slow-start in
   this situation is problematic for HTTP/1.1.  A slow-start for each
   new user request adds a delay burden to characteristically small HTTP
   responses. Further, the HTTP user request pattern is unpredictable.
   It is possible for the user to make a new request before the send
   timer expires, triggering a burst that would defeat such a timer.

 Maximum Burst Limitation (Floyd)
   Floyd has proposed a coupling of Packet Counting with Burst Size
   Limitation.  This solution has been implemented in ns and it prevents
   the sender from transmitting a series of back-to-back packets larger
   than the user configured burst limit (suggested to be 4 packets)
   [NS97].  There are several issues involved with recovering from a
   burst and the ns implementation doesn't address them consistently.
   First, it is not clear when the sender is allowed to send again after
   sending the first limited burst of packets.  One implementation
   requires the sender to wait for the burst timer to expire.  Another
   seems to allow a series of short bursts.  Another issue is how the
   simulation implementation and usage translates to a live network
   situation.  The implementation of this solution can range from simple
   to more complex.

 Congestion Window Monitoring (Hughes, Touch, and Heidemann)
   Our proposed solution combines Packet Counting with Window Limiting.
   Whenever (CWND - outstanding data > 4), we reduce CWND to
   (outstanding data + 4).  The choice of 4 packets is discussed with
   the implementation details below.  Congestion Window Monitoring (CWM)
   allows the congestion window to grow normally but shrinks the
   congestion window as the sender becomes idle.  It also prevents the
   sender from transmitting any bursts larger than 4 packets in response
   to a new request. Because CWM is not dependent on any timers, the
   loss of an ack or a nearly idle connection cannot cause any bursts.
   CWM is similar to Burst Limitation, but avoids the burst by reducing
   CWND, rather than by inhibiting the sends directly.  As a result, we
   avoid the potential problem of sequential calls to TCP_output, which
   would cause bursts in the former, but not the latter.  CWM also
   causes TCP to use the feedback of 'not using the CWND fast enough',
   which results in a decrease in the CWND.

   CWM effectively imposes a leaky bucket type limitation on the
   congestion window.  The window is allowed to grow and be managed
   normally but the sender is not allowed to save up any sending
   opportunities.  Any opportunity that is not used is lost.  This
   property of CWM forces interleaved reception of acks and processing
   of sends.

 Rate Based Pacing (Visweswaraiah and Heidemann)
   Rate Based Pacing combines the Pacing response with either a Send
   Timer or Packet Counting.  It avoids slow-start when resuming after
   sending pauses and allows the normal clocking of packets to be
   gracefully restarted.  When a burst potential is detected, the
   algorithm meters a small burst of packets into the channel [VH97].
   RBP is the least conservative solution to the bursting problem
   because it continues to make use of the pre-pause congestion window.
   If network conditions have changed significantly, maintaining the
   previous window could cause the paced connection to be overly
   aggressive as compared to other connections.  (Although some work
   suggests congestion windows are stable over multi-minute timeframes
   [BSSK97].)  More recently, pacing has been suggested for use in wireless
   networking scenarios [BPK97], and for satellite connections.

Experimental Comparisons

   Packet traces of the current FreeBSD implementation of SSR (using the
   receive timer), of a modified version of FreeBSD using a send timer,
   and of CWM with HTTP/1.1 support the above observations.  In all of
   the traces, the response pattern for the first request is the same
   with each method.  This shows that CWM allows the congestion window
   to grow normally.  Because of the different actions taken by the
   three algorithms, the response pattern for the second request differs
   as would be expected.  [We have graphs available upon request]

   When the second request arrives at the server after the
   retransmission timeout (RTO), normal FreeBSD allows the server to
   respond with a burst of packets.  FreeBSD using a send timer responds
   by entering slow-start. CWM allows a 4 packet burst.  When the second
   request arrives at the server before the RTO, both timer
   implementations allow a burst.  CWM again limits the burst to 4
   packets.  Note, RTO is the common timer limit, but any value would
   have the same results, depending on when the second request was
   presented in relation to the timer.

Implementation of Congestion Window Monitoring

   Congestion Window Monitoring requires a simple modification to
   existing TCP output routines.  The changes required replace the
   current idle detection code.  Replace the existing 3 lines of code:

          idle = (snd_max == snd_una);
          if (idle && now - lastrcv >= rto)
                  cwnd = 1;

   with the following 3 lines of code:

          maxwin = 4 + snd_nxt - snd_una;
          if (cwnd > maxwin)
                  cwnd = maxwin;

   Packet counting is implemented by line 1.  Lines 2 and 3 implement
   Window Limiting.
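
   The fragment above counts in packets.  In a stack that keeps cwnd
   and sequence numbers in bytes, the constant 4 would presumably
   become four maximum-size segments.  The following sketch makes that
   assumption explicit; the struct, the field names, and the constant
   CWM_SLACK_SEGMENTS are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sender state, all quantities in bytes. */
struct cwm_state {
    uint32_t snd_una;  /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;  /* next sequence number to send */
    uint32_t cwnd;     /* congestion window */
    uint32_t maxseg;   /* maximum segment size */
};

#define CWM_SLACK_SEGMENTS 4u  /* assumed: "4 packets" means 4 * maxseg */

/* Clamp cwnd to the outstanding data plus four segments. */
void cwm_limit(struct cwm_state *s)
{
    uint32_t outstanding, maxwin;

    assert(s != NULL && s->maxseg > 0);
    /* Unsigned subtraction stays correct across sequence wrap. */
    outstanding = s->snd_nxt - s->snd_una;
    maxwin = outstanding + CWM_SLACK_SEGMENTS * s->maxseg;  /* Packet Counting */
    if (s->cwnd > maxwin)                                   /* Window Limiting */
        s->cwnd = maxwin;
}
```

   With maxseg = 1460 and an idle connection (snd_nxt == snd_una), a
   cwnd of 20 segments is reduced to 5840 bytes; with 10 segments
   outstanding it is reduced to 20440 bytes.  A cwnd already below the
   limit is left unchanged.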

   The choice of limiting the available congestion window to 4 packets
   is based on the normal operation of TCP.  An ACK received by the
   sender may be in response to the receipt of 2 packets, allowing
   another 2 to be sent.  Further, normal window growth may require the
   sending of a third packet.  Lastly, in slow-start with delayed ACKs,
   the receipt of an ACK can trigger the sending of 4 packets. Thus, 4
   packets is a reasonable burst to send into the network.

   Increasing the initial window in slow-start to 4 packets has already
   been proposed [FAP97].  The effects of this change have been explored
   in simulation in [PN98] and in practice in [AHO97].  Such a
   modification to TCP would cause the same behavior as our solution in
   the cases where the pause timer has expired.  It does not address the
   pre-timeout bursting situation we are concerned with.

Conclusions

   At this time, we propose CWM as a simple, minimal and effective fix
   to the 'bug' in current TCP implementations that is exploited by
   HTTP/1.1.  Modifications can be made to TCP to solve the slow-start
   restart problem that are consistent with the original congestion
   avoidance specifications (i.e. a send timer).  However, we feel that
   the original intended behavior is not appropriate to some current
   applications, specifically HTTP. Thus, we recommend Congestion Window
   Monitoring to prevent bursts into the network.  Not only does this
   solution solve the current problem in a simple way, it will prevent
   bursting in any other situation that might arise. The 4 packet bursts
   which we allow are consistent with congestion window growth
   algorithms and with Floyd's conclusion about increasing the initial
   window size.

   CWM, as well as the other solutions listed, need to be re-evaluated
   within emerging TCP implementations, e.g., SACK [JB88].  In general,
   TCP has no rate pacing and uses congestion control to avoid bursts in
   current implementations.  A more explicit mechanism, such as RBP or
   similar proposals may be desirable in the future.

Security implications

   CWM presents no security problems.

References


   [AHO97] Mark Allman, Chris Hayes, and Shawn Ostermann.  An Evaluation
       of TCP Slow Start Modifications, July 1997.  (Submitted to CCR,
       draft available from http://jarok.cs.ohiou.edu/papers/)

   [BPK97] Hari Balakrishnan, Venkata N. Padmanabhan, and Randy H. Katz.
       The Effects of Asymmetry on TCP Performance.  In Proceedings of
       the ACM/IEEE Mobicom, Budapest, Hungary, ACM.  September, 1997.

   [BSSK97] Hari Balakrishnan, Srinivasan Seshan, Mark Stemm, and Randy
       H. Katz.  Analyzing Stability in Wide-Area Network Performance.
       In Proceedings of the ACM SIGMETRICS, Seattle WA, USA, ACM.
       June, 1997.

   [FGMFB97] R. Fielding, Jim Gettys, Jeffrey C. Mogul, H. Frystyk, and
       Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1, January
       1997.  RFC 2068.



   [FAP97] Sally Floyd, Mark Allman, and Craig Partridge.  Increasing
       TCP's Initial Window, July 1997.  Internet Draft draft-floyd-
       incr-init-win-01.txt

   [Hei97] John Heidemann. Performance Interactions Between P-HTTP and
       TCP Implementations.  ACM Computer Communications Review, 27(2),
       65-73, April 1997.

   [HOT97] John Heidemann, Katia Obraczka, and Joe Touch.  Modeling the
       Performance of HTTP Over Several Transport Protocols.  ACM/IEEE
       Transactions on Networking 5(5), 616-630, October, 1997.

   [JB88] Van Jacobson and R.T. Braden. TCP extensions for long-delay
       paths, October 1988. RFC 1072.

   [JK90] Van Jacobson and Michael J. Karels.  Congestion Avoidance and
       Control.  ACM Computer Communication Review, 18(4):314-329,
       August 1990. Revised version of his SIGCOMM '88 paper.

   [NS97] ns Network Simulator.  http://www-mash.cs.berkeley.edu/ns/,
       1997.

   [PN98] K. Poduri and K. Nichols. Simulation Studies of Increased
       Initial TCP Window Size, February 1998.  Internet Draft draft-
       ietf-tcpimpl-poduri-00.txt

   [Poo97] Kacheong Poon, Sun Microsystems, tcp-implementors mailing
       list, August, 1997.

   [Tou97] Joe Touch, ISI, tcp-implementors mailing list, August 12,
       1997.

   [VH97] Vikram Visweswaraiah and John Heidemann.  Improving Restart of
       Idle TCP Connections.  Technical Report 97-661, University of
       Southern California, November 1997.



Authors' Addresses

   Amy Hughes, Joe Touch, John Heidemann
   University of Southern California/Information Sciences Institute
   4676 Admiralty Way
   Marina del Rey, CA 90292-6695
   USA
   Phone: +1 310-822-1511
   Fax:   +1 310-823-6714
   URLs:   http://www.isi.edu/~ahughes
           http://www.isi.edu/~touch
           http://www.isi.edu/~johnh
   Email: ahughes@isi.edu
          touch@isi.edu
          johnh@isi.edu

----------------------------------------------------------------------
Joe Touch - touch@isi.edu		    http://www.isi.edu/~touch/
ISI / Project Leader, ATOMIC-2, LSAM       http://www.isi.edu/atomic2/
USC / Research Assistant Prof.                http://www.isi.edu/lsam/

From owner-tcp-impl@relay.engr.sgi.com  Mon Mar 23 23:46:01 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id XAA3048072
	for tcp-impl-list;
	Mon, 23 Mar 1998 23:44:15 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id XAA3038320
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Mon, 23 Mar 1998 23:44:13 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id XAA28992
	for <tcp-impl@relay.engr.SGI.COM>; Mon, 23 Mar 1998 23:44:12 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id XAA14025; Mon, 23 Mar 1998 23:44:11 -0800 (PST)
Message-Id: <199803240744.XAA14025@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Cc: floyd@ee.lbl.gov, mallman@lerc.nasa.gov
Subject: increasing TCP's initial window
In-reply-to: Your message of Mon, 23 Mar 1998 17:33:35 PST.
Date: Mon, 23 Mar 1998 23:44:11 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

One of the main topics for discussion at next week's tcpimpl meeting is
increasing TCP's initial window to 3 or 4 segments.  (We already found
rough consensus for an increase to 2 segments at last December's DC
meeting.)  We'd really like to have some discussion of this change on the
mailing list prior to the meeting.  To this end, Sally Floyd put together
the appended note in order to summarize the issues and encourage discussion.

		Vern

--------------------------------------------------

Here is the summary:

(1) At the last tcpimpl meeting, we had rough consensus for allowing
a TCP Initial Window (IW) of two segments, and agreed to revisit
at this meeting the question of initial windows of three or four
segments.

(2) Mark Allman and I have revised the internet-draft
draft-floyd-incr-init-win-01.txt, to reflect the current state of the
discussion.   The revised draft adds the following specifications and/or
clarifications:

  (a) The increased IW would only apply after the initial SYN/ACK handshake.
  (b) If the SYN or SYN-ack had to be retransmitted, then the side
    retransmitting it must start with IW=1.
  (c) A more lengthy discussion of interactions with the "Don't Fragment" (DF)
    bit, including a recommendation to recompute the IW if the segment
    size is changed during Path MTU Discovery.

(3) The new I-D also covers some additional simulation and other results:

  (a) A description of the bursts of two and three segments common during
    slow-start, and of the four-segment bursts common when a single
    delayed-ACK is dropped.
  (b) Studies by [HAGT98] showing that the use of larger initial windows
    decreases HTTP transfer time, for experiments in a satellite environment.
  (c) Studies by [PN98] investigating the impact of larger initial windows
    on competing traffic in a simulation scenario with both HTTP and FTP flows.
    The larger initial windows decreased HTTP transfer times and at the
    same time slightly increased the segment drop rate.
  (d) Studies by [Mor97] showing that in a heavily-congested network,
    initial windows of four segments can **increase** HTTP transfer times
    and increase the segment drop rate.

My own assessment of the discussion to date is that there is a
reasonably-solid consensus in favor of allowing an IW of three segments
(for MSS < 2190 bytes).  It is not clear to me if there is a consensus
in favor of allowing an IW of four segments (for MSS <= 1095 bytes).

- Sally

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 05:17:00 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id FAA3021006
	for tcp-impl-list;
	Tue, 24 Mar 1998 05:15:22 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id FAA3123290
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 05:15:20 -0800 (PST)
Received: from assateague.lerc.nasa.gov (assateague.lerc.nasa.gov [139.88.35.25]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id FAA28126
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 05:15:18 -0800 (PST)
	mail_from (mallman@guns.lerc.nasa.gov)
Received: from guns.lerc.nasa.gov by assateague.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main)
        id IAA23125; Tue, 24 Mar 1998 08:15:16 -0500 (EST)
Received: from guns by guns.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-local)
        id IAA00858; Tue, 24 Mar 1998 08:15:15 -0500 (EST)
Message-Id: <199803241315.IAA00858@guns.lerc.nasa.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Cc: "Vern Paxson" <vern@ee.lbl.gov>
From: "Mark Allman" <mallman@lerc.nasa.gov>
Reply-To: mallman@lerc.nasa.gov
Subject: Minutes
Organization: Late Night Hackers, NASA Lewis, Cleveland, Ohio
Song-of-the-Day: Nothingman
Date: Tue, 24 Mar 1998 08:15:15 -0500
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

 
Here are the minutes for the Washington meeting that Vern and I have
hacked out over the past few days.  Thanks to Joe Touch and Bernie
Volz for their meeting notes.  Any inaccuracies in these notes can
be attributed to our weak memories.

allman


---

TCP-IMPL Meeting Minutes
40th IETF Meeting -- Washington DC
12-8-97 -- 12-11-97

============================================================================

These notes were compiled by Mark Allman and Vern Paxson based on notes
from Joe Touch and Bernie Volz (thanks!), as well as the co-chairs'
(possibly faint) memories of the meeting.

============================================================================

1.  additions to Known Problems I-D: Vern Paxson (10 min)

Vern outlined a new problem category: resource management

Vern outlined a few additions to the "Known Problems" I-D:
  -A low interval between keepalives can cost money on dial-on-demand links

  -Stretch ACK Violation

   A question about the definition of a "full sized packet" was
   raised.  A suggestion was made that this should be changed to a
   "packet with data".  This would allow receivers to ACK every
   second packet with any data instead of waiting for 2 "full
   segments worth" of data to arrive.  (Actually, receivers are
   already "allowed" to do this.  RFC 1122 says the TCP must ack
   *at least* every two full sized segments, but acking more often
   is certainly permissible, too. - VP)

  -Failure to send FIN notification promptly
  -Failure to send a RST after Half Duplex Close

============================================================================

2.  revisions to Testing Tools I-D: Steve Parker (5 min)
  
  (We don't recall any details concerning Steve's short presentation.)

============================================================================

3.  porting Packet Shell to libpcap: Steve Parker (10 min)

  Steve Parker reported that Sun's packet shell is being ported to
  libpcap for greater portability.  Also, roughly 200 TCP tests have
  been added to the suite.  They are also porting UNH IPv6 tests (licensed
  from UNH) to this regression suite. [This is currently not available
  outside Sun.]

============================================================================

4.  TIME_WAIT problems: Joe Touch (5 min)

  TIME_WAIT issues - Joe Touch gave a presentation on the time_wait
  accumulation issue, and discussed both application and protocol
  solutions. Joe presented the alternatives as either/or, in favor of
  the protocol solution, and asked the participants which solution to
  recommend.  There was support for both solutions, as well as agreement
  that existing optimizations for dealing with large numbers of TCBs, such
  as hashing and ordering, should also be encouraged.

  We [the co-chairs] believe there was a considerable body of opinion at
  the meeting that TIME_WAIT isn't really a significant problem, you just
  need to implement a decent hash table and you're done (per the final
  comment in the previous paragraph).  Unfortunately, in the absence of
  definitive notes, and given that neither co-chair has a clear recollection,
  we cannot state this more strongly [or rule it out as fantasy].

============================================================================

5.  slow-start restart bug: Joe Touch (5 min)

  Joe Touch discussed various ways to re-start an idle connection.
  Joe presented 5 mechanisms for doing this...

    1.	do nothing (i.e., just use same cwnd as before idle period)
    2.	slow start from 1 segment after not sending for 1 RTO
    3.	slow start from 1 segment after not receiving for 1 RTO
    4.	never send more than 4 segments (a 'maxburst' parameter)
    5.	never accumulate more than 4 segments of unused window
        (this was Joe's proposed solution)

  A discussion followed.  Most people agreed the solution widely
  used (#3) was a bug because in the common case of, say, persistent
  HTTP, often the TCP has just received something (a request to
  send more data), so test #3 fails even though the TCP is quite idle.

  Implementing solution #2 would require a new state variable (time of
  last send).  Solutions 2 and 3 can generate large line-rate bursts if
  the connection is idle for less than 1 RTO but has not continued to send
  new segments as ACKs were received.  Joe noted that solution 4 can also
  burst when ACK processing and send processing are not appropriately
  interleaved.  No consensus was generated on the appropriate approach.
  Joe offered to perform a comparison and give another report at the
  next meeting.
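
  The difference between tests #2 and #3 in the persistent-HTTP case can
  be sketched as follows.  This is an illustration only, not code from
  Joe's presentation: the function name, its parameters, and the sample
  numbers are all assumptions, and mechanisms 4 and 5 (which limit bursts
  rather than reset cwnd) are not modeled.

```python
def restart_cwnd(mechanism, cwnd, now, last_send, last_recv, rto):
    """Return the congestion window (in segments) to use when resuming.

    mechanism 1: keep the old cwnd.
    mechanism 2: slow start from 1 segment if nothing was *sent* for 1 RTO.
    mechanism 3: slow start from 1 segment if nothing was *received* for 1 RTO.
    All names here are hypothetical; times are in seconds.
    """
    if mechanism == 1:
        return cwnd
    if mechanism == 2:
        return 1 if now - last_send >= rto else cwnd
    if mechanism == 3:
        return 1 if now - last_recv >= rto else cwnd
    raise ValueError("only mechanisms 1-3 are modeled here")


# Persistent-HTTP case (made-up numbers): the server last sent 30 s ago,
# but a new request arrived 0.01 s ago.
idle = dict(cwnd=16, now=30.0, last_send=0.0, last_recv=29.99, rto=1.5)
print(restart_cwnd(2, **idle))  # 1: the send-based test notices the idle period
print(restart_cwnd(3, **idle))  # 16: the receive-based test does not fire
```

  Note that test #2 needs last_send, the extra state variable mentioned
  above, while test #3 relies only on the time of the last receive.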

============================================================================

6.  checksum document (5 min)
  
  Vern presented a suggestion from Larry Backman: TCP checksumming
  documentation is spread over a large number of RFCs (RFCs 793,
  1071, 1122, 1141, 1626, 1936).  The suggestion is for a single
  document summarizing checksumming, including:

    -algorithm
    -what to crunch on 
    -strategy (whole packet vs. incremental)
    -reference implementation
    -caveats/collected wisdom

  Larry volunteered to work on a document from an implementor's
  perspective.  But, he requested help from an "algorithm geek".

  Consensus was that RFC 1071 already summarized checksumming quite
  well. 

============================================================================

7.  call for volunteers (5 min)

  Vern asked (begged) for volunteers for the outstanding
  implementation problems that need to be documented.  The response
  was, as usual, underwhelming.

============================================================================

8.  initial slow-start (50+ min)

  Sally Floyd briefly outlined the proposal for increasing TCP's
  initial window from 1 segment to 2--4 segments, depending on the
  MSS: 

        MSS <= 1095 bytes:
            win = 4 * MSS
        1095 bytes < MSS < 2190 bytes:
            win = 4380 bytes
        MSS >= 2190 bytes:
            win = 2 * MSS
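
  As a sketch, the three cases above can be written as a small function.
  The function name is an assumption made for illustration; the
  thresholds are the ones listed in the table.

```python
def initial_window(mss):
    """Proposed initial window in bytes for a given MSS in bytes."""
    if mss <= 1095:
        return 4 * mss   # small MSS: 4 segments
    if mss < 2190:
        return 4380      # mid-sized MSS: fixed 4380 bytes (2 to 4 segments)
    return 2 * mss       # large MSS: 2 segments


# The three cases collapse into one expression, checked here over a range.
for mss in range(1, 10000):
    assert initial_window(mss) == min(4 * mss, max(2 * mss, 4380))

print(initial_window(536), initial_window(1460), initial_window(4096))
# prints: 2144 4380 8192
```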

  Sally argued that bursts of 2 and 3 segments are common in the
  Internet.  When TCP is in congestion avoidance and receives a
  delayed ACK, 2 segments are transmitted.  If TCP is in slow start,
  3 segments are transmitted.  Furthermore, bursts of 4 and 5
  segments are not rare.  If a single delayed ACK is dropped during
  congestion avoidance a burst of 4 segments is sent.  And, if one
  delayed ACK is dropped during slow start a burst of 5 segments is
  sent.  

  Kedar Poduri then presented simulations of multiple flows using an
  initial window of 1--4 segments.  These simulations showed
  improvements to web type traffic when using larger initial
  windows, without large increases in the drop rate.  (Joint work
  with Kathie Nichols).

  Tim Shepard presented the results outlined in his Internet Draft
  (draft-shepard-tcp-4-packets-3-buff-00.txt).  Tim showed that when
  using an initial window of 4 segments and a router buffer of 3
  segments (guaranteeing that the 4th segment would be dropped) the
  performance of the TCP connection was slightly better than using
  an initial window of 1 segment.  (Joint work with Craig
  Partridge). 

  Mark Allman presented measurements of 16KB transfers across the
  Internet and dialup modem channels.  When using an initial window of
  2--4 segments over the dialup channel, transfer time was decreased
  by roughly 7--10%.  In addition, the drop rate was not increased.
  In the Internet tests, the drop rate was increased very slightly
  with an initial window of 2--4 segments.  Furthermore, the transfer
  time was reduced by 2, 15 and 25% for initial windows of 2, 3 and
  4 segments respectively.

  A discussion followed.  A consensus for an initial window of 2
  segments was obtained.  Some people felt that more evidence from
  real networks was needed before they would be comfortable with
  initial windows of 3 or 4 segments.  It was suggested that we
  accept an initial window as proposed (i.e., initial window of
  between 2 and 4 segments based on MSS) at the next meeting unless
  evidence that it is harmful is presented.

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 10:44:15 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id KAA3211092
	for tcp-impl-list;
	Tue, 24 Mar 1998 10:40:59 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id KAA3206755
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 10:40:53 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id KAA17574
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 24 Mar 1998 10:40:52 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id KAA15307; Tue, 24 Mar 1998 10:40:51 -0800 (PST)
Message-Id: <199803241840.KAA15307@daffy.ee.lbl.gov>
To: tcp-impl@cthulhu.engr.sgi.com
Cc: mallman@lerc.nasa.gov
Subject: Draft agenda for TCPIMPL WG meeting in LA
Date: Tue, 24 Mar 1998 10:40:51 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

If you have comments etc., now would be a good time. - Vern


1.  Testing Tools I-D
    (5 minutes)
2.  Known Problems I-D
    (5 minutes)
3.  Pending problems
    (15 minutes)
4.  Re-starting idle connections
    (15 minutes)
5.  ns' new network emulation capabilities 
    (10 minutes)
6.  Increasing the initial window size
    (45+ minutes)

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 12:53:00 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id MAA3247882
	for tcp-impl-list;
	Tue, 24 Mar 1998 12:50:20 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id MAA3257246
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 12:50:19 -0800 (PST)
Received: from stars.cisco.com (stars.cisco.com [171.71.112.28]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id MAA10684
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 12:50:18 -0800 (PST)
	mail_from (floop@cisco.com)
Received: from powder.cisco.com (gisg-bdc.cisco.com [171.69.180.2]) by stars.cisco.com (8.8.4-Cisco.1/8.6.5) with ESMTP id MAA20403 for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 12:50:07 -0800 (PST)
Received: by gisg-bdc.cisco.com with Internet Mail Service (5.5.1960.3)
	id <GNNGVL1A>; Tue, 24 Mar 1998 14:52:16 -0600
Message-ID: <F96684A2CC2FD111BD8A00A0C92A1A145D26@gisg-bdc.cisco.com>
From: Doug Drew <floop@cisco.com>
To: tcp-impl@cthulhu.engr.sgi.com
Cc: "Alan M. Carroll" <amc@cisco.com>, Bradley Frank <baf@cisco.com>
Subject: using RST to indicate busy
Date: Tue, 24 Mar 1998 14:52:13 -0600
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.1960.3)
Content-Type: text/plain
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I have seen a network trace of the following scenario:

An HTTP client sends a SYN to port 80 on a server.  The server
returns a RST.  The client waits half a second, then sends another
SYN with the same sequence number.  The server returns a
SYN-ACK, and the conversation proceeds as normal.

Did I miss a meeting?  Is the server using the RST to
indicate "I'm busy, try again soon" to the client?  Is this behavior
a) documented and 2) permitted?

I found that the most recent stack that I have from a large
software company near Seattle does indeed retransmit SYNs
at half second intervals when it receives a RST.  It does this
three times before giving up.

It is much more complicated to experimentally reproduce
the server behavior, which is what I'm really asking about.

Doug Drew - floop@cisco.com
Software Engineer - Centri Engineering Group - Cisco Systems, Inc.
Champaign, IL voice: (217) 363-4514 fax: (217) 363-4599



From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 13:14:59 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id NAA3270183
	for tcp-impl-list;
	Tue, 24 Mar 1998 13:13:51 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id NAA3267846
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 13:13:49 -0800 (PST)
Received: from frantic.bsdi.com (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id NAA20472
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 13:13:45 -0800 (PST)
	mail_from (dab@frantic.bsdi.com)
Received: (from dab@localhost)
	by frantic.bsdi.com (8.8.8/8.8.8) id PAA00353;
	Tue, 24 Mar 1998 15:13:27 -0600 (CST)
Date: Tue, 24 Mar 1998 15:13:27 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199803242113.PAA00353@frantic.bsdi.com>
To: floop@cisco.com, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: using RST to indicate busy
Cc: amc@cisco.com, baf@cisco.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Doug,

> From: Doug Drew <floop@cisco.com>
> Subject: using RST to indicate busy
> Date: Tue, 24 Mar 1998 14:52:13 -0600
> ...
> An HTTP client sends a SYN to port 80 on a server.  The server
> returns a RST.  The client waits half a second, then sends another
> SYN with the same sequence number.  The server returns a
> SYN-ACK, and the conversation proceeds as normal.
> 
> Did I miss a meeting?  Is the server using the RST to
> indicate "I'm busy, try again soon" to the client?  Is this behavior
> a) documented and 2) permitted?
> 
> I found that the most recent stack that I have from a large
> software company near Seattle does indeed retransmit SYNs
> at half second intervals when it receives a RST.  It does this
> three times before giving up.

BSD based TCP implementations will just drop the connection
upon receiving a RST in response to the SYN, and return
ECONNREFUSED to the application.  Some applications will
pause, then loop and try the connect() again (the rcmd()
library routine does this), for a specified number of
retries.

BSD based TCP implementations also just drop the incoming SYN
if there are too many outstanding, unaccepted connections, depending
on the client to retransmit the SYN.
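
The application-level retry described above (pause, then loop and call
connect() again) might look roughly like the sketch below.  It is not
the rcmd() source: the function name, the default retry count, and the
doubling of the pause are assumptions made for illustration.

```python
import socket
import time


def connect_with_retry(addr, retries=3, delay=1.0,
                       connect=socket.create_connection, sleep=time.sleep):
    """Try to connect; on ECONNREFUSED pause, double the pause, and retry.

    `connect` and `sleep` are injectable so the loop can be exercised
    without a network.  All names and defaults here are hypothetical.
    """
    for attempt in range(retries + 1):
        try:
            return connect(addr)
        except ConnectionRefusedError:
            if attempt == retries:
                raise  # out of retries: report the refusal to the caller
            sleep(delay)
            delay *= 2
```

The retry lives in the application, not in TCP: the stack itself still
treats the RST as a refusal and returns ECONNREFUSED each time.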

		-David Borman, dab@bsdi.com


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 13:14:59 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id NAA3224138
	for tcp-impl-list;
	Tue, 24 Mar 1998 13:11:06 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id NAA3269003
	for <TCP-IMPL@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 13:11:04 -0800 (PST)
Received: from alcor.process.com (alcor.process.com [192.42.95.16]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id NAA19198
	for <TCP-IMPL@CTHULHU.ENGR.SGI.COM>; Tue, 24 Mar 1998 13:11:03 -0800 (PST)
	mail_from (VOLZ@PROCESS.COM)
Date:     Tue, 24 Mar 1998 16:10 -0500
From: VOLZ@PROCESS.COM (Bernie Volz)
Message-Id: <009C3AC7E56A28BF.85D4@PROCESS.COM>
To: floop@cisco.com, TCP-IMPL@cthulhu.engr.sgi.com
Subject:  RE: using RST to indicate busy
X-VMS-To: SMTP%"floop@cisco.com"
X-VMS-Cc: TCP-IMPL@CTHULHU.ENGR.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Doug Drew - floop@cisco.com wrote:

>I have seen a network trace of the following scenario:
>
>An HTTP client sends a SYN to port 80 on a server.  The server
>returns a RST.  The client waits half a second, then sends another
>SYN with the same sequence number.  The server returns a
>SYN-ACK, and the conversation proceeds as normal.
>
>Did I miss a meeting?  Is the server using the RST to
>indicate "I'm busy, try again soon" to the client?  Is this behavior
>a) documented and 2) permitted?
>
>I found that the most recent stack that I have from a large
>software company near Seattle does indeed retransmit SYNs
>at half second intervals when it receives a RST.  It does this
>three times before giving up.
>
>It is much more complicated to experimentally reproduce
>the server behavior, which is what I'm really asking about.
>
>Doug Drew - floop@cisco.com
>Software Engineer - Centri Engineering Group - Cisco Systems, Inc.
>Champaign, IL voice: (217) 363-4514 fax: (217) 363-4599

The same large software company near Seattle does send RST when
the socket queue is full and it can't accept another connection (this
was especially bad in earlier releases of that company's TCP/IP software
when the backlog could only be set to a small number of connections). The
normal (4.4BSD) practice is to simply ignore the SYN and allow the
normal TCP retransmission of the SYN to "try again" a short time later.

Since this large software company chose to RST instead, I think they
decided that they also needed to retry several times and hence that is
what their "connect" implementation does.

I did, a long while ago, suggest to their developers that they change
this behavior to be more in line with standard practices. But alas ...

It is pretty simple to reproduce the server behavior ... write code such
as:
	- Create stream socket
	- Bind to port
	- Listen with backlog of 5
	- Sleep (ie, don't call accept)
Compile/link/run on that company's server.

Then, open a bunch of connections. Once the backlog queue fills, watch
the RSTs come for any additional connection attempts.
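
As a rough sketch, the steps above could be written with the standard
sockets API like this.  The function name and the default port are
assumptions; whether a full queue produces RSTs or silently dropped SYNs
depends on the stack the code runs on.

```python
import socket


def make_unaccepting_listener(host="0.0.0.0", port=8080, backlog=5):
    """Create a listening TCP socket whose owner never calls accept()."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create stream socket
    srv.bind((host, port))                                   # bind to port
    srv.listen(backlog)                                      # listen with backlog of 5
    return srv                                               # deliberately no accept()


# Usage: keep the returned socket open and just sleep, for example
#     listener = make_unaccepting_listener()
#     time.sleep(3600)
# then open connections from another host until the backlog queue fills.
```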

- Bernie Volz
  Process Software

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 13:23:38 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id NAA3280047
	for tcp-impl-list;
	Tue, 24 Mar 1998 13:21:18 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id NAA3274392
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 13:21:16 -0800 (PST)
Received: from firewall.agranat.com (agranat.com [198.113.147.2]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id NAA23271
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 13:21:15 -0800 (PST)
	mail_from (lawrence@agranat.com)
Received: from agranat.com (alice [192.104.71.130]) by firewall.agranat.com (8.6.12/8.6.9) with ESMTP id QAA06936; Tue, 24 Mar 1998 16:21:07 -0500
Received: from localhost (lawrence@localhost)
	by agranat.com (8.8.5/8.8.5) with SMTP id QAA10283;
	Tue, 24 Mar 1998 16:21:07 -0500
Date: Tue, 24 Mar 1998 16:21:07 -0500 (EST)
From: Scott Lawrence <lawrence@agranat.com>
To: Doug Drew <floop@cisco.com>
cc: tcp-impl@cthulhu.engr.sgi.com, "Alan M. Carroll" <amc@cisco.com>,
        Bradley Frank <baf@cisco.com>
Subject: Re: using RST to indicate busy
In-Reply-To: <F96684A2CC2FD111BD8A00A0C92A1A145D26@gisg-bdc.cisco.com>
Message-ID: <Pine.LNX.3.96.980324161121.10025A-100000@alice.agranat.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


On Tue, 24 Mar 1998, Doug Drew wrote:

> I have seen a network trace of the following scenario:
> 
> An HTTP client sends a SYN to port 80 on a server.  The server
> returns a RST.  The client waits half a second, then sends another
> SYN with the same sequence number.  The server returns a
> SYN-ACK, and the conversation proceeds as normal.
> 
> Did I miss a meeting?  Is the server using the RST to
> indicate "I'm busy, try again soon" to the client?  Is this behavior
> a) documented and 2) permitted?

  If a maximum number of current and pending inbound connections has
  been exceeded, then sending an RST from the server is, it seems to me,
  the right thing to do to refuse a new offered connection.  This is what
  our embedded web server does (typically it is configured with a maximum
  of between 8 and 16 connections).  

  However, there is no implication that a retry will succeed, and retrying
  from the client seems incorrect to me (the RST should be interpreted as 
  a refusal).

--
Scott Lawrence            Consulting Engineer       <lawrence@agranat.com>
Agranat Systems, Inc.  EmWeb Server Engineering    http://www.agranat.com/



From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 13:23:38 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id NAA3268202
	for tcp-impl-list;
	Tue, 24 Mar 1998 13:22:09 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id NAA3250116
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 13:22:07 -0800 (PST)
Received: from kalae.kohala.com (kalae.kohala.com [209.75.135.35]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id NAA23594
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 13:22:00 -0800 (PST)
	mail_from (rstevens@kohala.kohala.com)
Received: from kohala.kohala.com (kohala.kohala.com [209.75.135.33])
	by kalae.kohala.com (8.8.5/8.8.5) with ESMTP id OAA28708;
	Tue, 24 Mar 1998 14:21:53 -0700 (MST)
Received: (from rstevens@localhost) by kohala.kohala.com (8.8.5/8.8.3) id OAA29887; Tue, 24 Mar 1998 14:21:53 -0700 (MST)
Message-Id: <199803242121.OAA29887@kohala.kohala.com>
From: rstevens@kohala.com (W. Richard Stevens)
Date: Tue, 24 Mar 1998 14:21:53 -0700
Reply-To: "W. Richard Stevens" <rstevens@kohala.com>
X-Phone: +1 520 297 9416
X-Homepage: http://www.kohala.com/~rstevens
X-Mailer: Mail User's Shell (7.2.6 beta(3) 11/17/96)
To: Doug Drew <floop@cisco.com>, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: using RST to indicate busy
Cc: "Alan M. Carroll" <amc@cisco.com>, Bradley Frank <baf@cisco.com>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Here are some comments related to this that appeared on the end2end
list, January 28, 1997.

> If you are referring to Windows 95/NT applications (such
> as the Netscape browser) doing many SYN requests when attempting to
> CONNECT when the server sends back RSTs, that isn't really a Netscape
> bug or issue. It is Microsoft. Their TCP/IP does this. They retry on a
> SYN that is RST. This is because they also send back RSTs when the
> backlog for a listening socket is reached and in the early days, you
> could only set a low backlog which meant that they quickly found that
> users were getting the ECONNREFUSED status when a server was just too
> slow to keep up with the incoming connection request rate.


> > Correct me if I'm wrong, but doesn't this go against all existing
> > practice?
> >
> > How can they then distinguish between an RST in response to a SYN
> > that means "there is no socket in the LISTEN state" (i.e., the server
> > was never started) versus "the listen queue is filled for this socket"?
> > If I try to connect to a server that was never started, how many times
> > do they resend this SYN, even though each one elicits an RST, and with
> > what frequency?
> >
> > RFC 793 is pretty clear (p. 37) that an RST in other than the LISTEN
> > state means abort the connection, and go to the CLOSED state.
> 
> I discussed this with some Microsoft folks when Windows NT first came
> out. They said that was the way they intended it to be and that was the
> way it would be. Period. (Perhaps they feel big enough to set their own
> standards, it wouldn't be the first time.)

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 13:32:11 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id NAA3259660
	for tcp-impl-list;
	Tue, 24 Mar 1998 13:29:15 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id NAA3281035
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 13:29:13 -0800 (PST)
Received: from prawn.fishy.net (flounder.fishy.net [207.115.61.34]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id NAA26512
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 13:29:10 -0800 (PST)
	mail_from (oleg@prodigy.net)
Received: from prodigy.net ([172.17.17.247]) by prawn.fishy.net (8.8.5/8.7.3) with ESMTP id QAA17254; Tue, 24 Mar 1998 16:29:02 -0500
Message-ID: <35182449.DE4A3DBB@prodigy.net>
Date: Tue, 24 Mar 1998 16:23:22 -0500
From: Oleg Vishnepolsky <oleg@prodigy.net>
X-Mailer: Mozilla 4.04 [en] (WinNT; I)
MIME-Version: 1.0
To: Doug Drew <floop@cisco.com>
CC: tcp-impl@cthulhu.engr.sgi.com, "Alan M. Carroll" <amc@cisco.com>,
        Bradley Frank <baf@cisco.com>
Subject: Re: using RST to indicate busy
References: <F96684A2CC2FD111BD8A00A0C92A1A145D26@gisg-bdc.cisco.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

I believe Microsoft's NT 4.0 TCP/IP stack implementation
does that. No, you are not missing anything. That's Microsoft.
They adopted an "embrace and extend" strategy towards the
Internet and interoperability.

Oleg Vishnepolsky

Doug Drew wrote:

> I have seen a network trace of the following scenario:
>
> An HTTP client sends a SYN to port 80 on a server.  The server
> returns a RST.  The client waits half a second, then sends another
> SYN with the same sequence number.  The server returns a
> SYN-ACK, and the conversation proceeds as normal.
>
> Did I miss a meeting?  Is the server using the RST to
> indicate "I'm busy, try again soon" to the client?  Is this behavior
> a) documented and 2) permitted?
>
> I found that the most recent stack that I have from a large
> software company near Seattle does indeed retransmit SYNs
> at half second intervals when it receives a RST.  It does this
> three times before giving up.
>
> It is much more complicated to experimentally reproduce
> the server behavior, which is what I'm really asking about.
>
> Doug Drew - floop@cisco.com
> Software Engineer - Centri Engineering Group - Cisco Systems, Inc.
> Champaign, IL voice: (217) 363-4514 fax: (217) 363-4599




From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 13:32:29 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id NAA3253801
	for tcp-impl-list;
	Tue, 24 Mar 1998 13:31:26 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id NAA3286424
	for <TCP-IMPL@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 13:31:23 -0800 (PST)
Received: from scanner.worldgate.com (scanner.worldgate.com [198.161.84.3]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id NAA27316
	for <TCP-IMPL@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 13:31:22 -0800 (PST)
	mail_from (marcs@znep.com)
Received: from znep.com (uucp@localhost)
	by scanner.worldgate.com (8.8.7/8.8.7) with UUCP id OAA21526;
	Tue, 24 Mar 1998 14:31:08 -0700 (MST)
Received: from localhost (marcs@localhost) by alive.znep.com (8.7.5/8.7.3) with SMTP id OAA14544; Tue, 24 Mar 1998 14:30:27 -0700 (MST)
Date: Tue, 24 Mar 1998 14:30:27 -0700 (MST)
From: Marc Slemko <marcs@znep.com>
To: Bernie Volz <VOLZ@PROCESS.COM>
cc: floop@cisco.com, TCP-IMPL@cthulhu.engr.sgi.com
Subject: RE: using RST to indicate busy
In-Reply-To: <009C3AC7E56A28BF.85D4@PROCESS.COM>
Message-ID: <Pine.BSF.3.95.980324141712.9590N-100000@alive.znep.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Tue, 24 Mar 1998, Bernie Volz wrote:

[...]
> The same large software company near Seattle does send RST when
> the socket queue is full and it can't accept another connection (this
> was especially bad in earlier releases of that company's TCP/IP software
> when the backlog could only be set to a small number of connections). The
> normal (4.4BSD) practice is to simply ignore the SYN and allow the
> normal TCP retransmission of the SYN to "try again" a short time later.
> 
> Since this large software company chose to RST instead, I think they
> decided that they also needed to retry several times and hence that is
> what their "connect" implementation does.
> 
> I did, a long while ago, suggest to their developers that they change
> this behavior to be more in line with standard practices. But alas ...

This would be the same company that decides they need to send a RST to
terminate connections from their popular HTTP client.  The only answer I
could ever get about why they do this is because this other company did it
with their popular HTTP client, so it had to be a desirable way of doing
things, regardless of how broken it may be (and it does cause real-world
problems).  

I was under the odd impression that TCP did something where it magically
resent the SYN if it didn't get a response.  I was also under the odd
impression that simply ignoring the SYN if the server was too busy (in
whatever way you want to define busy) would result in all clients properly
retrying the connection at some later time.  This idea seems to make sense
to me because it uses TCP's established retransmission method and requires
nothing of clients except that they comply with the spec while still
offering the server a second chance if it is too busy at a particular
moment.

If a server wishes to send a RST instead, I can't see that it is horrible.
All it means is that the server is saying that the connection can not be
made, period, and that it must have no expectation that the client would
take it upon itself to retry the connection without user intervention.

Modifying your client to automatically retry the connection using its own
(possibly very broken and harmful) retransmission methods is, however, a
different matter that I find difficult to justify.  The only possible
justification I can think of is that, for whatever reason, the server is
normally unable to handle the load being placed on it and is often having
to do such things.  In that case, implementing your own retransmission
timeouts in this method would result in the perception of better
performance, however the real problem is that the server's TCP stack or
application program is unable to handle the requests.

If I were cynical, I would suggest that the reason for my first complaint
(RST terminating connections) is due to other limitations in some vendors'
(both Unix and NT) stacks WRT large numbers of sockets in TIME_WAIT.

I am always disappointed when vendors try to force their methods by sheer
market force without discussion or consideration of technical merit, but
that's the way things have always been.


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 14:05:33 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id OAA3293175
	for tcp-impl-list;
	Tue, 24 Mar 1998 14:02:37 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id OAA3298084
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 14:02:31 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id OAA09896
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 14:02:29 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id OAA15731; Tue, 24 Mar 1998 14:02:25 -0800 (PST)
Message-Id: <199803242202.OAA15731@daffy.ee.lbl.gov>
To: Scott Lawrence <lawrence@agranat.com>
Cc: Doug Drew <floop@cisco.com>, tcp-impl@cthulhu.engr.sgi.com,
        "Alan M. Carroll" <amc@cisco.com>, Bradley Frank <baf@cisco.com>
Subject: Re: using RST to indicate busy
In-reply-to: Your message of Tue, 24 Mar 1998 16:21:07 PST.
Date: Tue, 24 Mar 1998 14:02:25 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>   If a maximum number of current and pending inbound connections has
>   been exceeded, then sending an RST from the server is, it seems to me,
>   the right thing to do to refuse a new offered connection.

Well, it seems busted to me.  There's already a retry mechanism built
into SYN transmission, and it has backoff built into it, which is just
what you want when you're busy; and there's plenty of code out there
that actually believes it when you tell it "go away" with a RST, so
you break connectivity with those apps; and there's the passage from
end2end that Rich just passed along.

If RFC 793 p. 37 gives us chapter and verse on this, then I'm certainly
inclined to document it as a known implementation problem.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 14:22:45 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id OAA3275092
	for tcp-impl-list;
	Tue, 24 Mar 1998 14:20:35 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id OAA2854960
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 14:20:32 -0800 (PST)
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id OAA17132
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 14:18:34 -0800 (PST)
	mail_from (alan@lxorguk.ukuu.org.uk)
Received: from the-village.bc.nu (the-village.bc.nu [163.164.160.21]) by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id WAA22541; Tue, 24 Mar 1998 22:17:44 GMT
Received: by the-village.bc.nu (Smail3.1.29.1 #2)
	id m0yHbzx-000aNgC; Tue, 24 Mar 98 22:16 GMT
Message-Id: <m0yHbzx-000aNgC@the-village.bc.nu>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: using RST to indicate busy
To: lawrence@agranat.com (Scott Lawrence)
Date: Tue, 24 Mar 1998 22:16:44 +0000 (GMT)
Cc: floop@cisco.com, tcp-impl@cthulhu.engr.sgi.com, amc@cisco.com,
        baf@cisco.com
In-Reply-To: <Pine.LNX.3.96.980324161121.10025A-100000@alice.agranat.com> from "Scott Lawrence" at Mar 24, 98 04:21:07 pm
Content-Type: text
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>   been exceeded, then sending an RST from the server is, it seems to me,
>   the right thing to do to refuse a new offered connection.  This is what
>   our embedded web server does (typically it is configured with a maximum
>   of between 8 and 16 connections).  

I would strongly disagree. If you drop the frame the caller will seamlessly
try again and back off exponentially until the connection is achieved. Sending
an RST normally causes the end user to see "Connection refused by remote host"
and conclude the site is down or broken.
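
To make the backoff concrete, here is a small sketch of when a client
would retransmit a dropped SYN.  The 3-second initial timeout and the
number of retransmissions are assumptions for illustration; real stacks
use different values.

```python
def syn_retransmit_times(initial_timeout=3.0, retransmits=4):
    """Seconds after the first SYN at which each retransmission is sent,
    doubling the timeout after every attempt (exponential backoff)."""
    times, elapsed, timeout = [], 0.0, initial_timeout
    for _ in range(retransmits):
        elapsed += timeout
        times.append(elapsed)
        timeout *= 2
    return times


print(syn_retransmit_times())  # [3.0, 9.0, 21.0, 45.0]
```

By contrast, the client behaviour reported at the start of this thread
retries at fixed half-second intervals, with no backoff at all.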


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 14:22:45 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id OAA3290000
	for tcp-impl-list;
	Tue, 24 Mar 1998 14:20:04 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id OAA3208631
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 14:20:02 -0800 (PST)
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id OAA17587
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 14:20:01 -0800 (PST)
	mail_from (sparker@fstop.Eng.Sun.COM)
Received: from Eng.Sun.COM (engmail2 [129.146.1.25]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id OAA16292; Tue, 24 Mar 1998 14:19:48 -0800
Received: from fstop. (fstop.Eng.Sun.COM [192.9.204.16])
	by Eng.Sun.COM (SMI-8.6/SMI-5.3) with SMTP id OAA08319;
	Tue, 24 Mar 1998 14:19:44 -0800
Received: from fstop.eng.sun.com by fstop. (SMI-8.6/SMI-SVR4)
	id OAA11376; Tue, 24 Mar 1998 14:19:34 -0800
Message-Id: <199803242219.OAA11376@fstop.>
From: sparker@Eng.Sun.COM
To: Scott Lawrence <lawrence@agranat.com>
cc: tcp-impl@cthulhu.engr.sgi.com, "Alan M. Carroll" <amc@cisco.com>,
        Bradley Frank <baf@cisco.com>
Subject: Re: using RST to indicate busy 
Date: Tue, 24 Mar 1998 14:19:34 -0800
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


-   If a maximum number of current and pending inbound connections has
-   been exceeded, then sending an RST from the server is, it seems to me,
-   the right thing to do to refuse a new offered connection.  This is what
-   our embedded web server does (typically it is configured with a maximum
-   of between 8 and 16 connections).  
- 
-   However, there is no implication that a retry will succeed, and retrying
-   from the client seems incorrect to me (the RST should be interpreted as 
-   a refusal).

The reality is that RST gets reflected back to the user as ECONNREFUSED,
and in general is normally interpretable as "no service is available".
You seem to be describing what I consider a transient condition, and ignoring
the packet is the best way I can see to cause the right thing:  slow but
not prevent service.  Lots of applications, I think, help encourage the
user to treat RSTs back as 'no server is out there doing work'.

Making the sender timeout and retransmit the SYN seems to me *much* better
at accomplishing this.



Cheers,

	~sparker

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 14:31:18 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id OAA3269695
	for tcp-impl-list;
	Tue, 24 Mar 1998 14:28:18 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id OAA3315037
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 14:28:17 -0800 (PST)
Received: from frantic.bsdi.com (frantic.BSDI.COM [205.230.227.254]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id OAA20766
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 14:28:14 -0800 (PST)
	mail_from (dab@frantic.bsdi.com)
Received: (from dab@localhost)
	by frantic.bsdi.com (8.8.8/8.8.8) id QAA00459
	for tcp-impl@cthulhu.engr.sgi.com; Tue, 24 Mar 1998 16:28:11 -0600 (CST)
Date: Tue, 24 Mar 1998 16:28:11 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199803242228.QAA00459@frantic.bsdi.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: using RST to indicate busy
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> Subject: Re: using RST to indicate busy
> Date: Tue, 24 Mar 1998 14:02:25 PST
> From: Vern Paxson <vern@ee.lbl.gov>
> 
> >   If a maximum number of current and pending inbound connections has
> >   been exceeded, then sending an RST from the server is, it seems to me,
> >   the right thing to do to refuse a new offered connection.
> 
> Well, it seems busted to me.  There's already a retry mechanism built
> into SYN transmission, and it has backoff built into it, which is just
> what you want when you're busy; and there's plenty of code out there
> that actually believes it when you tell it "go away" with a RST, so
> you break connectivity with those apps; and there's the passage from
> end2end that Rich just passed along.
> 
> If RFC p. 37 gives us chapter and verse on this, then I'm certainly
> inclined to document it as a known implementation problem.

Page 37 is the state transition diagram, pp. 66-67 is the
explicit text:

SEGMENT ARRIVES
...
    If the state is SYN-SENT then
            
      first check the ACK bit

        If the ACK bit is set
        
          If SEG.ACK =< ISS, or SEG.ACK > SND.NXT, send a reset (unless
          the RST bit is set, if so drop the segment and return)

            <SEQ=SEG.ACK><CTL=RST>              

          and discard the segment.  Return.
  
          If SND.UNA =< SEG.ACK =< SND.NXT then the ACK is acceptable.
                 
      second check the RST bit

        If the RST bit is set
        
          If the ACK was acceptable then signal the user "error:
          connection reset", drop the segment, enter CLOSED state,
          delete TCB, and return.  Otherwise (no ACK) drop the segment
          and return.

It seems pretty clear to me that if you get a RST in response
to the SYN, you drop the connection and return an error back
to the user.  I agree with Vern, let's document this as an
implementation problem.
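
The quoted SYN-SENT text can be sketched as a small decision function. This
is only a reading of the excerpt above: the function name and the returned
strings are made up for illustration, and sequence numbers are compared as
plain integers (no 32-bit wraparound handling).

```python
def syn_sent_segment(iss, snd_una, snd_nxt, ack_bit, rst_bit, seg_ack=None):
    """Decide what a client in SYN-SENT does with an arriving segment.

    Covers only the ACK and RST checks from the quoted text; anything else
    falls through as "continue".
    """
    ack_acceptable = False

    # first check the ACK bit
    if ack_bit:
        if seg_ack <= iss or seg_ack > snd_nxt:
            # Bad ACK: answer with a reset unless the segment is itself a RST.
            return "drop" if rst_bit else "send-rst"
        ack_acceptable = snd_una <= seg_ack <= snd_nxt

    # second check the RST bit
    if rst_bit:
        if ack_acceptable:
            return "connection-reset"   # signal the user, enter CLOSED
        return "drop"                   # RST without ACK is ignored

    return "continue"


if __name__ == "__main__":
    # Hypothetical client that sent a SYN with ISS=1000.
    print(syn_sent_segment(1000, 1000, 1001, ack_bit=True, rst_bit=True, seg_ack=1001))
    print(syn_sent_segment(1000, 1000, 1001, ack_bit=False, rst_bit=True))
```

Under this reading, a RST that carries an acceptable ACK ends the connection
attempt, while a RST without the ACK bit is dropped and the SYN is eventually
retransmitted.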

If the server really wants to say "I'm busy, go away and don't
bother me any more!", then I guess it is ok for it to return an
RST.  However, if the server wants to be more polite, and tell
the client "I'm busy, try again in a bit" then they should
silently drop the SYN and let the client retransmit.

		-David Borman, dab@bsdi.com

BTW, can anyone verify whether or not the RST packet coming
from windows has the ACK bit set?  As I re-read the text, if
the ACK bit is not set, then the RST is to be silently dropped,
and the connection left intact, retransmitting the SYN.


From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 14:49:06 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id OAA2934255
	for tcp-impl-list;
	Tue, 24 Mar 1998 14:47:00 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id OAA3315159
	for <TCP-IMPL@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 14:46:54 -0800 (PST)
Received: from alcor.process.com (alcor.process.com [192.42.95.16]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id OAA28894
	for <TCP-IMPL@CTHULHU.ENGR.SGI.COM>; Tue, 24 Mar 1998 14:46:53 -0800 (PST)
	mail_from (VOLZ@PROCESS.COM)
Date:     Tue, 24 Mar 1998 17:46 -0500
From: VOLZ@PROCESS.COM (Bernie Volz)
Message-Id: <009C3AD537E1517E.85D4@PROCESS.COM>
To: dab@bsdi.com, TCP-IMPL@cthulhu.engr.sgi.com
Subject:  Re: using RST to indicate busy
X-VMS-To: SMTP%"dab@bsdi.com"
X-VMS-Cc: TCP-IMPL@CTHULHU.ENGR.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

David Borman, dab@bsdi.com, wrote:
>BTW, can anyone verify whether or not the RST packet coming
>from windows has the ACK bit set?  As I re-read the text, if
>the ACK bit is not set, then the RST is to be silently dropped,
>and the connection left intact, retransmitting the SYN.

Yes, I can tell ...

The ACK bit *IS* set. (Both RST and ACK are set.)

If folks want traces to document this, it is real easy for me to provide
them. Let me know.

- Bernie Volz
  Process Software

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 14:55:03 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id OAA2539870
	for tcp-impl-list;
	Tue, 24 Mar 1998 14:53:32 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id OAA3293748
	for <TCP-IMPL@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 14:53:31 -0800 (PST)
Received: from alcor.process.com (alcor.process.com [192.42.95.16]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id OAA01613
	for <TCP-IMPL@CTHULHU.ENGR.SGI.COM>; Tue, 24 Mar 1998 14:53:29 -0800 (PST)
	mail_from (VOLZ@PROCESS.COM)
Date:     Tue, 24 Mar 1998 17:52 -0500
From: VOLZ@PROCESS.COM (Bernie Volz)
Message-Id: <009C3AD6244A37F6.85D4@PROCESS.COM>
To: dab@bsdi.com, TCP-IMPL@cthulhu.engr.sgi.com
Subject:  Re: using RST to indicate busy
X-VMS-To: SMTP%"dab@bsdi.com"
X-VMS-Cc: TCP-IMPL@CTHULHU.ENGR.SGI.COM
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Here's a trace of what happens:

17:41:25.017906 clienta > server: S 607990879:607990879(0) win 4096 <mss 1460,wscale 0,eol> (DF)
17:41:25.017906 server > clienta: S 175475371:175475371(0) ack 607990880 win 8760 <mss 1460> (DF)
17:41:25.017906 clienta > server: . ack 1 win 4096 (DF)
17:41:27.577887 clientb > server: S 608790873:608790873(0) win 4096 <mss 1460,wscale 0,eol> (DF)
17:41:27.577887 server > clientb: R 0:0(0) ack 608790874 win 0

clienta is one connection (successful) from a client. clientb is another
connection which is improperly reset because the backlog is full. server
is the server (from the seattle company). (Note that several connections
were made BEFORE the RST one to fill the backlog.)
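
As a rough cross-check of the trace, the sketch below parses the two clientb
lines and confirms that the RST carries an ACK for the client's SYN. The
regular expression and helper names are assumptions written for this one
trace format, not a general tcpdump parser.

```python
import re

# The two clientb lines copied from the trace.
TRACE = [
    "17:41:27.577887 clientb > server: S 608790873:608790873(0) win 4096 <mss 1460,wscale 0,eol> (DF)",
    "17:41:27.577887 server > clientb: R 0:0(0) ack 608790874 win 0",
]

LINE_RE = re.compile(
    r"^\S+ (?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+) "
    r"(?P<seq>\d+):\d+\(\d+\)(?: ack (?P<ack>\d+))?"
)


def parse_line(line):
    """Extract source, destination, flags, sequence and ack from one line."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("unrecognized trace line: " + line)
    ack = m.group("ack")
    return {
        "src": m.group("src"),
        "dst": m.group("dst"),
        "flags": m.group("flags"),
        "seq": int(m.group("seq")),
        "ack": int(ack) if ack is not None else None,
    }


def rst_acks_syn(syn, rst):
    """True if rst is a RST whose ack field equals the SYN's sequence + 1."""
    return (
        "R" in rst["flags"]
        and rst["ack"] is not None
        and rst["ack"] == (syn["seq"] + 1) % 2**32
    )


if __name__ == "__main__":
    syn, rst = (parse_line(line) for line in TRACE)
    print(rst_acks_syn(syn, rst))
```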

- Bernie Volz
  Process Software

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 15:25:17 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id PAA3259290
	for tcp-impl-list;
	Tue, 24 Mar 1998 15:23:17 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id PAA3308697
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 15:23:15 -0800 (PST)
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id PAA14185
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 15:23:12 -0800 (PST)
	mail_from (braden@ISI.EDU)
From: braden@ISI.EDU
Received: from can.isi.edu (can.isi.edu [128.9.160.148])
	by zephyr.isi.edu (8.8.7/8.8.6) with SMTP id PAA08113;
	Tue, 24 Mar 1998 15:23:11 -0800 (PST)
Date: Tue, 24 Mar 98 15:22:40 PST
Posted-Date: Tue, 24 Mar 98 15:22:40 PST
Message-Id: <9803242322.AA00986@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA00986>; Tue, 24 Mar 98 15:22:40 PST
To: oleg@prodigy.net
Subject: Lest we forget...
Cc: tcp-impl@cthulhu.engr.sgi.com
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

  *> 
  *> I believe Microsoft's NT 4.0's TCP/IP stack  implementation
  *> does that. No, you are not missing anything. That's Microsoft.
  *> They adopted "embrace and extend" strategy towards
  *> Internet and interoperability.
  *> 
  *> Oleg Vishnepolsky
  *>

T'was ever thus!  During the time frame 1985-1988, roughly, there was a
certain amount of warfare between the BSD authors in Berkeley and the
Internet community over a series of analogous issues.  In the end, the
Internet religion mostly won out; the resulting compromises are
recorded in the Host Requirements RFC.

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 16:45:18 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id QAA3279320
	for tcp-impl-list;
	Tue, 24 Mar 1998 16:43:09 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id QAA3338710
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 16:43:07 -0800 (PST)
Received: from poptart.corp.home.net (poptart.svr.home.net [24.0.26.24]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id QAA12772
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 16:43:06 -0800 (PST)
	mail_from (rja@corp.home.net)
Received: from borg.eos.home.net ([24.0.16.111]) by poptart.corp.home.net
          (Netscape Mail Server v2.02) with ESMTP id AAA19884;
          Tue, 24 Mar 1998 16:42:49 -0800
Received: (from rja@localhost)
	by borg.eos.home.net (8.8.5/8.8.5) id QAA21967;
	Tue, 24 Mar 1998 16:42:49 -0800 (PST)
From: rja@corp.home.net (Ran Atkinson)
Message-Id: <980324164248.ZM21965@borg.eos.home.net>
Date: Tue, 24 Mar 1998 16:42:48 -0800
X-Mailer: Z-Mail (4.0.1 13Jan97)
To: tcp-impl@cthulhu.engr.sgi.com
Subject: TCP over MCNS
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

TCP Folks,

	As a science question (unrelated to the matter of
how many initial segments to allow), it would be interesting
to understand how TCP behaves over an MCNS-standards-compliant
cable modem system with its MAC layer and asymmetric bw.  Such
a system would be different than the Bay/LANcity product in the
lower layers.

	I recognise that this ought not be particularly different than
other link layers, but it would be interesting to see simulation of this
configuration none the less.

Ran
rja@home.net

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 24 16:53:00 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id QAA3340578
	for tcp-impl-list;
	Tue, 24 Mar 1998 16:51:39 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id QAA3339552
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 24 Mar 1998 16:51:37 -0800 (PST)
Received: from strato-fe0.ultra.net (strato-fe0.ultra.net [146.115.8.190]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id QAA15509
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 24 Mar 1998 16:51:35 -0800 (PST)
	mail_from (backman@ultranet.com)
Received: from boss (d2.dial-5.cmb.ma.ultra.net [209.6.68.2]) by strato-fe0.ultra.net (8.8.8/ult.n14767) with SMTP id TAA25246; Tue, 24 Mar 1998 19:51:22 -0500 (EST)
Reply-To: "Larry Backman" <backman@ultranet.com>
From: "Larry Backman" <backman@ultranet.com>
To: "W. Richard Stevens" <rstevens@kohala.com>, "Doug Drew" <floop@cisco.com>,
        <tcp-impl@cthulhu.engr.sgi.com>
Cc: "Alan M. Carroll" <amc@cisco.com>, "Bradley Frank" <baf@cisco.com>
Subject: Re: using RST to indicate busy
Date: Tue, 24 Mar 1998 19:47:09 -0500
Message-ID: <01bd5787$8ac045e0$a67c7f80@boss>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 4.71.1712.3
X-MimeOLE: Produced By Microsoft MimeOLE V4.71.1712.3
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

|>
|> I discussed this with some Microsoft folks when Windows NT first came
|> out. They said that was the way they intended it to be and that was the
|> way it would be. Period. (Perhaps they feel big enough to set their own
|> standards, it wouldn't be the first time.)
|

I had the same discussion with probably the same suspects as part of
the Winsock wars about 2 years back.
As a 3rd party stack in Microsoft's world we were told, just like
you, that this was the way it was and to deal with it; i.e., emulate it
in our stack or get crushed.


We got crushed....


From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 01:17:45 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id BAA3485382
	for tcp-impl-list;
	Wed, 25 Mar 1998 01:15:46 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id BAA3466322
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 01:15:44 -0800 (PST)
Received: from mail2.mailsorter.net (mail2.mailsorter.net [207.67.128.17] (may be forged)) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id BAA15648
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 01:15:43 -0800 (PST)
	mail_from (marke@muttsnuts.com)
Received: from bursar ([194.69.110.209]) by mail2.mailsorter.net
          (Netscape Mail Server v2.02) with SMTP id AAA4999
          for <tcp-impl@cthulhu.engr.sgi.com>;
          Wed, 25 Mar 1998 01:15:40 -0800
Message-Id: <3.0.5.32.19980325091025.0085e100@mail.muttsnuts.com>
X-Sender: markemuttsnuts@mail.muttsnuts.com (Unverified)
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.5 (32)
Date: Wed, 25 Mar 1998 09:10:25 +0000
To: tcp-impl@cthulhu.engr.sgi.com
From: "Mark S. Edwards" <marke@muttsnuts.com>
Subject: Re: using RST to indicate busy
In-Reply-To: <F96684A2CC2FD111BD8A00A0C92A1A145D26@gisg-bdc.cisco.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Yup, this caught me out when I wrote a security app to isolate parts of the
network.  I wanted to stop unauthorised hosts accessing particular machines.

The most obvious thing to do was fake a RST back to the host sending the
SYN.  This worked fine until the stack you mention was being used on the
remote host.  I ended up having to keep track of conversation states and
waiting for the ACK SYN before faking a RST to both hosts.

I wasn't happy.

Mark.


At 14:52 24/03/98 -0600, Doug Drew wrote:
>
>I have seen a network trace of the following scenario:
>
>An HTTP client sends a SYN to port 80 on a server.  The server
>returns a RST.  The client waits half a second, then sends another
>SYN with the same sequence number.  The server returns a
>SYN-ACK, and the conversation proceeds as normal.
>
>Did I miss a meeting?  Is the server using the RST to
>indicate "I'm busy, try again soon" to the client?  Is this behavior
>a) documented and 2) permitted?
>
>I found that the most recent stack that I have from a large
>software company near Seattle does indeed retransmit SYNs
>at half second intervals when it receives a RST.  It does this
>three times before giving up.
>
>It is much more complicated to experimentally reproduce
>the server behavior, which is what I'm really asking about.
>
>Doug Drew - floop@cisco.com
>Software Engineer - Centri Engineering Group - Cisco Systems, Inc.
>Champaign, IL voice: (217) 363-4514 fax: (217) 363-4599
>
>
>
>

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 07:22:05 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id HAA3580641
	for tcp-impl-list;
	Wed, 25 Mar 1998 07:19:46 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id HAA3564547
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 07:19:45 -0800 (PST)
Received: from firewall.agranat.com (agranat.com [198.113.147.2]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id HAA07849
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 07:19:44 -0800 (PST)
	mail_from (lawrence@agranat.com)
Received: from agranat.com (alice [192.104.71.130]) by firewall.agranat.com (8.6.12/8.6.9) with ESMTP id KAA09669; Wed, 25 Mar 1998 10:19:42 -0500
Received: from localhost (lawrence@localhost)
	by agranat.com (8.8.5/8.8.5) with SMTP id KAA15433;
	Wed, 25 Mar 1998 10:19:41 -0500
Date: Wed, 25 Mar 1998 10:19:41 -0500 (EST)
From: Scott Lawrence <lawrence@agranat.com>
To: sparker@Eng.Sun.COM
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: using RST to indicate busy 
In-Reply-To: <199803242219.OAA11376@fstop.>
Message-ID: <Pine.LNX.3.96.980325101759.15420A-100000@alice.agranat.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> -   If a maximum number of current and pending inbound connections has
> -   been exceeded, then sending an RST from the server is, it seems to me,
> -   the right thing to do to refuse a new offered connection.  This is what
> -   our embedded web server does (typically it is configured with a maximum
> -   of between 8 and 16 connections).  

On Tue, 24 Mar 1998 sparker@Eng.Sun.COM wrote:

> The reality is that RST gets reflected back to the user as ECONNREFUSED,
> and in general is normally interpretable as "no service is available".
> You seem to be describing what I consider a transient condition, and ignoring
> the packet is the best way I can see to cause the right thing:  slow but
> not prevent service.  Lots of applications, I think, help encourage the
> user to treat RSTs back as 'no server is out there doing work'.
 
> Making the sender timeout and retransmit the SYN seems to me *much* better
> at accomplishing this.

  Thanks to all of you who pointed this out - as is often the case,
  simpler is better.  I've made the change in our stack to make this
  distinction and drop on resource constraints rather than RST.
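
  The distinction can be sketched roughly as follows. This is an
  illustration only, not the code from our stack; the function name, the
  dictionary of pending counts, and the returned strings are all
  hypothetical.

```python
def handle_syn(port, listeners, pending, backlog_limit):
    """Decide how to answer an incoming SYN.

    listeners: set of ports with a listening socket
    pending:   dict mapping port -> connections currently queued or open
    """
    if port not in listeners:
        return "send-rst"        # nothing is listening: a real refusal
    if pending.get(port, 0) >= backlog_limit:
        return "drop"            # transient overload: let the client retry
    pending[port] = pending.get(port, 0) + 1
    return "send-syn-ack"


if __name__ == "__main__":
    pending = {80: 8}
    print(handle_syn(80, {80}, pending, backlog_limit=8))
    print(handle_syn(23, {80}, pending, backlog_limit=8))
```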


From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 15:39:58 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id PAA3127697
	for tcp-impl-list;
	Wed, 25 Mar 1998 15:37:57 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id PAA3723475
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 15:37:55 -0800 (PST)
Received: from dm.cobaltmicro.com (dm.cobaltmicro.com [209.133.34.35]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id PAA04005
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 15:37:55 -0800 (PST)
	mail_from (davem@dm.cobaltmicro.com)
Received: (from davem@localhost)
	by dm.cobaltmicro.com (8.8.7/8.8.7) id PAA01546;
	Wed, 25 Mar 1998 15:34:54 -0800
Date: Wed, 25 Mar 1998 15:34:54 -0800
Message-Id: <199803252334.PAA01546@dm.cobaltmicro.com>
From: "David S. Miller" <davem@dm.cobaltmicro.com>
To: tcp-impl@cthulhu.engr.sgi.com
Subject: possible bug in PAWS
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


In the cases where an implementation uses fine grained timestamps
for its RFC1323 timestamp implementation, PAWS does a lot of stupid
things if segments are re-ordered at all.  Here is one trace example
of the behavior in question:

14:44:04.222503 dm.ftp-data > fis.1451: . 377929:379377(1448) ack 1 win 31856 <nop,nop,timestamp 201736 16805698> (DF) [tos 0x8]
14:44:04.236085 fis.edu.1451 > dm.ftp-data: . ack 350417 win 30408 <nop,nop,timestamp 16805700 201729,nop,nop, sack 3 {373585:375033}{369241:372137}{351865:367793} > (DF) [tos 0x8]
14:44:04.236129 dm.ftp-data > fis.1451: . 379377:380825(1448) ack 1 win 31856 <nop,nop,timestamp 201737 16805700> (DF) [tos 0x8]

All is OK so far, then the trouble begins:

14:44:04.254084 fis.1451 > dm.ftp-data: . ack 350417 win 30408 <nop,nop,timestamp 16805702 201730,nop,nop, sack 2 {369241:376481}{351865:367793} > (DF) [tos 0x8]
14:44:04.257324 fis.1451 > dm.ftp-data: . ack 350417 win 30408 <nop,nop,timestamp 16805701 201729,nop,nop, sack 2 {369241:375033}{351865:367793} > (DF) [tos 0x8]

These two segments were reordered (by whatever means) by the network.
It is obvious for two reasons:

1) the timestamp sent by 'fis' in this trace decreases (the
   echoed ones do too...)
2) the SACK options in the first ACK report further progress
   in the reassembly queue than the second one does

14:44:04.257357 dm.ftp-data > fis.1451: . ack 1 win 31856 <nop,nop,timestamp 201739 16805702> (DF) [tos 0x8]

The second ack gets dropped, due to the PAWS test, and an ACK is
returned as RFC1323 says should happen.
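
A minimal sketch of the check, replaying the timestamp values of the two
reordered ACKs from the trace. It is a simplified model: the class and method
names are hypothetical, every accepted pure ACK is assumed to update
TS.Recent, and the idle-connection exception is left out.

```python
def ts_before(a, b):
    """True if 32-bit timestamp a is older than b (modular comparison)."""
    return ((a - b) & 0xFFFFFFFF) > 0x7FFFFFFF


class PawsState:
    def __init__(self, ts_recent):
        self.ts_recent = ts_recent

    def receive_pure_ack(self, tsval):
        # PAWS: a segment whose TSval is older than TS.Recent is rejected
        # and answered with an ACK.
        if ts_before(tsval, self.ts_recent):
            return "drop-and-ack"
        self.ts_recent = tsval
        return "accept"


if __name__ == "__main__":
    state = PawsState(ts_recent=16805700)   # TSval of the last in-order ACK
    # The two ACKs arrive in the reordered sequence seen in the trace.
    print([state.receive_pure_ack(ts) for ts in (16805702, 16805701)])
```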

I only noticed this weird behavior with fine grained timestamps and
ACK's from a receiver.

What happens when this happens to data segments?  This connection path
being traced does reorder segments often, so I bet I can catch such a
scenario in action quite easily with some trying. (i.e., the sender's
data bearing packets get reordered, and the PAWS test drops them at the
receiver).

The end result of this is that, over this link, the bandwidth realized
for bulk data transfer is halved.

Comments?  Is this a true flaw in PAWS as specified or did I miss
something?

Later,
David S. Miller
davem@dm.cobaltmicro.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 16:14:01 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id QAA3795282
	for tcp-impl-list;
	Wed, 25 Mar 1998 16:11:15 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id QAA3789518
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 16:11:14 -0800 (PST)
Received: from zephyr.isi.edu (zephyr.isi.edu [128.9.160.160]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id QAA17163
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 16:11:13 -0800 (PST)
	mail_from (braden@ISI.EDU)
From: braden@ISI.EDU
Received: from can.isi.edu (can.isi.edu [128.9.160.148])
	by zephyr.isi.edu (8.8.7/8.8.6) with SMTP id QAA11863;
	Wed, 25 Mar 1998 16:11:11 -0800 (PST)
Date: Wed, 25 Mar 98 16:10:39 PST
Posted-Date: Wed, 25 Mar 98 16:10:39 PST
Message-Id: <9803260010.AA01921@can.isi.edu>
Received: by can.isi.edu (4.1/4.0.3-6)
	id <AA01921>; Wed, 25 Mar 98 16:10:39 PST
To: tcp-impl@cthulhu.engr.sgi.com, davem@dm.cobaltmicro.com
Subject: Re: possible bug in PAWS
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Oh, dear.  That problem.  It makes my head hurt to think
about it.

Bob Braden

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 16:28:42 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id QAA3804064
	for tcp-impl-list;
	Wed, 25 Mar 1998 16:26:19 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id QAA3801226
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 16:26:17 -0800 (PST)
Received: from red.juniper.net (red.juniper.net [208.197.169.254]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id QAA22674
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 16:26:16 -0800 (PST)
	mail_from (skibo@juniper.net)
Received: from shark.juniper.net (shark.juniper.net [208.197.169.201])
	by red.juniper.net (8.8.5/8.8.5) with ESMTP id QAA11750;
	Wed, 25 Mar 1998 16:26:16 -0800 (PST)
Received: from shark.juniper.net (localhost.juniper.net [127.0.0.1]) by shark.juniper.net (8.8.7/8.7.3) with ESMTP id QAA25088; Wed, 25 Mar 1998 16:26:15 -0800 (PST)
Message-Id: <199803260026.QAA25088@shark.juniper.net>
X-Mailer: exmh version 2.0.2 2/24/98
To: "David S. Miller" <davem@dm.cobaltmicro.com>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: possible bug in PAWS 
In-reply-to: Your message of "Wed, 25 Mar 1998 15:34:54 PST."
             <199803252334.PAA01546@dm.cobaltmicro.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 25 Mar 1998 16:26:15 -0800
From: Thomas Skibo <skibo@juniper.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk



> 
> What happens when this happens to data segments?  This connection path
> being traced does reorder segments often, so I bet I can catch such a
> scenerio in action quite easy with some trying. (ie. senders data
> bearing packets get reordered, and the PAWS test drops them at the
> receiver).

Forgive me if I'm a bit rusty.  I think the difference is that in
data segments the TS.RECENT would not be updated by the first
out of order segment because the segment wouldn't advance the
"left" side of the window.  Therefore, the time-stamp of the second
out of order segment (the one supposedly transmitted FIRST) would
not be less than TS.RECENT and so it wouldn't be dropped by PAWS.

-Skibo


-- 
Thomas Skibo		Juniper Networks, Inc.		skibo@jnx.com



From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 16:41:35 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id QAA3824282
	for tcp-impl-list;
	Wed, 25 Mar 1998 16:38:05 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id QAA3769702
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 16:38:01 -0800 (PST)
Received: from dm.cobaltmicro.com (dm.cobaltmicro.com [209.133.34.35]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id QAA26873
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 16:37:59 -0800 (PST)
	mail_from (davem@dm.cobaltmicro.com)
Received: (from davem@localhost)
	by dm.cobaltmicro.com (8.8.7/8.8.7) id QAA02135;
	Wed, 25 Mar 1998 16:34:53 -0800
Date: Wed, 25 Mar 1998 16:34:53 -0800
Message-Id: <199803260034.QAA02135@dm.cobaltmicro.com>
From: "David S. Miller" <davem@dm.cobaltmicro.com>
To: skibo@juniper.net
CC: tcp-impl@cthulhu.engr.sgi.com
In-reply-to: <199803260026.QAA25088@shark.juniper.net> (message from Thomas
	Skibo on Wed, 25 Mar 1998 16:26:15 -0800)
Subject: Re: possible bug in PAWS
References:  <199803260026.QAA25088@shark.juniper.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Wed, 25 Mar 1998 16:26:15 -0800
   From: Thomas Skibo <skibo@juniper.net>

   Forgive me if I'm a bit rusty.  I think the difference is that in
   data segments the TS.RECENT would not be updated by the first out
   of order segment because the segment wouldn't advance the "left"
   side of the window.

You're correct, thanks for stating this.

Then as far as I can tell, the bogus PAWS drops can only happen for
pure-ACK packets.  This has one major problem (and this happens to be
why I noticed the situation in the first place): these dropped ACKs
at the sender mess with fast retransmission, i.e. feedback information
from duplicate ACKs is lost.

I guess it would be nice for an updated standards document which deals
with timestamps to mention something along the lines of
"A host SHOULD NOT allow the timestamp to increment at a rate faster
than XXX or else it will be susceptible to the following problem..."
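
To illustrate why the clock rate matters, here is a small sketch with made-up
send times. The function names and the tick values are assumptions, and
timestamp wraparound is ignored. Two pure ACKs are sent a couple of
milliseconds apart and arrive in reverse order; the earlier one is dropped by
PAWS only if its timestamp is strictly older than the later one's.

```python
def tsval(send_time_ms, tick_ms):
    """Timestamp value for a segment sent at send_time_ms with a given tick."""
    return send_time_ms // tick_ms


def paws_drops_reordered_pair(t_first_ms, t_second_ms, tick_ms):
    """Two ACKs sent at t_first < t_second arrive in the opposite order.

    The later ACK arrives first and sets TS.Recent; the earlier ACK is then
    dropped only if its TSval is strictly smaller.
    """
    ts_first = tsval(t_first_ms, tick_ms)
    ts_second = tsval(t_second_ms, tick_ms)
    return ts_first < ts_second


if __name__ == "__main__":
    print(paws_drops_reordered_pair(1000, 1002, tick_ms=1))    # fine-grained clock
    print(paws_drops_reordered_pair(1000, 1002, tick_ms=100))  # coarse clock
```

A coarse tick makes the drop less likely, not impossible: two ACKs that
straddle a tick boundary still get different timestamps.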

Later,
David S. Miller
davem@dm.cobaltmicro.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 16:54:11 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id QAA3798412
	for tcp-impl-list;
	Wed, 25 Mar 1998 16:51:23 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id QAA3821742
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 16:51:21 -0800 (PST)
Received: from viner (viner.ento.vt.edu [128.173.215.41]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id QAA01287
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 16:51:20 -0800 (PST)
	mail_from (huddle@mci.net)
Received: from shuddle.reston.mci.net by viner (SMI-8.6/SMI-SVR4)
	id TAA02451; Wed, 25 Mar 1998 19:37:37 -0500
Message-Id: <3.0.5.32.19980325192622.016e55d0@postoffice.res.mci.net>
X-Sender: huddle@postoffice.res.mci.net
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.5 (32)
Date: Wed, 25 Mar 1998 19:26:22 +0000
To: mpls@external.cisco.com
From: Scott Huddle <huddle@mci.net>
Subject: Re: IP Over Sonet Considered Harmful?
Cc: tcp-impl@cthulhu.engr.sgi.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

This is not an MPLS issue, it is a broken TCP implementation
issue.  Note cc'es.

-scott

>Date: Wed, 25 Mar 1998 08:23:04 -0800
>From: alan@isi.net
>Subject: Re: IP Over Sonet Considered Harmful?
>Sender: owner-mpls@netlab.indiana.edu
>To: Sean Doran <smd@ebone.net>
>Cc: mpls@external.cisco.com
>X-Mailer: Mutt 0.88
>X-Authentication-warning: stone51.netlab.indiana.edu: majordom set sender to
> owner-mpls@netlab.indiana.edu using -f
>
>
>> |   The only significant reason is the mswindows 95 ttl of 32.
>> |
>> |   If we design a standard that ignores this cold hard fact, we
>> |   must deal with the consequences.  Not sure I can predict what
>> |   those consequences are, but I'd prefer to not find out.
>> 
>> Why are you on this strange Jihad?
>> 
>> If you are so nervous about the diameter of the network,
>> and you are totally convinced that the Windows 95 systems
>> are unfixable, you might choose to push for a campaign to have the
>> very first-hop router connecting a Windows 95 end-system rewrite
>> the ttl from 32 to 64 or somesuch.
>
>  I am on this 'strange' discussion because I am a pragmatist.  I don't
>  believe this is a Jihad but rather a practical view on events.
>
>  Your proposal to make the first hop router up the TTL is
>  cumbersome and impractical to implement.  As well, it gives rise
>  to configuration error which could induce loops.
>
>> |   Look in my email for the reasoning why each NSP should decrement
>> |   no more than 8 TTLs, and why this is difficult with proliferant LSR
>> |   hops.
>> 
>> Your reasoning makes no sense.
>
>  Could you please support this assertion with your observations?
>
>> Again, if a too-low initial TTL is the problem, the
>> solution is to fix the initial TTL as close to the
>> originating end-system as possible (i.e., *in* the 
>> originating end-system, or at the first possible
>> intermediate system -- this seems like an ideal task for a NAT).
>
>  It is not practical for me to induce L3 ttl degradation, and when
>  my customers complain, say A. Fix your win95 or B. put in a NAT.
>  They will go to another provider.  This is the pragmatic side.
>  Running code and all that.
>
>> Reducing the TTL by at least one in every device which
>> can potentially participate within a forwarding loop
>> is good self-defensive engineering.
>
>  Yep.  Agreed, and if we had enough TTL to play with I'd support
>  you 100%.  But the (sig.) LCD shows that we don't.
>
>  Why are you on this strange Jihad to eat TTL in every piece of
>  electronics on the earth?
>
>  -a
>
>
>

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 16:54:11 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id QAA3814426
	for tcp-impl-list;
	Wed, 25 Mar 1998 16:51:51 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id QAA1511628
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 16:51:49 -0800 (PST)
Received: from red.juniper.net (red.juniper.net [208.197.169.254]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id QAA01415
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 16:51:48 -0800 (PST)
	mail_from (skibo@juniper.net)
Received: from shark.juniper.net (shark.juniper.net [208.197.169.201])
	by red.juniper.net (8.8.5/8.8.5) with ESMTP id QAA12323;
	Wed, 25 Mar 1998 16:51:48 -0800 (PST)
Received: from shark.juniper.net (localhost.juniper.net [127.0.0.1]) by shark.juniper.net (8.8.7/8.7.3) with ESMTP id QAA25168; Wed, 25 Mar 1998 16:51:47 -0800 (PST)
Message-Id: <199803260051.QAA25168@shark.juniper.net>
X-Mailer: exmh version 2.0.2 2/24/98
To: "David S. Miller" <davem@dm.cobaltmicro.com>
cc: skibo@juniper.net, tcp-impl@cthulhu.engr.sgi.com
Subject: Re: possible bug in PAWS 
In-reply-to: Your message of "Wed, 25 Mar 1998 16:34:53 PST."
             <199803260034.QAA02135@dm.cobaltmicro.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 25 Mar 1998 16:51:47 -0800
From: Thomas Skibo <skibo@juniper.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> Then as far as I can tell, the bogus PAWS drops can only happen for
> pure-ACK packets.  This has one major problem (and this happens to be
> why I noticed the situation in the first place), these dropped ACKs
> at the sender mess with fast retransmission, ie. feedback information
> from duplicate ACKs is lost.

I wouldn't consider it a big deal that fast retransmit gets hosed if
the ACKs coming back are being reordered.  If you've got a lot of
reordering happening, you're not going to perform well anyway.  The
retransmits will fall back on good-ol'-fashioned time-outs (which
presumably you're improving because you appear to have your
TCP timers running at a finer granularity!).

-Skibo


-- 
Thomas Skibo		Juniper Networks, Inc.		skibo@jnx.com



From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 19:22:25 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id TAA3896313
	for tcp-impl-list;
	Wed, 25 Mar 1998 19:18:23 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id TAA3892249
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 19:18:19 -0800 (PST)
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id TAA12683
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 19:18:18 -0800 (PST)
	mail_from (aron@cs.rice.edu)
Received: (from aron@localhost)
          by cs.rice.edu (8.8.5/8.8.4)
	  id VAA10826; Wed, 25 Mar 1998 21:18:03 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Message-Id: <199803260318.VAA10826@cs.rice.edu>
Subject: Re: possible bug in PAWS
To: davem@dm.cobaltmicro.com (David S. Miller)
Date: Wed, 25 Mar 1998 21:18:03 -0600 (CST)
Cc: skibo@juniper.net, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199803260034.QAA02135@dm.cobaltmicro.com> from "David S. Miller" at Mar 25, 98 04:34:53 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> I guess it would be nice to mention in an updated standards document
> which deals with timestamps, to mention something along the lines of
> "A host SHOULD not allow the timestamp to increment at a rate faster
> than XXX or else it will be susceptible to the following problem..."
> 


I don't know the granularity of timestamps that you're using, but RFC 1323
does suggest a safe finest granularity of 1ms for the timestamp clock.



- Mohit

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 22:07:03 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id WAA3832972
	for tcp-impl-list;
	Wed, 25 Mar 1998 22:06:42 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id WAA3940547
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 22:06:40 -0800 (PST)
Received: from dm.cobaltmicro.com (dm.cobaltmicro.com [209.133.34.35]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id WAA18667
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 22:06:40 -0800 (PST)
	mail_from (davem@dm.cobaltmicro.com)
Received: (from davem@localhost)
	by dm.cobaltmicro.com (8.8.7/8.8.7) id WAA01453;
	Wed, 25 Mar 1998 22:03:38 -0800
Date: Wed, 25 Mar 1998 22:03:38 -0800
Message-Id: <199803260603.WAA01453@dm.cobaltmicro.com>
From: "David S. Miller" <davem@dm.cobaltmicro.com>
To: aron@cs.rice.edu
CC: skibo@juniper.net, tcp-impl@cthulhu.engr.sgi.com
In-reply-to: <199803260318.VAA10826@cs.rice.edu> (message from Mohit Aron on
	Wed, 25 Mar 1998 21:18:03 -0600 (CST))
Subject: Re: possible bug in PAWS
References:  <199803260318.VAA10826@cs.rice.edu>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   From: Mohit Aron <aron@cs.rice.edu>
   Date: Wed, 25 Mar 1998 21:18:03 -0600 (CST)

   I don't know the granularity of timestamps that you're using, but
   RFC 1323 does suggest a safe finest granularity of 1ms for the
   timestamp clock.

Yes, I just read this too.  On every architecture other than the
Alpha, the granularity is 10ms, on the Alpha it is 1ms (it's the
system wide ticker which I am using in the kernel, and the Alpha is
the one platform which uses a value of HZ other than 100).

So I believe I'm within the suggested granularity ;-)

Later,
David S. Miller
davem@dm.cobaltmicro.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 22:07:03 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id WAA3912873
	for tcp-impl-list;
	Wed, 25 Mar 1998 22:04:43 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id WAA3884205
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 22:04:41 -0800 (PST)
Received: from dm.cobaltmicro.com (dm.cobaltmicro.com [209.133.34.35]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id WAA17787
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 22:04:41 -0800 (PST)
	mail_from (davem@dm.cobaltmicro.com)
Received: (from davem@localhost)
	by dm.cobaltmicro.com (8.8.7/8.8.7) id WAA01443;
	Wed, 25 Mar 1998 22:01:39 -0800
Date: Wed, 25 Mar 1998 22:01:39 -0800
Message-Id: <199803260601.WAA01443@dm.cobaltmicro.com>
From: "David S. Miller" <davem@dm.cobaltmicro.com>
To: skibo@juniper.net
CC: tcp-impl@cthulhu.engr.sgi.com
In-reply-to: <199803260051.QAA25168@shark.juniper.net> (message from Thomas
	Skibo on Wed, 25 Mar 1998 16:51:47 -0800)
Subject: Re: possible bug in PAWS
References:  <199803260051.QAA25168@shark.juniper.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Wed, 25 Mar 1998 16:51:47 -0800
   From: Thomas Skibo <skibo@juniper.net>

   I wouldn't consider it a big deal that fast retransmit gets hosed
   if the ACKs coming back are being reordered.  If you've got a lot
   of reordering happening, you're not going to perform well anyway.
   The retransmits will fall back on good-ol'-fashioned time-outs
   (which presumably you're improving because you appear to have your
   TCP timers running at a finer granularity!).

If the dropped ACKs cause the "ACK stream" to dry up too quickly, and
thus cause fast recovery to not complete before a timeout, then it is
a big deal.  I lose an RTT, the pipe drains, and I enter slow start.
What if this is a link over a satellite where the B*D is huge and I
lose the whole window due to the timeout?

I actually perform pretty well on this particular connection, even
with all the reordering, but more so if I add a workaround for the
PAWS out-of-order ack drops.

Timeouts are a last resort in my opinion, to ensure reliable data
delivery in extreme cases of loss and congestion.  That isn't
happening here; every couple of packets in the stream get reordered, and
TCP should perform decently in such a case.

Later,
David S. Miller
davem@dm.cobaltmicro.com

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 22:32:22 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id WAA3943378
	for tcp-impl-list;
	Wed, 25 Mar 1998 22:30:21 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id WAA3931288
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 22:30:19 -0800 (PST)
Received: from daffy.ee.lbl.gov (daffy.ee.lbl.gov [131.243.1.31]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id WAA23507
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 22:30:19 -0800 (PST)
	mail_from (vern@ee.lbl.gov)
Received: by daffy.ee.lbl.gov (8.8.8/8.8.5)
	id WAA00626; Wed, 25 Mar 1998 22:30:13 -0800 (PST)
Message-Id: <199803260630.WAA00626@daffy.ee.lbl.gov>
To: Thomas Skibo <skibo@juniper.net>
Cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: possible bug in PAWS 
In-reply-to: Your message of Wed, 25 Mar 1998 16:51:47 PST.
Date: Wed, 25 Mar 1998 22:30:13 PST
From: Vern Paxson <vern@ee.lbl.gov>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

> I wouldn't consider it a big deal that fast retransmit gets hosed if
> the ACKs coming back are being reordered.  If you've got a lot of
> reordering happening, you're not going to perform well anyway.

Reordered ACKs mean the congestion window opens more slowly (some of
the ACKs are ignored because the sequence number they're ack'ing is
too low).  It also means burstier transmissions, since the window
slides more when it does in fact slide.  Fast retransmission and
recovery still work fine, since a reordered duplicate ACK still looks
like a duplicate ACK (unless there's *major* reordering).
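
The difference between plain ACK reordering and reordering combined with
timestamp-based drops can be illustrated with a minimal sketch.  It is not
code from any implementation discussed in this thread: the function name,
the simplified timestamp test, and the assumption that every pure ACK is
allowed to update TS.Recent (a one-way transfer where all ACKs carry the
same sequence number) are hypothetical, and timestamp wraparound is ignored.

```python
# Hypothetical sketch: names and the simplified PAWS test are assumptions.

def count_dupacks(acks, paws=False):
    """Count consecutive duplicate ACKs seen by a sender.

    acks: list of (ack_number, tsval) tuples in arrival order.
    paws: if True, silently drop any ACK whose TSval is older than
          TS.Recent before it can be counted (simplified PAWS test).
    """
    ts_recent = None
    last_ack = None
    dupacks = 0
    for ack, tsval in acks:
        if paws:
            if ts_recent is not None and tsval < ts_recent:
                continue  # rejected by the timestamp check, never counted
            ts_recent = tsval
        if ack == last_ack:
            dupacks += 1
        else:
            last_ack = ack
            dupacks = 0
    return dupacks


# Four ACKs for the same sequence number; the middle two arrive swapped.
reordered = [(1000, 10), (1000, 12), (1000, 11), (1000, 13)]
print(count_dupacks(reordered, paws=False))
print(count_dupacks(reordered, paws=True))
```

Without the timestamp test the swapped ACK still counts as a duplicate;
with it, one duplicate is lost, which can keep the count below the usual
threshold of three.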

So I'd argue that fast retransmit getting hosed is significantly
worse than the performance loss you suffer from ACK reordering alone.

		Vern

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 23:03:40 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id XAA3845829
	for tcp-impl-list;
	Wed, 25 Mar 1998 23:02:02 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id XAA3919268
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 23:02:01 -0800 (PST)
Received: from cs.rice.edu (cs.rice.edu [128.42.1.30]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id XAA29922
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 23:02:00 -0800 (PST)
	mail_from (aron@cs.rice.edu)
Received: (from aron@localhost)
          by cs.rice.edu (8.8.5/8.8.4)
	  id BAA13503; Thu, 26 Mar 1998 01:01:50 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
Message-Id: <199803260701.BAA13503@cs.rice.edu>
Subject: Re: possible bug in PAWS
To: davem@dm.cobaltmicro.com (David S. Miller)
Date: Thu, 26 Mar 1998 01:01:50 -0600 (CST)
Cc: skibo@juniper.net, tcp-impl@cthulhu.engr.sgi.com
In-Reply-To: <199803260603.WAA01453@dm.cobaltmicro.com> from "David S. Miller" at Mar 25, 98 10:03:38 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> 
> Yes, I just read this too.  On every architecture other than the
> Alpha, the granularity is 10ms, on the Alpha it is 1ms (it's the
> system wide ticker which I am using in the kernel, and the Alpha is
> the one platform which uses a value of HZ other than 100).
> 

Irrespective of the system clock, I think BSD-based TCP implementations use
a timestamp granularity of 500ms. So I wouldn't say that on every architecture
the granularity used is 10ms.



- Mohit

From owner-tcp-impl@relay.engr.sgi.com  Wed Mar 25 23:35:35 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id XAA3903885
	for tcp-impl-list;
	Wed, 25 Mar 1998 23:33:55 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id XAA3471341
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Wed, 25 Mar 1998 23:33:54 -0800 (PST)
Received: from smtp.on.rogers.wave.ca (smtp.on.rogers.wave.ca [24.112.32.20]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id XAA05659
	for <tcp-impl@cthulhu.engr.sgi.com>; Wed, 25 Mar 1998 23:33:53 -0800 (PST)
	mail_from (eschenk@pc-37249.bc.rogers.wave.ca)
Received: from pc-37249.bc.rogers.wave.ca ([24.113.51.193]) by smtp.on.rogers.wave.ca with ESMTP id <512819-22387>; Thu, 26 Mar 1998 02:32:44 -0500
Received: from pc-37249.bc.rogers.wave.ca (eschenk@localhost [127.0.0.1])
	by pc-37249.bc.rogers.wave.ca (8.8.7/8.8.7) with ESMTP id XAA20865;
	Wed, 25 Mar 1998 23:31:11 -0800
Message-Id: <199803260731.XAA20865@pc-37249.bc.rogers.wave.ca>
To: Mohit Aron <aron@cs.rice.edu>
cc: Eric Schenk <eschenk@rogers.wave.ca>,
        davem@dm.cobaltmicro.com (David S. Miller), skibo@juniper.net,
        tcp-impl@cthulhu.engr.sgi.com
Subject: Re: possible bug in PAWS 
In-reply-to: Your message of "Thu, 26 Mar 1998 01:01:50 CST."
             <199803260701.BAA13503@cs.rice.edu> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: 	Wed, 25 Mar 1998 23:31:11 -0800
From: "Eric Schenk" <eschenk@pc-37249.bc.rogers.wave.ca>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


Mohit Aron <aron@cs.rice.edu> writes:
>Irrespective of the system clock, I think BSD based TCP implementations use
>a timestamp granularity of 500ms. So I won't say that on every architecture
>the granularity used is 10ms.

Ah. Small misunderstanding. I believe I can safely speak for David and
say that what he meant is that of all the architectures Linux runs on,
it uses a 10ms granularity on everything but the Alpha, where it uses
a 1ms granularity (actually 1/1024 of a second last time I looked).
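
The granularities mentioned in this thread can be compared with a small
worked calculation.  This is only a sketch: the helper names are
hypothetical, the HZ values (100, 1024) and the 500ms tick are the ones
quoted above, and the 2**31-tick interval assumes that timestamps are
compared as 32-bit values whose sign bit wraps after that many ticks.

```python
# Hypothetical helpers for comparing timestamp clock granularities.

SECONDS_PER_DAY = 86400.0


def tick_ms(hz):
    """Clock period in milliseconds for a clock ticking hz times a second."""
    return 1000.0 / hz


def sign_wrap_days(hz):
    """Days until a 32-bit timestamp has advanced by 2**31 ticks."""
    return (2 ** 31) / hz / SECONDS_PER_DAY


for name, hz in [("HZ=100", 100), ("HZ=1024", 1024), ("500ms tick", 2)]:
    print(name, tick_ms(hz), "ms/tick,",
          round(sign_wrap_days(hz), 1), "days to sign wrap")
```

Under these assumptions a 1/1024 second tick is slightly finer than 1ms,
so its sign-wrap interval is slightly shorter than that of a 1ms clock.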

As you point out, BSD-based code of course uses a 500ms clock for
retransmission timers, although I'm not sure that every BSD-based
implementation of PAWS uses that same 500ms tick for its
timestamps. Would anyone who has looked deeply at many of the
different BSD-based implementations care to comment?

Cheers,

-- 
Eric Schenk                             www: http://www.loonie.net/~eschenk
                          email: eschenk@loonie.net, eschenk@rogers.wave.ca


From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 26 00:26:21 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id AAA3973058
	for tcp-impl-list;
	Thu, 26 Mar 1998 00:24:44 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id AAA3954314
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Thu, 26 Mar 1998 00:24:42 -0800 (PST)
Received: from zippy.psc.edu (zippy.psc.edu [128.182.61.149]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id AAA15254
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Mar 1998 00:24:40 -0800 (PST)
	mail_from (mathis@psc.edu)
Received: from zippy.psc.edu (localhost [127.0.0.1]) by zippy.psc.edu (8.8.5/8.8.2) with ESMTP id DAA20750; Thu, 26 Mar 1998 03:24:37 -0500 (EST)
Message-Id: <199803260824.DAA20750@zippy.psc.edu>
To: "David S. Miller" <davem@dm.cobaltmicro.com>
cc: tcp-impl@cthulhu.engr.sgi.com, mathis@zippy.psc.edu
Subject: Re: possible bug in PAWS 
In-reply-to: Your message of "Wed, 25 Mar 1998 22:01:39 EST."
             <199803260601.WAA01443@dm.cobaltmicro.com> 
Date: Thu, 26 Mar 1998 03:24:37 -0500
From: "Matt Mathis" <mathis@psc.edu>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

Instead of a workaround, how about breaking new territory.....?

If you compute the "forward acknowledgement", fack, to be the rightmost
sequence number mentioned by any acknowledgement to date (including SACK
blocks and acknowledgement numbers), I believe that some of the timer
algorithms can be recast using the advancing fack instead of the
advancing acknowledgement numbers.  This might permit these algorithms
to continue to function during recovery.
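
The definition of fack given above can be sketched as follows.  This is a
minimal illustration, not the code from the paper: the function name and
the representation of SACK blocks as (left_edge, right_edge) pairs are
assumptions, and sequence number wraparound is ignored.

```python
# Hypothetical sketch of tracking the forward acknowledgement (fack).

def compute_fack(prev_fack, ack_number, sack_blocks=()):
    """Return the rightmost sequence number mentioned by any ACK so far.

    prev_fack:   fack value before this acknowledgement arrived.
    ack_number:  cumulative acknowledgement number of the arriving ACK.
    sack_blocks: iterable of (left_edge, right_edge) pairs from the ACK.
    """
    candidates = [prev_fack, ack_number]
    candidates.extend(right for _left, right in sack_blocks)
    return max(candidates)


fack = 0
fack = compute_fack(fack, 1000)
fack = compute_fack(fack, 1000, [(2000, 3000), (4000, 5000)])
print(fack)
```

Because fack only ever moves forward, a late ACK carrying older SACK
information leaves it unchanged.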

This is a 2 line change for RTTM, where it would permit RTT
measurement at precisely the time when it is most needed.

This is less clear for PAWS, because it requires implementing a more
complex test for "in sequence".

For more information on Forward Acknowledgement see our SigComm'96
paper and the update at
http://www.psc.edu/networking/papers/FACKnotes/current/

--MM--

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 26 07:24:23 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id HAA4060917
	for tcp-impl-list;
	Thu, 26 Mar 1998 07:22:39 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id HAA4086690
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Thu, 26 Mar 1998 07:22:37 -0800 (PST)
Received: from barrichello.ucr.edu (mail.cs.ucr.edu [138.23.169.107]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id HAA10728
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Mar 1998 07:22:35 -0800 (PST)
	mail_from (vishnu@cs.ucr.edu)
Received: from hill.ucr.edu (vishnu@hill.ucr.edu [138.23.169.109])
	by barrichello.ucr.edu (8.8.8/8.8.8) with ESMTP id HAA18817;
	Thu, 26 Mar 1998 07:22:36 -0800
Received: from localhost (vishnu@localhost)
	by hill.ucr.edu (8.8.8/8.8.8) with SMTP id HAA09530;
	Thu, 26 Mar 1998 07:22:33 -0800
X-Authentication-Warning: hill.ucr.edu: vishnu owned process doing -bs
Date: Thu, 26 Mar 1998 07:22:32 -0800 (PST)
From: Natchu Vishnu Priya <vishnu@cs.ucr.edu>
Reply-To: Natchu Vishnu Priya <vishnu@cs.ucr.edu>
To: Vern Paxson <vern@ee.lbl.gov>
cc: Thomas Skibo <skibo@juniper.net>,
        "David S. Miller" <davem@dm.cobaltmicro.com>,
        tcp-impl@cthulhu.engr.sgi.com
Subject: Re: possible bug in PAWS 
In-Reply-To: <199803260630.WAA00626@daffy.ee.lbl.gov>
Message-ID: <Pine.LNX.3.96.980326071424.8803F-100000@hill.ucr.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Wed, 25 Mar 1998, Vern Paxson wrote:

> So I'd argue that fast retransmit getting hosed is significantly
> worse than the performance loss you suffer from ACK reordering alone.
> 

Another scenario which has to be considered:

If a data segment is sent followed by a pure ACK, and they arrive out
of order, the pure ACK sent later will update TS.Recent before the data
segment arrives, and the data segment will be dropped.
On networks which reorder segments, this might lead to a significant
increase in dropped segments.

It looks like rule R3 in sec 4.2 of RFC 1323 is hosed.
It works perfectly for data segments, but for ACKs it does not have
sufficient information to check whether the ACK is new or old.

Can we not use the TSecr of the ACKs to figure out if they are new or old,
and not modify TS.Recent at all (i.e. leave the TS.Recent change rule as
is done for RTTM)?

> 		Vern
> 
vishnu

ps: Dave, has the code for SACK and PAWS been added to Linux 2.1.x?


From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 26 10:03:54 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id KAA4022746
	for tcp-impl-list;
	Thu, 26 Mar 1998 10:00:30 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id KAA4147930
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Thu, 26 Mar 1998 10:00:27 -0800 (PST)
Received: from red.juniper.net (red.juniper.net [208.197.169.254]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id KAA12580
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Mar 1998 10:00:23 -0800 (PST)
	mail_from (skibo@juniper.net)
Received: from shark.juniper.net (shark.juniper.net [208.197.169.201])
	by red.juniper.net (8.8.5/8.8.5) with ESMTP id KAA27815;
	Thu, 26 Mar 1998 10:00:22 -0800 (PST)
Received: from shark.juniper.net (localhost.juniper.net [127.0.0.1]) by shark.juniper.net (8.8.7/8.7.3) with ESMTP id KAA26673; Thu, 26 Mar 1998 10:00:21 -0800 (PST)
Message-Id: <199803261800.KAA26673@shark.juniper.net>
X-Mailer: exmh version 2.0.2 2/24/98
To: Natchu Vishnu Priya <vishnu@cs.ucr.edu>
cc: tcp-impl@cthulhu.engr.sgi.com
Subject: Re: possible bug in PAWS 
In-reply-to: Your message of "Thu, 26 Mar 1998 07:22:32 PST."
             <Pine.LNX.3.96.980326071424.8803F-100000@hill.ucr.edu> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 26 Mar 1998 10:00:19 -0800
From: Thomas Skibo <skibo@juniper.net>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk


> Another scenario which has to be considered..
> 
> if we have a data segment sent followed by a pure ack, and they reach out
> of order the pure ack sent later will update the TS.Recent before the data
> seg reaches.. the data seg will be dropped..

The pure ACK will have a SEQUENCE number greater than last.ack.sent
and so it won't update TS.Recent.  The data segment won't be dropped.
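
This argument can be traced with a minimal sketch.  The function names and
the example numbers are hypothetical; the update condition (record the
timestamp only when SEG.SEQ <= Last.ACK.sent) is the rule being discussed
in this thread, and wraparound of sequence numbers and timestamps is
ignored.

```python
# Hypothetical sketch of the TS.Recent update rule and the PAWS test.

def update_ts_recent(ts_recent, last_ack_sent, seg_seq, seg_tsval):
    """Return TS.Recent after an arriving segment has been examined.

    The timestamp is recorded only when SEG.SEQ <= Last.ACK.sent
    and it is not older than the current TS.Recent.
    """
    if seg_seq <= last_ack_sent and seg_tsval >= ts_recent:
        return seg_tsval
    return ts_recent


def paws_accepts(ts_recent, seg_tsval):
    """Simplified PAWS test: reject segments older than TS.Recent."""
    return seg_tsval >= ts_recent


# Receiver state before either segment arrives (example values).
last_ack_sent, ts_recent = 1000, 50

# The pure ACK (SEQ 1100, TSval 61) overtakes the data segment
# (SEQ 1000, 100 bytes, TSval 60) and arrives first.
ts_recent = update_ts_recent(ts_recent, last_ack_sent, 1100, 61)
print(ts_recent)                    # TS.Recent after the early pure ACK
print(paws_accepts(ts_recent, 60))  # is the late data segment accepted?
```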




-- 
Thomas Skibo		Juniper Networks, Inc.		skibo@jnx.com



From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 26 10:49:21 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id KAA4159976
	for tcp-impl-list;
	Thu, 26 Mar 1998 10:46:18 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id KAA4163878
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Thu, 26 Mar 1998 10:46:17 -0800 (PST)
Received: from dm.cobaltmicro.com (dm.cobaltmicro.com [209.133.34.35]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id KAA01082
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Mar 1998 10:46:16 -0800 (PST)
	mail_from (davem@dm.cobaltmicro.com)
Received: (from davem@localhost)
	by dm.cobaltmicro.com (8.8.7/8.8.7) id KAA06291;
	Thu, 26 Mar 1998 10:43:14 -0800
Date: Thu, 26 Mar 1998 10:43:14 -0800
Message-Id: <199803261843.KAA06291@dm.cobaltmicro.com>
From: "David S. Miller" <davem@dm.cobaltmicro.com>
To: vishnu@cs.ucr.edu
CC: vern@ee.lbl.gov, skibo@juniper.net, tcp-impl@cthulhu.engr.sgi.com
In-reply-to: <Pine.LNX.3.96.980326071424.8803F-100000@hill.ucr.edu> (message
	from Natchu Vishnu Priya on Thu, 26 Mar 1998 07:22:32 -0800 (PST))
Subject: Re: possible bug in PAWS
References:  <Pine.LNX.3.96.980326071424.8803F-100000@hill.ucr.edu>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Thu, 26 Mar 1998 07:22:32 -0800 (PST)
   From: Natchu Vishnu Priya <vishnu@cs.ucr.edu>

   ps: Dave, has the code for SACK and PAWS been added to Linux 2.1.x?

Yes, and I have FACK going in as soon as I resolve some issues and
fine tune it.

Later,
David S. Miller
davem@dm.cobaltmicro.com

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 26 10:49:21 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id KAA3177191
	for tcp-impl-list;
	Thu, 26 Mar 1998 10:45:42 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id KAA4153814
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Thu, 26 Mar 1998 10:45:40 -0800 (PST)
Received: from dm.cobaltmicro.com (dm.cobaltmicro.com [209.133.34.35]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id KAA00652
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Mar 1998 10:45:39 -0800 (PST)
	mail_from (davem@dm.cobaltmicro.com)
Received: (from davem@localhost)
	by dm.cobaltmicro.com (8.8.7/8.8.7) id KAA06284;
	Thu, 26 Mar 1998 10:42:20 -0800
Date: Thu, 26 Mar 1998 10:42:20 -0800
Message-Id: <199803261842.KAA06284@dm.cobaltmicro.com>
From: "David S. Miller" <davem@dm.cobaltmicro.com>
To: mathis@psc.edu
CC: tcp-impl@cthulhu.engr.sgi.com, mathis@zippy.psc.edu
In-reply-to: <199803260824.DAA20750@zippy.psc.edu> (mathis@psc.edu)
Subject: Re: possible bug in PAWS
References:  <199803260824.DAA20750@zippy.psc.edu>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Thu, 26 Mar 1998 03:24:37 -0500
   From: "Matt Mathis" <mathis@psc.edu>

Hello Matt,

   If you compute the "forward acknowledgement", fack, to be the rightmost
   sequence number mentioned by any acknowledgement to date (including SACK
   blocks and acknowledgement numbers), I believe that some of the timer
   algorithms can be recast using the advancing fack instead of the
   advancing acknowledgement numbers.  This might permit these algorithms
   to continue to function during recovery.

It's funny you should mention this, I became aware of all these
problems while implementing FACK under Linux, and I wanted to address
this very issue before I began fine tuning my implementation.

So if FACKs can be used to fix this PAWS ACK dropping problem, I
should be able to easily incorporate such a fix into my current code.

Later,
David S. Miller
davem@dm.cobaltmicro.com

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 26 12:05:48 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id MAA4161060
	for tcp-impl-list;
	Thu, 26 Mar 1998 12:03:26 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id MAA4217096
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Thu, 26 Mar 1998 12:03:24 -0800 (PST)
Received: from barrichello.ucr.edu (mail.cs.ucr.edu [138.23.169.107]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id MAA03140
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Mar 1998 12:03:23 -0800 (PST)
	mail_from (vishnu@cs.ucr.edu)
Received: from hill.ucr.edu (vishnu@hill.ucr.edu [138.23.169.109])
	by barrichello.ucr.edu (8.8.8/8.8.8) with ESMTP id NAA00481;
	Thu, 26 Mar 1998 13:04:54 -0800
Received: from localhost (vishnu@localhost)
	by hill.ucr.edu (8.8.8/8.8.8) with SMTP id MAA12453;
	Thu, 26 Mar 1998 12:03:21 -0800
X-Authentication-Warning: hill.ucr.edu: vishnu owned process doing -bs
Date: Thu, 26 Mar 1998 12:03:21 -0800 (PST)
From: Natchu Vishnu Priya <vishnu@cs.ucr.edu>
To: Vern Paxson <vern@ee.lbl.gov>
cc: Thomas Skibo <skibo@juniper.net>,
        "David S. Miller" <davem@dm.cobaltmicro.com>,
        tcp-impl@cthulhu.engr.sgi.com
Subject: Re: possible bug in PAWS 
In-Reply-To: <Pine.LNX.3.96.980326071424.8803F-100000@hill.ucr.edu>
Message-ID: <Pine.LNX.3.96.980326115836.12193A-100000@hill.ucr.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

On Thu, 26 Mar 1998, Natchu Vishnu Priya wrote:

> if we have a data segment sent followed by a pure ack, and they reach out
> of order the pure ack sent later will update the TS.Recent before the data
> seg reaches.. the data seg will be dropped..
> with networks which reorder segments this might lead to a significant
> increase in dropped segments.....
Sorry, I thought I had something...
TS.Recent will not be advanced by the ACK.

> 
> looks like rule R3 in sec 4.2 of rfc1323 is hosed.
> It reacts perfectly for data segments.. but for acks it does not have
> sufficient information to check if the ack is new or old,
> 
> can we not use the TSecr of the acks to figure out if they are new or old,
> and not modify TS.Recent at all (i.e. leave the TS.Recent change rule as
> is done for RTTM)?
> 

anyway the fix I was looking at would be like this..

we note down the timestamp we sent on the first unacked packet and on pure
acks accept only those that have a TSecr >= timestamp on first unacked
packet.


This way we can make sure that older acks (of same seq number) will not be
mistaken for dupacks.

This would work well for TCP when it stays idle for a long time too...
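
The proposed check can be sketched as below.  This is only an illustration
of the idea as stated in this message: the function and parameter names are
hypothetical, and timestamp wraparound is ignored.

```python
# Hypothetical sketch of the proposed pure-ACK filter.

def accept_pure_ack(tsecr, first_unacked_tsval):
    """Accept a pure ACK only if it echoes a timestamp at least as new
    as the one sent on the first unacknowledged packet."""
    return tsecr >= first_unacked_tsval


def filter_pure_acks(tsecr_values, first_unacked_tsval):
    """Keep only the TSecr values of pure ACKs that pass the check."""
    return [t for t in tsecr_values
            if accept_pure_ack(t, first_unacked_tsval)]


# The first unacked packet was sent with timestamp 100 (example value).
print(filter_pure_acks([95, 100, 104, 98], 100))
```

An ACK that echoes a timestamp older than the first unacked packet was
generated before that packet was sent, so it is treated as stale rather
than as a duplicate ACK.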

vishnu



From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 26 14:54:15 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id OAA4244147
	for tcp-impl-list;
	Thu, 26 Mar 1998 14:52:43 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id OAA4225813
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Thu, 26 Mar 1998 14:52:41 -0800 (PST)
Received: from zippy.psc.edu (zippy.psc.edu [128.182.61.149]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id OAA10540
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Mar 1998 14:52:40 -0800 (PST)
	mail_from (mathis@psc.edu)
Received: from zippy.psc.edu (localhost [127.0.0.1]) by zippy.psc.edu (8.8.5/8.8.2) with ESMTP id RAA22474; Thu, 26 Mar 1998 17:52:37 -0500 (EST)
Message-Id: <199803262252.RAA22474@zippy.psc.edu>
To: "David S. Miller" <davem@dm.cobaltmicro.com>
cc: mathis@psc.edu, tcp-impl@cthulhu.engr.sgi.com, mathis@zippy.psc.edu
Subject: Re: possible bug in PAWS 
In-reply-to: Your message of "Thu, 26 Mar 1998 10:42:20 EST."
             <9jpvj9dqpq.fsf@totally-fudged-out-message-id> 
Date: Thu, 26 Mar 1998 17:52:36 -0500
From: "Matt Mathis" <mathis@psc.edu>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

>So if FACKs can be used to fix this PAWS ACK dropping problem, I
>should be able to easily incorporate such a fix into my current code.

The algorithm to do this does not yet exist.....  This particular
problem has been on my todo list since RFC 2018 was in draft.  Lacking
progress, I thought I'd offer it to someone who has the need.

> looks like rule R3 in sec 4.2 of rfc1323 is hosed.
> It reacts perfectly for data segments.. but for acks it does not have
> sufficient information to check if the ack is new or old,

Note that if there are SACK blocks, fack should tell you if the duplicate
ACKs got reordered, and which timestamps to keep.   The "interesting"
part of the problem is figuring out what other checks are needed, and
what might go wrong.

--MM--

From owner-tcp-impl@relay.engr.sgi.com  Thu Mar 26 22:45:24 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id WAA4381780
	for tcp-impl-list;
	Thu, 26 Mar 1998 22:44:02 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id WAA4412597
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Thu, 26 Mar 1998 22:44:01 -0800 (PST)
Received: from dm.cobaltmicro.com (dm.cobaltmicro.com [209.133.34.35]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id WAA18955
	for <tcp-impl@cthulhu.engr.sgi.com>; Thu, 26 Mar 1998 22:44:00 -0800 (PST)
	mail_from (davem@dm.cobaltmicro.com)
Received: (from davem@localhost)
	by dm.cobaltmicro.com (8.8.7/8.8.7) id WAA07237;
	Thu, 26 Mar 1998 22:40:58 -0800
Date: Thu, 26 Mar 1998 22:40:58 -0800
Message-Id: <199803270640.WAA07237@dm.cobaltmicro.com>
From: "David S. Miller" <davem@dm.cobaltmicro.com>
To: mathis@psc.edu
CC: tcp-impl@cthulhu.engr.sgi.com, mathis@zippy.psc.edu
In-reply-to: <199803262252.RAA22474@zippy.psc.edu> (mathis@psc.edu)
Subject: Re: possible bug in PAWS
References:  <199803262252.RAA22474@zippy.psc.edu>
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

   Date: Thu, 26 Mar 1998 17:52:36 -0500
   From: "Matt Mathis" <mathis@psc.edu>

   >So if FACKs can be used to fix this PAWS ACK dropping problem, I
   >should be able to easily incorporate such a fix into my current code.

   The algorithm to do this does not yet exist.....  This particular
   problem has been on my todo list since RFC 2018 was in draft.  Lacking
   progress, I thought I'd offer it to someone who has the need.

Noted.

   > looks like rule R3 in sec 4.2 of rfc1323 is hosed.
   > It reacts perfectly for data segments.. but for acks it does not have
   > sufficient information to check if the ack is new or old,

   Note that if there are SACK blocks, fack should tell you if the duplicate
   ACKs got reordered, and which timestamps to keep.   The "interesting"
   part of the problem is figuring out what other checks are needed, and
   what might go wrong.

Currently, my hack is to totally ignore the PAWS test for pure-ACK
frames, and to add an extra check in the ts_recent update code so that
it is not updated for the "old" ACKs we accept, the same behavior as
for out-of-order data segments.

It's probably totally wrong.  However, I do think it might be crucial
to find a close fix for the non-SACK cases.  If a more accurate and
complete one can be prescribed when SACK/FACK information is
available, so be it.
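
A rough sketch of the hack described above, in C. All struct, field, and
function names here are hypothetical, and the update condition only
approximates the RFC 1323 rule (SEG.TSval >= TS.Recent and
SEG.SEQ <= Last.ACK.sent); it is not the actual code under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical segment and connection state; names are illustrative only. */
struct segment {
    uint32_t seq;     /* SEG.SEQ */
    uint32_t len;     /* payload length in bytes */
    uint32_t tsval;   /* SEG.TSval */
    bool syn, fin;
};

struct connection {
    uint32_t ts_recent;      /* TS.Recent */
    uint32_t last_ack_sent;  /* Last.ACK.sent */
};

/* Wraparound-safe "a < b" for 32-bit sequence and timestamp values. */
static bool before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

static bool is_pure_ack(const struct segment *seg)
{
    return seg->len == 0 && !seg->syn && !seg->fin;
}

/* Returns true if the PAWS test would drop the segment. */
static bool paws_reject(const struct connection *c, const struct segment *seg)
{
    if (is_pure_ack(seg))
        return false;  /* the hack: skip the PAWS test for pure ACKs */
    return before(seg->tsval, c->ts_recent);
}

static void update_ts_recent(struct connection *c, const struct segment *seg)
{
    if (before(seg->tsval, c->ts_recent))
        return;  /* extra check: an accepted "old" ACK must not move ts_recent */
    if (!before(c->last_ack_sent, seg->seq))  /* SEG.SEQ <= Last.ACK.sent */
        c->ts_recent = seg->tsval;
}
```

With this shape, an old pure ACK is still delivered to the ACK processing
code, but it can never pull ts_recent backwards.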

Later,
David S. Miller
davem@dm.cobaltmicro.com

From owner-tcp-impl@relay.engr.sgi.com  Tue Mar 31 14:44:10 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id OAA5795294
	for tcp-impl-list;
	Tue, 31 Mar 1998 14:41:20 -0800 (PST)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id OAA6082305
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 31 Mar 1998 14:41:14 -0800 (PST)
Received: from venera.isi.edu (venera.isi.edu [128.9.176.32]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id OAA17841
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 31 Mar 1998 14:41:11 -0800 (PST)
	mail_from (ahughes@ISI.EDU)
Received: from hun.isi.edu (hun.isi.edu [128.9.160.145])
	by venera.isi.edu (8.8.7/8.8.6) with SMTP id OAA02360
	for <tcp-impl@cthulhu.engr.sgi.com>; Tue, 31 Mar 1998 14:41:08 -0800 (PST)
Message-ID: <35217103.5656AEC7@isi.edu>
Date: Tue, 31 Mar 1998 14:41:07 -0800
From: "Amy (Biermann) Hughes" <ahughes@ISI.EDU>
X-Mailer: Mozilla 3.03Gold (X11; I; SunOS 4.1.4 sun4m)
MIME-Version: 1.0
To: tcp-impl@cthulhu.engr.sgi.com
Subject: IETF presentation: Issues in TCP Slow-Start Restart after Idle
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

The slides from my Monday talk at IETF are available from:
http://www.isi.edu/~ahughes/pubs/ietf-slides.ps

Please note: these slides do not contain the tcpdump graphs that
I presented yesterday.  If you would like a copy of those, please
e-mail me.  Also, I added a note about the code that was presented
yesterday.

If you would like to see the proposed ID that was the source
for the talk, it is available at:
http://www.isi.edu/~ahughes/pubs/draft-xxx.txt

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Amy S. Hughes				   Graduate Research Assistant
(310) 822-1511 x111		    USC/Information Sciences Institute
ahughes@isi.edu 		       4676 Admirality Way, Suite 1219
www.isi.edu/~ahughes                          Marina del Rey, Ca 90292
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From owner-tcp-impl@relay.engr.sgi.com  Tue Apr  7 10:49:04 1998
Received: (from majordomo-owner@localhost)
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) id KAA8748778
	for tcp-impl-list;
	Tue, 7 Apr 1998 10:47:04 -0700 (PDT)
Return-Path: <owner-tcp-impl@relay.engr.sgi.com>
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
	by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF)
	via ESMTP id KAA7930961
	for <tcp-impl@cthulhu.engr.sgi.com>;
	Tue, 7 Apr 1998 10:44:23 -0700 (PDT)
Received: from emu.sp.trw.com (emu.sp.TRW.COM [129.4.168.39]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id KAA29949
	for <tcp-impl@relay.engr.SGI.COM>; Tue, 7 Apr 1998 10:44:22 -0700 (PDT)
	mail_from (Aaron.Falk@trw.com)
Received: from afalk (afalk.sp.TRW.COM [129.4.147.22]) by emu.sp.trw.com (8.7.5/8.7.3) with ESMTP id KAA28423; Tue, 7 Apr 1998 10:43:39 -0700 (PDT)
Message-ID: <352A6601.C7B82FE7@trw.com>
Date: Tue, 07 Apr 1998 10:44:33 -0700
From: Aaron Falk <Aaron.Falk@trw.com>
Reply-To: afalk@mailsrv1.trw.com
Organization: TRW, Electronics Systems & Technology Division
X-Mailer: Mozilla 4.0 [en] (WinNT; U)
MIME-Version: 1.0
To: tcp-over-satellite@achtung.sp.TRW.COM, end2end-interest@ISI.EDU,
        tcp-impl@cthulhu.engr.sgi.com
CC: tcppep@lerc.nasa.gov
Subject: TCP Spoofing I-D
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-tcp-impl@relay.engr.sgi.com
Precedence: bulk

At the TCPSAT meeting last week in Los Angeles, there was
discussion about creating an Internet Draft discussing TCP
spoofing. Similar to the goals of the TCP over satellite
I-Ds, this draft would 1) educate people who are spoofing
(or considering it) about any risks of their approach
(especially when that approach has a negative impact on
traffic outside of their domain) and 2) survey what kinds of
spoofing are being done in the Internet.  Note that this is
not limited to TCP over satellite.  Spoofing is being done
in other environments as well, such as connections over
transoceanic fiber.

There was a strong consensus in the working group to pursue
this and (with some guidance from the Transport Area
directors) a brief plan has been developed. A short,
informal meeting was held with interested folks at the IETF
to work out some details. There was general agreement that
spoofing means different things to different people but that
it is generally a pejorative term. To allow the group to
focus on the technical issues rather than the emotional
ones, I am suggesting that what we are talking about are
proxies that enhance TCP performance. Therefore, I propose
that we call this activity TCPPEP for, naturally, TCP
Performance Enhancing Proxies. Eric Travis and I led the
discussion and Eric will be posting a summary to the tcppep
list.

So, here's the plan:


   * Create a mail list for discussion
   * Solicit drafts from interested parties on what
     mechanisms they are using
   * Request a BOF at the August IETF in Chicago
   * Create an Informational RFC documenting the group's
     output

A mail list has been created. Here are the instructions:

> To subscribe to the TCP with Performance Enhancing Proxies mailing list, send mail to the following address:
>
> majordomo@lerc.nasa.gov
>
> With the following text in the body of the message.
>
> subscribe tcppep
>
> To send mail to the mailing list, send it to the following address.
>
> tcppep@lerc.nasa.gov
>
An archive is available through a web page at
http://tcppep.lerc.nasa.gov/tcppep/.

Also note there has been discussion of a workshop before the
next IETF to continue f2f discussions.

Please use the tcppep list for further discussion on this
topic.

--aaron

--
Aaron Falk    (310) 814-4932
TRW, Inc                     Space & Electronics Group
One Space Park                 Redondo Beach, CA 90278



