From owner-ppp-comp Tue May 11 19:10:35 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nt6Gx-00006na@daver.bungi.com>; Tue, 11 May 93 19:10 PDT
X-Path: sgi.com!news
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: test
Date: Wed, 12 May 1993 01:01:46 GMT
Message-ID: <hrh5urg@sgi.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

test
initial test




From owner-ppp-comp Wed May 12 08:13:21 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ntIUZ-00000Ma@daver.bungi.com>; Wed, 12 May 93 08:13 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Welcome
Date: Wed, 12 May 1993 08:13:06 PDT
Message-ID: <m0ntIUV-0000S0C@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Welcome to the ppp-comp mailing list.  This list should be used to
discuss items related to data compression of PPP links. Within a few
months, I hope to get all of the issues resolved, and have at least
4 interoperating versions, with at least two different compression
algorithms.

Over the next few days, a couple of documents that I have written
covering compression will be posted to this mailing list. All submissions
received by ppp-comp are archived, so don't worry about missing some
information (but I do need an ftp archive-site volunteer - please mail
dlr@bungi.com).

Thanks for participating and making this specification happen!


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 12 11:09:41 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ntLFF-00002Ka@daver.bungi.com>; Wed, 12 May 93 11:09 PDT
X-Path: bridge2.NSD.3Com.COM!vsp
From: Venkat Prasad <vsp@NSD.3Com.COM>
To: ppp-comp@bungi.com
Subject: Comp-Implementation Experience
Date: Wed, 12 May 93 11:03:07 -0700
Message-ID: <199305121803.AA16346@himagiri.NSD.3Com.COM>
Reply-To: ppp-comp@bungi.com
Organization: 3Com, 5400 Bayfront Plaza, Santa Clara, CA 95052-8145
Precedence: bulk



3Com has a working/shipping implementation based on the first draft
of CCCP. I will be willing to work with other vendors who have 
implementations to explore interoperable solutions.

/Prasad


>> X-Path: dlr
>> Precedence: bulk

>> Welcome to the ppp-comp mailing list.  This list should be used to
>> discuss items related to data compression of PPP links. Within a few
>> months, I hope to get all of the issues resolved, and have at least
>> 4 interoperating versions, with at least two different compression
>> algorithms.

>> Over the next few days, a couple of documents that I have written
>> covering compression will be posted to this mailing list. All submissions
>> received by ppp-comp are archived, so don't worry about missing some
>> information (but I do need an ftp archive-site volunteer - please mail
>> dlr@bungi.com).

>> Thanks for participating and making this specification happen!


>> -- 
>> Dave Rand
>> {pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com


From owner-ppp-comp Thu May 13 18:45:34 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ntopw-0000PWa@daver.bungi.com>; Thu, 13 May 93 18:45 PDT
X-Path: eng.buffalo.edu!victord
From: victord@eng.buffalo.edu (victor demjanenko)
To: ppp-comp@bungi.com
Subject: Re:  Comp-Implementation Experience
Date: Thu, 13 May 93 08:26:17 EDT
Message-ID: <9305131226.AA00572@beatrix.eng.buffalo.edu>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Where can I get some information on CCCP?

Victor Demjanenko, Ph.D.
Electrical and Computer Engineering

Buffalo, New York


From owner-ppp-comp Sun May 16 21:21:55 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nuwhv-0000DEa@daver.bungi.com>; Sun, 16 May 93 21:21 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: New topics of conversation
Date: Sun, 16 May 1993 21:21:44 PDT
Message-ID: <m0nuwhs-0000DdC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Now that everyone has joined up, I would like to start a few new threads
of conversation.  The archive is running now, and the anonymous ftp
archive site will be set up early next week (I have a volunteer site
already, thanks!)

Thanks again for your interest, and let's hope that we can conclude
the proposals quickly. Everyone (or at least everyone's marketing
departments :-) is interested in compression - so let's get
the technical details settled.


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Sun May 16 21:32:58 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nuwsQ-0000DEa@daver.bungi.com>; Sun, 16 May 93 21:32 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Measuring compression
Date: Sun, 16 May 1993 21:32:30 PDT
Message-ID: <m0nuwsJ-0000CrC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

No matter which algorithms are chosen, there comes a time to measure how
much compression can be achieved.  There are many different ways of doing
this. In the RFC, I would like to propose a standard way of measuring the
compression ratio of each algorithm, and each manufacturer.

Currently, I do this by taking the Calgary Corpus set of files,
wrapping IPX and PPP header information around 512 bytes from
each file, and interleaving the results.  This simulates a number
of IPX users sharing a common link, each getting a fair share of it.
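
The measurement described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: zlib stands in for
whichever algorithm is under test, the header argument is a placeholder
rather than real IPX and PPP framing, the whole interleaved stream is
compressed as one unit, and the function names are invented for this example.

```python
import zlib


def build_interleaved_stream(files, chunk_size=512, header=b"\x00" * 32):
    """Take chunk_size bytes from each file in turn, round-robin.

    files:  list of byte strings (for example, the Calgary Corpus files)
    header: placeholder bytes prepended to every chunk (hypothetical framing)
    """
    packets = []
    offsets = [0] * len(files)
    remaining = True
    while remaining:
        remaining = False
        for i, data in enumerate(files):
            chunk = data[offsets[i]:offsets[i] + chunk_size]
            if chunk:
                packets.append(header + chunk)
                offsets[i] += chunk_size
                remaining = True
    return packets


def compression_ratio(packets):
    """Ratio of original size to compressed size for the whole stream."""
    stream = b"".join(packets)
    # zlib is only a stand-in for the algorithm being measured.
    return len(stream) / len(zlib.compress(stream))
```

Feeding the same packet list to every algorithm under test keeps the
comparison deterministic.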

The good things about this approach are:

1. It really tries to measure the effective compression ratio of
   each algorithm, under real-world conditions.
2. It uses a standard, available set of files to do so.
3. It is very deterministic.

The bad things about this approach are:

1. It is IPX-centric.
2. It doesn't test variable packet sizes.
3. It doesn't show the 200:1 compression ratios that marketing people demand.

In my tests, the best compression ratio was 2.7:1, and the worst was 1.1:1.
On Monday I'll publish the complete results of my tests for each of the
algorithms that I ran.

We need to define a common, available set of files that we can feel good
about quoting the performance on.  Too many people have been burned by
marketing claims of 4:1 compression ratios - we can take this opportunity
to fix it now!

Please - address your comments to the group. Address your flames to me.



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Sun May 16 21:41:57 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nux1H-0000COa@daver.bungi.com>; Sun, 16 May 93 21:41 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Compression algorithms
Date: Sun, 16 May 1993 21:41:42 PDT
Message-ID: <m0nux1C-0000BYC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

In order to be interoperable, we need to have at least one compression
algorithm that is common to all implementations.  The ideal algorithm
should use little or no memory, run in < 100 usec for a 1500 byte packet,
require no hardware, offer a 200:1 compression ratio, and not require
a license fee :-)

Since there is no algorithm (that I am aware of) meeting these criteria,
we need to settle on one or more algorithms that can serve as a minimum
base for everyone to implement.  In the work that I have done, we allow for
up to 256 algorithms to be used - this allows custom algorithms to be
preferred over the generic ones, if both ends support it.
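
One way to picture the preference rule is a selection over one-octet
algorithm IDs.  This is a sketch only: the ID values, the idea of an ordered
preference list, and the function name are assumptions made for the
example, not part of any draft.

```python
def choose_algorithm(local_preference, peer_supported):
    """Return the first ID in local preference order that the peer supports.

    IDs are assumed to fit in one octet (0..255), giving up to 256 algorithms.
    Returns None when the two ends share no algorithm.
    """
    peer = set(peer_supported)
    for alg_id in local_preference:
        if not 0 <= alg_id <= 255:
            raise ValueError("algorithm ID must fit in one octet")
        if alg_id in peer:
            return alg_id
    return None
```

Listing a custom algorithm ahead of the generic one means it is chosen
whenever both ends support it, and the generic one is the fallback.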

There are two broad classes of compression algorithms under study:

1. Compression algorithms requiring state.  These compression algorithms
   maintain state between each packet on the link, and thus require that
   every byte emitted by the compression engine must be received exactly
   once by the decompression engine. This means a reliable link, as in
   LAPB.

2. Compression algorithms not requiring state.  Each packet contains
   sufficient information to decompress, and the loss (or duplication)
   of a packet will not cause the decompressor to fail.
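
The difference between the two classes can be demonstrated with a small
Python sketch.  zlib is used purely as a stand-in compressor (an assumption;
it is not one of the algorithms under study), and the function names are
invented for the example.

```python
import zlib


def stateful_compress(packets):
    """Compress with one shared history; each output depends on earlier packets."""
    comp = zlib.compressobj()
    # Z_SYNC_FLUSH makes each packet's output decodable as soon as it arrives.
    return [comp.compress(p) + comp.flush(zlib.Z_SYNC_FLUSH) for p in packets]


def stateful_decompress(chunks):
    """Decompress with one shared history; None marks a chunk that failed."""
    decomp = zlib.decompressobj()
    out = []
    for chunk in chunks:
        try:
            out.append(decomp.decompress(chunk))
        except zlib.error:
            out.append(None)
    return out


def stateless_compress(packets):
    """Compress every packet on its own, with no shared history."""
    return [zlib.compress(p) for p in packets]


def stateless_decompress(chunks):
    """Each chunk carries everything needed to decompress it."""
    return [zlib.decompress(c) for c in chunks]
```

Dropping one chunk from the stateful stream leaves the decompressor with
the wrong history, so later packets come out corrupted or are rejected.
Dropping one chunk from the stateless stream affects only that packet.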

It is my opinion that no matter which approach is taken, there must be
sufficient redundancy in the transmitted data to ensure that the decompressed
data is valid.  In more obvious terms, since each bit of the compressed
data is more significant - the link-level CRC is not enough to ensure that
the transmitted data is the same as the received, decompressed data. This
means that an additional CRC, or checksum, or *something* is required to
validate the decompressed data.

I have tested about 20 or so algorithms, and I have my favourites: what
are yours?


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Sun May 16 21:45:52 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nux57-0000Mra@daver.bungi.com>; Sun, 16 May 93 21:45 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: link-level issues
Date: Sun, 16 May 1993 21:45:42 PDT
Message-ID: <m0nux54-0000MqC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

If we choose a compression algorithm that requires state information,
a reliable link must be used.  LAPB seems to be a good choice, but has
some problems.

1. If the simple form of LAPB is used (8 window), there is not enough
   data "in flight" for satellite links, or high speed links.

2. If the 128 window version is used, there can be a lot of data buffers
   required.

3. There is no selective retransmission in LAPB - you must resend the
   entire window from the dropped packet.


Should we mandate the 128-window version? Add selective retransmission?
Use another protocol?


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Mon May 17 06:24:42 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nv5BA-0000Q0a@daver.bungi.com>; Mon, 17 May 93 06:24 PDT
X-Path: eng.buffalo.edu!victord
From: victord@eng.buffalo.edu (victor demjanenko)
To: ppp-comp@bungi.com
Subject: Re:  link-level issues
Date: Mon, 17 May 93 09:11:10 EDT
Message-ID: <9305171311.AA17430@beatrix.eng.buffalo.edu>
Reply-To: ppp-comp@bungi.com
Precedence: bulk



From owner-ppp-comp Mon May 17 10:49:18 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nv9J8-000018a@daver.bungi.com>; Mon, 17 May 93 10:49 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Feedback to Dave Rand's messages
Date: Mon, 17 May 1993 13:33:24 -0400 (EDT)
Message-ID: <9305171733.AA16095@hobbit.gandalf.ca>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


> If we choose a compression algorithm that requires state information,
> a reliable link must be used.  LAPB seems to be a good choice, but has
> some problems.
> 
> 1. If the simple form of LAPB is used (8 window), there is not enough
> data "in flight" for satellite links, or high speed links.
> 
> 2. If the 128 window version is used, there can be a lot of data buffers
> required.
> 
> 3. There is no selective retransmission in LAPB - you must resend the
> entire window from the dropped packet.
>
>Should we mandate the 128-window version? Add selective retransmission?
>Use another protocol?

In the spirit of PPP, let's negotiate the LAPB window size :-)  Further,
negotiate the active window separate from the LAPB window.  There is no
reason to tie the two windows together.  Doesn't Fred Baker's LAPB document
get into this?  There is no need to go to anything other than LAPB.
Selective reject extension is a good idea, but most integrated controllers
do not support this.  One main objective of PPP is hardware backwards
compatibility.  (Note, our LAPB is software based, so don't blame me for
this PPP decision). 

> Since there is no algorithm (that I am aware of) meeting these criteria,
> we need to settle on one or more algorithms that can serve as a minimum
> base for everyone to implement.  In the work that I have done, we allow for
> up to 256 algorithms to be used - this allows custom algorithms to be
> preferred over the generic ones, if both ends support it.

Then, I would favour an LZ77 static encoded algorithm compatible with STAC,
but with no per-product royalties.  Does anyone want to buy it from me?
> 
> There are two broad classes of compression algorithms under study:
> 
> 1. Compression algorithms requiring state.  These compression algorithms
>    maintain state between each packet on the link, and thus require that
>    every byte emitted by the compression engine must be received exactly
>    once by the decompression engine. This means a reliable link, as in
>    LAPB.

IMHO, this is the only way to do it.  Others may disagree.  Look at the
headstands one goes through to make an unreliable link just do header
compression.  The amount of compression you'll get from a state-less
algorithm will be pathetic.

> It is my opinion that no matter which approach is taken, there must be
> sufficient redundancy in the transmitted data to ensure that the decompressed
> data is valid.  In more obvious terms, since each bit of the compressed
> data is more significant - the link-level CRC is not enough to ensure that
> the transmitted data is the same as the received, decompressed data. This
> means that an additional CRC, or checksum, or *something* is required to
> validate the decompressed data.

It seems just fine in the thousands of compression devices we have shipped :-)
Why do you want to add redundancy to the data, when compression is trying
to get rid of it?  Do you not trust the compression algorithm, the device
hardware, or what?  Fix what you don't trust.

The 16-bit CRC may not be suitable for very high error
rate links, but this is a link layer concern.  PPP can use a 32-bit CRC
to solve the high error rate link.

But there is some extra information there, IP checksum, IP length field, 
or even the original FCS on bridged packets.
> 
> I have tested about 20 or so algorithms, and I have my favourites: what
> are yours?

LZ77 sliding window + either static huffman or arithmetic backend.  It's
my favourite because it's out in the field, we own the source code, no
license fee, and by the way, it gives the highest compression ratios out
there.  It's also for sale to anyone that's interested.

> No matter which algorithms are chosen, there comes a time to measure how
> much compression can be achieved.  There are many different ways of doing
> this. In the RFC, I would like to propose a standard way of measuring the
> compression ratio of each algorithm, and each manufacturer.

> Currently, I do this by taking the Calgary Corpus set of files,
> wrapping IPX and PPP header information around 512 bytes from
> each file, and interleaving the results.  This simulates a number
> of IPX users using a common link, getting fair share of the link.

Yes.  I also spent time trying to determine a valid measurement method.
Unfortunately, our methods are quite different.  Having tried yours and
coming in a close third behind HPACK and Info-ZIP,  I would like to
state that treating the packets as one file does not reflect reality.

> The good things about this approach are:
> 
> 1. It really tries to measure the effective compression ratio of
>    each algorithm, under real-world conditions.

Unfortunately, interleaving one packet from each file isn't realistic.
On a LAN you get multiple packets from one source, then some from a 
different source.

> 2. It uses a standard, available set of files to do so.

Yes, but the file set is too small.  Let's augment the basic file set
with some real-life files.  I would propose to add PC and SUN binary and
graphic files.  Maybe some medical images.  Anything else?

> 3. It is very deterministic.
> 
> The bad things about this approach are:
> 
> 1. It is IPX-centric.
> 2. It doesn't test variable packet sizes.
> 3. It doesn't show the 200:1 compression ratios that marketing people demand.

> In my tests, the best compression ratio was 2.7:1, and the worst was 1.1:1.
> I'll be publishing the complete text of my tests on Monday, of the algorithms
> that I ran.

Feel free to put my 2.67:1 number on the list! 
> 
> We need to define a common, available set of files that we can feel good
> about quoting the performance on.  Too many people have been burned by
> marketing claims of 4:1 compression ratios - we can take this opportunity
> to fix it now!

Here is my counter-proposal for benchmarking:

Record real-life transfers using a SNIFFER.  Play back SNIFFER file, playing 
the frames into the compressor.  I have done this for a LANALYZER.  I intend 
to do it for a SNIFFER as well.  If anyone has a SNIFFER disassembler, I could
use it.

I suggest getting as many traces as deemed sufficient to create a CORPUS.
We can certainly do this for IP traffic, by tracing an Internet link.  There
should be no privacy issues, since the Internet is a public network.  Can a
similar network be found for IPX?  I have an IPX trace file for U of A, but
I'd have to check with them as to whether I can release it.

I believe this approach will solve all the shortcomings in your approach, Dave.

Note, it should be possible to trace the transfer of Calgary Corpus files to
add them to our Corpus.


From owner-ppp-comp Mon May 17 10:49:18 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nv9J2-00009Za@daver.bungi.com>; Mon, 17 May 93 10:48 PDT
X-Path: hprnls7.rose.hp.com!jim
From: Jim Petty <jim@hprnls7.rose.hp.com>
To: ppp-comp@bungi.com
Subject: Re: Compression algorithms
Date: Mon, 17 May 93 8:41:11 PDT
Message-ID: <9305171541.AA20501@hprnls7.rose.hp.com>
References: <<m0nux1C-0000BYC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Yo Dave

> 
> In order to be interoperable, we need to have at least one compression
> algorithm that is common to all implementations.  The ideal algorithm
> should use little or no memory, run in < 100 usec for a 1500 byte packet,
> require no hardware, offer a 200:1 compression ratio, and not require
> a license fee :-)

Are you listening in on our marketing dept again?

> 
> Since there is no algorithm (that I am aware of) meeting these criteria,
> we need to settle on one or more algorithms that can serve as a minimum
> base for everyone to implement.  In the work that I have done, we allow for
> up to 256 algorithms to be used - this allows custom algorithms to be
> preferred over the generic ones, if both ends support it.

The minimum base is a good idea?

> 
> There are two broad classes of compression algorithms under study:
> 
> 1. Compression algorithms requiring state.  These compression algorithms
>    maintain state between each packet on the link, and thus require that
>    every byte emitted by the compression engine must be received exactly
>    once by the decompression engine. This means a reliable link, as in
>    LAPB.
> 
> 2. Compression algorithms not requiring state.  Each packet contains
>    sufficient information to decompress, and the loss (or duplication)
>    of a packet will not cause the decompressor to fail.
> 
> It is my opinion that no matter which approach is taken, there must be
> sufficient redundancy in the transmitted data to ensure that the decompressed
> data is valid.  In more obvious terms, since each bit of the compressed
> data is more significant - the link-level CRC is not enough to ensure that
> the transmitted data is the same as the received, decompressed data. This
> means that an additional CRC, or checksum, or *something* is required to
> validate the decompressed data.

I am reminded of an old software tale about a client who insisted that
there be 0% data corruption in the requested application.  He was unhappy
when his application would periodically stop due to impending data
corruption.  Maybe there is something you can do, use a 48bit CRC
or something, but the minimum base you mentioned earlier should be
the link-level CRC.  Some vendors have proprietary experience with
this method, any problems out there?

> 
> I have tested about 20 or so algorithms, and I have my favourites: what
> are yours?
> 
> 
> -- 
> Dave Rand
> {pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com
> 
> 

Jim Petty


From owner-ppp-comp Mon May 17 10:49:23 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nv9J1-00002qa@daver.bungi.com>; Mon, 17 May 93 10:48 PDT
X-Path: mail.barrnet.net!brian
From: brian@mail.barrnet.net
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Mon, 17 May 1993 08:16:59 -0800
Message-ID: <9305171517.AA17770@Angband.Stanford.EDU>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>3. There is no selective retransmission in LAPB - you must resend the
>   entire window from the dropped packet.

True, but whether or not this is an issue depends on the error rate on your
link.  Most links are pretty good these days so the likelihood that a
retransmission will be required is pretty small.  It may not be worth the
added complexity to switch from a go-back-N protocol to a selective
retransmission protocol.
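
The cost difference between the two schemes for a single lost frame can be
put in a tiny Python sketch.  The function name and its parameters are
invented for illustration, and the model is deliberately simple: one loss
per window, no timeouts.

```python
def frames_resent(outstanding, lost_index, selective=False):
    """Count frames retransmitted after one loss in an outstanding window.

    outstanding: number of unacknowledged frames in flight
    lost_index:  0-based position of the lost frame within that window
    selective:   True for selective retransmission, False for go-back-N
    """
    if not 0 <= lost_index < outstanding:
        raise ValueError("lost_index must fall inside the outstanding window")
    # Go-back-N resends the lost frame and everything sent after it.
    return 1 if selective else outstanding - lost_index
```

With 7 frames outstanding and the third one lost, go-back-N resends 5
frames where selective retransmission resends 1.  How often that penalty
is paid depends on the loss rate of the link.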



Brian Lloyd                                       3420 Sudbury Road
brian@lloyd.com                                   Cameron Park, CA  95682
brian@mail.barrnet.net                            (916) 676-3442 - fax
(415) 725-1392                                    (916) 676-1147 - voice


From owner-ppp-comp Mon May 17 10:49:28 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nv9Iz-00009da@daver.bungi.com>; Mon, 17 May 93 10:48 PDT
X-Path: hprnls7.rose.hp.com!jim
From: Jim Petty <jim@hprnls7.rose.hp.com>
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Mon, 17 May 93 8:58:44 PDT
Message-ID: <9305171558.AA20508@hprnls7.rose.hp.com>
References: <<m0nux54-0000MqC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Hi Dave

> 
> If we choose a compression algorithm that requires state information,
> a reliable link must be used.  LAPB seems to be a good choice, but has
> some problems.
> 
> 1. If the simple form of LAPB is used (8 window), there is not enough
>    data "in flight" for satellite links, or high speed links.
> 
> 2. If the 128 window version is used, there can be a lot of data buffers
>    required.
> 
> 3. There is no selective retransmission in LAPB - you must resend the
>    entire window from the dropped packet.
> 
> 
> Should we mandate the 128-window version? Add selective retransmission?
> Use another protocol?

You are right, there are different situations that PPP compression
will be subjected to.  But how are we to predict the situation?  What about
the low-end router product that would be happier with the modulo-8
window in LAPB?  If you use LAPB, then LAPB should be configured
separately for the local situation.

What other protocols have you been thinking about?

> 
> 
> -- 
> Dave Rand
> {pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com
> 
> 

Jim Petty


From owner-ppp-comp Mon May 17 11:02:56 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nv9WR-00005Wa@daver.bungi.com>; Mon, 17 May 93 11:02 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Mon, 17 May 1993 11:02:42 PDT
Message-ID: <m0nv9WN-00007VC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: link-level issues" on May 17,  8:16, brian@mail.barrnet.net writes:]
> >3. There is no selective retransmission in LAPB - you must resend the
> >   entire window from the dropped packet.
> 
> True, but whether or not this is an issue depends on the error rate on your
> link.  Most links are pretty good these days so the likelihood that a
> retransmission will be required is pretty small.  It may not be worth the
> added complexity to switch from a go-back-N protocol to a selective
> retransmission protocol.

I don't agree. In my experience, the reason for dropped packets is an
out of resources situation on the receiving end. This causes packets
to back up on the TX end until the retransmit timeout, at which time
another burst of packets is sent, triggering another dropped packet,
etc, etc.

-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Mon May 17 11:21:15 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nv9o9-00007Ma@daver.bungi.com>; Mon, 17 May 93 11:21 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Mon, 17 May 1993 11:21:00 PDT
Message-ID: <m0nv9o5-00008PC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: link-level issues" on May 17,  8:58, Jim Petty writes:]
> > Should we mandate the 128-window version? Add selective retransmission?
> > Use another protocol?
> 
> You are right, there are different situations that PPP compression
> will be subjected to.  But how are we to predict the situation?  What about
> the low-end router product that would be happier with the modulo-8
> window in LAPB?  If you use LAPB, then LAPB should be configured
> separately for the local situation.
> 
> What other protocols have you been thinking about?


I was considering some other negative-acknowledge protocols, such as
a ZMODEM-type protocol...


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Mon May 17 12:09:33 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvAYP-00002Ha@daver.bungi.com>; Mon, 17 May 93 12:08 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@Bungi.com
Subject: Compression CRC - needed?
Date: Mon, 17 May 1993 12:08:09 PDT
Message-ID: <m0nvAXw-00007zC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Since several people have commented, here are my reasons for wanting
an additional check - of some kind - on the compressed data stream.

1. With a method that keeps state between packets, a corrupt packet may
   affect the 'dictionary', preventing all further communications on the
   link.
2. A corrupt packet passed to the decompressor may do bad things, such as
   create a packet that is too long. The LZ78 based UNIX compress utility
   is a good (bad?) example of this - when faced with bogus input, it
   usually spews a huge string of garbage out, then core dumps.
3. The link-level CRC-16 is good at catching single- and 2-bit errors,
   but does not reliably detect longer errors.  High speed modems 
   are now using scramblers of length 16 or so, meaning that a single
   baud error on the line will corrupt many bits - reducing the 
   ability of the CRC-16 to catch the error.
4. CRC-32 cannot be implemented on much of the sync hardware that is
   already in the field.
5. Not all protocols implement higher-level checksums that will catch
   decompressor failures, like TCP/IP will.

I don't think that I am paranoid - it is just that most people 
assume that since there is a CRC on the data, it must be reliable.
There are many things that can fail between a packet coming in good,
and going out bad.  It costs very little to add a CRC-16 to the compressed
data stream, and even less for a Fletcher checksum.
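
For a sense of how cheap a Fletcher checksum is, here is a minimal Python
sketch.  The 16-bit variant with modulus 255 is an assumption made for the
example; no particular width is being proposed.

```python
def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums modulo 255, packed into 16 bits."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255   # simple sum of the bytes
        sum2 = (sum2 + sum1) % 255   # sum of the running sums (position-sensitive)
    return (sum2 << 8) | sum1
```

The per-byte work is two additions and two reductions, with no table and
no bit-by-bit loop.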

In my prototype, I am actually sending the CRC-16 of the original packet
down through the compressor, so the number of bits actually used on the
link is less than 16 in many cases.  This gives me the assurance that
the data that entered the compressor has a high probability of being the
same data that left the decompressor.
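
The idea of carrying a check of the original packet through the compressor
can be sketched in Python as follows.  This is not the prototype's code:
raw deflate from zlib stands in for the link compressor, the CRC uses the
reflected CCITT polynomial with the usual PPP FCS-16 parameters, and the
function names and the little-endian byte order of the check are
assumptions for the example.

```python
import zlib


def fcs16(data: bytes) -> int:
    """Bitwise CRC-16, reflected CCITT polynomial 0x8408, init and final XOR 0xFFFF."""
    fcs = 0xFFFF
    for byte in data:
        fcs ^= byte
        for _ in range(8):
            fcs = (fcs >> 1) ^ 0x8408 if fcs & 1 else fcs >> 1
    return fcs ^ 0xFFFF


def append_check(packet: bytes) -> bytes:
    """Append the CRC-16 of the original packet before compression."""
    return packet + fcs16(packet).to_bytes(2, "little")


def strip_and_verify(data: bytes) -> bytes:
    """Split off the trailing check and confirm it matches the packet."""
    if len(data) < 2:
        raise ValueError("too short to carry a check")
    packet, check = data[:-2], data[-2:]
    if fcs16(packet).to_bytes(2, "little") != check:
        raise ValueError("check mismatch after decompression")
    return packet


def compress_with_check(packet: bytes) -> bytes:
    comp = zlib.compressobj(wbits=-15)  # raw deflate: no built-in checksum
    return comp.compress(append_check(packet)) + comp.flush()


def decompress_and_verify(blob: bytes) -> bytes:
    try:
        data = zlib.decompress(blob, -15)
    except zlib.error as exc:
        raise ValueError("decompressor rejected the input") from exc
    return strip_and_verify(data)
```

Because the two check bytes pass through the compressor along with the
packet, they are themselves subject to compression.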

*ANY* check of the data is better than no check. I would be happy to
see ANYTHING, like an LRC, over no check at all.


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Mon May 17 12:55:35 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvBHN-00002Oa@daver.bungi.com>; Mon, 17 May 93 12:55 PDT
X-Path: fcr.com!brad
From: Brad Parker <brad@fcr.com>
To: ppp-comp@bungi.com
Subject: Re: link-level issues 
Date: Mon, 17 May 1993 15:47:28 -0400
Message-ID: <9305171948.AA14922@stemwinder.fcr.com>
References: <<dlr@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


>> If we choose a compression algorithm that requires state information,
>> a reliable link must be used.  LAPB seems to be a good choice, but has
>> some problems.
>> 
>> 1. If the simple form of LAPB is used (8 window), there is not enough
>>    data "in flight" for satellite links, or high speed links.
>> 
>> 2. If the 128 window version is used, there can be a lot of data buffers
>>    required.
>> 
>> 3. There is no selective retransmission in LAPB - you must resend the
>>    entire window from the dropped packet.
>> 
>> 
>> Should we mandate the 128-window version? Add selective retransmission?
>> Use another protocol?

I guess it depends on your "design center", but for modem work, such as
I am doing, a window of 8 is more than enough.

The one thing I would "add" to the LAPB protocol would be sending a REJ
if a "garbled" frame is received.  This is optional, and (I think) does
not break anything.  It should help interactive response on slow links.

Since I desire small windows, selective retransmit is not needed.

-brad




From owner-ppp-comp Mon May 17 12:55:39 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvBHN-0000Cea@daver.bungi.com>; Mon, 17 May 93 12:55 PDT
X-Path: acc.com!fbaker
From: fbaker@acc.com (Fred Baker)
To: ppp-comp@bungi.com
Subject: Re:  link-level issues
Date: Mon, 17 May 93 12:55:58 PDT
Message-ID: <9305171955.AA23781@saffron.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

I would suggest that we follow ISO 7776, which would answer these questions
in this way.

>> 1. If the simple form of LAPB is used (8 window), there is not enough
>>    data "in flight" for satellite links, or high speed links.
>> 
>> 2. If the 128 window version is used, there can be a lot of data buffers
>>    required.
>> 
>> 3. There is no selective retransmission in LAPB - you must resend the
>>    entire window from the dropped packet.
>>
>> Should we mandate the 128-window version? Add selective retransmission?
>> Use another protocol?

ISO 7776 describes another value, K, called the "window".  The window
is independent of the modulus, although it cannot exceed the modulus.
If one is using modulo 8, the window can be any value in 1..7; if one
is using modulo 128, the window can be any value from 1..127, and is
usually in 8..127.

I would suggest that we use vanilla LAPB and negotiate the window at
setup time.

Generally, one would like the window to approximate 

			# bits in largest message
	window = 2 * -------------------------------
			# bits in smallest message

as the following sequence CAN occur in LAPB:

	system A:		system B:
				starts sending 1500 byte message
	sends 100 byte message
	sends 100 byte message
	sends 100 byte message
	sends 100 byte message
	sends 100 byte message
	sends 100 byte message
	sends 100 byte message
	sends 800 bytes of idle sequence
				starts sending another 1500 byte message,
					acknowledging A's traffic
	sends 1500 bytes of idle sequence
				starts sending some third message
	now has window to send more.

Now, what you'd LIKE to have happen is for "A" to send an RR+P after
its messages, and for "B" to send an RR+F before sending its second
message.  I can't say whether any implementation does exactly that, but
I know of some that send the RR+P, think I know of at least one that
doesn't, and know of some that might respond to the RR+P *after* the
second large I frame.  Even in that case, however, you'd like the
window to be at least 1500/100=15.
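
The rule of thumb above can be sketched in C. This is only an
illustration: the function name and the clamping to modulus-1 are
assumptions made for the example, not text taken from ISO 7776.

```c
/* Hypothetical helper: estimate a LAPB window k from the rule of thumb
 * 2 * (largest message / smallest message), clamped to the legal
 * range 1..modulus-1. */
unsigned lapb_window_estimate(unsigned largest_bytes,
                              unsigned smallest_bytes,
                              unsigned modulus)
{
    unsigned k;

    if (smallest_bytes == 0 || modulus < 2)
        return 1;                       /* degenerate input: smallest window */

    /* Round the ratio up so a partial small frame still counts. */
    k = 2 * ((largest_bytes + smallest_bytes - 1) / smallest_bytes);

    if (k < 1)
        k = 1;
    if (k > modulus - 1)
        k = modulus - 1;                /* window can never reach the modulus */
    return k;
}
```

With 1500 byte and 100 byte messages this gives 30 under modulo 128,
and is clamped to 7 under modulo 8.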

3COM and ACC have both stated (in the past) a preference for using LAPB
because some or all of our equipment is able to use a hardware LAPB
implementation.  In this case, we would both prefer to not require use
of SREJ, as the hardware doesn't support it.  We could see negotiating
its use.  If you really want to use a different protocol, I would
suggest the use of the LAPB replacement suggested by John G. Fletcher
of Lawrence Livermore Labs in the SIGCOMM '84 paper "Serial Link
Protocol Design: A critique of the X.25 standard, Level 2"

There are a couple of reasons SREJ isn't real widely deployed, the most
telling to me being that links generally don't lose just a few bits
here and there, but rather lose blocks of information more
infrequently.  Links prone to *bit* errors tend to have forward error
correction (trellis coding) built into the modem.  When you lose a
block of data (DS1 or BONDING frame slip, DSU synchronization loss)
it's common to lose two or more frames.

I can't say I see overwhelming value in SREJ.

From owner-ppp-comp Mon May 17 16:03:51 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvED9-0000Aka@daver.bungi.com>; Mon, 17 May 93 16:03 PDT
X-Path: opal.acc.com!art
From: art@opal.acc.com (Art Berggreen)
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Mon, 17 May 93 13:46:20 PDT
Message-ID: <9305172046.AA06358@opal.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>
>I don't agree. In my experience, the reason for dropped packets is an
>out of resources situation on the receiving end. This causes packets
>to back up on the TX end until the retransmit timeout, at which time
>another burst of packets are sent, triggering another dropped packet,
>etc, etc.

I would submit that a properly designed and implemented LAPB should
go RNR before resources are completely exhausted.  (of course the
real world is not always this ideal)
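
One way to picture "go RNR before resources are completely exhausted"
is a pair of buffer watermarks. The sketch below is an assumption made
for illustration; the structure, names, and thresholds are hypothetical
and not taken from any LAPB implementation.

```c
/* Hypothetical receive-side flow control with hysteresis. */
struct rx_flow {
    unsigned low_water;   /* enter busy at or below this many free buffers */
    unsigned high_water;  /* leave busy at or above this many free buffers */
    int busy;             /* nonzero while the peer has been told RNR */
};

enum rx_action { RX_NONE, RX_SEND_RNR, RX_SEND_RR };

/* Decide which supervisory frame, if any, to send for the current
 * free buffer count. */
enum rx_action rx_flow_update(struct rx_flow *f, unsigned free_buffers)
{
    if (!f->busy && free_buffers <= f->low_water) {
        f->busy = 1;
        return RX_SEND_RNR;     /* stop the peer while headroom remains */
    }
    if (f->busy && free_buffers >= f->high_water) {
        f->busy = 0;
        return RX_SEND_RR;      /* buffers recovered, resume */
    }
    return RX_NONE;
}
```

The low watermark would normally be at least the number of frames the
peer can still have in flight when the RNR arrives.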

Art

From owner-ppp-comp Mon May 17 16:03:51 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvEDA-00000ua@daver.bungi.com>; Mon, 17 May 93 16:03 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Mon, 17 May 1993 16:58:21 -0400 (EDT)
Message-ID: <9305172058.AA25708@hobbit.gandalf.ca>
References: <<m0nv9WN-00007VC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 
> [In the message entitled "Re: link-level issues" on May 17,  8:16, brian@mail.barrnet.net writes:]
> > >3. There is no selective retransmission in LAPB - you must resend the
> > >   entire window from the dropped packet.
> > 
> > True, but whether or not this is an issue depends on the error rate on your
> > link.  Most links are pretty good these days so the likelihood that a
> > retransmission will be required is pretty small.  It may not be worth the
> > added complexity to switch from a go-back-N protocol to a selective
> > retransmission protocol.
> 
> I don't agree. In my experience, the reason for dropped packets is an
> out of resources situation on the receiving end. This causes packets
> to back up on the TX end until the retransmit timeout, at which time
> another burst of packets are sent, triggering another dropped packet,
> etc, etc.

That's what an RNR is for in LAPB.

From owner-ppp-comp Mon May 17 16:03:55 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvED9-0000ALa@daver.bungi.com>; Mon, 17 May 93 16:03 PDT
X-Path: opal.acc.com!art
From: art@opal.acc.com (Art Berggreen)
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Mon, 17 May 93 13:49:39 PDT
Message-ID: <9305172049.AA06366@opal.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>> 
>> What other protocols have you been thinking about?
>
>
>I was considering some other negative-acknowledge protocols, such as
>a ZMODEM-type protocol...

Let's please stick with mainstream solutions that have silicon options.
(like LAPB)

Art


From owner-ppp-comp Mon May 17 16:03:59 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvEDP-00002fa@daver.bungi.com>; Mon, 17 May 93 16:03 PDT
X-Path: opal.acc.com!art
From: art@opal.acc.com (Art Berggreen)
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Mon, 17 May 93 13:58:43 PDT
Message-ID: <9305172058.AA06381@opal.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>
>The one thing I would "add" to the LAPB protocol would be sending a REJ
>if a "garbled" frame is received.  This is optional, and (I think) does
>not break anything.  It should help interactive response on slow links.

What is your definition of garbled frame?  Just because a flag gets
corrected and the receiver starts trying to receive a frame shouldn't
cause a REJ to get generated.  Also, I think that such behavior is
at odds with ISO HDLC specs and may well break some LAPB conformance
tests.  Finally, most silicon implementations couldn't be expected
to do this.

Art


From owner-ppp-comp Mon May 17 16:04:00 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvEDU-00009La@daver.bungi.com>; Mon, 17 May 93 16:03 PDT
X-Path: opal.acc.com!art
From: art@opal.acc.com (Art Berggreen)
To: ppp-comp@bungi.com
Subject: Re:  Compression CRC - needed?
Date: Mon, 17 May 93 13:54:35 PDT
Message-ID: <9305172054.AA06375@opal.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>
>I don't think that I am paranoid - it is just that most people 
>assume that since there is a CRC on the data, it must be reliable.
>There are many things that can fail between a packet coming in good,
>and going out bad.  It costs very little to add a CRC-16 to the compressed
>data stream, and even less for a Fletcher checksum.
>
>In my prototype, I am actually sending the CRC-16 of the original packet
>down through the compressor, so the number of bits actually used on the
>link is less than 16 in many cases.  This gives me the assurance that
>the data that entered the compressor has a high probability of being the
>same data that left the decompressor.
>
>*ANY* check of the data is better than no check. I would be happy to
>see ANYTHING, like an LRC, over no check at all.

Let's make this an attribute of the compression option negotiated and
not required for basic operation.  In high speed synchronous links, the
compression and LAPB may be in hardware.  The prospect of having to
add additional hardware or perform a software CRC calculation on all
of the data is probably not acceptable.

Art


From owner-ppp-comp Mon May 17 16:04:24 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvEDB-0000DIa@daver.bungi.com>; Mon, 17 May 93 16:03 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Mon, 17 May 1993 16:55:13 -0400 (EDT)
Message-ID: <9305172055.AA25002@hobbit.gandalf.ca>
References: <<m0nvAXw-00007zC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 1. With a method that keeps state between packets, a corrupt packet may
>    affect the 'dictionary', preventing all further communications on the
>    link.

> 2. A corrupt packet passed to the decompressor may do bad things, such as
>    create a packet that is too long. The LZ78 based UNIX compress utility
>    is a good (bad?) example of this - when faced with bogus input, it
>    usually spews a huge string of garbage out, then core dumps.

> 3. The link-level CRC-16 is good at catching single and 2 bit errors,
>    but does not reliably detect longer errors.  High speed modems 
>    are now using scramblers of length 16 or so, meaning that a single
>    baud error on the line will corrupt many bits - reducing the 
>    ability of the CRC-16 to catch the error.

I realize the need for adequate error-correction. 
This is strictly a link-layer issue.  If you feel that LAPB is not good
enough, then we need an ultra-reliable link layer.  

Please don't move functionality to layers where it doesn't belong.

> 4. CRC-32 cannot be implemented on much of the sync hardware that is
>    already in the field.

Sure it can.  In software!

> 5. Not all protocols implement higher-level checksums that will catch
>    decompressor failures, like TCP/IP will.

True.  Some use a checksum of 0xFFFF.  Maybe we can too.  It compresses
quite well :-)

> I don't think that I am paranoid - it is just that most people 
> assume that since there is a CRC on the data, it must be reliable.

Most PPP advocates feel LAPB is overkill.  After all, aren't all PPP
links over digital circuits with an error rate of 10^-12 ?

As soon as a router touches the packet, the (use of a) FCS is destroyed.
One then tends to rely on the hardware/software of the router to deliver
it error-free.

CRC's are very good, but they're not perfect.  Then again, no amount of
additional checking will make it perfect.  What order of magnitude, if
any, in the detection of errors does your method achieve?  How does this
compare to FCS 32?  

> There are many things that can fail between a packet coming in good,
> and going out bad.  It costs very little to add a CRC-16 to the compressed
> data stream, and even less for a Fletcher checksum.
> 
> In my prototype, I am actually sending the CRC-16 of the original packet
> down through the compressor, so the number of bits actually used on the
> link is less than 16 in many cases.  

Oh, compressible CRC's.  Either the data isn't too random, or you've got
a real good compressor.  I always thought that CRC's would be white noise
to a compressor.

> This gives me the assurance that
> the data that entered the compressor has a high probability of being the
> same data that left the decompressor.

Yes.  I would kinda agree.  

Can you trust that the CRC was generated on good data or corrupt data?
Was the packet corrupted by the software before passing through the
compressor?  Is the compression software more or less reliable than the
rest of the router software?

Only the end-systems know for sure.  The fact is, there are so many places
in a bridge/router where an error can occur.  You're only going to solve
one of them, and then think the system is "error-free".
> 
> *ANY* check of the data is better than no check. I would be happy to
> see ANYTHING, like an LRC, over no check at all.

Yes, and no.  Any check costs time and more importantly bandwidth.  PPP
already has too much overhead.  I don't want more. 

At a minimum, any additional checking must be *optional*, and I for one
vote OFF as the default.

In my experience, any error in the data stream tends to cause very weird
behaviour in the decompressor.  One bit error tends to blow up the bridge.
Thank God for watchdogs!  Seriously, a lot of good a checksum will do
here.  

So I would suggest that everyone switch to arithmetic encoders which blow 
up nicely when an error happens :-)



From owner-ppp-comp Mon May 17 20:58:21 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvIoC-000083a@daver.bungi.com>; Mon, 17 May 93 20:57 PDT
X-Path: Novell.COM!Dave_Rand
From: Dave_Rand@Novell.COM (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re:  Compression CRC - needed?
Date: Mon, 17 May 1993 17:52:48 PDT
Message-ID: <9305180052.AA12163@va.SJF.Novell.COM>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

On May 17, 13:54, Art Berggreen wrote:
} Subject: Re:  Compression CRC - needed?
} >
} >*ANY* check of the data is better than no check. I would be happy to
} >see ANYTHING, like an LRC, over no check at all.
} 
} Let's make this an attribute of the compression option negotiated and
} not required for basic operation.  In high speed synchronous links, the
} compression and LAPB may be in hardware.  The prospect of having to
} add additional hardware or perform a software CRC calculation on all
} of the data is probably not acceptable.

Each compression protocol may implement an additional error check,
boundary check, or *some* check (or not) as it sees fit. Hardware 
to do CRC is trivial - but a simpler check would be fine. Even a
test to see if you have expanded past the end of a packet, or
something to check the validity of a new codeword, is better than
nothing. But it is, of course, optional with each new algorithm.
We can support 256 of them, after all.

The core algorithm(s) that are chosen should, in my opinion, have
sufficient tests to have a high degree of confidence in the data.

How much does this *really* cost?  Well, on my implementation, I
can run a T1 link, doing all my extra, wasteful CRC checking, on
a 386, and getting a reasonable compression ratio. All in software.

In a software implementation, you can do the CRC calculation as you
read each byte of the input packet (which you have to do anyway).
This adds one line of C to your compressor: 

	crc = (crc >> 8) ^ crctab[(crc ^ new_byte) & 0xff];

Or, if you are writing in assembler, it adds 4 lines of code
(forgive me - it is in 386 assembler):

	mov	bl,dl		; get lsb of old CRC
	xor	bl,new_byte	; xor in new byte
	shr	dx,8		; shift old value right 8 bits
	xor	dx,[ebp+ebx*2]	; ebp points to the crc table

Currently, I do the CRC on the packet, then compress the packet.
The CRC, on a 576 byte packet, takes about 300usec on a 386.
When I move to doing the CRC as I compress the data, I will eliminate
the second pass through the input packet, and my overall time will go
down.
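
For reference, a minimal table-driven version of the per-byte update
might look like the sketch below. The message does not say which
polynomial or initial value the prototype uses; the reflected CCITT
polynomial 0x8408 and a starting value of 0xFFFF (as in the PPP FCS-16)
are assumptions, as are the function names.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t crctab[256];

/* Build the lookup table for the reflected polynomial 0x8408
 * (assumed here; the original message does not name one). */
void crc16_init(void)
{
    for (unsigned i = 0; i < 256; i++) {
        uint16_t v = (uint16_t)i;
        for (int bit = 0; bit < 8; bit++)
            v = (v & 1) ? (uint16_t)((v >> 1) ^ 0x8408)
                        : (uint16_t)(v >> 1);
        crctab[i] = v;
    }
}

/* Fold len bytes into a running CRC, one table lookup per byte. */
uint16_t crc16_update(uint16_t crc, const uint8_t *buf, size_t len)
{
    while (len--)
        crc = (uint16_t)((crc >> 8) ^ crctab[(crc ^ *buf++) & 0xff]);
    return crc;
}
```

Calling crc16_update() inside the compressor's input loop is what
removes the second pass over the packet.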

It just doesn't take much time to do a CRC.  It takes even less time
to do a Fletcher, or LRC.  It makes it far more reliable, and we know
that the data path between the compressor and decompressor is good.
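
To show why the cheaper checks cost less per byte, here is a sketch of
an XOR-style LRC and a Fletcher-16 sum. Both are common textbook forms
chosen for illustration; the message does not specify which variants
were meant, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Longitudinal redundancy check: XOR of all bytes. */
uint8_t lrc8(const uint8_t *buf, size_t len)
{
    uint8_t x = 0;
    while (len--)
        x ^= *buf++;
    return x;
}

/* Fletcher-16: two running sums modulo 255, no table needed. */
uint16_t fletcher16(const uint8_t *buf, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    while (len--) {
        sum1 = (uint16_t)((sum1 + *buf++) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

For the five bytes "abcde", fletcher16() returns 0xC8F0 and lrc8()
returns 0x61.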

I'm not asking everyone to add this to all their hardware (although
I think it is a good idea), nor am I asking everyone to add it to all
software implementations (although I think it is ALSO a good idea).
I am suggesting that it is a *great* idea when talking to two different
implementations of a given algorithm, which will be the base of the
PPP compression.


-- 

From owner-ppp-comp Mon May 17 20:58:22 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvIoC-00002ra@daver.bungi.com>; Mon, 17 May 93 20:57 PDT
X-Path: acc.com!fbaker
From: fbaker@acc.com (Fred Baker)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Mon, 17 May 93 17:28:15 PDT
Message-ID: <9305180028.AA25465@saffron.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Dave (which one? I'll let you guess!):

>> > 4. CRC-32 cannot be implemented on much of the sync hardware that is
>> >    already in the field.
>> 
>> Sure it can.  In software!

While this is certainly true, I can't say that I'm prepared to run
right out and implement it.

>> Most PPP advocates feel LAPB is overkill.  After all, aren't all PPP
>> links over digital circuits with an error rate of 10^-12 ?

No, there are many PPP conversations over links of lower quality,
especially in the asynchronous and 56 KBPS domains, which together
comprise better than 80% of North American circuits.  Compression is of
the most interest at speeds where analog circuits are common.

In the event that retransmissions are truly unnecessary, the sum total
that LAPB costs you is the additional 12 bytes it takes to keep the
send and receive sequence numbers etc, and the half dozen instructions
required on input and output to determine that the line has once again
proven itself trustworthy.  The startup and shutdown state machines are
about as arduous as PPP's, perhaps less so, and in the big scheme of
things are equally a noise issue.

>> Any check costs time and more importantly bandwidth.
>> PPP already has too much overhead.  I don't want more.

And you want a software CRC-32?

>> So I would suggest that everyone switch to arithmetic encoders which
>> blow up nicely when an error happens :-)

Dave has been doing some work on various encoders, and you have as
well.  Perhaps you could send him an appropriate algorithm to document
and publish?

From owner-ppp-comp Mon May 17 20:58:22 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvIoI-00008La@daver.bungi.com>; Mon, 17 May 93 20:57 PDT
X-Path: bridge2.NSD.3Com.COM!vsp
From: Venkat Prasad <vsp@NSD.3Com.COM>
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed? 
Date: Mon, 17 May 93 18:27:25 -0700
Message-ID: <199305180127.AA21436@himagiri.NSD.3Com.COM>
Reply-To: ppp-comp@bungi.com
Organization: 3Com, 5400 Bayfront Plaza, Santa Clara, CA 95052-8145
Precedence: bulk


>> Since several people have commented, here are my reasons for wanting
>> an additional check - of some kind - on the compressed data stream.

>> 1. With a method that keeps state between packets, a corrupt packet may
>>    affect the 'dictionary', preventing all further communications on the
>>    link.
>> 2. A corrupt packet passed to the decompressor may do bad things, such as
>>    create a packet that is too long. The LZ78 based UNIX compress utility
>>    is a good (bad?) example of this - when faced with bogus input, it
>>    usually spews a huge string of garbage out, then core dumps.

Simply send the number of raw bytes in the compressed packet across. 
If decompression yields a different number of bytes, toss the packet.
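
A sketch of that length check, with the output also bounded so that
bogus input cannot overrun the buffer. The run-length decoder is only a
toy stand-in for a real decompressor, and every name here is
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy decoder: input is (count, value) pairs. Returns the number of
 * bytes written, or -1 if the input is malformed or the output would
 * exceed out_max. */
long rle_decode(const uint8_t *in, size_t in_len,
                uint8_t *out, size_t out_max)
{
    size_t n = 0;

    if (in_len % 2 != 0)
        return -1;
    for (size_t i = 0; i < in_len; i += 2) {
        size_t run = in[i];
        if (run > out_max - n)
            return -1;                  /* would run past the buffer */
        for (size_t j = 0; j < run; j++)
            out[n++] = in[i + 1];
    }
    return (long)n;
}

/* Accept the packet only if decoding succeeds and yields exactly the
 * raw byte count that was sent alongside it. */
int packet_accept(const uint8_t *in, size_t in_len, size_t claimed_len,
                  uint8_t *out, size_t out_max)
{
    long got = rle_decode(in, in_len, out, out_max);
    return got >= 0 && (size_t)got == claimed_len;
}
```

A packet whose decoded length disagrees with the claimed length, or
whose output would not fit, is simply tossed.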

>> 3. The link-level CRC-16 is good at catching single and 2 bit errors,
>>    but does not reliably detect longer errors.  High speed modems 
>>    are now using scramblers of length 16 or so, meaning that a single
>>    baud error on the line will corrupt many bits - reducing the 
>>    ability of the CRC-16 to catch the error.

>> 5. Not all protocols implement higher-level checksums that will catch
>>    decompressor failures, like TCP/IP will.

Most likely when decompression fails, the packet may not even look
like a TCP/IP packet.

>> I don't think that I am paranoid - it is just that most people 
>> assume that since there is a CRC on the data, it must be reliable.
>> There are many things that can fail between a packet coming in good,
>> and going out bad.  It costs very little to add a CRC-16 to the compressed
>> data stream, and even less for a Fletcher checksum.

Data compression requires a lot of CPU cycles. As such I am not for adding
additional CRC at the software level. 

>> *ANY* check of the data is better than no check. I would be happy to
>> see ANYTHING, like an LRC, over no check at all.

>> -- 
>> Dave Rand
>> {pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

/Prasad


From owner-ppp-comp Mon May 17 20:58:27 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvIoO-00008Ba@daver.bungi.com>; Mon, 17 May 93 20:57 PDT
X-Path: fcr.com!brad
From: Brad Parker <brad@fcr.com>
To: ppp-comp@bungi.com
Subject: Re: link-level issues 
Date: Mon, 17 May 1993 23:16:12 -0400
Message-ID: <9305180316.AA15788@stemwinder.fcr.com>
References: <<art@opal.acc.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


>> >The one thing I would "add" to the LAPB protocol would be sending a REJ
>> >if a "garbled" frame is received.  This is optional, and (I think) does
>> >not break anything.  It should help interactive response on slow links.
>> 
>> What is your definition of garbled frame?  Just because a flag gets

I would say a frame with bad fcs (usually due to a fifo overflow).

>> corrected and the receiver starts trying to receive a frame shouldn't
>> cause a REJ to get generated.  Also, I think that such behavior is
>> at odds with ISO HDLC specs and may well break some LAPB conformance
>> tests.

really?  I'm curious how it would break.

>>  Finally, most silicon implementations couldn't be expected
>> to do this.

of course.  but if s/w implementations did, it could help.

Apple Macintosh and IBM PC's with cheap serial chips have little or
no fifo.  Worse, they have unpredictable interrupt latency.
(but then, so does 386BSD ;-)   Because of this they tend to suffer from
overruns at an alarming rate.  These overruns, more often than not,
manifest themselves as FCS errors (and, in many cases the serial chip
will also indicate an overrun, but the O/S & its driver(s) often get
in the way and obscure this).

Sending a REJ seems like a harmless way for these little systems to indicate
that the last packet was not received correctly.

Would the LAPB conformance tests you indicated claim that the last packet
was not out of sequence and hence should not be rejected?  I can see this
from a strict interpretation, but I would say the conformance test and
real world considerations may conflict on this point.  The spec seems
to say that if the sender gets a REJ, it should resend, right?

MNP uses this same technique to great advantage.  

-brad

From owner-ppp-comp Mon May 17 20:58:33 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvIoE-00008Ja@daver.bungi.com>; Mon, 17 May 93 20:57 PDT
X-Path: Novell.COM!Dave_Rand
From: Dave_Rand@Novell.COM (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Mon, 17 May 1993 18:14:10 PDT
Message-ID: <9305180114.AA12709@va.SJF.Novell.COM>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

On May 17, 16:55, Dave Carr wrote:
} 
} I realize the need for adequate error-correction. 
} This is strictly a link-layer issue.  If you feel that LAPB is not good
} enough, then we need an ultra-reliable link layer.  
} 
} Please don't move functionality to layers where it doesn't belong.

If that were the case, each application would do its own compression,
and we wouldn't need link-level compression :-)

Passing bad data to a decompressor may cause it to emit unbounded amounts
of information (like a circular reference in an LZ78 algorithm). Since
the algorithm doesn't check (and probably can't check) for circular
references, we need to give it some help.  I submit that this is
the correct layer.

} 
} > 4. CRC-32 cannot be implemented on much of the sync hardware that is
} >    already in the field.
} 
} Sure it can.  In software!

No, it can't.  Much of the sync hardware in the world is based on the
Zilog SCC, which simply can't read the last few bits of a user-generated
CRC. Yes, you can send 32 bit CRC's with an SCC, but you can't check them.


} CRC's are very good, but they're not perfect.  Then again, no amount of
} additional checking will make it perfect.  What order of magnitude, if
} any, in the detection of errors does your method achieve?  How does this
} compare to FCS 32?  

My method (checking the decompressor output) will fail to catch
<0.0000000233% of errors. Straight CRC-16 will fail to catch 0.0015%
of errors. Straight CRC-32 will fail to catch 0.0000000233% of errors.
I say less than CRC-32, because I am currently performing two
separate checks, in addition to the link-level CRC-16. How much more
these two checks add is open to debate (since I am checking length
and CRC - a topic for tomorrow).
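
The percentages above follow from treating an n-bit check as missing
one corruption in 2^n. A small sketch of that arithmetic, assuming the
idealized case where every check value is equally likely after a
corruption (the function name is hypothetical):

```c
/* Percentage of random corruptions that an n-bit check would fail to
 * catch, assuming all 2^n check values are equally likely. */
double undetected_percent(unsigned check_bits)
{
    double p = 100.0;

    while (check_bits--)
        p /= 2.0;           /* each extra check bit halves the miss rate */
    return p;
}
```

undetected_percent(16) is about 0.0015 and undetected_percent(32) is
about 0.0000000233, matching the CRC-16 and CRC-32 figures quoted.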

} 
} Oh, compressible CRC's.  Either the data isn't too random, or you've got
} a real good compressor.  I always thought that CRC's would be white noise
} to a compressor.

Brain warp. I was thinking of the case where I was testing repeating
patterns, and the CRC's were within the window of the compressor I
was using. Please ignore this - CRC's in the compressed data stream will
almost always be white noise, using 16 or more bits.

} Yes.  I would kinda agree.  
} 
} Can you trust that the CRC was generated on good data or corrupt data?
} Was the packet corrupted by the software before passing through the
} compressor?  Is the compression software more or less reliable than the
} rest of the router software?

The most unreliable part of the link will of course be the path between
the routers.  My point was, and is, if you don't have a mechanism to
DETECT a failure in the compressor/decompressor, you will no longer be
able to use the link because the compressor and decompressor are out
of sync.


} > 
} > *ANY* check of the data is better than no check. I would be happy to
} > see ANYTHING, like an LRC, over no check at all.
} 
} Yes, and no.  Any check costs time and more importantly bandwidth.  PPP
} already has too much overhead.  I don't want more. 

ANY check is better than no check. Adding ANY form of secondary check will
allow you to close down or re-initialize the link. Any check adds an
insignificant amount of time to a compression algorithm. We are adding,
AT MOST a 0.3% overhead (for 512 byte packets). Are there many compression
algorithms that add less overhead, when faced with incompressible input
data?

} 
} In my experience, any error in the data stream tends to cause very weird
} behaviour in the decompressor.  One bit error tends to blow up the bridge.
} Thank God for watchdogs!  Seriously, a lot of good a checksum will do
} here.  

This is my point. Here, the checksum will do a LOT of good - preventing the
bridge from blowing up (core dumping :-).




-- 

From owner-ppp-comp Mon May 17 23:10:18 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvKsK-0000EGa@daver.bungi.com>; Mon, 17 May 93 23:10 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Mon, 17 May 1993 23:10:03 PDT
Message-ID: <m0nvKsF-0000OfC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: link-level issues" on May 17, 13:49, Art Berggreen writes:]
> >> 
> >> What other protocols have you been thinking about?
> >
> >
> >I was considering some other negative-acknowledge protocols, such as
> >a ZMODEM-type protocol...
> 
> Let's please stick with mainstream solutions that have silicon options.
> (like LAPB)
> 

Agreed.

Do we have agreement on:

LAPB
Negotiate modulus: 8 or 128
Negotiate window: 1 to 128


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Mon May 17 23:58:01 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvLcC-00004va@daver.bungi.com>; Mon, 17 May 93 23:57 PDT
X-Path: mail.barrnet.net!brian
From: brian@mail.barrnet.net
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Mon, 17 May 1993 23:14:13 -0800
Message-ID: <9305180614.AA18410@Angband.Stanford.EDU>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>I don't agree. In my experience, the reason for dropped packets is an
>out of resources situation on the receiving end. This causes packets
>to back up on the TX end until the retransmit timeout, at which time
>another burst of packets are sent, triggering another dropped packet,
>etc, etc.
>
>-- 
>Dave Rand
>{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

Sounds like an implementation problem to me.  Using a selective
retransmission algorithm to solve the receiver's buffer starvation problem
strikes me as similar to trying to drive a screw with a hammer.  

RNR is useful but if you have an inherent buffer starvation problem in the
receiver, any excess in latency can cause the sender to send one frame too
many before it receives the RNR thus necessitating a retransmission.  RNR
makes sense as a solution to a transient buffer starvation problem but not
a solution to insufficient receive buffers from the get-go.

Mr. Carr suggested, "In the spirit of PPP, let's negotiate the LAPB window
size."  That makes more sense as a solution to buffer starvation than does
selective retransmission or reliance on RNR.

Brian Lloyd                                       3420 Sudbury Road
brian@lloyd.com                                   Cameron Park, CA  95682
brian@mail.barrnet.net                            (916) 676-3442 - fax
(415) 725-1392                                    (916) 676-1147 - voice


From owner-ppp-comp Tue May 18 09:19:48 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvUNm-0000BVa@daver.bungi.com>; Tue, 18 May 93 09:19 PDT
X-Path: acc.com!fbaker
From: fbaker@acc.com (Fred Baker)
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Tue, 18 May 93 09:01:44 PDT
Message-ID: <9305181601.AA25676@saffron.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>> Do we have agreement on:

>> LAPB
>> Negotiate modulus: 8 or 128
>> Negotiate window: 1 to 128

I think so. A suggestion: if the negotiated window < 8, use modulo 8,
else use modulo 128.  We only REALLY need to negotiate the window.

And actually, as long as we agree on the modulus, there is no
requirement that the windows be equal.  The negotiation could (in my
opinion, should) read:

	I am willing to receive <window> frames without acknowledging

and if an implementation configured for modulo 8 receives a CONFIGURE
REQUEST indicating window > 7, it should NAK with a window of 7.
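
The two rules suggested above could be sketched as follows. The
function names and the "0 means acceptable" return convention are
assumptions for the example, and the handling of a zero window is not
covered in the discussion.

```c
enum { LAPB_MOD8 = 8, LAPB_MOD128 = 128 };

/* Derive the modulus from the window: below 8 use modulo 8,
 * otherwise modulo 128. */
unsigned lapb_modulus_for_window(unsigned window)
{
    return window < 8 ? LAPB_MOD8 : LAPB_MOD128;
}

/* Check a window received in a CONFIGURE REQUEST against the locally
 * configured modulus. Returns 0 if the window is acceptable, else the
 * value to suggest in the NAK. */
unsigned lapb_check_window(unsigned local_modulus, unsigned requested)
{
    unsigned max = local_modulus - 1;   /* window cannot reach the modulus */

    if (requested == 0)
        return 1;       /* assumption: zero is treated as invalid */
    if (requested > max)
        return max;     /* e.g. modulo 8 NAKs with 7 */
    return 0;
}
```

A modulo 8 implementation asked for a window of 15 would NAK with 7,
and a modulo 128 implementation asked for 128 would NAK with 127.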

From owner-ppp-comp Tue May 18 09:19:53 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvUNp-00002La@daver.bungi.com>; Tue, 18 May 93 09:19 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 18 May 1993 09:22:32 -0400 (EDT)
Message-ID: <9305181322.AA08486@hobbit.gandalf.ca>
References: <<9305180114.AA12709@va.SJF.Novell.COM>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 
> If that were the case, each application would do its own compression,
> and we wouldn't need link-level compression :-)

An old argument, but a goody.  I don't care to open that can o' worms.
> 
> Passing bad data to a decompressor may cause it to emit unbounded amounts
> of information (like a circular reference in an LZ78 algorithm). Since
> the algorithm doesn't check (and probably can't check) for circular
> references, we need to give it some help.  I submit that this is
> the correct layer.
> } 
> } > 4. CRC-32 cannot be implemented on much of the sync hardware that is
> } >    already in the field.
> } 
> } Sure it can.  In software!
> 
> No, it can't.  Much of the sync hardware in the world is based on the
> Zilog SCC, which simply can't read the last few bits of a user-generated
> CRC. Yes, you can send 32 bit CRC's with an SCC, but you can't check them.

Damn Zilog devices.  Take a simple serial device and let Zilog make it, poof.
They could screw up the Lord's prayer.  Seriously, we happen to use the Zilog
16C30.  Errata city.

But you could do a 30-bit CRC then, ignoring the last 2 bits.  But I'll
concede the point to Dave.
> 
> ANY check is better than no check. Adding ANY form of secondary check will
> allow you to close down or re-initialize the link. Any check adds an
> insignificant amount of time to a compression algorithm. We are adding,
> AT MOST a 0.3% overhead (for 512 byte packets). Are there many compression
> algorithms that add less overhead, when faced with incompressible input
> data?

Yes.  I can add a *fraction* of a bit to indicate incompressible data.
Of course, I don't have to pad to a byte boundary since I'm using an HDLC
controller that supports residuals.  Damn protocols (PPP included) that waste 
bits padding.

Your calculation must be on the uncompressed data.  At 4:1, an extra 16-bit 
CRC is 1.6% of link bandwidth.
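
The difference between the two overhead figures is just the choice of
denominator. A rough sketch of the arithmetic, with a hypothetical
function name and ignoring framing bytes:

```c
/* Overhead of a per-packet check as a percentage of the payload bytes
 * that actually cross the link, given a compression ratio
 * (1.0 means uncompressed, 4.0 means 4:1). */
double check_overhead_percent(unsigned packet_bytes,
                              unsigned check_bytes,
                              double ratio)
{
    double on_wire = packet_bytes / ratio;  /* payload after compression */

    return 100.0 * check_bytes / on_wire;
}
```

For a 512 byte packet and a 2 byte CRC this gives 0.390625 against the
uncompressed data and 1.5625 at 4:1, which is where the roughly 1.6%
comes from.
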
> 
> } 
> } In my experience, any error in the data stream tends to cause very weird
> } behaviour in the decompressor.  One bit error tends to blow up the bridge.
> } Thank God for watchdogs!  Seriously, a lot of good a checksum will do
> } here.  
> 
> This is my point. Here, the checksum will do a LOT of good - preventing the
> bridge from blowing up (core dumping :-).

Only if it gets a chance to be calculated :-)  I guess then you need also a 
mechanism for resetting the compressor.  On a multi-link, this could involve
tearing down all the links, or issuing a multi-link reset.

Seriously, I could see adding a length of uncompressed data to the packet.
I should also be free to compress the uncompressed length.  But then again,
I already do that for 802.3 MAC types.


From owner-ppp-comp Tue May 18 09:19:59 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvUNl-0000B1a@daver.bungi.com>; Tue, 18 May 93 09:19 PDT
X-Path: opal.acc.com!art
From: art@opal.acc.com (Art Berggreen)
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Tue, 18 May 93 08:50:25 PDT
Message-ID: <9305181550.AA07295@opal.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>> Let's please stick with mainstream solutions that have silicon options.
>> (like LAPB)
>> 
>
>Agreed.
>
>Do we have agreement on:
>
>LAPB
>Negotiate modulus: 8 or 128
>Negotiate window: 1 to 128

I vote for window negotiation anywhere between 1 and 127 (128 is illegal).

Art


From owner-ppp-comp Tue May 18 10:29:24 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvVTC-0000Dma@daver.bungi.com>; Tue, 18 May 93 10:28 PDT
X-Path: opal.acc.com!art
From: art@opal.acc.com (Art Berggreen)
To: ppp-comp@bungi.com
Subject: Re: link-level issues
Date: Tue, 18 May 93 08:44:55 PDT
Message-ID: <9305181544.AA07288@opal.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


>>> What is your definition of garbled frame?  Just because a flag gets
>
>I would say a frame with bad fcs (usually due to a fifo overflow).

Hmm, most of the bit-stuffing chips I've dealt with have the FCS calculation
in the serial interface and continue to process the FCS properly even if
the FIFO overflows.  The overrun is reported as a separate error bit from
FCS error.

>>> corrected and the receiver starts trying to receive a frame shouldn't
>>> cause a REJ to get generated.  Also, I think that such behavior is
>>> at odds with ISO HDLC specs and may well break some LAPB conformance
>>> tests.
>
>really?  I'm curious how it would break.

Over time, formal conformance suites seem to be getting more and more
specific.  The GOSIP-2 tests based on ISO-8882 are extremely specific.
Many of these tests send various types of bad packets (misaddressed, short,
overlength, illegal control field, bad FCS, etc) and expect such packets to
be silently discarded.
>
>>>  Finally, most silicon implementations couldn't be expected
>>> to do this.
>
>of course.  but if s/w implementations did, it could help.
>
>Apple Macintosh and IBM PC's with cheap serial chips have little or
>no fifo.  Worse, they have unpredictable interrupt latency.
>(but then, so does 386BSD ;-)   Because of this they tend to suffer from
>overruns at an alarming rate.  These overruns, more often than not,
>manifest themselves as FCS errors (and, in many cases the serial chip
>will also indicate an overrun, but the O/S & its driver(s) often get
>in the way and obscure this).
>
>Sending A REJ seems like a harmless way for these little systems to indicate
>that the last packet was not received correctly.

A REJ specifically means that the sequence number on the last I-frame was
in-window but not the one expected, not that an arbitrary error occurred.
In most other cases an unexpected frame (with good FCS) will cause a FRMR
response.

>
>Would the LAPB conformance tests you indicated claim that the last packet
>was not out of sequence and hence should not be rejected?  I can see this
>from a strict interpretation, but I would say the conformance test and
>real world considerations may conflict on this point.  the spec seems
>to say that if the sender gets a REJ, it should resend, right?

ISO-7776 4.4.3:
    "Any frame received from the DCE/remote DTE which is invalid
     (see 3.8) shall be discarded by the DTE and no action shall
     be taken as a result of that frame."

     (3.8 includes FCS errors)
>
>MNP uses this same technique to great advantage.  

MNP is not LAPB.

>-brad

Art


From owner-ppp-comp Tue May 18 12:28:00 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvXKK-0000Nsa@daver.bungi.com>; Tue, 18 May 93 12:27 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 18 May 1993 10:18:55 -0400 (EDT)
Message-ID: <9305181418.AA19874@hobbit.gandalf.ca>
References: <<9305180028.AA25465@saffron.acc.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> >> Most PPP advocates feel LAPB is overkill.  After all, aren't all PPP
> >> links over digital circuits with an error rate of 10^-12 ?
> 
> No, there are many PPP conversations over links of lower quality,
> especially in the asynchronous and 56 KBPS domains, which together
> comprise better than 80% of North American circuits.  Compression is of
> the most interest at speeds where analog circuits are common.

You don't have to sell me on LAPB Fred.  I've been doing compression on
muxes and bridges for 5 years.  We are one of the few companies, including
ACC that have LAPB over PPP working.
> 
> >> Any check costs time and more importantly bandwidth.
> >> PPP already has too much overhead.  I don't want more.
> 
> And you want a software CRC-32?

No. I don't want a software CRC.  We happen to have hardware that can do it.
But for those people that just can't do it in hardware, the option is (or almost
is) there (SCC).  
> 
> >> So I would suggest that everyone switch to arithmetic encoders which
> >> blow up nicely when an error happens :-)
> 
> Dave has been doing some work on various encoders, and you have as
> well.  Perhaps you could send him an appropriate algorithm to document
> and publish?

We intend to sell our code, not give it away.  However, the price is reasonable
and is a one time license.  Refrain from asking about the cost and terms just
yet, as management is tossing around the details.

The algorithm is an LZ77 (history buffer) front end, followed by arithmetic
encoding.  It is CPU intensive, but produces the highest compression ratio
of any algorithm I tried.  Note, on Dave Rand's test, this FZA algorithm
scored 2.67:1, while HPACK and ZIP scored 2.7x.  

Here are results of FTPing the Calgary Corpus over our 5220 bridge.  This
testing was performed on the released software, and I've added some goodies
since then.  The numbers are compared relative to ZIPping (Info-ZIP 1.9) the 
file before transmission on an uncompressed link.  


                     Size    ZIPped     ZIP    5220       ZIP      5220     5220/
                                      Ratio   Ratio  KBytes/s  KBytes/s       ZIP
CALGARY CORPUS 

bib                111261     35142     3.2     2.7      23.7      20.0      0.84
book1              768761    313459     2.5     2.3      18.4      17.0      0.92
book2              610856    206736     3.0     2.7      22.2      20.0      0.90
geo                102400     68575     1.5     1.5      11.2      11.0      0.98
news               377109    144570     2.6     2.4      19.6      18.0      0.92
obj1                21504     10407     2.1     2.0      15.5      15.0      0.97
obj2               246814     81678     3.0     2.8      22.7      21.0      0.93
        obj.tar    278528     92197     3.0     2.8      22.7      21.0      0.93
paper1              53161     18657     2.8     2.7      21.4      20.0      0.94
paper2              82199     29838     2.8     2.5      20.7      19.0      0.92
paper3              42526     18180     2.3     2.5      17.5      19.0      1.08
paper4              13286      5621     2.4     2.5      17.7      19.0      1.07
paper5              11954      5080     2.4     2.5      17.6      19.0      1.08
paper6              38105     13314     2.9     2.8      21.5      21.0      0.98
      paper.tar    253952     88191     2.9     2.7      21.6      20.0      0.93
pic                513216     56028     9.2     7.7      68.7      58.0      0.84
progc               39611     13356     3.0     2.7      22.2      20.0      0.90
progl               71646     16357     4.4     4.0      32.9      30.0      0.91
progp               49379     11310     4.4     4.0      32.7      30.0      0.92
       prog.tar    160636     40916     3.9     3.9      29.4      29.0      0.98


From owner-ppp-comp Tue May 18 12:29:08 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvXLP-0000Mka@daver.bungi.com>; Tue, 18 May 93 12:28 PDT
X-Path: fcr.com!brad
From: Brad Parker <brad@fcr.com>
To: ppp-comp@bungi.com
Subject: Re: link-level issues 
Date: Tue, 18 May 1993 14:58:16 -0400
Message-ID: <9305181858.AA17747@stemwinder.fcr.com>
References: <<art@opal.acc.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


>> >>> What is your definition of garbled frame?  Just because a flag gets
>> >
>> >I would say a frame with bad fcs (usually due to a fifo overflow).
>> 
>> Hmm, most of the bit-stuffing chips I've dealt with have the FCS calculation
>> in the serial interface and continue to process the FCS properly even if
>> the FIFO overflows.  The overrun is reported as a separate error bit from
>> FCS error.

Perhaps if I state my intentions, my comments will be more clear.

I am interested in/involved with/implementing PPP on end-systems and
on servers.  None of the end-systems or servers I use have hardware
which can do LAPB.  Most have 8530 or 8250 class serial chips
(16550A's are a luxury).

I have done several implementations of PPP and all have done FCS in 
software.  All have suffered from serial overruns which caused FCS errors.
(one could signal overruns from the chip, one could not)

When bytes are dropped the frame will have a bad FCS and the upper level
protocols are left to retransmit.  With some protocols this is not a
win (in fact, it's a big lose).  So, LAPB looks like a nice way to
ensure more "reliability" in the data link.  Along with reliability,
I'd like some measure of responsiveness so the upper level protocols
don't have to compensate for a lossy link (since appletalk and ipx are
tuned for LANs, not WANs)

>> Over time, formal conformance suites seem to be getting more and more
>> specific.  The GOSIP-2 tests based on ISO-8882 are extremely specific.
>> Many of these tests send various types of bad packets (misaddressed, short,
>> overlength, illegal control field, bad FCS, etc) and expect such packets to
>> be silently discarded.

I understand this.  If you want to be strictly conformant what I am
proposing won't pass this test.  However, does the spec specifically
prohibit sending a REJ frame in response to a frame with bad FCS?
(perhaps it does, I'm not sure)

>> >Sending A REJ seems like a harmless way for these little systems to indicate
>> >that the last packet was not received correctly.
>> 
>> A REJ specifically means that the sequence number on the last I-frame was
>> in-window but not the one expected, not that an arbitrary error occurred.
>> In most other cases an unexpected frame (with good FCS) will cause a FRMR
>> response.

I understand, but FRMR specifically will not cause a retransmit.  what I
desire is a quick retransmission of the last frame.

Is sending a REJ in this case prohibited?  Will it break an implementation?

>> ISO-7776 4.4.3:
>>     "Any frame received from the DCE/remote DTE which is invalid
>>      (see 3.8) shall be discarded by the DTE and no action shall
>>      be taken as a result of that frame."
>> 
>>      (3.8 includes FCS errors)

yes, I understand and I agree that is what the spec says. And I agree
that a 100% conformant implementation would not do what I am
proposing.  However, being 100% conformant would only make sense if
the spec were perfect and completely solved the problem.

[sorry, this was not intended to be a flame]

>> >MNP uses this same technique to great advantage.  
>> 
>> MNP is not LAPB.

Yes, but MNP's retransmission strategy makes it nice for end-systems which
drop characters.

I am not so much interested in a 100% conformant LAPB as I am in getting
more reliability in the link to compensate for poorly designed transport
protocols.  I can't fix the transport protocols, but I can control the 
data link they ride on.

You'll laugh, but in order to get all those Macintosh users to stop
using proprietary ARA and start using PPP, it has to provide a good
"user experience".  In order to work well for Appletalk (and IPX, I'll
claim), the data link must be responsive and reliable. I wish I could
rewrite the transports to make this not true, but I can't.  I wish I
could add a better serial port to every Mac and PC out there, but I
can't do that either.  

I am hoping I can win with LAPB, but I'm not sure waiting for a
retransmit at the LAPB level is any better (in terms of throughput)
than waiting for a retransmit at the Appletalk transport layer.  One
of the "nice" things about ARA is that since it rides on MNP, it gets
this "quick retransmit when receiver overruns" behavior.  This keeps
the Appletalk protocols from needing to retransmit (they do
anyway, but it's for other reasons - a different matter).

Am I alone on this?  Anyone else doing dial-in clients for PPP which
run on Mac or PC's?

-brad

From owner-ppp-comp Tue May 18 12:45:34 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvXbK-0000Mja@daver.bungi.com>; Tue, 18 May 93 12:45 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: compression document from Novell
Date: Tue, 18 May 1993 12:45:19 PDT
Message-ID: <m0nvXbD-0000OTC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

The following document was written for the purpose of evaluating
a number of compression schemes for inclusion into the Novell 
MPR product.  I have mangled the text from the original Microsoft
Word format to troff, and while I'm better at troff than I am at
Word - there may be erros^Hrs :-)

Enjoy... flames to me, ideas to the group, please.

.BS 
\l'6i'
.BE
.deTP
'sp
.)K
.af;P \\gP
.afP 1
.nr;P \\nP
.afP \\g(;P
.af;P 1
.ie\\n(Pv \{\
.ie(\\n(Pv=1)&(\\n(;P>1) 'sp 2
.el\{\
.ce
.ul
PRIVATE
.sp\} \}
.el'sp 2
.if!\\n(;P-1 .if \\nN 'sp
.if!\\n(;P-1 .if \\n(:S .tl \\*(}t
.if!\\n(;P-1 .if !\\nN .tl \\*(}t
.if\\n(;P-1 .ie \w'\\*(]n' .tl '\\*(]n - \\nP'''
.el.tl \\*(}t
\l'6i'
.sp
..
.nr Cl 4
.nr Hs 2
.nr Hb 3
.ds HF 3 3 3 3 3 3 3
.ds HP 12 11 10 10 10 10 10
.nr Ej 1
.PH "'Novell, Inc.'Compression for Wide Area Networks''"
.PF "'Introduction'' \\\\nP'"
.H 1 "Introduction"
This short paper will present a number of compression options and 
alternatives.  It is hoped that it will provide ample information to 
choose an appropriate compression algorithm and method for 
implementation in the Novell Wide Area Networking product.

It is desired to perform some type of compression on the data to be 
transmitted to a remote site to increase the apparent bandwidth of 
the (probably) low-speed link.  This compression can take many 
forms, and can be applied at one or more layers of a networking 
protocol.  In this paper, implementation at the data link layer 
will be explored.

In selecting a compression algorithm, it is necessary to consider the 
following items:

.BL
.LI "Compression ratio"
.br
This is the ratio of the data received by the compression routine 
to the data emitted from the routine.  For example, 300,000 bytes 
received and 100,000 bytes transmitted would be a 3:1 compression ratio.

.LI "Speed"
.br
The algorithm's speed must be considered, 
especially when Novell protocols are involved.

.LI "Size of memory required to implement the protocol"
.br
The amount of dictionary space required in RAM for each protocol.
.LE

.H 2 "Description of Method"

To test the various compression methods, some representative 
data was required.  This data was obtained from a public domain 
collection of binary, text and image files, collected by T.C. Bell 
and J.G. Cleary.  Collectively, these files are referred to as the 
Calgary/Canterbury text compression corpus.  This corpus is used 
in the book:

.in 1i
Bell, T.C., Cleary, J.G. and Witten, I.H. Text compression.
Prentice Hall, Englewood Cliffs, NJ, 1990
.in 0

and in the survey paper

.in 1i
Bell, T.C., Witten, I.H. and Cleary, J.G. "Modeling for text
compression," Computing Surveys 21(4): 557-591; December 1989
.in 0

The corpus is used to evaluate the practical performance of various 
text compression schemes.  Several other researchers are now using the 
corpus to evaluate text compression schemes.

Nine different types of text are represented, and to confirm that 
the performance of schemes is consistent for any given type, many of 
the types have more than one representative.  Normal English, 
both fiction and non-fiction, is represented by two books and 
papers (labeled book1, book2, paper1, paper2, paper3, paper4, 
paper5, paper6).  

More unusual styles of English writing are found in a bibliography 
(bib) and a batch of unedited news articles (news). Three computer 
programs represent artificial languages (progc, progl, progp). A 
transcript of a terminal session (trans) is included to indicate 
the increase in speed that could be achieved by applying compression 
to a slow line to a terminal.  All of the files mentioned so 
far use ASCII encoding.  Some non-ASCII files are also 
included: two files of executable code (obj1, obj2), some 
geophysical data (geo), and a bit-map black and white 
picture (pic).  The file geo is particularly difficult to 
compress because it contains a wide range of data values, 
while the file pic is highly compressible because of large 
amounts of white space in the picture, represented by long 
runs of zeros.

More details of the individual texts are given in the book 
mentioned above. Both book and paper give the results of 
compression experiments on these texts.

The corpus itself consists of the files bib, book1, book2, geo, news, 
obj1, obj2, paper1, paper2, paper3, paper4, paper5, 
paper6, pic, progc, progl, progp and trans.  (The book and paper 
above do not give results for files paper3, paper4,
paper5 or paper6.)

.TB "Input File descriptions"
.TS
box, center;
l | l | l | l | l
l | l | n | n | n.
Filename	Description	Lines	Words	Characters
_
bib	Bibliographic files (refer format)	6280	19274	111261 
book1	Hardy: Far from the madding crowd	16622	141274	768771 
book2	Witten: Principles of computer speech	15634	101221	610856 
geo	Geophysical data	18	617	102400 
news	News batch file	10059	53939	377109 
obj1	Compiled code for Vax: compilation of progp	87	495	21504 
obj2	T{
Compiled code for Apple Macintosh: 
Knowledge support system
T}	1213	4600	246814
paper1	T{
Witten, Neal and Cleary: 
Arithmetic coding for data compression
T}	1250	8512	53161 
paper2	Witten: Computer (in)security	1731	13829	82199 
paper3	Witten: In search of "autonomy"	1100	7219	46526 
paper4	Cleary: Programming by example revisited	294	2166	13286 
paper5	Cleary: A logical implementation of arithmetic	320	2099	11954 
paper6	T{
Cleary: Compact hash tables using 
bidirectional linear probing
T}	1019	6753	38105 
pic	T{
Picture number 5 from the CCITT 
Facsimile test files (text + drawings)
T}	0	49	513216
progc	C source code: compress version 4.0	1487	6313	39611 
progl	Lisp source code: system software	2244	9235	71646 
progp	T{
Pascal source code: prediction by partial 
matching evaluation program
T}	1966	4847	49379 
trans	Transcript of a session on a terminal	2737	9255	93695 
Total		64061	391697	3251493
.TE


These files, on their own, do not correctly represent traffic 
between routers.  They do, however, give a good base indication of 
common types of files that are likely to occur.  To more 
closely represent real traffic, a separate program was 
written to combine all 18 of these files into a single data 
stream, with PPP (Point-to-Point Protocol), 
IPX (Internetwork Packet eXchange), and NCP 
(Netware Core Protocol) headers prepended to the data.  Data from 
each of the 18 files was read, 512 bytes at a time, until the 
data was exhausted.  The 18 separate data streams were 
multiplexed in round-robin order into a single data stream of 
3,531,333 bytes (including headers).  For this (contrived) example, 
the IPX and NCP overhead was about 8%.  The program in the 
following figure produced this data stream.



.FG "Makepack.c"
.ft CW
.nf
#include <stdio.h>
#include <stdlib.h>	/* rand(), exit() */
#include "ipx.h"
#define NUMHOSTS 18

struct NCPPACK Hosts[NUMHOSTS];

char *Files[NUMHOSTS] = {
"../files/bib","../files/book1","../files/book2","../files/geo",
"../files/news","../files/obj1","../files/obj2","../files/paper1",
"../files/paper2","../files/paper3","../files/paper4","../files/paper5",
"../files/paper6","../files/pic","../files/progc","../files/progl",
"../files/progp","../files/trans"};

unsigned long Filepos[NUMHOSTS];


main(argc,argv)
int argc;
char **argv;
{
    int x,y,done;
    FILE *fo;
    struct ipx *ipx;
    struct ncp *ncp;
    if (argc < 2) {
	fprintf(stderr,"Usage: %s outfile\n",argv[0]);
	fprintf(stderr,"Builds an output file with PPP/IPX/NCP headers from the standard\n");
	fprintf(stderr,"compression benchmark test\n");
	fprintf(stderr,"Version 1.00  Tue Nov 10 13:34:02 1992      Dave Rand\n");
	exit(1);
    }
    for (x=0; x< NUMHOSTS; x++) {
	ipx = &Hosts[x].ipx;
	ipx->ipx_sum = 0xffff;
	ipx->ipx_len = 512 + sizeof(struct ncp) + sizeof(struct ipx);
	ipx->ipx_tc = 0;
	ipx->ipx_pt = NW_PROTO_NCP;
	ipx->ipx_dna.x_net = rand() + (rand() << 16);
	for (y=0; y<6; y++)
	    ipx->ipx_dna.x_node[y] = rand();
	ipx->ipx_dna.x_sock = NW_SOCK_FS;
	ipx->ipx_sna.x_net = rand() + (rand() << 16);
	for (y=0; y<6; y++)
	    ipx->ipx_sna.x_node[y] = rand();
	ipx->ipx_sna.x_sock = NW_SOCK_FS;
	ncp = &Hosts[x].ncp;
	ncp->ncp_op = NCP_REPLY;
	ncp->ncp_seq = 0;
	ncp->ncp_conn = rand();
	ncp->ncp_task = rand();
	Filepos[x] = 0L;
    }
    
    fo = fopen(argv[1],"wb");
    if (fo == NULL) {
	fprintf(stderr,"Can't open file %s for output!\n",argv[1]);
	perror(argv[0]);
	exit(2);
    }
    done = 0;
    while (done != NUMHOSTS) {
	done = 0;
	for (x=0; x < NUMHOSTS; x++) {
	    if (Filepos[x] == -1L) {
		done++;
	    } else {
		packit(Files[x],&Filepos[x],&Hosts[x],fo);
	    }
	}
    }
    fclose(fo);
}

unsigned char buf[512];
packit(file,pos,host,fo)
char *file;
unsigned long *pos;
struct NCPPACK *host;
FILE *fo;
{
    FILE *fi;
    int size;
    fi = fopen(file,"rb");
    if (fi == NULL) {
	fprintf(stderr,"Can't open %s - continuing\n",file);
	*pos = -1L;
	return(1);
    }
    fseek(fi,*pos,0);
    size = fread(buf,1,512,fi);
    if (size != 512)
	*pos = -1L;
    else
	*pos = ftell(fi);
    fprintf(fo,"\377\003");
    fwrite(host,sizeof(struct NCPPACK),1,fo);
    fwrite(buf,size,1,fo);
    host->ncp.ncp_seq++;
    fclose(fi);
}
.R
.fi



.PF "'Test Results'' \\\\nP'"
.H 1 "Test Results"

The results of running the various compress programs are 
presented in the following table. 

.TB "Test results"
.TS
box,center;
l | l | l | l | l | l | l | l
l | l | n | n | n | n | n | n.
Compression Program	Method	Max ratio	Output size	Ratio	Time	K/sec	RGF
_
HPACKA 0.75c0	LZA		1272999	2.774	307	11.50	4146
Info-ZIP 1.9	LZSSA	905	1297993	2.721	90	39.24	14422
ARJ 2.30NG	LZSSA	1658	1313249	2.689	58	60.89	22642
LHA 2.13	LZAH	1883	1470858	2.401	66	53.51	22286
AR002	LZAH	1859	1470865	2.401	176	20.06	8357
HYPACK 2.5	LZAH	23810	1528534	2.310	47	75.13	32522
PKZIP 1.10	LZSF	519	1549210	2.279	53	66.63	29231
Unix COMPRESS	LZW	550	1580342	2.235
LZ	LZH	48	1641731	2.151	274	12.89	5992
DWC A5.01	LZW	532	1747822	2.020	11	321.03	158892
UNIX ARC	LZW	541	1872769	1.886
PKPAK 3.61	LZW	541	1912705	1.846	19	185.86	100668
LARC 3.33	LZSS	8	1914044	1.845	84	42.04	22786
MDCD 1.00	LZW	515	1917596	1.842	35	100.90	54788
ZOO 2.10	LZW	503	1917643	1.841	80	44.14	23971
SQZ 1.08.2	LZW		1937345	1.823	270	13.08	7175
STAC 3.1 /p255	LZS	30	1949600	1.811	22	160.52	88618
Sea ARC	LZW	3003	2024142	1.745
PRED 1.02	Prediction	8	2117640	1.668	5	706.27	423728
V.42bis (12 bit)	LZW	471	2274812	1.552	65	54.39	34997
STAC 3.1 /p0	LZS	30	2314985	1.525	13	271.64	178076
NSWP3 1.020	RLE-Huffman	336	2339012	1.510	46	76.77	50848
SPLAY 1.00	Splay-tree	8	2426472	1.455	71	49.74	34176
RLE 1.00	RLE-only	86	3157471	1.118	5	706.27	631494
Original input file			3531333	1.000
.TE

Compression program is the program used to evaluate the particular 
compression method.  Notice that the same algorithm implemented by 
different programs can produce different results.  This 
usually reflects the programmer's choice of string search algorithm.

Method is the generic compression method applied to the input file.

Max ratio is the best-case compression ratio of the algorithm. 
It was determined by compressing a 1,000,000 byte 
file filled with hex 00 characters.

Output size is the total output file size, including any 
header information required to correctly decompress the file.

Time is the time required to compress the given file. In all 
cases, decompression time was significantly better than 
compression time, and is therefore not given in this table.

K/sec is the Kilobytes per second rate of the algorithm, as 
seen from the input side (the ratio of the input file size 
and the execution time).

RGF is the Rand Goodness Factor for the particular algorithm. 
Higher numbers are better, and (coincidentally) 
illustrate the expected bytes-per-second of output bandwidth 
achievable with each implementation. For example, a 
RGF of 4,146 would be an excellent choice for a 19,200 
bps async modem channel, but not very effective on a T1 
(throughput would be lower than without compression, 
on this test data). The RGF is not a cure-all number; the 
compression ratio, the destination link bandwidth, and 
the input data set must be considered.

All of these algorithms, in their current form 
(most of them are implemented in C or Pascal) are capable of 
performing well on a <64Kbps link. The slower 
algorithms (LHA) will show an adequate-to-very good 
compression ratio.  Faster links are a problem, however. 
Few of the algorithms in their current form are adequate 
for medium to high speed links (>1Mbps, <10Mbps). 


.PF "'Link issues'' \\\\nP'"
.H 1 "Link issues"

For any of these compression algorithms to be implemented, a 
reliable link between the source and destination 
must be implemented.  This is required due to the 
fact that even a one bit change in the compressed data stream 
may affect hundreds or thousands of bytes in the 
decompressed data stream.  

There are many forms of reliable data delivery systems 
available.  I suggest that the best one for the task at hand is 
the LAPB (Link Access Protocol B) standard.  This 
protocol has low overhead, known robustness, and is quite easy 
to implement.  It is also the protocol under review by the 
PPP compression group - though not accepted as a 
standard at this time.

In order to maximize the bandwidth of the link, and to 
allow for dynamic and transparent on-demand allocation of 
additional dial-up links (including load sharing), the 
upper-level packet structure will NOT be maintained through 
to the data link level.  Since a packet may (potentially) 
compress to just a few bits, it is worthwhile to re-packetize 
the data. This offers better line utilization 
(due to larger 2K packets), lower per-packet overhead, with 
no increase in latency.

By observing the `backlog' of packets outstanding to 
send on a link, line utilization figures can be obtained.  When 
the line utilization is at (for example) 150%, a new link 
can be established to the target system.  As more and more 
links are established to the target system, the bandwidth 
increases linearly.  By watching the highest-numbered line 
(most recently established) traffic, that link can be 
closed when the traffic through it falls below some threshold 
level.  Notice as well that latency is reduced somewhat as 
additional links are added.  Links that are dropped due to 
line failure need not be re-established, unless they are 
the only link to the target system.  Outbound traffic is always 
sent starting at the lowest numbered line (least 
recently established), and traffic is sent through additional lines 
only if warranted.

The following figure shows a single link in action

.FG "Data Compressor Diagram"

(not included)

 
As can be seen, the data compressor resides in the TSM.  
This allows the TSM to open and establish additional 
links to the target system by connecting to additional WHSM's.

Data will be sent to the compressor in the form of a 
two-byte (network order) length (of the data in the packet), the 
contents of the packet, and a CRC-16 as found in RFC-1331. The 
cost of computing this CRC is less than 0.5 
milliseconds for a 1.5K input packet.  The addition of the 
CRC-16 permits the compress/decompress pair to know 
when they are out of synchronization, and force a 
reset operation. It also allows for errors that may occur between 
the WHSM and the TSM.

LAPB will be performed with CRC-32, as found in the 
LAPB specification.  This ensures the maximum reliability 
in data transmission between source and destination, with a very 
slight (<0.1 %) increase in data size.  In the case 
of synchronous data transmission, it will also offer faster 
throughput due to software calculation of the CRC.  In 
the current hardware, it will provide almost a 2% increase in speed.


.H 2 "CRC 16/CRC 32 Computation Overhead"

CRC 16 can also be referred to as the HDLC FCS, or more 
properly, CRC-CCITT.  As described in RFC-1331, this 
expresses the polynomial (omitted).  CRC 16 is useful for catching 
all single bit, all two bit, most 
burst errors of less than 16 bits, and some larger 
errors in data streams of less than 32,767 bits in length (4K byte 
frames), with a failure rate of 0.0015%.  Multi-bit 
errors drastically reduce its effectiveness.

CRC 32 catches all single bit, two bit, and burst errors 
less than 31 bits. In packets of less than 2,147,483,648 bits 
(256 Mbytes), it catches larger errors with a failure 
rate of 0.000000023%. It expresses the polynomial (omitted). 
CRC 32 is used commonly in Ethernet applications, and (while it 
looks more complex) is just as easy to compute as CRC 16.  

To compute CRC 16, one initializes a table of possible 
results of each of the 256 possible input states, initializes a 
variable to 0xFFFF, and iterates over the data with the following code:

.ft CW
fcs = (fcs >> 8) ^ fcstab[(fcs ^ DATA_BYTE) & 0xff];
.R

When all of the data has been fed through the CRC generator, the 
variable fcs will contain the CRC 16 value for the data. It is normal 
practice to complement this, and append it to the data. In the 
case of CRC 16, when this is done the appended (complemented) CRC 
can also be fed through the CRC generator. If the result of this is 
0xF0B8, the frame is good.  The CRC can also be compared to the CRC 
in the stream, and not run through the CRC generator (if desired).

An implementation of this in 8086 assembler would be as follows:

.ft CW
.nf
; AL has new byte to be computed
;
	mov	cx,fcs		; get current FCS value
	mov	bl,cl		; get lsb of fcs
	xor	bl,al		; xor in current data byte
	xor	bh,bh		; value is in the range 00-FF
	shl	bx,1		; multiply bx by 2 to get word index
	mov	ax,fcstab[bx]	; get new value from table
	xor	al,ch		; xor old fcs value
	mov	fcs,ax		; save current fcs

.R
.fi


In theory, this code should run in (1+1+1+1+1+1+1 = 7) 7 clocks on a 
486, for a throughput of 4.7 megabytes per second (or about 325 
microseconds for a 1.5Kbyte packet) on a 33 MHz 486 system. 
On an 8088, this routine takes (19 +2+2+2+2+22+19 = 68) 68 clocks, 
for a throughput of 70 Kbytes/sec (or 22 milliseconds for a 1.5Kbytes 
packet). Note that in the case of the 486, the throughput 
exceeds that of the ISA bus. Of course, this can be 
optimized - placing the FCS in a register will raise the
throughput to 6.6 Meg/sec; using 32 bit instructions will 
raise throughput to 8.25 Meg/sec.


To compute CRC 32, one initializes a table of possible 
results of each of the 256 possible input states, initializes a 
variable to 0xFFFFFFFF, and iterates over the data with the 
following code:

.ft CW
fcs = (fcs >> 8) ^ fcstab32[(fcs ^ DATA_BYTE) & 0xff];
.R

When all the data has been fed through the CRC generator, 
the variable fcs will contain the CRC 32 value for the 
data. It is normal practice to complement this value and append it 
to the data. In the case of CRC 32, the receiver can then feed 
the appended (complemented) CRC through the 
CRC generator as well. If the result is 0xDEBB20E3, the frame 
is good.  Alternatively, the receiver can compare its computed CRC 
with the CRC in the stream, without running the received CRC 
through the generator.
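The 32 bit case has the same shape as the 16 bit one; only the width 
and the constants change. The sketch below also shows the 
"complement and append" step. The polynomial 0xEDB88320 (the reflected 
form of the CRC 32 used by Ethernet) is an assumption, since the 
polynomial is omitted here, and fcs32_init_table, fcs32_update and 
fcs32_append are illustrative names.

```c
#include <stdint.h>
#include <stddef.h>

#define FCS32_INIT 0xFFFFFFFFu
#define FCS32_GOOD 0xDEBB20E3u

static uint32_t fcstab32[256];

/* Build the 256-entry table; 0xEDB88320 is an assumed polynomial. */
void fcs32_init_table(void)
{
    for (uint32_t b = 0; b < 256; b++) {
        uint32_t v = b;
        for (int i = 0; i < 8; i++)
            v = (v & 1) ? (v >> 1) ^ 0xEDB88320u : v >> 1;
        fcstab32[b] = v;
    }
}

/* Same inner loop as the one-line C fragment in the text. */
uint32_t fcs32_update(uint32_t fcs, const unsigned char *p, size_t len)
{
    while (len--)
        fcs = (fcs >> 8) ^ fcstab32[(fcs ^ *p++) & 0xff];
    return fcs;
}

/* Complement the CRC and append it, least significant byte first.
   buf must have room for len + 4 bytes. */
void fcs32_append(unsigned char *buf, size_t len)
{
    uint32_t fcs = ~fcs32_update(FCS32_INIT, buf, len);
    for (int i = 0; i < 4; i++)
        buf[len + i] = (unsigned char)(fcs >> (8 * i));
}
```

After fcs32_append, running all len + 4 bytes through fcs32_update 
starting from FCS32_INIT should give FCS32_GOOD for an undamaged frame.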

An implementation of this in 8086 assembler would be as follows:

.ft CW
.nf
; AL has new byte to be computed
;
	mov	cx,fcs		; get lsw of current FCS value
	mov	dx,fcs+2		; get msw of FCS
	mov	bl,cl		; get lsb of fcs
	mov	cl,ch		; shift FCS right 8 bits
	mov	ch,dl		; dh has msb
	xor	bl,al		; xor in current data byte
	xor	bh,bh		; value is in the range 00-FF
	shl	bx,1		; multiply by 4 as
	shl	bx,1		; each entry is 1 doubleword
	xor	cx,fcstab[bx]	; xor table with old fcs value
	mov	fcs,cx		; save current fcs
	mov	ax,fcstab+2[bx] ; get msw of fcstable
	xor	al,dh		; xor with old fcs value
	mov	fcs+2,ax		; save result
.R
.fi

In theory, this code should run in (1+1+1+1+1+1+1+1+1+1+1+1+1+1 = 14) 
14 clocks on a 486, for a throughput of 2.4 megabytes per second 
(or about 651 microseconds for a 1.5Kbyte packet) on a 33 MHz 
486 system. On an 8088, this routine takes 
(19+19+2+2+2+2+2+2+2+22+19+22+2+19 = 136) 136 clocks, for a 
throughput of 35 Kbytes/sec (or 44 milliseconds for a 1.5Kbytes 
packet).  Of course, this can be optimized - placing the FCS in a 
register will raise the throughput to 3.3 Meg/sec; 
using 32 bit instructions will raise throughput to 8.25 Meg/sec.
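The throughput figures above follow from simple arithmetic: bytes per 
second is the clock rate divided by the clocks spent per byte. The 
sketch below reproduces that calculation. The function names are 
illustrative, and two inputs are assumptions chosen because they match 
the quoted figures: a 4.77 MHz clock for the 8088, and 1536 bytes for a 
"1.5Kbyte" packet.

```c
/* Bytes checksummed per second when each byte costs clocks_per_byte cycles. */
double crc_throughput(double clock_hz, double clocks_per_byte)
{
    return clock_hz / clocks_per_byte;
}

/* Seconds needed to checksum one packet of packet_bytes bytes. */
double crc_packet_time(double clock_hz, double clocks_per_byte,
                       double packet_bytes)
{
    return packet_bytes * clocks_per_byte / clock_hz;
}
```

For example, crc_throughput(33e6, 14.0) gives roughly 2.36 million bytes 
per second and crc_packet_time(33e6, 14.0, 1536.0) gives roughly 652 
microseconds, in line with the CRC 32 figures for the 486.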


.PF "'Recommendations'' \\\\nP'"
.H 1 "Recommendations

I suggest that at least three algorithms be used to perform the 
compression.  The choice of the algorithm should be 
made based on the aggregate speed of the data link to the 
target system, with override capability by a configuration 
option.  The following table illustrates the recommendation.

.TB "Compression recommendations"
.TS
box,center;
l | l | l | l.
Line speed	Compression algorithm	Compression ratio	Effective speed / Max speed
_
<= 64Kbps	LZSSA	2.721	174 Kbps / 58 Mbps
<= 2 Mbps	LZW (max cw 16 bits)	2.235	4.5 Mbps / 1100 Mbps
<=10 Mbps	RLE	1.668	16.7 Mbps /  80 Mbps
.TE

The maximum speed is the speed that would result from the 
theoretical maximum compression ratio of the algorithm specified.  
It is extremely unlikely that these speeds would ever be seen in practice.  
The effective speed shows the speed that could be 
obtained running at the maximum specified line speed, 
with the sample data stream detailed in the method section 
of this document.  Real, effective speeds will (of course) 
depend on the data set. Fully random data will achieve 
effective line speeds of approximately 90% of available 
bandwidth (ignoring hardware limitations).
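The selection rule in the table, together with the configuration 
override, might be sketched as below. The enum, the function names, 
the reading of "Kbps" as 1000 bits per second, and the COMP_NONE result 
for links faster than 10 Mbps are all assumptions made for 
illustration; the table itself gives no recommendation above 10 Mbps.

```c
#include <stddef.h>

typedef enum { COMP_LZSSA, COMP_LZW, COMP_RLE, COMP_NONE } comp_alg;

/* Pick an algorithm from the aggregate line speed in bits per second.
   A non-NULL override (the configuration option) always wins. */
comp_alg choose_algorithm(unsigned long line_bps, const comp_alg *override)
{
    if (override != NULL)
        return *override;
    if (line_bps <= 64000UL)
        return COMP_LZSSA;
    if (line_bps <= 2000000UL)
        return COMP_LZW;
    if (line_bps <= 10000000UL)
        return COMP_RLE;
    return COMP_NONE;   /* assumption: beyond the range of the table */
}

/* Effective speed = line speed times the measured compression ratio. */
double effective_bps(unsigned long line_bps, double ratio)
{
    return (double)line_bps * ratio;
}
```

With the ratio from the first table row, effective_bps(64000, 2.721) 
comes to about 174 Kbps, which is the "effective speed" entry for LZSSA.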


.PF "'Background Material'' \\\\nP'"
.H 1 "Background Material

The following material was written by Haruhiko Okumura, and 
gives some background on each of the various compression algorithms.

.ce
Data Compression Algorithms of LARC and LHarc

.H 2 Introduction

In the spring of 1988, I wrote a very simple data 
compression program named LZSS in C, and uploaded 
it to the Science SIG (forum) of PC-VAN, Japan's biggest 
personal computer network.
 
That program was based on Storer and Szymanski's 
slightly modified version of one of Lempel and Ziv's 
algorithms.  Despite its simplicity, for most files its 
compression outperformed the archivers then widely used. 
Kazuhiko Miki rewrote my LZSS in Turbo Pascal and 
assembly language, and soon made it evolve into a complete 
archiver, which he named LARC.
 
The first versions of LZSS and LARC were rather slow.  
So I rewrote my LZSS using a binary tree, and so did 
Miki.  Although LARC's encoding was slower than the 
fastest archiver available, its decoding was quite fast, and 
its algorithm was so simple that even self-extracting 
files (compressed files plus decoder) it created were usually 
smaller than non-self-extracting files from other archivers. 

Soon many hobby programmers joined the archiver project 
at the forum. Many suggestions were made, and 
LARC was revised again and again. By the summer of 1988, 
LARC's speed and compression had improved so 
much that LARC-compressed programs were beginning to be 
uploaded to many forums of PC-VAN and other 
networks. 

In that summer I wrote another program, LZARI, which 
combined the LZSS algorithm with adaptive arithmetic 
compression.  Although it was slower than LZSS, its 
compression performance was amazing. Miki, the author of 
LARC, uploaded LZARI to NIFTY-Serve, another big 
information network in Japan.  In NIFTY-Serve, Haruyasu 
Yoshizaki replaced LZARI's adaptive arithmetic 
coding with a version of adaptive Huffman coding to increase 
speed.  Based on this algorithm, which he called LZHUF, he 
developed yet another archiver, LHarc. 

In what follows, I will review several of these 
algorithms and supply simplified code in C. 

.H 2 "Simple coding methods

Replacing several (usually 8 or 4) "space" 
characters by one "tab" character is a very primitive method for data 
compression.  Another simple method is run-length coding, which 
encodes the message "AAABBBBAACCCC" 
into "3A4B2A4C", for example. 
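The run-length example above can be reproduced with a few lines of C. 
This is a minimal sketch, assuming a textual "count then character" 
output exactly as in the example; rle_encode is an illustrative name. A 
practical coder would use a binary count, since this textual form 
becomes ambiguous when the data itself contains digits.

```c
#include <stdio.h>
#include <stddef.h>

/* Encode a NUL-terminated string as "<count><char>" groups.
   Returns the encoded length, or 0 if out is too small or in is empty. */
size_t rle_encode(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size > 0)
        out[0] = '\0';
    while (*in) {
        size_t run = 1;
        while (in[run] == in[0])        /* stops at the terminating NUL */
            run++;
        int n = snprintf(out + used, out_size - used, "%zu%c", run, in[0]);
        if (n < 0 || (size_t)n >= out_size - used)
            return 0;                   /* output buffer too small */
        used += (size_t)n;
        in += run;
    }
    return used;
}
```

Called on "AAABBBBAACCCC" with a large enough buffer, this writes 
"3A4B2A4C".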

.H 3 "LZSS coding"

This scheme was introduced by Ziv and Lempel [1].  A 
slightly modified version was described by Storer and Szymanski 
[2].  An implementation using a binary tree was proposed by 
Bell [3].  The algorithm is quite simple: Keep a ring 
buffer, which initially contains "space" characters only.  
Read several letters from the file to the buffer.  Then 
search the buffer for the longest string that matches 
the letters just read, and send its length and position in the 
buffer.  If the buffer size is 4096 bytes, the position 
can be encoded in 12 bits.  If we represent the match length in 
four bits, the <position, length> pair is two bytes long.  If 
the longest match is no more than two characters, then 
we send just one character without encoding, and restart 
the process with the next letter.  We must send one extra 
bit each time to tell the decoder whether we are sending a 
<position, length> pair or an unencoded character.
 
The accompanying file LZSS.C is a version of this algorithm.  
This implementation uses multiple binary trees to 
speed up the search for the longest match.  All the 
programs in this article are written in draft-proposed ANSI C.  I 
tested them with Turbo C 2.0. 
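To make the LZSS scheme described above concrete, here is a small 
sketch of the match search and of the literal-versus-pair decision. It 
is not LZSS.C: it uses a brute-force search over the already seen 
input instead of binary trees and a space-filled ring buffer, and it 
only counts output bits rather than packing them. The constant and 
function names are illustrative, and storing the length as 
"length minus 3" in the four bit field is an assumption.

```c
#include <stddef.h>

#define WINDOW    4096  /* a position fits in 12 bits */
#define MAX_MATCH 18    /* 4-bit length field holding length - 3 (assumed) */
#define THRESHOLD 2     /* matches this short are sent as plain characters */

/* Longest match for in[pos..] starting in the previous WINDOW bytes.
   Returns the match length and stores where it starts in *match_pos. */
size_t longest_match(const unsigned char *in, size_t n, size_t pos,
                     size_t *match_pos)
{
    size_t best = 0;
    size_t start = pos > WINDOW ? pos - WINDOW : 0;

    for (size_t s = start; s < pos; s++) {
        size_t len = 0;
        while (len < MAX_MATCH && pos + len < n && in[s + len] == in[pos + len])
            len++;
        if (len > best) {
            best = len;
            *match_pos = s;
        }
    }
    return best;
}

/* Bits the encoder would emit: one flag bit plus 8 bits for a character,
   or one flag bit plus 16 bits for a <position, length> pair. */
size_t lzss_cost_bits(const unsigned char *in, size_t n)
{
    size_t bits = 0, pos = 0, mp = 0;

    while (pos < n) {
        size_t len = longest_match(in, n, pos, &mp);
        if (len > THRESHOLD) {
            bits += 1 + 16;
            pos += len;
        } else {
            bits += 1 + 8;
            pos += 1;
        }
    }
    return bits;
}
```

For the 12-byte input "abcabcabcabc", the first three letters go out 
as plain characters and the remaining nine are covered by a single 
pair, for 44 bits instead of 96.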

.H 3 "LZW coding

This scheme was devised by Ziv and Lempel [4], and 
modified by Welch [5]. The LZW coding has been adopted 
by most of the existing archivers, such as ARC and PKZIP.  The 
algorithm can be made relatively fast, and is 
suitable for hardware implementation as well. 

The algorithm can be outlined as follows: Prepare a 
table that can contain several thousand items.  Initially register 
in its 0th through 255th positions the usual 256 characters.  
Read several letters from the file to be encoded, and 
search the table for the longest match.  Suppose the 
longest match is given by the string "ABC".  Send the position 
of "ABC" in the table.  Read the next character from the 
file.  If it is "D", then register a new string "ABCD" in the 
table, and restart the process with the letter "D".  If the 
table becomes full, discard the oldest item or, preferably, 
the least used.  A Pascal program for this algorithm is 
given in Storer's book [6].
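The outline above can be turned into a short C sketch. Each table 
entry is stored as a (prefix code, last character) pair, and the 
search is a plain linear scan, which is slow but easy to follow. When 
the table fills up, this sketch simply stops adding entries; that is a 
simplification, not the "discard the oldest or least used" policy 
suggested above. The names and the 4096-entry table size are 
illustrative assumptions.

```c
#include <stddef.h>

#define LZW_MAX 4096            /* table size (assumed: 12-bit codes) */

static int prefix_of[LZW_MAX];          /* code of the string minus its last char */
static unsigned char char_of[LZW_MAX];  /* last character of the string */

/* Look for the string (prefix + c) among the entries added so far. */
static int lzw_find(int next, int prefix, unsigned char c)
{
    for (int i = 256; i < next; i++)
        if (prefix_of[i] == prefix && char_of[i] == c)
            return i;
    return -1;
}

/* Encode n bytes; out must have room for n codes. Returns the code count. */
size_t lzw_encode(const unsigned char *in, size_t n, int *out)
{
    int next = 256;             /* codes 0..255 are the single characters */
    size_t k = 0;

    if (n == 0)
        return 0;
    int cur = in[0];            /* code of the longest match so far */
    for (size_t i = 1; i < n; i++) {
        int code = lzw_find(next, cur, in[i]);
        if (code >= 0) {
            cur = code;         /* the match grows by one character */
        } else {
            out[k++] = cur;     /* send the position of the longest match */
            if (next < LZW_MAX) {
                prefix_of[next] = cur;   /* register match + next character */
                char_of[next] = in[i];
                next++;
            }
            cur = in[i];        /* restart with the unmatched character */
        }
    }
    out[k++] = cur;
    return k;
}
```

On the 24-character input "TOBEORNOTTOBEORTOBEORNOT" this produces 16 
codes, the last seven of which refer to table entries rather than 
single characters.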
 
.H 3 "Huffman coding"

Classical Huffman coding was invented by 
Huffman [7].  A fairly readable account is given in Sedgewick [8]. 

Suppose the text to be encoded is "ABABACA", with 
four A's, two B's, and a C.  We represent this situation as 
follows:

.ft CW
.nf
	4	2    1
	|	|    |
	A	B    C
.fi
.R

Combine the least frequent two characters into one, 
resulting in the new frequency 2 + 1 = 3:

.ft CW
.nf
	4		  3
	|		 / \\
	A		B   C
.fi
.R

Repeat the above step until all the characters are combined into a single tree:

.ft CW
.nf
		 7
		/ \\
               /   3
              /   / \\
             A   B   C

.R
.fi

Start at the top ("root") of this encoding tree, and 
travel to the character you want to encode.  If you go left, send a 
"0"; otherwise send a "1".  Thus, "A" is encoded by "0", "B" 
by "10", "C" by "11". Altogether, "ABABACA" will 
be encoded into ten bits, "0100100110". 

To decode this code, the decoder must know the encoding tree, 
which must be sent separately.
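The encoding of "ABABACA" can be checked with the sketch below. To 
stay short, it hard-codes the code table read off the tree above 
(A = 0, B = 10, C = 11) rather than building the tree from the 
frequencies, and it writes bits as '0' and '1' characters. The function 
names are illustrative.

```c
#include <string.h>
#include <stddef.h>

/* Code table for the example tree: left edge = '0', right edge = '1'. */
static const char *huff_code(char c)
{
    switch (c) {
    case 'A': return "0";
    case 'B': return "10";
    case 'C': return "11";
    default:  return NULL;      /* not in this three-letter alphabet */
    }
}

/* Encode text into a string of '0'/'1' characters; returns 0 on success. */
int huff_encode(const char *text, char *bits, size_t bits_size)
{
    size_t used = 0;

    if (bits_size == 0)
        return -1;
    bits[0] = '\0';
    for (; *text; text++) {
        const char *code = huff_code(*text);
        if (code == NULL)
            return -1;
        size_t len = strlen(code);
        if (used + len + 1 > bits_size)
            return -1;
        memcpy(bits + used, code, len + 1);
        used += len;
    }
    return 0;
}

/* Decode by walking the same tree from the root for each character. */
int huff_decode(const char *bits, char *text, size_t text_size)
{
    size_t used = 0;

    if (text_size == 0)
        return -1;
    while (*bits) {
        char c;
        if (*bits == '0') {
            c = 'A';
            bits += 1;
        } else if (*bits == '1' && bits[1] == '0') {
            c = 'B';
            bits += 2;
        } else if (*bits == '1' && bits[1] == '1') {
            c = 'C';
            bits += 2;
        } else {
            return -1;          /* truncated or malformed input */
        }
        if (used + 1 >= text_size)
            return -1;
        text[used++] = c;
    }
    text[used] = '\0';
    return 0;
}
```

Encoding "ABABACA" yields the ten bits "0100100110", and decoding 
those bits with the same table gives the text back.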
 
A modification to this classical Huffman coding is the 
adaptive, or dynamic, Huffman coding.  See, e.g., Gallager 
[9].  In this method, the encoder and the decoder process 
the first letter of the text as if the frequency of each 
character in the file were one, say.  After the first 
letter has been processed, both parties increment the frequency of 
that character by one.  For example, if the first letter is 'C', 
then freq['C'] becomes two, whereas every other 
frequency is still one. Both parties then 
modify the encoding tree accordingly.  The second letter is then 
encoded and decoded, and so on. 

.H 3 "Arithmetic coding

The original concept of arithmetic coding was 
proposed by P. Elias.  An implementation in C is described 
by Witten and others [10]. 

Although the Huffman coding is optimal if each character 
must be encoded into a fixed (integer) number of bits, 
arithmetic coding wins if no such restriction is made. As 
an example we shall encode "AABA" using arithmetic 
coding.  For simplicity suppose we know beforehand that 
the probabilities for "A" and "B" to appear in the text are 
3/4 and 1/4, respectively. 

Initially, consider an interval:

.ft CW
	0 <=  x  < 1.
.R

Since the first character is "A" whose probability is 3/4, 
we shrink the interval to the lower 3/4:

.ft CW
	0 <=  x  < 3/4.
.R

The next character is "A" again, so we take the lower 3/4:

.ft CW
	0 <=  x  < 9/16.
.R

Next comes "B" whose probability is 1/4, so we take the upper 1/4:

.ft CW
       27/64 <=  x  < 9/16,
.R

because "B" is the second element in our alphabet, {A, B}.  
The last character is "A" and the interval is

.ft CW
       27/64 <=  x  < 135/256, 
.R

which can be written in binary notation 

.ft CW
  0.011011 <= x  < 0.10000111.
.R

Choose from this interval any number that can be represented in 
the fewest bits, say 0.1, and send the bits to the right 
of "0."; in this case we send only one bit, "1".  Thus we have 
encoded four letters into one bit! With Huffman 
coding, four letters could not be encoded into fewer than four bits. 

To decode the code "1", we just reverse the process: First, 
we supply the "0." to the left of the received code "1", 
resulting in "0.1" in binary notation, or 1/2.  Since this 
number is in the first 3/4 of the initial interval 0 <= x < 1, 
the first character must be "A".  Shrink the interval into the 
lower 3/4.  In this new interval, the number 1/2 lies in 
the lower 3/4 part, so the second character is again "A", and so 
on.  The number of letters in the original file must 
be sent separately (or a special 'EOF' character must be 
appended at the end of the file). 
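The interval narrowing for "AABA" and the reverse decoding step can be 
followed in the sketch below. It is a toy: it uses floating point, 
which is exact for this short example but would run out of precision on 
real data, whereas practical coders use scaled integer arithmetic. The 
function names and the fixed two-letter alphabet are illustrative 
assumptions.

```c
#include <stddef.h>

#define P_A 0.75    /* probability of "A"; "B" takes the remaining 1/4 */

/* Narrow the interval [*low, *high) once for each letter of msg.
   Only the letters 'A' and 'B' are expected. */
void arith_interval(const char *msg, double *low, double *high)
{
    *low = 0.0;
    *high = 1.0;
    for (; *msg; msg++) {
        double split = *low + P_A * (*high - *low);
        if (*msg == 'A')
            *high = split;      /* keep the lower 3/4 */
        else
            *low = split;       /* keep the upper 1/4 */
    }
}

/* Recover count letters from a number x lying in the final interval.
   out must have room for count + 1 characters. */
void arith_decode(double x, size_t count, char *out)
{
    double low = 0.0, high = 1.0;

    for (size_t i = 0; i < count; i++) {
        double split = low + P_A * (high - low);
        if (x < split) {
            out[i] = 'A';
            high = split;
        } else {
            out[i] = 'B';
            low = split;
        }
    }
    out[count] = '\0';
}
```

For "AABA" the final interval is 27/64 <= x < 135/256, and decoding 
x = 1/2 with a count of four returns "AABA".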

The algorithm described above requires that both the 
sender and receiver know the probability distribution for the 
characters.  The adaptive version of the algorithm removes this 
restriction by first supposing uniform or any 
agreed-upon distribution of characters that approximates the 
true distribution, and then updating the distribution 
after each character is sent and received. 

.H 3 "LZARI

In each step the LZSS algorithm sends either a 
character or a <position, length> pair.  Among these, perhaps 
the character "e" appears more frequently than "x", 
and a <position, length> pair of length 3 might be more common than 
one of length 18, say.  Thus, if we encode the more frequent 
in fewer bits and the less frequent in more bits, the 
total length of the encoded text will be diminished.  
This consideration suggests that we use Huffman or arithmetic 
coding, preferably of the adaptive kind, along with LZSS. 

This is easier said than done, because there are many 
possible <position, length> combinations.  Adaptive 
compression must keep running statistics of frequency distribution.  
Too many items make statistics unreliable. 
What follows is not even an approximate solution to the 
problem posed above, but anyway this was what I did in 
the summer of 1988. 

I extended the character set from 256 to three hundred or 
so in size, and let characters 0 through 255 be the usual 
8-bit characters, whereas character 253 + n indicates that 
what follows is the position of a string of length n, where n 
= 3, 4, ....  This extended set of characters is 
encoded with adaptive arithmetic compression. I also observed 
that longest-match strings tend to be the ones that were 
read relatively recently.  Therefore, recent positions should 
be encoded into fewer bits.  Since 4096 positions are too 
many to encode adaptively, I fixed the probability 
distribution of the positions "by hand." The distribution 
function given in the accompanying LZARI.C is rather 
tentative; it is not based on thorough experimentation.  In 
retrospect, I could have encoded the most significant 
6 bits adaptively, say, or perhaps by some more ingenious method adapted 
the parameters of the distribution function to the 
running statistics. 
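The extended character set described above amounts to a simple 
mapping, sketched here. The helper names are illustrative, and the 
maximum match length is left as a parameter because the text does not 
state it; a value of 60 is used below only as an example that gives an 
alphabet of "three hundred or so" symbols.

```c
/* Symbols 0..255 stand for themselves (ordinary 8-bit characters). */

/* Symbol announcing that a position for a match of length n follows (n >= 3). */
int length_symbol(int n)
{
    return 253 + n;             /* so n = 3 maps to symbol 256 */
}

/* True if sym announces a <position, length> pair rather than a character. */
int is_length_symbol(int sym)
{
    return sym >= 256;
}

/* Match length carried by a length symbol. */
int symbol_length(int sym)
{
    return sym - 253;
}

/* Number of symbols the adaptive model must track for a given longest match. */
int alphabet_size(int max_len)
{
    return 253 + max_len + 1;   /* symbols run from 0 to 253 + max_len */
}
```

With a hypothetical maximum match length of 60, alphabet_size(60) is 
314.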

At any rate, the present version of LZARI treats the 
positions rather separately, so that the overall compression is 
by no means optimal. Furthermore, the string length threshold 
above which strings are coded into <position, 
length> pairs is fixed, but logically its value must 
change according to the length of the <position, length> pair we 
would get. 

.H 3 "LZHUF

LZHUF, the algorithm of Haruyasu Yoshizaki's 
archiver LHarc, replaces LZARI's adaptive arithmetic coding with 
adaptive Huffman.  LZHUF encodes the most significant 6 bits 
of the position in its 4096-byte buffer by table 
lookup.  More recent, and hence more probable, positions are 
coded in fewer bits.  On the other hand, the remaining 
6 bits are sent verbatim.  Because Huffman coding encodes 
each letter into a whole number of bits, table lookup can 
be easily implemented. Though theoretically Huffman cannot 
exceed arithmetic compression, the difference is very 
slight, and LZHUF is fairly fast. 
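The split of a 12 bit buffer position into a table-coded upper half 
and a verbatim lower half can be sketched as follows. Only the bit 
manipulation is shown; the lookup table that assigns shorter codes to 
recent positions is not reproduced here, and the function names are 
illustrative.

```c
/* Most significant 6 bits of a 12-bit position: coded by table lookup. */
unsigned pos_upper(unsigned pos)
{
    return (pos >> 6) & 0x3F;
}

/* Least significant 6 bits: sent verbatim. */
unsigned pos_lower(unsigned pos)
{
    return pos & 0x3F;
}

/* The decoder reassembles the position from the two halves. */
unsigned pos_join(unsigned upper, unsigned lower)
{
    return (upper << 6) | lower;
}
```

For position 70, the upper half is 1 and the lower half is 6, and 
pos_join(1, 6) gives 70 back.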

The accompanying file LZHUF.C was written by Yoshizaki.  
I translated the comments into English and made a 
few trivial changes to make it conform to the ANSI C standard.
 
.H 2 "References

.nf
[1] J. Ziv and A. Lempel, IEEE Trans. IT-23, 337-343 (1977).
[2] J. A. Storer and T. G. Szymanski, J. ACM, 29, 928-951 (1982).
[3] T. C. Bell, IEEE Trans. COM-34, 1176-1182 (1986).
[4] J. Ziv and A. Lempel, IEEE Trans. IT-24, 530-536 (1978).
[5] T. A. Welch, Computer, 17, No.6, 8-19 (1984).
[6] J. A. Storer, Data Compression: Methods and Theory (Computer Science Press, 1988).
[7] D. A. Huffman, Proc. IRE 40, 1098-1101 (1952).
[8] R. Sedgewick, Algorithms, 2nd ed. (Addison-Wesley, 1988).
[9] R. G. Gallager, IEEE Trans. IT-24, 668-674 (1978).
[10] I. H. Witten, R. M. Neal, and J. G. Cleary, Commun. ACM 30, 520-540 (1987).

.fi


.PF "'Legal Issues'' \\\\nP'"
.H 1 "Legal Issues

Regrettably, no compression algorithms are free of patent 
contention. The following items come from the 
comp.compression frequently asked questions list. I make 
no claims as to the validity of any of the patent issues, 
nor of any of the implementations under investigation.

.BL
.LI "Run length encoding
.br
Tsukiyama has two patents on run length encoding: 4,586,027 
and 4,872,009 granted in 1986 and 1989 
respectively. The first one covers run length encoding in its 
most primitive form: a length byte followed by the 
repeated byte. The second patent covers the 'invention' of 
limiting the run length to 16 bytes and thus the encoding 
of the length on 4 bits.  Here is the start of claim 1 of 
patent 4,872,009, just for pleasure:

1. A method of transforming an input data string comprising a 
plurality of data bytes, said plurality 
including portions of a plurality of consecutive data 
bytes identical to one another, wherein said data bytes 
may be of a plurality of types, each type representing 
different information, said method comprising the 
steps of: [...]


.LI "LZ77"
.br
The Gibson & Graybill patent 5,049,881 covers the 
LZRW1 algorithm previously discovered by Ross Williams. 
Claims 4 and 12 are very general and could be interpreted 
as applying to any LZ algorithm using hashing 
(including all variants of LZ78):

4. A compression method for compressing a stream of 
input data into a compressed stream of output data 
based on a minimum number of characters in each input 
data string to be compressed, said compression 
method comprising the creation of a hash table, 
hashing each occurrence of a string of input data and 
subsequently searching for identical strings of input data 
and if such an identical string of input data is 
located whose string size is at least equal to the minimum 
compression size selected, compressing the 
second and all subsequent occurrences of such identical 
string of data, if a string of data is located which 
does not match to a previously compressed string of data, 
storing such data as uncompressed data, and for 
each input strings after each hash is used to find a 
possible previous match location of the string, the 
location of the string is stored in the hash table, 
thereby using the previously processed data to act as a 
compression dictionary.

Claim 12 is identical, with 'method' replaced 
with 'apparatus'. Since the 'minimal compression size' can be as small 
as 2, the claim could cover any dictionary technique of the 
LZ family. However the text of the patent and the other 
claims make clear that the patent should cover the LZRW1 
algorithm only.

The following papers, published before the patent was 
filed, describe applications of hashing to LZ77 compression:

Brent, R.P.  "A Linear Algorithm for Data Compression", 
Australian Computer Journal, Vol.19, No.2 (May 
1987), p.64.

Bell, T. "Longest match string searching for Ziv-Lempel 
compression" Res. Rept. 6/89, Dept. of Computer 
Science, Univ. of Canterbury, New Zealand (Feb 89).


Phil Katz, author of pkzip, also has a patent on 
LZ77 (5,051,745) but the claims only apply to sorted hash tables, 
and when the hash table is substantially smaller than the window size.

Robert Jung, author of 'arj', has recently been granted 
patent 5,140,321 for one variation of LZ77 with hashing.  
This patent covers the LZRW3-A algorithm, also previously 
discovered by Ross Williams. LZRW3-A was posted
on comp.compression on July 15, 1991. The patent was 
filed two months later on Sept 4, 1991. (The US patent 
system allows this because of the 'invention date' rule.)

Fiala and Greene obtained in 1990 a patent (4,906,991) on 
all implementations of LZ77 using a tree data structure. 
Claim 1 of the patent is much broader than the algorithms 
published by Fiala and Greene in Comm.ACM, April 
89. The patent covers the algorithm published by Rodeh 
and Pratt in 1981 (J. of the ACM, vol 28, no 1, pp 16-24).  
It also covers the algorithm previously patented by 
Eastman-Lempel-Ziv (4,464,650), and the algorithms used in 
lharc, lha and zoo.

IBM patented (5,001,478) the idea of combining a 
history buffer (the LZ77 technique) and a lexicon (as in LZ78).


.LI "LZ78"
.br
The LZW algorithm used in 'compress' is patented 
by IBM (4,814,746)   and Unisys (4,558,302). It is also used in 
the V.42bis compression  standard and in Postscript Level 2. 
(Unisys sells the license to modem manufacturers for 
a onetime $25,000 fee.) The IBM patent application was filed 
three weeks before that of Unisys, but the US patent 
office failed to recognize that they covered the same 
algorithm. (The IBM patent is more general, but its claim 7 is 
exactly LZW.)

AP coding is patented by Storer (4,876,541).

.LI "other data compression algorithms
.br
IBM holds a patent on the Q-coder implementation of 
arithmetic coding.  The arithmetic coding option of the 
JPEG standard requires use of the patented algorithm.  No 
JPEG-compatible method is possible without infringing 
the patent, because what IBM actually claims rights to is 
the underlying probability model (the heart of an 
arithmetic coder).

Bacon has patented (4,612,532) some form of Markov modeling.

.LE

As can be seen from the above list, all the most popular 
compression programs (compress, pkzip, zoo, lha, arj) are 
now covered by patents. (This says nothing about the 
validity of these patents.)

Here are some references on data compression patents. A 
number of them are taken from the list maintained by 
Michael Ernst <mernst@theory.lcs.mit.edu> in 
mintaka.lcs.mit.edu:/mitlpf/ai/patent-list (or patent-list.Z).

4,464,650
.br
Apparatus and method for compressing data signals and 
restoring the compressed data signals. Inventors Lempel, 
Ziv, Cohn, Eastman. Assignees Sperry Corporation and AT&T 
Bell Laboratories. Filed 8/10/81, granted 8/7/84

4,558,302
.br
High speed data compression and decompression apparatus and 
method. Inventor Welch. Assignee Sperry 
Corporation (now Unisys). Filed 6/20/83, granted 12/10/85. The 
text for this patent can be ftped from 
rusmv1.rus.uni-stuttgart.de (129.69.1.12) in 
/info/comp.patents/US4558302.Z.

4,586,027
.br
Method and system for data compression and 
restoration. Assignee Hitachi, inventor Tsukiyama et al. Filed 
08/07/84, granted 04/29/86

4,612,532
.br
Inventor Bacon. Granted 9/1986

4,814,746
.br
Data compression method. Inventors Victor S. Miller, 
Mark N. Wegman. Assignee IBM. Filed 8/11/86, granted 
3/21/89. A previous application was filed on 6/1/83, 
three weeks before the application by Welch (4,558,302)

4,872,009
.br
Method and apparatus for data compression and restoration. 
Assignee Hitachi, inventor Tsukiyama et al. Filed 
12/07/87, granted 10/03/89.

4,876,541
.br
Stem [sic] for dynamically compressing and decompressing 
electronic data. Inventor James A. Storer. Assignee 
Data Compression Corporation. Filed 10/15/87, granted 10/24/89

4,955,066
.br
Compressing and Decompressing Text Files. 
Inventor Notenboom, L.A. Assignee Microsoft. Filed  10/13/89, 
granted 09/04/90.

5,001,478
.br
Method of Encoding Compressed Data. Filed 12/28/89, 
granted 03/19/91. Inventor Michael E. Nagy. Assignee IBM.

5,049,881
.br
Apparatus and method for very high data rate-compression 
incorporating lossless data compression and expansion 
utilizing a hashing technique. Inventors Dean K. Gibson, 
Mark D. Graybill. Assignee Intersecting Concepts, Inc. 
Filed 6/18/90, granted 9/17/91.

5,051,745
.br
String searcher, and compressor using same. 
Inventor  Phillip W. Katz (author of pkzip). Filed  8/21/90, granted 
9/24/91.

4,906,991
.br
Textual substitution data compression with 
finite length search window. Inventors Fiala,E.R., and Greene,D.H. 
Filed 4/29/1988, granted 3/6/1990. Assignee Xerox Corporation.

5,109,433
.br
Compressing and decompressing text files. Assignee Microsoft.

5,140,321
.br
Data compression/decompression method and apparatus. 
Filed 9/4/91, granted 8/18/92. Inventor Robert Jung. 
Assignee Prime Computer


Abbreviations for the compression method are: 
Huffman - Dynamic Huffman coding; RLE - Run Length 
Encoding; LZW - Lempel-Ziv-Welch dictionary coding; 
LZH - Lempel-Ziv static Huffman coding; LZSF - 
Lempel-Ziv with Shannon-Fano tree coding; 
LZAH - Lempel-Ziv with Adaptive Huffman coding; LZA - Lempel-
Ziv with Arithmetic coding; 
LZSS - Lempel-Ziv, as modified by Storer and Szymanski; 
LZSSA - LZSS with 
adaptive coding.

  Time to run algorithm on an AST 486/33, excluding file I/O time.

  The author is the sysop of the Science SIG of PC-VAN. His address is 12-2-404 Green Heights, 580 Nagasawa, 
Yokosuka, Japan 239

.SK
.R
.fi
.BS
.BE
.deTP
'sp
.)K
.af;P \\gP
.afP 1
.nr;P \\nP
.afP \\g(;P
.af;P 1
.ie\\n(Pv \{\
.ie(\\n(Pv=1)&(\\n(;P>1) 'sp 2
.el\{\
.ce
.ul
PRIVATE
.sp\} \}
.el'sp 2
.if!\\n(;P-1 .if \\nN 'sp
.if!\\n(;P-1 .if \\n(:S .tl \\*(}t
.if!\\n(;P-1 .if !\\nN .tl \\*(}t
.if\\n(;P-1 .ie \w'\\*(]n' .tl '\\*(]n - \\nP'''
.el.tl \\*(}t
.sp
..
.PH
.SK
.TC 4 1 2 0
.SK
.PF
.rs
.sp 2i
.ns
.S 20
.B
.ad r
Data Compression for 
.sp 10p
Wide Area Networks
.S
.R
.sp 2i
Revision 1.00   \n(mo/\n(dy/\n(yr
.sp 1.5i
Dave Rand
.br
Novell, Inc.
.br
San Jose, California
.sp 0.5i
.I

-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Tue May 18 12:55:52 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvXlJ-0000P2a@daver.bungi.com>; Tue, 18 May 93 12:55 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 18 May 1993 12:55:40 PDT
Message-ID: <m0nvXlF-0000MjC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 18,  9:22, Dave Carr writes:]
> Yes.  I can add a *fraction* of a bit to indicate uncompressable data.
> Of course, I don't have to pad to a byte boundary since I'm using an HDLC
> controller that supports residuals.  Damn protocols (PPP included) that waste 
> bits padding.
> 
> Your calculation must be on the uncompressed data.  At 4:1, an extra 16-bit 
> CRC is 1.6% of link bandwidth.

Yes, it was. I'm willing to toss 1.6% of the bandwidth for a more reliable
product.

> 
> Only if it gets a chance to be calculated :-)  I guess then you need also a 
> mechanism for resetting the compressor.  On a multi-link, this could involve
> tearing down all the links, or issuing a multi-link reset.

Absolutely. I pull bytes from the decompressor when assembling the data,
which brings me to the next point...

> 
> Seriously, I could see adding a length of uncompressed data to the packet.
> I should also be free to compress the uncompress length.  But then again,
> I already do that for 802.3 MAC types.
> 

What a good idea! What I would like to propose is that the LAPB frames
be decoupled from the incoming data frames.  As the compressors get better,
we will end up with fewer and fewer bytes in each packet.  If we treat the
LAPB link as a byte pipe, we can now stuff data in the form of:

	<length>
	<original data packet>
	<CRC>

down into the compressor. When the compressor feels like it (has a full
LAPB frame), it can send it. When the input runs out of data, it can
signal the compressor to flush any current data.  There is no additional
latency - and we end up with 'fuller' LAPB frames on the wire, which
increases the effective line utilization for both SYNC and ASYNC 
implementations.

Comments?



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Tue May 18 13:01:02 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvXqJ-0000OIa@daver.bungi.com>; Tue, 18 May 93 13:00 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 18 May 1993 13:00:50 PDT
Message-ID: <m0nvXqF-0000OTC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 18, 10:18, Dave Carr writes:]
> No. I don't want a software CRC.  We happen to have hardware that can do it.
> But for those people that just can't do it hardware, the option is (or almost)
> is there (SCC).  

No, it is not. We can't do a CRC-32 in software on the SCC. And we can't do
it in hardware. We just can't support it with the existing hardware that
customers have installed.

> We intend to sell our code, not give it away.  However, the price is reasonable
> and is a one time license.  Refrain from asking about the cost and terms just
> yet, as management is tossing around the details.

I'm very interested in evaluating the code, and I have placed an official
request for NDA's and whatever else is required so that I may do so.

> 
> The algorithm is an LZ77 (history buffer) front end, followed by arithmetic
> encoding.  It is CPU intensive, but produces the highest compression ratio
> of any algorithm I tried.  Note, on Dave Rands test, this FZA algorithm
> scored 2.67:1, while HPACK and ZIP scored 2.7x.  

BTW - Sorry, Dave - I forgot to add your score into the document I just
mailed.  Could you please mail the correct line to the group to add in
to the table?


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Tue May 18 13:04:48 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvXtx-0000OSa@daver.bungi.com>; Tue, 18 May 93 13:04 PDT
X-Path: bridge2.NSD.3Com.COM!vsp
From: Venkat Prasad <vsp@NSD.3Com.COM>
To: ppp-comp@bungi.com
Subject: Test - Please ignore
Date: Tue, 18 May 93 12:58:17 -0700
Message-ID: <199305181958.AA22135@himagiri.NSD.3Com.COM>
Reply-To: ppp-comp@bungi.com
Organization: 3Com, 5400 Bayfront Plaza, Santa Clara, CA 95052-8145
Precedence: bulk



From owner-ppp-comp Tue May 18 13:14:09 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvY30-0000Oha@daver.bungi.com>; Tue, 18 May 93 13:14 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: compression document
Date: Tue, 18 May 1993 13:13:57 PDT
Message-ID: <m0nvY2v-0000O6C@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

BTW - the compression document I sent out should be processed with
the "mm" macros.


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Tue May 18 14:41:05 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvZP1-0000Oqa@daver.bungi.com>; Tue, 18 May 93 14:40 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: compression document
Date: Tue, 18 May 1993 17:32:24 -0400 (EDT)
Message-ID: <9305182132.AA15746@hobbit.gandalf.ca>
References: <<m0nvY2v-0000O6C@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Here is one comment on the patent section "LZ77". The Gibson & Graybill
patent cannot seriously be taken to apply to all LZ77 algorithms.  I
even asked "the Ziv" at the Data Compression Conference '93 about
it.  Funny, he said, the original LZ77 code used hashing!  Talk
about prior art.




From owner-ppp-comp Tue May 18 14:41:11 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvZP0-0000OIa@daver.bungi.com>; Tue, 18 May 93 14:40 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 18 May 1993 17:16:01 -0400 (EDT)
Message-ID: <9305182116.AA12778@hobbit.gandalf.ca>
References: <<m0nvXlF-0000MjC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> > 
> > Seriously, I could see adding a length of uncompressed data to the packet.
> > I should also be free to compress the uncompress length.  But then again,
> > I already do that for 802.3 MAC types.
> > 
> 
> What a good idea! What I would like to propose is that the LAPB frames
> be decoupled from the incoming data frames.  As the compressors get better,
> we will end up with fewer and fewer bytes in each packet.  If we treat the
> LAPB link as a byte pipe, we can now stuff data in the form of:
> 
> 	<length>
> 	<original data packet>
> 	<CRC>

The length I can compress, the CRC I can't.  As long as the compressor
does not do something like "Oh, the length agrees with the size of the
packet, send a token to indicate that", but rather compresses it independently
using a separate model, then it should compress to a few bits on average.


> 
> down into the compressor. When the compressor feels like it (has a full
> LAPB frame), it can send it. When the input runs out of data, it can
> signal the compressor to flush any current data.  There is no additional
> latency - and we end up with 'fuller' LAPB frames on the wire, which
> increases the effective line utilization for both SYNC and ASYNC 
> implementations.
> 

Latency is not a big issue.  It only affects the first packet of a transfer.
If the bridge/router takes an extra millisecond
to compress the entire packet, no one will know.  Any good windowed file
transfer protocol will not be affected by the delay.  Of course, then there's
IPX...:-)

The idea of flushing the frame sounds nice in theory.  Some modems do this.
However, there are several factors which persuaded us not to do it:

(1) It requires overhead to do I/O in this manner.
(2) It also tends to blow the cache.
(3) It is useless for 2-stage encoders such as our FZA.
(4) It tends to tie the link layer into the packet layer.  That gets messy
    with multi-link, PPP, and the like.

In my testing, it was considerably more efficient to generate the compressed
frame as one long buffer without stopping, and then, if required, split it into
multiple buffers for transmission.  The code for doing a block move, for
example on an i960CA, takes less than 1 clock per byte.  A compare/branch
on each byte takes 3 clocks.

Therefore, I would vote to keep the link layer calls out of the compressor. 
If reduction of latency is an issue, get a faster algorithm or go to 
hardware.

From owner-ppp-comp Tue May 18 14:54:13 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvZbo-0000Qya@daver.bungi.com>; Tue, 18 May 93 14:54 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 18 May 1993 14:53:59 PDT
Message-ID: <m0nvZbk-0000QkC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 18, 17:16, Dave Carr writes:]
> In my testing, it was considerably more efficient to generate the compressed
> frame as one long buffer without stopping, and then, if required, split it into
> multiple buffers for transmission.  The code for doing a block move, for
> example on an i960CA, takes less than 1 clock per byte.  A compare/branch
> on each byte takes 3 clocks.

That's fine - you are then doing the same thing. There is no penalty for
shipping 1 compressed input frame per LAPB frame. You may split one 
uncompressed input frame over multiple LAPB frames, or you may just
ship one uncompressed input frame over one LAPB frame, in your model.
I'm suggesting that, if you want to, you can put more than one
compressed frame into one LAPB frame. You don't have to generate it
if you don't want to, and your code won't (or shouldn't) break on
the decompression side if you get multiple compressed frames in one
LAPB frame.

OK?



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Tue May 18 15:36:48 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvaH0-00009Ma@daver.bungi.com>; Tue, 18 May 93 15:36 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Archive site information
Date: Tue, 18 May 1993 15:36:33 PDT
Message-ID: <m0nvaGw-0000PzC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

The archives of all PPP-COMP discussions are now available via 
anonymous ftp to sgi.com, in the other/ppp-comp/incoming
directory.

There is an up-to-date copy of the "Data Compression for Wide Area
Networks" there, as comp.doc and comp.ps.

Holler if you need anything else there, and thanks to SGI for
providing the services!


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 19 09:27:54 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvqzO-0000AKa@daver.bungi.com>; Wed, 19 May 93 09:27 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 19 May 1993 09:22:29 -0400 (EDT)
Message-ID: <9305191322.AA16198@hobbit.gandalf.ca>
References: <<m0nvZbk-0000QkC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 
> [In the message entitled "Re: Compression CRC - needed?" on May 18, 17:16, Dave Carr writes:]
> > In my testing, it was considerably more efficient to generate the compressed
> > frame as one long buffer without stopping, and then, if required, split it into
> > multiple buffers for transmission.  The code for doing a block move, for
> > example on an i960CA, takes less than 1 clock per byte.  A compare/branch
> > on each byte takes 3 clocks.
> 
> That's fine - you are then doing the same thing. There is no penalty for
> shipping 1 compressed input frame per LAPB frame. You may split one 
> uncompressed input frame over multiple LAPB frames, or you may just
> ship one uncompressed input frame over one LAPB frame, in your model.
> I'm suggesting that, if you want to, you can put more than one
> compressed frame into one LAPB frame. You don't have to generate it
> if you don't want to, and your code won't (or shouldn't) break on
> the decompression side if you get multiple compressed frames in one
> LAPB frame.

I think you misinterpreted what I said.  I compress the frame and pass it
to the link layer.  The link layer may choose to fragment the packet into
multiple LAPB frames.  The fragmentation can be done for two reasons, either
to support a multi-link or to shorten the size of the frames for a high
error rate link.  These are old statmux tricks.  The optimal LAPB
frame size varies with the link error rate.

I don't want the compression tied to this logic.  Compression rides on top
of the multi-link.

You seem to be driving for a multiplexed link, with multiple packets possible
in a LAPB frame.  We could do that, but maybe we should call it a statmux :-)
You'll need some form of encoding which tells how many frames are in the 
LAPB packet.  Ugghhh.  Note, it is fairly easy to do this without too much
overhead when you use a fixed encoding scheme such as LZSS or a derivative.
You simply see if there are enough input bits after decoding the first frame
to constitute a second frame.  In fact, you don't even need an end-of-frame 
code.  All you do is simply pad the last byte of input with all 1's.
It's not so easy with an arithmetic encoder.  After all, a code can be less
than a bit long.

I would avoid multiplexed frames just to KISS.  It's not that it's impossible
to do.  I've written the code for 2 statmuxes, what's another?  But why
bother?


From owner-ppp-comp Wed May 19 09:37:22 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvr8i-0000Dea@daver.bungi.com>; Wed, 19 May 93 09:37 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 19 May 1993 09:37:07 PDT
Message-ID: <m0nvr8e-0000DdC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 19,  9:22, Dave Carr writes:]
> I don't want the compression tied to this logic.  Compression rides on top
> of the multi-link.

Correct, and that is what I am shooting for. Treat the link layer as a
simple bit-pipe. 

Not all implementations will be able to derive the length of the incoming
LAPB frame.  Some pad it out to a 16 or 32 bit multiple, some pad it out
to a fixed minimum length - some form of length code will have to be
passed down in the transmitted frames. Since that is the case anyway,
why not truly decouple the compressor from the LAPB?

> You'll need some form of encoding which tells how many frames are in the 
> LAPB packet.  Ugghhh. 

No, we don't care if there is one frame or 200 frames in a packet. We just
read bits from the packet - not frames.

> Note, it is fairly easy to do this without too much

> It's not so easy with an arithmetic encoder.  After all, a code can be less
> than a bit long.

All the more reason to pack more information into each packet - why
waste the space? :-)




-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 19 10:46:21 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvsDR-00003Ba@daver.bungi.com>; Wed, 19 May 93 10:46 PDT
X-Path: acc.com!fbaker
From: fbaker@acc.com (Fred Baker)
To: ppp-comp@bungi.com
Subject: Re:  Compression Document
Date: Wed, 19 May 93 10:19:46 PDT
Message-ID: <9305191719.AA03463@saffron.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

silly question on page 8:

is it valid to use

	mov	ax,fcstab[bx]

to get a 32 bit value on an 8088/8086/80286? Don't you have to have an
80386 code segment?  (working from memory here: I try to not accept
jobs that require me to work with Intel chips).

From owner-ppp-comp Wed May 19 10:46:54 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvsDR-0000Fja@daver.bungi.com>; Wed, 19 May 93 10:46 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 19 May 1993 13:38:29 -0400 (EDT)
Message-ID: <9305191738.AA26703@hobbit.gandalf.ca>
References: <<m0nvr8e-0000DdC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 
> Not all implementations will be able to derive the length of the incoming
> LAPB frame.  Some pad it out to a 16 or 32 bit multiple, some pad it out
> to a fixed minimum length - some form of length code will have to be
> passed down in the transmitted frames. Since that is the case anyway,
> why not truly decouple the compressor from the LAPB?

Fine.  
> 
> > You'll need some form of encoding which tells how many frames are in the 
> > LAPB packet.  Ugghhh. 
> 
> No, we don't care if there is one frame or 200 frames in a packet. We just
> read bits from the packet - not frames.

But now you require a frame length.  You also need an end-of-stream indicator
so you can forward all of what's in the pipe.  Otherwise, the decoder can't
tell if another frame follows.

> All the more reason to pack more information into each packet - why
> waste the space? :-)

I'm willing to waste the fraction of a bit.  I have to *live* with PPP
requirements to pad to a byte boundary.  

This padding of the last byte is a real pain.  In a Huffman or Markov based 
encoder, you have to reserve an end-of-frame indicator, or ensure that any
symbol does not have the encoding of all 1's.  The last byte is then padded
with ones, and the decoder appends enough 1's on the end of the packet to
meet the maximum code size.
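
A minimal sketch of the ones-padding trick, assuming a toy three-symbol
prefix code in which no codeword is all 1's.  The code table and the
function names are hypothetical, chosen only for illustration:

```python
# Hypothetical prefix code: the all-1's pattern "111" is never a symbol.
CODES = {"a": "0", "b": "10", "c": "110"}
DECODE = {bits: sym for sym, bits in CODES.items()}
MAX_LEN = max(len(bits) for bits in CODES.values())

def encode(text):
    """Concatenate codewords, then pad the last byte with 1's."""
    bits = "".join(CODES[ch] for ch in text)
    bits += "1" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def decode(data):
    """Append MAX_LEN 1's, then decode until an all-1's pattern is seen."""
    bits = "".join(f"{byte:08b}" for byte in data) + "1" * MAX_LEN
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in DECODE:
            out.append(DECODE[cur])
            cur = ""
        elif len(cur) == MAX_LEN:
            break  # only "111" reaches here: padding, so stop
    return "".join(out)
```

No explicit end-of-frame symbol is sent; the padding can never be
mistaken for data because every real codeword contains a 0.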

Using an uncompressed-data length field is an alternative.  Then, you
need a model for compressing the length fields.  Not too difficult a problem.

(Oh, that reminds me.  If in your encoding scheme you have a choice of 
assigning 1's or 0's to the more common branch, pick 0's.  It makes a
difference depending on the link layer.  Left as an exercise for the reader.)
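
One plausible reading of the exercise, assuming a bit-synchronous
HDLC-style link that inserts a 0 after every five consecutive 1's (the
reason is not spelled out above, so this is an assumption).  The function
name is hypothetical:

```python
def stuff(bits):
    """HDLC-style zero-bit insertion on a string of '0'/'1' characters."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")  # inserted bit, removed again by the receiver
            run = 0
    return "".join(out)
```

Under that assumption a run of ten 1's costs two extra bits on the wire,
while a run of ten 0's costs none.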

The stat-mux logic in my brain says this is a good idea.  Perhaps I've been
to too many PPP meetings since my stat-mux days.  Now I tend to KISS and put
one compressed frame in a packet.

From owner-ppp-comp Wed May 19 11:00:27 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvsQx-00007Fa@daver.bungi.com>; Wed, 19 May 93 11:00 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 19 May 1993 10:59:58 PDT
Message-ID: <m0nvsQp-00005sC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 19, 13:38, Dave Carr writes:]
> But now you require a frame length.  You also need an end-of-stream indicator
> so you can forward all of what's in the pipe.  Otherwise, the decoder can't
> tell if another frame follows.

Correct. We need a "flush" code, indicating that there is no more data
valid to the start of the next byte, or you can have an implied code,
depending on your compressor, as you point out.

Much the same as if you were writing this to a disk file, and doing
fflush()'s from time to time.

> 
> The stat-mux logic in my brain says this is a good idea.  Perhaps I've been
> to too many PPP meetings since my stat-mux days.  Now I tend to KISS and put
> one compressed frame in a packet.

Ok - what does everyone think?

My proposal is to decouple the compressor from the transmitted LAPB
frames. We must have the length passed down, so we can re-construct
the length of the original (uncompressed) frame. We must pass in-band
the length of the LAPB packet, since it may be padded. Should the
LAPB transport be allowed to pack multiple compressed frames in one
LAPB packet?
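
A minimal sketch of the byte-pipe model, assuming a 2-byte big-endian
length prefix per frame and a fixed LAPB payload size.  The field size and
all function names are hypothetical, and padding and the flush code are
left out:

```python
import struct

def pack_frames(frames):
    """Build one byte stream: each frame preceded by its 2-byte length."""
    return b"".join(struct.pack(">H", len(f)) + f for f in frames)

def split_for_link(stream, lapb_payload):
    """Cut the stream into LAPB-sized pieces, ignoring frame boundaries."""
    return [stream[i:i + lapb_payload]
            for i in range(0, len(stream), lapb_payload)]

def unpack_frames(pieces):
    """Rejoin the pieces and recover the original frames from the lengths."""
    stream = b"".join(pieces)
    frames, pos = [], 0
    while pos + 2 <= len(stream):
        (n,) = struct.unpack_from(">H", stream, pos)
        pos += 2
        frames.append(stream[pos:pos + n])
        pos += n
    return frames
```

With this layout one LAPB packet may carry several short frames, and one
long frame may span several LAPB packets.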



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 19 11:04:58 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvsVV-000097a@daver.bungi.com>; Wed, 19 May 93 11:04 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re:  Compression Document
Date: Wed, 19 May 1993 11:04:45 PDT
Message-ID: <m0nvsVS-00008OC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re:  Compression Document" on May 19, 10:19, Fred Baker writes:]
> silly question on page 8:
> 
> is it valid to use
> 
> 	mov	ax,fcstab[bx]
> 
> to get a 32 bit value on an 8088/8086/80286? Don't you have to have an
> 80386 code segment?  (working from memory here: I try to not accept
> jobs that require me to work with Intel chips).

No, this is getting a 16 bit value. The CRC code that I posted earlier
will work with either 16 or 32 bit CRC's, and does it all in 32 bit mode.

The example was to show that computing the CRC is not an iterative
process, and does not expand to hundreds of instructions. 4 instructions
on the 386 for 16 or 32 bit CRC is the best code that I have written,
so far.
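
A sketch of the table-driven approach in Python, assuming the 16-bit FCS
polynomial used on HDLC/PPP links (0x8408 in reflected form).  FCSTAB
mirrors the fcstab name in the quoted assembly; the rest is illustrative
and is not the code from the compression document:

```python
def make_fcs16_table():
    """Precompute the 256-entry lookup table, one entry per byte value."""
    table = []
    for byte in range(256):
        v = byte
        for _ in range(8):
            v = (v >> 1) ^ 0x8408 if v & 1 else v >> 1
        table.append(v)
    return table

FCSTAB = make_fcs16_table()

def fcs16(data, fcs=0xFFFF):
    """One table lookup, one shift and two XORs per input byte."""
    for byte in data:
        fcs = (fcs >> 8) ^ FCSTAB[(fcs ^ byte) & 0xFF]
    return fcs
```

The bit-by-bit loop runs only once, when the table is built; the per-byte
work afterwards is a single indexed load plus a few logical operations.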



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 19 11:37:53 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvt1M-0000Cea@daver.bungi.com>; Wed, 19 May 93 11:37 PDT
X-Path: opal.acc.com!art
From: art@opal.acc.com (Art Berggreen)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 19 May 93 11:38:26 PDT
Message-ID: <9305191838.AA12024@opal.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>
>Ok - what does everyone think?
>
>My proposal is to decouple the compressor from the transmitted LAPB
>frames. We must have the length passed down, so we can re-construct
>the length of the original (uncompressed) frame. We must pass in-band
>the length of the LAPB packet, since it may be padded. Should the
>LAPB transport be allowed to pack multiple compressed frames in one
>LAPB packet?

I tend to lean toward the KISS option.  I think that the odds of successful
interoperability are increased by using the simplest model, and to
me that means packet-in/packet-out.  Also, PPP is currently a datagram
service.  We are adding the use of sequenced frame mode (LAPB) for
reliability, but let's not change the basic nature of PPP.

I hope that the use of LAPB is negotiable separately from compression.
I can imagine wanting the reliability of LAPB for other reasons.
(LAPB may be implied by compression, I'd just like the option of
LAPB without compression)

Art


From owner-ppp-comp Wed May 19 11:49:55 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvtCy-0000FCa@daver.bungi.com>; Wed, 19 May 93 11:49 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 19 May 1993 11:49:41 PDT
Message-ID: <m0nvtCv-0000FZC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 19, 11:38, Art Berggreen writes:]
> I hope that the use of LAPB is negotiable separately from compression.
> I can imagine wanting the reliability of LAPB for other reasons.
> (LAPB may be implied by compression, I'd just like the option of
> LAPB without compression)
> 

Yes, it will be. There are good reasons to use LAPB without compression
(eg transferring large amounts of compressed data).

There are compression types, despite what Dave has said, that do offer
significant compression without a reliable link overhead.  I will be
talking more about this in the coming weeks.

BTW - notice that almost everyone involved in Compression is named
Dave?


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 19 12:25:25 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvtlF-00005aa@daver.bungi.com>; Wed, 19 May 93 12:25 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 19 May 1993 15:08:36 -0400 (EDT)
Message-ID: <9305191908.AA10165@hobbit.gandalf.ca>
References: <<m0nvsQp-00005sC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> Ok - what does everyone think?
> 
> My proposal is to decouple the compressor from the transmitted LAPB
> frames. We must have the length passed down, so we can re-construct
> the length of the original (uncompressed) frame. We must pass in-band
> the length of the LAPB packet, since it may be padded. Should the
> LAPB transport be allowed to pack multiple compressed frames in one
> LAPB packet?

Only if you also allow a compressed frame to be sent in multiple LAPB
packets.

From owner-ppp-comp Wed May 19 12:25:29 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvtlF-00005Oa@daver.bungi.com>; Wed, 19 May 93 12:25 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression Document
Date: Wed, 19 May 1993 15:04:52 -0400 (EDT)
Message-ID: <9305191904.AA09588@hobbit.gandalf.ca>
References: <<9305191719.AA03463@saffron.acc.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> to get a 32 bit value on an 8088/8086/80286? Don't you have to have an
> 80386 code segment?  (working from memory here: I try to not accept
> jobs that require me to work with Intel chips).

Ah Fred, you guys have an i960CA board.  Admit it, once in a while they
come out with some good chips.

From owner-ppp-comp Wed May 19 12:26:47 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvtmc-000093a@daver.bungi.com>; Wed, 19 May 93 12:26 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 19 May 1993 12:26:27 PDT
Message-ID: <m0nvtmW-0000MoC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 19, 15:08, Dave Carr writes:]
> 
> Only if you also allow a compressed frame to be sent in multiple LAPB
> packets.

Agreed.



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 19 13:10:03 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvuSX-00003oa@daver.bungi.com>; Wed, 19 May 93 13:09 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 19 May 1993 15:43:51 -0400 (EDT)
Message-ID: <9305191943.AA16244@hobbit.gandalf.ca>
References: <<m0nvtCv-0000FZC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> There are compression types, despite what Dave has said, that do offer
> significant compression without a reliable link overhead.  I will be
> talking more about this in the coming weeks.

Define significant.  If a reliable link compressor gets 4:1, what does
the unreliable compressor get?  The reliable link overhead is 2
bytes/frame.
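
To put a number on that overhead, here is a small worked calculation.
The 4:1 ratio and the 2 bytes/frame come from the text above; the frame
sizes and the function name are assumptions for illustration:

```python
def effective_ratio(frame_bytes, ratio, overhead_bytes=2):
    """Compression ratio seen on the wire once per-frame overhead is added."""
    return frame_bytes / (frame_bytes / ratio + overhead_bytes)
```

At 4:1, a 1500-byte frame still nets about 3.98:1 and a 64-byte frame
about 3.56:1, so the 2 bytes matter mostly for short frames.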

Sure, you can do run-length, LZ77 style compression, and a couple of
others without history, or using the history from this packet, but
to call it significant is like calling STAC significant.  

All this said, there is one class of compressor that can do well over
non-error-corrected links.

The old Datagram muxes used to download the compression tables from
the encoder to the decoder on a periodic basis.  They however used
an error-correcting protocol to do the download.  

One can also keep track between updates of a super-CRC on all the data
that has been passed.  If both ends agree that no errors occurred, they
can switch to the new dictionary which is derived from the data.  Now
all you need is a way to synchronize switching from the old to new 
dictionary.
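
A rough sketch of the super-CRC idea, assuming CRC-32 as the running check
and using the raw data itself as a stand-in for the derived dictionary.
The class and method names are hypothetical, and the synchronization of
the switch is not addressed:

```python
import zlib

class Endpoint:
    """One end of the link, tracking data passed since the last update."""

    def __init__(self):
        self.crc = 0                 # running CRC-32 over passed data
        self.pending = bytearray()   # data the next dictionary is built from
        self.dictionary = b""

    def passed(self, data):
        """Record data sent (encoder side) or received (decoder side)."""
        self.crc = zlib.crc32(data, self.crc)
        self.pending += data

    def try_switch(self, peer_crc):
        """Adopt the new dictionary only if the peer saw the same data."""
        ok = peer_crc == self.crc
        if ok:
            self.dictionary = bytes(self.pending)
        self.crc, self.pending = 0, bytearray()
        return ok
```

If the two CRCs disagree, both ends keep the old dictionary and start a
new measurement interval.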

It isn't that I haven't thought about or tried these methods out.  They just
seem like a real pain when LAPB solves the things you'll need to work
around.

> BTW - notice that almost everyone involved in Compression is named
> Dave?

I just legally changed my name to "Mr. CompressorHead (TM)" to avoid 
confusion.

From owner-ppp-comp Wed May 19 13:21:34 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvudd-00009pa@daver.bungi.com>; Wed, 19 May 93 13:21 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 19 May 1993 13:21:17 PDT
Message-ID: <m0nvudZ-0000BlC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 19, 15:43, Dave Carr writes:]
> > There are compression types, despite what Dave has said, that do offer
> > significant compression without a reliable link overhead.  I will be
> > talking more about this in the coming weeks.
> 
> Define significant.  If a reliable link compressor gets 4:1, what does
> the unreliable compressor get?  The reliable link overhead is 2
> bytes/frame.

Define 4:1 - I haven't seen 4:1 on *ANY* algorithm I have tested,
in the lab or with my test file. We really need to nail this down,
folks.

As I am under non-disclosure at this point, I can't define "significant"
yet.  Suffice it to say
that I am exploring all options with regard to a compression algorithm,
and I'm not going to eliminate any candidates without due process.

Including the Transcend wonder compression algorithm.

Again, the optimal algorithm for *everyone* to implement should be one
that is fast, doesn't use much RAM, is free or cheap, and is simple
to implement. And it would be nice if it gets 200:1!

-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 19 17:18:10 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nvyKf-00008Aa@daver.bungi.com>; Wed, 19 May 93 17:18 PDT
X-Path: hprnls6.rose.hp.com!davel
From: Dave Langley <davel@hprnls6.rose.hp.com>
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 19 May 93 14:43:21 PDT
Message-ID: <9305192143.AA16490@hprnls6.rose.hp.com>
References: <<m0nvudZ-0000BlC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 
> > Define significant.  If a reliable link compressor gets 4:1, what does
> > the unreliable compressor get?  The reliable link overhead is 2
> > bytes/frame.
> 
> Define 4:1 - I haven't seen 4:1 on *ANY* algorithm I have tested,
> in the lab or with my test file. We really need to nail this down,
> folks.
> 
Thanks, Dave (Rand) for this comment.  I have also been testing
compression algorithms and I have never seen any algorithm get
anywhere near 4:1 on real data. 

I agree we need a way to measure compression performance in a
"standard" way.  The Dave Rand file is a good start.  Any
suggestions?

I'm reluctant to sign my name given the likely response, but...



--
Dave Langley x4973

From owner-ppp-comp Wed May 19 21:26:32 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nw2Cz-0000QSa@daver.bungi.com>; Wed, 19 May 93 21:26 PDT
X-Path: sgi.com!news
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Thu, 20 May 1993 04:07:24 GMT
Message-ID: <i0skva0@rhyolite.wpd.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

dlr@daver.bungi.com (Dave Rand) writes:
> ...
> Again, the optimal algorithm for *everyone* to implement should be one
> that is fast, doesn't use much RAM, is free or cheap, and is simple
> to implement. And it would be nice if it gets 200:1!


"Cheap" is not cheap enough.  Nothing that requires a payment to anyone
can be a candidate for the default.  If Gandalf wants theirs to be the
default, they're going to have to give up the hope of getting money
from licensing it.  They'll have to use the old Sun Microsystems notion
of giving away the idea, but implementing it sooner and better than
everyone else and possibly charging money for good implementations.

Even if you ignore the Internet icon of open-and-free, a non-free
default will evaporate in the PPP market.  The owners of workstations
and PC's and MAC's will ensure that the de facto default is a free
one.

Since Gandalf probably will not want to gamble with what they see as
their edge, you need to think seriously about which of the perhaps less
effective but free algorithms to choose as the default.

 ---

Yes, I still think you router guys are wrong about reliable link
protocols, but I refuse to argue about it again.  There are still
plenty of PPP protocol numbers available for more reasonable or at
least simpler ideas.


Vernon Schryver,  vjs@sgi.com


From owner-ppp-comp Thu May 20 05:11:01 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nw9SI-0000W6a@daver.bungi.com>; Thu, 20 May 93 05:10 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Thu, 20 May 1993 05:10:27 PDT
Message-ID: <m0nw9S9-0000SjC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 20,  6:57, Vernon Schryver writes:]
> dlr@daver.bungi.com (Dave Rand) writes:
> > ...
> > Again, the optimal algorithm for *everyone* to implement should be one
> > that is fast, doesn't use much RAM, is free or cheap, and is simple
> > to implement. And it would be nice if it gets 200:1!
> 
> "Cheap" is not cheap enough.  Nothing that requires a payment to anyone

Uhh - Vernon? 

We got it.

(I think your gateway may be broken. The message IDs are different
coming from sgi.com.)


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Thu May 20 06:59:33 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwB9T-0000C8a@daver.bungi.com>; Thu, 20 May 93 06:59 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Thu, 20 May 1993 09:56:34 -0400 (EDT)
Message-ID: <9305201356.AA27203@hobbit.gandalf.ca>
References: <<i0skva0@rhyolite.wpd.sgi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> "Cheap" is not cheap enough.  Nothing that requires a payment to anyone
> can be a candidate for the default.  If Gandalf wants theirs to be the
> default, they're going to have to give up the hope of getting money
> from licensing it.  They'll have to use the old Sun Microsystems notion
> of giving away the idea, but implementing it sooner and better than
> everyone else and possibly charging money for good implementations.
> 
> Since Gandalf probably will not want to gamble with what they see as
> their edge, you need to think seriously about which of the perhaps less
> effective but free algorithms to choose as the default.

I would like to state that the license fees currently being considered
by my management are relatively low compared to those for the V.42 bis
standard that modems use.  Even if I specify the algorithm in detail, you may
spend more developing the code than it costs to buy.  And in the end,
you'll spend plenty making it run fast, which is where most of my effort
has been spent.

I would like to know how much time and money people have spent working
out the details, writing the code, and debugging header compression
for IPX and IP.  I haven't seen anyone come up to me and offer source
code for all the PPP RFCs.  When they do, I'll give away mine.

Included in the payment are royalty fees to the two authors of the
original code from which I started.

Vern, the 10+ companies I have surveyed had no objections to paying for
this code.  Are you alone, or are there others?  I'd like to hear.

Feel free to pick another algorithm as a standard.  Many people are
satisfied with STAC type performance.  Of course, be prepared to dodge
the patent minefield when coming up with your own STAC-compatible
algorithm.  

From owner-ppp-comp Thu May 20 09:10:49 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwDCY-0000FBa@daver.bungi.com>; Thu, 20 May 93 09:10 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: "Cheap" is not cheap enough
Date: Thu, 20 May 1993 12:11:45 -0400 (EDT)
Message-ID: <9305201611.AA25011@hobbit.gandalf.ca>
References: <<i1465vs@sgi.sgi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

In Dave Rand's compression document, 9 of the 25 algorithms
can be immediately ruled out.  They are LZW based, and will cost
a minimum of $40,000 U.S. to license from Unisys and IBM.  This
assumes they violate none of the BT patents associated with 
V.42bis.  Otherwise the cost is up to $60,000.  

If the V.42 bis license for PPP is anything like the ones for
modems, you cannot use this code anyplace else.  It is product
type specific.

Of the remainder, only 4 are not LZ77 based.  All four (Predictor,
NSWP, Splay, and RLE) are well down in the rankings in terms of
compression performance.

RLE and NSWP can be ruled out, unless someone wants to fight the RLE 
related patents.

I would like to know more about Predictor, so I can convince myself
that it doesn't infringe on any patents.

This brings the list down to 3, namely LZ77-derivatives, Predictor,
and Splay.

Of the 3, I favour an LZ77 derivative.  The patent issues associated
with LZ77 have been addressed by the major archivers.  LZ77 can be
made to run at high speed (STAC) with so-so compression performance,
or in a high-compression mode with lower throughput.

Here are some other reasons for LZ77:

(1) It is possible to write a STAC-compatible algorithm without
    infringing (IMHO) any patents.  This gives a HARDWARE solution
    today, plus a royalty-free algorithm for the basic PPP
    compression;

(2) The absolute minimum memory requirement is 2 Kbytes.  This
    assumes compression in only one direction, and you're not the
    compressor.  For bidirectional compression, a minimum
    implementation takes 16 Kbytes;

(3) The window size can be negotiated to the capabilities of the
    boxes;

(4) I can write the specification in one number, the STAC patent
    document number(s);

(5) The algorithm can be extended to be FZA compatible :-)
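
To illustrate point (2): an LZ77 decompressor needs nothing beyond the
sliding history window itself, which is why the receive-only case can be
so small.  What follows is only a sketch in Python under stated
assumptions.  The token format (an int for a literal byte, an
(offset, length) tuple for a match) and the name lz77_decode are
hypothetical, chosen for illustration; this is not the STAC or FZA
encoding, and the 2048-byte default simply mirrors the 2 Kbyte figure
above.

```python
def lz77_decode(tokens, window_size=2048):
    """Yield decoded bytes from a stream of hypothetical LZ77 tokens.

    A token is either an int (a literal byte) or an (offset, length)
    tuple meaning "copy `length` bytes starting `offset` bytes back".
    The only state kept is the history window, capped at window_size.
    """
    history = bytearray()  # the sliding window: the decoder's only state

    for token in tokens:
        if isinstance(token, int):
            offset, length, literal = 0, 1, token
        else:
            offset, length = token
            literal = None
            if not 1 <= offset <= len(history):
                raise ValueError("match offset outside the history window")

        for _ in range(length):
            # Copy one byte at a time so overlapping matches
            # (offset < length) repeat the most recent bytes correctly.
            byte = literal if literal is not None else history[-offset]
            history.append(byte)
            if len(history) > window_size:
                del history[0]  # forget the oldest byte
            yield byte


if __name__ == "__main__":
    # "ab", then copy 4 bytes from 2 back, then "c"
    print(bytes(lz77_decode([ord("a"), ord("b"), (2, 4), ord("c")])))
```

A compressor also needs search structures over its window to find
matches, which would be consistent with the larger bidirectional
figure, though the breakdown of the 16 Kbytes is not given here.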


From owner-ppp-comp Thu May 20 16:31:36 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwK4d-00005za@daver.bungi.com>; Thu, 20 May 93 16:30 PDT
X-Path: rhyolite.wpd.sgi.com!vjs
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Thu, 20 May 93 11:44:26 -0600
Message-ID: <9305201744.AA05770@rhyolite.wpd.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


I apologize for the duplicates.

We use internal Usenet newsgroups for the X, IETF, and a lot
of other mailing lists.  My mechanisms for determining when
something came from an internal source instead of from an external
source are fooled by the headers as they come from bungi.com.

I don't see a possible fix, so I've turned off part of the gateway.


vjs



From owner-ppp-comp Thu May 20 16:31:39 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwK4c-00000ta@daver.bungi.com>; Thu, 20 May 93 16:30 PDT
X-Path: rhyolite.wpd.sgi.com!vjs
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: Re: "Cheap" is not cheap enough
Date: Thu, 20 May 93 11:41:25 -0600
Message-ID: <9305201741.AA05760@rhyolite.wpd.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> In Dave Rand's compression document, 9 of the 25 algorithms
> can be immediately ruled out.  They are LZW based, and will cost
> a minimum of $40,000 U.S. to license from Unisys and IBM.  This
> assumes they violate none of the BT patents associated with 
> V.42bis.  Otherwise the cost is up to $60,000.  

Oh, nonsense!

We workstation vendors have been openly shipping the BSD UNIX compress
command for many years.  

If UNISYS or IBM wanted to come after us, they could not, because they
have not.  That would hold even if UNISYS had not publicly given away
the software rights about 6 years ago.

It might be different for you guys making special purpose boxes.


vjs



From owner-ppp-comp Thu May 20 16:31:41 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwK4e-0000C6a@daver.bungi.com>; Thu, 20 May 93 16:30 PDT
X-Path: rhyolite.wpd.sgi.com!vjs
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Thu, 20 May 93 11:50:03 -0600
Message-ID: <9305201750.AA05879@rhyolite.wpd.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


> I would like to state that the license fees currently being considered
> by my management are relatively low compared to those for the V.42 bis
> standard that modems use.  Even if I specify the algorithm in detail, you may
> spend more developing the code than it costs to buy.  And in the end,
> you'll spend plenty making it run fast, which is where most of my effort
> has been spent.
> 
> I would like to know how much time and money people have spent working
> out the details, writing the code, and debugging header compression
> for IPX and IP.  I haven't seen anyone come up to me and offer source
> code for all the PPP RFCs.  When they do, I'll give away mine.
> 
> Included in the payment are royalty fees to the two authors of the
> original code from which I started.
> 
> Vern, the 10+ companies I have surveyed had no objections to paying for
> this code.  Are you alone, or are there others?  I'd like to hear.
> 
> Feel free to pick another algorithm as a standard.  Many people are
> satisfied with STAC type performance.  Of course, be prepared to dodge
> the patent minefield when coming up with your own STAC-compatible
> algorithm.  


The issue has nothing to do with how good the Gandalf algorithm is.
The issue has nothing to do with how much it might cost to write
and debug code.  It has nothing to do with the size of the royalty.

There is free code for all of the non-proprietary PPP protocols
(i.e. IP.  I intend no disrespect for the other protocols).  Look on
uunet.  Some of it has Brad Clements' "do not use this in a commercial
product" copyright.  The rest is not even that encumbered.

Have you contacted any of the big workstation vendors?  I do not
believe any of Sun, HP, DEC, IBM, or SGI can or will license your
algorithm.  The personal costs of jumping through the hoops our own
legal departments require are enormous.  We just will not do it unless
the gain is substantial.  If you can offer us a full, supported release
of all of PPP for our individual operating systems for say $50,000,
then it might fly with some but not all of us.  If you want to charge
$1000 for the right to write an implementation of a minor feature of
PPP, then it will not happen at all.  Even if I wanted to waste lots
of my salary arguing with lawyers for a minor feature of PPP done wrong
(i.e. requiring reliable links), my management would not want to spend
its time helping me argue with those lawyers.  They would rightly point
out that there are much more valuable things to spend their time on, even
things that involve arguing with those same lawyers.

This has nothing to do with your license, which I trust is easy to
accept by any reasonable person.  Lawyers are scum, and insist on
arguing about everything.  

You have no hope of ever getting the IAB and the IETF in general to
allow into "the standards track" something that requires implementors
to pay Gandalf a fee to implement it.  You can have an informational
RFC, like the RFCs for NFS.

What do you think will appear in NetBSD, 386bsd, BSD/386, Solaris for
Intel systems, NT, OS/2, and the zillions of tiny outfits making
X, PPP, etc. for DOS, NT, and OS/2?  Something that requires a license
or whatever alternative compression that does not?

Consider the lessons available from PEM and PGP.


If you really believe in license fees, then why aren't you using v.42
and v.42bis for the link layer and its compression?  That would give
you far more interoperability, and more standards-clout.  I might be
able to sell SGI management on paying those fees.

The entire issue of patents on compression software is overblown.  All
of the major workstation vendors and almost everyone else are shipping
the BSD UNIX `compress` command.  We obviously do not have any fears of
the UNISYS patent.  Given the many years we've been doing it so openly,
UNISYS would have trouble enforcing the patent, even if they had not
publicly given it away for software use about 6 years ago.

Almost all of you guys seem to be router guys.  That is why you are all
so interested in LAPB, for which you have hardware support.  That is
why the v.42bis analogy makes sense to you, but you don't want to 
actually use v.42bis.


vjs



From owner-ppp-comp Fri May 21 09:37:13 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwa4z-0000EKa@daver.bungi.com>; Fri, 21 May 93 09:36 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: "Cheap" is not cheap enough
Date: Fri, 21 May 1993 09:49:54 -0400 (EDT)
Message-ID: <9305211349.AA02067@hobbit.gandalf.ca>
References: <<9305201741.AA05760@rhyolite.wpd.sgi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> If UNISYS or IBM wanted to come after us, they could not, because they
> have not.  That would hold even if UNISYS had not publicly given away
> the software rights about 6 years ago.

Just because they don't want to fight over UNIX compress doesn't mean
they won't fight if you embed it in a box-type product.  With the stack
of LZW-related patents on my desk, I would sooner avoid them all.  It
would cost substantially more to fight them than to use an alternative.
I like LZW.  It's fast and gets reasonable compression.

From owner-ppp-comp Fri May 21 12:50:18 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwd6W-0000NJa@daver.bungi.com>; Fri, 21 May 93 12:50 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Fri, 21 May 1993 14:32:36 -0400 (EDT)
Message-ID: <9305211832.AA20796@hobbit.gandalf.ca>
References: <<9305201750.AA05879@rhyolite.wpd.sgi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 
> There is free code for all of the non-proprietary PPP protocols
> (i.e. IP.  I intend no disrespect for the other protocols).  Look on
> uunet.  Some of it has Brad Clements' "do not use this in a commercial
> product" copyright.  The rest is not even that encumbered.

Please post an archive site for the following:

(1) LAPB
(2) CIPX

Both of these are required by bridge/routers but nobody is questioning
them as a standard.  Anybody who doesn't have LAPB today must buy a 
working copy, develop it themselves, or use an integrated controller.
How can you keep saying that all the accepted/proposed standards are
available in working source code in the public domain?
> 
> Have you contacted any of the big workstation vendors?  I do not
> believe any of Sun, HP, DEC, IBM, or SGI can or will license your
> algorithm.  The personal costs of jumping through the hoops our own
> legal departments require are enormous.  We just will not do it unless
> the gain is substantial.  If you can offer us a full, supported release
> of all of PPP for our individual operating systems for say $50,000,

If you would do the same for us, we would buy your code!  Instead,
we chose to buy the integrated and conformance-tested PPP source from
another vendor.  We could have developed it ourselves, but even getting
from the published code to what we bought would have cost more money
and required man-hours which we could ill afford.

Instead, we decided to do a piece of the puzzle that hadn't been addressed.

> (i.e. requiring reliable links), my management would not want to spend
> its time helping me argue with those lawyers.  They would rightly point
> out that there are much more valuable things to spend their time on, even
> things that involve arguing with those same lawyers.
> 
> This has nothing to do with your license, which I trust is easy to
> accept by any reasonable person.  Lawyers are scum, and insist on
> arguing about everything. 

Yes.  So pay them to do the patent searches on every algorithm you
propose as a standard.  They'll love you.   
> 
> You have no hope of ever getting the IAB and the IETF in general to
> allow into "the standards track" something that requires implementors
> to pay Gandalf a fee to implement it.  You can have an informational
> RFC, like the RFCs for NFS.

Okay.  Suppose that I publish the details of the algorithm, and not
provide the source code.  This wouldn't be a problem then?  If I even
told you where to get the original public domain code, which was my
starting point, do you think you'll get a working version before
spending the amount I want to charge for working and field tested
code?  How fast would yours run?  Do you care? 

Would the IETF accept an algorithm description?  It must, otherwise
PPP wouldn't fly.  Or I could give out many incomplete and incompatibly
written chunks of code (like PPP) and then it would be okay?

> What do you think will appear in NetBSD, 386bsd, BSD/386, Solaris for
> Intel systems, NT, OS/2, and the zillions of tiny outfits making
> X, PPP, etc. for DOS, NT, and OS/2?  Something that requires a license
> or whatever alternative compression that does not?

Again, they have the right to choose.  They can make their own to get the
same performance.  I doubt they'll give theirs away though.

Then again, it makes more sense for them to put it at the session
layer.  Then they could use LZW :-)
> 
> Consider the lessons available from PEM and PGP.

In these cases, equivalent or good enough alternatives were found.  If
you find a free algorithm equivalent to FZA, I'll jump on that bandwagon.
But please don't suggest that STAC or PREDICTOR is even close.  I didn't
see STAC getting kicked out of the PPP meetings for suggesting theirs as the
standard.  I saw a lot of people listening.  What people didn't like was
the administration charge per copy.

FZA won't be for the masses.  The majority of vendors don't have a lot of
CPU left in their boxes and are tight on memory.  For them, STAC or
PREDICTOR are viable alternatives.  We have the horsepower.  We designed in
compression 2 years ago.  What's right for me won't be right for you.  
We are betting that when a customer wants compression, he's willing to
pay to save line costs.  The way I see it, he'll need 2.695/1.668 = 62%
more line with PREDICTOR.  I don't think that will be a hard sell for us.
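
The 62% figure can be checked with a few lines of Python.  This is a
sketch that assumes 2.695 and 1.668 are the FZA and PREDICTOR
compression ratios from Dave Rand's comparison (the message does not
label them), and that the line capacity needed for a given load scales
as 1/ratio.  The function name is made up for illustration.

```python
def extra_line_needed(better_ratio, worse_ratio):
    """Fraction of additional line capacity needed with the worse ratio.

    For the same uncompressed load, required capacity is load / ratio,
    so the worse algorithm needs better_ratio / worse_ratio times as
    much line as the better one.
    """
    return better_ratio / worse_ratio - 1.0


# Assumed labels: 2.695 for FZA, 1.668 for PREDICTOR.
fza_ratio = 2.695
predictor_ratio = 1.668

extra = extra_line_needed(fza_ratio, predictor_ratio)
print(f"extra line with PREDICTOR: {extra:.1%}")
```

Under those assumptions the result is a little under 62%, matching the
rounded figure quoted above.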

PPP wants everything to retrofit to all that old hardware.  That's nice.  

Does it mean I have to cripple my box because of old boxes?  Sorry.
Look, I've lived with padding to a byte boundary, padding to a word 
boundary, redundancy in the header, length fields, CRC's.  How about
we send HALF the data through and just call it compression.  The customer
won't care.  Let's not lose sight of the reason for compression.  It is
to maximize link utilization. 

Go ahead and pick another standard.  Let the customer decide.  I'm willing 
to do both algorithms, the standard one and FZA.

> If you really believe in license fees, then why aren't you using v.42
> and v.42bis for the link layer and its compression?  That would give
> you far more interoperability, and more standards-clout.  I might be
> able to sell SGI management on paying those fees.

Look at Dave Rand's comparison and you'll know the answer.  FZA is our
second generation bridge compression algorithm.  Even the old one 
beat V.42 bis on speed and compression.  If there isn't a suitable algorithm, 
you make one.  If there is one, buy it.  Simple.

> Almost all of you guys seem to be router guys.  That is why you are all
> so interested in LAPB, for which you have hardware support.  That is
> why the v.42bis analogy makes sense to you, but you don't want to 
> actually use v.42bis.

We haven't got to the full blown router stage yet, but thanks for the compliment.
We bought some router code too!  Could have taken Comer's code and got it
working too.  Just not cost effective.  (Can anyone give me CISCO's IP routing
code?  I know how IP works, but I'll bet they did a lot more work on it than
I can afford to do.) 

But LAPB is from our statistical and X.25 multiplexers.  We didn't invent it.
But it has its uses.  You and others are of the opinion it has no useful
purpose.  So spend all kinds of time making a non-reliable link work.  It's
easier for me to use what I already know and have working.  
Why should we both have to switch to a non-reliable link so it's a fair race?

From owner-ppp-comp Fri May 21 22:56:14 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwmYp-0000Ppa@daver.bungi.com>; Fri, 21 May 93 22:55 PDT
X-Path: rhyolite.wpd.sgi.com!vjs
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Sat, 22 May 93 00:06:47 -0600
Message-ID: <9305220606.AA27956@rhyolite.wpd.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk



> > There is free code for all of the non-proprietary PPP protocols
> > (i.e. IP.  I intend no disrespect for the other protocols).  Look on
> > uunet.  Some of it has Brad Clements' "do not use this in a commercial
> > product" copyright.  The rest is not even that encumbered.
> 
> Please post an archive site for the following:
> 
> (1) LAPB
> (2) CIPX
> ...

LAPB is not yet a part of PPP.
Isn't CIPX what might be called "proprietary"?


> > Have you contacted any of the big workstation vendors?  I do not
> > believe any of Sun, HP, DEC, IBM, or SGI can or will license your
> > algorithm....

>                                                        ...  Instead,
> we chose to buy the integrated and conformance tested PPP source from
> another vendor.  We could have developed it ourselves, but even getting
> from the published code to what we bought would have cost more money
> and required man-hours which we could ill-afford.  

Of course.  We agree.  TANSTAAFL.
I think SGI's PPP won't be based on the public code, for reasons
I hope and think are right.


> > This has nothing to do with your license, which I trust is easy to
> > accept by any reasonable person.  Lawyers are scum, and insist on
> > arguing about everything. 
> 
> Yes.  So pay them to do the patent searches on every algorithm you
> propose as a standard.  They'll love you.   

They'd love me just as much if I tell them to talk to you about
your license.  The issue is not their love for me, but my love for
what they'd make me do.

By the way, does your license indemnify your licensees should someone
decide that FZA infringes?  If not, and I don't see how Gandalf would
be crazy enough to do it, the patent protection afforded by FZA is
about as useful as my claim that LZW was released to the public domain
for software use by UNISYS years ago.  In either case, due diligence
may be required.


> > You have no hope of ever getting the IAB and the IETF in general to
> > allow into "the standards track" something that requires implementors
> > to pay Gandalf a fee to implement it.  You can have an informational
> > RFC, like the RFCs for NFS.
> 
> Okay.  Suppose that I publish the details of the algorithm, and not
> provide the source code.  This wouldn't be a problem then?  If I even
> told you where to get the original public domain code, which was my
> starting point, do you think you'll get a working version before
> spending the amount I want to charge for working and field tested
> code?  How fast would yours run?  Do you care? 

No problem.
Initial code would be great, but not required.
I could probably not get it working as soon or for less $$, by a long shot.
I'm egotistical enough to think my code is eventually as fast as anyone's.
I do care, but initial cost and time-to-market are only two of the
    important considerations, and licensing and so forth is only one of
    the issues for those considerations.

> Would the IETF accept an algorithm description?  It must, otherwise
> PPP wouldn't fly.  Or I could give out many incomplete and incompatibly
> written chunks of code (like PPP) and then it would be okay?

Of course.
Yes.
What's the point?


> > What do you think will appear in NetBSD, 386bsd, BSD/386, Solaris for
> > Intel systems, NT, OS/2, and the zillions of tiny outfits making
> > X, PPP, etc. for DOS, NT, and OS/2?  Something that requires a license
> > or whatever alternative compression that does not?
> 
> Again, they have the right to choose.  They can make their own to get the
> same performance.  I doubt they'll give theirs away though.

NetBSD and 386bsd are completely free and complete implementations of
BSD UNIX.  Completely free if you have an Internet link.  Free for the
cost of the media if not.  The media is floppies, 1/4" tape, or CDROM.
Source included, of course.  It is set up on a single floppy so that you can
install the rest of the system over a SLIP or (I think) PPP link--if
only you can find a modem and a host time to move the ~50MBytes.

If you think ISDN cards will soon be cheap enough (I think so)
and ISDN lines available (I hope so), then there will be a free
compressing PPP for NetBSD, 386bsd, and BSD/386.  You can count on it,
just as the commercial X server vendors unfortunately found they could
count on better free X servers than they could make themselves.
Software is a crazy (crazier?) business right now.


> ...
> > Consider the lessons available from PEM and PGP.
> 
> In these cases, equivalent or good enough alternatives were found.

You missed my point.  The RSA licensing required of PEM is crippling
PEM, while the unlicensed, illegal, and non-standard PGP is thriving
because of its lack of license.  I agree that PEM will survive in the
long run, but that run is longer than it might have been, or looked
like it was going to be a couple of years ago.


> ...
> FZA won't be for the masses.  The majority of vendors don't have a lot of
> CPU left in their boxes and are tight on memory.  For them, STAC or
> PREDICTOR are viable alternatives.  We have the horsepower.  We designed in
> compression 2 years ago.  What's right for me won't be right for you.  
> We are betting that when a customer wants compression, he's willing to
> pay to save line costs.  The way I see it, he'll need 2.695/1.668 = 62%
> more line with PREDICTOR.  I don't think that will be a hard sell for us.

That's great.
A wonder-compressor that costs $10,000 per copy would be fine,
but not as the default.

I'd quibble about the power of typical boxes, except my perspective may
be warped by boxes with 32 150MHz R4400's with 64 GigaByte of common
DRAM.  Maybe router boxes are generally underpowered.  I don't know,
but I do wonder about anyone still using Multibus-I.


>..
> Does it mean I have to cripple my box because of old boxes?...

So don't.  Let FZA be one of the alternatives.  Charge as much as you
want.

If you're right that your competitors do not have the horsepower to use
FZA, then you should make FZA free and the default, and release a free
mediocre source implementation, so that they'll be crippled by too good
an algorithm.

Remember that Sun's use of "give the code away", is a significant part
of why there is a Sun but no Apollo today, despite the fact that
Apollo invented workstations.  Sun gave away code for RPC from the
start.  Then they wrote RFCs for RPC and NFS.  And they licensed their
NFS and YP code to competitors like Silicon Graphics on low-cost terms.


> Go ahead and pick another standard.  Let the customer decide.  I'm willing 
> to do both algorithms, the standard one and FZA.

That's all I'm urging, if you can't make FZA free.

If I were a Gandalf stockholder or big boss, I'd be yelling at you
about Kodak ("give cameras away to sell film") and Sun ("give NFS away
to sell workstations").

 
> ...
>             You and others are of the opinion [LAPB] has no useful
> purpose.  So spend all kinds of time making a non-reliable link work.  It's
> easier for me to use what I already know and have working.  
> Why should we both have to switch to a non-reliable link so it's a fair race?

Wrong, on 2 counts.  First is that what you find comfortable matters in
both a joint development and a standards committee, but in opposite
directions.  "Mutual assured disadvantage" is a real and valid concern
in standards committees.  If a reasonable case could be made that LAPB
was chosen only because some vendors were more comfortable with it,
then the IAB would be required to reject it, to avoid suffering those
infamous anti-trust lawsuits that have happened to other standards
organizations.   This issue has come up in the main IETF mailing list
in the last week or two.

Your saying that about LAPB in this mailing list is dangerous.  It
could be used in court to show that a cabal of router vendors excluded
a technically equally good, but cheaper and easier, solution simply to
protect their competitive positions.  That you even let yourself think
such a thought out loud makes me doubt all of the words here about
patent searches.


Second,  I think
    LAPB has uses for routers.
    LAPB is wrong for stations.

This is as true now and concerning compression as it was years ago
for simple IP when the PPP group was formed by locking the SLIP guys in
the same room as the interoperable router guys.

Routers must compress traffic for lots of TCP connections at once,
    but stations do not.
Routers need to bridge other traffic, but stations do not.
Routers need to handle more than a single protocol stack, but
    stations generally do not.

Reliable links may be necessary for compression for routers, but
are not needed for the stations.

It would be nice if the same compression algorithms were available
whether you use LAPB or VJ-header prediction on unreliable links.

Please don't quibble.  I'll use any pair of words you want to
distinguish a "router", what you and Cisco and ACC build, from a
"station", a Sun, SGI, or 486 PC-AT box.

 ---

Bad things happen in standards committees when one member decides to
make a little money or just recover some costs by selling to other
members.  Much of the years of delay and bad features of FDDI are
directly attributable to just such decisions by a few members of
ANSI-X3T9.5.

Standards committees are different from joint developments.  If you
really want to be doing joint development, then please, for your own
sakes as well as the IETF's, leave the IETF umbrella and form a plain
old consortium.


vjs



From owner-ppp-comp Fri May 21 23:27:21 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwn32-00006wa@daver.bungi.com>; Fri, 21 May 93 23:27 PDT
X-Path: rhyolite.wpd.sgi.com!vjs
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: Re: "Cheap" is not cheap enough
Date: Sat, 22 May 93 00:19:41 -0600
Message-ID: <9305220619.AA28050@rhyolite.wpd.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


> > If UNISYS or IBM wanted to come after us, they could not, because they
> > have not.  Even if UNISYS had not publicly given away the software
> > rights about 6 years ago.
> 
> Just because they don't want to fight over UNIX compress, doesn't mean
> they won't fight if you embed it in a box type product.  With the stack
> of LZW related patents on my desk, I would sooner avoid them all.  It
> would cost substantially more to fight them than to use an alternative.
> I like LZW.  It's fast and gets reasonable compression.


By "come after us", I mean "come after us workstation vendors."

I freely admit that things might be different for box type products.

That things are different is even consistent with my understanding
of the UNISYS release.

That does imply that a solution might be to label your box type products
in a way that makes them more like workstation boxes than specialized
piles of silicon.


vjs



From owner-ppp-comp Sat May 22 05:26:19 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwseM-00004ta@daver.bungi.com>; Sat, 22 May 93 05:26 PDT
X-Path: fcr.com!brad
From: Brad Parker <brad@fcr.com>
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed? 
Date: Sat, 22 May 1993 08:17:22 -0400
Message-ID: <9305221218.AA29023@stemwinder.fcr.com>
References: <<vjs@rhyolite.wpd.sgi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


[ I don't think we're organized enough to be considered for anti-trust ;-) ]

>> Second,  I think
>>     LAPB has uses for routers.
>>     LAPB is wrong for stations.
>> 
>> This is as true now and concerning compression as it was years ago
>> for simple IP when the PPP group was formed by locking the SLIP guys in
>> the same room as the interoperable router guys.

I thought I sent an explanation of why LAPB is good for end-nodes to this
list - perhaps it did not get to the list or perhaps you did not buy
it.  I assume it didn't get to the list (if you didn't buy it, please
say so - it's just one person's opinion)

Two sentence summary: Millions of small end-nodes (macintoshes and
pc's) have very poor serial port hardware.  Without a reliable
transport (such as LAPB) they will perform poorly as they will exhibit
a very high percentage of dropped packets (like 10%).

>> Reliable links may be necessary for compression for routers, but
>> are not needed for the stations.

as long as the station has good solid serial ports ;-)

>> It would be nice if the same compression algorithms were available
>> whether you use LAPB or VJ-header prediction on unreliable links.
>> 
>> Please don't quibble.  I'll use any pair of words you want to
>> distinguish a "router", what you and Cisco and ACC build, from a
>> "station", a Sun, SGI, or 486 PC-AT box.

(be careful - a 486 pc can outperform a 68020/68030 based router)

>> Bad things happen in standards committees when one member decides to

I guess I'm confused by this.  At the last IETF I was under the
impression that we were going to use a free compression method which
would do a nominal 2:1 and best case 4:1.  I was excited.

Having written a v.42bis from scratch and spent time talking to IBM,
Microcom (the unisys agent) and BT, I don't plan to use another
proprietary compression technique in the near future.

(and you're correct about the LZW-on-unix vs. LZW-in-an-embedded-system;
this appears to be a conscious decision by some of the patent holders;
I asked and they said as much)

-brad

From owner-ppp-comp Sat May 22 10:01:42 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwwwu-000027a@daver.bungi.com>; Sat, 22 May 93 10:01 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Sat, 22 May 1993 10:01:27 PDT
Message-ID: <m0nwwwp-0000NMC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 22,  0:06, Vernon Schryver writes:]
> Standards committees are different from joint developments.  If you
> really want to be doing joint development, then please, for your own
> sakes as well as the IETF's, leave the IETF umbrella and form a plain
> old consortium.
> 

Just so that everyone is clear: Fred Baker passed the RFC's to me to work
on. I am involving as many people as are interested in compression because I
don't think I know everything there is to know about compression algorithms,
and methods.  The goal here, as stated many times now, is to:

1) Figure out a method to negotiate compression.
2) Figure out a compression standard.

Since I didn't, and don't, believe that there should be only one standard
compression algorithm, I extended part 1 to negotiate multiple compression
algorithms. There still must be at least one algorithm that is common to
everyone.

I am not closing my mind to *ANY* approach - that may include reliable links
or not, and may be a 2:1 compression or a 200:1. But we will have a
standard, and it will be agreed on by the majority of the people on this
list - not just me (or Dave, or Dave, or Vernon).

Please folks, let's keep our focus.

-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Sat May 22 10:10:55 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwx5r-00000ya@daver.bungi.com>; Sat, 22 May 93 10:10 PDT
X-Path: mail.barrnet.net!brian
From: brian@mail.barrnet.net
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Sat, 22 May 1993 10:00:04 -0800
Message-ID: <9305221700.AA23276@Angband.Stanford.EDU>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Brad Parker wrote:

>Two sentence summary: Millions of small end-nodes (macintoshes and
>pc's) have very poor serial port hardware.  Without a reliable
>transport (such as LAPB) they will perform poorly as they will exhibit
>a very high percentage of dropped packets (like 10%).

Brad:

I usually agree with you but we differ in opinion here.  First, I think
that serial hardware on the typical PC works better than you think.  I run
MacPPP on my Powerbook at 57600 bps with very few losses.  PC hardware works
well at moderate speeds (up to 19200 bps) with 16450 UARTs and well over
57600 bps with 16550 FIFOed UARTs.

Secondly, I think that you overstate the need for error correction on a
link that loses 10% of packets.  I have extensive experience from amateur
packet radio with IP/TCP over excessively lossy links.  Amateur packet
radio uses AX.25 extensively which is a slightly modified version of LAPB
(the mods essentially consist of the inclusion of a source and destination
addresses so that it can be used over a shared channel).  A link that lost
only 10% of packets was considered to be a very good link.  SOTA when I was
experimenting was a packet loss rate in the 20%-30% range.  Experience
indicated that, until the packet loss rate reached 40%-50% it was better to
send the IP/TCP dgrams/segments in UI frames without error correction
(pretty close to PPP) and rely on TCP for retransmission than it was to try
to "improve" the apparent BER by turning on LAPB.  The overhead for LAPB
increased latency and reduced throughput.

Where turning on LAPB became a win was in the situation where we had to
traverse multiple lossy links (traversing 3 links, each with a 50% packet
loss rate would yield a total end-to-end packet delivery of about 12%). 
Then the loss of multiple TCP segments in a row triggered the TCP backoff
algorithm and caused throughput to drop precipitously (as if it wasn't
already bad enough :^).  In that case LAPB on the links was a win because
the probability of a segment making it to the other end approached unity,
albeit with *huge* delays.  At least that was something with which TCP
could cope.
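
The multi-link figure above follows from multiplying the per-link
delivery probabilities, assuming losses on each link are independent.
A minimal Python sketch of that arithmetic (the function name is
illustrative and is not taken from any implementation discussed on
this list):

```python
def end_to_end_delivery(per_link_loss: float, links: int) -> float:
    """Chance that one frame survives every link when no link retransmits.

    Assumes each link drops frames independently with the same probability.
    """
    return (1.0 - per_link_loss) ** links


# Three links at several per-link loss rates, including the 50% case.
for loss in (0.10, 0.30, 0.50):
    delivered = end_to_end_delivery(loss, 3)
    print(f"{loss:.0%} loss per link, 3 links: {delivered:.1%} delivered")
```

With a 50% loss rate on each of 3 links this gives 12.5%, which matches
the "about 12%" quoted above.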

Amateur packet radio was never good for much but it sure let you see how
the different protocols worked in the equivalent of networking hell.  :^) 
My suggestion to you is to cook up some monte carlo simulations of your
link and fiddle with the parameters.  You may be quite surprised at the
performance of IP/TCP (and other end-to-end reliable transport services)
over UI/PPP vs. IP/TCP over LAPB/PPP for marginally lossy links.  I think
that you will find that you can sustain moderately large packet loss rates
before LAPB begins to buy you anything.  The key point is to try it and not
argue guesses.
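
As a starting point for the kind of Monte Carlo experiment suggested
above, here is a deliberately small Python sketch.  It is an
assumption-laden toy, not a model of TCP or LAPB: it only counts how
many link transmissions are needed to deliver one packet, either by
resending from the source after any loss (end-to-end recovery) or by
resending on the link that lost the frame (link-level recovery).  It
ignores timers, windows, acknowledgement traffic, header overhead, and
TCP backoff, all of which matter to the argument, and all function
names are hypothetical.

```python
import random


def transmissions_end_to_end(rng, loss, links):
    """Link transmissions used when any loss restarts the packet at the source."""
    sent = 0
    while True:
        for _ in range(links):
            sent += 1
            if rng.random() < loss:
                break  # lost on this hop; the source has to resend
        else:
            return sent  # crossed every link without a loss


def transmissions_hop_by_hop(rng, loss, links):
    """Link transmissions used when each link resends until its hop succeeds."""
    sent = 0
    for _ in range(links):
        while True:
            sent += 1
            if rng.random() >= loss:
                break  # this hop succeeded; move to the next link
    return sent


def mean_cost(fn, loss, links, packets=20000, seed=1):
    """Average transmissions per delivered packet over many trials."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return sum(fn(rng, loss, links) for _ in range(packets)) / packets


for links in (1, 3):
    e2e = mean_cost(transmissions_end_to_end, 0.5, links)
    hbh = mean_cost(transmissions_hop_by_hop, 0.5, links)
    print(f"{links} link(s), 50% loss: end-to-end {e2e:.1f}, hop-by-hop {hbh:.1f}")
```

In this toy the two strategies cost the same number of transmissions on
a single link and diverge only as links are chained, which is
consistent with the multi-link observation above.  Answering the
single-link question would need latency and protocol overhead added to
the model.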

Brian Lloyd                                       3420 Sudbury Road
brian@lloyd.com                                   Cameron Park, CA  95682
brian@mail.barrnet.net                            (916) 676-3442 - fax
(415) 725-1392                                    (916) 676-1147 - voice


From owner-ppp-comp Sat May 22 10:17:30 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwxCF-00000ya@daver.bungi.com>; Sat, 22 May 93 10:17 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Sat, 22 May 1993 10:17:20 PDT
Message-ID: <m0nwxCD-000010C@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 22,  8:17, Brad Parker writes:]
> I guess I'm confused by this.  At the last IETF I was under the
> impression that we were going to use a free compression method which
> would do a nominal 2:1 and best case 4:1.  I was excited.

I have one algorithm under patent search right now. If it comes up clean,
then (subject to approval by my management) I will offer it for
consideration. It does 8:1 best case, and 1.668:1 in my test case.
It involves about 20 instructions per byte - about 10 lines of C for
the compressor, and 10 for the decompressor.

I have contacted the Info-ZIP folks with regard to their implementation
of LZ77, and they have no problem with their code being used in the PPP
implementations. It does 905:1 best case, and 2.721:1 in my test case.
There are patent issues to be explored with this (and I am doing so).

I have also proposed using the LZW implementation found in the UNIX
compress algorithm.  There is a question as to its patent status, and
I am having this reviewed currently. Its best case compression is 550:1,
and on my test case it gets 2.235:1.

Gandalf has proposed their algorithm, but I have not received it for
evaluation yet. They are still working out how to license it. Their
best case compression is unknown, and on my test case they get 2.695:1.

STAC has proposed their algorithm, but I don't think that anyone is
considering it as a standard at this time. It may be one of the
commonly available ones though.

Some other parties, not wishing to be identified at this time, have
proposed compression algorithms not requiring a reliable link.  I have
received these implementations, and I'm reviewing them now. I will report
on them in this forum as soon as I have been advised that I may do so.

To this point, no other algorithms have been proposed or are under 
consideration. If I have omitted anyone's favourite algorithm, please
let me know!

-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Sat May 22 10:29:59 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nwxOI-00009va@daver.bungi.com>; Sat, 22 May 93 10:29 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Sat, 22 May 1993 10:29:45 PDT
Message-ID: <m0nwxOE-00002MC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 22, 10:00, brian@mail.barrnet.net writes:]
>  The key point is to try it and not argue guesses.

Agreed! This is where I found much of my data. Theoretically, all algorithms
with similar compression ratios should improve the apparent throughput of
a link. They don't.

I've asked Diane Heckman to share some of her experience with us, as she
has done some extensive analysis of various compression products in real
world situations. I hope she'll do this next week.



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Sat May 22 13:28:57 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nx0BG-0000Ada@daver.bungi.com>; Sat, 22 May 93 13:28 PDT
X-Path: microsoft.com!tommyd
From: Thomas Dimitri <tommyd@microsoft.com>
To: ppp-comp@bungi.com
Subject: Compression negotiation
Date: Sat, 22 May 93 12:36:19 TZ
Message-ID: <9305221941.AA00662@netmail.microsoft.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

| The goal here, as stated many times now, is to:
|
| 1) Figure out a method to negotiate compression.
| 2) Figure out a compression standard.
|
| Since I didn't, and don't, believe that there should be only one standard
| compression algorithm, I extended part 1 to negotiate multiple compression
| algorithms. There still must be at least one algorithm that is common to
| everyone.

Of course it should be negotiable.  I should hope we all agree on that.
Better compression algorithms may come out in the future (or cheap
ones on chips) or certain vendors may have better algorithms or
certain links may transmit a very specific sort of data and thus have
an affinity for a specific compression algorithm.  I can't see much harm
in negotiating multiple compression algorithms (except added complexity).

So to get the ball rolling, I'm going to make a few suggestions.  I'm sure,
in this group, I'll get wonderfully criticized, but that's the whole point of
my suggestion anyway.

In my jargon, I will refer to a client and server.  The client is the
one calling in and the server is the one who answers the call and puts
or routes the client on some network.  Although you could have servers
calling other servers as well.  I'm just trying to distinguish between
the callee and the caller without using the word callee because I don't
like that word.

How about two 32 bit fields for compression negotiation?  These two 32
bit fields correspond to what compression scheme the client can send
and receive.  That is, a client may wish ONLY TO DECOMPRESS.  It cannot
send up compressed data.  Why would you want to distinguish this?  Well,
a client may not have much memory - perhaps it's running (GOD forbid)
DOS, it's a PDA, or the firmware just doesn't have it.  Well, with LZ77
based algorithms, the decompressor only has to keep a history buffer, it
doesn't need anything else and that is much less memory than the
compressor has to keep for its structures.  Plus, the client might
assume that most of the time it is 'downloading' anyway.

Each bit in the 32 bit field refers to a compression scheme.  We'll have to
assign these bit positions to every different compression scheme we come
up with (perhaps it should be a larger field then?) Perhaps they get assigned
like this...

Bit position 0 (LSB) - V.42bis
Bit position 1       - STAC
Bit position 2       - PK-ZIP

A client calling in may only be able to compress/decompress V.42bis and it can
also decompress STAC.  The server can compress/decompress all 3.

So the client sends these two 32 bit fields
I can COMPRESS   - 00000000000000000000000000000001
I can DECOMPRESS - 00000000000000000000000000000011
The server takes what the client can COMPRESS and does a logical AND on
what the SERVER can DECOMPRESS.  It also takes what the client can
DECOMPRESS and ANDs that bit field with what it can COMPRESS.  It then
sends these two 32 bit ANDED fields back to the client.  These two 32
bit fields are the intersection of what the client and server can
mutually understand.

At this point, the client may wish to trim down the possibilities.  It
might then decide to just use the V.42bis algorithm and get rid of
having to worry about some packets possibly containing STAC compression.
It sends the updated bit fields to the server.  The server may trim down
the list even more and it sends back the list to the client.  At this
point, compression has been negotiated.
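
The first round of the exchange described above can be sketched in a
few lines of Python.  This is only an illustration of the AND step: the
bit assignments are the tentative ones from the example, the function
name is made up, and the later trimming rounds are left out.

```python
# Tentative bit positions from the example above (not assigned anywhere).
V42BIS = 1 << 0
STAC = 1 << 1
PKZIP = 1 << 2


def server_reply(client_compress, client_decompress,
                 server_compress, server_decompress):
    """Return the two ANDed fields the server sends back to the client."""
    # Schemes usable from client to server: client compresses, server decompresses.
    client_to_server = client_compress & server_decompress
    # Schemes usable from server to client: server compresses, client decompresses.
    server_to_client = client_decompress & server_compress
    return client_to_server, server_to_client


SERVER_ALL = V42BIS | STAC | PKZIP
c2s, s2c = server_reply(V42BIS, V42BIS | STAC, SERVER_ALL, SERVER_ALL)
print(f"client -> server: {c2s:032b}")
print(f"server -> client: {s2c:032b}")
```

For the client in the example the server's reply equals the client's
own offer, because the server supports everything the client does.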

Any thoughts?

P.S. One problem I might quickly point out is that the scheme above is
very simple, perhaps too simple.  It does not negotiate buffer size or
dictionary size (or how the packet might be encoded).  So after client
and server decide on what scheme to go with, they can then negotiate the
parameters of the scheme one by one in a similar fashion as to how they
negotiated which compression/decompression scheme to use.

Also, I think it is therefore possible that NO compression scheme be
negotiated.  However, I agree that we should all make a joint effort to
all support at least one compression/decompression scheme.  --Thomas

From owner-ppp-comp Sat May 22 18:37:27 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nx502-00002Fa@daver.bungi.com>; Sat, 22 May 93 18:37 PDT
X-Path: rhyolite.wpd.sgi.com!vjs
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Sat, 22 May 93 19:05:40 -0600
Message-ID: <9305230105.AA05538@rhyolite.wpd.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


Brad Parker <brad@fcr.com> writes:
> 
> [ I don't think we're organized enough to be considered for anti-trust ;-) ]

The only trusts that get busted are the ones too disorganized to keep
quiet.  It's not a joking matter.


> >> Second,  I think
> >>     LAPB has uses for routers.
> >>     LAPB is wrong for stations.
> >> 
> >> This is as true now and concerning compression as it was years ago
> >> for simple IP when the PPP group was formed by locking the SLIP guys in
> >> the same room as the interoperable router guys.
> 
> I thought I sent an explanation of why LAPB is good for end-nodes to this
> list - perhaps it did not get to the list or perhaps you did not buy
> it.  I assume it didn't get to the list (if you didn't buy it, please
> say so - it's just one person's opinion)
> 
> Two sentence summary: Millions of small end-nodes (macintoshes and
> pc's) have very poor serial port hardware.  Without a reliable
> transport (such as LAPB) they will perform poorly as they will exhibit
> a very high percentage of dropped packets (like 10%).


That argument was rejected by the PPP working group years ago.  It's
still wrong, at least in the overly general form you've stated it, as
demonstrated by zillions of PC's running TCP over PPP and SLIP.


> >> Reliable links may be necessary for compression for routers, but
> >> are not needed for the stations.
> 
> as long as the station has good solid serial ports ;-)

Not true.  A more accurate statement is "as long as the station 
either has solid serial ports or uses only higher layer protocols with
reasonable timers, timeouts, and retransmissions."  I think even the
router guys finally agreed with this in the first PPP gladiatorial bouts
between router guys and station guys.

Maybe Macs and AppleTalk need LAPB.  I think you or someone wrote as
much.  I don't know about Apple stuff.

Instead of "stations" vs. "routers", perhaps I should write
"IP hosts" vs. "all else".

There are a lot of PC's out there today running SLIP and PPP that
demonstrate that you do not need LAPB for TCP/IP.  As far as 99% of all
existing PPP installations are concerned, something that compressed
TCP/IP cheaply and easily, and works well enough on other stuff is
the right solution.

Maybe there will be a lot of Apples running PPP.  On the other hand,
if their PPP implementations are the same quality as their TCP/IP, it's
not worth worrying about what their hardware can or cannot do.  At
least from the complaints about MacTCP or whatever it's called.

The fact there are still as many new SLIP installations as PPP is not a
compliment to PPP.  The only reasons that SLIP has slid down to parity
with PPP  are the free source implementations and Morningstar.
Otherwise, you router guys could go off on your own.


> >> Please don't quibble.  I'll use any pair of words you want to
> >> distinguish a "router", what you and Cisco and ACC build, from a
> >> "station", a Sun, SGI, or 486 PC-AT box.
> 
> (be careful - a 486 pc can outperform a 68020/68030 based router)

You might be right about that in general, although I can think of ways
to make it wrong.  I'd rather not get into a fight between router guys.


> >> Bad things happen in standards committees when one member decides to
> 
> I guess I'm confused by this.  At the last IETF I was under the
> impression that we were going to use a free compression method which
> would do a nominal 2:1 and best case 4:1.  I was excited.
> 
> Having written a v.42bis from scratch and spent time talking to IBM,
> Microcom (the unisys agent) and BT, I don't plan to use another
> proprietary compression technique in the near future.

I prefer open stuff.


> (and you're correct about the LZW-on-unix vs. LZW-in-an-embedded-system;
> this appears to be a conscious decision by some of the patent holders;
> I asked and they said as much)

Is it "LZW-in-an-embedded-system" or "LZW-in-hardware"?  If they said
the latter, even if they meant the former, there may be a loophole.

In any case, consider the implications for "station guys" like me.


vjs



From owner-ppp-comp Sat May 22 20:38:25 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nx6t4-0000A8a@daver.bungi.com>; Sat, 22 May 93 20:38 PDT
X-Path: rhyolite.wpd.sgi.com!vjs
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Sat, 22 May 93 21:40:11 -0600
Message-ID: <9305230340.AA06474@rhyolite.wpd.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> if their PPP implementations are the same quality as their TCP/IP, it's
> not worth worrying about what their hardware can or cannot do.  At
> least from the complaints about MacTCP or whatever it's called.

I was referring to what I think is a public domain TCP/IP package,
not necessarily an official, commercial effort.

>                     The only reasons that SLIP has slid down to parity
> with PPP  are the free source implementations and Morningstar.
> Otherwise, you router guys could go off on your own.

I meant "go off on your own and stop being bothered by the non-router
guys."

vjs



From owner-ppp-comp Tue May 25 09:09:41 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ny1Yt-0000DKa@daver.bungi.com>; Tue, 25 May 93 09:09 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 25 May 1993 09:36:47 -0400 (EDT)
Message-ID: <9305251336.AA26128@hobbit.gandalf.ca>
References: <<m0nwxCD-000010C@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 
> I have contacted the Info-ZIP folks with regard to their implementation
> of LZ77, and they have no problem with their code being used in the PPP
> implementations. It does 905:1 best case, and 2.721:1 in my test case.
> There are patent issues to be explored with this (and I am doing so).

This is a sound base for any LZSS scheme.  However, where ZIP and the
like get their extra compression is by calculating Huffman codes for
a given block of data (~32K if I remember).  Unfortunately, we aren't
so lucky.  We may see 1 byte of data or 8000 bytes of data in a packet.
Sending the tree will be too much overhead.  We need a dynamic approach.
> 
> I have also proposed using the LZW implementation found in the UNIX
> compress algorithm.  There is a question as to its patent status, and
> I am having this reviewed currently. Its best case compression is 550:1,
> and on my test case it gets 2.235:1.

I assume that you're using the standard 16-bit compress.  So we need
about 512K bytes or so (compressor + decompressor).
> 
> Gandalf has proposed their algorithm, but I have not received it for
> evaluation yet. They are still working out how to license it. Their
> best case compression is unknown, and on my test case they get 2.695:1.

I would guesstimate 2048:1 is the best case, but I can improve on it if
it's too low :-)
> 
> STAC has proposed their algorithm, but I don't think that anyone is
> considering it as a standard at this time. It may be one of the
> commonly available ones though.

The algorithm is fine.  The licensing is questionable.  The patents can
easily be worked around.  From the patent documents, it should be possible
to make a compatible and *free* implementation.  Perhaps someone should
hack a STAC compatible version from the INFO-ZIP code.


From owner-ppp-comp Tue May 25 09:09:48 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ny1Z0-000029a@daver.bungi.com>; Tue, 25 May 93 09:09 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 25 May 1993 10:10:46 -0400 (EDT)
Message-ID: <9305251410.AA02340@hobbit.gandalf.ca>
References: <<9305221700.AA23276@Angband.Stanford.EDU>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> send the IP/TCP dgrams/segments in UI frames without error correction
> (pretty close to PPP) and rely on TCP for retransmission than it was to try
> to "improve" the apparent BER by turning on LAPB.  The overhead for LAPB
> increased latency and reduced throughput.

I'd like some proof (simulated or empirical) of the above statement.  This
seems counter-intuitive given what I understand.  

First, a LAPB frame and a PPP packet are the same size (assuming modulo 8).  
The extra handful of statements to validate a correctly received in-sequence 
LAPB frame is minimal.

On an error-free link, the throughput and latency should be identical for 
the LAPB link.  This leaves the error case.  

What will make the difference in the error case is the MTU used on the
LAPB link.  If it is set to the same as an error-free link,
then probably you'll end up with worse throughput than with no correction.
But this is caused by the latency of retransmitting the reject.  A small
MTU can minimize the amount of data that gets pitched while awaiting the
retransmission.

A variable MTU size on the link gives the best of both worlds.
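
One way to put rough numbers on the MTU argument above is the standard
independent-bit-error estimate: a frame of n bytes arrives intact with
probability (1 - BER)^(8n).  The Python sketch below assumes
independent bit errors (real serial lines and radio links tend to lose
data in bursts) and uses illustrative frame sizes and an illustrative
error rate; none of these values come from this thread.

```python
def frame_success_probability(frame_bytes: int, bit_error_rate: float) -> float:
    """Chance that every bit of a frame arrives intact, assuming independent bit errors."""
    return (1.0 - bit_error_rate) ** (8 * frame_bytes)


BER = 1e-4  # illustrative value only
for size in (64, 128, 256, 576, 1500):
    ok = frame_success_probability(size, BER)
    print(f"{size:5d}-byte frame: {ok:.1%} arrive intact, {1 - ok:.1%} need resending")
```

Under these assumptions smaller frames are retransmitted far less
often, at the cost of proportionally more header overhead per byte of
payload, which is the tradeoff a variable MTU would try to balance.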

What are the link settings for AX.25?

> Amateur packet radio was never good for much but it sure let you see how
> the different protocols worked in the equivalent of networking hell.  :^) 
> My suggestion to you is to cook up some monte carlo simulations of your
> link and fiddle with the parameters.  You may be quite surprised at the
> performance of IP/TCP (and other end-to-end reliable transport services)
> over UI/PPP vs. IP/TCP over LAPB/PPP for marginally lossy links.  I think
> that you will find that you can sustain moderately large packet loss rates
> before LAPB begins to buy you anything.  The key point is to try it and not
> argue guesses.

Sounds like a good approach.

From owner-ppp-comp Tue May 25 09:09:50 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ny1Z4-0000RSa@daver.bungi.com>; Tue, 25 May 93 09:09 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Tue, 25 May 1993 10:18:14 -0400 (EDT)
Message-ID: <9305251418.AA03665@hobbit.gandalf.ca>
References: <<9305221941.AA00662@netmail.microsoft.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> How about two 32 bit fields for compression negotiation?  These two 32 
> bit fields

One problem that comes to mind is that this scheme has no way of conveying
which of the supported encoding schemes is preferred.

A simple list of supported schemes in sorted order of preference would
solve this.


From owner-ppp-comp Tue May 25 09:13:11 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ny1cb-0000Bua@daver.bungi.com>; Tue, 25 May 93 09:13 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Tue, 25 May 1993 09:12:54 PDT
Message-ID: <m0ny1cV-0000DYC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression negotiation" on May 25, 10:18, Dave Carr writes:]
> > How about two 32 bit fields for compression negotiation?  These two 32 
> > bit fields
> 
> One problem that comes to mind is that this scheme has no way of conveying
> which of the supported encoding schemes is preferred.
> 
> A simple list of supported schemes in sorted order of preference would
> solve this.
> 

And this is what we have proposed. The compression ID's can range in
value from 1-255, with a null terminator. You offer all the algorithms
you currently support, sorted in order of preference for this connection
(or all connections, if you don't alter preference).
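
A minimal Python sketch of such an offer, assuming one byte per
compression ID and a zero byte as the terminator.  The ID values used
here are placeholders, and the rule that the receiver takes the first
offered ID it also supports is an assumption for illustration; the
proposal text above does not spell that out.

```python
def encode_offer(ids):
    """Pack compression IDs (1-255, most preferred first) with a null terminator."""
    if any(not 1 <= i <= 255 for i in ids):
        raise ValueError("compression IDs must be in the range 1-255")
    return bytes(ids) + b"\x00"


def decode_offer(data):
    """Return the IDs that precede the first null byte, in offered order."""
    end = data.index(0)  # raises ValueError if the terminator is missing
    return list(data[:end])


def choose(offer, supported):
    """Pick the sender's most preferred ID that the receiver also supports."""
    for comp_id in decode_offer(offer):
        if comp_id in supported:
            return comp_id
    return None  # nothing in common


offer = encode_offer([3, 1, 2])  # placeholder IDs, most preferred first
print(offer)
print(choose(offer, {1, 2}))
```

Compared with the bit-field scheme, the ordering carries the preference
directly and the ID space is not limited to 32 entries.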


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Tue May 25 11:00:08 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ny3Hv-0000E7a@daver.bungi.com>; Tue, 25 May 93 10:59 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 25 May 1993 13:51:00 -0400 (EDT)
Message-ID: <9305251751.AA10561@hobbit.gandalf.ca>
References: <<9305220606.AA27956@rhyolite.wpd.sgi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> LAPB is not yet a part of PPP.

Better speak up quick against it or it will be.

> Isn't CIPX what might be called "proprietary"?

Sure.  But over 60% of installations need it.  Therefore, it will go
into a lot of vendors' equipment.  How can this be considered 
"proprietary"?  If CIPX goes through, it is standard.
> 
> Of course.  We agree.  TANSTAAFL.

With all the TLA's and FLA's floating around, what the heck is
TANSTAAFL?

> By the way, does your license indemnify your licensees should someone
> decide that FZA infringes?  If not, and I don't see how Gandalf would
> be crazy enough to do it, the patent protection afforded by FZA is
> about as useful as my claim that LZW was released to the public domain
> for software use by UNISYS years ago.  In either case, due diligence
> may be required.

If FZA is not patent free, there are a lot of algorithms that aren't
either.  I would have to believe that PKZIP, INFO-ZIP, FREEZE, HPACK,
ARJ, LHA*, ... are using LZ77 as their base algorithm for a reason.
It is interesting to note that at least 2 of them used to use LZW as 
their base algorithm, but have switched to LZ77 at least partially to
avoid patent problems.

I have relied on the trust of several original authors.  If Lempel and
Ziv used hashing in their original LZ77 encoder, should I be bothered
about the Graybill patent?  NOT!

LZ77 is proven, patent-free, and effective.  

> > Would the IETF accept an algorithm description?  It must, otherwise
> > PPP wouldn't fly.  Or I could give out many incomplete and incompatibly
> > written chunks of code (like PPP) and then it would be okay?
> 
> Of course.
> Yes.
> What's the point?

The point is that PPP comes in many incompatible fragments of code.
The two basic ways for a vendor to get it working in their box are to spend
money buying a complete implementation from someone who has it, or spend
time modifying existing code.  So here we are.  I can write an RFC describing
FZA, and give you some snippets of code, all in a manner consistent with
the PPP style of things.  Are you any better off than paying for the working
code?
> 
> Remember that Sun's use of "give the code away", is a significant part
> of why there is a Sun but no Apollo today, despite the fact that
> Apollo invented workstations.  Sun gave away code for RPC from the
> start.  Then they wrote RFC's for RPC and NFS.  And they licensed their
> NFS and YP code to competitors like Silicon Graphics for low terms.

How low?  Is FZA licensing that different?
> 
> If I were a Gandalf stockholder or big boss, I'd be yelling at you
> about Kodak ("give cameras away to sell film") and Sun ("give NFS away
> to sell workstations").

This strategy is a sound one if you can grab part of or all of the market
that is created by the giveaway.  I'm not so sure it applies here.  
We seem to be playing catch up at the moment.
> > ...
> >             You and others are of the opinion [LAPB] has no useful
> > purpose.  So spend all kinds of time making a non-reliable link work.  It's
> > easier for me to use what I already know and have working.  
> > Why should we both have to switch to a non-reliable link so it's a fair race?
> 
> Wrong, on 2 counts.  First is that what you find comfortable matters in
> both a joint development and a standards committee, but in opposite
> directions.  "Mutual assured disadvantage" is a real and valid concern
> in standards committees.  If a reasonable case could be made that LAPB
> was chosen only because some vendors were more comfortable with it,
> then the IAB would be required to reject it, to avoid suffering those
> infamous anti-trust lawsuits that have happened to other standards
> organizations.   This issue has come up in the main IETF mailing list
> in the last week or two.
> 
First, let me point out that LAPB isn't my favourite error-corrected 
protocol.  I would rather get rid of wasted bits and simplify the
protocol.  I would also like to get rid of padding.  But there are 
several vendors with hardware that can't do it.  It is really a 
backwards compatibility problem.

Second, I'll go any way the wind blows on the link layer.  We are totally
software based.  But there will have to be a compelling reason to change.

Third, I believe LAPB was selected because it does the job and is fully
specified.  Fred didn't have to spend valuable time re-inventing the
wheel.  

That we may come out ahead because we have LAPB may be lucky for us.  Mind
you, it's the ONLY part of PPP that we have that we haven't bought.  You
have working IP code.  Probably you had it before PPP.  In all fairness,
IP should have been dropped, and everyone forced to use DECNET :-)  We
had to buy IP routing or develop it to catch up.  Please don't tell me
we all start off at the same point in the race.  And yes, it is a race.

If public domain LAPB code existed, there would have been no discussion
at all.

> Your saying that about LAPB in this mailing list is dangerous.  It
> could be used in court to show that a cabal of router vendors excluded
> an equally good technical, but cheaper and easier solution simply to
> protect their competitive positions.

What other equally technical solution did you have in mind?  The only
reliable link mentioned so far is LAPB.  Are you proposing another?

> That you even let yourself think
> such a thought out loud makes me doubt all of the words here about
> patent searches.

Well, don't take my word.  Ask Jean-loup Gailly (co-author of Info-ZIP).
He knows more about LZ77 than most patent attorneys do.  Since I'm
an engineer, not a lawyer, you shouldn't trust any patent advice from
me anyway.  But I think I understand all the patent issues of LZ77.

> It would be nice if the same compression algorithms were available
> whether you use LAPB or VJ-header prediction on unreliable links.
> 
Nice and convenient, but certainly not optimal.

> Bad things happen in standards committees when one member decides to
> make a little money or just recover some costs by selling to other
> members.  Much of the years of delay and bad features of FDDI are
> directly attributable to just such decisions by a few members of
> ANSI-X3T9.5.

Perhaps that's why nobody uses V.42bis and MNP5 :-)  Instead, let's spend 
a few years discussing the merits of the various compression algorithms,
patent searching, testing, and the like.  

> Standards committees are different from joint developments.  If you
> really want to be doing joint development, then please, for your own
> sakes as well as the IETF's, leave the IETF umbrella and form a plain
> old consortium.

Perhaps that will be the route we take.  But I'd like to hear from more
people.  So far, this compression committee must consist of you, Dave, and
me.  

From owner-ppp-comp Tue May 25 11:59:05 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ny4D8-0000Aza@daver.bungi.com>; Tue, 25 May 93 11:58 PDT
X-Path: microsoft.com!tommyd
From: Thomas Dimitri <tommyd@microsoft.com>
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Tue, 25 May 93 11:37:26 TZ
Message-ID: <9305251841.AA21535@netmail.microsoft.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

| And this is what we have proposed. The compression ID's can range in
| value from 1-255, with a null terminator. You offer all the algorithms
| you currently support, sorted in order of preference for this connection
| (or all connections, if you don't alter preference).
|

For my enlightenment, what is the current negotiation proposal besides
the above?  The above sounds like a good idea to me.  However, there was
still no comment on the difference between compressing and decompressing -
these are really two independent events.  I do not believe it is necessary
to have BOTH sides compress.  In certain circumstances, only one side may
need to compress because that side is doing the bulk of the transmission.

If that should also be negotiated then how?

Also, why force negotiation of just ONE compression scheme?  Take PKZIP,
it uses many compression schemes to ensure optimal compression.  Certain
compressors may be optimized for text or binaries or bitmaps.  Depending
which type of data is sent out, a different scheme may be warranted.  So my
question is, should we allow MULTIPLE compression schemes to be negotiated
or just ONE?

--Thomas

From owner-ppp-comp Tue May 25 12:50:29 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ny50p-00006Wa@daver.bungi.com>; Tue, 25 May 93 12:50 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Tue, 25 May 1993 12:50:09 PDT
Message-ID: <m0ny50l-0000NHC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression negotiation" on May 25, 11:37, Thomas Dimitri writes:]
> 
> For my enlightenment, what is the current negotiation proposal besides
> the above?  The above sounds like a good idea to me.  However, there was
> still no comment on the difference between compressing and decompressing -
> these are really two independent events.  I do not believe it is necessary
> to have BOTH sides compress.  In certain circumstances, only one side may
> need to compress because that side is doing the bulk of the transmission.

That is correct. Each side may negotiate compression to a different (or no)
value. Type 0 (in my implementation) means no compression. 

> 
> Also, why force negotiation of just ONE compression scheme?  Take PKZIP,
> it uses many compression schemes to ensure optimal compression.  Certain
> compressors may be optimized for text or binaries or bitmaps.  Depending
> which type of data is sent out, a different scheme may be warranted.  So my
> question is, should we allow MULTIPLE compression schemes to be negotiated
> or just ONE?

We are trying to define ONE COMMON algorithm that everyone can support.
Other algorithms are possible, and encouraged. One party will administer
numbers assigned for private compression protocols.

PKZIP doesn't really do multiple compression types. PKARC, and ARC did
actually try multiple compression algorithms on each file, choosing the best
one for the file.  PKZIP makes guesses about the file, based on its size, and
the contents of the first block.  Running multiple algorithms is *VERY*
expensive in terms of time, and memory (on a per-link basis). It is also
hard to pick an algorithm, since there is typically not much data to look at
- just a few packets.

It is *FAR* more effective to compress the source file before sending it
over the link, in all cases, because you are sending less data (no protocol
headers at all, for example). At the application level, you also have a
better idea what type of algorithm would be best (lossy/lossless), and can
devote more time to making a good choice. 

With all that said, nothing stops you from defining a "WonderType" (tm)
of compression that runs the data through multiple algorithms, and
keeps the best result.  You can prepend an indicator of which
algorithm did the compression, and switch to the appropriate decompressor.
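
As a rough sketch of that idea: try several compressors on each frame, keep
the shortest output, and prepend one byte saying which one produced it. The
tag values and function names are hypothetical, and zlib, bz2 and lzma are
only stand-ins from the Python standard library, not algorithms proposed on
this list. Each frame is also compressed on its own here, with no history
carried between packets, which a real link compressor would probably want.

```python
import bz2
import lzma
import zlib

# Hypothetical one-byte tags; tag 0 marks data sent without compression.
CODECS = {
    1: (zlib.compress, zlib.decompress),
    2: (bz2.compress, bz2.decompress),
    3: (lzma.compress, lzma.decompress),
}


def wonder_compress(payload):
    """Run every codec over the payload and keep the shortest result."""
    best_tag, best_body = 0, payload
    for tag, (compress, _) in CODECS.items():
        candidate = compress(payload)
        if len(candidate) < len(best_body):
            best_tag, best_body = tag, candidate
    # The leading byte tells the receiver which decompressor to use.
    return bytes([best_tag]) + best_body


def wonder_decompress(frame):
    """Read the indicator byte and switch to the matching decompressor."""
    tag, body = frame[0], frame[1:]
    if tag == 0:
        return body
    return CODECS[tag][1](body)
```

Falling back to tag 0 bounds the expansion at one byte per frame, but every
frame still pays the CPU cost of all the compressors.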



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Tue May 25 13:25:02 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ny5YG-0000Ata@daver.bungi.com>; Tue, 25 May 93 13:24 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Tue, 25 May 1993 16:18:58 -0400 (EDT)
Message-ID: <9305252018.AA09070@hobbit.gandalf.ca>
References: <<9305251841.AA21535@netmail.microsoft.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 
> Also, why force negotiation of just ONE compression scheme?  Take PKZIP,
> it uses many compression schemes to ensure optimal compression.  Certain
> compressors may be optimized for text or binaries or bitmaps.  Depending
> which type of data is sent out, a different scheme may be warranted.  So my
> question is, should we allow MULTIPLE compression schemes to be negotiated
> or just ONE?
> 
> --Thomas
> 
Ignoring the older compression schemes inside ZIP (shrinking and ???), the
main ones are:
STORING: Just store the data uncompressed
IMPLODING:
DEFLATING:

The latter two are LZ77 based, with different back ends.  The encoder makes the
choice of the method based on size.  Logically, you can call this one
compression method.

Some archivers, HPACK for example, try to figure out the file type and
switch the model accordingly.  Again, this is one method.

To switch algorithm types on the fly requires some heuristic on the
performance of the current choice, and on what the other supported
algorithms would achieve.  It's not that this is impossible, a notable
example being the ACT compressor, but it involves extra processing power
to know when to switch.

Also, there are some patents applying to running multiple compression
dictionaries in parallel, and choosing the shorter output, that might
need careful attention.

From owner-ppp-comp Tue May 25 13:25:07 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ny5YH-0000Aja@daver.bungi.com>; Tue, 25 May 93 13:24 PDT
X-Path: microsoft.com!tommyd
From: Thomas Dimitri <tommyd@microsoft.com>
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 25 May 93 13:20:57 TZ
Message-ID: <9305252025.AA05958@netmail.microsoft.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

| From: Dave Carr  <netmail!dcarr@gandalf.ca>
|
| >
| > I have contacted the Info-ZIP folks with regard to their implementation
| > of LZ77, and they have no problem with their code being used in the PPP
| > implementations. It does 905:1 best case, and 2.721:1 in my test case.
| > There are patent issues to be explored with this (and I am doing so).
|
| This is a sound base for any LZSS scheme.  However, where ZIP and the
| like get their extra compression is by calculating Huffman codes for
| a given block of data (~32K if I remember).  Unfortunately, we aren't
| so lucky.  We may see 1 byte of data or 8000 bytes of data in a packet.
| Sending the tree will be too much overhead.  We need a dynamic approach.

If I remember correctly, ZIP uses several approaches to compressing
data, not just the LZSS + Huffman scheme.  I fear this code might take
up too much memory or be overly complicated.  I've found that squeezing
out 10% more in compression is very difficult to do; compressors
usually have to special-case things and jump through lots of hoops to
make it happen.  Perhaps we should avoid ZIP.

| >
| > I have also proposed using the LZW implementation found in the UNIX
| > compress algorithm.  There is a question as to its patent status, and
| > I am having this review currently. Its best case compression is 550:1,
| > and on my test case it gets 2.235:1.
|
| I assume that you're using the standard 16-bit compress.  So we need
| about 512K bytes or so (compressor + decompressor).

This is way TOO much memory in my opinion.  A 64 port server on NT
(yes, that's what we support with RAS today) would eat up 32 megs
of memory - this is too much for even a goliath OS like NT to
be expected to keep in locked down memory.

| >
| > Gandalf has proposed their algorithm, but I have not received it for
| > evaluation yet. They are still working out how to license it. Their
| > best case compression is unknown, and on my test case they get 2.695:1.
|
| I would guestimate 2048:1 is the best case, but I can improve on it if
| it's too low :-)

May I ask why we care so much about best compression cases?  Once you
get around 50:1 I don't think it matters at all.  Your average largest
frame size is 1500 right?

The important numbers are the test case numbers and WORST case numbers.
I see BEST case numbers but not WORST case numbers.  WORST case is
important because with many compression algorithms you can't back out
once you've started compressing because the history buffers, Huffman
tables, etc. are already updated.  It is possible for non-compressible
data or already compressed data to be sent across the link.  You want
to ensure that this data does not get expanded too much.

| >
| > STAC has proposed their algorithm, but I don't think that anyone is
| > considering it as a standard at this time. It may be one of the
| > commonly available ones though.
|
| The algorithm is fine.  The licensing questionable.  The patents can
| easily be worked around.  From the patent documents, it should be possible
| to make a compatible and *free* implementation.  Perhaps someone should
| hack a STAC compatible version from the INFO-ZIP code.
|

In my opinion, I cannot see how anybody would want to go the LZW route.
It takes too much memory, it's slow, and its compression ratio isn't
that great.  I think Ross Williams proved that with LZRW-3a, which
was much faster, consumed much less memory, and offered better compression.

Unfortunately, I cannot comment on anything that has to do with
patents, I'll let the Daves do that.  I would like to suggest that
someone offer up some sample LZ77 or LZ78 or Arithmetic compression
scheme.  I think the goal is a fast, small memory, but good compression
scheme.  This is definitely possible and we should accept nothing less.
I believe the STAC code fits the bill in this case.

Obviously, there are tradeoffs when it comes to speed, memory, and
compression.  Here is my evaluation..

I think we should shoot for 64K or less per connection for compression.
Speed/Compression Ratio - the way to evaluate this trade-off is
(time to transmit one byte (LinkSpeed)) VS (time to compress one more byte)

That is, if the linkspeed is 9600bps or ~1Kbyte/sec then it takes
1 millisec to transmit one byte.  If you are going to transmit
1500 bytes, that's ~1.5 secs.  Now if it takes 10 millisecs (on whatever
the basis of CPU power is - how about a 386/33 ?) to compress it to
800 bytes, then you can transmit the frame in ~810 millisecs (10 to
compress, 800 to send the 800 bytes).  If you were to optimize
the algorithm and squeeze out 50 more bytes, but it takes 20
more millisecs to compress it then in this particular case, that's
a win.  Because now it takes 780 ms to transmit the frame (750 + 30).
Of course, it also consumes more CPU power, in which other tasks
could be done, so that has to be factored in as well.  My point is,
an improved compression ratio only makes sense if the time it takes
to save x bytes is faster than the time it takes to transmit
those x bytes (I'm assuming that compression time is the same as
decompression time, which is actually not true for many algorithms).
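
The arithmetic above can be written out as a small sketch. It assumes, as
the example does, about 1 millisec per byte on the wire and equal compress
and decompress cost; the function names are made up for illustration.

```python
def frame_time_ms(bytes_on_wire, compress_ms, ms_per_byte=1.0):
    """Total time to get one frame out: compression time plus time on the wire."""
    return compress_ms + bytes_on_wire * ms_per_byte


def extra_effort_pays(bytes_saved, extra_ms, ms_per_byte=1.0):
    """True when the extra compression time is less than the wire time it saves."""
    return extra_ms < bytes_saved * ms_per_byte


uncompressed = frame_time_ms(1500, 0)   # 1500 bytes sent as-is
baseline = frame_time_ms(800, 10)       # 10 ms to compress down to 800 bytes
tuned = frame_time_ms(750, 30)          # 20 ms more to squeeze out 50 more bytes
```

With these numbers the three cases come to 1500, 810 and 780 millisecs, and
extra_effort_pays(50, 20) is true because 20 ms of extra work saves 50 ms of
transmission. On a faster link ms_per_byte shrinks and the same trade can
become a loss.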

Thus, when we talk about compression algorithms I would like to
see the following data...

1. Average compression ratio for the test case.
2. Time spent in compression engine per bytes compressed -
   or rather compression speed in bytes/sec.  Also,
   decompression speed if it is much different.  We need
   a default CPU for this.
3. Memory consumed for compressor and decompressor.
4. WORST case compression if compression cannot be backed out.
   BEST case compression only if it is a low number like 10:1.
5. Estimated code size or rough estimate of complexity of code
   since we all have to implement it or port it.
6. Patent/License problems if any.

I realize that some of this information is difficult to
gather, but I think it is difficult to vote on an algorithm
without this information.  Dave, can we get this data for
the Gandalf compression scheme?  I think it shouldn't be too hard
to get this data for the STAC and LZW stuff too (even though
I contend that LZW is far from the best viable option).  --Thomas

From owner-ppp-comp Tue May 25 13:44:33 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0ny5qy-0000O8a@daver.bungi.com>; Tue, 25 May 93 13:44 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 25 May 1993 13:43:49 PDT
Message-ID: <m0ny5qi-0000NtC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 25, 13:20, Thomas Dimitri writes:]
> May I ask why we care so much about best compression cases?  Once you
> get around 50:1 I don't think it matters at all.  Your average largest
> frame size is 1500 right?

I don't care at all about the maximum compression ratio.  The RGF is
what I care about (Rand Goodness Factor). This is the data output rate
of the compressor.  You don't want a T1 running a compression algorithm
getting 200:1 if the RGF is 996 (996 bytes per second)! I put the
maximum compression rate in to show that even though we can theoretically
get very good compression ratios, in real life (or at least as close as
my test gets to real life - open to debate) we don't get very good 
compression at all. 4:1 lives only in marketing documents.
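
One way to read the T1 example, as a rough sketch: if the compressor can
only emit 996 compressed bytes per second, the link never carries more than
that, so even at 200:1 the user ends up at roughly the raw T1 rate. The
function name and the approximate T1 figure (1.544 Mbit/s, about 193,000
bytes/sec) are assumptions made for this illustration.

```python
T1_BYTES_PER_SEC = 1_544_000 // 8  # approximate raw T1 rate (assumed figure)


def user_throughput(rgf, ratio, link_bytes_per_sec):
    """Uncompressed bytes/sec the user sees.

    rgf is the compressor's output rate in compressed bytes/sec. The link
    cannot carry more than its own rate, and each compressed byte stands
    for `ratio` bytes of user data.
    """
    return min(rgf, link_bytes_per_sec) * ratio


slow_but_dense = user_throughput(996, 200, T1_BYTES_PER_SEC)
no_compression = user_throughput(T1_BYTES_PER_SEC, 1, T1_BYTES_PER_SEC)
```

Here slow_but_dense is 199,200 bytes/sec against 193,000 bytes/sec with no
compression at all, so the 200:1 ratio buys almost nothing.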

> The important numbers are the test case numbers and WORST case numbers.
> I see BEST case numbers but not WORST case numbers.  WORST case is
> important because with many compression algorithms you can't back out
> once you've started compressing because the history buffers, Huffman
> tables, etc. are already updated.  It is possible for non-compressible
> data or already compressed data to be sent across the link.  You want
> to ensure that this data does not get expanded too much.

Predictor gets a worst case of 8:9 (8 bytes in, 9 bytes out). It is
hard to determine a worst case number for some of the other algorithms,
so I haven't tried yet.
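
To show where an 8:9 bound can come from, here is a minimal sketch of a
guess-table predictor: each group of up to 8 input bytes is sent as one
flag byte plus the bytes that were guessed wrong, so 8 wrong guesses cost
9 bytes. The hash function, the 64K table and the class name are
assumptions for illustration, not a statement of the exact code under test.

```python
class Predictor:
    """Guess-table compressor sketch: one flag bit per input byte."""

    def __init__(self):
        self.table = bytearray(65536)  # guessed next byte for each hash value
        self.hash = 0

    def _advance(self, byte):
        # Assumed hash: fold the most recent bytes into 16 bits.
        self.hash = ((self.hash << 4) ^ byte) & 0xFFFF

    def compress(self, data):
        out = bytearray()
        for start in range(0, len(data), 8):
            flags = 0
            literals = bytearray()
            for bit, byte in enumerate(data[start:start + 8]):
                if self.table[self.hash] == byte:
                    flags |= 1 << bit           # guess was right, send nothing
                else:
                    self.table[self.hash] = byte
                    literals.append(byte)       # guess was wrong, send the byte
                self._advance(byte)
            out.append(flags)
            out += literals
        return bytes(out)

    def decompress(self, data, length):
        out = bytearray()
        pos = 0
        while len(out) < length:
            flags = data[pos]
            pos += 1
            for bit in range(8):
                if len(out) == length:
                    break
                if flags & (1 << bit):
                    byte = self.table[self.hash]
                else:
                    byte = data[pos]
                    pos += 1
                    self.table[self.hash] = byte
                out.append(byte)
                self._advance(byte)
        return bytes(out)
```

Each end keeps its own Predictor instance per direction, and the
decompressor here is told the original length out of band, which is another
simplification.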


> Unfortunately, I cannot comment on anything that has to do with
> patents, I'll let the Daves do that.  I would like to suggest that
> someone offer up some sample LZ77 or LZ78 or Arithmetic compression
> scheme.  I think the goal is a fast, small memory, but good compression
> scheme.  This is definitely possible and we should accept nothing less.
> I believe the STAC code fits the bill in this case.

Again, we all want a compressor that is free, runs in zero time, uses
no memory, and gets 200:1 ;-)


> Thus, when we talk about compression algorithms I would like to
> see the following data...
> 
> 1. Average compression ratio for the test case.

Done.

> 2. Time spent in compression engine per bytes compressed -
>    or rather compression speed in bytes/sec.  Also,
>    decompression speed if it is much different.  We need
>    a default CPU for this.

Done. See my table. Both input data rate, and output (the important one)
are shown. RGF is perhaps the best metric, in my opinion.

> 3. Memory consumed for compressor and decompressor.

Predictor takes 64K/connection/direction. LZW (compress) takes 500K/con/dir.
Info-ZIP takes about 64K/con/dir.

> 4. WORST case compression if compression cannot be backed out.
>    BEST case compression only if it is a low number like 10:1.

Worst case compression is very hard to determine with some algorithms.
LZW is probably 1:2 (assuming 16 bit codewords). Info-ZIP is probably
around 1:3.

I don't understand what you mean about BEST case compression. It is just
a number - pointed out so that everyone can see that in real-life best case
numbers cannot be achieved.

> 5. Estimated code size or rough estimate of complexity of code
>    since we all have to implement it or port it.

All code is relatively small. Certainly less than 1000 lines of C code.
Predictor can be done in about 20 lines of C. LZW in about 150 lines.
Info-ZIP in about 700 lines or so.

> 6. Patent/License problems if any.

This is hard. See my document for some of the issues. It *APPEARS* that
we step on someone's toes no matter WHICH algorithm we choose to implement.
I am working on this issue though.


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Tue May 25 18:20:25 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyAAC-00005Ha@daver.bungi.com>; Tue, 25 May 93 18:20 PDT
X-Path: microsoft.com!tommyd
From: Thomas Dimitri <tommyd@microsoft.com>
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Tue, 25 May 93 15:13:39 TZ
Message-ID: <9305252217.AA13755@netmail.microsoft.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Well it seems to me that negotiation goes something like
this (correct me if I'm wrong)...

First send a NULL terminated sequence of bytes from 1-255
in order of preference for compression AND decompression.
I assume this means TWO NULL terminated sequences need to
be sent out.

The other end responds with just ONE negotiated value for
compression and decompression.
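
A sketch of that exchange as I understand it, assuming the request carries
two null-terminated lists back to back (what the sender is willing to
compress with, then what it is willing to decompress) and the reply is one
byte per direction. The layout, the names and the use of 0 for "none" are
all assumptions made for illustration.

```python
def split_offers(message):
    """Split a buffer holding two null-terminated ID lists into (tx_list, rx_list)."""
    first_end = message.index(0)
    second_end = message.index(0, first_end + 1)
    return list(message[:first_end]), list(message[first_end + 1:second_end])


def respond(message, can_decompress, can_compress):
    """Answer with one ID per direction, or 0 where nothing is acceptable."""
    peer_tx, peer_rx = split_offers(message)
    # What the peer wants to compress with must be something we can decompress.
    tx_choice = next((cid for cid in peer_tx if cid in can_decompress), 0)
    # What the peer can decompress must be something we can compress with.
    rx_choice = next((cid for cid in peer_rx if cid in can_compress), 0)
    return bytes([tx_choice, rx_choice])
```

The two directions are settled independently, so one side can end up
compressing while the other sends plain data.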

So far, I like this approach.  We've got 255 different algorithms
to assign and I don't think we need to negotiate multiple algorithms.
That seems like overkill.  If someone really wants it, they can just
make up a super-algorithm and get a number assigned for it.

I also like the approach of negotiating for compression and
decompression.

After selecting the compression and decompression algorithm do
we need a specific parameter negotiation to follow?  V.42bis
negotiates dictionary size, and a couple other things - should
PPP have a specific negotiation which negotiates things like
dictionary size (or history buffer size)?  -Thomas

From owner-ppp-comp Tue May 25 18:20:43 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyAAX-00002Ta@daver.bungi.com>; Tue, 25 May 93 18:20 PDT
X-Path: opal.acc.com!art
From: art@opal.acc.com (Art Berggreen)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 25 May 93 14:25:56 PDT
Message-ID: <9305252125.AA05869@opal.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>
>> The important numbers are the test case numbers and WORST case numbers.
>> I see BEST case numbers but not WORST case numbers.  WORST case is
>> important because with many compression algorithms you can't back out
>> once you've started compressing because the history buffers, Huffman
>> tables, etc. are already updated.  It is possible for non-compressible
>> data or already compressed data to be sent across the link.  You want
>> to ensure that this data does not get expanded too much.
>
>Predictor gets a worst case of 8:9 (8 bytes in, 9 bytes out). It is
>hard to determine a worst case number for some of the other algorithms,
>so I haven't tried yet.

Let's not underestimate the need to deal with uncompressible data.  A
lot of files might be precompressed by the end user before transfer.
At least in the bridge/routers, packets containing uncompressible data
may be interleaved with all the others.  Also, encrypted data will
typically look like a random stream and not be compressible.  With the
potential growth of encryption in networking, this needs to be considered.
Finally, I think our test cases need to deal with significant amounts
of data-stream interleaving (at least for multiuser hosts and
bridge/routers).  I'd like to know what packets of uncorrelated data
and protocol control packets (i.e. ACKs) do to the history buffers
and the compression efficiency of a particular algorithm.

Art


From owner-ppp-comp Tue May 25 18:21:19 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyAB7-00006Na@daver.bungi.com>; Tue, 25 May 93 18:21 PDT
X-Path: Novell.COM!Diane_Heckman
From: Diane_Heckman@Novell.COM (Diane Heckman)
To: ppp-comp@bungi.com
Subject: Can you trust the compression ratio?
Date: Tue, 25 May 93 16:18:37 PDT
Message-ID: <9305252318.AA24702@va.SJF.Novell.COM>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>>  The key point is to try it and not argue guesses.
 
>Agreed! This is where I found much of my data. Theoretically, all algorithms
>with similar compression ratios should improve the apparent throughput of
>a link. They don't.
  
>I've asked Diane Heckman to share some of her experience with us, as she
>has done some extensive analysis of various compression products in real
>world situations. I hope she'll do this next week.
   
    
     
>--
>Dave Rand
>{pyramid|mips|bct|vsi1}!daver!dlr       Internet: dlr@daver.bungi.com

Gee, thanks, Dave.  

Into the fray I go.

Compression ratios are nice figures for comparisons in a perfect 
world, but many other factors must be considered.  Seems to me 
that what users want is higher throughput through the WAN interface, 
thereby getting lower phone bills, and a quicker return of the 
command prompt.  So, if a compression algorithm can get 900:1 
compression, but takes more processor time to compress or 
decompress the data than without compression, why bother?  
The user won't see an increase in throughput, which is what 
they want.  This (that users want increased throughput on the line) 
should be obvious to everyone reading this.

Actual compression ratios vary greatly from the theoretical 
maximum compression ratios advertised by various algorithms.  
Kinda like EPA ratings, only more so.  But, as I said above, 
increased throughput is what really matters.  This is affected
by the implementation, the upper layer protocol, the reliability 
of the line, and the type of data being transferred, for starters.  
Surely there are more.  The type of data being transferred is the 
easiest factor to measure right now, so I ran some simple tests on 
the files mentioned in Dave Rand's "Compression Analysis for 
Wide Area Networking" document.

First, a description of the data....   Quoting from Dave Rand's document :
"Nine different types of text are represented, and to confirm 
that the performance of schemes is consistent for any given type, 
many of the types have more than one representative.  Normal English, 
both fiction and non-fiction, is represented by two books and papers 
(labeled book1, book2, paper1, paper2, paper3, paper4, paper5, paper6).  
More unusual styles of English writing are found in a bibliography (bib) 
and a batch of unedited news articles (news). Three computer programs 
represent artificial languages (progc, progl, progp). A transcript 
of a terminal session (trans) is included to indicate the increase 
in speed that could be achieved by applying compression to a slow 
line to a terminal.  All of the files mentioned so far use ASCII 
encoding.  Some non-ASCII files are also included: two files of 
executable code (obj1, obj2), some geophysical data (geo), and 
a bit-map black and white picture (pic).  The file geo is particularly 
difficult to compress because it contains a wide range of data 
values, while the file pic is highly compressible because of large 
amounts of  white space in the picture, represented by long runs 
of zeros."



       Figure 1: Input file descriptions

     Filename         uncompressed  compressed       Speed Ratio
                      in seconds    in seconds
     book1            118, 119      87, 88           1.35 : 1
     book2            93, 94        69, 64           1.35 : 1
     paper1           9, 9          6, 6             1.5 : 1
     paper2           14, 15        9, 9             1.66 : 1
     paper3           8, 8          5.75, 5.26       1.39,1.52:1
     paper4           2.45, 2.41    1.82, 1.52       1.35,1.58:1
     paper5           2.2, 2.18     1.82, 152        1.36,1.57:1
     paper6           7, 7          4.73, 4.1        1.48,1.7:1
     AVERAGE                                         1.479 : 1

     bib              18, 18        9, 10            2 ,1.8: 1
     news             100, 64       43, 41           2.32,1.52:1
     AVERAGE                                         1.96 : 1

     progc            7, 7           4.88, 4.26      1.43,1.65:1
     progl            12, 12         6.4, 6.47       1.87 : 1
     progp            8.16, 8.14     4.47, 4.45      1.83 : 1
     AVERAGE                                         1.74 : 1

     trans            16, 16         7, 7            2.28 : 1

     obj1             3.78, 3.74     2.06, 2.04      1.83 : 1
     obj2             42, 42         28, 27          1.5 : 1
     AVERAGE                                         1.66 : 1

     geo              16, 16         13, 13          1.23 : 1

     pic              84, 80         26, 34          3.23,2.35:1

     Overall AVERAGE                                 1.70,1.46:1



Notes on testing methodology:
Data was obtained by a batch file which reads:
     timer copy d:\filename f:\junk\filename
     timer copy f:\junk\filename d:\filenam2
     c:\dos\fc d:filename d:filenam2

Timer is a program which prints the elapsed time for the 
copy operation.  No file compare (fc) errors were found.  
In some cases, two speed ratios are given, the first for the 
copy to the server (f:), the second for the copy to the 
client (d:).  Two sets of tests were run, with compression 
enabled and with compression disabled.  The speed ratio was 
obtained by dividing (the time the copy took with compression 
disabled) by (the time the copy took with compression enabled).
Average speed ratios for a like set of files were obtained by 
summing the speed ratios and dividing by the number of files*2.  
For example, the average speed ratio for the files bib and 
news was 1.96 ((2 + 2 + 2.32 + 1.52)/4).  
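
The arithmetic described in these notes can be sketched as follows.  This
is only an illustration: the function names are made up here, and the
inputs are the paper4 times from the table and the four ratios quoted in
the example just above.

```python
def speed_ratio(t_uncompressed, t_compressed):
    """Copy time with compression disabled divided by time with it enabled."""
    return t_uncompressed / t_compressed


def average_ratio(ratios):
    """Mean over every listed ratio (two per file when both directions are given)."""
    return sum(ratios) / len(ratios)


# paper4, copy to the server: 2.45 s uncompressed, 1.82 s compressed.
print(round(speed_ratio(2.45, 1.82), 2))

# bib and news, using the four ratios quoted in the note above.
print(round(average_ratio([2, 2, 2.32, 1.52]), 2))
```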

The workstation was a 386/33 with 8 Meg of memory and a 5-megabyte 
RAMDRIVE D: drive, to avoid disk latency.  The units under test 
were two 486/33E machines (16 Meg memory each), Novell NetWare 3.11 
servers running MPR2.1 PPP code with PREDICTOR as the compression algorithm.  
The server that was the source and sink ran PBURST (packet 
burst), which allows 8 packet transmits before the ACK is sent.


Now for the "hanging out the dirty laundry" part: our 
implementation experience.  Originally, we had planned to 
use 3 different compression algorithms, to handle different 
line speeds.  The simplest code was written first, PREDICTOR.   
We ran through a set of tests to verify that a higher 
throughput was achieved.  

Next, we tried LZW.  Test again.  The COMPRESSION RATIO WAS 
HIGHER with LZW than with Predictor, but THE INCREASE IN 
THROUGHPUT WAS LESS THAN PREDICTOR.  Presumably, the code 
took longer to compress the packet than the time saved with 
the compression.  That data is not included here, but it 
can be dug up, if need be.

We found that the Predictor code gave higher throughput 
than the LZW.  So, we stuck with the Predictor, and optimized it...
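
For readers who have not seen a Predictor-style compressor, here is a
minimal sketch of the general idea: guess the next byte from a hash of
the recent bytes, and send one flag bit per byte plus a literal only when
the guess is wrong.  The table size, the hash function and the framing
are assumptions chosen for illustration; this is not the code that was
tested above.

```python
TABLE_BITS = 16  # assumed table size: 65536 one-byte guesses


def _next_hash(h, byte):
    # Assumed hash: shift in the newest byte, keep TABLE_BITS bits.
    return ((h << 4) ^ byte) & ((1 << TABLE_BITS) - 1)


def compress(data):
    table = bytearray(1 << TABLE_BITS)
    h = 0
    out = bytearray()
    for start in range(0, len(data), 8):
        flags = 0
        literals = bytearray()
        for bit, byte in enumerate(data[start:start + 8]):
            if table[h] == byte:
                flags |= 1 << bit      # guess was right: send nothing
            else:
                table[h] = byte        # guess was wrong: learn, send literal
                literals.append(byte)
            h = _next_hash(h, byte)
        out.append(flags)              # one flag byte per group of up to 8
        out += literals
    return bytes(out)


def decompress(blob, length):
    table = bytearray(1 << TABLE_BITS)
    h = 0
    out = bytearray()
    pos = 0
    while len(out) < length:
        flags = blob[pos]
        pos += 1
        for bit in range(min(8, length - len(out))):
            if flags & (1 << bit):
                byte = table[h]        # compressor guessed this byte
            else:
                byte = blob[pos]       # literal follows in the stream
                pos += 1
                table[h] = byte
            out.append(byte)
            h = _next_hash(h, byte)
    return bytes(out)
```

In this sketch the worst case is one extra flag byte per eight input
bytes, and the per-byte work is one table lookup and one hash update,
which is consistent with a scheme like this being cheap in CPU terms.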

We also ran tests against another vendor's (who shall remain
nameless) product.  This product had a hardware support module, 
and used an LZW algorithm.  The compression ratio that they showed
was always better, but the time it took to transfer the file 
was usually longer.  The performance data varied, and
in some cases was faster than our algorithm.  But, in 90% of the 
cases, our Predictor algorithm was faster, even though the 
compression ratio displayed was lower.

We did the tests with IPX using a tool called PERFORM3, which 
copies data from the server to the workstation.  
We also ran tests with TCP/IP, using a batch file with FTP puts 
and gets.




What I think is needed is a reliable compressor, that can be 
coded efficiently, and is easy to understand (to avoid 
interoperability problems).  

The goal behind data compression is to move the data across 
the line faster.  

>From what I've seen, theoretical compression ratios are not 
reliable indicators of increased throughput.  

If any of you have data refuting that statement, please feel 
free to post it on this mailing list.....  
					 

Diane



From owner-ppp-comp Tue May 25 18:24:03 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyADl-00007Fa@daver.bungi.com>; Tue, 25 May 93 18:23 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Tue, 25 May 1993 18:23:53 PDT
Message-ID: <m0nyADi-00005JC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression negotiation" on May 25, 15:13, Thomas Dimitri writes:]
> After selecting the compression and decompression algorithm do
> we need a specific parameter negotiation to follow?  V.42bis
> negotiates dictionary size, and a couple other things - should
> PPP have a specific negotiation which negotiates things like
> dictionary size (or history buffer size)?  -Thomas

The private algorithms can do whatever they want. The common algorithm should
not have any options, in my opinion.



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Tue May 25 18:25:52 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyAFW-00006ia@daver.bungi.com>; Tue, 25 May 93 18:25 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Tue, 25 May 1993 18:25:41 PDT
Message-ID: <m0nyAFR-00005JC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 25, 14:25, Art Berggreen writes:]
> Finally, I think our test cases need to deal with significant amounts
> of data-stream interleaving (at least for multiuser hosts and
> bridge/routers).  I'd like to know what packets of uncorrelated data
> and protocol control packets (i.e. ACKs) do to the history buffers
> and the compression efficiency of a particular algorithm.
> 


That is what I have done in my test case. ftp to sgi.com:other/ppp-comp
under comp.doc (troff) or comp.ps (PostScript) for more information.

Better test files are welcome.



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 26 07:06:38 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyM7c-00002fa@daver.bungi.com>; Wed, 26 May 93 07:06 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 1993 10:04:15 -0400 (EDT)
Message-ID: <9305261404.AA10483@hobbit.gandalf.ca>
References: <<9305252025.AA05958@netmail.microsoft.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> If I remember correctly, ZIP uses several approaches to compressing
> data, not just the LZSS + Huffman scheme.  I fear this code might take
> up too much memory or be overly complicated.  I've found that squeaking
> out 10% more in compression is very difficult to do, compressors
> usually have to special case things and go through lots of hoops to
> make it happen.  Perhaps we should avoid ZIP.

There are several compression schemes used within ZIP.  However, they
are there mainly for backwards compatibility.  With each new improvement
in the methods, the odds of using the older methods are dramatically 
reduced.  "Reduce" and "Shrink" hardly get used any more.  Deflate is
much better than these methods and tends to be the one chosen most
often.  There is of course "Storing" which shouldn't be too hard to
implement, and takes no memory!

Deflate can take the output of the LZ77 encoder and either encode it
using static Huffman encoding derived from the block of data (in which
case the Huffman tree is written with the block), or with a static
predetermined encoding.

For the record, LZ77 refers to the generic scheme of encoding data as
either (literal) or (distance,length).  LZSS is a method of encoding
the output of LZ77 using unary codes.  We should try to use the same
terms to avoid confusion.
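
To make the (literal) versus (distance,length) terminology concrete,
here is a small sketch of a greedy LZ77 parse and its inverse.  The
window size, the minimum and maximum match lengths and the brute-force
search are assumptions for illustration only; a real implementation
would use a hash table and would still need a back end to encode the
tokens as bits.

```python
def lz77_tokens(data, window=4096, min_len=3, max_len=18):
    """Greedy LZ77 parse into ('lit', byte) or ('copy', distance, length)."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the window for the longest match.
        for j in range(max(0, i - window), i):
            k = 0
            while (k < max_len and i + k < len(data)
                   and data[j + k] == data[i + k]):
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            tokens.append(("copy", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_expand(tokens):
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte by byte, so overlaps work
    return bytes(out)


print(lz77_tokens(b"abcabcabcabc"))
```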

> |
> | I assume that you're using the standard 16-bit compress.  So we need
> | about 512K bytes or so (compressor + decompressor).
> 
> This is way TOO much memory in my opinion.  A 64 port server on NT
> (yes, that's what we support with RAS today) would eat up 32 megs
> of memory - this is too much for even a goliath OS like NT to
> be expected to keep in locked down memory.

A reasonable LZH (ZIP) scheme can be implemented in 64 or 128K of memory.
The main reduction comes from the size of the LZ77 window and the 
associated hash table.  The smaller window reduces compression.  The
smaller hash table should give the same performance as long as it is
scaled the same as the window.
> |
> | I would guestimate 2048:1 is the best case, but I can improve on it if
> | it's too low :-)
> 
> May I ask why we care so much about best compression cases?  Once you
> get around 50:1 I don't think it matters at all.  Your average largest
> frame size is 1500 right?

You're right.  Pure specmanship.  Probably some marketing requirement.
> 
> The important numbers are the test case numbers and WORST case numbers.
> I see BEST case numbers but not WORST case numbers.  WORST case is
> important because with many compression algorithms you can't back out
> once you've started compressing because the history buffers, Huffman
> tables, etc. are already updated.  It is possible for non-compressable
> data or already compressed data to be sent across the link.  You want
> to ensure that this data does not get expanded too much.

Right on!  The worst case for FZA is .999:1.  It's funny that I don't
have to do anything special to get it either.  The beauty of arithmetic
encoding.  It gets arbitrarily close to the entropy of the source.
> 
> In my opinion, I cannot see how anybody would want to go the LZW route.
> It takes too much memory, its slow, and its compression ratio isn't
> that great.  I think Ross Williams proved that with LZRW-3a, which
> was much faster, consumed much less memory, and offered better compression.

All LZW schemes are not equal.  V.42bis is LZW based, and a reasonable
implementation takes about 32K.  However, you can see that Unix COMPRESS
gets a lot higher compression.
> 
> Unfortunately, I cannot comment on anything that has to do with
> patents, I'll let the Daves do that.  I would like to suggest that
> someone offer up some sample LZ77 or LZ78 or Arithmetic compression
> scheme.  I think the goal is a fast, small memory, but good compression
> scheme.  This is definitely possible and we should accept nothing less.
> I believe the STAC code fits the bill in this case.

Any of the major archivers is a good starting point for the LZ77 parsing.
You can pick either ZIP, HPACK, FREEZE, LHA, ...  Just make sure there
are no restrictions on commercial uses, or GNU licenses attached.
For the arithmetic code, there is a version in HPACK, or get the CACM '87
code.

STAC fits the small memory criteria, but compression...BUUUZZZT.  On the
fast criteria, it is no faster than any good LZ77 scheme.
> 
> Obviously, there are tradeoffs when it comes to speed, memory, and
> compression.  Here is my evaluation..
> 
> I think we should shoot for 64K or less per connection for compression.
> Speed/Compression Ratio - the way to evaluate this trade-off is
> (time to transmit one byte (LinkSpeed)) VS (time to compress one more byte)
> 
> 1. Average compression ratio for the test case.

Have we agreed on Dave Rand's test?  I personally don't feel the Calgary
Corpus is complete or typical.  Also, the encapsulation with only IPX
and working from a file leave a bit to be desired.  

Not that I'm bothered by FZA results on the test, but I'd like to see
SNIFFER traces fed directly into the compressor.  It tests all aspects
of the compressor (header compression, end of stream encoding, etc).
As well, I don't need to maintain a separate version of the compressor
to work on a file.  I can simply create a front end which passes one frame
at a time to the working compression code.

> 2. Time spent in compression engine per bytes compressed -
>    or rather compression speed in bytes/sec.  Also,
>    decompression speed if it is much different.  We need
>    a default CPU for this.

This is not easy for some algorithms.  If you mean just for the
test file, then an overall average is possible.  LZ77 and LZW are
non-linear.  The higher the compression ratio, the lower the cost
per byte to encode them.  Markov type schemes are fairly predictable.

We'll also need to use the same I/O routine and test harnesses.  We'll
want to remove the effects of really good I/O on the measurements.  I for
one am illiterate when it comes to DOS.  But the effects are dramatic
when the I/O is done specifically for a platform (see PKZIP 2.*).

> I realize that some of this information is difficult to
> gather, but I think it is difficult to vote on an algorithm
> without this information.  Dave, can we get this data for
> the Gandalf compression scheme?  I think it shouldn't be too hard
> to get this data for the STAC and LZW stuff too (even though
> I contend that LZW is far from the best viable option).  --Thomas

The average on Dave's test was 2.695:1.  However, as with many
compression algorithms, it depends on the amount of memory you give
it, and how fast you want it to go.  

The speed numbers are a little difficult.  My algorithm is optimized
for in memory frames, not file I/O.  I simply used getc and putc to
hack together a test to get the compression ratio for Dave's test.
Also, I work on a SPARCStation 1+ which does not support all the
arithmetic functions in hardware (or executes microcode for them, I 
don't know), so the arithmetic code tends to run a lot slower on the
SPARC.

FZA has no practical upper limit on compression ratio.  Its lower
limit can be held at 12.5%, but in practice this serves no useful
purpose.  The arithmetic code in practice gets the compression
ratio down to 1:1 on any file >10K.

From owner-ppp-comp Wed May 26 07:48:06 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyMlo-0000Bya@daver.bungi.com>; Wed, 26 May 93 07:47 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 1993 10:34:30 -0400 (EDT)
Message-ID: <9305261434.AA15641@hobbit.gandalf.ca>
References: <<m0ny5qi-0000NtC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 
> I don't care at all about the maximum compression ratio.  The RGF is
> what I care about (Rand Goodness Factor). This is the data output rate
> of the compressor.  You don't want a T1 running a compression algorithm
> getting 200:1 if the RGF is 996 (996 bytes per second)! I put the
> maximum compression rate in to show that even though we can theoretically
> get very good compression ratios, in real life (or at least as close as
> my test gets to real life - open to debate) we don't get very good 
> compression at all. 4:1 lives only in marketing documents.

The RGF is skewed for speed however.  In real life, I would say that
90% or more of data links run at 64Kbps or less.  
> 
> Done. See my table. Both input data rate, and output (the important one)
> are shown. RGF is perhaps the best metric, in my opinion.

Here is one of my objections with the RGF.  Suppose I get 4:1 with my
algorithm, but only keep the link half full, should my RGF be lower than
an algorithm that gets 2:1 but keeps the link full?
> 
> > 4. WORST case compression if compression cannot be backed out.
> >    BEST case compression only if it is a low number like 10:1.
> 
> Worst case compression is very hard to determine with some algorithms.
> LZW is probably 1:2 (assuming 16 bit codewords). 

Buzzt.  16-bit LZW will have codes assigned to approximately (65536 - 257) 
bigrams.  On a long uncompressable file, it should come close to 1:1.  On 
short files, the dictionary may not fill, and therefore it wouldn't be using
16-bit codes.

> Info-ZIP is probably around 1:3.

What?  It can store the file with very little overhead.

> All code is relatively small. Certainly less than 1000 lines of C code.
> Predictor can be done in about 20 lines of C. LZW in about 150 lines.
> Info-ZIP in about 700 lines or so.

An Info-ZIP-like compressor could be done in about 300 lines or less if
you don't need a bazillion ifdef's for all the hosts it runs on. 
> 
> > 6. Patent/License problems if any.
> 
> This is hard. See my document for some of the issues. It *APPEARS* that
> we step on someone's toes no matter WHICH algorithm we choose to implement.
> I am working on this issue though.

Dave, I wish you'd tell me the patents that I don't know about.  You seem
to imply that FZA has patent problems, and I don't agree.  I don't want the
rest of the list to become biased away from LZ77.  It appears to have the
best chance of any algorithm to be free of patent issues. 

From owner-ppp-comp Wed May 26 08:34:35 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyNUg-0000CIa@daver.bungi.com>; Wed, 26 May 93 08:34 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 1993 08:34:06 PDT
Message-ID: <m0nyNUW-0000AiC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 26, 10:34, Dave Carr writes:]
> Here is one of my objections with the RGF.  Suppose I get 4:1 with my
> algorithm, but only keep the link half full, should my RGF be lower than
> an algorithm that gets 2:1 but keeps the link full?

The problem is keeping the link full, at 'appropriate' speeds (whatever
they are).  The RGF is concerned with the output data rate, which takes
into account compression ratio and compression speed. Again, private
algorithms can (and will) do whatever they want to - we are looking for
a 'standard' algorithm that can handle a wide variety of speeds. And
4:1 compression ratios live only in marketing documents.

> 
> Buzzt.  16-bit LZW will have codes assigned to approximately (65536 - 257) 
> bigrams.  On a long uncompressable file, it should come close to 1:1.  On 
> short files, the dictionary may not fill, and therefore it wouldn't be using
> 16-bit codes.

I agree. On long files, LZW will get close to 1:1. On packets, with a full
dictionary, and few matches, it will get 1:2. We aren't compressing files.
I know this only because I have seen it in the lab.  As the dictionary
gets full in the UNIX compress implementation, there is a delay before it
is flushed and reset. During this period, the compression ratio dropped
to around 0.662:1, and may have gone lower (I wasn't watching too carefully).
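
The 1:2 and 1:1 figures follow from simple arithmetic, sketched below.
The sketch assumes that every output code is a fixed 16 bits wide and
ignores any framing overhead; the function name is made up for
illustration.

```python
def lzw_ratio(avg_bytes_per_code, code_bits=16):
    """Input bits divided by output bits when every code is code_bits wide."""
    return (8 * avg_bytes_per_code) / code_bits


print(lzw_ratio(1.0))    # each code covers a single byte
print(lzw_ratio(2.0))    # each code covers a two-byte string
print(lzw_ratio(1.324))  # average match length implied by the lab figure
```

Under these assumptions, codes that each cover one byte give 0.5 (1:2),
codes that each cover two bytes break even at 1:1, and a ratio of about
0.662:1 corresponds to an average of roughly 1.3 input bytes per code.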

It is sometimes hard to figure out the maximum expansion ratio, since it
depends on a number of factors.  Naturally, any algorithm can be modified to
send uncompressed data if the compressed data is not smaller - the compress
code that I looked at did not have this capability.

> 
> > Info-ZIP is probably around 1:3.
> 
> What?  It can store the file with very little overhead.

Not on small packets, it can't. We aren't compressing files.

> 
> Dave, I'd wish you'd tell me the patents that I don't know about.  You seem
> to imply that FZA has patent problems, and I don't agree.  I don't want the
> rest of the list to become biased away from LZ77.  It appears to have the
> best chance of any algorithm to be free of patent issues. 

I'm not implying anything. I'm researching it, and my best current
information tells me that all LZ-related algorithms are covered by one or
more patents. The validity of the patents is questionable, of course. I'm
not a lawyer, and I don't pretend to understand which algorithms are or are
not covered. Besides, the issue is not going to depend on you and I agreeing
FZA isn't covered by patents, it is going to depend on the lawyers agreeing
that it isn't covered.  Arithmetic compression also appears to be covered by
IBM patents.  I know that you said that you have worked around at least the
STAC patent. 

Personally, I *LIKE* LZ77. It gets GREAT compression. I'm not biasing
anything. I've tried to present all algorithms that I know about, so
that everyone has the opportunity to review the data and come to their
own conclusions. Part of that data involves patents.  Even though I have
received assurance from the Info-ZIP authors that I can use the algorithm,
I would still run it through our lawyers before putting it in a product.

I'm also still waiting for your non-disclosure so that I can try your
algorithm.  Even if it doesn't make it into the standard, I'm still
interested in it as a potential product.

I'm also not limiting my search to the algorithms that I know about.
I'm *ACTIVELY* looking at different algorithms, and different ways
of doing compression.  I've even talked to the WEB^H^H^HTranSCend folks,
who claim "amazing, simply amazing" compression rates.  Yes, these are
commercial products.  No, it isn't a factor in the evaluation.  We must
look at all alternatives.  We can decide to eliminate candidates after
looking at them, but I don't think that we should ignore anyone.

The LZW algorithm in the compress program is patented, but no one is
sure if IBM/Unisys will come after you if you use it. USL has apparently
paid a license fee to Unisys for the right to include compress in the
USL UNIX release. I'm checking on that now.

Predictor, to the best of my knowledge, is not patented, and cannot be
patented due to prior art.  I'll know more when the patent search is
complete.  This will be in 5 weeks or less.  I submitted the request about 2
weeks ago to the Novell patent lawyer, who of course is named (in the
tradition of all people involved in compression) Dave Black.


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 26 08:38:07 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyNXz-00001na@daver.bungi.com>; Wed, 26 May 93 08:37 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 1993 10:43:56 -0400 (EDT)
Message-ID: <9305261443.AA16149@hobbit.gandalf.ca>
References: <<9305252125.AA05869@opal.acc.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> Let's not underestimate the need to deal with uncompressable data.  A

I agree.  The ability of an algorithm to recognize uncompressable data
and provide an alternate encoding if necessary to limit expansion
is very important.  One can either special case it, or use a method 
which limits or eliminates it altogether.
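
A minimal sketch of the "special case it" approach: try to compress each
packet, and fall back to sending it raw behind a one-byte flag when
compression does not help.  The flag values and framing are hypothetical,
and zlib is used only as a convenient stand-in compressor.  Each packet
is compressed independently here, so the sketch sidesteps the question of
keeping a shared history or dictionary in step when a packet goes out
uncompressed.

```python
import os
import zlib

RAW, PACKED = 0, 1   # hypothetical one-byte framing flags


def encode_packet(payload):
    packed = zlib.compress(payload)
    if len(packed) < len(payload):
        return bytes([PACKED]) + packed
    return bytes([RAW]) + payload      # expansion limited to the flag byte


def decode_packet(frame):
    body = frame[1:]
    return zlib.decompress(body) if frame[0] == PACKED else body


text = b"hello " * 100
noise = os.urandom(512)                # stands in for encrypted data
for packet in (text, noise):
    frame = encode_packet(packet)
    assert decode_packet(frame) == packet
    assert len(frame) <= len(packet) + 1
```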

> At least in the bridge/routers, packets containing uncompressable data
> may be interleaved with all the others.  Also, encrypted data will
> typically look like a random stream and not be compressable.  With the
> potential growth of encryption in networking, this needs to be considered.
> Finally, I think our test cases need to deal with significant amounts
> of data-stream interleaving (at least for multiuser hosts and
> bridge/routers).  I'd like to know what packets of uncorrelated data
> and protocol control packets (i.e. ACKs) do to the history buffers
> and the compression efficiency of a particular algorithm.

Depends.  If you perform per-session compression, or use one table.
Interleaving can bring any good algorithm to its knees if it only
uses one table for all sessions.  I would caution however that per
session compression tables is one claim of a patent I filed two years
ago (still pending).  There may be other patents filed or granted in
this area by others as well.

Acks are mostly eaten up by the header compression, so little data
should get through to the "data" compressor.  Control information
should be separated from the data of the packet, since it
represents another logical data stream.  Again, tread lightly, patent
claim territory.


From owner-ppp-comp Wed May 26 09:09:59 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyO30-00002Ia@daver.bungi.com>; Wed, 26 May 93 09:09 PDT
X-Path: um.cc.umich.edu!bill.simpson
From: "William Allen Simpson" <bill.simpson@um.cc.umich.edu>
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Wed, 26 May 93 11:43:51 EDT
Message-ID: <1219.bill.simpson@um.cc.umich.edu>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

I'm here, too.  Just watching.

But you all seem to be bogged down in arguing about compression algorithms
(again).

Let's nail down the negotiation technique.  Then, we have a framework
for testing.

Assume that there will be more than one negotiated.  Let's just test
algorithms, and check out the legal status, in the background.

Negotiation first, testing next, arguments afterward.

Bill.Simpson@um.cc.umich.edu

From owner-ppp-comp Wed May 26 09:13:23 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyO6N-0000Aea@daver.bungi.com>; Wed, 26 May 93 09:13 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Wed, 26 May 1993 09:13:11 PDT
Message-ID: <m0nyO6K-0000CNC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression negotiation" on May 26, 11:43, "William Allen Simpson" writes:]
> 
> Let's nail down the negotiation technique.  Then, we have a framework
> for testing.

I'm pretty sure that the mechanism is sound. I'll post the details
later on today.





-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 26 10:59:18 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyPkZ-00002ea@daver.bungi.com>; Wed, 26 May 93 10:58 PDT
X-Path: rhyolite.wpd.sgi.com!vjs
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 93 11:25:03 -0600
Message-ID: <9305261725.AA17091@rhyolite.wpd.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

dcarr@gandalf.ca (Dave Carr) writes:
> ...
> Depends.  If you perform per-session compression, or use one table.
> Interleaving can bring any good algorithm to its knees if it only
> uses one table for all sessions.  I would caution however that per
> session compression tables is one claim of a patent I filed two years
> ago (still pending).  There may be other patents filed or granted in
> this area by others as well.


That seems likely to be a weak claim, based on prior art.
    -VJ compression.
    -Brad Clement's efforts with splay trees.

As well as painfully "obvious to one skilled in the field."


vjs



From owner-ppp-comp Wed May 26 10:59:22 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyPka-0000A2a@daver.bungi.com>; Wed, 26 May 93 10:58 PDT
X-Path: opal.acc.com!art
From: art@opal.acc.com (Art Berggreen)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 93 10:05:12 PDT
Message-ID: <9305261705.AA06634@opal.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>
>Depends.  If you perform per-session compression, or use one table.
>Interleaving can bring any good algorithm to its knees if it only
>uses one table for all sessions.  I would caution however that per
>session compression tables is one claim of a patent I filed two years
>ago (still pending).  There may be other patents filed or granted in
>this area by others as well.

Define session for a high traffic, multi-protocol router.  By
network level protocol? By protocol and source/destination
address information? By protocol, src/dst address and src/dst
port information?  For something that is intimately entwined
with PPP, I don't think we should use any more information than
is naturally used inside PPP.  Also for interoperability reasons
I think we need to remember the KISS principle.  I also think
that if one divided the sessions on a fine enough basis to really
separate correlated data, that the memory requirements would
become a problem.  (I also don't like all the dynamic state
management that this implies)

>Acks are mostly eaten up by the header compression, so little data
>should get through to the "data" compressor.  Control information
>should be separated from the data of the packet, since it
>represents another logical data stream.  Again, tread lightly, patent
>claim territory.

IMHO (at least for multi-protocol bridge/routers), VJ-style
header compression won't be that popular combined with data compression.
Given that the number of concurrent transport sessions may
be high, that many bridge/routing protocols don't have header compression
defined, and that header compression doesn't do anything for the packet
data (which tends to cause the congestion), header compression is a
nuisance when also doing full data compression.

Art


From owner-ppp-comp Wed May 26 12:01:14 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyQih-00005Ya@daver.bungi.com>; Wed, 26 May 93 12:00 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 1993 14:52:02 -0400 (EDT)
Message-ID: <9305261852.AA25500@hobbit.gandalf.ca>
References: <<m0nyNUW-0000AiC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> The problem is keeping the link full, at 'appropriate' speeds (whatever
> they are).  The RGF is concerned with the output data rate, which takes
> into account compression ratio and compression speed. Again, private
> algorithms can (and will) do whatever they want to - we are looking for
> a 'standard' algorithm that can handle a wide variety of speeds. And
> 4:1 compression ratios live only in marketing documents.

Okay.  Suppose I have this algorithm that can average oh so 2.695:1 for
example, and can keep a 768K bps link 75% full, and you have an algorithm
that, just for argument's sake, gets 1.668:1 but can keep the 768K link
full.  Which is better?
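
One way to put numbers on the question is sketched below.  It assumes
that "75% full" means the compressed output occupies 75% of the raw link
capacity, and that the user data rate is then the link rate times the
fill fraction times the compression ratio.  Both the interpretation and
the function name are assumptions for illustration.

```python
def effective_kbps(link_kbps, fill, ratio):
    """User data rate: link rate x fraction of link used x compression ratio."""
    return link_kbps * fill * ratio


a = effective_kbps(768, 0.75, 2.695)   # higher ratio, link 75% full
b = effective_kbps(768, 1.00, 1.668)   # lower ratio, link kept full
print(round(a, 2), round(b, 2))
```

Under that reading the first algorithm moves about 1552 Kbps of user data
and the second about 1281 Kbps.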
> 
> I agree. On long files, LZW will get close to 1:1. On packets, with a full
> dictionary, and few matches, it will get 1:2. We aren't compressing files.

Then why are we using your test setup????

> I know this only because I have seen it in the lab.  As the dictionary
> gets full in the UNIX compress implementation, there is a delay before it
> is flushed and reset. During this period, the compression ratio dropped
> to around 0.662:1, and may have gone lower (I wasn't watching too carefully).

But that is UNIX compress.  Try V.42 bis with parameter N7 (correct me if
I'm wrong, it's been a while) set to something other than the default of 1.
Then it learns faster.
> 
> It is sometimes hard to figure out the maximum expansion ratio, since it
> depends on a number of factors.  Naturally, any algorithm can be modified to
> send uncompressed data if the compressed data is not smaller - the compress
> code that I looked at did not have this capability.

Sometimes this is true.  The learning in LZW involves dictionary updates.
If uncompressed data is passed down the link, you have 2 choices:

(1) Reset the dictionary.  (Not desirable);
(2) Make the decompressor COMPRESS the packet, thereby performing the same
updates to the dictionary.  Unfortunately, now you need a hash table on the
decompressor, doubling the memory requirements.

In the case of LZSS it's true.  For LZH or LZA, the models are the problem.

> > What?  It can store the file with very little overhead.
> 
> Not on small packets, it can't. We aren't compressing files.

Well, yes and no.  We aren't compressing files.  Good, let's get a better test!

In general, most of the Info-ZIP code can't be used for packet-at-a-time
compression.  You need a fully dynamic back end to make it practical.
If you were instead to use LZH or LZA, you can calculate the worst case of
the Huffman or Arithmetic backend.  On uncompressable data, the output will
be mostly literals. 

> I'm not implying anything. I'm researching it, and my best current
> information tells me that all LZ-related algorithms are covered by one or
> more patents. 

There is a lot of stuff going on between Microsoft and STAC regarding old
patents they have acquired.  Of course, I wonder if they know about that
16.5 year old patent owned by IBM that covers the literal indication flag?
Of course, that might screw up Predictor as well.

Let's stop beating around the bush.  Quote patent numbers or refrain from
bashing LZ77.  I think you're wrong.  Plain and simple.  I am not alone.
I would hardly think that all the archivers are wrong.  

> The validity of the patents is questionable, of course. I'm
> not a lawyer, and I don't pretend to understand which algorithms are or are
> not covered. 

True.  There are multiple patents for the same algorithm too.  But prior art
sure does wonders.  

> Besides, the issue is not going to depend on you and I agreeing
> FZA isn't covered by patents, it is going to depend on the lawyers agreeing
> that it isn't covered.  

> Arithmetic compression also appears to be covered by
> IBM patents.  I know that you said that you have worked around at least the
> STAC patent. 

IBM's patents all deal with removing multiplies in their *approximate* 
arithmetic code, as do Rissanen (sp?) et al.  There are no patents on 
the original CACM code.  Talk to the authors, I did.  Research the 
original code, not all the improvements.  Start from square one as I
did 10 months ago.

A lot of the research on arithmetic compression, and the majority of
arithmetic patents deal with binary sources, and are useless to us.

STAC's patents are very simple (but elegant) and fast.  They claim a hash
refresh operation that is amortised on a per character basis.  This leads
to very predictable encoding times.  HOWEVER, Info-ZIP has a hash refresh,
so does the new version of Freeze.  Neither infringe on the STAC patents.
This is what I mean by work around.  I came up with the method I chose
independently, about 4 months after Jean-loup Gailly thought of it.
Hardly a patentable idea, as it was "obvious" once you knew the problem.

> Personally, I *LIKE* LZ77. It gets GREAT compression. I'm not biasing
> anything. I've tried to present all algorithms that I know about, so
> that everyone has the opportunity to review the data and come to their
> own conclusions. Part of that data involves patents.  Even though I have
> received assurance from the Info-ZIP authors that I can use the algorithm,
> I would still run it through our lawyers before putting it in a product.

Fair enough.
> 
> I'm also still waiting for your non-disclosure so that I can try your
> algorithm.  Even if it doesn't make it into the standard, I'm still
> interested in it as a potential product.
>
It's management's decision, not mine. I'm not sure an NDA is appropriate here.
For example, looking at it may lock you out of (ever) fielding a competing LZ
based compressor. I'd rather ship you a couple of bridges to show it
works.  We have done that (quite successfully, I might add) for Compression
Technologies.  No problems with management on that.
 
> Predictor, to the best of my knowledge, is not patented, and cannot be
> patented due to prior art.  I'll know more when the patent search is
> complete.  This will be in 5 weeks or less.  I submitted the request about 2
> weeks ago to the Novell patent lawyer, who of course is named (in the
> tradition of all people involved in compression) Dave Black.
> 
Have you found that Telebyte patent I thought might cause problems? 
There are a lot fewer patents on Markov type encoding, and I agree that 
Predictor should be easier to prove patent-free.  But LZ seems to be
past this point too.  It just took longer.



From owner-ppp-comp Wed May 26 12:01:17 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyQie-00005La@daver.bungi.com>; Wed, 26 May 93 12:00 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Wed, 26 May 1993 15:02:15 -0400 (EDT)
Message-ID: <9305261902.AA27350@hobbit.gandalf.ca>
References: <<1219.bill.simpson@um.cc.umich.edu>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 
> I'm here, too.  Just watching.

Come on, the rest of you, out of that closet with Bill (nothing derogatory
intended).  Let's get more points of view!

> But you all seem to be bogged down in arguing about compression algorithms
> (again).

True.  Shades of last summer.  
> 
> Let's nail down the negotiation technique.  Then, we have a framework
> for testing.

Agreed.
> 
> Assume that there will be more than one negotiated.  Let's just test
> algorithms, and check out the legal status, in the background.
> 
> Negotiation first, testing next, arguments afterward.

Push(arguments); 
Push(testing);
Push(negotiation);
TaskSwitch();
Pop(negotiation);
PPP_standardize();

(p.s. What happened to LAPB? Is this put on the stack too?)

We seem to be in agreement (sort of) on the negotiation.  We need
parameters for most algorithms.  We need a generic facility.  If 
Predictor has no parameters fine, just a null argument list.

From owner-ppp-comp Wed May 26 12:36:19 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyRGh-0000EQa@daver.bungi.com>; Wed, 26 May 93 12:36 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Wed, 26 May 1993 14:03:51 -0400 (EDT)
Message-ID: <9305261803.AA16639@hobbit.gandalf.ca>
References: <<m0nyADi-00005JC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> The private algorithms can do whatever they want. The common algorithm should
> not have any options, in my opinion.
> 
Nothing programmable in Predictor? Ahhhh.

The default algorithm should be usable on a variety of platforms.
Any LZ77 algorithm offers such flexibility (window size at least).
The encoder can also limit the amount of searching depending on
the processor available.

From owner-ppp-comp Wed May 26 12:50:53 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyRUb-0000Bua@daver.bungi.com>; Wed, 26 May 93 12:50 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 1993 12:50:17 PDT
Message-ID: <m0nyRUR-0000EPC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 26, 14:52, Dave Carr writes:]
> 
> Okay.  Suppose I have this algorithm that can average oh so 2.695:1 for
> example, and can keep a 768K bps link 75% full, and you have an algorithm
> that, just for arguments sake, gets 1.668:1 but can keep the 768K link
> full.  Which is better?

The one that keeps the link full. Let's look at it a different way. The
algorithm that gets 2.695 can accept, based on this example, a maximum of
1.552 Mbps of input data. In other words, it works best with links of
512 Kbps or less. At T1, the link speed is 1.5 Mbps - you are still going
to get 1.5 Mbps throughput (assuming you have CPU left), but your
compression ratio is still 2.695:1. On an E1 link (2.048 Mbps), you will
still get 1.552 Mbps throughput - and still have a compression ratio of
2.695:1.

The algorithm that gets 1.668:1 can get 2.592 Mbps on a 1.5 Mbps T1 link.
Or 3.4161 Mbps on an E1 link.
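
A rough way to check figures like these is sketched below.  It assumes a
simple model that is not stated in this thread in so many words: the
uncompressed throughput is the link rate times the compression ratio,
capped by however fast the compressor can accept input.  The cap for the
2.695:1 algorithm is derived from the "768K link 75% full" example; the
1.668:1 algorithm is assumed never to be CPU-bound.  The function and
variable names are made up for the illustration.

```python
from typing import Optional


def effective_throughput(link_kbps: float, ratio: float,
                         max_input_kbps: Optional[float] = None) -> float:
    """Uncompressed Kbps carried: link rate times ratio, capped by compressor speed."""
    offered = link_kbps * ratio
    if max_input_kbps is None:
        return offered          # assumption: compressor always keeps the link full
    return min(offered, max_input_kbps)


# The 2.695:1 algorithm keeps a 768 Kbps link 75% full, so it can accept
# about 768 * 0.75 * 2.695 Kbps of input and no more.
SLOW_CAP_KBPS = 768 * 0.75 * 2.695

for link_kbps in (768, 2048):   # the 768K example link and an E1
    slow = effective_throughput(link_kbps, 2.695, SLOW_CAP_KBPS)
    fast = effective_throughput(link_kbps, 1.668)
    print(f"{link_kbps} Kbps link: 2.695:1 -> {slow:.1f} Kbps, "
          f"1.668:1 -> {fast:.1f} Kbps")
```

Under this model the higher-ratio algorithm wins on the 768 Kbps link and
the faster one wins on the E1, because the cap stays fixed while the
link-limited figure keeps growing with link speed.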

On a 9600 bps link, almost any algorithm would appear to be a good choice.

Diane posted a really cool expose on this yesterday.

> > 
> > I agree. On long files, LZW will get close to 1:1. On packets, with a full
> > dictionary, and few matches, it will get 1:2. We aren't compressing files.
> 
> Then why are we using your test setup????

My test file consists of packets, arranged in a file.  As Diane pointed out,
compression ratio does not tell the whole story - you have to put it up
on a router in order to get a good feel for the algorithm.

Another point of view is that we are using my test file because no one
has offered anything better.

> 
> But that is UNIX compress.  Try V.42 bis with parameter (correct me if I'm 
> wrong, it's been a while), N7 to something other than the default of 1.
> Then it learns faster.

That was what I was talking about. I have 'recommended' (in the "Data
Compression for Wide Area Networks") three algorithms. I have serious
experience with those three. I don't have as much experience with the
other algorithms, since I didn't go any further than running the test
files through them, and getting the results.

> Well, yes and no.  We aren't compressing files.  
> Good, let's get a better test!

I'm waiting!  *YOU* go through running all the tests, on all the different
algorithms, and write a nice, concise report.

> Let's stop beating around the bush.  Quote patent numbers or refrain from
> bashing LZ77.  I think you're wrong.  Plain and simple.  I am not alone.
> I would hardly think that all the archivers are wrong.  

Great! I have quoted patent numbers in my document. I have a stack of them
on my desk. I think that it is covered - but it doesn't matter *AT ALL*
what I think. Period. It *ONLY* matters what the lawyers think is
covered. AND I AM NOT BASHING ANYTHING. Dave - if you want to take this
any further, take it to private email.


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 26 12:59:42 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyRdC-00001Ka@daver.bungi.com>; Wed, 26 May 93 12:59 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 1993 15:39:59 -0400 (EDT)
Message-ID: <9305261940.AA01319@hobbit.gandalf.ca>
References: <<9305261725.AA17091@rhyolite.wpd.sgi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> 
> dcarr@gandalf.ca (Dave Carr) writes:
> > ...
> > uses one table for all sessions.  I would caution however that per
> > session compression tables is one claim of a patent I filed two years
> > ago (still pending).  There may be other patents filed or granted in
> > this area by others as well.
> 
> 
> That seems likely to be a weak claim, based on prior art.
>     -VJ compression.
>     -Brad Clement's efforts with splay trees.
> 
> As well as painfully "obvious to one skilled in the field."

VJ does not deal with per-session tables.  Perhaps Splay Trees does.
If I thought my claims were covered by these, I wouldn't have filed.
I'm sure the patent office will tell me :-)  

There are a lot of weak patents, notably hashing for LZ77.  There
is even one which covers eliminating hashing by using a 64K lookup
table.  You don't even have to be skilled in the art to think that
one up.

From owner-ppp-comp Wed May 26 12:59:47 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyRdG-00004Da@daver.bungi.com>; Wed, 26 May 93 12:59 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio?
Date: Wed, 26 May 1993 14:01:06 -0400 (EDT)
Message-ID: <9305261801.AA16194@hobbit.gandalf.ca>
References: <<9305252318.AA24702@va.SJF.Novell.COM>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> Seems to me 
> that what users want is higher throughput through the WAN interface, 
> thereby getting lower phone bills, and a quicker return of the 
> command prompt.  So, if a compression algorithm can get 900:1 
> compression, but takes more processor time to compress or 
> decompress the data than without compression, why bother?  

Why bother what?  Running the algorithm, or running it on that
processor.  What you are implying is that one algorithm is not
suited to the processor and therefore can easily be discarded
as a likely candidate.

Dave and Diane, what we have found is the user is willing to pay 
the price for a really good compression algorithm.  I've been working
in the field for about 5 years and found that companies that have
really good compression don't have a very hard sell charging $5000
extra over a so-so compression device.  The reason is the line
costs are recovered in a few months.  Capital equipment costs don't
matter, but the recurring line costs do.

If it costs an extra $200 in hardware, it's money well spent. 
This can be paid back in less than a month in line costs.

We currently ship in excess of 600 compression bridges a month.
Not once has a customer asked for T1 compression.  While I believe
there is a market for T1 compression, it's not a large market
today compared to 64K or 128K compression.

My point is if you want to support higher speeds, I don't think
changing the compression algorithm is the preferred answer. It is
one solution, but not the only one.  I would much prefer a
hardware assist for a good algorithm, rather than sacrificing
compression.

> The user won't see an increase in throughput, which is what 
> they want.  This (that users want increased throughput on the line) 
> should be obvious to everyone reading this.

Then, it should be as easy to see that a user will want the algorithm
that does 3:1 instead of 2:1.

The costs can be calculated and tradeoffs made.  When the compression
enters the exotic range requiring 200 MIPS out of a processor, then it
may not be worth the extra effort and cost.  But please, let's get
close to state-of-the-art.
> 
> Actual compression ratios vary greatly from the theoretical 
> maximum compression ratios advertised by various algorithms.  

And sometimes they are exceeded (rarely).  And we try to educate
our marketing types to not fight this war with fictitious numbers.
But it's kind of hard when the competitors using STAC are claiming
4:1.

> Timer is a program which prints the elapsed time for the 
> copy operation.  No file compare (fc) errors were found.  
> In some cases, two speed ratios are given, the first for the 
> copy to the server (f:), the second for the copy to the 
> client (d:).  Two sets of tests were run, with compression 
> enabled and with compression disabled.  The speed ratio was 
> obtained by dividing (the time the copy took with compression 
> disabled) by (the time the copy took with compression enabled).
> Average speed ratios for a like set of files were obtained by 
> summing the speed ratios and dividing by the number of files*2.  
> For example, the average speed ratio for the files bib and 
> news was 1.96 ((2 + 2 + 2.32 + 1.52)/4).  
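
For reference, the averaging described in the quoted text can be
reproduced as below.  The four ratios are the ones given in the quoted
example; the function names are made up for the illustration, and the
timing values passed to speed_ratio in practice would come from the
Timer program mentioned above.

```python
def speed_ratio(seconds_without: float, seconds_with: float) -> float:
    """Copy time with compression disabled divided by copy time with it enabled."""
    return seconds_without / seconds_with


def average_speed_ratio(ratios):
    """Plain arithmetic mean of the per-copy speed ratios."""
    return sum(ratios) / len(ratios)


# Two files (bib and news), each copied in both directions: four ratios.
ratios = [2.0, 2.0, 2.32, 1.52]
print(round(average_speed_ratio(ratios), 2))
```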

This produces very inconsistent results and certainly does not
reflect the performance of the algorithm.  What it shows is how
time sensitive IPX, even with PBURST, is to delay.  However,
IP running on a SUN workstation is not sensitive to the delay.
Even with PBURST, Novell cannot keep a 64 Kbps link full.  Is this
the fault of the compression?  I think not.  

I suggest that we remove this Novell bias from the measurement.

> Next, we tried LZW.  Test again.  The COMPRESSION RATIO WAS 
> HIGHER with LZW than with Predictor, but THE INCREASE IN 
> THROUGHPUT WAS LESS THAN PREDICTOR.  Presumably, the code 
> took longer to compress the packet than the time saved with 
> the compression.  That data is not included here, but it 
> can be dug up, if need be.

Sure.  But this shows the hardware is deficient, not the
algorithm.  If I ran PREDICTOR on a Z80, would it keep a T1
link full?  

> We also ran tests against another vendor's (who shall remain
> nameless) product.  This product had a hardware support module, 
> and used an LZW algorithm.  The compression ratio that they showed
> was always better, but the time it took to transfer the file 
> was usually longer.  The performance data varied, and
> in some cases was faster than our algorithm.  But, in 90% of the 
> cases, our Predictor algorithm was faster, even though the 
> compression ratio displayed was lower.

Not speaking for other vendors, but all compression products were
not created equal.  A comparison to one vendor's equipment is 
hardly conclusive.  (Challenge against our 5220 implied :-))

> We also ran tests with TCP/IP, using a batch file with FTP puts 
> and gets.

If you're running PC/TCP, don't.  It can't tolerate delays very
well either.  You'll also get different results using batch files.
> 
> What I think is needed is a reliable compressor, that can be 
> coded efficiently, and is easy to understand (to avoid 
> interoperability problems).  

True.  Or all run the same code and know they'll interoperate.
> 
> The goal behind data compression is to move the data across 
> the line faster.  

No!  There are 2 numbers, throughput and delay.  You seem to
favour delay.  
> 
> From what I've seen, theoretical compression ratios are not 
> reliable indicators of increased throughput.  

In your test setup, I found the same.  But it is the delay.
WARNING** Novell bashing **
Too bad PBURST didn't do it right!

> If any of you have data refuting that statement, please feel 
> free to post it on this mailing list.....  

Indeed!

I suggest the following setup.  Connect 2 or 3 clients across the
same link to the server.  Now repeat the tests.  You'll find that
the link is now kept full with either algorithm, and the LZW will
probably beat Predictor.  Why?  Because the link is now the
bottleneck.  PBurst can now keep the link full.  This assumes the
compression is not CPU bound.

One final note.  Getting a prompt back on a T1 link is more a
function of MTU and protocol priority.  It has little to do with
how fast your compression runs.


From owner-ppp-comp Wed May 26 12:59:53 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyRdD-0000BXa@daver.bungi.com>; Wed, 26 May 93 12:59 PDT
X-Path: acc.com!fbaker
From: fbaker@acc.com (Fred Baker)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 93 13:05:23 PDT
Message-ID: <9305262005.AA24842@saffron.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>> IMHO (at least for multi-protocol bridge/routers) that VJ style
>> header compression won't be that popular combined with data compression.
>> Between the facts that the number of concurrent transport sessions may
>> be high, that many bridge/routeing protocols don't have header compression
>> defined, and that header compression doesn't do anything for the packet
>> data (which tends to cause the congestion) that header compression is a
>> nuisance when also doing full data compression.

besides which, it seems like the information VJ pulls out and collapses
into a small session identifier would become a small number of
frequently seen octet strings, which would be replaced by a code by the
data compression algorithm.

From owner-ppp-comp Wed May 26 13:00:30 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyReC-00009Wa@daver.bungi.com>; Wed, 26 May 93 13:00 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Wed, 26 May 1993 13:00:20 PDT
Message-ID: <m0nyRe8-00009QC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression negotiation" on May 26, 14:03, Dave Carr writes:]
> > The private algorithms can do whatever they want. The common algorithm should
> > not have any options, in my opinion.
> > 
> Nothing programmable in Predictor? Ahhhh.

Actually, there is. The amount of space used for the 'history', for one.

I just wanted to keep it as simple as possible for the common case.

Standardized negotiation of options is only possible if you have a
free-format data area, as each of the compression algorithms supported
is going to require vastly different options. For example, a LZW
variant might require max codeword size, max match length, and
dictionary reset/reuse point. Predictor might require 4 parameters.
Etc.
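
One possible shape for such a free-format area is sketched below, purely
as a starting point for discussion.  It borrows the type/length/data
layout that PPP configuration options already use (the length octet
counts the type and length octets too) and leaves the data octets opaque
to everything except the named algorithm.  The algorithm numbers and the
LZW parameter packing are invented for the example.

```python
import struct

ALGO_NO_PARAMS = 1  # hypothetical type code for an algorithm with no options
ALGO_LZW = 2        # hypothetical type code for an LZW variant


def encode_option(algorithm: int, params: bytes = b"") -> bytes:
    """Pack one option as type, length, then opaque algorithm-specific data."""
    length = 2 + len(params)          # length covers the two header octets
    if not 0 <= algorithm <= 255:
        raise ValueError("algorithm code must fit in one octet")
    if length > 255:
        raise ValueError("option too long for a one-octet length field")
    return struct.pack("!BB", algorithm, length) + params


def decode_options(data: bytes):
    """Split a buffer of back-to-back options into (algorithm, params) pairs."""
    options = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated option header")
        algorithm, length = struct.unpack_from("!BB", data, offset)
        if length < 2 or offset + length > len(data):
            raise ValueError("bad option length")
        options.append((algorithm, data[offset + 2:offset + length]))
        offset += length
    return options


# Hypothetical LZW parameters: max codeword bits, max match length, reset point.
lzw_params = struct.pack("!BBH", 12, 32, 4096)
blob = encode_option(ALGO_NO_PARAMS) + encode_option(ALGO_LZW, lzw_params)
assert decode_options(blob) == [(ALGO_NO_PARAMS, b""), (ALGO_LZW, lzw_params)]
```

An algorithm with nothing to negotiate is simply an option with an empty
data area, and a receiver can skip over any algorithm it does not
recognise by using the length octet.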

Anyone have good ideas?



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 26 13:08:15 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyRle-0000BLa@daver.bungi.com>; Wed, 26 May 93 13:08 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 1993 16:06:07 -0400 (EDT)
Message-ID: <9305262006.AA02508@hobbit.gandalf.ca>
References: <<9305261705.AA06634@opal.acc.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> Define session for a high traffic, multi-protocol router.  By

Sessions are relative to the point of view.  The deeper you go, the
better the compression gets.  This should be obvious.  Ideally,
data compressed at the source is compressed optimally.  Down in
a network device, it may be necessary to look into the higher layers
to separate out the sources.

> For something that is intimately entwined
> with PPP, I don't think we should use any more information than
> is naturally used inside PPP.  

It's there.  What more do you need?  Yes, you must remember state.
Header compression already goes part way.  This is a natural
(some would say obvious) extension.

> Also for interoperability reasons
> i think we need to remember the KISS principle.  

It's not that hard.  See VJ and CIPX.

> I also think
> that if one divided the sessions on a fine enough basis to really
> separate correlated data, that the memory requirements would
> become a problem.  

Yes, it's the main problem.  But there is an elegant solution.  It's
not perfect, but it does substantially reduce the memory requirements
while getting most of the possible compression.  When I have the IPX
version working, I'll try it out on Dave Rand's file and post the
results.

>(I also don't like all the dynamic state management that this implies)

No worse than VJ.  In fact, since I rely on an error-free transport, the
IP code is much shorter.
> 
> IMHO (at least for multi-protocol bridge/routers) that VJ style
> header compression won't be that popular combined with data compression.
> Between the facts that the number of concurrent transport sessions may
> be high, that many bridge/routeing protocols don't have header compression
> defined, and that header compression doesn't do anything for the packet
> data (which tends to cause the congestion) that header compression is a
> nuisance when also doing full data compression.

IMHO, you are incorrect on all these points.  First, header compression
code runs much faster than the generic data compression at least on my
compressor.  It is therefore a speed enhancement.  Any time you can take
advantage of knowledge of the source data stream, you win.  For example,
see VJ comments on using LZ on the header.  He can do a lot better with
fewer lines of code executed.

Second, separating the data stream does improve the compression.  And
it isn't a minor amount.  VJ style compression runs at what 4K bytes?  
Even at 64K for 256 sessions, it's hardly going to break the bank.
Running the header code costs little or nothing, but gains a lot.

The fact that header compression takes so long to write is partly
because of the non-reliable link, and the lack of coherent strategies
employed.  Just look at all the special cases in the VJ code.  It's
simpler to compress it layer by layer, not doing IP/TCP in some
cases.  And it assumes you're routing IP.  If you want to bridge a 
protocol, you have to deal with MAC layers as well.

The complexity of the compression protocols is not the fault of the
protocol.  It's our fault.  We need a simpler approach. 
 
Anyway, I've got to get started on my IP/TCP/UDP compression code.
It has to be finished by the end of the week :-)



From owner-ppp-comp Wed May 26 13:31:45 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyS8C-00004na@daver.bungi.com>; Wed, 26 May 93 13:31 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio?
Date: Wed, 26 May 1993 13:31:10 PDT
Message-ID: <m0nyS7z-00003IC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Can you trust the compression ratio?" on May 26, 14:01, Dave Carr writes:]
> My point is if you want to support higher speeds, I don't think
> changing the compression algorithm is the preferred answer. It is
> one solution, but not the only one.  I would much prefer a
> hardware assist for a good algorithm, rather than sacrificing
> compression.

In order to support a common algorithm, it is my opinion that we
need a software-only solution. Private algorithms can do what
they like, including external hardware.

> 
> > The user won't see an increase in throughput, which is what 
> > they want.  This (that users want increased throughput on the line) 
> > should be obvious to everyone reading this.
> 
> Then, it should be as easy to see that a user will want the algorithm
> that does 3:1 instead of 2:1.

Only if it is fast enough to fill the link. When the LZW algorithm
was installed, it could not fill the link.  It probably was due
to my poor coding, and a couple of weeks work may have fixed it.

> 
> I suggest that we remove this Novell bias from the measurement.

No problem, Dave. Run the tests on your stuff, and report the
results! You have done that with compression ratio, how about with
time?

> Sure.  But this shows the hardware is deficient, not the
> algorithm.  If I ran PREDICTOR on a Z80, would it keep a T1
> link full?  

(rapidly calculating in my lightning-fast, bear-trap TRS-80)
Yes, but it would have to run at over 13 MHz. 20 MHz Z80s are
available, as are 64180's.


> > We also ran tests with TCP/IP, using a batch file with FTP puts 
> > and gets.
> 
> If you're running PC/TCP, don't.  It can't tolerate delays very
> well either.  You'll also get different results using batch files.

She used LWP-dos.

> I suggest the following setup.  Connect 2 or 3 clients across the
> same link to the server.  Now repeat the tests.  You'll find that
> the link is now kept full with either algorithm, and the LZW will
> probably beat Predictor.  Why?  Because the link is now the
> bottleneck.  PBurst can now keep the link full.  This assumes the
> compression is not CPU bound.

Nope. We tried this. First thing I thought of.



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 26 14:20:59 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyStz-0000AIa@daver.bungi.com>; Wed, 26 May 93 14:20 PDT
X-Path: acc.com!fbaker
From: fbaker@acc.com (Fred Baker)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 93 14:14:23 PDT
Message-ID: <9305262114.AA25231@saffron.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>> > Okay.  Suppose I have this algorithm that can average oh so 2.695:1 for
>> > example, and can keep a 768K bps link 75% full, and you have an algorithm
>> > that, just for arguments sake, gets 1.668:1 but can keep the 768K link
>> > full.  Which is better?
>> 
>> The one that keeps the link full.

I disagree; the one that keeps the link full can transfer 768*1.668 =
1281 KBPS; the other can transfer 768*2.695*.75 = 1552 KBPS on the 768
KBPS link.  The best one is the one that moves the most decompressed
bits across.

From owner-ppp-comp Wed May 26 14:30:46 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyT3C-0000BMa@daver.bungi.com>; Wed, 26 May 93 14:30 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Wed, 26 May 1993 14:29:58 PDT
Message-ID: <m0nyT2w-00008JC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 26, 14:14, Fred Baker writes:]
> >> > Okay.  Suppose I have this algorithm that can average oh so 2.695:1 for
> >> > example, and can keep a 768K bps link 75% full, and you have an algorithm
> >> > that, just for arguments sake, gets 1.668:1 but can keep the 768K link
> >> > full.  Which is better?
> >> 
> >> The one that keeps the link full.
> 
> I disagree; the one that keeps the link full can transfer 768*1.668 =
> 1281 KBPS; the other can transfer 768*2.695*.75 = 1552 KBPS on the 768
> KBPS link.  The best one is the one that moves the most decompressed
> bits across.

(Hastily removing my foot from my mouth)

The best one is the one that moves the most decompressed bits across,
for the desired link speed. This is why in my original document, I
suggested several different algorithms, depending on the link speed.
On lower speed links, you can afford to take more time to compress
data - on higher speed links, you need higher throughput, hardware
assist, or a faster CPU.

A LZW compressor could fill the 768Kbps link (based on better code), and
would achieve 1.716 Mbps.

You should always try to keep the link full, and select the best algorithm
for the job. 

We are trying to specify at least one algorithm that can run across a
range of speeds. Perhaps it would be better to suggest more than one?


-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Wed May 26 16:01:09 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyUSx-00008oa@daver.bungi.com>; Wed, 26 May 93 16:00 PDT
X-Path: Novell.COM!Diane_Heckman
From: Diane_Heckman@Novell.COM (Diane Heckman)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio?
Date: Wed, 26 May 93 15:24:25 PDT
Message-ID: <9305262224.AA19801@va.SJF.Novell.COM>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Dave -
   Are you out there?  I'm formulating a response to Mr. Carr.

Diane

From owner-ppp-comp Wed May 26 16:36:41 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyV1M-0000O3a@daver.bungi.com>; Wed, 26 May 93 16:36 PDT
X-Path: acc.com!fbaker
From: fbaker@acc.com (Fred Baker)
To: ppp-comp@bungi.com
Subject: Perspective
Date: Wed, 26 May 93 16:38:08 PDT
Message-ID: <9305262338.AA27474@saffron.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

Maybe it's time to go back to first principles and decide what problem
it is that we want to solve. I'll take a crack at a definition and
anybody that doesn't like it can tell me what words to change to what
other words.

In North America, something like 80% of leased lines run at or below 64
KBPS.  Of those, probably half are 14.4 using trellis encoding, and
half are 56 KBPS analog or DDS circuits.  Internationally, the
percentage of links at or below 64 KBPS is close to 100%.

In North America, the cost of a T1 link is about equal to the cost of
three or four 64 KBPS links.  The cost of a fractional DS1 link is less
than the cost of an equivalent number of 64 KBPS links, but not a lot
less.

The marketplace demand is not to get T1 across a 9.6 line, it's to
delay the expense of increasing bandwidth by getting more bits across
the link you've already got.  We're looking at minimizing $$$$, not
maximizing engineering elegance.

The guy who is trying to save a buck on his dial or leased line isn't
really interested in adding bunches of memory or interface cards to his
system to do it.  That looks like he's spending money.  He will do so
if he has to - he can be argued into the position that a one time
expense is nominal compared to a monthly tariff bill - but he'd really
rather not.  And he's really not going to be out there measuring the
compression performance; if both his users and his accountant are
happy, he's happy.

For traffic mix, we're not discussing the ANS backbone here, but we
*are* discussing traffic that may be traversing multiple paths or which
may not be easily sorted out into individual sessions.  In any given
second, we are talking about a NetWare SAP update, the odd OSPF HELLO,
a Mac Chooser sending its "I'm here is anybody else" and getting the
replies, one or two mailgrams in flight, and a file transfer, maybe
two.  If we're lucky, one of those file transfers was moving a
compressed file.  Or maybe the mailgram was sent by MS Mail and has a
compressed enclosure.  But don't bet on it - odds are it's NNTP.  The
next second, it might be a different mailgram, a new file transfer
might start or the old one end, but that's not till then.

So the requirements are *not* that it achieve 200:1 compression on some
files, but that

    1)  it achieves good (> 2:1) compression consistently,

    2)  when it sees incompressible data it achieves pretty close to 1:1

    3)  As Diane points out, the compress/decompress latency should
	balance favorably against the improved transmission latency, so
	that echoplex delay doesn't get worse - in the worst case it
	stays the same, and preferably it gets better

    4)  It needs to not assume that it can take over the machine for an
	extended period; the machine might need to be juggling other
	balls in real time as well

    5)  It needs to be something that nobody's going to sue anybody
	about

    6)	It needs to be specified in a publicly available document.
	About the extent of the proprietary ownership of any algorithms
	that can reasonably remain is that the documentation might need
	to show that "we use the algorithm originally designed by the
	Prometheus company" or a boot-time banner might say the same
	thing. We can talk about royalties, but realistically there's
	going to be a lot of folks who don't bother to pay them; see (5).

    7)  It needs to be simple and well enough specified that a
	large number of people can implement it from a publicly
	available document and all get it right
			   ~~~~~~~~~~~~~~~~~~~~

    8)  It needs to be implementable in a system which is using the
	same algorithm on a reasonable number (5 or 10) of lines or
	frame relay virtual circuits and doesn't have infinite memory.
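
Requirement (3) can be put into numbers with a minimal sketch. The
function name, the 576-byte packet, and the codec timings below are
illustrative assumptions, not measurements from any implementation
discussed on this list.

```python
def one_way_delay_ms(packet_bytes, link_bps, ratio=1.0, codec_ms=0.0):
    """Serialization delay of one packet plus compress + decompress time.

    ratio    -- compression ratio achieved on this packet (1.0 = none)
    codec_ms -- combined compress and decompress time, in milliseconds
    """
    wire_bits = packet_bytes * 8 / ratio
    return wire_bits / link_bps * 1000.0 + codec_ms


# Hypothetical figures for a 576-byte packet on a 64 Kbps link.
plain = one_way_delay_ms(576, 64_000)                            # no compression
fast = one_way_delay_ms(576, 64_000, ratio=2.0, codec_ms=10.0)   # modest ratio, cheap codec
slow = one_way_delay_ms(576, 64_000, ratio=3.0, codec_ms=60.0)   # better ratio, costly codec

for label, ms in (("plain", plain), ("fast 2:1", fast), ("slow 3:1", slow)):
    print(f"{label:9s} {ms:6.1f} ms")
```

With these made-up timings the 3:1 codec loses to sending the packet
uncompressed, which is the situation requirement (3) rules out.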

Let me throw out a war story to help with this perspective.  We have an
OEM who has a customer with a global network.  Some folks on this list
are going to recognize it as I describe it.  This network runs IP,
AppleTalk, and XNS IDP, and probably some things they don't tell us
about.  In the US, it is largely 256 KBPS, T1, and T3 links, with the
odd 56 KBPS tail circuit.  Everywhere else it's almost exclusively 64
KBPS.

A few months ago, they turned on our fair queuing code.  They got
immediate feedback from Europe saying "whatever you broke, don't fix
it." It seems that when folks started FTPing files around (this company
has a predilection towards massive binary files) the interactive users'
sessions used to drop like flies.  With fair queuing, they see a change in
echoplex delay but are able to continue working effectively.  File
transfers improved, too: due to the way transports adapt themselves to
bandwidth, two FTPs sharing a link rarely split it 50/50 - it's usually
more like 80/20.  With fair queuing, it's 50/50, or 33/33/33, etc, so
FTP behavior is much more predictable.
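
The sharing behavior described here can be illustrated with a very
small sketch. It is a plain per-packet round robin over per-flow
queues, which is an assumption made for illustration and not the fair
queuing code mentioned above; the flow names are hypothetical.

```python
from collections import deque


def round_robin(flows):
    """Yield (flow_name, packet), taking one packet per non-empty flow per pass."""
    queues = {name: deque(packets) for name, packets in flows.items()}
    while any(queues.values()):
        for name, queue in queues.items():
            if queue:
                yield name, queue.popleft()


# Two bulk transfers and one interactive session share the link.
flows = {
    "ftp-a": [f"a{i}" for i in range(10)],
    "ftp-b": [f"b{i}" for i in range(10)],
    "telnet": ["t0", "t1"],
}
order = [name for name, _ in round_robin(flows)]
print(order[:6])
```

Each active flow gets one transmission slot per pass, so the interactive
session is never stuck behind a full FTP window, and the two FTPs split
the remaining slots evenly.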

A couple of weeks ago, they turned on compression on all links at or
below 64 KBPS (we use Stac compression on LAPB).  Echoplex delay was
cut in half and file transfers took less time.  They got instant
feedback from Europe on that as well.

Last weekend, they turned on OSPF in Europe (they have had it running
all month in North America; this makes the OSPF portion a 190 router
internet...).  Monday May 24 was the first Monday morning in recent
memory that NetOps got no gripes about congestion from anywhere in the
world.

That's the target.

From owner-ppp-comp Wed May 26 19:37:34 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyXqQ-00002Ka@daver.bungi.com>; Wed, 26 May 93 19:37 PDT
X-Path: Novell.COM!Diane_Heckman
From: Diane_Heckman@Novell.COM (Diane Heckman)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio?
Date: Wed, 26 May 93 17:08:48 PDT
Message-ID: <9305270008.AA22251@va.SJF.Novell.COM>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


First an apology for sending the where-are-you message to 
the whole list rather than just Dave.  But, with the volume on this
list I doubt anyone noticed.......



From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio ?

>> Diane's post
> Dave Carr's comments



>> Seems to me
>> that what users want is higher throughput through the WAN interface,
>> thereby getting lower phone bills, and a quicker return of the
>> command prompt.  So, if a compression algorithm can get 900:1
>> compression, but takes more processor time to compress or
>> decompress the data than without compression, why bother?
  
>Why bother what?  Running the algorithm, or running it on that
>processor.  What you are implying is that one algorithm is not
>suited to the processor and therefore can easily be discarded
>as a likely candidate.
"Why bother running data compression" is what I meant.  I assume
most users don't have the option of changing processors
without buying new equipment and spending $$$.

>Dave and Diane, what we have found is the user is willing to pay
>the price for a really good compression algorithm.  I've been working
>in the field for about 5 years and found that companies that have
>really good compression don't have a very hard sell charging $5000
>extra over a so-so compression device.  The reason is the line
>costs are recovered in a few months.  Capital equipment costs don't
>matter, but the recurring line costs do.

Dave. What I thought this mailing list was about was choosing 
an algorithm that everyone can implement (probably on 
their existing platforms) so that implementations from 
different vendors can interoperate.  Not every vendor has
the option to change processors, or do a hardware add-on.
I didn't have the impression that we were limiting any vendor
from using their wonder compressor and making heaps-o-money.

>If it costs an extra $200 in hardware, it's money well spent.
>This can be paid back in less than a month in line costs.
But assuming every PPP vendor will add extra hardware to
support the agreed upon compression method seems ludicrous
to me.   

>We currently ship in excess of 600 compression bridges a month.
>Not once has a customer asked for T1 compression.  While I believe
>there is a market for T1 compression, it's not a large market
>today compared to 64K or 128K compression.
Again, probably true.  But markets change.  Remember the first
IBM-PC that seemed so *fast*?  It seems better to choose an algorithm
that does something for higher line speeds.  For us, LZW at higher
line speeds took more time to transfer the file with compression than
without.
    

>> The user won't see an increase in throughput, which is what
>> they want.  This (that users want increased throughput on the line)
>> should be obvious to everyone reading this.
     
>Then, it should be as easy to see that a user will want the algorithm
>that does 3:1 instead of 2:1.
	   ----
Sure, that's easy to see, but does anyone on this mailing 
list know what these algorithms really DO?  I tried to 
share what we learned during implementation.  If the algorithm 
claims to compress at 3:1 instead of 2:1, but in reality takes 
more time to run through the compression code, send the packet, 
and decompress the packet, then where is the benefit?  
	    

>> Actual compression ratios vary greatly from the theoretical
>> maximum compression ratios advertised by various algorithms.

>And sometimes they are exceeded (rarely).  And we try to educate
>our marketing types to not fight this war with fictitious numbers.
>But it's kind of hard when the competitors using STAC are claiming
>4:1.
This mailing-list shouldn't fall into the same trap as the marketing types.
If they are comparing theoretical compression ratios without regard to
realistic results, then caveat emptor.  But (I presume), the people on
this mailing list are engineers who believe in what they see, not
marketing hype.
	     
>> Timer is a program which prints the elapsed time for the
>> copy operation.  No file compare (fc) errors were found.
>> In some cases, two speed ratios are given, the first for the
>> copy to the server (f:), the second for the copy to the
>> client (d:).  Two sets of tests were run, with compression
>> enabled and with compression disabled.  The speed ratio was
>> obtained by dividing (the time the copy took with compression
>> disabled) by (the time the copy took with compression enabled).
>> Average speed ratios for a like set of files were obtained by
>> summing the speed ratios and dividing by the number of files*2.
>> For example, the average speed ratio for the files bib and
>> news was 1.96 ((2 + 2 + 2.32 + 1.52)/4).
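
The averaging described in the quoted paragraph can be sketched as
follows. The function names are hypothetical, and the 120 and 60 second
copy times are made up for illustration; the four ratios are the ones
quoted for bib and news.

```python
def speed_ratio(seconds_without, seconds_with):
    """Copy time with compression disabled divided by copy time with it enabled."""
    return seconds_without / seconds_with


def average_speed_ratio(ratios):
    """Mean of the per-copy speed ratios (two copies per file)."""
    return sum(ratios) / len(ratios)


# Hypothetical timings: a copy that drops from 120 s to 60 s has ratio 2.
example = speed_ratio(120.0, 60.0)

# The four ratios quoted for bib and news (to-server and to-client copies).
ratios = [2.0, 2.0, 2.32, 1.52]
avg = average_speed_ratio(ratios)
print(round(avg, 2))
```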
	      
>This produces very inconsistent results and certainly does not
>reflect the performance of the algorithm.  What it shows is how
>time sensitive IPX, even with PBURST, is to delay.  However,
>IP running on a SUN workstation is not sensitive to the delay.
Good, do you have similar results with IP on a SUN workstation?
Post them!
>Even with PBURST, Novell cannot keep a 64 Kbps link full.  Is this
**It was a 56Kb/s link.  Sorry for omitting it earlier.**
Actually, we get up to 135 Kb transferred per second on a 
56Kb/s link, using compression.
>the fault of the compression?  I think not.
The compression algorithm does not exist separately, so 
why measure it that way?  PPP is not very useful without 
an upper layer protocol to send the data.  Those upper 
layer protocols all have some built-in delays or implementation 
problems.  Some were not implemented to work well on a 
low speed link.
	       
>I suggest that we remove this Novell bias from the measurement.
Great!  Provide us some real data without it!
		
>> Next, we tried LZW.  Test again.  The COMPRESSION RATIO WAS
>> HIGHER with LZW than with Predictor, but THE INCREASE IN
>> THROUGHPUT WAS LESS THAN PREDICTOR.  Presumably, the code
>> took longer to compress the packet than the time saved with
>> the compression.  That data is not included here, but it
>> can be dug up, if need be.
	 
>Sure.  But this shows the hardware is deficient, not the
>algorithm.  If I ran PREDICTOR on a Z80, would it keep a T1
>link full?
But we can't change our hardware.  I assume most other PPP
vendors out there can't either.
		  
		   
>> We also ran tests with TCP/IP, using a batch file with FTP puts
>> and gets.
		    
>If you're running PC/TCP, don't.  It can't tolerate delays very
>well either.  You'll also get different results using batch files.
No, I was using Novell's LAN Workplace for DOS, an inhouse 
version with a bug fix that cleans up a problem with the 
silly window syndrome.

>> What I think is needed is a reliable compressor, that can be
>> coded efficiently, and is easy to understand (to avoid
>> interoperability problems).
	     
>True.  Or all run the same code and know they'll interoperate.
Good.  Glad to see that we agree on something.

>> The goal behind data compression is to move the data across
>> the line faster.
		      
>No!  There are 2 numbers, throughput and delay.  You seem to
>favour delay.
I'm not understanding what you mean here.  Please define.


>>From what I've seen, theoretical compression ratios are not
>> reliable indicators of increased throughput.
		       
>In your test setup, I found the same.  But it is the delay.
>WARNING** Novell bashing **
>Too bad PBURST didn't do it right!
Part of the reason for my posting was to point out that
"compression ratio" is not the end-all and be-all figure
that we should be concerned about.  Sure, IPX has delays,
IPX doesn't deal too well with low speed links, PBURST is
not God's gift.  I'm certain that other protocols and products 
have like problems.  None of our products operate in a perfect 
world, and there are things beyond our control as engineers 
that will affect the speed of a compression algorithm.

			
>>From what I've seen, theoretical compression ratios are not
>> reliable indicators of increased throughput.
>> If any of you have data refuting that statement, please feel
>> free to post it on this mailing list.....
>Indeed!
I don't understand that as a refutation of the above statement.



Diane
			 


From owner-ppp-comp Wed May 26 20:54:10 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyZ2Y-00002Ba@daver.bungi.com>; Wed, 26 May 93 20:54 PDT
X-Path: MorningStar.Com!karl
From: Karl Fox <karl@MorningStar.Com>
To: ppp-comp@bungi.com
Subject: Re: Compression negotiation
Date: Wed, 26 May 93 23:52:59 -0400
Message-ID: <9305270352.AA01996@remora.MorningStar.Com>
Reply-To: ppp-comp@bungi.com
Organization: Morning Star Technologies, Inc.
Precedence: bulk

Dave Rand wrote:

   Standardized negotiation of options is only possible if you have a
   free-format data area, as each of the compression algorithms supported
   is going to require vastly different options. For example, an LZW
   variant might require max codeword size, max match length, and
   dictionary reset/reuse point. Predictor might require 4 parameters.
   Etc.

   Anyone have good ideas?

I prefer assigning a separate LCP option to each compression algorithm
-- it's `the PPP way'.  I doubt we'll have 50 different algorithms; in
the long run, a really pumped-up box *might* be able to offer two or
three different choices.
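
A minimal sketch of what one option per algorithm could look like,
using the usual PPP configuration option layout (Type octet, Length
octet covering the whole option, then data). The type numbers, the
function names, and the LZW parameter layout are hypothetical; nothing
here has been assigned or agreed.

```python
import struct

# Hypothetical option type numbers, chosen only for this example.
OPT_LZW = 0x80
OPT_PREDICTOR = 0x81


def encode_option(opt_type, data=b""):
    """Build Type | Length | Data; Length counts the two header octets too."""
    length = 2 + len(data)
    if length > 255:
        raise ValueError("option too long for a one-octet length")
    return bytes([opt_type, length]) + data


def decode_options(buf):
    """Split a buffer of concatenated options into (type, data) pairs."""
    options = []
    i = 0
    while i < len(buf):
        if i + 2 > len(buf):
            raise ValueError("truncated option header")
        opt_type, length = buf[i], buf[i + 1]
        if length < 2 or i + length > len(buf):
            raise ValueError("bad option length")
        options.append((opt_type, buf[i + 2:i + length]))
        i += length
    return options


# Made-up LZW parameters: max codeword bits, max match length, reset point.
lzw = encode_option(OPT_LZW, struct.pack("!BBH", 12, 32, 4096))
predictor = encode_option(OPT_PREDICTOR)  # no parameters in this sketch
print(decode_options(lzw + predictor))
```

Because each algorithm owns its option type, the data area can differ
per algorithm without any shared free-format field.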

From owner-ppp-comp Thu May 27 09:44:06 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyl3W-000010a@daver.bungi.com>; Thu, 27 May 93 09:43 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Thu, 27 May 1993 09:21:26 -0400 (EDT)
Message-ID: <9305271321.AA09409@hobbit.gandalf.ca>
References: <<m0nyRUR-0000EPC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> > that, just for arguments sake, gets 1.668:1 but can keep the 768K link
> > full.  Which is better?
> 
> The algorithm that gets 1.668:1 can get 2.592 Mbps on a 1.5 Mbps T1 link.
> Or 3.4161 Mbps on an E1 link.

You missed my point.  I only said the other algorithm could keep a 768K
link full, not a T1 or E1 link.  The RGF must be relative to the *desired*
link speed.  If the user wants a 384K algorithm, mine should come out on
top, because I will get more data through.
> 
> Diane posted a really cool expose on this yesterday.
> 
Read my response to her posting.

> > > 
> > > I agree. On long files, LZW will get close to 1:1. On packets, with a full
> > > dictionary, and few matches, it will get 1:2. We aren't compressing files.
> > 
> > Then why are we using your test setup????
> 
> My test file consists of packets, arranged in a file.  As Diane pointed out,
> compression ratio does not tell the whole story - you have to put it up
> on a router in order to get a good feel for the algorithm.

You arrange them as one long file, and compress them as such.  You do not
read a frame from the file, compress it, mark the end of frame, and ship it.
Otherwise, you couldn't run ZIP, et al.
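
The difference between the two test methods can be sketched with
zlib's Deflate standing in for the algorithms under discussion (it is
not LZW, Predictor, or STAC, and is used here only because it is in the
Python standard library). The frame contents are invented.

```python
import zlib


def whole_file_size(frames):
    """Compress all frames as one long file."""
    return len(zlib.compress(b"".join(frames)))


def independent_frames_size(frames):
    """Compress each frame on its own, with no history carried over."""
    return sum(len(zlib.compress(frame)) for frame in frames)


def shared_history_size(frames):
    """Compress frame by frame, flushing at each frame end but keeping history."""
    comp = zlib.compressobj()
    total = 0
    for frame in frames:
        total += len(comp.compress(frame) + comp.flush(zlib.Z_SYNC_FLUSH))
    return total


# Fifty invented, highly similar frames.
frames = [b"SAP update: server FS%02d offers file service on net 0000BEEF\r\n" % i
          for i in range(50)]

print("one long file      :", whole_file_size(frames))
print("frame at a time    :", shared_history_size(frames))
print("independent frames :", independent_frames_size(frames))
```

The frame-at-a-time figure includes the cost of marking each frame end,
which the one-long-file figure hides.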

Mine's on our bridge.  It does not suffer from Diane's test scenario unless
run above the specified 256K bps.  Then again, I only designed it for 128K.
Most users don't run T1!
> 
> Another point of view is that we are using my test file because no one
> has offered anything better.

I haven't had any negative responses to my proposal of capturing SNIFFER
traces.   Real live traffic.  What could be better.
> 
> I'm waiting!  *YOU* go through running all the tests, on all the different
> algorithms, and write a nice, concise report.

I'm not disputing the validity of your report.  It shows a valid comparison
of the relative compression between the algorithms.  I'd also like to thank 
you and Novell for the time spent doing the comparison.  It is very useful
information.

For the record, I went through many of the tests you did, but 2 years ago.
The results are in my lab book.  The testing did not contain all the algorithms 
that yours did, but most of them that existed at the time were tested.  I also
garnered a lot of the same information from the comp.compression FAQ.  It too
shows the relative compression.

I do have a test harness and many trace files for my compression.  As I stated 
before, these are only good if you have an archaic LANALYZER with V1.0 software.
Even ours is bust.  If someone is willing to capture LANALYZER traffic, I'll
make the test harness available (free even!).

If someone has a SNIFFER file parser, I've got IPX and IP traces already from
the University of Alberta.  They have given me permission to use them for this
testing.


From owner-ppp-comp Thu May 27 09:44:17 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyl3i-0000F2a@daver.bungi.com>; Thu, 27 May 93 09:44 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Thu, 27 May 1993 09:51:32 -0400 (EDT)
Message-ID: <9305271351.AA11608@hobbit.gandalf.ca>
References: <<9305262005.AA24842@saffron.acc.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> besides which, it seems like the information VJ pulls out and collapses
> into a small session identifier would become a small number of
> frequently seen octet strings, which would be replaced by a code by the
> data compression algorithm.

The number of string matches spit out for a TCP header would be:

1 - Protocol Version, Type
2 - Total Length  (one if you're lucky)
3 - Upper byte of Packet Id
4 - Lower byte of Packet Id
5 - Possibly Fragment, TTL, Protocol
6 - Upper byte of Checksum 
7 - Lower byte of Checksum
8 - IP SA DA, TCP Src Port, Dst Port, maybe 3 bytes of SEQ
9 - Lower byte of SEQ
10 - Upper 3 bytes of ACK, maybe all 4, maybe flags
11 - Maybe all of window
12 - Upper byte of Checksum
13 - Lower byte of checksum
14 - Urgent Pointer

With LZW, you're looking at 172 bits = 21 bytes.

Probably not noticeable if your compression is STAC.  But then
again, after one 1500 byte frame on the STAC, only the packet
from the last header will be in the history. 

Header compression will save 25+ bytes/packet over feeding the
same frame through your STAC chip.  Even on a 1500 byte packet,
it works out to a throughput increase of:

3% at 2:1
5% at 3:1
7% at 4:1 

On 512 byte packets, those numbers roughly triple, which is even more
significant.
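
The percentages above can be reproduced with a short calculation,
assuming the saving is measured against the size of the frame after
data compression. That reading is an assumption that happens to match
the figures; the function name is hypothetical and the 25 bytes is the
lower bound quoted above.

```python
def header_compression_gain(packet_bytes, ratio, bytes_saved=25):
    """Fraction of the compressed frame saved by compressing the header first."""
    compressed_bytes = packet_bytes / ratio
    return bytes_saved / compressed_bytes


for size in (1500, 512):
    gains = [header_compression_gain(size, r) for r in (2, 3, 4)]
    print(size, [f"{100 * g:.0f}%" for g in gains])
```

For 512 byte packets the same 25 bytes are a larger share of each
frame, so the gain is close to three times the 1500 byte figure.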

But you also get better compression on the data, because it
doesn't have all this control info taking up dictionary space.

Look at the VJ comparison.  On ASCII files, doing header
compression gets a 15% improvement.  On binary, it's 11%.

Considering the amount of work one does to get it, the little
space required to save state, and the speed it saves,  I'd say
it's a win.

From owner-ppp-comp Thu May 27 09:44:26 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyl3x-00008xa@daver.bungi.com>; Thu, 27 May 93 09:44 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio?
Date: Thu, 27 May 1993 10:18:59 -0400 (EDT)
Message-ID: <9305271419.AA13548@hobbit.gandalf.ca>
References: <<m0nyS7z-00003IC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> In order to support a common algorithm, it is my opinion that we
> need a software-only solution. Private algorithms can do what
> they like, including external hardware.

Fine.  Does the software need to support E1 speeds?  I think not.
A lot of people like STAC.  They do have an algorithm that runs
at E1, on a 33 MHz i960CA and only out of fast SRAM.  Many people
would rather put in a $30 STAC chip than an i960CA.  But this
doesn't make STAC unsuitable as the default.

Just because predictor is fast, doesn't make it the only solution.

Take for example Combinet.  They have an 80186 doing the bridging, 
and use STAC (hardware I think) to do compression.  But their
price is right.

In your case, you probably want to do only software upgrades for
your server.  Fine.  This limits what you can run.  
> 
> Only if it is fast enough to fill the link. When the LZW algorithm
> was installed, it could not fill the link.  It probably was due
> to my poor coding, and a couple of weeks work may have fixed it.

Exactly.  Define the goodness at each common link speed, and now
add a bonus factor for software only solution.
> 
> No problem, Dave. Run the tests on your stuff, and report the
> results! You have done that with compression ratio, how about with
> time?

Fine.  On an i960CA @25 MHz, running out of 2-1-1-1 wait-state DRAM,
and using 75% of the available CPU, and getting 4:1 compression,
I can keep a 300 Kbps link full in both directions.  And yes, I can 
get 4:1 on real files.  Of course, you won't have to worry about
ever getting 4:1, but I do!

At lower compression ratios, a higher speed link can be kept full.

Of course, I could bump up that number by fine tuning the algorithm,
by having decent memory, by having a data cache, by probably a
factor of 2.  I will not, because the customers don't want it.

I'll let you work out a MIPs, cache, and memory factor and translate.
> 
> > I suggest the following setup.  Connect 2 or 3 clients across the
> > same link to the server.  Now repeat the tests.  You'll find that
> > the link is now kept full with either algorithm, and the LZW will
> > probably beat Predictor.  Why?  Because the link is now the
> > bottleneck.  PBurst can now keep the link full.  This assumes the
> > compression is not CPU bound.
> 
> Nope. We tried this. First thing I thought of.  

Funny.  It works this way on our bridge.  But then again, we free up a
lot more link bandwidth by compressing further.

From owner-ppp-comp Thu May 27 10:09:24 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nylRu-00005ba@daver.bungi.com>; Thu, 27 May 93 10:09 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio?
Date: Thu, 27 May 1993 10:08:43 PDT
Message-ID: <m0nylRf-0000AmC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Can you trust the compression ratio?" on May 27, 10:18, Dave Carr writes:]
> Fine.  Does the software need to support E1 speeds?  I think not.
> A lot of people like STAC.  They do have an algorithm that runs
> at E1, on a 33 MHz i960CA and only out of fast SRAM.  Many people
> would rather put in a $30 STAC chip than an i960CA.  But this
> doesn't make STAC unsuitable as the default.
> 

According to Fred, the majority of the links are 64Kbps or less. I'll
be happy with an algorithm that can fill a 64Kbps link, on a 'typical'
processor. I've suggested that should be a 386, for obvious reasons.
Vernon thinks that the minimum CPU should be a 486@25 MHz (since there
are more workstations than routers, and workstations typically have
faster CPUs). All of the algorithms in my tests were run on a 486@33 MHz,
with I/O time removed.

> Just because predictor is fast, doesn't make it the only solution.

There are other algorithms I am looking at now that use more CPU time
than predictor, that may be a better choice.  As soon as I get the
all clear, I will report on it as well. 

> 
> In your case, you probably want to do only software upgrades for
> your server.  Fine.  This limits what you can run.  

As Fred also says,

    6)	It needs to be specified in a publicly available document.
	About the extent of the proprietary ownership of any algorithms
	that can reasonably remain is that the documentation might need
	to show that "we use the algorithm originally designed by the
	Prometheus company" or a boot-time banner might say the same
	thing. We can talk about royalties, but realistically there's
	going to be a lot of folks who don't bother to pay them; see (5).

While that can mean hardware, most times anything that can be done
in hardware can be done in software, albeit a bit slower.

> > 
> > No problem, Dave. Run the tests on your stuff, and report the
> > results! You have done that with compression ratio, how about with
> > time?
> 
> Fine.  On an i960CA @25 MHz, running out of 2-1-1-1 wait-state DRAM,
> and using 75% of the available CPU, and getting 4:1 compression,
> I can keep a 300 Kbps link full in both directions.  And yes, I can 
> get 4:1 on real files.  Of course, you won't have to worry about
> ever getting 4:1, but I do!

No problem, Dave. Run the tests on your stuff, and report the 
results! You have done that with compression ratio, and CPU utilization,
how about with time?

(Dave has previously published results for his algorithm, as shown below:)

                  Size    ZIPped     ZIP    5220       ZIP      5220     5220/
                                   Ratio   Ratio  KBytes/s  KBytes/s       ZIP
CALGARY CORPUS 

bib             111261     35142     3.2     2.7      23.7      20.0      0.84
book1           768761    313459     2.5     2.3      18.4      17.0      0.92
book2           610856    206736     3.0     2.7      22.2      20.0      0.90
geo             102400     68575     1.5     1.5      11.2      11.0      0.98
news            377109    144570     2.6     2.4      19.6      18.0      0.92
obj1             21504     10407     2.1     2.0      15.5      15.0      0.97
obj2            246814     81678     3.0     2.8      22.7      21.0      0.93
paper1           53161     18657     2.8     2.7      21.4      20.0      0.94
paper2           82199     29838     2.8     2.5      20.7      19.0      0.92
paper3           42526     18180     2.3     2.5      17.5      19.0      1.08
paper4           13286      5621     2.4     2.5      17.7      19.0      1.07
paper5           11954      5080     2.4     2.5      17.6      19.0      1.08
paper6           38105     13314     2.9     2.8      21.5      21.0      0.98
pic             513216     56028     9.2     7.7      68.7      58.0      0.84
progc            39611     13356     3.0     2.7      22.2      20.0      0.90
progl            71646     16357     4.4     4.0      32.9      30.0      0.91
progp            49379     11310     4.4     4.0      32.7      30.0      0.92
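
A quick check of how the ratio column relates to the size columns,
assuming "ZIP Ratio" is simply Size divided by ZIPped. The three rows
below are copied from the table; the function name is hypothetical.

```python
# (Size, ZIPped, ZIP Ratio as printed) for three rows of the table above.
rows = {
    "bib": (111261, 35142, 3.2),
    "geo": (102400, 68575, 1.5),
    "pic": (513216, 56028, 9.2),
}


def compression_ratio(original_bytes, compressed_bytes):
    """Original size divided by compressed size."""
    return original_bytes / compressed_bytes


for name, (size, zipped, printed) in rows.items():
    ratio = compression_ratio(size, zipped)
    print(f"{name:4s} computed {ratio:.1f}, table says {printed}")
```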



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Thu May 27 10:28:29 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nylkI-00004ga@daver.bungi.com>; Thu, 27 May 93 10:28 PDT
X-Path: dlr
From: dlr@daver.bungi.com (Dave Rand)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Thu, 27 May 1993 10:27:48 PDT
Message-ID: <m0nylk6-0000CWC@daver.bungi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

[In the message entitled "Re: Compression CRC - needed?" on May 27,  9:21, Dave Carr writes:]
> > 
> > Another point of view is that we are using my test file because no one
> > has offered anything better.
> 
> I haven't had any negative responses to my proposal of capturing SNIFFER
> traces.   Real live traffic.  What could be better.

Nothing could be better than real data.  The problem is, where do we get
it from? As I said, no one has offered anything better.  If you can
distribute those captured traces you have, please upload them to
sgi.com:other/ppp-comp/incoming.

When last we talked about this, the problem was: How do we get traces that
we can pass out to people, that contain representative samples of all types
of encapsulations we are likely to use?

Novell has an IPX-centric view of the world. Novell sells a *lot* of
routers (every Netware server is a router). Novell wants to see a lot
of low-latency, small packet traffic.

Other people have an AppleTalk point of view. AppleTalk wants to see
medium latency, small packet traffic.

Yet others have an IP outlook on life. IP can handle wide ranges of
latency, and doesn't much care about the size of the packet.

I've done what I consider is a good first whack at showing what routers
are dealing with. Let's go on to the next phase. Perhaps what is required
is not one unified "run this through your router" file, but three (or
more?) captured traces.

In order to make the tests meaningful, I think that we need to have
a "reasonable" number of clients, accessing a "reasonable" number of
hosts.  I also think that >1 megabyte of data should be available.

I know there is at least one person from BarrNet on this list - would
you object to supplying some traces of IP traffic from barrnet on a
high usage hour? That will give us a good case of IP traffic.

If you are at a University site, would you be willing to capture some
high-usage IPX or AppleTalk traffic for us to evaluate?

I suspect that marketing types will want us to use traces captured from
people sending 10 megabyte uncompressed files containing only the character
0x00 :-)  Let's figure out what traffic *really* is!



-- 
Dave Rand
{pyramid|mips|bct|vsi1}!daver!dlr	Internet: dlr@daver.bungi.com

From owner-ppp-comp Thu May 27 16:03:27 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyqxq-0000EPa@daver.bungi.com>; Thu, 27 May 93 16:02 PDT
X-Path: rhyolite.wpd.sgi.com!vjs
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Thu, 27 May 93 15:41:24 -0600
Message-ID: <9305272141.AA00762@rhyolite.wpd.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


> [In the message entitled "Re: Compression CRC - needed?" on May 27,  9:21, Dave Carr writes:]
> > > 
> > > Another point of view is that we are using my test file because no one
> > > has offered anything better.
> > 
> > I haven't had any negative responses to my proposal of capturing SNIFFER
> > traces.   Real live traffic.  What could be better.
> 
> Nothing could be better than real data.  The problem is, where do we get
> it from? As I said, no one has offered anything better.  If you can
> distribute those captured traces you have, please upload them to
> sgi.com:other/ppp-comp/incoming.


Please.  What is "real data"?

If only sgi.com had enough disk space, I could give you gigabytes/day
of packet traces from the SGI `netsnoop` package running on any of the
hundreds of networks in the SGI corporate network.  Binary files with
all of the bits.  Trivial to decode into whatever form you want.

But is that "real data"?  Yes, for some definitions but not others:

    1. we do more NFS than many.  An FDDI ring I just now checked
	is running at about 9 Mbit/sec, and from a quick eyeball of 50
	packets about 50% rcp and 50% NFS.
    2. if you snoop in the wrong places, you won't see a lot of some
	traffic because of switching hubs hidden inside the ethernets all
	over the place.
    3. there will be a dribble of Ethertalk in those samples, but no IPX.
	There will be more TFTP than many other networks (GBytes/day).
	There will be less X than some places and
	more than others.
    4. one had better hope no `ttcp` or `ping -f` tests are being run,
	since 100 Mbit/sec of the pattern used by either ping or ttcp
	should compress down to no more than 200 Kbit/sec.
	(detailed justification of that 500:1 ratio if you want)
    5. I'd rather see my serial line traffic sampled (I'm connected
	to the network via a lash-up of 56K and 14.4K), but my
	mixture of MByte/day of compressible SMTP and NFS and
	incompressible LZW data might not be typical.

My point is that the phrase "real data" is not well definable.

(my offer stands for a MByte of traces for anyone that wants them.)


> ...
> Yet others have an IP outlook on life. IP can handle wide ranges of
> latency, and doesn't much care about the size of the packet.

Well, there are telnet and rlogin, and even NFS, which care about
small, low latency packets.


vjs



From owner-ppp-comp Thu May 27 16:03:40 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyqyb-00008Qa@daver.bungi.com>; Thu, 27 May 93 16:03 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio?
Date: Thu, 27 May 1993 16:53:42 -0400 (EDT)
Message-ID: <9305272053.AA24923@hobbit.gandalf.ca>
References: <<m0nylRf-0000AmC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> [In the message entitled "Re: Can you trust the compression ratio?" on May 27, 10:18, Dave Carr writes:]

And now for something completely different, how do I get my mailer
to put the above message in my response?  I use ELM on a SUN.

In response to Dave Rand:
> be happy with an algorithm that can fill a 64Kbps link, on a 'typical'
> processor. I've suggested that should be a 386, for obvious reasons.
> Vernon thinks that the minimum CPU should be a 486@25 Mhz (since there
> are more workstations than routers, and workstations typically have
> faster CPU's. 

386 or 486 is fine.  Is it Intel Inside?  Or AMD?  Memory speed, size,
cache, etc need to be specified.  If it's C code, we should be using
the same compiler.

> There are other algorithms I am looking at now that use more CPU time
> than predictor, that may be a better choice.  As soon as I get the
> all clear, I will report on it as well. 

Drop us hints.  I hate suspense.

> No problem, Dave. Run the tests on your stuff, and report the 
> results! You have done that with compression ratio, and CPU utilization,
> how about with time?

The time is included in the table below.  Time to transfer a file using
SUN-SUN FTP.  Time per byte of input is relative to the compression ratio
(as with any LZ compressor).

To get an efficient PC implementation to measure, as I'm sure Predictor is,
would take a lot of time.  It took a day alone to hack FZA to work with files.
I felt this was reasonable to find out how good the compression was.  But to
make it efficient on a PC will take more days than I can afford.

I will agree to measuring a frame at a time compressor, using the
same test harness and compiler as you.  I think that's fair.  I have provided
1.4 MB of traces for this purpose.

Let's agree on the benchmarking, then I'll do it.
> 
> (Dave has previously published results for his algorithm, as shown below:)

Not to be taken out of context, this table represents a comparison of the
FZA algorithm, in its currently released form, for FTP transfers between
SUN workstations over our 5220i bridge.  The ZIP column is the transfer rate
as reported by FTP multiplied by the ZIP file compression ratio.  This can
be considered the ultimate, transferring the file compressed at the source.
The 5220 column is the transfer rate reported by FTP when the uncompressed
file is sent over the 5220i.  The final column is the ratio, i.e. how close to
Nirvana.  Link speed is 64 Kbps.  ZIP == Info-Zip 1.9 == PKZIP 2.04 == GZIP.
> 
>                   Size    ZIPped     ZIP    5220       ZIP      5220     5220/
>                                    Ratio   Ratio  KBytes/s  KBytes/s       ZIP
> CALGARY CORPUS 
> 
> bib             111261     35142     3.2     2.7      23.7      20.0      0.84
> book1           768761    313459     2.5     2.3      18.4      17.0      0.92
> book2           610856    206736     3.0     2.7      22.2      20.0      0.90
> geo             102400     68575     1.5     1.5      11.2      11.0      0.98
> news            377109    144570     2.6     2.4      19.6      18.0      0.92
> obj1             21504     10407     2.1     2.0      15.5      15.0      0.97
> obj2            246814     81678     3.0     2.8      22.7      21.0      0.93
> paper1           53161     18657     2.8     2.7      21.4      20.0      0.94
> paper2           82199     29838     2.8     2.5      20.7      19.0      0.92
> paper3           42526     18180     2.3     2.5      17.5      19.0      1.08
> paper4           13286      5621     2.4     2.5      17.7      19.0      1.07
> paper5           11954      5080     2.4     2.5      17.6      19.0      1.08
> paper6           38105     13314     2.9     2.8      21.5      21.0      0.98
> pic             513216     56028     9.2     7.7      68.7      58.0      0.84
> progc            39611     13356     3.0     2.7      22.2      20.0      0.90
> progl            71646     16357     4.4     4.0      32.9      30.0      0.91
> progp            49379     11310     4.4     4.0      32.7      30.0      0.92


From owner-ppp-comp Thu May 27 16:03:46 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyqxn-0000FMa@daver.bungi.com>; Thu, 27 May 93 16:02 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Thu, 27 May 1993 16:19:08 -0400 (EDT)
Message-ID: <9305272019.AA18194@hobbit.gandalf.ca>
References: <<m0nyT2w-00008JC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


> We are trying to specify at least one algorithm that can run across a
> range of speeds. 

You may be trying to specify that, but I wasn't. 

> Perhaps it would be better to suggest more than one?

Sure.  FZA.  Not in its present form, but in its intended form.
I have designed it to be scalable.  We have been waiting for the
T1 requests to pour in, so of course the high speed variant has been 
on the back burner.

I should be able to scale it to T1, but with a drop in the 
compression ratio.  How low, I don't know.

As with any LZ based method, the decoder will run at that speed no
problem.  The encoder's searching would have to be constrained,
and the hash table maybe resized.  Assembly language may be
needed.


From owner-ppp-comp Thu May 27 16:03:47 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyqyb-00007ya@daver.bungi.com>; Thu, 27 May 93 16:03 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Compression CRC - needed?
Date: Thu, 27 May 1993 17:26:00 -0400 (EDT)
Message-ID: <9305272126.AA01274@hobbit.gandalf.ca>
References: <<m0nylk6-0000CWC@daver.bungi.com>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> Nothing could be better than real data.  The problem is, where do we get
> it from? As I said, no one has offered anything better.  If you can
> distribute those captured traces you have, please upload them to
> sgi.com:other/ppp-comp/incoming.

...Working!  Look for the files there people!

Files are:

novell.enc.z
tcpip.enc.z

Both Sniffer traces, I hope they're good.  I haven't even used them.
Fresh out of the package.  Compressed with GZIP.
Who's got a Sniffer parser, public domain please.  We don't want
to pay for anything :-)

> When last we talked about this, the problem was: How do we get traces that
> we can pass out to people, that contain representative samples of all types
> of encapsulations we are likely to use?

The Internet contains no secrets.  This is where I would get IP traces.
Perhaps someone can enlighten me.  Are there any public networks that
talk Novell, Appletalk, Decnet?  How about friendly customers?  
Universities are good.  That's where these traces came from.  That's
where I got my old LANALYZER traces.  Perhaps we could convert them
from LANALYZER to SNIFFER if we get desperate.

The same file need not contain all the different protocols.  In reality,
I should have asked UofA for a mixed trace.  At the time I wanted to
write my Novell header compression.  So I asked for Novell traffic.
I will ask them again for 20 Megabytes of traffic, mixed.  
> 
> I've done what I consider is a good first whack at showing what routers
> are dealing with. Let's go on to the next phase. Perhaps what is required
> is not one unified "run this through your router" file, but three (or
> more?) captured traces.

I think you've done a good job too, Dave.  I wasn't trying to put you or
Predictor down.  Considering it beats STAC, and is faster by the looks of
your benchmarking, it's a job well done.  

> In order to make the tests meaningful, I think that we need to have
> a "reasonable" number of clients, accessing a "reasonable" number of
> hosts.  I also think that >1 megabyte of data should be available.
> 
At least a *few* megabytes.  My current LANALYZER files are probably 20+ MB.
One bonus is, compression algorithms can be data sensitive.  One can pass
the whole Calgary Corpus just fine, and blow up on something trivial.  The
more the better.

> I know there is at least one person from BarrNet on this list - would
> you object to supplying some traces of IP traffic from barrnet on a
> high usage hour? That will give us a good case of IP traffic.

The Internet!
> 
> If you are at a University site, would you be willing to capture some
> high-usage IPX or AppleTalk traffic for us to evaluate?

Ask customers too!
> 
> I suspect that marketing types will want us to use traces captured from
> people sending 10 megabyte uncompressed files containing only the character
> 0x00 :-)  Let's figure out what traffic *really* is!

Let's see.  Literal, literal, go back 2 and copy 9,999,998.  I make that
out to be 40 bits.  Even works out to a byte boundary with no padding.
That's 2,000,000:1!  I'll go tell marketing now so they can get a jump on
your guys!
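
A minimal sketch of that arithmetic, assuming a generic LZ77-style token
stream.  The token layout and function names are hypothetical, and the
40-bit encoded size is simply the figure quoted above, not derived from
any real FZA format.  The decoder shows why a copy longer than its
distance works: bytes are copied one at a time, so the source can overlap
the bytes being written.

```python
def lz_decode(tokens):
    """Decode hypothetical LZ77-style tokens into bytes.

    A token is ("lit", byte_value) or ("copy", distance, length).
    """
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, distance, length = tok
            start = len(out) - distance
            # Byte-by-byte copy: the source may overlap the output,
            # which is what lets a 2-byte history expand indefinitely.
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)


def compression_ratio(input_bytes, encoded_bits):
    """Ratio of input size to encoded size, both measured in bits."""
    return input_bytes * 8 / encoded_bits


# Small demonstration: two literals plus one overlapping copy give 100 zeros.
small = lz_decode([("lit", 0), ("lit", 0), ("copy", 2, 98)])
print(len(small), set(small))

# The 10 megabyte case, using the 40-bit figure from the message.
print(compression_ratio(10_000_000, 40))  # 2000000.0
```

The same three tokens with a copy length of 9,999,998 would decode to the
full 10,000,000 zero bytes; the small case is used here only to keep the
run time short.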

Let's not figure it out.  That would raise the issue of special case code.
Remember this benchmark?  No names please.  Variables and functions renamed
or deleted to protect the guilty!

   for (;;);
     xx();

Not that we can stop anyone from analyzing the test case, but it should be
large and varied enough to minimize the prospective gains from special case
code.


From owner-ppp-comp Thu May 27 16:04:05 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyqxn-000060a@daver.bungi.com>; Thu, 27 May 93 16:02 PDT
X-Path: opal.acc.com!art
From: art@opal.acc.com (Art Berggreen)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio?
Date: Thu, 27 May 93 11:46:41 PDT
Message-ID: <9305271846.AA11466@opal.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>
>According to Fred, the majority of the links are 64Kbps or less. I'll
>be happy with an algorithm that can fill a 64Kbps link, on a 'typical'
>processor. I've suggested that should be a 386, for obvious reasons.
>Vernon thinks that the minimum CPU should be a 486@25 Mhz (since there
>are more workstations than routers, and workstations typically have
>faster CPU's. All of the algorithms in my tests were done on a 486@33Mhz,
>with I/O time removed.

Let's not get too focussed on servicing a single link.  Many routers
can be expected to support multiple serial links.  I'd also like not
to assume that I can tie up all of my CPU cycles performing compression/
decompression.  Compression is just one of several tasks I need CPU
cycles for.  But I do think that the default compression algorithm
should allow something like a 16 MHz 386 to fill a 64 Kbps link in both
directions.  Support for more links or higher link speeds may require
more CPU and/or extra hardware  (let's keep in mind feasible hardware
implementations).  And finally, we should specify something that can be
added to installed devices (not require massive amounts of memory)
with a software upgrade.  Your marketing departments will be happy to
sell a company special wiz-bang compression option and all the upgrade
memory it needs.

Art


From owner-ppp-comp Thu May 27 16:04:18 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyqzE-00003wa@daver.bungi.com>; Thu, 27 May 93 16:03 PDT
X-Path: gandalf.ca!dcarr
From: dcarr@gandalf.ca (Dave Carr)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio?
Date: Thu, 27 May 1993 16:08:18 -0400 (EDT)
Message-ID: <9305272008.AA16518@hobbit.gandalf.ca>
References: <<9305270008.AA22251@va.SJF.Novell.COM>>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

> "Why bother running data compression" is what I meant.  I assume
> most users don't have the option of changing processors
> without buying new equipment and spending $$$.
> 
> Dave. What I thought this mailing list was about was choosing 
> an algorithm that everyone can implement (probably on 
> their existing platforms) so that implementations from 
> different vendors can interoperate.  Not every vendor has
> the option to change processors, or do a hardware add-on.
> I didn't have the impression that we were limiting any vendor
> from using their wonder compressor and making heaps-o-money.

Diane.  What I thought this list was about was PPP compression.  For a lot
of vendors, they may not have even a sync port.  So, give them
an option of running async.  Sure, most people run sync, but
async isn't going to be designed in any new products as the only
method.  The concept of PPP is let everyone play.  And play what
they can afford.

In this light, people could run your algorithm if they have
the memory, could run my algorithm if they have the CPU, but may
not be able to do either.  So what!  To say the standard must be
able to run on any existing platform is ludicrous.  It would then
have to use *zero* CPU and *no* memory, because even some of
Gandalf's products are at their limits.  To EXPECT PPP compression
to be driven by these products is just a silly thought.
> 
> Sure, that's easy to see, but does anyone on this mailing 
> list know what these algorithms really DO?  I tried to 

Sure, we know because we have SHIPPED over 3000 compression
bridges in the last 7 months.  And yes, it works.  No it doesn't
keep a T1 link full. 

But with all the stretched claims of STAC performance (sorry Fred,
I saw your data sheets), why quote numbers?  I refused to play
the marketing game.  What I gave them was a comparison to Info-Zip.
You have to have some standard in any measurement system.

> share what we learned during implementation.  If the algorithm 
> claims to compress at 3:1 instead of 2:1, but in reality takes 
> more time to run through the compression code, send the packet, 
> and decompress the packet, then where is the benefit?  

It doesn't.  It *may* take 5 usec to compress a byte with FZA.  It
may take 2 usec.  As long as the time to send a byte on the link is 4 times
the compression time, it will not make a difference until 4:1 compression is
exceeded.  
Sure, at these times/byte I agree the algorithm can't do T1.  But it 
damn well will do it with a hardware assist.  
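
A rough sketch of that break-even, assuming the compressor gets the whole
CPU and that time per input byte is constant.  The function names are
hypothetical; the 5 usec figure and the link speeds are the ones discussed
in this thread, with T1 taken as 1,544,000 bit/s.

```python
def compressor_input_bps(usec_per_byte):
    """Input bits per second the compressor can consume."""
    return 8_000_000 / usec_per_byte


def max_link_limited_ratio(link_bps, usec_per_byte):
    """Highest compression ratio at which the link, not the CPU, is still
    the bottleneck.  Above this ratio the compressor cannot produce output
    fast enough to keep the link full."""
    return compressor_input_bps(usec_per_byte) / link_bps


# At 5 usec/byte on a 64 Kbps link, the CPU only becomes the limit
# once the data compresses better than this ratio.
print(max_link_limited_ratio(64_000, 5))  # 25.0

# At the same cost per byte on a T1, the ratio is barely above 1:1,
# which is why software alone does not keep a T1 full.
print(max_link_limited_ratio(1_544_000, 5))
```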

If a customer wants to upgrade a T1 link for compression, he can
afford $500 for hardware.  I'm not about to design that kind of
headroom for the software and not use it.

This group does NOT need a standard software algorithm that runs at
T1.  Sorry!  

> This mailing-list shouldn't fall into the same trap as the marketing types.
> If they are comparing theoretical compression ratios without regard to
> realistic results, then caveat emptor.  But (I presume), the people on
> this mailing list are engineers who believe in what they see, not
> marketing hype.

So, I am too.  I didn't quote maximums until someone asked, and it
was also followed by a :-).  But I did go through Dave Rand's test,
and fared pretty well.  

> >Even with PBURST, Novell cannot keep a 64 Kbps link full.  Is this
> **It was a 56Kb/s link.  Sorry for omitting it earlier.**
> Actually, we get up to 135 Kb transferred per second on a 
> 56Kb/s link, using compression.
> >the fault of the compression?  I think not.

> The compression algorithm does not exist separately, so 
> why measure it that way?  PPP is not very useful without 
> an upper layer protocol to send the data.  Those upper 
> layer protocols all have some built-in delays or implementation 
> problems.  Some were not implemented to work well on a 
> low speed link.

Correct.  So I don't want the compression to be blamed.  Not all
TCP's are equal either.  SUN's works great over a wide area network.
IMHO, FTP Software's #####, let's just say it needs a bit of work.
But, I had to field a lot of complaints that the bridge was at fault,
only to prove to the customer it was the protocol.  And believe me,
no matter how fast the algorithm is, it wouldn't solve them.  In some
cases, I could compress the frames in 1-2 character times on the link.
Yes, people do bridge over 9600 bps up here in Canada.  Funny thing,
even time sensitive LAT will work!  PC/TCP won't work!

> 	       
> >I suggest that we remove this Novell bias from the measurement.

> Great!  Provide us some real data without it!

I'll do you one better.  FTP (sorry, no Novell file transfers 
supported :-)) to hobbit.gandalf.ca.  In the /incoming directory
(the directory does not have read permissions, but you can get the
files) there are 2 files:

novell.enc.Z     <= Your bonus, just for calling.  No obligation.
tcpip.enc.Z

Both are SNIFFER traces generously donated from the UofA.  Feel free to
use them for benchmarking.  I would welcome more contributions to the
"PPP Compression Corpus".

> Good.  Glad to see that we agree on something.

Really?  Maybe it was a typo :-)  

> >> The goal behind data compression is to move the data across
> >> the line faster.
> 		      
> >No!  There are 2 numbers, throughput and delay.  You seem to
> >favour delay.
> I'm not understanding what you mean here.  Please define.

Simply, throughput = link speed * compression ratio.  
        delay = latency
Now, the throughput is (or should be) linear up to a threshold,
and then will actually fall lower as you try to push more bits
through than possible.

Until the threshold is reached, you should be doing the comparison
in the linear region.  You're taking my results and scaling them
into the non-linear region.  Unfair!

Latency has 2 factors.  When no data is queued on the link, the
latency becomes the compression time per byte.  No wait, it's
compression time per byte divided by the amount of CPU it's
going to get.  So if your server is busy, then you may only
get 10% of the CPU to do the compression.  I have designed
around getting 90% of the CPU (and we have it).
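
As a small illustration of the unqueued case, assuming the compressor's
cost per byte simply scales with the inverse of its CPU share.  The
function names and the 1500-byte frame size are assumptions for the
example; the 5 usec figure and the 10% and 90% shares come from this
thread.

```python
def effective_usec_per_byte(usec_per_byte, cpu_share):
    """Compression time per byte when only a fraction of the CPU is available."""
    if not 0 < cpu_share <= 1:
        raise ValueError("cpu_share must be in (0, 1]")
    return usec_per_byte / cpu_share


def frame_latency_usec(frame_bytes, usec_per_byte, cpu_share):
    """Time to compress one frame with nothing else queued on the link."""
    return frame_bytes * effective_usec_per_byte(usec_per_byte, cpu_share)


# A 1500-byte frame at 5 usec/byte, on a busy server versus a dedicated box.
print(frame_latency_usec(1500, 5, 0.10))
print(frame_latency_usec(1500, 5, 0.90))
```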

The other latency number has to do with data already queued for
the link.  Predictor will have more data queued than FZA at
link speeds up to 256K (my linear region).  

Funny number that latency.  Very deceptive.  Never equals compression
time per byte in real life.

> >WARNING** Novell bashing **
> >Too bad PBURST didn't do it right!
> Part of the reason for my posting was to point out that
> "compression ratio" is not the end-all and be-all figure
> that we should be concerned about.  Sure, IPX has delays,

Neither is time-per-byte.  Either must be taken with a grain-of-
salt, and applied with great care.

> IPX doesn't deal too well with low speed links, PBURST is
> not God's gift.  I'm certain that other protocols and products 

No argument here.  Hey, we agreed twice now.  It's catchy.

> have like problems.  None of our products operate in a perfect 
> world, and there are things beyond our control as engineers 
> that will affect the speed of a compression algorithm.

And even beyond that.  Compression the way I see it is NOT an
afterthought.  Even after implementing my third real life 
compressor (the first one happened to be a speedy Markov compressor
similar to Predictor from what I know), I am still learning about
subtle interactions in the field.  Also, I don't know how many
times I've wished the hardware was like this instead of that, or
had more memory, or ...

To have an algorithm that everyone can run on their existing hardware
is wishful at best.  They can't or may.  Okay they might get lucky.
If they can, they could probably run mine too, but not as fast. 


From owner-ppp-comp Thu May 27 23:33:57 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nyy0i-0000Doa@daver.bungi.com>; Thu, 27 May 93 23:33 PDT
X-Path: rhyolite.wpd.sgi.com!vjs
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio?
Date: Thu, 27 May 93 20:57:20 -0600
Message-ID: <9305280257.AA04020@rhyolite.wpd.sgi.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk


>                              Sure, most people run sync,


What?  Most people run async, but my definition of people includes more
than just you router guys.  It even includes people using DB-9
connectors.

You need to define the problem.  You could say that you are working on
PPP compression protocols for dedicated multi-protocol routers and
bridges.  That would have a lot of good effects:

    -it would remove a lot of the patent confusion, since the
	patent issues are very different for dedicated boxes
	and stations.

    -it might make it possible for you to use a licensed algorithm
	as the default.  Perhaps the IAB would go along with you
	if all of you agreed among yourselves that a licensed and
	patent protected algorithm is good.  There is no
	hope of getting the stations guys to agree.

    -you are clearly going to pick something based on LAPB, despite the
	fact that the vast majority of PPP implementations and
	installations will not have it for at least years.

    -if you would restrict your problem to dedicated boxes, you
	could eliminate both the ancient zilch-MIPS IBM-PC's and the
	GIPS/GFLOP/GByte "workstations", and so have a hope of
	picking an algorithm that is neither ridiculously light nor
	heavy for the boxes you care about.

    -you would get station people with silly ideas like me out of your
	hair.

    -it would be easier for station people like me to get a compression
	protocol that would fit the traffic we see.

"Stations" see very different traffic than multi-protocol-router-bridges:
    1. single protocol family,
	 for now all IP, maybe with Appletalk or IPX coming on strong,
	 but almost always a single protocol family for any given link 
    2. often a single protocol within the family...most do TCP/IP.
    3. no bridging
    4. mostly async lines at 14.4 or less, with some 56K and promises 
	of 1 or more channels of B-ISDN.  (even the ISDN is often
	async via TA's)


I've poked around just a little at other major UNIX workstation
vendors, and found some agreement that what is going on here will not
be officially supported on many if any UNIX workstations in the
foreseeable future.  Are there any representatives of the other top 5
or 10 UNIX system vendors here, and do they disagree with that perhaps
extreme statement?  What about Microsoft or anyone else who talks about
NT (e.g. DEC)?  Yes, fancy workstations often have LAPB, x.25, TP4,
3270, and the rest, but how many are interested in using it with PPP?

PPP is generally supported on the stations by people in their spare
time, at best as tertiary products, while you guys devote some of your
primary development money to PPP.  For example, that Dave Rand's
address is not at novell.com may mean nothing, but it is suggestive.

As far as I know, the only significant commercial workstation interest
in PPP is for ISDN, and that could evaporate quickly if the IPLPDN guys
get their act together.

This leaves two constituencies.  Yourselves with major professional PPP
concerns for bridge-routers, and the rest of us.  The division seems
obvious and desirable.


vjs



From owner-ppp-comp Fri May 28 12:43:55 1993
Return-Path: <owner-ppp-comp>
Received: by daver.bungi.com (/\==/\ Smail3.1.24.1 #24.2)
	id <m0nzALC-00003Ca@daver.bungi.com>; Fri, 28 May 93 12:43 PDT
X-Path: opal.acc.com!art
From: art@opal.acc.com (Art Berggreen)
To: ppp-comp@bungi.com
Subject: Re: Can you trust the compression ratio?
Date: Fri, 28 May 93 10:43:42 PDT
Message-ID: <9305281743.AA13483@opal.acc.com>
Reply-To: ppp-comp@bungi.com
Precedence: bulk

>
>You need to define the problem.  You could say that you are working on
>PPP compression protocols for dedicated multi-protocol routers and
>bridges.  That would have a lot of good effects:
>	.
>	.
>	.
>"Stations" see very different traffic than multi-protocol-router-bridges:

I think Vernon has stated a very important point.  I definitely think
we are trying to cover conflicting requirements.  We may have to agree
on a compromise solution as the default, and try to negotiate a more
preferred mode for the particular environment.  But a single, common,
simple, default algorithm that has a chance of widespread implementation
is needed.

Art


