PPP and SLIP Performance and Connection Problems

[Top] [Concepts] [Symptoms] [Dialing problems] [Ping fails] [Server Only] [Server's net only] [Modem Issues] [Modem Cables] [TCP Window] [Bad Serial Ports] [HylaFAX caused problems] [enough already!] [Telebit PEP modems] [See Also]

Or, What to look for when things go wrong

This is a difficult page to write, as people are notorious for coming up with new problems and for misunderstanding things in novel ways. This page is basically a collection of problems that I have seen people run into. Unless I specifically state otherwise, all issues apply to both SLIP and PPP, even though I will be referring to PPP throughout.

The first thing is to determine where the problem lies. You have to go beyond "it doesn't work". PPP and SLIP are inherently complex systems, as you are putting a lot of networking issues in a single conceptual box. There are also a lot of places where a small mistake can creep in, and a single small mistake can prevent things from working.


Conceptual Problems


Problems and Symptoms

The following is a list of common problems and symptoms, each with a description and a link (if necessary) to more detailed information. I am very interested in feedback on ways to improve this page!
The modem won't dial
A large number of things can cause this problem, including configuration errors, modem problems, and even a bad cable. There was also a series of Indys shipped with IRIX-5.2 and bad serial ports; if you have one of those, you have to get the motherboard replaced. If all of these look OK, you'll have to start detailed debugging to trace the problem.

I can't get to host
This is usually an indication of a routing problem. If this is the first destination you've tried, then you need to check to make sure that PPP is even connected. If you can't even /usr/etc/ping server, then go back to the previous section on The modem won't dial.

Otherwise, you need to start following up on tracing routing problems.

I can't get to machines by name, just by IP address
This is almost always a resolver configuration issue. Detailed info is available on the resolver configuration page.

PPP doesn't die when the modem hangs up (usually, but not exclusively, only a problem on servers)
This is a strong indication that the DCD signal is not making it through. The 3 common sources of the problem are:
  1. using the ttyd port instead of the ttyf port (which does modem control and flow control).
  2. Your cable doesn't have the DCD pin connected through. See the cable details section. This is a common problem, since Mac modem cables often don't have the DCD pin connected, and most modems come from the factory with DCD always on (dropping DCD indicates the modem hung up).
  3. Your modem is not configured to have DCD follow carrier. See the modem configuration page for details.

The modem won't hang up when I stop PPP
This normally means that DTR isn't getting through to the modem. The 3 common sources of the problem are:
  1. using the ttyd port instead of the ttyf port (which does modem control and flow control).
  2. Your cable doesn't have the DTR pin connected through. See the cable details section.
  3. Your modem is not configured to hang up on DTR drop. See the modem configuration page for details.

PPP doesn't time out and hang up the phone line
In addition to the above problems, PPP has to be configured to time out. /etc/ppp.conf should be configured with the quiet keyword. You may want to change the value of active_timeout= or inactive_timeout=, though the defaults should be fine for most people; don't change them unless you know what you are doing! See `man ppp` for more details.

Even if everything else is configured correctly, the link may not drop because some (possibly hidden) process is actively using the link. One sign of this is that the SD and RD lights on the modem are going "too much". Note that there are a number of protocols that are passed, but not counted (NTP, TIMED, etc), and you can add to that list. See the ppp man page for details.

I don't get full speed on file transfers
File transfers pause for about 30 seconds every minute or so
This is frequently due to the default IRIX TCP window size being so large. The fix is fairly simple, although the explanation is complicated. You see this by watching the lights on the modem. The time intervals may vary depending on the modem speed.

PPP logs the message "I_PUSH on fd 3 failed"
This means that the required PPP kernel modules are not in your kernel. Either your installation failed, your kernel failed to link for some reason, or you didn't reboot after the installation.

PPP logs "can't lock port" on my server
This is a problem with the way you start ppp. This is most commonly seen when you are also running FAX software. The fix may be tricky, and may not let you do precisely what you want.

The other common cause of this message is a typo in the first Pipeline article about configuring PPP. It incorrectly stated that PPP needs to get started from a script, and then provided an example which wouldn't work! The fix is to use /usr/etc/ppp as the login shell, instead of the script in the example. (A later issue had a correction, but not everybody saw it).

I now get Invalid Login when my PPP tries to log in. It used to work, and it works if I do it manually with cu
If you have a Telebit T2500 or WorldBlazer modem, you are probably connecting in PEP ("CONNECT FAST") mode, which you definitely don't want to do for PPP or SLIP! The way to fix this is to fix your modem configuration.

PPP tries to negotiate with the remote, and fails
or it logs: "received spurious Protocol-Reject"
There are some PPP implementations that don't correctly handle negotiating protocols that they don't understand. The spec says that they should just NAK ones they don't understand at all, so they are out of compliance. The solution is to disable the offending (advanced) protocols on your end. This solves a lot of these problems. In your /etc/ppp.conf entry, add:
               -mp -ccp
       
Remember that continuation lines start with spaces or tabs.

PPP (or SLIP) fails logging in with the message "enough already"
This is caused by dialing into a terminal server with a very long and verbose message. It is over-running the look-ahead buffer in the chat-script module common to PPP, SLIP, and UUCICO. For details, see below.


The modem won't dial

Typographical errors are common when editing the configuration files. Everything you need to enter into configuration files is case sensitive. If you type something in lower case that is supposed to be in upper case, it won't work (and vice-versa). Similarly, white space (spaces and/or tabs, and sometimes newlines) is also critical. Because the lines of apparent gibberish in the /etc/uucp/Systems file are long, instructions will frequently wrap the single logical line into 2 or more physical lines. The break is always done at a space, and you need to leave that space when you type in the line. (Note that using a backslash ("\") character as the last character on a line escapes the newline and causes the next line to be logically part of that one. This is harder to explain and to get correct, though.)
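As an illustration of the backslash rule described above, here is a minimal Python sketch that joins wrapped physical lines back into one logical line. It is not the parser that UUCP or PPP actually uses; the function name join_continuations and the sample entry are made up for this example.

```python
def join_continuations(text):
    """Join physical lines that end in a backslash into one logical line."""
    logical, pending = [], ""
    for line in text.splitlines():
        if line.endswith("\\"):
            # Drop the backslash; the newline it escaped is already gone.
            pending += line[:-1]
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        # A trailing backslash on the last line: keep what was collected.
        logical.append(pending)
    return logical


# Hypothetical wrapped entry. Note the space kept before the backslash.
wrapped = "\n".join([
    "remhost Any ACU 38400 5556000 name: abuser \\",
    "ssword: xxx PPP",
])
print(join_continuations(wrapped))
```

If the space before the backslash were left out, "abuser" and "ssword:" would run together into a single word, which is exactly the kind of small mistake that keeps the modem from dialing.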

If the modem still doesn't dial, you need to start detailed debugging. This will get you as far as the dialup router. Once you can ping your server, PPP is (technically) working, and you need to look elsewhere for your problem.

Note: The server is functioning as a dialup router. Information about routing is available on the IP address and Routing page.


I can't talk to the server

This is usually (again) because of an error in a configuration file, usually an error in either the phone number or the password. Detailed debugging is again the starting point to fix this.

Very rarely, problems arise due to some strange routing problem, usually on the server. This is extremely difficult to trace down and fix. It seems to "magically" fix itself when some other, apparently unrelated, change is made to the network or server configuration.


I can only talk to the server

Once you can /usr/etc/ping the server, PPP is working. All remaining problems are really routing or application issues, although PPP can help with them.

An exception is an inconsistency in the way that VJ-header compression is (or isn't) negotiated. Both sides must agree on this, or ping will work, but any TCP application (telnet, netscape, etc) will fail. If you suspect this problem, try adding -vj_comp to /etc/ppp.conf and restart ppp. Remember to take it out if it doesn't fix anything!

If you can ping the server, but can't get to any other machines, you are having a routing problem. There are a few cases to consider:

Your machine is on the same network as the server -- you are doing Proxy ARP routing.
You need to have the server proxy for your machine when ARP requests are made. Instructions are located at the proxy ARP routing page.

Your machine is on a different IP network than the server -- you are probably doing host routing.
In many cases this is a simpler configuration, but there are also some nasty pitfalls. Instructions are located at the separate PPPnet routing issues page. One common pitfall is that some routers ignore the host routes, or need to be specially configured to recognize them.
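To decide which of the two cases above applies, you only need to know whether your PPP address falls inside the server's IP network. A minimal Python sketch, assuming you know the server's network and mask; the function name routing_case and the addresses (taken from the documentation-only ranges) are hypothetical:

```python
import ipaddress


def routing_case(client_ip, server_network):
    """Return which of the two routing cases an address falls into."""
    client = ipaddress.ip_address(client_ip)
    network = ipaddress.ip_network(server_network)
    if client in network:
        return "proxy ARP"      # same IP network as the server
    return "host routing"       # separate PPP network (probably)


# Hypothetical addresses for illustration only.
print(routing_case("192.0.2.77", "192.0.2.0/24"))
print(routing_case("198.51.100.5", "192.0.2.0/24"))
```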


Can Only Talk to Hosts on the Server's Net

Again, there are a few cases to consider:

You are on the same IP network and using Proxy ARP routing.
The most probable cause is that the router to the rest of the network has an incorrect or stale ARP entry. The proper solution is to flush the ARP cache on the router, reduce the ARP timeout to something reasonable (say, 10 minutes), and stop whatever it is that is creating the incorrect ARP entry. Additional information is available on the proxy ARP routing page.

You are on a separate PPP net, probably using host-routes.
Problems routing this way are usually the result of a router that messes up host-routes. Everybody on your network may be messed up, and there are certainly black holes, where packets to/from your net just disappear without a trace. Additional information is available on the separate PPPnet routing issues page.


Check the Modem and Cable


Modem Cables

Modem cables seem to cause the most problems.

Standard Mac modem cables don't work. Many of the so-called high speed modem cables, while technically not correct (they often seem to be missing RTS), seem to work well enough. This is the first place to check if your connection is flaky.

Full information is found on the modem configuration page.


Shrinking the TCP window size

(The following description is over-simplified) The TCP window size is how much data can be sent on ahead by the sender without waiting for it to be acknowledged ("ACKed") by the receiver. IRIX uses a very large TCP window size (60kB) by default. Many dialup routers have buffers that are much smaller than this, so the part of the data in the window being sent that is larger than the buffer gets discarded. When future packets have the wrong data position in them, TCP waits for a bit (about 30 seconds) for the missing packets to arrive, before sending a packet to the sender asking for it to be resent. This will continue until the remaining data is less than the remote's buffer size.
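The arithmetic behind this (equally over-simplified) can be sketched in a few lines of Python. The 60 kB default comes from the description above; the 16 kB router buffer and the function name bytes_at_risk are assumptions made up for illustration, since real dialup routers vary.

```python
def bytes_at_risk(window_bytes, router_buffer_bytes):
    """Bytes of one full in-flight window that do not fit in the buffer."""
    return max(0, window_bytes - router_buffer_bytes)


IRIX_DEFAULT_WINDOW = 60 * 1024   # default window size from the text
SMALL_WINDOW = 8 * 1024           # a much smaller window, for comparison
ROUTER_BUFFER = 16 * 1024         # hypothetical dialup router buffer

print(bytes_at_risk(IRIX_DEFAULT_WINDOW, ROUTER_BUFFER))
print(bytes_at_risk(SMALL_WINDOW, ROUTER_BUFFER))
```

With these assumed numbers, most of a full default window overflows the buffer, while a window smaller than the buffer leaves nothing to discard.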

The fix is rather easy, but you are editing your kernel configuration, so be very careful! Like any other system configuration, you have to be root (the "Super User"). First, change directory to the sysgen area (these instructions assume you are running IRIX-5.2 or later):

   cd /var/sysgen/master.d
then using an editor you are comfortable with (like jot, vi or emacs, but not a "word processor") edit the file bsd. Change the following lines (they are in 2 different places in the file) from:
   unsigned long tcp_sendspace = 60 * 1024; /* must be < 512K */
   unsigned long tcp_recvspace = 60 * 1024; /* must be < 512K */

   unsigned long udp_sendspace = 60 * 1024; /* must be < (64K - 28) */
to:
   unsigned long tcp_sendspace = 8 * 1024; /* must be < 512K */
   unsigned long tcp_recvspace = 8 * 1024; /* must be < 512K */

   unsigned long udp_sendspace = 8 * 1024; /* must be < (64K - 28) */
After you have saved your changes, you will need to build a new kernel and reboot (we'll save the old one, just in case. But don't reboot if you get an error! Some warnings are OK):
   cd /
   ln unix unix.BAK
   autoconfig -f
   reboot


"Bad" Serial Ports on Indy

A series of Indys was built and shipped with "bad" serial ports (actually, it was a fix to a MIDI problem that broke HW flow control: a mismatch of chip and motherboard revisions). A software bug in IRIX-5.2 masked the hardware bug. The SW bug was fixed by patch151 and by IRIX-5.3 and later. The symptoms are that /dev/ttyd* and /dev/ttym* devices work, but /dev/ttyf* devices don't (where * can be 1 or 2). If using the ttym ("modem control") device instead of the ttyf ("flow control") device lets PPP dial the modem, then this is (possibly) your problem, and you will need to have your motherboard replaced by SGI hardware service. You can temporarily run with (e.g.) ttym2 instead of ttyf2, but you may get occasional errors due to data overrun.

Final determination of whether this is indeed your problem, and the resolution of it, is entirely up to your SGI service organization (how's that for covering the different names it's known by around the world!).


Problems Interoperating With FAX Software

Many people have set up one of the FAX packages, the most common of which is HylaFAX (used to be called FlexFAX). They usually have a program running on the serial port (it may be called faxd) that answers the modem for FAX mode. Unfortunately, it doesn't emulate /usr/lib/uucp/uugetty properly, so getting a PPP or SLIP connection to work in dialin mode can be tricky, especially if you are trying to do anything unusual.

The problem is in locking the serial port, so that only one process at a time can use it. IRIX PPP and SLIP are designed to interoperate with uugetty's port locking protocol. Part of this protocol is that the lockfile contains the PID of the process locking the port. PPP and SLIP can take over the lock if it is their own PID or their parent's PID (PPID). The normal method of launching a login on a serial port goes like this:

  1. uugetty gets the login ID, creates the lockfile, and execs login
  2. login gets the password(s), and execs the account's shell
This means that the PID that uugetty put in the lockfile is the same as the PID of the account's shell (say, /usr/etc/ppp), so that the lock criterion is satisfied.

However, the account's shell might be a script:

Things are different with faxd from any of the FAX packages. They go off track at the first step: they fork login rather than exec it, so the PID in the lockfile is already login's PPID. Therefore, you can't use a script as a shell unless it execs ppp, since ppp can't deal with a lockfile owned by its parent's parent (its "PPPID").

Unfortunately, in some releases of IRIX, PPP will only respect a lockfile with its own PID in it, so you can't run it in a script from uugetty. You either have to use getty for dialin only, or work around the problem so that you can exec ppp in the script.
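The lock rule described in this section can be sketched as a small Python model. This is only an illustration of the rule as stated here, not the actual IRIX code; the function name can_take_lock, the own_pid_only flag (standing in for the stricter IRIX releases) and the PID numbers are all made up.

```python
def can_take_lock(lock_pid, my_pid, parent_pid, own_pid_only=False):
    """True if ppp may take over a lockfile that contains lock_pid."""
    if own_pid_only:
        # Stricter releases: only a lock holding our own PID is accepted.
        return lock_pid == my_pid
    return lock_pid in (my_pid, parent_pid)


# Each case: (PID in lockfile, ppp's PID, ppp's parent PID).
# exec keeps the PID; fork creates a new PID whose parent is the old one.
cases = {
    "uugetty, ppp is the login shell": (100, 100, 1),
    "uugetty, script forks ppp": (100, 101, 100),
    "faxd forks login, script execs ppp": (100, 101, 100),
    "faxd forks login, script forks ppp": (100, 102, 101),
}
for name, (lock_pid, my_pid, parent_pid) in cases.items():
    print(name, can_take_lock(lock_pid, my_pid, parent_pid))
```

Only the last case fails under the normal rule, because the lockfile PID is neither ppp's own nor its parent's. With own_pid_only=True, only the first case still succeeds.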

Some strategies that allow you to work within the constraints are detailed in the Adding SLIP and PPP clients page.


PPP (or SLIP) fails logging in with the message "enough already"

This is caused by dialing into a terminal server with a very long and verbose message. It is over-running the look-ahead buffer in the chat-script module common to PPP, SLIP, and UUCICO. This normally never occurs when using only PAP or CHAP logins. Unfortunately, the kind of places where this occurs don't usually allow you to use PAP or CHAP. There are 2 solutions:
  1. Ask your server to shorten the prompt message, or change to newer hardware allowing PAP or CHAP logins for PPP.

  2. The other option is to make intermediate waits for common words in the message, or repeatedly wait for the terminating condition. A specific example may look like:
    remhost Any ACU 38400 5556000 "" \p\r\c name:-\c-name:-\c-name:--name: abuser ssword: xxx PPP
    
    In other cases, you will have to be more creative, and choose words that occur every 2 to 4 lines in the message, to be sure to not overrun the look-ahead buffer. Something like:
    remhost Any ACU 38400 5556000 "" \p\r\c this \c page \c too \c long \c name: abuser ssword: xxx PPP
    
    It is vitally important that you end each "send" portion of the chatscript with the \c character, or you will not get what you expect!
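To check a chat script before trying it on the modem, it can help to split it into its expect/send pairs. The Python sketch below does this for the first example entry above. It is a simplified illustration, not the real chat-script module: it assumes five leading fields (name, time, device, speed, phone number), and treats "expect-send-expect" sub-fields as a simple list split on hyphens. The function name parse_chat is made up.

```python
def parse_chat(systems_line):
    """Split a Systems entry's chat script into (expect, send) pairs."""
    fields = systems_line.split()
    chat = fields[5:]          # skip name, time, device, speed, phone
    pairs = []
    for i in range(0, len(chat), 2):
        # "a-b-c" means: expect a; if it doesn't arrive, send b, expect c.
        expect = chat[i].split("-")
        send = chat[i + 1] if i + 1 < len(chat) else None
        pairs.append((expect, send))
    return pairs


entry = (r'remhost Any ACU 38400 5556000 "" \p\r\c '
         r'name:-\c-name:-\c-name:--name: abuser ssword: xxx PPP')
for expect, send in parse_chat(entry):
    print(expect, "->", send)
```

Laid out this way, it is easy to see that each retry of the wait for "name:" sends only \c (that is, nothing), and that the final "PPP" is an expect with no send after it.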

Fixing a Telebit T2500 or WorldBlazer

If you are connecting in PEP mode (you see CONNECT FAST when you connect), then you need to fix your modem's configuration. This is only a problem with the Telebit modems that do PEP. There are two ways to do this.
One is to connect to the modem using cu or kermit, and manually change the setting.
To do this, connect to the modem, but don't dial a number! Then give the following command to the modem:
ats50=6&w
followed by the Enter key. Now your modem will connect in V.32 mode, which is what you want for PPP or SLIP. For the WorldBlazer (which does V.32bis), use ats50=7&w.

The other option is to re-run the configure script.
You will need to be root ("superuser") to do this, just like when you first set up your system. Give one of the following commands (slightly modified from your configuration instructions):
/usr/lib/uucp/fix-telebit -o -m t2500 -c s111=0s50=6 2
/usr/lib/uucp/fix-telebit -o -m WB -c s111=0s50=6 -s 38400 2
Then when you next run PPP, things should work fine (if this is the only problem, of course).


Hopefully there has been enough info here for you to figure out your connection problem. This info is based on looking at the problem from the client (dialing) end, because that is where most problems are discovered.

Other Useful Information

A random selection of potentially useful WWW pages:

I hope and intend that this documentation can help you with your PPP connection problems. My other commitments (like work) permitting, I will attempt to help you on issues not covered, or that you are unclear on. Please make sure that you provide me a valid return email address! (I won't try to fix it).

Scott Henry <scotth@sgi.com>

Last modified: Mon Mar 10 17:31:54 1997