[Top]
[Concepts]
[Symptoms]
[Dialing problems]
[Ping fails]
[Server Only]
[Server's net only]
[Modem Issues]
[Modem Cables]
[TCP Window]
[Bad Serial Ports]
[HylaFAX caused problems]
[enough already!]
[Telebit PEP modems]
[See Also]
Or, What to look for when things go wrong
This is a difficult page to write, as people are notorious for coming up
with new problems, and for misunderstanding things in novel ways... This
page is basically a collection of mistakes that I have seen people make.
Unless I specifically state otherwise, all issues apply to both SLIP and
PPP, even though I will be referring to PPP throughout.
The first thing is to determine where the problem lies. You have to go
beyond "it doesn't work". PPP and SLIP are inherently complex systems, as
you are putting a lot of networking issues in a single conceptual box.
There are also a lot of places where a small mistake can creep in, and a
single small mistake can prevent things from working.
The following is a list of common problems and symptoms, with a
description and a link (if necessary) to more detailed information. I am
very interested in feedback on ways to improve this page!
- The modem won't dial
- A large number of things can cause this problem, including
configuration errors,
modem problems, and even
a bad cable.
There was also a series of Indys shipped with
IRIX-5.2 and bad serial ports.
If you have this problem, you have to get the motherboard replaced.
If these look OK, you'll have to start
detailed debugging
to trace the problem.
- I can't get to host
- This is usually an indication of a routing problem. If
this is the first destination you've tried, then you need to check
to make sure that PPP is even connected. If you can't even
/usr/etc/ping server, then go back to the
previous section on The modem won't dial.
Otherwise, you need to start following up on
tracing routing problems.
- I can't get to machines by name, just by IP address
- This is almost always a resolver configuration issue.
Detailed info is available on the resolver
configuration page.
- PPP doesn't die when the modem hangs
up (usually, but not exclusively, only a problem on
servers)
- This is a strong indication that the DCD signal is
not making it through. The 3 common sources of the problem are:
- using the ttyd port instead of the ttyf
port (which does modem control and flow control).
- Your cable doesn't have the DCD pin
connected through. See the cable details section.
This is a common problem, since Mac modem cables often don't
have the DCD pin connected, and most modems come
from the factory with DCD always on (dropping
DCD indicates the modem hung up).
- Your modem is not configured to have DCD
follow carrier. See the modem configuration
page for details.
- The modem won't hang up when I stop PPP
- This normally means that DTR isn't getting through
to the modem. The 3 common sources of the problem are:
- using the ttyd port instead of the ttyf
port (which does modem control and flow control).
- Your cable doesn't have the DTR pin
connected through. See the cable details section.
- Your modem is not configured to hang up on DTR
drop. See the modem
configuration page for details.
- PPP doesn't time out and hang up the phone line
- In addition to the above problems, PPP has to be configured to
time out. /etc/ppp.conf should be configured with the
quiet keyword. You may want to change the value of
active_timeout= or inactive_timeout=, though
the defaults should be fine for most people. In fact, don't change
them unless you know what you are doing! See `man ppp` for more
details.
Even if everything else is configured correctly, the link may not
drop because some (possibly hidden) process is actively using the
link. One sign of this is that the SD and RD
lights on the modem are going "too much". Note that there are a
number of protocols that are passed, but not counted
(NTP, TIMED, etc), and you can add to that
list. See the ppp man page for details.
- I don't get full speed on file transfers
- File transfers pause for about 30 seconds every minute or
so
- This is frequently due to the default IRIX TCP window size
being so large. The fix is fairly simple,
although the explanation is complicated. You see this by watching
the lights on the modem. The time intervals may vary depending on
the modem speed.
- PPP logs the message "I_PUSH on fd 3 failed"
- This means that the required PPP kernel modules are not in your
kernel. Either your installation failed, your kernel failed to link
for some reason, or you didn't reboot after the installation.
- PPP logs "can't lock port" on my server
- This is a problem with the way you start ppp. This is
most commonly seen when you are also running FAX software. The fix may be tricky, and may not let you do
precisely what you want.
The other common cause of this message is a typo in the first
Pipeline article about configuring PPP. It incorrectly
stated that PPP needs to get started from a script, and then
provided an example which wouldn't work! The fix is to use
/usr/etc/ppp as the login shell, instead of the script in
the example. (A later issue had a correction, but not everybody saw
it).
- I now get Invalid Login when my PPP tries to log in.
It used to work, and it works if I do it manually with
cu
- If you have a Telebit T2500 or WorldBlazer modem,
you are probably connecting in PEP ("CONNECT FAST") mode,
which you definitely don't want to do for PPP or
SLIP! The way to fix this is to fix your modem
configuration.
- PPP tries to negotiate with the remote, and fails
- or it logs: "received spurious Protocol-Reject"
- There are some PPP implementations that don't correctly handle
negotiating protocols that they don't understand. The spec says
that they should just NAK ones they don't understand at
all, so they are out of compliance. The solution is to disable the
offending (advanced) protocols on your end. This solves a lot of
these problems. In your /etc/ppp.conf entry, add:
-mp -ccp
Remember that continuation lines start with spaces or tabs.
- PPP (or SLIP) fails logging in with the message "enough
already"
- This is caused by dialing into a terminal server with a very long
and verbose message. It is over-running the look-ahead
buffer in the chat-script module common to PPP, SLIP, and UUCICO.
For details, see below.
Typographical errors are common when editing the configuration files.
All of the stuff you need to enter into configuration files is case
sensitive. If you type something in lower case that is
supposed to be in upper case, it won't work (and vice-versa).
Similarly, white space (spaces and/or tabs, and sometimes
newlines) is also critical. Because the lines of apparent gibberish in
the /etc/uucp/Systems file are long, instructions will
frequently wrap the single logical line into 2 or more physical lines. The
break is always done at a space, and you need to leave that space
when you type in the line. (Note that using a backslash
("\") character as the last character on a line escapes
the newline and causes the next line to be logically part of that
one. This is harder to explain and to get correct, though.)
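The two wrapping conventions above can be sketched in a few lines of Python. This is only an illustration, not the parser that PPP or UUCP actually uses: the function names are made up for the example, and the sample entry is abbreviated from the one shown later on this page.

```python
def rejoin_displayed(parts):
    """Rebuild one logical line from instructions that were wrapped at a space.

    Each displayed piece is trimmed and the pieces are joined with exactly
    one space, which is the space you must keep when typing the line.
    """
    return " ".join(part.strip() for part in parts)


def join_backslash(physical_lines):
    """Merge physical lines that end in a backslash into logical lines.

    The trailing backslash is dropped and the next line is appended
    directly, so any space you want must come before the backslash.
    """
    logical, pending = [], ""
    for line in physical_lines:
        if line.endswith("\\"):
            pending += line[:-1]
        else:
            logical.append(pending + line)
            pending = ""
    if pending:                      # file ended on a continuation
        logical.append(pending)
    return logical


if __name__ == "__main__":
    print(rejoin_displayed(["remhost Any ACU 38400 5556000", '"" name: abuser']))
    print(join_backslash(["remhost Any ACU 38400 5556000 \\", '"" name: abuser']))
```

Both calls print the same single logical line. Note that in the backslash form the space before the backslash is what separates the two fields.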
If the modem still doesn't dial, you need to start detailed debugging. This will get you as far as
the dialup router. Once you can ping
your server, PPP is
(technically) working, and you need to look elsewhere for your problem.
Note: The server is functioning as a dialup
router. Information about routing is available on the IP address and Routing page.
This is usually (again) because of an error in a configuration file,
usually an error in either the phone number or the password. Detailed debugging is again the starting point to
fix this.
Very rarely, problems arise due to some strange routing problem, usually
on the server. This is extremely difficult to trace down and fix.
It seems to "magically" fix itself when some other, apparently unrelated,
change is made to the network or server configuration.
Once you can /usr/etc/ping
the server, PPP is working. All
remaining problems are really routing or application issues, although PPP
can help with them.
An exception is an inconsistency in the way that VJ-header compression
is (or isn't) negotiated. Both sides must agree on this, or
ping will work, but any TCP application
(telnet, netscape, etc) will fail. If you suspect
this problem, try adding -vj_comp to /etc/ppp.conf
and restart ppp. Remember to take it out if it doesn't fix anything!
If you can ping the server, but can't get to any other machines, you are
having a routing problem. There are a few cases to consider:
- Your machine is on the same network as the server -- you are doing
Proxy ARP routing.
- You need the server to proxy for your machine when
ARP requests are made. Instructions are located at the proxy ARP routing
page.
- Your machine is on a different IP network than the server -- you are
probably doing host routing.
- In many cases this is a simpler configuration, but there are also
some nasty pitfalls. Instructions are located at the separate PPPnet routing
issues page. One common pitfall is that some routers ignore the
host routes, or need to be specially configured to recognize them.
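If you are not sure which of the two cases applies to you, the question is simply whether your machine's address falls inside the server's IP network. A minimal sketch using Python's standard ipaddress module follows; the function name, the returned labels, and the addresses (taken from the documentation-only ranges) are all made up for illustration.

```python
import ipaddress


def routing_style(client_ip, server_ip, netmask):
    """Return which routing case applies to a dialup client.

    'proxy-arp'  : the client address is inside the server's network
    'host-route' : the client address is on a different network
    """
    # strict=False lets us pass a host address plus netmask and get the
    # enclosing network back.
    network = ipaddress.ip_network(f"{server_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(client_ip) in network:
        return "proxy-arp"
    return "host-route"


if __name__ == "__main__":
    print(routing_style("192.0.2.77", "192.0.2.1", "255.255.255.0"))
    print(routing_style("198.51.100.5", "192.0.2.1", "255.255.255.0"))
```

This only tells you which set of instructions to read. It does not check that the server or the routers are actually configured that way.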
If you can reach machines on the server's network but nothing beyond it,
there are again a few cases to consider:
- You are on the same IP network and using Proxy ARP routing.
- The most probable cause is that the router to the rest of the
network has an incorrect or stale ARP entry. The proper solution is
to flush the ARP cache on the router, reduce the ARP timeout to
something reasonable (say, 10 minutes), and stop whatever it is that
is creating the incorrect ARP entry. Additional information is
available on the proxy
ARP routing page.
- You are on a separate PPP net, probably using host-routes.
- Problems routing this way are usually the result of a router that
messes up host-routes. Everybody on your network may be messed up,
and there are certainly black holes, where packets to/from
your net just disappear without a trace. Additional information is
available on the separate
PPPnet routing issues page.
- Since most modems are set up to let you hear them dial, this is a
good starting point. If you can hear the modem dialing, then
go to the turn on debugging section,
otherwise you need to check for more fundamental configuration
issues.
- Do a fundamental sanity check of the hardware: make sure that the
modem is plugged in, turned on, and the modem cable is securely
connected to both the computer and modem. Double check that it is
plugged into the same serial port that you configured it for --
don't set up the configuration to use /dev/ttyf2 and then plug the
modem cable into port 1 on the back of the computer!
- Make sure that you are plugged into the correct jack on the modem.
Most modems have two phone jacks on the back, one goes to the wall
jack (usually labelled "line" or "wall") and the other is for
plugging a telephone into the same phone line (usually labelled
"phone"). This is an easy error if you deal with more than one
modem, because each modem brand has a different jack on the left.
- Plug a phone into the second jack on the modem if it has one (or in
place of modem otherwise), and try dialing the phone number the
modem is supposed to dial. Be careful, because if the number is
correct, you'll get a modem squeal in your ear. If you don't, then
either the number is incorrect, there is a problem with the phone
line, or the modem isn't set up correctly on the other end.
- Check your cable. This is one of the most common sources of
problems, partly due to the unfortunate (in my mind) decision of SGI
to use the same connector as the Macintosh. The problem is that
the pinout is nearly the same, enough so that things seem to sort of
work. See the modem cable
section for more details.
- Although not strictly hardware, it is worth checking that you are
using the same speed (19200 or 38400 baud) everywhere. Specifically
check the Devices and Systems files. Verify that you gave the same
speed option to the fix-* configuration script. The modems are
configured to use a fixed speed and many won't even talk at a
different speed. (Any experience you may have had with "autobaud"
modems is misleading.) Note that the
speed used in the configuration files is the
serial speed, not the modem's
modulation speed (the one they advertise: "28,800bps!").
- A final check if everything else looks good is to actually try to
type at the modem. Modems and serial ports, and even cables, have
been known to die! Go to the cu
instruction page for details on how to become intimate with your
modem. It helps to find your modem manual...
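The speed check mentioned above can be done by eye, but a small script makes the idea concrete. The sketch below assumes the speed is the fourth whitespace-separated field in both files, as it is in the Systems example on this page (remhost Any ACU 38400 ...); the Devices line in the usage example is invented for illustration, so check it against your own file before trusting the result. The function names are hypothetical.

```python
def speeds(lines, field=3):
    """Collect the numeric speeds found in the given field of each entry.

    Blank lines and '#' comment lines are skipped, as are entries whose
    speed field is not a plain number.
    """
    found = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) > field and parts[field].isdigit():
            found.add(int(parts[field]))
    return found


def systems_speeds_missing_from_devices(systems_lines, devices_lines):
    """Speeds requested in Systems for which no Devices entry exists."""
    return speeds(systems_lines) - speeds(devices_lines)


if __name__ == "__main__":
    systems = ['remhost Any ACU 38400 5556000 "" name: abuser ssword: xxx PPP']
    devices = ["ACU ttyf2 null 19200 212 x t2500"]   # made-up sample entry
    print(systems_speeds_missing_from_devices(systems, devices))
```

An empty result means every speed used in Systems also appears in Devices. It says nothing about what speed the modem itself was configured for by the fix-* script.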
Modem cables seem to cause the most problems.
Standard Mac modem cables don't work. Many of the
so-called high speed modem cables, while technically not correct
(they often seem to be missing RTS), seem to work well enough.
This is the first place to check if your connection is flaky.
Full information is found on the modem
configuration page.
(The following description is over-simplified) The TCP window size is how
much data can be sent on ahead by the sender without waiting for it to be
acknowledged ("ACKed") by the receiver. IRIX uses a very large TCP window
size (60kB) by default. Many dialup routers have buffers that are much
smaller than this, so the part of the data in the window being sent that is
larger than the buffer gets discarded. When future packets have the wrong
data position in them, TCP waits for a bit (about 30 seconds) for the
missing packets to arrive, before sending a packet to the sender asking for
it to be resent. This will continue until the remaining data is less than
the remote's buffer size.
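A little arithmetic shows why the default hurts on a dialup link. The sketch below is as over-simplified as the description above: the 16 kB router buffer is a made-up figure (check your own router), and the timing ignores compression, framing, and protocol overhead. The function names are hypothetical.

```python
def bytes_dropped(window_bytes, router_buffer_bytes):
    """Data sent ahead that the router has no room to hold."""
    return max(0, window_bytes - router_buffer_bytes)


def seconds_to_send(nbytes, line_bps):
    """Rough time to push nbytes through a link running at line_bps."""
    return nbytes * 8 / line_bps


if __name__ == "__main__":
    window = 60 * 1024          # IRIX default TCP window
    small_window = 8 * 1024     # the reduced value suggested on this page
    router_buffer = 16 * 1024   # ASSUMPTION: illustrative only

    print(bytes_dropped(window, router_buffer))        # overflow with default
    print(bytes_dropped(small_window, router_buffer))  # overflow after the fix
    print(round(seconds_to_send(window, 28800), 1))    # seconds for one window
```

With these made-up numbers, most of a full 60 kB window overflows the router's buffer, while an 8 kB window fits with room to spare.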
The fix is rather easy, but you are editing your kernel configuration,
so be very careful! As with any other system configuration change,
you have to be root (the "Super User"). First, change directory
to the sysgen area (these instructions assume you are running IRIX-5.2 or
later):
cd /var/sysgen/master.d
then, using an editor you are comfortable with (like jot,
vi or emacs, but not a "word
processor"), edit the file bsd. Change the following lines (they
are in 2 different places in the file) from:
unsigned long tcp_sendspace = 60 * 1024; /* must be < 512K */
unsigned long tcp_recvspace = 60 * 1024; /* must be < 512K */
unsigned long udp_sendspace = 60 * 1024; /* must be < (64K - 28) */
to:
unsigned long tcp_sendspace = 8 * 1024; /* must be < 512K */
unsigned long tcp_recvspace = 8 * 1024; /* must be < 512K */
unsigned long udp_sendspace = 8 * 1024; /* must be < (64K - 28) */
After you have saved your changes, you will need to build a new kernel and
reboot (saving the old kernel first, just in case). Don't reboot if you
get an error, though some warnings are OK:
cd /
ln unix unix.BAK
autoconfig -f
reboot
A series of Indys was built and shipped with "bad" serial ports
(actually, it was a fix to a MIDI problem that broke HW flow control: a
mis-match of chip and motherboard revisions). A software bug in IRIX-5.2
masked the hardware bug. The SW bug was fixed by patch151 and by
IRIX-5.3 and later. The symptoms are that /dev/ttyd* and
/dev/ttym* devices work, but /dev/ttyf* devices don't
(where * can be 1 or 2). If using the
ttym ("modem control") device instead of the ttyf
("flow control") device lets PPP dial the modem, then this is (possibly)
your problem, and you will need to have your motherboard replaced by SGI
hardware service. You can temporarily run with (e.g.) ttym2
instead of ttyf2, but you may get occasional errors due to data
overrun.
Final determination of whether this is indeed your problem, and the
resolution of it, is entirely up to your SGI service organization (how's
that for covering the different names it's known by around the world!).
Many people have set up one of the FAX packages, the most common of which is
HylaFAX (used to be called FlexFAX). They usually have a
program running on the serial port (it may be called faxd) that
answers the modem for FAX mode. Unfortunately, it doesn't emulate
/usr/lib/uucp/uugetty properly, so getting a PPP or SLIP
connection to work in dialin mode can be tricky, especially if you are
trying to do anything unusual.
The problem is in locking the serial port, so that only one process at a
time can use it. IRIX PPP and SLIP are designed to interoperate with
uugetty's port locking protocol. Part of this protocol is that
the lockfile contains the PID of the process locking the port.
PPP and SLIP can take over the lock if it is their own PID or their
parent's PID (PPID). The normal method of launching a login on a serial
port goes like this:
- uugetty gets the login ID, creates the lockfile,
and execs login
- login gets the password(s), and execs
the account's shell
This means that the PID that uugetty put in the lockfile
is the same as the PID of the account's shell (say, /usr/etc/ppp), so
the lock criterion is satisfied.
However, the account's shell might be a script:
- If the script does some stuff, and then execs
ppp, then all is still well, as ppp's PID is
still the one in the lock file.
- If the script does some stuff, starts ppp and waits for
it to finish, then does other stuff, all is still well, as
ppp's PPID is the one in the lockfile, and it still
thinks that it can take over the lock.
Things are different with faxd from any of the FAX packages.
They go off track at the first step -- they fork login, not
exec it, so the lockfile already contains login's
PPID. Therefore, you can't use a script as a shell unless it
execs ppp, since ppp can't deal with a
lockfile owned by its parent's parent (its "PPPID").
Unfortunately, in some releases of IRIX, PPP will only respect a lockfile
with its own PID in it. You can't run it in a script
from uugetty. You either have to use getty for
dialin only, or work around it so that you can exec ppp in the
script.
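The lock rule is easier to follow with concrete process IDs. The sketch below is a toy model of the rule as described above, not the actual IRIX code; the function name is hypothetical and every PID is made up. Remember that exec keeps the PID, while fork creates a child with a new PID whose PPID is the caller.

```python
def may_take_lock(lock_pid, pid, ppid, allow_parent=True):
    """Can a ppp process with this pid/ppid take over the port lock?

    allow_parent=False models the releases mentioned above that only
    accept a lockfile holding ppp's own PID.
    """
    if lock_pid == pid:
        return True
    return allow_parent and lock_pid == ppid


if __name__ == "__main__":
    # uugetty (PID 100) locks the port, execs login, which execs ppp.
    print(may_take_lock(lock_pid=100, pid=100, ppid=1))

    # uugetty -> login -> script (still PID 100) starts ppp as a child.
    print(may_take_lock(lock_pid=100, pid=101, ppid=100))

    # faxd (PID 200) locks the port and forks login (PID 201), which
    # execs ppp directly: the lock holds ppp's PPID.
    print(may_take_lock(lock_pid=200, pid=201, ppid=200))

    # faxd -> login -> script (PID 201) starts ppp as a child (PID 202):
    # the lock now belongs to ppp's grandparent, so ppp gives up.
    print(may_take_lock(lock_pid=200, pid=202, ppid=201))

    # A release that insists on its own PID, script started from uugetty.
    print(may_take_lock(lock_pid=100, pid=101, ppid=100, allow_parent=False))
```

The first three cases succeed and the last two fail, which matches the behaviour described in the text.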
Some strategies that allow you to work within the constraints are detailed
in the Adding SLIP and PPP clients page.
This is caused by dialing into a terminal server with a very long and
verbose message. It is over-running the look-ahead buffer in the
chat-script module common to PPP, SLIP, and UUCICO. This normally never
occurs when using only PAP or CHAP logins. Unfortunately, the kind of
places where this occurs don't usually allow you to use PAP or CHAP. There
are 2 solutions:
- Ask your server to shorten the prompt message, or change to newer
hardware allowing PAP or CHAP logins for PPP.
- The other option is to make intermediate waits for common words in
the message, or repeatedly wait for the terminating condition. A
specific example may look like:
remhost Any ACU 38400 5556000 "" \p\r\c name:-\c-name:-\c-name:--name: abuser ssword: xxx PPP
In other cases, you will have to be more creative, and choose words
that occur every 2 to 4 lines in the message, to be sure to not
overrun the look-ahead buffer. Something like:
remhost Any ACU 38400 5556000 "" \p\r\c this \c page \c too \c long \c name: abuser ssword: xxx PPP
It is vitally important that you end each "send" portion of the
chatscript with the \c character, or you will not get
what you expect!
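When choosing intermediate words, what matters is how much text goes by between one match and the next. The sketch below measures that for a captured login banner. It is only a planning aid: the function name is made up, the banner in the usage example is invented, and the size of the real look-ahead buffer is not stated here, so you have to judge for yourself what counts as "small enough".

```python
def largest_gap(banner, expect_words):
    """Return the most characters read between two consecutive matches.

    The words must occur in the banner in the order given; a ValueError
    is raised if one of them cannot be found after the previous match.
    """
    pos, worst = 0, 0
    for word in expect_words:
        hit = banner.find(word, pos)
        if hit < 0:
            raise ValueError(f"{word!r} not found after offset {pos}")
        end = hit + len(word)
        worst = max(worst, end - pos)   # text consumed to reach this match
        pos = end
    return worst


if __name__ == "__main__":
    # Invented banner: two long stretches of chatter before the prompt.
    banner = "this " + "x" * 50 + " page " + "y" * 50 + " name:"
    print(largest_gap(banner, ["name:"]))                  # one long wait
    print(largest_gap(banner, ["this", "page", "name:"]))  # shorter hops
```

Adding intermediate words roughly halves the longest stretch in this example. With a real banner, pick words until the largest gap is comfortably small.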
If you are connecting in PEP mode (you see CONNECT FAST when you
connect), then you need to fix your modem's configuration. This is only a
problem with the Telebit modems that do PEP. There are
two ways to do this.
- One is to connect to the modem using
cu or kermit, and change things manually.
- To do this, connect to the modem, but don't dial a
number! Then give the following command to the modem:
ats50=6&w
followed by the Enter key. Now your modem will connect in
V.32 mode, which is what you want for PPP or SLIP. For
the WorldBlazer (which does V.32bis), use
ats50=7&w.
- The other option is to re-run the configure script.
- You will need to be root ("superuser") to do this, just
like when you first set up your system. Give one of the following
commands (slightly modified from your configuration instructions):
/usr/lib/uucp/fix-telebit -o -m t2500 -c s111=0s50=6 2
/usr/lib/uucp/fix-telebit -o -m WB -c s111=0s50=6 -s 38400 2
Then when you next run PPP, things should work fine (if this is the
only problem, of course).
Hopefully there has been enough info here for you to figure out your
connection problem. This info is based on looking at the problem from the
client (dialing) end, because that is where most problems are discovered.
A random selection of potentially useful WWW pages:
I hope and intend that this documentation can help you with your PPP
connection problems. My other commitments (like work) permitting, I will
attempt to help you on issues not covered, or that you are unclear on.
Please make sure that you provide me a valid return email
address! (I won't try to fix it).
Scott Henry
<scotth@sgi.com>
Last modified: Mon Mar 10 17:31:54 1997