| Graphics file: X Window Dump |
Two new data variables are now available from the ADAM:
    blocked.cosmos    number of blocked socket writes per second
    lost.cosmos       number of lost samples per second
These can be displayed by cockpit and xstrip. They are also
averaged in the covar files.
From the attached plot, you can see that network jams are occurring roughly
every 5 minutes, but not exactly. They occur in groups of 6: the spikes
within a group are spaced 4 minutes 40 seconds apart (five intervals), and
then there is a longer gap of about 6 minutes 30 seconds before the next
group, so that the entire group takes almost exactly 30 minutes.
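As a quick check of that arithmetic, here is a minimal Python sketch. The variable names are made up for illustration, and the gap values are just the approximate ones read off the plot.

```python
# Approximate spike spacing read from the plot (values from this entry).
SHORT_GAP_S = 4 * 60 + 40   # 4 min 40 s between spikes within a group
LONG_GAP_S = 6 * 60 + 30    # about 6 min 30 s before the next group starts
SPIKES_PER_GROUP = 6

# Six spikes give five short gaps; one long gap then closes the cycle.
group_period_s = (SPIKES_PER_GROUP - 1) * SHORT_GAP_S + LONG_GAP_S
mean_gap_s = group_period_s / SPIKES_PER_GROUP

print(divmod(group_period_s, 60))   # (minutes, seconds) for one full group
print(round(mean_gap_s, 1))         # average spacing in seconds
```

The group comes out at 29 min 50 s and the average spacing at about 298 s, which is why the jams look like a 5-minute cycle at first glance.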
I made this plot from Splus:
> fun.plot.prep(c("blocked.cosmos","lost.cosmos"),1996,191)
(Day 191 is Jul 9).
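For anyone checking the day-of-year conversion outside Splus, a small Python sketch follows. The helper name day_of_year_to_date is hypothetical and is not part of fun.plot.prep or any of the aster tools.

```python
from datetime import date, timedelta


def day_of_year_to_date(year: int, day_of_year: int) -> date:
    """Convert a 1-based day of year to a calendar date."""
    # Day 1 is Jan 1, so add (day_of_year - 1) days to Jan 1.
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


print(day_of_year_to_date(1996, 191))  # 1996-07-09 (1996 is a leap year)
```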
Since they are not spaced exactly 5 minutes apart, it does not appear
to be timed with the pam polling process. To further exclude pam
as a suspect, John suggested that I shut down eve_rf on cocklebur.
I shut it down from 00:21 to 00:34 on Jul 10. The blockages
still occurred, so pam is off the hook.
Could it be a profiler?
A useful diagnostic is to display "blocked.cosmos" with xstrip, and set
the options->chartwidth to 3000, which results in a grid line every 10
minutes.
223: ADAM/NETWORK, Site ASTER, Wed 10-Jul-1996 15:02:05 GMT, xstrip of adam network jams
| Graphics file: X Window Dump |
Here is a window dump of an xstrip plot of blocked.cosmos and lost.cosmos.
Press "Graphics Viewer" to see it.
- 397: UNIX, Site ASTER, Thu 01-Aug-1996 14:58:47 GMT, aster rebooted yesterday
We forgot to note that we rebooted aster itself yesterday (and thus
also cosmos) because the serial port was "locked" and wouldn't let
the dp process connect to the outside world over the modem.
(We had tried restarting dp, but even kermit couldn't connect to the
port.) Rebooting fixed the problem.
- 419: ADAM/NETWORK, Site ASTER, Mon 05-Aug-1996 21:45:14 GMT, Cosmos yoyo up and down
Cosmos has been up and down several times today. The latest outage was at 21:30.
Had to go out and reset the ADAM. After the other outages it rebooted automatically.
- 428: ADAM/NETWORK, Site ASTER, Wed 07-Aug-1996 19:05:44 GMT, Changes to ingest & adam code
On Monday, August 5th, these changes were made to the aster system:
    ingest: increased the no-activity timeout from 2 minutes to 5 minutes.
    sync code on matrix: increased the sample buffer from 16 * 4096 to
    24 * 4096 bytes.
Ingest was rebuilt, installed and restarted. The matrix code
was rebuilt. Since the adam was conveniently crashing every hour or so,
I just let it load the new code and spawn a new ingest on its next reboot,
which happened at 21:36 on Aug 5th.
It has been up since then, so perhaps these changes helped.
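For reference, the sizes involved in those two changes work out as follows. This is only a Python sketch of the arithmetic; the constant names are made up for illustration and do not come from the ingest or matrix source.

```python
# Sample buffer on the matrix: 16 * 4096 -> 24 * 4096 bytes (from this entry).
UNIT_BYTES = 4096                    # illustrative name for the 4096 factor
old_buffer_bytes = 16 * UNIT_BYTES
new_buffer_bytes = 24 * UNIT_BYTES

# Ingest no-activity timeout: 2 minutes -> 5 minutes (from this entry).
old_timeout_s = 2 * 60
new_timeout_s = 5 * 60

print(old_buffer_bytes, new_buffer_bytes)     # buffer sizes in bytes
print(new_buffer_bytes / old_buffer_bytes)    # growth factor of the buffer
print(old_timeout_s, new_timeout_s)           # timeouts in seconds
```

The buffer grows from 65536 to 98304 bytes, a factor of 1.5, and the timeout from 120 to 300 seconds.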