The ISS sites are configured to transfer data every hour, at 10 minutes past the hour. Each transfer uploads a gzip-compressed tar file (extension .tar.gz) through a CGI script on the ATD web server. The CGI script is installed at /etc/httpd/cgi-bin/iss/iss-data.cgi on linus. The version-controlled copy of this script is in the /iss CVS repository under iss/cgi.
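On the site side, an upload needs two things: the .tar.gz archive and its MD5 checksum. The following is a minimal Python sketch of that preparation step. It is not the actual datasend script, and the helper names make_upload_archive and md5_of_file are hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path


def make_upload_archive(src_dir: Path, archive: Path) -> str:
    """Pack src_dir into a gzip-compressed tar file and return its MD5 hex digest.

    Hypothetical helper: the real site-side script (/iss/etc/init.d/datasend)
    may build its archive differently.
    """
    with tarfile.open(archive, "w:gz") as tar:
        # Store members relative to src_dir so they unpack cleanly on the server.
        tar.add(src_dir, arcname=".")
    return md5_of_file(archive)


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Compute the MD5 digest of a file without reading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest is computed over the finished archive, because that is exactly the byte stream the server will checksum after its copy.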
The CGI script is a simple Python script that accepts several form parameters: the file itself, the site name, and an MD5 checksum of the file. The site name must match one of the expected site names: iss2, iss3, or iss4. The script copies the data file into the incoming directory for that site. Once the copy is complete, the script computes the MD5 checksum of the uploaded file and compares it against the one submitted in the form; the upload is not considered complete unless the checksums match. If the upload completes successfully, the CGI script returns a web page with a status message that includes the words "uploaded successfully". The data transfer script at the site (/iss/etc/init.d/datasend) does not consider the transfer successful until it sees those words in the response.
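The real iss-data.cgi is not reproduced here. As a minimal sketch of the same acceptance logic, the Python below assumes the form has already been parsed into a site name, a filename, a file object, and the submitted checksum. The function names are hypothetical, and deleting the copy on a checksum mismatch is an assumption, not documented behavior.

```python
import hashlib
import shutil
from pathlib import Path
from typing import BinaryIO

# Site names accepted by the CGI script, per the text above.
EXPECTED_SITES = {"iss2", "iss3", "iss4"}
SUCCESS_WORDS = "uploaded successfully"


def accept_upload(incoming_root: Path, site: str, filename: str,
                  data: BinaryIO, submitted_md5: str) -> str:
    """Copy an uploaded file into the site's incoming directory and verify it.

    Returns the status message sent back to the uploading site.
    """
    if site not in EXPECTED_SITES:
        return f"error: unknown site {site!r}"
    dest = incoming_root / site / Path(filename).name  # drop any client-side path
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(dest, "wb") as out:
        shutil.copyfileobj(data, out)
    # Checksum the copy on disk, so a bad write is caught as well as a bad upload.
    actual = hashlib.md5(dest.read_bytes()).hexdigest()
    if actual != submitted_md5.strip().lower():
        dest.unlink()  # assumption: discard the bad copy so the poller never sees it
        return f"error: checksum mismatch for {dest.name}"
    return f"{dest.name} {SUCCESS_WORDS}"


def transfer_succeeded(response_body: str) -> bool:
    """Site-side check: the transfer counts only if the magic words appear."""
    return SUCCESS_WORDS in response_body
```

Keying success on a fixed phrase keeps the site-side check trivial: any error page, proxy failure, or truncated response simply lacks the words and is treated as a failed transfer.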
The incoming directories are under the home directory of the iss account, in /h/atd/iss/incoming, with one subdirectory per site (for example, /h/atd/iss/incoming/iss2).
A cron job running on mead polls the incoming directories for new data files. This is the relevant line from the crontab file:
5 * * * * /h/atd/iss/bin/iss_poll_incoming name2004 iss2 iss3 iss4 < /dev/null > /dev/null 2>& 1 &
This line starts the iss_poll_incoming script at 5 minutes past every hour. Exactly one instance of this script should be running on mead at any one time, so each instance runs only during the hour in which it was started. If an instance should ever crash, a new one is started in its place the following hour. Normally, the script does not crash; it simply exits as soon as it notices that its hour has passed.
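The "run only during my hour" rule is what keeps successive cron-started instances from overlapping. Here is a minimal Python sketch of that rule, assuming a hypothetical poll_once callback and a one-minute polling interval (the real interval is not documented here). The clock and sleep functions are injectable so the logic can be exercised without waiting.

```python
import time
from datetime import datetime
from typing import Callable


def same_hour(started: datetime, now: datetime) -> bool:
    """True while `now` falls in the same clock hour (and day) as `started`."""
    top = dict(minute=0, second=0, microsecond=0)
    return started.replace(**top) == now.replace(**top)


def run_until_hour_ends(poll_once: Callable[[], None],
                        clock: Callable[[], datetime] = datetime.now,
                        sleep: Callable[[float], None] = time.sleep,
                        interval: float = 60) -> None:
    """Poll repeatedly, then exit once the starting hour has passed.

    The 60-second interval is an assumption for illustration only.
    """
    started = clock()
    while same_hour(started, clock()):
        poll_once()
        sleep(interval)
```

Because the instance started at hh:05 gives up by the next hh:00, it has already exited when cron launches its successor at hh:05.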
The iss_poll_incoming script is responsible for detecting new data files and processing them sequentially, one at a time, rather than processing all three sites concurrently and bogging down mead. Each data file found is handled by running the script /h/atd/iss/bin/iss_accept_delivery. The iss_poll_incoming script has a built-in two-minute delay between successive calls to iss_accept_delivery, so the processing of the site data files is spread out over several minutes.
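The sequential handoff with a fixed pause can be sketched in Python as below. The handler is a stand-in for running iss_accept_delivery (its exact argument order is not shown here), and the function name handle_sequentially is hypothetical.

```python
import time
from pathlib import Path
from typing import Callable, Iterable

DELAY_SECONDS = 120  # the two-minute gap described above


def handle_sequentially(files: Iterable[Path],
                        handler: Callable[[Path], None],
                        delay: float = DELAY_SECONDS,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `handler` on each file in turn, pausing between successive calls.

    Only one handler runs at a time, so the sites are never processed
    concurrently. Returns the number of files handled.
    """
    count = 0
    for i, path in enumerate(files):
        if i:
            sleep(delay)  # no pause before the first file, one before each later file
        handler(path)
        count += 1
    return count
```

With three sites each delivering one file, the last handoff starts about four minutes after the first.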
Before handing a data file off to iss_accept_delivery, the iss_poll_incoming script renames it with the prefix polled-. This prevents the file from being picked up by later or errant instances of iss_poll_incoming and processed more than once, in case the handling fails, stalls, or just takes an unexpectedly long time. As a consequence, if iss_accept_delivery fails, data files with the polled- prefix may be left in a site's incoming directory.
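Renaming is a convenient way to claim a file, because a rename within one filesystem is atomic: if two pollers race for the same file, only one rename succeeds. The following Python sketch shows the claim step plus the two directory listings implied above (new uploads, and claimed files left behind). The function names are hypothetical, and matching on *.tar.gz assumes uploads always carry that extension.

```python
import os
from pathlib import Path
from typing import List, Optional

PREFIX = "polled-"


def claim(path: Path) -> Optional[Path]:
    """Rename an incoming file with the polled- prefix before handing it off.

    Returns the new path, or None if another poller already renamed it.
    """
    claimed = path.with_name(PREFIX + path.name)
    try:
        os.rename(path, claimed)  # atomic within a single filesystem
    except FileNotFoundError:
        return None  # lost the race: the file was already claimed
    return claimed


def unclaimed(incoming: Path) -> List[Path]:
    """New uploads: .tar.gz files not yet carrying the polled- prefix."""
    return sorted(p for p in incoming.glob("*.tar.gz") if not p.name.startswith(PREFIX))


def leftovers(incoming: Path) -> List[Path]:
    """Claimed files still sitting in the incoming directory, e.g. after a failure."""
    return sorted(incoming.glob(PREFIX + "*"))
```

Files returned by leftovers() are the ones worth inspecting by hand, since the poller deliberately ignores them.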
The iss_accept_delivery script unpacks the data within an uploaded data file and triggers further processing on the new data. Given the project name, site name, and file name as command-line parameters, the script unzips and extracts the data files into the site's data directory. The data directory is found by following the links in the project's configuration directory, such as /h/atd/iss/project/name2004. See the section called “ISS Project Configuration Directories” for details.
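The extraction step can be sketched in Python as follows. The link name "data" and the layout <data link>/<site> are assumptions made for illustration; the real names come from the project configuration directory. The sketch uses tarfile's "data" extraction filter when the running Python provides it, which rejects absolute paths and other unsafe members.

```python
import tarfile
from pathlib import Path
from typing import List


def site_data_dir(project_dir: Path, site: str, data_link: str = "data") -> Path:
    """Follow the project's data link to the site's data directory.

    The link name 'data' and the per-site subdirectory are hypothetical.
    """
    return (project_dir / data_link).resolve() / site


def unpack(archive: Path, dest: Path) -> List[str]:
    """Unzip and extract an uploaded .tar.gz into dest; return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # refuse absolute paths, "..", devices
        else:
            tar.extractall(dest)  # older Pythons without extraction filters
    return names
```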
Once the archive file has been extracted, the script checks for an incoming script in the project's in.d directory. If the script exists, it is scheduled to run in the background with at. The iss_accept_delivery script then emails all of its output, along with some of the status details extracted from the tar file, to the iss account. Since the incoming script is submitted with at -m, its output is emailed separately to the iss account, with a subject similar to "iss2 data on mead for name2004".
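The scheduling step amounts to handing a one-line job to at. The following Python sketch only builds the command and job text, so it can be inspected without queuing anything. The argument order (filename, site, project) follows the comment block in the name2004 incoming script; the time specification "now" and the function name are assumptions.

```python
import shlex
from pathlib import Path
from typing import List, Optional, Tuple


def at_job_for_incoming(project_dir: Path, filename: str, site: str,
                        project: str) -> Optional[Tuple[List[str], str]]:
    """Build the `at -m now` invocation for the project's in.d/incoming script.

    Returns (argv, job_text), or None when the project has no incoming script.
    """
    script = project_dir / "in.d" / "incoming"
    if not script.exists():
        return None
    # Quote every word so odd filenames cannot break the shell job line.
    job = " ".join(shlex.quote(a) for a in (str(script), filename, site, project))
    # at reads the job from standard input; -m mails the user when the job finishes.
    return ["at", "-m", "now"], job + "\n"
```

Submitting the job would then be a matter of subprocess.run(argv, input=job, text=True, check=True).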
After the data file has been handled, it is moved into the done subdirectory of the site's incoming directory. For example, /h/atd/iss/incoming/iss2/done contains all of the data files that have already been handled for site iss2, so all of the filenames there carry the polled- prefix.
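As a small Python sketch of this final step, assuming a hypothetical helper name, the claimed file is moved into a done subdirectory next to it, and the directory is created on first use:

```python
import shutil
from pathlib import Path


def move_to_done(claimed: Path) -> Path:
    """Move a handled (polled-) file into the done/ subdirectory beside it."""
    done = claimed.parent / "done"
    done.mkdir(exist_ok=True)
    target = done / claimed.name  # keep the polled- name unchanged
    shutil.move(str(claimed), str(target))
    return target
```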
The done directories are not yet cleaned out automatically, so the disk quota for the iss user needs to be checked periodically to make sure the account does not run up against its disk limits and disrupt processing. Run this command to check:
mead:/h/atd/iss>quota -v
Disk quotas for iss (uid 11215):
Filesystem       usage    quota    limit  timeleft  files  quota  limit  timeleft
/home/esig           0  1700000  1900000                0  60000  70000
/home/atd       356560  1000000  1200000             6705  60000  70000
/h/guest             0   300000   320000                0  18000  20000
/net/win_sssf1
                     0  1500000  2000000                0  50000  60000
In the above example, the iss user has used about one third of its disk quota on /home/atd (356560 of 1000000 blocks).
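If this check were scripted, the output would need parsing, including the wrapped layout where a long filesystem name such as /net/win_sssf1 sits on its own line. The following Python sketch parses the sample shown above; the function names are hypothetical, and values are kept as raw block counts because the block size depends on the system.

```python
from typing import Dict, Tuple

SAMPLE = """\
Disk quotas for iss (uid 11215):
Filesystem usage quota limit timeleft files quota limit timeleft
/home/esig 0 1700000 1900000 0 60000 70000
/home/atd 356560 1000000 1200000 6705 60000 70000
/h/guest 0 300000 320000 0 18000 20000
/net/win_sssf1
0 1500000 2000000 0 50000 60000
"""


def parse_quota_v(text: str) -> Dict[str, Tuple[int, int, int]]:
    """Map filesystem -> (usage, quota, limit) from `quota -v` output."""
    result: Dict[str, Tuple[int, int, int]] = {}
    pending = None  # a long filesystem name whose numbers are on the next line
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Disk", "Filesystem"):
            continue  # skip blank lines and the two header lines
        if fields[0].startswith("/"):
            name, nums = fields[0], fields[1:]
            if not nums:
                pending = name
                continue
        elif pending is not None:
            name, nums, pending = pending, fields, None
        else:
            continue
        # quota marks an over-limit value with a trailing '*'; strip it.
        usage, quota, limit = (int(n.rstrip("*")) for n in nums[:3])
        result[name] = (usage, quota, limit)
    return result


def fraction_used(fs: Dict[str, Tuple[int, int, int]], name: str) -> float:
    """Usage as a fraction of the soft quota for one filesystem."""
    usage, quota, _ = fs[name]
    return usage / quota
```

For the sample, fraction_used(parse_quota_v(SAMPLE), "/home/atd") is 0.35656, the "about one third" noted above.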
Each ISS project gets its own directory under /h/atd/iss/project, which contains or links to the scripts and directories for data processing specific to that project. For example, the name2004 project uses the directory /h/atd/iss/project/name2004, which contains four entries:
A link to the top-level web directory for this project.
The zebra project configuration directory.
A directory containing various scripts for generating realtime products, such as the realtime plot images, the image summary pages, and the tklog web pages.
A link to the top-level data directory for the project.
The top-level data directory contains the data directories for each of the individual sites. It is also the Zebra datastore directory and holds the Zebra file database, such as the cache file Zebra.cache.mead. For name2004, the link points to www/realtime/data, which in turn is a link to /net/ftp/pub/archive/iss/name2004. This makes all of the data accessible in two ways: through the project's web directory (www/realtime/data) and through the FTP archive (/net/ftp/pub/archive/iss/name2004).
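Because the data directory is reached through a chain of links, it can help to see each hop rather than only the final target. The following Python sketch, with a hypothetical function name, follows symbolic links one at a time, resolving relative targets against the link's own directory as the filesystem does.

```python
import os
from pathlib import Path
from typing import List


def link_chain(path: Path, max_hops: int = 40) -> List[Path]:
    """Follow symbolic links one hop at a time and list every path visited."""
    chain = [path]
    current = path
    hops = 0
    while current.is_symlink():
        if hops == max_hops:
            raise RuntimeError(f"too many symbolic links starting at {path}")
        target = Path(os.readlink(current))
        # A relative target is interpreted relative to the directory holding the link.
        current = target if target.is_absolute() else current.parent / target
        chain.append(current)
        hops += 1
    return chain
```

For name2004 this would list the project's data link, then www/realtime/data, then /net/ftp/pub/archive/iss/name2004.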
The zebra project directory defines the settings for running a zebra datastore and display sessions, including an interactive session for visualizing the data and a batch session for generating plot images in realtime. See the sections “Zebra Batch Plotting” and “Zebra Display Session”.
The iss account keeps a CVS checkout of the zebra source tree under /h/atd/iss/zebra/zebra, and that source tree installs into /h/atd/iss/zebra. The zebra project configuration directory contains a link called top which usually points to /h/atd/iss/zebra, but the link can be changed easily to test the project configuration against other zebra installations.
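When switching top to another zebra installation, a safe way is to create the new link under a temporary name and rename it over the old one. That way, top is never missing, even briefly. Here is a minimal Python sketch; the function name and the ".new" temporary suffix are hypothetical choices.

```python
import os
from pathlib import Path


def repoint_symlink(link: Path, new_target: str) -> None:
    """Atomically switch an existing symlink (such as zebra's `top`) to new_target."""
    tmp = link.with_name(link.name + ".new")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()  # clear a leftover from an interrupted earlier switch
    os.symlink(new_target, tmp)
    # rename(2) replaces the old link itself (it does not follow it) in one step.
    os.replace(tmp, link)
```

Switching back to /h/atd/iss/zebra afterwards is the same call with the original target.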
The most important script in the in.d directory is the incoming script. The iss_accept_delivery script looks for this script there and, if it exists, runs it with the same command-line arguments that were passed to iss_accept_delivery, so the project name, site name, and data file being handled are all available to trigger further processing. For name2004, the incoming script looks like this:
#! /bin/sh
#
# Called by iss_accept_delivery with these arguments:
#
#  filename to process (eg. ds.iss1.312.070427.tar.gz)
#  iss_id (eg. iss1)
#  project name (eg. ace)

cd /h/atd/iss/project/name2004
in.d/plotnow $3 $2
in.d/summary $3 $2
in.d/tkloghtml $3 $2
All this does is update the latest realtime plots, the plot summary pages, and the tklog web pages. Note that none of these scripts actually uses the filename parameter. Most of them limit the update to a particular site if one is passed on the command line; otherwise they run the update for all of the sites. The incoming script and any of its subordinate scripts in in.d can also be run manually at any time, but they expect to be run from the project directory, e.g., /h/atd/iss/project/name2004.
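The in.d scripts are called as "project [site]" (for example, in.d/plotnow $3 $2). The site-or-all-sites convention can be sketched in Python as below. The real scripts may be written in another language and may validate differently; the site list here is taken from the name2004 crontab line, and the function name is hypothetical.

```python
from typing import List, Sequence

ALL_SITES = ["iss2", "iss3", "iss4"]  # the name2004 sites from the crontab line


def sites_to_update(argv: Sequence[str], all_sites: Sequence[str] = ALL_SITES) -> List[str]:
    """Interpret `project [site]`: one named site, or every site when none is given."""
    if not argv:
        raise SystemExit("usage: script project [site]")
    project, rest = argv[0], argv[1:]
    if rest:
        if rest[0] not in all_sites:
            raise SystemExit(f"unknown site {rest[0]!r} for {project}")
        return [rest[0]]
    return list(all_sites)
```

Called from the incoming script, the site argument is always present, so only that site is updated. Run by hand with just the project name, the same script refreshes every site.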
Here is the basic layout of the web directories for an ISS project, using the name2004 project as an example.