Monitoring ISS Realtime Processing Through Email

All of the realtime data handling on the ATD side runs under the iss account, most of it on mead. The background scripts log their output somewhere, email it to the iss account, or both. Because the iss account is shared, anyone can monitor the email to verify that everything is working. The idea is to allow access to the iss email account through IMAP, so that anyone with the password can check on the iss email. Set up your email client with a new account, using imap.atd.ucar.edu as the IMAP server and iss as the user account.
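The same check can be scripted instead of done in a mail client. The following is only a minimal sketch using Python's standard imaplib module. It assumes the server accepts IMAP over SSL on the default port, which this document does not state, and the function names are hypothetical.

```python
import imaplib


def count_inbox_messages(conn):
    """Return the number of messages in the Inbox of a logged-in IMAP connection."""
    # select() returns a status string and a list holding the message count as bytes.
    status, data = conn.select("INBOX", readonly=True)
    if status != "OK":
        raise RuntimeError(f"could not select INBOX: {data!r}")
    return int(data[0])


def check_iss_inbox(password, host="imap.atd.ucar.edu", user="iss"):
    """Log in, count the Inbox messages, and log out again."""
    # Assumption: IMAP over SSL on the default port (993) is available.
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        return count_inbox_messages(conn)
    finally:
        conn.logout()
```

A count of zero from check_iss_inbox would suggest that nothing is waiting for attention, apart from the sounding emails described below.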

The iss account has a .procmailrc file under its home directory to automatically file away the routine email messages. The procmail processing is triggered through the .forward file in the iss home directory.

Note

Both the .forward and .procmailrc files must not be writable by anyone but iss; in other words, set the mode to 644.
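The permission requirement can be verified with a few lines of Python. This is a sketch only: the helper names are hypothetical, and the paths passed in are whatever locations the two files actually have in the iss home directory.

```python
import os
import stat


def writable_by_others(path):
    """Return True if group or other has write permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def restrict_to_owner_write(path):
    """Set the mode to 644: owner read/write, everyone else read-only."""
    os.chmod(path, 0o644)
```

For example, calling writable_by_others on each of the two files and then restrict_to_owner_write on any file that returns True brings both back to mode 644.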

The routine messages are delivered to the Mail/realtime folder. Any email messages with subjects recognized as errors or failures are specifically left in the main Inbox folder. Thus, when everything is operating normally, there will be no email messages in the Inbox (except for emails with sounding attachments), and the realtime folder will contain an hourly barrage of email messages triggered by each round of data transfers. Normally there will be three to four messages per site per data transfer. The first is the email from iss_accept_delivery with the subject iss2 data on mead for name2004. Then there are one or two messages from the zebra batch plotting, depending upon the time of day. These messages have the subject zplotd ok. Finally, the at batch job that ran the plotting emails its output, with the subject Output from "at" job.
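The filing logic described above can be illustrated in Python. This sketch is not the actual .procmailrc, whose rules are not reproduced here. The patterns are assumptions derived from the three subjects quoted above, and it simplifies by leaving any subject it does not recognize in the Inbox.

```python
import re

# Assumed patterns, based only on the routine subjects named in the text.
ROUTINE_PATTERNS = [
    re.compile(r"^iss\d* data on mead for \S+"),
    re.compile(r"^zplotd ok$"),
    re.compile(r'^Output from "at" job'),
]


def classify_subject(subject):
    """Return 'realtime' for routine subjects and 'inbox' for everything else."""
    subject = subject.strip()
    if any(pattern.search(subject) for pattern in ROUTINE_PATTERNS):
        return "realtime"
    return "inbox"
```

With these patterns, a subject such as zplotd ok is filed under realtime, while any other zplotd subject stays in the Inbox where it will be noticed.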

The sounding emails are purposely left in the Inbox folder so they can be checked for successful ingest and plotting. The convention is then to move those messages into the Mail/soundings folder.

Email messages from failed zplotd runs are not uncommon. Most of them can be safely ignored, since the transfers in the next hour will attempt another plot anyway. However, the email messages do contain useful information for diagnosis. All of the emails contain a URL for each of the generated plot images, so it is easy to check exactly which images that run produced. Also, the end of the email gives URLs for the log file, so you can browse directly to it to investigate the cause of the problem.
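When a failed run needs a closer look, the URLs can be pulled out of the message body with a short script. The sketch below assumes the URLs appear as plain http or https links in the body text; the exact layout of the zplotd emails is not shown in this document, and the function name is hypothetical.

```python
import re

# Matches http/https links up to the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(body):
    """Return all URLs in the message body, in order of appearance."""
    # Strip punctuation that commonly trails a URL at the end of a sentence.
    return [url.rstrip(".,;)") for url in URL_RE.findall(body)]
```

Since the log URLs come at the end of the email, they would be the last entries in the returned list.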

Recently, the most common problem, seen only once every few days, is that one of the zebra processes exits with an error message and causes one of the sets of displays to be skipped for plot generation. The error message looks like this:


Message handler disconnect: error 0

So far this problem has not been judged worthy of further investigation.
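If the frequency of this error ever needs to be tracked, the message is easy to detect in an email body or log text. The following sketch assumes the message appears on a line of its own, exactly in the form shown above; the function name is hypothetical.

```python
import re

# Matches the disconnect message on its own line and captures the error code.
DISCONNECT_RE = re.compile(
    r"^Message handler disconnect: error (\d+)\s*$", re.MULTILINE
)


def find_disconnects(text):
    """Return the error code of each disconnect message found in text."""
    return [int(code) for code in DISCONNECT_RE.findall(text)]
```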