This appendix was written to help those who are interested in using RAF's archived data. RAF data sets from the late 1970s through 1993 in the now-obsolete GENPRO format are archived on the SCD Mass Store System.
The first GENPRO processing program (GENPRO-I) was written for the Control Data Corporation (CDC) 6600 and 7600 computer systems. These machines had a 60-bit word size. The data output (written to 1/2-inch magnetic tape at the time) consisted of a header record in CDC Display Code followed by binary data records. The header record described, in text form, the format of the binary data records that followed: it listed the variables in the order of their appearance in the data records, along with their rates, starting locations within a record and unpacking scale factors. The data records used CDC-standard 60-bit unsigned integers. To save space, GENPRO-I packed three 20-bit data values into each word. To keep an appropriate number of significant digits and to prevent overflow, each value was scaled before packing in three steps: an offset was added to prevent negative numbers, the sum was multiplied by a constant (usually 1000) to preserve significant digits, and the result was truncated to an unsigned integer. To restore these data, a user must read and decode the header, then use its information to read and unpack the binary data values. A special routine to unpack the data 20 bits at a time exists in the standard NCAR software libraries. (RAF no longer has software to handle GENPRO-I data.)
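The packing and scaling arithmetic described above can be illustrated with a minimal Python sketch. It is not the NCAR library routine: the function names are hypothetical, and the order of the three 20-bit fields within a word (highest-order bits first) is an assumption that should be checked against real data.

```python
from typing import Tuple

FIELD_BITS = 20                      # three 20-bit values per 60-bit word
FIELD_MASK = (1 << FIELD_BITS) - 1   # 0xFFFFF


def scale(value: float, factor: float, offset: float) -> int:
    """Add the offset, multiply by the factor, truncate to an integer."""
    return int((value + offset) * factor)


def unscale(raw: int, factor: float, offset: float) -> float:
    """Invert scale(): divide by the factor, then subtract the offset."""
    return raw / factor - offset


def pack_word(a: int, b: int, c: int) -> int:
    """Pack three 20-bit fields into one 60-bit integer.

    ASSUMPTION: the first value occupies the highest-order 20 bits.
    """
    for field in (a, b, c):
        if not 0 <= field <= FIELD_MASK:
            raise ValueError("field does not fit in 20 bits")
    return (a << 40) | (b << 20) | c


def unpack_word(word: int) -> Tuple[int, int, int]:
    """Split one 60-bit integer into its three 20-bit fields."""
    return (
        (word >> 40) & FIELD_MASK,
        (word >> 20) & FIELD_MASK,
        word & FIELD_MASK,
    )
```

Because a 20-bit field holds at most 1048575, a factor of 1000 limits the scaled range (value plus offset) to a little under 1048.6, which is the overflow condition that the scaling parameters had to respect.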
In 1983 the second-generation GENPRO processor (GENPRO-II) was developed for the then-new Cray machines. The new, revised GENPRO format, quite similar to the original, used an ASCII header written to a separate file followed by a binary data file written with unsigned 32-bit integers. The basic scaling method was kept with the exception that only one value was put into each 32-bit word, i.e., no packing was done.
A GENPRO-II header, the first of a paired-file data set, is written in ASCII, having 800-byte records that can be further subdivided into ten (10) 80-character "lines." The number of records in this header file varies depending upon the number of variables that are recorded in the binary data file which follows. (This number typically will be the same for all files in a research project's data set.) The last "line" in the header file consists of the characters " ENDHD" (between quotes) with the last record padded to 800 bytes, if needed.
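As a rough illustration of this layout, the following Python sketch splits the raw bytes of a header file into 80-character lines and stops at the end-of-header marker. The function name `header_lines` is hypothetical, and the sketch assumes the file has already been extracted from any blocking scheme into a plain byte stream.

```python
from typing import List

RECORD_BYTES = 800   # one header record
LINE_CHARS = 80      # ten "lines" per record
END_MARKER = " ENDHD"


def header_lines(raw: bytes) -> List[str]:
    """Return the header's 80-character lines, up to and including ENDHD."""
    lines: List[str] = []
    for rec_start in range(0, len(raw), RECORD_BYTES):
        record = raw[rec_start:rec_start + RECORD_BYTES]
        for line_start in range(0, len(record), LINE_CHARS):
            chunk = record[line_start:line_start + LINE_CHARS]
            line = chunk.decode("ascii", errors="replace")
            lines.append(line)
            # Anything after the marker is padding, so stop here.
            if line.startswith(END_MARKER):
                return lines
    return lines  # marker not found; caller may treat this as an error
```

Matching on the start of the line rather than the whole line avoids depending on how the remainder of the final line is padded, which the format description does not specify.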
The header file gives English instructions for decoding the data records in its accompanying binary data file, which follows. It has information such as the project's name, flight date and start and end times. Instructions for decoding the binary data follow that and are typically (though not always) divided into 3 parts:
The second file of a pair is a binary data file having a constant number of bytes per physical record, consisting of 32-bit, unsigned integers. A physical record is usually subdivided into a number of logical records, each of which constitutes one second of data for all the variables. (The binary record format typically will be the same for all files in a research project's data set.)
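The subdivision of a physical record can be sketched in Python as follows. The function name and the `logical_bytes` parameter are hypothetical; in practice the logical record length would be taken from the header file.

```python
from typing import List


def logical_records(physical: bytes, logical_bytes: int) -> List[bytes]:
    """Split one physical record into equal-sized logical records."""
    if logical_bytes <= 0 or len(physical) % logical_bytes != 0:
        raise ValueError("physical record is not a whole number of logical records")
    return [
        physical[start:start + logical_bytes]
        for start in range(0, len(physical), logical_bytes)
    ]
```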
The binary data are decoded according to the instructions in the associated header file. Variable descriptions in the header appear in the same order as the accompanying data. (A variable's starting position within a record is given in bits, not bytes.) A 32-bit integer is converted to a variable's value by dividing the decoded integer by the header's FACTOR value, then subtracting the header's TERM value. For a variable whose rate is higher than 1 sample per second (sps), its values are stored consecutively within the logical record.
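A minimal Python sketch of this conversion follows. Several details are assumptions rather than facts from the format description: the integers are taken to be big-endian, `start_bit` is treated as a zero-based offset from the start of the logical record (subtract 1 first if the header counts from 1), and each variable is assumed to start on a 32-bit boundary. The function name is hypothetical.

```python
import struct
from typing import List


def decode_variable(
    logical_record: bytes,
    start_bit: int,
    rate: int,
    factor: float,
    term: float,
) -> List[float]:
    """Decode one variable's samples from a one-second logical record."""
    if start_bit % 32 != 0:
        raise ValueError("sketch assumes 32-bit alignment")
    byte_offset = start_bit // 8  # header positions are in bits, not bytes
    # ASSUMPTION: big-endian unsigned 32-bit integers (">I").
    raw_values = struct.unpack_from(f">{rate}I", logical_record, byte_offset)
    # value = integer / FACTOR - TERM
    return [raw / factor - term for raw in raw_values]
```

For example, with FACTOR = 1000 and TERM = 100, a stored integer of 87500 decodes to 87500 / 1000 - 100 = -12.5.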
RAF's archived GENPRO data are kept on the SCD Mass Store System (MSS), most of them as COS-blocked bitfiles. (The Cray-developed COS blocking scheme preserves the record and file boundaries of the original data.) The document "Converting Cray-style datasets for use on non-Cray computers" describes available NCAR/SCD software that allows non-Cray computers to use COS-blocked files.
All but a few of the GENPRO-I data sets were recently copied and converted into COS-blocked bitfiles to make access easier. (A small number of the GENPRO-I data sets remain in TBM format, which was used by the former Ampex TMS-4 Terabit Memory System. These may not be viable, since SCD's conversion software no longer runs on any of SCD's supercomputers.)