Here are some thoughts about the broader issues which will affect the design of this facility. This is neither exhaustive nor complete; it is just meant as a starting point for a discussion of the design. Ultimately what we are able to build will be limited by our resources. Like the homeowner who is remodelling the house, we will define our desires first and then start removing rooms when we get the bid from the contractor!
An examination of a few types of user is a good way to start thinking about the requirements for the IDPF. There are a number of distinct ways in which this tool will be utilized, and each will have very different modes of interaction. The basic functionality has to support the needs of at least the following types of users:
These are the staff members who carry out the routine post processing of field-collected data sets. These users need a structure which allows configuration of the post processing procedures, and then the ability to execute them in batch mode. A capability for intermediate checks (tabular or graphical) is required.
The SSSF data group receives data from a project which operated 4 ISS sites. Each site supplied a QIC tape with CLASS data, an ANSI optical disk containing surface met data, and an optical disk from the profiler containing spectral data. The data from each instrument type needs to be carried through a sequence of formatting and quality control steps.
In this case, a data processing configuration would probably be already designed and available to the staff in a "turn key" form, although it might need some fine tuning. The technician would carry out the following steps:
The breaks between steps are provided simply to allow the process to be halted at one stage before proceeding on to the next. There may be (for whatever reason) a need to repeat a step, and the steps would be designed so that the preceding one could be repeated without needing to return to the very start of processing. It is also useful to have discrete steps defined for operations that require operator interaction, such as graphical viewing or data editing. You need to be able to initiate the latter types of activities on demand, rather than having them happen whenever the processing scheme happens to get to that point.
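The stepwise behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposed IDPF design: the names `Step`, `Pipeline`, `run_batch`, and `run_step`, the three example step functions, and the sample values are all hypothetical. The sketch shows saved per-step output, so that one step can be repeated without returning to the start, and interactive steps that are skipped in batch mode and run only on demand.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One processing stage (hypothetical structure, for illustration only)."""
    name: str
    run: Callable[[dict], dict]   # takes the preceding output, returns new output
    interactive: bool = False     # interactive steps are run only on demand


@dataclass
class Pipeline:
    steps: List[Step]
    raw_input: dict
    results: Dict[str, dict] = field(default_factory=dict)  # saved output per step

    def _input_for(self, index: int) -> dict:
        # Use the saved output of the nearest earlier step that has been run.
        for prev in reversed(self.steps[:index]):
            if prev.name in self.results:
                return self.results[prev.name]
        return self.raw_input

    def run_step(self, name: str) -> dict:
        """Run (or repeat) a single step from the saved output before it."""
        index = next(i for i, s in enumerate(self.steps) if s.name == name)
        output = self.steps[index].run(dict(self._input_for(index)))
        self.results[name] = output
        # Output of later steps is now stale, so discard it.
        for later in self.steps[index + 1:]:
            self.results.pop(later.name, None)
        return output

    def run_batch(self, start: Optional[str] = None) -> None:
        """Run the non-interactive steps in order, optionally from a given step."""
        names = [s.name for s in self.steps]
        first = names.index(start) if start else 0
        for step in self.steps[first:]:
            if not step.interactive:
                self.run_step(step.name)


# Stand-in step functions; real steps would format and quality-control field data.
def format_step(data: dict) -> dict:
    return {"values": [float(v) for v in data["values"]]}


def edit_step(data: dict) -> dict:
    # Stand-in for operator editing: drop the first value.
    return {"values": data["values"][1:]}


def qc_step(data: dict) -> dict:
    return {"values": [v for v in data["values"] if -100.0 <= v <= 100.0]}


pipeline = Pipeline(
    steps=[
        Step("format", format_step),
        Step("edit", edit_step, interactive=True),
        Step("qc", qc_step),
    ],
    raw_input={"values": ["1.5", "999", "20"]},
)

pipeline.run_batch()             # batch pass: format, then qc; edit is skipped
pipeline.run_step("edit")        # operator starts the editing step on demand
pipeline.run_batch(start="qc")   # repeat only qc, without redoing format
```

Keeping the output of each step is what makes a partial re-run possible; a real system would presumably store these intermediate products on disk rather than in memory.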
The scientists are typically developing analysis algorithms, or analyzing data sets in support of their research activities. In either case, they require an environment which makes it easy to manipulate and peruse data sets. In this type of usage, the scientist will often iterate over a process of modifying the analysis, running the data through it, and viewing the results. These users need a framework that supports importing, extracting, and organizing data sets; applying the analysis algorithms; and developing and using custom graphical displays. Frequently the scientist will be examining and combining data from completely independent observing systems.
A scientist is studying TOGA/COARE data in relation to the Madden-Julian Oscillation. She wants to examine a time series of CAPEs that are calculated from ISS CLASS soundings, by looking at a display of the time series and a display of the power spectrum of the time series. She also wants to examine individual soundings, and "knock out" obviously erroneous data segments which are invalidating the CAPE calculations. She will make a first pass and examine the displays mentioned, looking for wild results. If she notices areas where the results are suspicious, she will examine (and edit, if appropriate) the associated soundings, re-run the computations, and examine the graphical products. She will iterate through this process several times.
In another scenario, a scientist is experimenting with wind derivation techniques using the spaced antenna profiler. His basic data set is a complex time series of radar returns from the four receiver panels. These time series are subjected to a variety of parallel processing paths in order to compare winds computed by a variety of methods. He will be writing and modifying the processing algorithms, and will be creating a new output product, using the same input data, for each run. It is important to note that the output products can be identical in all appearances (time and date, variable names and dimensions, etc.); their only difference may be the method used to compute them. He then wants to make visual and quantitative comparisons between the methods. In addition, he will "instrument" some of the programs along the processing chain in order to understand and verify some part of the algorithm. He will be devising new graphics displays almost continuously, as he develops and refines his algorithms.
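One consequence of this scenario is that time and variable name alone cannot identify a product. A minimal sketch of one way to handle this, assuming the computation method is recorded as part of the product's identity, follows. The names `ProductKey` and `ProductStore`, the method labels, and the wind values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProductKey:
    """Identity of an output product (hypothetical structure)."""
    variable: str
    time: str
    method: str   # the only field that differs between otherwise identical products


class ProductStore:
    """In-memory stand-in for wherever output products would be kept."""

    def __init__(self) -> None:
        self._products: Dict[ProductKey, List[float]] = {}

    def put(self, key: ProductKey, values: List[float]) -> None:
        self._products[key] = list(values)

    def get(self, key: ProductKey) -> List[float]:
        return self._products[key]

    def methods_for(self, variable: str, time: str) -> List[str]:
        # List every method that produced this variable for this time.
        return sorted(
            k.method for k in self._products
            if k.variable == variable and k.time == time
        )

    def difference(self, variable: str, time: str,
                   method_a: str, method_b: str) -> List[float]:
        # Element-by-element difference between two methods' results.
        a = self.get(ProductKey(variable, time, method_a))
        b = self.get(ProductKey(variable, time, method_b))
        return [x - y for x, y in zip(a, b)]


store = ProductStore()
# Same variable and time, same input data, two illustrative method labels.
store.put(ProductKey("wind_speed", "1993-01-01T00:00", "method_a"), [5.0, 7.0])
store.put(ProductKey("wind_speed", "1993-01-01T00:00", "method_b"), [4.0, 7.5])

available = store.methods_for("wind_speed", "1993-01-01T00:00")
delta = store.difference("wind_speed", "1993-01-01T00:00", "method_a", "method_b")
```

With the method carried in the key, products from repeated runs do not overwrite one another, and the quantitative comparison between methods becomes a simple lookup and subtraction.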
The software engineers will use the system in much the same manner as the scientists and data processors. They will typically be building individual analysis modules that are used for post processing and analysis, and will be designing and testing IDPF configurations to support both.
Here are some ideas regarding useful features that the IDPF might provide. The following is a list of ideas that come to mind when thinking about IDPF users and their typical needs.
Here is a first cut at defining how the IDPF system might be broken down into named components or subsystems. It simply defines names for the first six of the requirements listed in the previous section.