In the UML Distilled book, Fowler identifies four phases of the development process: inception, elaboration, construction, and transition. I've narrowed this section to the parts of those phases related to discovery, be it requirements discovery, discovery of use cases, or risk discovery. Fowler categorizes risks into requirements, skills, technological, and political. I'd like to add the risk of personal embarrassment. This section outlines my discovery process so far.
This project was supposed to have started more than a year ago. The idea was that there needed to be an easier way to process profiler data. Profiler data in particular must pass through several non-trivial steps to convert observed radar signals into quality-controlled, reliable meteorological measurements. Multiple methods with multiple parameterizations need to be tracked and inter-compared. Results from one or more computations become inputs to one or more further computations.
For example, for the Doppler beam swinging (DBS) system, a third-party program called the Profiler Operations Program (POP) records the radar signals. The POP data must then be converted (ingested) into a format more accessible to both users and user programs. Currently the format is netCDF. POP also generates wind and temperature measurements using its built-in algorithm. Alternative algorithms and analyses, currently in the form of user programs and scripts, use the radar data to generate their own wind and temperature measurements as well as QC measurements and visualizations. In a typical wind computation, radar returns are discretely transformed into spectra, which become moments, which then can be combined among directed beams to derive horizontal winds. Naturally, in practice things are not so straightforward.
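The last two steps of the wind computation described above can be sketched in C++ as follows. This is only an illustration under simplifying assumptions, not the POP algorithm or any existing profiler code: the type and function names (`Spectrum`, `Moments`, `computeMoments`, `horizontalComponent`) are hypothetical, the spectrum is assumed to be already noise-subtracted, and the beam geometry assumes one vertical beam plus one oblique beam tilted by a known zenith angle, with the same sign convention on both beams.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One Doppler spectrum for a single beam and range gate (hypothetical type).
struct Spectrum {
    std::vector<double> velocity;  // Doppler velocity of each bin, m/s
    std::vector<double> power;     // spectral power per bin, noise already removed
};

// The first three spectral moments.
struct Moments {
    double power;     // zeroth moment: total signal power
    double velocity;  // first moment: power-weighted mean radial velocity, m/s
    double width;     // square root of the second central moment, m/s
};

// Spectrum -> moments, using power-weighted sums over the bins.
Moments computeMoments(const Spectrum& s) {
    assert(s.velocity.size() == s.power.size());
    double p0 = 0.0;
    double p1 = 0.0;
    for (std::size_t i = 0; i < s.power.size(); ++i) {
        p0 += s.power[i];
        p1 += s.power[i] * s.velocity[i];
    }
    if (p0 <= 0.0) {
        return Moments{0.0, 0.0, 0.0};  // no signal in this gate
    }
    const double mean = p1 / p0;
    double p2 = 0.0;
    for (std::size_t i = 0; i < s.power.size(); ++i) {
        const double d = s.velocity[i] - mean;
        p2 += s.power[i] * d * d;
    }
    return Moments{p0, mean, std::sqrt(p2 / p0)};
}

// Moments from two beams -> one horizontal wind component.
// Assumes: obliqueRadial = horizontal * sin(zenith) + vertical * cos(zenith),
// where the vertical beam measures the vertical velocity directly.
double horizontalComponent(double obliqueRadial, double verticalRadial,
                           double zenithRad) {
    return (obliqueRadial - verticalRadial * std::cos(zenithRad)) /
           std::sin(zenithRad);
}
```

Even this toy version shows where parameterizations creep in: the noise removal, the choice of bins, and the beam geometry are all inputs that a real system would need to track alongside the results.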
There is a precedent in the ISS Data Processing Facility document from December 1994. I suppose it is the original inception document, from which sprouted investigations into Khoros and visual programming environments. The IDPF document includes some useful scenarios which are worth reading: /net/sssf2/profiler/doc/idpf_reqs.fm, or profiler/idpf/idpf_reqs.html. Following are some of the requirements, summarized and somewhat elaborated:
Ability to configure post-processing, execute that configuration in a batch mode, and produce intermediate checks of the processing.
Ability to conveniently develop and integrate unique or alternative analysis algorithms, such as directed or developed by scientists. Some of the algorithms or processing modules will require graphical displays. It should be possible to "publish" a module, such as in a library or as some pluggable component, so that others can use it in their own processing.
Integration of data from independent observing systems.
Data management, from raw instrument data to intermediate products to final output, along with a way to identify and share those products among other users and processing tasks.
The above list suggests several of the use cases I've imagined to this point, which are outlined in a later section.
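To make the first requirement in the list above more concrete, here is a minimal C++ sketch of a configured batch run that records an intermediate check after every stage. Nothing here comes from the IDPF document or from existing code: `Stage`, `Check`, `BatchResult`, and `runBatch` are hypothetical names, the data are reduced to a plain vector of doubles, and the "check" is just a count of the values each stage produced.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Data = std::vector<double>;

// One configured processing step: a name plus the function that performs it.
struct Stage {
    std::string name;
    std::function<Data(const Data&)> run;
};

// An intermediate check recorded after a stage has run.
struct Check {
    std::string stage;  // name of the stage that was checked
    std::size_t count;  // number of values the stage produced
};

struct BatchResult {
    Data output;                // final product of the last stage
    std::vector<Check> checks;  // one entry per stage, in execution order
};

// Execute the configured stages in order, without user interaction,
// feeding each stage's output to the next one.
BatchResult runBatch(const std::vector<Stage>& config, Data input) {
    BatchResult result{std::move(input), {}};
    for (const Stage& stage : config) {
        assert(stage.run);  // every configured stage must have a function
        result.output = stage.run(result.output);
        result.checks.push_back(Check{stage.name, result.output.size()});
    }
    return result;
}
```

A configuration is then just a `std::vector<Stage>`, which could equally be built from a file. A real check would probably be richer than a count (ranges, missing-value fractions, plots), but the shape of the loop would stay the same.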
During inception, I came up with some additional ideas for what I thought would be general requirements for the project:
Simplify and consolidate profiler data access into a useful and flexible API (C++) which isolates profiler software from any underlying file and database access, as well as providing better memory and I/O management. Provide a fundamental error and message logging facility for all programs to share, as well as layers for handling standard command-line arguments and environment variables. Make the API abstraction useful for both DBS and MAPR beam data.
Organize the profiler software tree to facilitate the sharing of code and to simplify the building and maintenance of the code.
Include thorough documentation with the new API.
Improve tracking (auditing) of data processing within the data themselves.
Make the netCDF file conventions more consistent so that processing software and tools are more interoperable.
Decide if it's worthwhile to design some generic interfaces for common processing operations, such as moments, winds, and so on, and extend the profiler API to allow plug-n-process, where applications can easily choose and change processing methods, perhaps even at run-time.
Decide if it's worthwhile to implement some sort of processing control framework, a way to flexibly configure and execute data processing, analysis, and visualization as a conceptual chain.
Improve documentation; provide online documentation.
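Two of the ideas above, plug-n-process and auditing within the data, can be sketched together in C++. This is an assumption-laden illustration rather than a proposed design: `AveragingMethod`, `registry`, `process`, and `AveragedProduct` are hypothetical names, and the mean and median are only stand-ins for real alternative algorithms. The point is that the method is chosen by name at run-time and the product carries a record of how it was made.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Generic interface for one processing operation (here: averaging samples).
class AveragingMethod {
public:
    virtual ~AveragingMethod() = default;
    virtual std::string name() const = 0;
    // Samples are taken by value so an implementation may reorder them.
    virtual double average(std::vector<double> samples) const = 0;
};

class MeanMethod : public AveragingMethod {
public:
    std::string name() const override { return "mean"; }
    double average(std::vector<double> s) const override {
        return std::accumulate(s.begin(), s.end(), 0.0) /
               static_cast<double>(s.size());
    }
};

class MedianMethod : public AveragingMethod {
public:
    std::string name() const override { return "median"; }
    double average(std::vector<double> s) const override {
        std::sort(s.begin(), s.end());
        const std::size_t n = s.size();
        return n % 2 ? s[n / 2] : 0.5 * (s[n / 2 - 1] + s[n / 2]);
    }
};

using Factory = std::function<std::unique_ptr<AveragingMethod>()>;

// Name -> factory table; "publishing" a method means adding an entry here.
const std::map<std::string, Factory>& registry() {
    static const std::map<std::string, Factory> table = {
        {"mean", [] { return std::unique_ptr<AveragingMethod>(new MeanMethod); }},
        {"median", [] { return std::unique_ptr<AveragingMethod>(new MedianMethod); }},
    };
    return table;
}

// A product that carries its own audit record.
struct AveragedProduct {
    double value;
    std::string history;  // how the value was produced
};

// Look up the method by name at run-time and apply it.
AveragedProduct process(const std::string& methodName,
                        const std::vector<double>& samples) {
    if (samples.empty()) {
        throw std::invalid_argument("no samples to average");
    }
    const auto it = registry().find(methodName);
    if (it == registry().end()) {
        throw std::invalid_argument("unknown method: " + methodName);
    }
    const std::unique_ptr<AveragingMethod> method = it->second();
    assert(method != nullptr);
    return AveragedProduct{method->average(samples),
                           "averaged with " + method->name()};
}
```

With this shape, an application could switch from `process("mean", samples)` to `process("median", samples)` through a command-line argument or a configuration entry, and the `history` field would tell later users which one was used.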
I'm thinking a few of the above are irrelevant at this point, or too specific. There's an obvious bias towards implementation details like the API and performance, perhaps an indication of the lack of any real analysis. I left them in for historical completeness.
Fowler makes a good point about not just documenting the chosen ideas, but also documenting why certain ideas are abandoned, so that no time is wasted discounting the same idea later or trying to remember the justification for abandoning the idea. So if I remove anything, I will make sure to give a reason.
Recently, it has become apparent that we should be able to generalize parts of the system so that other instruments besides the profiler can take advantage of the features which are not specific to the profiler, such as auditing and the processing model. Data processing is certainly not unique to profilers, and it would be nice to see if this project could have expanded applicability.
I feel I should add at this point that I too easily tend to generalize and expand a design. I'll add this risk to Fowler's list and call it the "overkill" risk, the risk of biting off more than can be chewed with the resources (especially time) available. Fowler does mention "schedule risk", with respect to underestimating the time required to fulfill a set of use cases. So along with requirements, I will try to identify the non-requirements, those pies in the sky which I will be thinking about but which I should not allow to influence the immediate work unnecessarily.
So far, the system being conceived has little or no mention of real-time requirements. Yes, it would be nice for the system to be useful on field systems, and more in-field data QC has been mentioned as an SSSF objective, but that won't be part of this baseline architecture. If someone else has other ideas, they need to be made known.
I am not making any distinction between on- and off-line data storage. Yes, some day it might be convenient for the data management to include support for offline data inventories, but not this particular day.
Commercial tools such as Splus, Excel, MATLAB, IDL, and PV-WAVE are used for a great deal of the processing and visualization of SSSF data. However, for the moment I don't know how to address the question of integrating these tools with the profiler data processing system, other than attempting to export data formats which these tools can access. Since each tool would need to be handled individually, data exporting will have to suffice for the baseline architecture.
I've been thinking about graphical user interfaces for the various tools, but I'm not sure that is a concrete requirement. It would be nice to add later, but perhaps it is not an immediate consideration relative to the more important requirements of data processing.
So far I don't think any of the use cases suggest that the system must be distributed. The data must be shared, but does that require that the data must be shared among machines beyond the usual remote file system capabilities?
As I think of more, or as more non-requirements are suggested, I'll add them here. And of course, non-requirements can be moved to the requirements list as deemed necessary.