5. Design

The current prototype is written in C++ and uses XML to read and store the data space as a single file. As an aside, I recommend using log4cpp and cppunit in C++ projects.

The design is based on a few fundamentals. Several years ago, object-oriented database research produced a manifesto of sorts about the minimal requirements that would distinguish any old DB as an OODB. Two of those requirements are that every object must have an identifier and that there must be a set of primitive types from which more complicated, user-defined objects can be built. That notion, some concepts from RDBMS, and a few other ideas led me to this design.

I start with some basic primitives from our data domain and build from there.

Entity. Most of the basic primitives are derived from an Entity class. An Entity has a name, a title, and a description, and it can contain other entities.
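To make the Entity concrete, here is a minimal C++ sketch, assuming a parent owns the entities it contains. The class layout and the member names `add`, `find`, and `size` are hypothetical illustrations, not the prototype's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the Entity base class: every Entity carries a
// name, a title, and a description, and may contain other entities.
// Member names are illustrative, not the prototype's API.
class Entity {
public:
    Entity(std::string name, std::string title, std::string description)
        : name_(std::move(name)), title_(std::move(title)),
          description_(std::move(description)) {}
    virtual ~Entity() = default;

    const std::string& name() const { return name_; }
    const std::string& title() const { return title_; }
    const std::string& description() const { return description_; }

    // Containment: the parent takes ownership of the child.
    Entity& add(std::unique_ptr<Entity> child) {
        children_.push_back(std::move(child));
        return *children_.back();
    }

    // Return the direct child with the given name, or nullptr if none.
    const Entity* find(const std::string& childName) const {
        for (const auto& c : children_)
            if (c->name() == childName) return c.get();
        return nullptr;
    }

    std::size_t size() const { return children_.size(); }

private:
    std::string name_, title_, description_;
    std::vector<std::unique_ptr<Entity>> children_;
};
```

The virtual destructor leaves room for the other primitives (Dimension, models, and so on) to derive from Entity, as the design intends.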

DataSpace. The DataSpace is a namespace for organizing, identifying, and locating entities. An Entity is bound to the DataSpace with a path name, like a directory path, so related entities can be grouped into DataSpace directories. The binding gives an Entity a path by which other entities can reference it. The path is meant to be global, so that data models can reference and inherit from other entities.
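A minimal sketch of path binding, assuming the DataSpace is simply an ordered map from absolute path names to entities. The `DataSpace` methods `bind`, `lookup`, and `list` are hypothetical, and `Entity` is reduced to a stand-in with only a name.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for an Entity; only the name matters here.
struct Entity {
    std::string name;
};

// Hypothetical DataSpace: binds entities to absolute, directory-like
// path names such as "/dimensions/radar/beamelevation".
class DataSpace {
public:
    // Bind an entity at `path`; paths must be absolute and unique.
    void bind(const std::string& path, std::shared_ptr<Entity> e) {
        if (path.empty() || path[0] != '/')
            throw std::invalid_argument("path must be absolute: " + path);
        if (!bindings_.emplace(path, std::move(e)).second)
            throw std::invalid_argument("path already bound: " + path);
    }

    // Resolve a path to its entity, or nullptr if nothing is bound there.
    std::shared_ptr<Entity> lookup(const std::string& path) const {
        auto it = bindings_.find(path);
        return it == bindings_.end() ? nullptr : it->second;
    }

    // List every path bound anywhere under a directory.
    std::vector<std::string> list(const std::string& dir) const {
        std::string prefix = (!dir.empty() && dir.back() == '/') ? dir : dir + "/";
        std::vector<std::string> out;
        // Keys are sorted, so one directory is one contiguous range.
        for (auto it = bindings_.lower_bound(prefix);
             it != bindings_.end() &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it)
            out.push_back(it->first);
        return out;
    }

private:
    std::map<std::string, std::shared_ptr<Entity>> bindings_;
};
```

Because the map is ordered, listing a DataSpace directory reduces to a prefix scan.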

Units. Units are first-class objects with a known syntax. The current implementation uses udunits. Wherever a numerical value is recorded, its units must be recorded too.
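The sketch below shows what "units travel with every value" might look like. The prototype uses udunits for parsing and conversion; to keep the example self-contained, a tiny hard-coded table of affine conversions stands in for udunits here. `Quantity`, `UnitDef`, and `convert` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <stdexcept>
#include <string>

// A recorded number never travels without its units.
struct Quantity {
    double value;
    std::string units;
};

// Toy stand-in for udunits: each known unit maps to (base unit, scale,
// offset) so that base_value = value * scale + offset. Real code would
// ask udunits for a converter instead of using this hard-coded table.
struct UnitDef {
    std::string base;
    double scale;
    double offset;
};

const std::map<std::string, UnitDef>& unitTable() {
    static const std::map<std::string, UnitDef> table = {
        {"K",    {"K", 1.0, 0.0}},
        {"degC", {"K", 1.0, 273.15}},
        {"m",    {"m", 1.0, 0.0}},
        {"km",   {"m", 1000.0, 0.0}},
    };
    return table;
}

// Convert q into `target` units; throws if either unit is unknown or the
// two units do not share a base (e.g. metres to kelvin).
Quantity convert(const Quantity& q, const std::string& target) {
    const auto& t = unitTable();
    auto from = t.find(q.units), to = t.find(target);
    if (from == t.end() || to == t.end())
        throw std::invalid_argument("unknown units");
    if (from->second.base != to->second.base)
        throw std::invalid_argument("incompatible units: " + q.units + " -> " + target);
    double base = q.value * from->second.scale + from->second.offset;
    return {(base - to->second.offset) / to->second.scale, target};
}
```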

5.1. Dimension

In ten words or fewer: a Dimension relates fields that measure the same physical quantity. Examples include time, temperature, pressure, latitude, longitude, azimuth, and elevation. More than one field may measure a Dimension, so associating a Dimension with a field indicates which other fields have compatible values. When Chris Burghart added derivations to Zebra, he introduced 'field types' as a way to specify generic variables, like temperature and relative humidity, in a derivation formula. If a field had the correct type, it could be discovered and used as input to the derivation. A Dimension is the same idea, except that it is a fundamental part of defining a field rather than an afterthought.
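As an illustration of discovery by Dimension, this sketch lets a derivation find its inputs by the Dimension they measure rather than by their dataset-specific names. The `Field` layout, the `findByDimension` and `canDeriveDewpoint` helpers, and the Dimension paths under `/dimensions/atmosphere/` are all hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical field descriptor: the Dimension is part of the field's
// definition, referenced by its global DataSpace name.
struct Field {
    std::string name;       // local name, arbitrary per dataset (e.g. "tdry")
    std::string dimension;  // global Dimension name
    std::string units;
};

// Find the first field in a dataset that measures the requested Dimension.
std::optional<Field> findByDimension(const std::vector<Field>& fields,
                                     const std::string& dimension) {
    for (const auto& f : fields)
        if (f.dimension == dimension) return f;
    return std::nullopt;
}

// Hypothetical input check for a dew point derivation: it needs some
// temperature field and some relative humidity field, whatever they are called.
bool canDeriveDewpoint(const std::vector<Field>& fields) {
    return findByDimension(fields, "/dimensions/atmosphere/temperature") &&
           findByDimension(fields, "/dimensions/atmosphere/relativehumidity");
}
```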

Dimensions are needed because Units are only part of the answer to compatibility between fields. For example, temperature and dewpoint both have units of temperature but represent different physical quantities.
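The sketch below makes the point concrete: a compatibility check has to compare Dimensions as well as units. `unitsConvertible` is a hypothetical stand-in for asking udunits whether two unit strings convert, and the Dimension names are made up for the example.

```cpp
#include <cassert>
#include <string>

struct Field {
    std::string name;
    std::string dimension;  // global Dimension name in the DataSpace
    std::string units;
};

// Hypothetical stand-in for a udunits convertibility check; it only knows
// that K and degC are interconvertible.
bool unitsConvertible(const std::string& a, const std::string& b) {
    auto isTemperature = [](const std::string& u) { return u == "K" || u == "degC"; };
    return a == b || (isTemperature(a) && isTemperature(b));
}

// Two fields are interchangeable only if they measure the same Dimension
// and their units can be converted; units alone are not enough.
bool compatible(const Field& a, const Field& b) {
    return a.dimension == b.dimension && unitsConvertible(a.units, b.units);
}
```

With this check, a temperature field and a dew point field both in `degC` pass the units test but still fail `compatible`, while two temperature fields in `degC` and `K` pass.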

The other key to a Dimension is that it must be universal to be consistent, so Dimensions are bound to the DataSpace with meaningful names that distinguish them from other Dimensions, like '/dimensions/radar/beamelevation'. A registry of field names would also solve the problem; instead, there is a registry of Dimensions, to which fields are related by the Dimension's global name.
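A minimal sketch of a Dimension registry, assuming Dimensions are keyed by their global DataSpace name and that a field resolves its Dimension once, when the field is defined. `DimensionRegistry`, `makeField`, and the member names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical Dimension entity, bound under a global name.
struct Dimension {
    std::string path;  // e.g. "/dimensions/radar/beamelevation"
    std::string description;
};

// Registry of Dimensions keyed by global name. Fields need not have
// globally unique names; they only need to name a registered Dimension.
class DimensionRegistry {
public:
    void add(const Dimension& d) {
        if (!dims_.emplace(d.path, d).second)
            throw std::invalid_argument("duplicate dimension: " + d.path);
    }
    const Dimension& get(const std::string& path) const {
        auto it = dims_.find(path);
        if (it == dims_.end())
            throw std::out_of_range("unregistered dimension: " + path);
        return it->second;
    }
    bool contains(const std::string& path) const { return dims_.count(path) != 0; }

private:
    std::map<std::string, Dimension> dims_;  // node-based: element addresses stay stable
};

// A field relates itself to a Dimension by the Dimension's global name.
// The pointer stays valid while the registry is alive.
struct Field {
    std::string name;
    const Dimension* dimension;
};

Field makeField(const DimensionRegistry& reg, std::string name, const std::string& dimPath) {
    return Field{std::move(name), &reg.get(dimPath)};  // throws if unregistered
}
```

Two fields with different local names, such as `EL` and `elevation`, resolve to the same registered Dimension, which is the compatibility signal the registry is meant to provide.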

5.2. Data Instances

A data model is a 'type' or a 'class'. It is a model for a dataset but does not actually store any data (except class data), just as a C++ or Java class declares typed members that hold values only once the class is instantiated. In this design, all models are built up from a few basic model elements.
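The class/instance analogy can be sketched directly. In this hypothetical C++ version, a `DataModel` lists typed members but stores nothing, and an `Instance` created from it holds one value per member and rejects values of the wrong type. The three primitive types and all names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Enumerator order must match the alternative order of Value below.
enum class Primitive { Int32, Float64, String };
using Value = std::variant<std::int32_t, double, std::string>;

struct MemberDecl {
    std::string name;
    Primitive type;
};

// The model: like a class, it describes members but holds no values.
struct DataModel {
    std::string name;
    std::vector<MemberDecl> members;
};

// An instance owns one value per declared member. The model must outlive it.
class Instance {
public:
    explicit Instance(const DataModel& m) : model_(m) {}

    void set(const std::string& member, Value v) {
        const MemberDecl& d = decl(member);
        if (static_cast<std::size_t>(d.type) != v.index())
            throw std::invalid_argument("type mismatch for " + member);
        values_[member] = std::move(v);
    }

    const Value& get(const std::string& member) const {
        auto it = values_.find(member);
        if (it == values_.end()) throw std::out_of_range("unset member: " + member);
        return it->second;
    }

private:
    const MemberDecl& decl(const std::string& member) const {
        for (const auto& d : model_.members)
            if (d.name == member) return d;
        throw std::out_of_range("no such member: " + member);
    }

    const DataModel& model_;
    std::map<std::string, Value> values_;
};
```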

There are two aspects to the data model design: the entities which represent the models, and the entities which implement instances of the models and actually store data. A data storage object is the instantiation of a data model. The software implementation of data storage was designed to be general, so it supports the known primitive types and arbitrary data domains and fields. It is the real substance behind a data interface, in which the most common aspects of storing data can be shared, such as type conversions, array and memory allocation, units conversions, and data file format interfaces.
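Two of the shared services mentioned above, array allocation by a run-time type and type conversion on access, might look like this sketch. `FieldStore` is a hypothetical name; the element type is picked at run time (as it would be from a data model), and `std::variant` stands in for whatever type dispatch the prototype actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical generic storage for one field: the element type is chosen
// at run time, but callers read and write any arithmetic type and the
// store converts on the way in and out.
class FieldStore {
public:
    enum class Type { Int16, Int32, Float32, Float64 };

    FieldStore(Type t, std::size_t n) {
        switch (t) {  // allocate n elements of the declared type
            case Type::Int16:   data_ = std::vector<std::int16_t>(n); break;
            case Type::Int32:   data_ = std::vector<std::int32_t>(n); break;
            case Type::Float32: data_ = std::vector<float>(n); break;
            case Type::Float64: data_ = std::vector<double>(n); break;
        }
    }

    std::size_t size() const {
        return std::visit([](const auto& v) { return v.size(); }, data_);
    }

    // Convert to the element type with static_cast (floats truncate toward
    // zero). No range checking: out-of-range values are not handled here.
    template <typename T>
    void set(std::size_t i, T value) {
        std::visit([&](auto& v) {
            using Elem = typename std::decay_t<decltype(v)>::value_type;
            v.at(i) = static_cast<Elem>(value);  // at() throws std::out_of_range
        }, data_);
    }

    // Read element i converted to the caller's requested type.
    template <typename T>
    T get(std::size_t i) const {
        return std::visit([&](const auto& v) { return static_cast<T>(v.at(i)); }, data_);
    }

private:
    std::variant<std::vector<std::int16_t>, std::vector<std::int32_t>,
                 std::vector<float>, std::vector<double>> data_;
};
```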

I'll describe the model for representing data values first, since that will introduce the concepts which need to be represented in a data model type.

5.3. Fiber bundle model

To be completed.

5.4. Models

These are classes which implement a data model definition. I've confused myself several times because they are essentially classes which define classes, so the distinction gets fuzzy.

5.5. Storing the dataspace in XML

The dataspace and datamodel have XML representations, which should make it easier to exchange and extend the information in the dataspace. For example, the XML can be converted to HTML for publishing on the web, and new kinds of data domains and data types can be added.
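As a rough illustration of an XML representation, this sketch serializes an Entity tree as nested `<entity>` elements. The element and attribute names are invented for the example and are not the prototype's schema; a real implementation would more likely use an XML library than hand-built strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory entity (C++17 allows vector of an incomplete type).
struct Entity {
    std::string name, title, description;
    std::vector<Entity> children;
};

// Escape the five characters XML predefines as entities.
std::string xmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Serialize an entity and, recursively, the entities it contains.
std::string toXml(const Entity& e, int depth = 0) {
    std::string pad(static_cast<std::size_t>(depth) * 2, ' ');
    std::string xml = pad + "<entity name=\"" + xmlEscape(e.name) +
                      "\" title=\"" + xmlEscape(e.title) + "\">\n";
    xml += pad + "  <description>" + xmlEscape(e.description) + "</description>\n";
    for (const auto& c : e.children) xml += toXml(c, depth + 1);
    xml += pad + "</entity>\n";
    return xml;
}
```

Containment maps naturally onto element nesting, which is part of why XML suits the dataspace; the same output could then be transformed to HTML for publishing.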