The current prototype is written in C++ using XML to read and store the data space as a single file. As an aside, I recommend using log4cpp and cppunit in C++ projects.
The design is based on a few fundamentals. Several years ago, object-oriented database research produced a manifesto of sorts about the minimal requirements that distinguish an OODB from any old DB. Two of those requirements: every object must have an identifier, and there must be a set of primitive types with which more complicated, user-defined objects can be built up. That notion, some concepts from RDBMS, and a few other ideas led me to this design.
I start with some basic primitives from our data domain and build from there.
Entity. Most of the basic primitives are derived from an Entity class. The Entity has name, title, description, and an Entity can contain other entities.
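As a rough illustration of the Entity described above, here is a minimal sketch. The member names (add, find, size) and the ownership scheme are assumptions made for this example, not the prototype's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of an Entity: a named, titled, described object
// that can contain other entities. All member names are illustrative.
class Entity {
public:
    explicit Entity(std::string name, std::string title = "",
                    std::string description = "")
        : name_(std::move(name)), title_(std::move(title)),
          description_(std::move(description)) {
        assert(!name_.empty());  // assumption: every entity needs a name
    }
    virtual ~Entity() = default;  // basic primitives derive from Entity

    const std::string& name() const { return name_; }
    const std::string& title() const { return title_; }
    const std::string& description() const { return description_; }

    // Take ownership of a child entity; returns a non-owning pointer to it.
    Entity* add(std::unique_ptr<Entity> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
        return children_.back().get();
    }

    // Find a direct child by name; returns nullptr if there is none.
    const Entity* find(const std::string& name) const {
        for (const auto& child : children_)
            if (child->name() == name) return child.get();
        return nullptr;
    }

    // Number of directly contained entities.
    std::size_t size() const { return children_.size(); }

private:
    std::string name_, title_, description_;
    std::vector<std::unique_ptr<Entity>> children_;
};
```

A container entity would be filled with calls such as root.add(std::unique_ptr<Entity>(new Entity("temperature"))), after which root.find("temperature") returns the child.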
DataSpace. The DataSpace is a namespace for organizing, identifying, and locating entities. An Entity is bound to the DataSpace with a path name, like a directory path, so related entities can be grouped into DataSpace directories. The binding gives an Entity a path with which it can be referenced in other entities. The path is meant to be global, so that data models can reference and inherit from other entities.
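The binding of entities to paths can be pictured as an ordered map from path names to entities. The sketch below assumes that model; the method names (bind, lookup, list) and the stripped-down Entity struct are hypothetical stand-ins for this example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the real Entity class, reduced for this sketch.
struct Entity {
    std::string name;
    std::string title;
};

// Hypothetical DataSpace: binds entities to absolute, directory-like paths.
class DataSpace {
public:
    // Bind an entity to a path. Fails if the path is not absolute,
    // the entity is null, or the path is already bound.
    bool bind(const std::string& path, std::shared_ptr<Entity> entity) {
        if (path.empty() || path[0] != '/' || !entity) return false;
        return bindings_.emplace(path, std::move(entity)).second;
    }

    // Look up the entity bound to a path; returns an empty pointer if none.
    std::shared_ptr<Entity> lookup(const std::string& path) const {
        auto it = bindings_.find(path);
        if (it == bindings_.end()) return std::shared_ptr<Entity>();
        return it->second;
    }

    // List every bound path under a directory, at any depth.
    std::vector<std::string> list(const std::string& dir) const {
        std::string prefix = dir;
        if (prefix.empty() || prefix.back() != '/') prefix += '/';
        std::vector<std::string> paths;
        // Keys sharing a prefix are contiguous in an ordered map.
        for (auto it = bindings_.lower_bound(prefix);
             it != bindings_.end() &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it)
            paths.push_back(it->first);
        return paths;
    }

private:
    std::map<std::string, std::shared_ptr<Entity>> bindings_;
};
```

Because the map is ordered, listing a directory is a range scan from the first key that starts with the directory prefix, which is one simple way to get the grouping behavior of directories.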
Units. Units are first-class objects with a known syntax. The current implementation uses udunits. Wherever a numerical value is recorded, the units need to be recorded also.
Dimension. In ten words or less, a Dimension is a relation; it relates fields which measure the same physical quantity. Examples are time, temperature, pressure, latitude, longitude, azimuth, elevation. More than one field may measure a Dimension, so associating a Dimension with a field indicates that the field values are compatible. In the derivations Chris Burghart added to Zebra, he introduced 'field types' as a way of specifying in a derivation formula generic variables like temperature and relative humidity. If a field was the correct type, then it could be discovered and used as input to the derivation. This is the same idea, only the Dimension is a fundamental part of defining a field and not an afterthought.
Dimensions are needed because Units are only part of the answer to compatibility between fields. For example, temperature and dewpoint both have units of temperature but represent different physical quantities.
The other key to a Dimension is that it needs to be universal to be consistent, so Dimensions are bound to the DataSpace with meaningful names which distinguish them from other Dimensions, like '/dimensions/radar/beamelevation'. Rather than requiring a registry of field names, which would also be a solution, there is a registry of Dimensions to which fields can be related by their global name.
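To make the role of a Dimension concrete, the sketch below treats a field's Dimension as the global path it is related to. It then checks compatibility and discovers fields by Dimension rather than by units or field name. The FieldInfo struct, the function names, and paths such as '/dimensions/temperature' are assumptions made for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, minimal description of a field for this sketch.
struct FieldInfo {
    std::string name;       // local field name, e.g. "tdry"
    std::string units;      // units string, e.g. "degC"
    std::string dimension;  // global DataSpace path of the Dimension
};

// Two fields measure the same physical quantity only if they are related
// to the same Dimension; matching units alone are not enough.
bool sameQuantity(const FieldInfo& a, const FieldInfo& b) {
    return a.dimension == b.dimension;
}

// Discover every field related to the given Dimension, in input order.
std::vector<std::string> fieldsForDimension(
        const std::vector<FieldInfo>& fields, const std::string& dimension) {
    assert(!dimension.empty() && dimension[0] == '/');  // global paths are absolute
    std::vector<std::string> names;
    for (const FieldInfo& f : fields)
        if (f.dimension == dimension) names.push_back(f.name);
    return names;
}
```

With a temperature field and a dewpoint field both recorded in degC, sameQuantity returns false because their Dimension paths differ, while two temperature fields in degC and K are still recognized as the same quantity.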
A data model is a 'type' or a 'class'. It is a model for a dataset but does not actually store any data (except class data), just like a C++ or Java class contains typed members which when instantiated will hold values. In this design, all models are built up from a few basic model elements.
There are two aspects to the data model design: the entities which represent the models, and the entities which implement instances of the models and actually store data. A data storage object is the instantiation of a data model. The software implementation of data storage was designed to be general, so it supports the known primitive types and arbitrary data domains and fields. It's the real substance behind a data interface in which the most common aspects of storing data can be shared, such as type conversions, array and memory allocation, units conversions, and data file format interfaces.
I'll describe the model for representing data values first, since that will introduce the concepts which need to be represented in a data model type.
To be completed.
These are classes which implement a data model definition. I've confused myself several times because they are essentially classes which define classes, so the distinction gets fuzzy.
DataModel. A DataModel is a container for entities, usually more model entities. It's like a struct or class.
FieldModel. The FieldModel defines a field type. It gives a name for the field, its dimension, the path to the domain model, units, and memory type. A FieldModel when instantiated yields a Field, but the implementation of the instance will be a subclass of Field according to memory type and the kind of domain.
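The relationship between a FieldModel and the Field it yields might look like the following sketch. It selects the Field subclass by memory type only; selection by kind of domain is left out. The MemoryType enum, TypedField template, and instantiate function are hypothetical names for this example, as is the '/domains/sounding' path used in the note below.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Assumed set of memory types; the real prototype may support others.
enum class MemoryType { Float32, Float64 };

// The model: a description of a field type that stores no values itself.
struct FieldModel {
    std::string name;        // field name
    std::string dimension;   // global path of the Dimension
    std::string domainPath;  // global path of the domain model
    std::string units;       // units string
    MemoryType memoryType;   // in-memory representation of values
};

// The instance: base class for objects that actually hold data.
class Field {
public:
    explicit Field(const FieldModel& model) : model_(model) {}
    virtual ~Field() = default;
    const FieldModel& model() const { return model_; }
    virtual std::size_t elementSize() const = 0;  // bytes per stored value

private:
    FieldModel model_;
};

// Concrete Field subclass chosen according to the memory type.
template <typename T>
class TypedField : public Field {
public:
    explicit TypedField(const FieldModel& model) : Field(model) {}
    std::size_t elementSize() const override { return sizeof(T); }
    std::vector<T> values;
};

// Instantiate a model: returns the Field subclass matching its memory type.
std::unique_ptr<Field> instantiate(const FieldModel& model) {
    switch (model.memoryType) {
        case MemoryType::Float32:
            return std::unique_ptr<Field>(new TypedField<float>(model));
        case MemoryType::Float64:
            return std::unique_ptr<Field>(new TypedField<double>(model));
    }
    assert(false && "unknown memory type");
    return nullptr;
}
```

Instantiating FieldModel{"tdry", "/dimensions/temperature", "/domains/sounding", "degC", MemoryType::Float32} yields a TypedField<float> that keeps a copy of its model, which is one way to keep the class-defining classes and the data-holding classes apart.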
DomainModel. Models a base domain. On the most basic level, it gives the rank of the base domain and names the set of dimensions in the base domain space, such as time and height, or time, latitude, and longitude. The DomainModel also names the kind of base domain implementation and any information needed for that. For example, if the domain uses a field to store a discrete mapping between manifold domain and base domain points, then the model for that field would be included.
DataModel as Query. A DataModel can represent not only a dataset type to be instantiated, but also a dataset request to be filled. For example, a GUI application can look up a dataset by name, get its model, and display the model for the user to select the fields to download. As fields are added or eliminated, the application modifies the data model for the request. When completed, the data model is realized by instantiating the data model request and filling in the requested fields. A DataModel query can also allow simple transformations to the original data model, like requesting different units, a different memory type, or derived fields. Such transformations can be handled easily and automatically by the software.
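A request model can be derived from a dataset's model by copying it, dropping unwanted fields, and overriding units, as in the sketch below. The reduced FieldModel and DataModel structs and the makeRequest function are assumptions for this example. The sketch only edits the model; converting values to the requested units would happen when the request is filled.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Reduced stand-ins for the real model classes.
struct FieldModel {
    std::string name;
    std::string units;
};

struct DataModel {
    std::string name;
    std::vector<FieldModel> fields;
};

// Build a request model from a source model: keep only the wanted fields
// (in source order) and apply any per-field units overrides.
DataModel makeRequest(const DataModel& source,
                      const std::vector<std::string>& wanted,
                      const std::map<std::string, std::string>& unitOverrides) {
    DataModel request;
    request.name = source.name;
    for (const FieldModel& field : source.fields) {
        if (std::find(wanted.begin(), wanted.end(), field.name) == wanted.end())
            continue;  // field was not selected for download
        FieldModel selected = field;
        auto it = unitOverrides.find(field.name);
        if (it != unitOverrides.end()) selected.units = it->second;
        request.fields.push_back(selected);
    }
    assert(request.fields.size() <= source.fields.size());
    return request;
}
```

The source model is left untouched, so the same dataset model can serve many different requests.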
The dataspace and datamodel have XML representations, which should make it easier to exchange and extend the information in the dataspace. For example, the XML can be converted to HTML for publishing on the web, and new kinds of data domains and data types can be added.