Facilitate, encapsulate, and enforce consistency in the meanings of fields, dimensions, and units. Relatedly, provide a single information source from which the software and documentation that depend on those meanings can be generated. For example, if a data model were defined for the surface data recorded in the ISS project in Reno, that one model could be used both to generate documentation telling investigators what is in the data files and to generate software interfaces for reading and processing those data.
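As a minimal sketch of the 'one model' idea, assuming a model can be expressed as a plain Python dictionary, the following generates both a text description of the fields and a record reader from the same definition. The field names, units, and whitespace-separated record layout are hypothetical, not the actual ISS file format.

```python
# Hypothetical surface-data model; field names, units, and dimensions
# are illustrative only, not the actual ISS file layout.
MODEL = {
    "name": "iss_surface",
    "fields": [
        {"name": "tdry", "units": "degC", "dims": ["time"], "desc": "Dry-bulb air temperature"},
        {"name": "rh", "units": "%", "dims": ["time"], "desc": "Relative humidity"},
        {"name": "pres", "units": "hPa", "dims": ["time"], "desc": "Station pressure"},
    ],
}

def generate_docs(model):
    """Render a plain-text description of each field for investigators."""
    lines = [f"Dataset: {model['name']}"]
    for f in model["fields"]:
        dims = ", ".join(f["dims"])
        lines.append(f"  {f['name']} ({f['units']}; dims: {dims}): {f['desc']}")
    return "\n".join(lines)

def generate_reader(model):
    """Return a function that parses one whitespace-separated record
    into a dict keyed by the model's field names, in model order."""
    names = [f["name"] for f in model["fields"]]

    def read_record(line):
        values = [float(v) for v in line.split()]
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(values)}")
        return dict(zip(names, values))

    return read_record
```

Because both outputs come from MODEL, adding or renaming a field changes the documentation and the reader together.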
Audit trails: track changes in a data stream, identify which versions of a data stream software tools support, and track derivations so that derived datasets have 'resolvable references' to the original data from which they were derived. This could be taken to the point of not duplicating unchanged data in derived or QC'd datasets; store only the changes or the additional fields.
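A sketch of derivation by reference, under the assumption that datasets are identified by (name, version) pairs in a hypothetical Archive: the QC'd dataset records only the fields it changed, and the full view is rebuilt by resolving its source reference.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    version: int
    fields: dict  # field name -> list of values

@dataclass
class DerivedDataset:
    """Stores only changed or added fields plus a resolvable reference
    (name, version) to its source; unchanged fields are not copied."""
    name: str
    source: tuple  # (source name, source version)
    changes: dict = field(default_factory=dict)
    note: str = ""

class Archive:
    """Hypothetical store keyed by (name, version)."""

    def __init__(self):
        self._store = {}

    def add(self, ds):
        self._store[(ds.name, ds.version)] = ds

    def resolve(self, ref):
        return self._store[ref]

    def materialize(self, derived):
        """Rebuild the full field set: source fields overlaid with changes.
        The copy is shallow, so unchanged fields share the source's lists."""
        base = self.resolve(derived.source)
        full = dict(base.fields)
        full.update(derived.changes)
        return full
```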
Do not limit the design with a particular notion of metadata. Metadata depend upon the user's perspective. Sometimes metadata serve as an online catalog of offline data, in which case the metadata chosen depend upon scale and storage efficiency. Sometimes they are based on the kinds of searches and queries expected, such as geographical or temporal. Realistically, a single dataset may have many 'masks' for identifying its metadata, so the system should support that kind of masking rather than hardcoding a particular set of metadata. Where a universal notion of metadata is needed, the system should identify that metadata and record its interpretation in a single, global place. Just as for data formats, ATD has looked into metadata standards like the Dublin Core, and just as for data formats, I don't think there will be a perfect match for the metadata ATD needs. It is better to define the metadata we need as part of our 'business logic', then provide a translation to standard metadata as needed.
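One way to model metadata 'masks', sketched here with hypothetical attribute names and values, is as named projections over a dataset description; a translation to a standard such as Dublin Core then becomes just another mask. The mapping onto the Dublin Core 'title' and 'date' elements is an assumption for illustration.

```python
# Hypothetical dataset description; keys and values are illustrative.
DATASET = {
    "name": "surface_qc",
    "start": "2024-05-01T00:00Z",
    "end": "2024-05-31T23:59Z",
    "lat": 39.5,
    "lon": -119.8,
    "size_bytes": 52_428_800,
    "media": "tape-0042",
}

# Each mask selects (and may rename) the attributes one audience needs,
# rather than the system hardcoding a single metadata schema.
MASKS = {
    "catalog": {"name": "title", "media": "location", "size_bytes": "size"},
    "temporal": {"start": "begin", "end": "end"},
    "spatial": {"lat": "latitude", "lon": "longitude"},
}

def apply_mask(dataset, mask_name):
    """Project a dataset description through a named mask."""
    mask = MASKS[mask_name]
    return {new: dataset[old] for old, new in mask.items() if old in dataset}

# Translation to an external standard is just another mask; this
# particular mapping onto Dublin Core elements is assumed.
MASKS["dublin_core"] = {"name": "title", "start": "date"}
```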
netCDF attributes are an incomplete abstraction. If you can attach a value to a field, why can't that value have attributes of its own, such as units, or more than one dimension? Instead, allow a data model to be a hierarchy, so that any entity in the data model can contain members. This provides an abstraction for attributes as well as a more manageable namespace for variables.
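A minimal sketch of the hierarchical alternative, using a hypothetical Entity class: attributes are ordinary member entities, so an attribute can carry its own dimensions and its own units. The 'calibration' member and its values are illustrative.

```python
class Entity:
    """A node in a hierarchical data model: any entity may hold a value,
    dimensions, and named member entities (which serve as attributes)."""

    def __init__(self, name, value=None, dims=()):
        self.name = name
        self.value = value
        self.dims = tuple(dims)
        self.members = {}

    def add(self, child):
        self.members[child.name] = child
        return child

    def lookup(self, path):
        """Resolve a dotted path such as 'tdry.calibration.units'."""
        node = self
        for part in path.split("."):
            node = node.members[part]
        return node

root = Entity("surface")
tdry = root.add(Entity("tdry", dims=("time",)))
tdry.add(Entity("units", "degC"))
# Unlike a netCDF attribute (a one-dimensional list of values), this member
# has two named dimensions and its own units member.
cal = tdry.add(Entity("calibration", [[0.1, 1.002], [0.0, 0.998]], dims=("sensor", "coeff")))
cal.add(Entity("units", "degC"))
```

The dotted lookup path doubles as the more manageable variable namespace: 'tdry.units' and 'tdry.calibration.units' never collide.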
The data model software should work for application parameters and configuration information as well as for traditional observational data. The parameters to a processing program and an instrument's configuration are vital parts of documenting and auditing the data trail, and as such they should be supported within the same data system and with the same integrity.
There are many similarities between data streams. It should be possible to use an inheritance or prototyping mechanism to define specialized models from more general ones.
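A prototyping mechanism could be as simple as copying a more general model and layering additions and overrides on top. The specialize function and the model contents below are hypothetical.

```python
def specialize(base, name, add=None, override=None):
    """Derive a new model from a more general one: copy the base fields,
    override selected field properties, and append new fields."""
    fields = {fname: dict(props) for fname, props in base["fields"].items()}
    for fname, props in (override or {}).items():
        fields[fname].update(props)
    for fname, props in (add or {}).items():
        if fname in fields:
            raise KeyError(f"{fname} already defined in {base['name']}")
        fields[fname] = dict(props)
    return {"name": name, "parent": base["name"], "fields": fields}

SURFACE = {
    "name": "surface",
    "fields": {
        "tdry": {"units": "degC"},
        "pres": {"units": "hPa"},
    },
}

# Hypothetical specializations of the generic surface model.
FLUX = specialize(SURFACE, "flux", add={"w": {"units": "m/s"}})
TOWER = specialize(FLUX, "tower", override={"tdry": {"height_m": 10.0}})
```

Copying field properties rather than sharing them keeps each specialization from mutating its parent; the 'parent' key records the derivation chain.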
Generate APIs from the data models. Even though a data model might have a generic access interface, access can be made more convenient and intuitive by generating an interface in the language of the model. For example, a model that includes a field called 'tdry' generates a data subclass with a method 'set_tdry'. It is the same idea as CORBA IDL, DCOP IDL (KDE), MCOP IDL (aRts), and Qt Designer XML. Applications for a particular domain, like profilers, can further subclass the data object to add domain-specific convenience interfaces.
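IDL compilers such as those for CORBA generate stubs at build time; the sketch below does the analogous thing at runtime in Python, building a class with set_ and get_ methods from a list of field names. generate_class, SurfaceData, and ProfilerData are hypothetical names.

```python
def generate_class(model_name, field_names):
    """Build a data subclass with set_<field>/get_<field> methods
    for each field in the model (a runtime analogue of IDL codegen)."""

    def __init__(self):
        self._values = {name: None for name in field_names}

    namespace = {"__init__": __init__, "FIELDS": tuple(field_names)}
    for name in field_names:
        # Bind 'name' as a default argument so each method keeps its own field.
        def setter(self, value, _name=name):
            self._values[_name] = value

        def getter(self, _name=name):
            return self._values[_name]

        namespace[f"set_{name}"] = setter
        namespace[f"get_{name}"] = getter
    return type(model_name, (object,), namespace)

SurfaceData = generate_class("SurfaceData", ["tdry", "rh"])

class ProfilerData(SurfaceData):
    """Domain-specific subclass adding a convenience method (hypothetical)."""

    def set_tdry_kelvin(self, kelvin):
        self.set_tdry(kelvin - 273.15)
```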
A tool for sharing and organizing data models needs a way to identify them. There must be a way to bind and resolve entities in a global namespace, so that names can be referenced and naming stays unique and consistent.
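A global namespace could start as a registry that binds slash-separated names to model definitions and refuses duplicate bindings. The Namespace class and the 'atd/iss/...' naming scheme are assumptions for illustration, not an existing ATD service.

```python
class Namespace:
    """A registry binding hierarchical names (e.g. 'atd/iss/surface')
    to entities; rejects duplicate bindings so names stay unique."""

    def __init__(self):
        self._bindings = {}

    def bind(self, name, entity):
        if name in self._bindings:
            raise KeyError(f"name already bound: {name}")
        self._bindings[name] = entity

    def resolve(self, name):
        try:
            return self._bindings[name]
        except KeyError:
            raise LookupError(f"unresolved name: {name}") from None

    def names_under(self, prefix):
        """Return bound names under a prefix, e.g. everything in 'atd/iss/'."""
        return sorted(n for n in self._bindings if n.startswith(prefix))
```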
Provide a translation of data instances to and from a serialized network form, using the data model definition.
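Assuming each field in a model is annotated with a fixed-size binary type, Python's standard struct module can translate an instance to and from network byte order. The model and its type codes are hypothetical; a real model would also need to handle variable-length and multidimensional fields.

```python
import struct

# Hypothetical model: field names paired with struct type codes
# (d = 8-byte double, f = 4-byte float, H = 2-byte unsigned short).
RECORD_MODEL = [("time", "d"), ("tdry", "f"), ("rh", "f"), ("station_id", "H")]

def _format(model):
    # "!" selects network (big-endian) byte order with no padding.
    return "!" + "".join(code for _, code in model)

def serialize(model, instance):
    """Pack a dict instance into bytes in the order the model defines."""
    return struct.pack(_format(model), *(instance[name] for name, _ in model))

def deserialize(model, data):
    """Unpack bytes back into a dict keyed by the model's field names."""
    values = struct.unpack(_format(model), data)
    return dict(zip((name for name, _ in model), values))
```

Because the wire layout is derived from the model, sender and receiver stay in agreement as long as they share the same model definition.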