My thinking on all of this has of course been heavily influenced by a few projects, notably ARM, netCDF, and Zebra, plus some recent developments in interface definition languages such as CORBA IDL.
NetCDF makes a good example, and a good reference point for deciding how to extend a data model. The netCDF header defines the contents of the file, much like a model. Constructing a netCDF file has two clear phases: first the definition, then the addition of data. Perhaps the netCDF file abstraction is somewhat overloaded in implementing both model and data; the complications of trying to 'redefine' a netCDF file may be evidence of that. Zebra shows a related inconsistency: 'global' attributes that are meant to apply to an entire dataset spanning many files have only file scope.
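To make the model/data split concrete, here is a minimal Python sketch. Every name in it (`DatasetModel`, `VariableDef`, `DataFile`) is hypothetical and not part of any netCDF library. The definition phase produces a standalone model object, and data files are only ever created from a finished model, so the model exists independently of any single file.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VariableDef:
    """Definition of one variable: name, units, and dimension names."""
    name: str
    units: str
    dims: tuple


@dataclass
class DatasetModel:
    """Hypothetical model object: the 'header' half of a netCDF file, kept
    apart from any data so that many files can share one definition."""
    dims: dict = field(default_factory=dict)        # name -> length (None = unlimited)
    variables: dict = field(default_factory=dict)   # name -> VariableDef
    attributes: dict = field(default_factory=dict)  # dataset-scope attributes

    def add_dim(self, name, length=None):
        self.dims[name] = length

    def add_variable(self, name, units, dims):
        # Definition phase: a variable may only use dimensions already defined.
        missing = [d for d in dims if d not in self.dims]
        if missing:
            raise ValueError(f"undefined dimensions: {missing}")
        self.variables[name] = VariableDef(name, units, tuple(dims))


class DataFile:
    """Data phase: one file conforming to a shared model. It can add values
    for variables the model defines, but offers no way to change the model."""

    def __init__(self, model):
        self.model = model
        self.records = {name: [] for name in model.variables}

    def append(self, name, value):
        if name not in self.model.variables:
            raise KeyError(f"{name} is not defined in the model")
        self.records.setdefault(name, []).append(value)


# Define the model once, then create any number of files from it.
model = DatasetModel(attributes={"project": "example"})
model.add_dim("time")
model.add_variable("temperature", "degC", ["time"])

f1 = DataFile(model)
f2 = DataFile(model)
f1.append("temperature", 21.5)
```

Because both files hold a reference to the same `DatasetModel`, an attribute such as `project` is set once for the whole dataset instead of being copied into, and possibly diverging across, every file.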
Likewise, CDL and ncgen are obviously a good idea, but CDL can only represent the netCDF abstraction. CDL has no inherent notion of concepts that we might want to introduce into the ATD data model. Rather than shoehorn ATD data logic into netCDF by establishing complicated conventions, we can implement a separate data logic layer. NetCDF can still serve as the backend if we wish, but the 'conventions' for extending the netCDF abstraction then need only be defined and maintained in one place. Similarly, the code produced by ncgen is one huge main() function, not an API that could be plugged into an application.
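A data logic layer of this kind might look like the following minimal sketch. All names (`Backend`, `MemoryBackend`, `DataLayer`, and the `atd_platform` attribute) are hypothetical. The conventions that map an ATD concept onto storage attributes live in one class, and applications call a library API rather than running generated code. A netCDF backend would implement the same `Backend` interface on top of a netCDF library; it is left out so the sketch runs without one.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Storage interface the data logic layer is written against.
    Each implementation encodes how fields map onto one file format."""

    @abstractmethod
    def write_field(self, name, values, attrs):
        ...

    @abstractmethod
    def read_field(self, name):
        ...


class MemoryBackend(Backend):
    """Trivial in-memory backend standing in for a netCDF backend."""

    def __init__(self):
        self._store = {}

    def write_field(self, name, values, attrs):
        self._store[name] = (list(values), dict(attrs))

    def read_field(self, name):
        values, attrs = self._store[name]
        return list(values), dict(attrs)


class DataLayer:
    """Application-facing API. A concept the backend lacks (here, a
    hypothetical 'platform' tag) is mapped to an attribute in one place."""

    PLATFORM_ATTR = "atd_platform"  # hypothetical convention, defined only here

    def __init__(self, backend):
        self.backend = backend

    def put(self, platform, name, values, units):
        attrs = {"units": units, self.PLATFORM_ATTR: platform}
        self.backend.write_field(name, values, attrs)

    def get(self, name):
        values, attrs = self.backend.read_field(name)
        return {
            "values": values,
            "units": attrs["units"],
            "platform": attrs[self.PLATFORM_ATTR],
        }


layer = DataLayer(MemoryBackend())
layer.put("surface_station", "pressure", [1013.2, 1012.8], "hPa")
```

Swapping `MemoryBackend` for a netCDF-backed class changes nothing for callers of `put` and `get`, which is the point of keeping the conventions out of the application.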
Zebra has some notion of datastreams in its platform definitions. The datastore configuration defines all the platforms in a dataset, where each platform specifies the data file format and the data 'organization', such as raster, 2-d grid, 3-d grid, mobile, and so on. I think the design has many advantages: the datastore library uses an object-oriented abstraction called a DataChunk to represent fields of data, which is a little more expressive than the netCDF model. The library internally and transparently converts several different data formats to and from DataChunks, so all clients, from ingestors to visualization, use the single datastore API. One disadvantage of the Zebra design is that it expects all the fields in a data file to share the same organization. Another problem has been that Zebra caters to a wide variety of conventions, practices, and file formats in order to integrate external datasets, and in a sense has been dragged down to the lowest common denominator. Anything we replace it with internally should start by defining the abstraction we want to use and then migrate our data to fit it.
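The pattern described above, format-specific conversion hidden behind one in-memory abstraction, can be sketched as follows. This is not Zebra's actual DataChunk API; every name here, including the toy CSV format, is hypothetical, and only the reading direction is shown. Readers are registered in one table and all return the same chunk type. Unlike the Zebra design, the sketch records an organization on each field rather than on the whole file.

```python
from dataclasses import dataclass, field
from enum import Enum


class Organization(Enum):
    """Illustrative data organizations, loosely after those listed above."""
    SCALAR = "scalar"
    GRID_2D = "2-d grid"
    GRID_3D = "3-d grid"
    RASTER = "raster"
    MOBILE = "mobile"


@dataclass
class FieldData:
    name: str
    organization: Organization  # recorded per field, not per file
    values: list


@dataclass
class DataChunk:
    """Format-independent, in-memory container that every client uses."""
    fields: dict = field(default_factory=dict)

    def add(self, f):
        self.fields[f.name] = f


# Registry mapping a format name to a function that parses raw text into a chunk.
READERS = {}


def reader(fmt):
    """Decorator that registers a reader function for one format."""
    def register(func):
        READERS[fmt] = func
        return func
    return register


@reader("csv")
def read_csv(text):
    """Hypothetical CSV format: a header line of names, then rows of numbers."""
    lines = text.strip().splitlines()
    names = lines[0].split(",")
    # Transpose rows into columns, converting each cell to float.
    columns = list(zip(*(map(float, row.split(",")) for row in lines[1:])))
    chunk = DataChunk()
    for name, col in zip(names, columns):
        chunk.add(FieldData(name, Organization.SCALAR, list(col)))
    return chunk


def load(fmt, text):
    """Single entry point: clients ask for a chunk and never see the format."""
    if fmt not in READERS:
        raise ValueError(f"no reader registered for format {fmt!r}")
    return READERS[fmt](text)


chunk = load("csv", "time,temp\n0,20.5\n1,21.0")
# A field with a different organization can sit in the same chunk.
chunk.add(FieldData("reflectivity", Organization.GRID_2D, [[0.0, 1.5], [2.0, 3.5]]))
```

Adding support for another format means registering one more reader; clients that call `load` are unaffected.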
RAF has its own catalog of field specifications, which RAF users and software need to know and share. The particular set of fields recorded on a flight is pulled from that catalog.
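A minimal sketch of the catalog relationship, assuming a simple in-memory catalog. The field names, units, and the `fields_for_flight` helper are hypothetical and do not reflect RAF's actual catalog. A flight's field list is resolved against the shared catalog, and any name not in the catalog is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    units: str
    description: str


# Hypothetical shared catalog; these are not real RAF field names.
CATALOG = {
    spec.name: spec
    for spec in [
        FieldSpec("static_pressure", "hPa", "Static pressure"),
        FieldSpec("ambient_temperature", "degC", "Ambient air temperature"),
        FieldSpec("true_airspeed", "m/s", "True airspeed"),
    ]
}


def fields_for_flight(names, catalog=CATALOG):
    """Pull the specs for one flight's recorded fields from the catalog.
    Unknown names are an error, so every flight stays consistent with it."""
    unknown = [n for n in names if n not in catalog]
    if unknown:
        raise KeyError(f"not in catalog: {unknown}")
    return [catalog[n] for n in names]


flight = fields_for_flight(["static_pressure", "true_airspeed"])
```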
VisAD has a data model implemented entirely in Java. Its model is oriented toward visualization and the data manipulations that visualization requires once the data are in memory. So far as I can tell, the model does not address sharing models via an external definition or storing data to files.