MILAGRO Data Policy and Management Plan

Official version, 3 February 2006


The MILAGRO team is committed to full sharing of data between all participants during all phases of the field experiment and subsequent research and analysis activities. This document is intended to provide a framework through which this sharing might be efficient and inclusive. The ultimate goal is not only to have unfettered access to data, but to foster collaborations across the various components of MILAGRO that will fully exploit the scientific value of the MILAGRO observations. Given the size and diversity of the MILAGRO effort, it is imperative that scientific courtesy be observed and that investigators be consulted and appropriately acknowledged whenever their data is used.

Data Timeline

The status of data sets can be thought of as going through three phases: Field, Research, and Public. Each phase will have its own Data Repository (described below). For MILAGRO, the dates corresponding to these phases are:

  • Field Phase: March 1-29, 2006
  • Research Phase: up to March 1, 2008
    Preliminary data due September 1, 2006
    Final data due March 1, 2007
  • Public Phase: March 1, 2008 onward
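
For groups that script their data handling, the timeline above can be written as a small lookup. This is only a sketch: the dates come from the list above, while the function name and the "Pre-campaign" label for dates before March 1, 2006 are assumptions made for illustration.

```python
from datetime import date

# Phase boundaries taken from the timeline above.
FIELD_START = date(2006, 3, 1)
FIELD_END = date(2006, 3, 29)
PUBLIC_START = date(2008, 3, 1)


def data_phase(day):
    """Return the name of the data phase in effect on the given date."""
    if day < FIELD_START:
        return "Pre-campaign"  # hypothetical label, not defined in this plan
    if day <= FIELD_END:
        return "Field"
    if day < PUBLIC_START:
        return "Research"
    return "Public"
```

For example, data_phase(date(2006, 9, 1)) returns "Research", since the Preliminary Data deadline falls within the Research Phase.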

During the Field Phase, investigators are requested to submit their data to the Field Repository within 24 hours of the measurement (from aircraft or ground), if possible. These Field Data (raw data, QA/QC level 0) will be used only during the Field Study, to assess whether the science goals of the mission are being met and to evaluate the forecast models. All data in the Field Repository will be deleted at the end of the mission (i.e., March 30, 2006).

To facilitate scientific analysis of all the data, it is requested that Preliminary Data (QA/QC level 1) be submitted to the Research Repository by September 1, 2006. Final Data must be submitted by March 1, 2007. During the Research Phase, data are available only to MILAGRO participants, and the Research Repository will be password-protected. Each data file should clearly indicate in its header (metadata) whether the data are Preliminary or Final. The ICARTT format also specifies that each file be labeled with a revision number (in the file name as well as the file header).
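
A Data Manager might check these labels automatically before upload. The sketch below is illustrative only. It assumes the revision appears in the file name as an "_R<number>" token (for example "..._R1.ict"), which should be confirmed against the ICARTT format description. The function names, the example file name, and the way the header values are passed in are all hypothetical.

```python
import re

# Assumed convention: the revision is an "_R<digits>" token in the file name,
# followed by "_" or "." (for example "O3_DC8_20060304_R1.ict").
REVISION_TOKEN = re.compile(r"_R(\d+)(?=[_.])")


def revision_from_filename(filename):
    """Return the revision number found in the file name, or None."""
    match = REVISION_TOKEN.search(filename)
    return int(match.group(1)) if match else None


def check_labels(filename, header_revision, header_status):
    """Return a list of labeling problems (an empty list means none found).

    header_revision and header_status are assumed to have been read from
    the file header by some other routine.
    """
    problems = []
    file_revision = revision_from_filename(filename)
    if file_revision is None:
        problems.append("no revision number in file name")
    elif file_revision != header_revision:
        problems.append(
            f"file name says R{file_revision}, header says R{header_revision}"
        )
    if header_status not in ("Preliminary", "Final"):
        problems.append("header must state Preliminary or Final")
    return problems
```

A file whose name and header agree, such as check_labels("O3_DC8_20060304_R1.ict", 1, "Final"), yields an empty list.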

All data will be made publicly available on March 1, 2008 by removing the password control on the Research Repository.

Data Policy

In order to be sure that data is used and acknowledged fairly and properly, all MILAGRO participants are requested to accept the following responsibilities:

  • Submit data according to the specified schedule, in one of the specified formats
  • Provide adequate metadata in data files
  • Consult with PIs when using their data
  • Invite the PIs of any data used to be co-authors (particularly during the Research Phase)
  • Remain available, as PI, to answer questions about their data after submission (contact information must be provided in file headers)

During the Research Phase, all MILAGRO participants will have access to all data.

All data will become public on March 1, 2008. This is consistent with NASA and NSF data policies to make data public 1-2 years after an experiment. Individual MILAGRO component groups (i.e., MCMA, MAX-Mex, MIRAGE, and INTEX-B) may choose to release their data to the public sooner. Such elections to accelerate public release will not affect the protected status of the larger MILAGRO data holdings. Component groups may also elect to share data or collaborate with groups outside the MILAGRO community. Such data sharing with third parties will be arbitrated by leadership within the relevant component group and will respect the protected status of data from the other component groups.

The data to be archived includes:

  • Measurement results
  • Model results
  • Satellite observations
  • Meteorological forecasts

Acknowledgement Statements

When any of the MILAGRO data are used in a publication, an acknowledgement statement should be included, recognizing the efforts and funding from the large number of people and agencies involved.

[Please send suggestions to Louisa Emmons (emmons@ucar.edu), either for an overall MILAGRO statement or for individual missions (MIRAGE, MCMA, etc.), satellite data sets, etc.]

Data Formats

To allow consistency with previous measurement campaigns, two data formats will be used during MILAGRO: (1) ICARTT (modified NASA Ames) and (2) NARSTO Data Exchange Standard (DES). Both are text (ASCII) file formats that most investigators can easily produce and use. Data that are not suitable for text files (e.g., from LIDAR, AMS, satellites, models, etc.) should be archived in their community standard format (e.g., NetCDF, graphics, etc.). Analysis of the data sets will be facilitated if one format is used by all groups at a given site. All DC8 and C130 investigators will use the ICARTT format.

ICARTT format description

NARSTO DES format description

Data Repositories

  • NCAR Community Data Portal
    • For Field, Preliminary and Final Data from any group
    • Upload through website
    • Any format accepted; no format checking (honor system)
  • NASA Tropospheric Chemistry Integrated Data Center
    • Automatic web upload and format-checking software ready for ICARTT format
    • Will accept Field, Preliminary, and Final Data for DC8, C130, J31, IONS
  • NARSTO Quality Systems Science Center (QSSC)
    • Accepts Final Data for archiving
    • NARSTO DES format checked with existing automated format and metadata content checking software
    • ICARTT format checking by Data Managers with QSSC interactions as needed (tentative).

Data Catalog

The Data Catalog (on the NCAR/EOL MILAGRO website) will list all measurements made and have links directly to the data for all phases (Field, Research, Archive).

"Readme" files should be provided for the data catalog, detailing the measurement and analysis techniques.

Merges

Combining all measurements from one platform or site on a common time base makes the files much easier to use for analysis. However, creating these merges is time-consuming, and someone needs to be identified to create and check them. Merges are valuable at all stages (preliminary, as data are revised, and final).
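
To illustrate what a merge involves, the sketch below averages each instrument's samples into fixed-length time bins and lines the results up on one time base. It is not the procedure any MILAGRO group has committed to. The function names, the choice of simple bin averaging, and the -9999.0 missing-value flag are assumptions made for illustration.

```python
from collections import defaultdict


def average_to_time_base(times, values, interval):
    """Average samples into bins of `interval` seconds.

    Returns {bin_start_time: mean_value}; bins with no samples are absent.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in zip(times, values):
        if v is None:  # skip missing samples
            continue
        start = (t // interval) * interval
        sums[start] += v
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sums}


def merge(instruments, interval, missing=-9999.0):
    """Merge several instruments onto one time base.

    instruments: {name: (times, values)} with times in seconds.
    Returns (names, rows), where each row is [bin_start, value, value, ...]
    in the order given by names; `missing` (a hypothetical flag value)
    fills bins in which an instrument reported nothing.
    """
    binned = {
        name: average_to_time_base(times, values, interval)
        for name, (times, values) in instruments.items()
    }
    names = sorted(binned)
    all_starts = sorted(set().union(*binned.values()))
    rows = [
        [start] + [binned[name].get(start, missing) for name in names]
        for start in all_starts
    ]
    return names, rows
```

Checking a merge then amounts to confirming that each row's values match the original files for that time bin, which is why the work needs a named person.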

The NASA Langley group (Jim Crawford et al.) plans to create merges of the ICARTT-format files that are in the INTEX-B archive (measurements from the DC8 and C130).

Conversion between NARSTO DES and ICARTT formats

It seems feasible to convert files between NARSTO DES and ICARTT formats, since both are simply text files containing columns of numbers preceded by a header of metadata. NARSTO DES files should have all the information needed for ICARTT, and it may also be possible to convert from ICARTT to NARSTO DES. However, the conversion is not trivial, and someone is needed to write the programs and perform the conversions.
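
The first step of any such conversion is to separate the metadata header from the data columns. The sketch below shows that step only, under an assumed layout: the first line begins with the number of header lines, the last header line lists the column names, and all fields are comma-separated numbers. This is a simplified reading of the ICARTT layout and should be checked against the format descriptions. The function name and sample text are hypothetical, and no NARSTO DES output is attempted.

```python
def split_header_and_data(text, delimiter=","):
    """Split a header-plus-columns text file into (header_lines, columns).

    Assumed layout (to be verified against the format description):
      - the first field of the first line is the number of header lines
      - the last header line holds the column names
      - every remaining non-blank line is a row of numbers
    """
    lines = text.splitlines()
    n_header = int(lines[0].split(delimiter)[0])
    header = lines[:n_header]
    names = [field.strip() for field in header[-1].split(delimiter)]
    columns = {name: [] for name in names}
    for line in lines[n_header:]:
        if not line.strip():
            continue  # ignore blank lines
        for name, field in zip(names, line.split(delimiter)):
            columns[name].append(float(field))
    return header, columns


# Hypothetical three-line header followed by two data rows.
sample = "3, 1001\nexample metadata line\nTime, O3\n0, 40.5\n60, 41.0\n"
header, columns = split_header_and_data(sample)
```

The harder part, mapping each format's metadata fields onto the other's, depends on the two format specifications and is not shown here.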

Resolving a few issues would make the two data formats more compatible and would improve both:

Standard variable names. NARSTO DES uses a set of standard variable names. Users will be asked (but not required) to use these standard names in the ICARTT format. NARSTO will develop a cross-referenced set of shorter standard names for the ICARTT format users, including eliminating special characters from the variable names (commas, semicolons, etc.) to make them easier to parse and use in various software programs.
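
As an illustration of the kind of cleanup described above, the sketch below replaces special characters in a variable name with underscores and builds a cross-reference table. The actual short names will be defined by NARSTO. The rule used here (keep only letters and digits, join the pieces with underscores), the function names, and the example variable name are assumptions.

```python
import re


def short_name(name):
    """Replace each run of non-alphanumeric characters with one underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return cleaned.strip("_")


def build_cross_reference(names):
    """Map each original name to its cleaned form.

    Raises ValueError if two different names clean to the same result,
    since the cross-reference must stay unambiguous.
    """
    table = {}
    owners = {}
    for name in names:
        short = short_name(name)
        if short in owners and owners[short] != name:
            raise ValueError(
                f"{name!r} and {owners[short]!r} both map to {short!r}"
            )
        owners[short] = name
        table[name] = short
    return table
```

For example, short_name("Ozone, mixing ratio; 1-min avg") returns "Ozone_mixing_ratio_1_min_avg", which can be split on commas or whitespace without ambiguity.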

Units. A set of standard unit names will be compiled. Units will be added to the ICARTT format.

Data Managers

Data Managers are needed for each platform or site. During the field study, they will ensure that data are submitted to the Field Repository where possible. After the campaign, they will ensure that all data are submitted to the Research and Final Repositories according to the deadlines given above.

Data Managers:

NASA DC8, J31, B200, IONS: Jim Crawford, NASA Langley

NCAR C130: Louisa Emmons, NCAR

DOE G1: John Hubbe, PNNL

Twin Otter: Bob Yokelson, U. Montana

T0: Jared Morante, MCE

T1: Alex Guenther/Louisa Emmons, NCAR (NSF)
Telma Castro (Mex.)
Jeff Gaffney (DOE)

T2: Will Shaw, PNNL