HIPPO Data Set Documentation Guidelines

The documentation (i.e., the "Readme" file) that accompanies each project data set is as important as the data itself. This information permits collaborators and other analysts to become aware of the data and to understand any limitations or special characteristics of data that may impact its use elsewhere. The data set documentation should accompany all data set submissions and contain the information listed in the outline below. While it will not be appropriate for each and every data set to have information in each documentation category, the following outline (and content) should be adhered to as closely as possible to make the documentation consistent across all data sets. It is also recommended that a documentation file submission accompany for each preliminary and final data set.





TITLE: This should match the data set name

AUTHOR(S):

  • Name(s) of PI and all co-PIs

    Complete mailing address, telephone/facsimile Nos.,

    E-mail address of PIs, and WWW address (if applicable)

    Similar contact information for data questions (if different than above)

1.0 DATA SET OVERVIEW:

  • Introduction or abstract

    Time period covered by the data

    Physical location (including lat/lon/elev) of the measurement or platform

    Data source if applicable (e.g., for operational data include agency)

    Any World Wide Web address references (i.e., additional documentation such as Project WWW site)

2.0 INSTRUMENT DESCRIPTION:

  • Brief text (i.e., 1-2 paragraphs) describing the instrument with references

    Figures (or links), if applicable

    Table of specifications (i.e., accuracy, precision, frequency, resolution, etc.)

3.0 DATA COLLECTION AND PROCESSING:

  • Description of data collection

    Description of derived parameters and processing techniques used

    Description of quality control procedures

    Data intercomparisons, if applicable

4.0 DATA FORMAT:

  • Data file structure and file naming conventions (e.g., column delimited ASCII, NetCDF, GIF, JPEG, etc.)

    Data format and layout (i.e., description of header/data records, sample records)

    List of parameters with units, sampling intervals, frequency, range

    Data version number and date

    Description of flags, codes used in the data, and definitions (i.e., good, questionable, missing, estimated, etc.)

5.0 DATA REMARKS:

  • PI's assessment of the data (i.e., disclaimers, instrument problems, quality issues, etc.)

    Missing data periods

    Software compatibility (i.e., list of existing software to view/manipulate the data)

 

6.0 REFERENCES:

  • List of documents cited in this data set description



 

 

DATA SET CONTENT GUIDELINES

An inherent flexibility of the EOL data management system permits data in all different formats to be submitted, stored and retrieved from the EOL data archive. The following data format guidelines are intended to help standardize the information provided with any data archived for the project. These guidelines are based on EOL experience in handling thousands of different data files of differing formats. Specific suggestions are provided for naming a data file as well as information and layout of the header records and data records contained in each file. This information is important when data are shared with other project participants to minimize confusion and aid in the analysis. An example of the layout of an ASCII file using the guidelines is provided below. Keep in mind that it is not mandatory that the data be received in this format. However, if the project participants are willing to implement the data format guidelines described below, there are some improved capabilities for integration, extraction, compositing and display via the EOL data archive that are available.

A) Naming convention - all data files should be uniquely named and include an extension indicating the type of file such as:

  • .gif = GIF image format file

    .jpg = jpg image format file

    .txt = Text or ASCII format file

    .cdf = NetCDF format file

    if compressed, the file name should have an additional extension indicating the type of compression (i.e., .tar, .gz, .z, etc.)

    Files should be ASCII, NetCDF, GIF, JPG, or Postscript whenever possible. Preferred format for ASCII data files is space, comma, or tab delimited columns, with a UTC date/time stamp at the beginning of each data record. If the data in the file are comma delimited, decimal places must be periods, not commas.

B) For Text (ASCII) files, the records should consist of both header records and data records. The header records at a minimum should consist of:

ASCII DATA HEADER RECORDS SPECIFICATIONS

Standard header records should precede the data records within the file itself. The header records should contain the following information:

   
PI/DATA CONTACT = Text [PI and data contact name(s) and affiliation(s)]
DATA COVERAGE = Start/Stop time of continuous data or sampling interval (See format for time in Data Records below)
PLATFORM/SITE = Text [e.g., C130, BROWN, etc.]
INSTRUMENT = Text [instrument name]
COORDINATES = Fixed site coordinates (decimal degrees) or "mobile" (Text)
DATA VERSION = Alphanumeric [unique ID (i.e., revision date), preliminary or final]
REMARKS = Text [PI remarks that aid in understanding data file structure and contents. Items such as file type, how missing and/or bad data are denoted]

------------------------------------------------------------

NOTE - Other information that should be contained in the REMARKS section (if applicable):

Missing Value indicator - Text or integer [value used for data for missing information] (e.g., -99 or 999.99, etc)

Instrument limitations/accuracy:

Below Measurement Threshold - Text or Integer [Value used to signify reading below instrument detection threshold] (e.g., <0.00005)

Above Measurement Threshold - Text or Integer [Value used to signify reading above instrument saturation]

**NOTE** This type of header information cannot be contained within GIF and Postscript files. They will need to be submitted with attached files or separate documentation containing this information.

ASCII DATA RECORDS SPECIFICATIONS:

  1. First data record consists of parameters identifying each column.
  2. Second data record consists of respective parameter units.
  3. Third data record begins actual data and consists of a date/time column followed by position coordinates (if mobile) and subsequent observations at that time and position.

NOTE - If, for some reason, the PI cannot provide the date/time in the format shown above, it is important that the time be given in UTC. If local time is also supplied, a conversion to UTC must be provided. If local daylight savings time is used, a similar conversion to UTC must be provided. In addition to UTC and/or local time, other date/time formats (e.g., decimal days) can be used but must be fully documented.

NOTE - Having a common date/time stamp and common position coordinates in each data record will permit the ability to extract data and integrate multiple data records from different data sets.

  • Date/time must be in UTC and recommended format is: YYYYMMDDHHmmss.ss where:
    • YYYY= Year

      MM = Month (00-12)

      DD = Day (01-31)

      HH = Hour (00-23)

      mm = Minute (00-59)

      ss = Second (00-59)

      .ss = Decimal Second (00-59)
  • For every mobile platform data set, position coordinates (i.e., latitude, longitude, elevation or altitude) should be expressed in decimal degrees for each data point. This may be done by: (a) providing date/time of collection with position coordinates in each data record; or (b) providing date/time of collection for each data point in the submitted file, with an associated file containing date/time and location either from the platform navigation database or GPS file.

    Latitude - Northern hemisphere expressed as positive or "N" and Southern hemisphere expressed as negative or "S".

    Longitude - 0-360o deg moving east from Greenwich; west longitude goes from 180o to 360o; or Eastern hemisphere expressed as positive or "E" and Western hemisphere expressed as negative or "W".

  • All data files must contain variable names and units of measurements as column headings (if applicable).
  • The internal format structure of the file should remain constant after the first data record to ensure continuity and permit plotting and graphing.
  • Only COMPLETE replacement or updated data/metadata files can be accepted.

SAMPLE DATA SET (ASCII FORMAT):

The following is an example of an ASCII format data set in which the header precedes the reported data, and the data is organized in columns separated by spaces. Each column is identified by parameter and each parameter's units of measure are listed in the respective column. Also each row has a date/time of observation reported in Universal Time Coordinated (UTC) along with position coordinates. This data set organization is ideal for plotting and integration of various data sets. This data set format should be used whenever possible and could be easily produced automatically from a spread sheet computer program.

------------------------------------------------------------------------------------------------------------

PI/DATA CONTACT= Doe, John (U of Hawaii)/ Doe, Jane (NCAR)

DATA COVERAGE = START: 0421133500; STOP: 0421135500 UTC

PLATFORM/SITE = C130

INSTRUMENT = C-130 External Sampler Data

LOCATION = mobile

DATA VERSION = 1.0 (10 July 2001), PRELIMINARY

REMARKS = National Center for Atmospheric Research, ACE-Asia

REMARKS = ppm values are mole fraction

REMARKS= nM/m3 at 25c and 101.3 kPa; DMS and NH4 in Parts per million (PPM)

REMARKS = Missing data = 99.9; Bad data = 88.8

REMARKS = Data point Date/Time provided in UTC

   
DATE/TIME LAT LONG SAMPLE NO2 CO DMS NH4
UTC Deg Deg NUMBER nM/m3 PPM PPM PPM
20010421133500.00 33.087 125.116 E1.160.1 1000.65 200.67 345.98 2342.980
20010421133510.00 33.090 125.120 E1.160.2 1003.45 200.60 349.76 2353.345

.

.

.

etc.

------------------------------------------------------------------------------------------------------------