ISFS NetCDF Files

Software

ISFS data are commonly provided in NetCDF files. Information on the NetCDF file format and software is available at http://www.unidata.ucar.edu/software/netcdf/. Support for reading NetCDF files is available for software written in C, C++, Fortran, Python, Java, MATLAB, R and IDL.

An isfs R package, specifically written for reading and analysing ISFS data, is also available.

Files

The time series are grouped into files with a fixed time length, typically 24 hours, but sometimes longer or shorter depending on the time resolution of the data. The file names for 24 hour files will typically be of the form prefix_YYYYMMDD.nc, where prefix describes the dataset, YYYY is the year, and MMDD are the 2 digit month and day. The files will contain data from 00:00 to 24:00 UTC for the given day.

For high-resolution data files which may be less than 24 hours in length, the file names will be of the form prefix_YYYYMMDD_HH.nc, where HH is the two-digit hour in UTC of the start time of the data in the file.
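The naming convention above can be sketched in Python. This is a minimal illustration, not ISFS software: the function names and the "isfs" prefix are hypothetical, and the start time is assumed to be a UTC datetime.

```python
import re
from datetime import datetime, timezone

# Matches prefix_YYYYMMDD.nc and prefix_YYYYMMDD_HH.nc (prefix may contain underscores).
_NAME_RE = re.compile(r"^(?P<prefix>.+)_(?P<date>\d{8})(?:_(?P<hour>\d{2}))?\.nc$")


def isfs_file_name(prefix, start, with_hour=False):
    """Build a file name from a dataset prefix and a UTC start time."""
    fmt = "%Y%m%d_%H" if with_hour else "%Y%m%d"
    return f"{prefix}_{start.strftime(fmt)}.nc"


def parse_isfs_file_name(name):
    """Return (prefix, UTC start time) for a name following the convention."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name}")
    # Files without an hour field start at 00:00 UTC.
    stamp = m.group("date") + (m.group("hour") or "00")
    start = datetime.strptime(stamp, "%Y%m%d%H").replace(tzinfo=timezone.utc)
    return m.group("prefix"), start
```

For instance, isfs_file_name("isfs", datetime(2015, 4, 29, tzinfo=timezone.utc)) gives isfs_20150429.nc.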

Variable Names and Attributes

The primary components of a NetCDF file are NetCDF variables. Each measured time series variable is stored as a NetCDF variable, which will have one or more of the following NetCDF attributes:

  • short_name: The name of the variable, as assigned by the ISFS data acquisition system.
  • long_name: A more complete description of the measurement or sensor.
  • units: A string indicating the measurement units.
  • _FillValue: The value, typically 1.0 x 10^37 (1.0e37), indicating that a measurement is not available at the given time.
  • counts: If a variable is a time-averaged mean or higher moment, the counts attribute gives the name of the NetCDF variable containing the number of data points that were averaged.

The ISFS variable names, as indicated in the short_name attribute, often contain periods to separate fields in the name, and single quote marks (i.e. prime) to indicate a deviation.  See ISFS Variable Names for information on how names are assigned.

A NetCDF variable is accessed by its name. The set of characters allowed in a NetCDF variable name is more limited than the set used by the ISFS data acquisition system. For example, NetCDF variable names cannot contain periods or quote marks. The NetCDF variable name is generated from the ISFS variable name by substituting underscores for characters that are not allowed or not recommended in NetCDF variable names. For example, a wind speed at 10 meters AGL would have an ISFS variable name and short_name attribute of Spd.10m, and a NetCDF variable name of Spd_10m. A covariance between w.15m and h2o.15m would have an ISFS variable name and short_name attribute of w'h2o'.15m, and a NetCDF variable name of w_h2o__15m.
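The substitution can be approximated with a short Python function. This is only a sketch: it assumes that every character other than a letter, digit, or underscore is replaced, which reproduces the two examples above but may not match the exact character set used by ISFS software. The function name is hypothetical.

```python
import re


def netcdf_name(isfs_name):
    """Replace each character outside [A-Za-z0-9_] with an underscore (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9_]", "_", isfs_name)
```

Because the mapping is not reversible, the short_name attribute remains the reliable way to recover the original ISFS name.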

Time, Sample, and Station Dimensions

Time series variables will have a corresponding time dimension. For example, a variable that is stored at 1 sample/sec in a 24 hour NetCDF file will have a time dimension of 24x60x60=86400, for the number of seconds in a day. 24 hour files containing 5 minute statistics will have a time dimension of 86400/300=288.

A file may contain variables with differing time resolutions, such that in addition to time and other variables at a 1 second resolution, it may also contain variables with a higher time resolution, for example, 1/20th of a second. Those 20 sample/second variables will have a sample dimension of 20 in addition to the time dimension. If there is more than one sample dimension in a file, the additional sample dimensions will have unique names, with the sampling rate as a suffix, for example, sample_10 for 10 sample/second variables.

Variables with the same name may have been sampled at more than one location. For example, a wind speed at 10 meters AGL would have an ISFS variable name of Spd.10m, and a NetCDF variable name of Spd_10m. If this variable was sampled at more than one ISFS location, it will have a station dimension whose value will be the number of ISFS stations that were deployed for the project.

 

Time Representation

The base_time variable contains one value, the time of the start of the file, as a number of POSIX (non-leap) seconds since 1970 Jan 1, 00:00 UTC. See http://en.wikipedia.org/wiki/Unix_time for more information about POSIX or Unix time.

The files also contain a time variable with a time dimension whose values represent a number of seconds since the base_time.  The time variable has a units attribute which provides a human-readable representation of base_time, as seen in this output of the ncdump program of a file of 5 minute statistics:

dimensions:
        time = UNLIMITED ; // (288 currently)
variables:
        int base_time ;
                base_time:units = "seconds since 1970-01-01 00:00:00 00:00" ;
        double time(time) ;
                time:units = "seconds since 2015-04-29 00:00:00 00:00" ;
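Combining base_time with a value of the time variable gives an absolute time. The sketch below uses only the Python standard library and assumes the two values have already been read from the file; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def absolute_time(base_time, time_value):
    """Convert base_time (POSIX seconds) plus a time offset (seconds) to a UTC datetime."""
    start = datetime.fromtimestamp(base_time, tz=timezone.utc)
    return start + timedelta(seconds=time_value)


# base_time for a file starting at 2015-04-29 00:00:00 UTC, as in the ncdump output above.
base_time = int(datetime(2015, 4, 29, tzinfo=timezone.utc).timestamp())
first = absolute_time(base_time, 150.0)  # 2015-04-29 00:02:30 UTC
```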
 

Raw Sampling

The ISFS data system acquires samples from sensors in an asynchronous manner, with each sensor having its own internal processor clock which is not synchronized with any other sensors or with absolute time. The samples from each sensor are time-tagged at the time of their receipt by the data system, based on an accurate system clock. This system clock is continually monitored and adjusted by NTP software, using a GPS reference clock with a precise PPS (pulse-per-second) signal, and is generally correct to within 50 microseconds of the PPS signal.

During post-processing, time tags for samples from some sensors are further adjusted as appropriate for the given sensor, based on documentation of the internal delays of the sensor.

Before being written to NetCDF files, the data are resampled to an even time grid, either by time-based averaging, or by a simple nearest-in-time resampling algorithm. The dataset documentation will indicate whether the data are time-averaged or resampled.

Time-Averaged Data

For averaged data, each time corresponds to the middle of each averaging period. Typically for time-averaged data, there will be only one time resolution in the file and no variables will have a sample dimension. For example, for 5 minute (300 second) averages, the values of the time variable will be 150, 450, 750, etc., which are the middle times of each successive 300 second interval, as a number of seconds since base_time.
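As a small worked example, the expected values of the time variable for a file of averages can be generated as follows. The function name and the default 24 hour file length are assumptions made for illustration.

```python
def averaging_midpoints(period, file_length=86400):
    """Middle time of each averaging period, in seconds since the start of the file."""
    n_periods = file_length // period
    return [period / 2 + k * period for k in range(n_periods)]


midpoints = averaging_midpoints(300)  # 288 values: 150, 450, 750, ...
```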

 

Resampled High Rate Data

Before being written to the NetCDF files, the raw, asynchronous samples are re-sampled to an evenly spaced time sequence, using a simple method of matching the raw sample nearest-in-time to the evenly-spaced times. No interpolation or averaging is done.

The time-tag of a sample with time index i, for a variable without a sample dimension, is simply:

t_i = base_time + time_i

This time-tag represents the time of the sample as a number of POSIX seconds since Jan 1, 1970 00:00 UTC.

For variables with a sample dimension, the samples corresponding to a given time index are evenly spaced around the corresponding time_i. The time-tag for a sample from a variable that was re-sampled at rate R sec^-1, with time index i and sample index j, is:

t_{i,j} = base_time + time_i - (dT / 2) + (dT / sample) x (1/2 + j)

where:

dT = (time_i - time_{i-1}) is the interval between time_i values, which is (sample / R) seconds
sample is the size of the sample dimension, so that (dT / sample) = 1/R is the spacing between samples
i is the time index
j is the sample index, ranging from 0 to the sample dimension minus one

The sample dimension for a variable is usually chosen to be its rate, R, so that dT = 1 second and the sample spacing reduces to (1 / sample) seconds.
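The time-tag calculation for a variable with a sample dimension can be sketched as below. The function and argument names are hypothetical. The spacing between samples is written as dT / nsample, which equals 1 / sample in the usual case of dT = 1 second.

```python
def sample_time(base_time, time_i, j, nsample, dT=1.0):
    """POSIX time of sample j (0-based) within the interval centered on time_i."""
    spacing = dT / nsample  # seconds between successive samples, 1/R
    return base_time + time_i - dT / 2 + spacing * (j + 0.5)


# 20 sample/second variable, time index whose time value is 10 s, base_time of 0.
tags = [sample_time(0, 10.0, j, 20) for j in range(20)]
```

The tags are centered on time_i, so their mean is the value of the time variable itself.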

Reading Multi-dimensional Data

In NetCDF files, the values for each variable are stored with the right-most dimensions varying most rapidly, i.e. row-major order. For example, a variable with dimensions (time, sample) is stored with the sample index (j in the above discussion) varying more rapidly than the time index, i.

If the values of a variable are read in one contiguous read with a C or C++ program, or any language which stores multi-dimensional data in row-major order, the code can be simplified by declaring the variable as a multi-dimensional array with a time dimension followed by the sample dimension. For example, if the time dimension in a file is 7200, and the sample dimension for a variable is 20, the declaration of the variable in C or C++ should be:

float pres[7200][20];

With Fortran, or in any column-major programming language, the dimensions would be reversed:

real pres(20,7200)

If a variable has a station dimension, it will follow the time and sample dimensions, and so the station index will vary the fastest. For example, for a station dimension of 30:

C/C++

float pres[7200][20][30];

Fortran

real pres(30,20,7200)
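The storage order can also be expressed as an explicit offset calculation, which is useful when the values have been read into a flat buffer. The following Python sketch uses hypothetical function names and plain lists rather than any particular NetCDF library.

```python
def flat_index(i, j, k, nsample, nstation):
    """Offset of element (time i, sample j, station k) in row-major order."""
    return (i * nsample + j) * nstation + k


def unflatten(flat, ntime, nsample, nstation):
    """Rebuild a nested [time][sample][station] list from a flat row-major list."""
    return [
        [
            [flat[flat_index(i, j, k, nsample, nstation)] for k in range(nstation)]
            for j in range(nsample)
        ]
        for i in range(ntime)
    ]
```

With a sample dimension of 20 and a station dimension of 30, consecutive stations are adjacent in the buffer, consecutive samples are 30 elements apart, and consecutive times are 600 elements apart.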

 

Coordinate Systems of Wind Variables

Wind measurements from 2D anemometers are generally in geographic coordinates, where the direction the wind is coming from is reported with respect to 0=north, 90=east, etc. When wind is reported in terms of its U and V components, +U is wind flowing to the east, and +V is wind flowing to the north.

See https://www.eol.ucar.edu/content/wind-direction-quick-reference.

Wind measurements from 3D sonic anemometers may be in  instrument or geographic coordinates, and may or may not be corrected for sonic tilt.

See https://www.eol.ucar.edu/content/sonic-tilt-corrections for information on tilt corrections.

More recent ISFS NetCDF files contain some information on the wind coordinates in the global attributes section. The ncdump program can be used to display the global attributes. For example, a file containing 3D wind data that have been rotated from instrument to geographic coordinates, but have not been tilt-corrected, has these global attributes:

// global attributes:
    :dataset = "qc_geo_notiltcor" ;
    :dataset_description = "QC, winds in geographic, non-tilt corrected coordinates" ;
    :wind3d_horiz_coordinates = "geographic" ;
    :wind3d_horiz_rotation = 1 ;
    :wind3d_tilt_correction = 0 ;

The dataset, dataset_description, and wind3d_horiz_coordinates attributes are descriptive.
 
wind3d_tilt_correction is a logical value, 0=false, 1=true, indicating whether tilt correction has been applied to the 3D winds.
 
wind3d_horiz_rotation is a logical value, which if 0 (false), indicates the 3D wind data have not been rotated, and hence are in instrument coordinates. A value of 1 (true) for wind3d_horiz_rotation indicates the winds have been rotated in the horizontal plane, and the wind3d_horiz_coordinates attribute will indicate the resultant coordinates.
 
In our processing, a tilt correction is first applied to the sonic winds, followed by the horizontal rotation.
 
For older files, without the above global attributes, the 3D wind coordinates should be documented in the project page. Hints may also be found in the file names, with abbreviations such as  "geo" (geographic) or "instr" (instrument), and "tc" for tilt correction or "ntc" for no tilt correction.
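The interpretation of these attributes can be summarized in a few lines of Python. This is a sketch under assumptions: the global attributes are taken to have already been read into a plain dictionary, and the function name is hypothetical. It returns None when the attributes are absent, in which case the project documentation or file name must be consulted.

```python
def describe_wind3d(attrs):
    """Return (horizontal coordinates, tilt corrected?) or None if not documented in the file."""
    if "wind3d_horiz_rotation" not in attrs:
        return None  # older file: check the project page or the file name
    if attrs["wind3d_horiz_rotation"]:
        coords = attrs.get("wind3d_horiz_coordinates", "unknown")
    else:
        coords = "instrument"  # no horizontal rotation applied
    tilt_corrected = bool(attrs.get("wind3d_tilt_correction", 0))
    return coords, tilt_corrected


example = {
    "dataset": "qc_geo_notiltcor",
    "wind3d_horiz_coordinates": "geographic",
    "wind3d_horiz_rotation": 1,
    "wind3d_tilt_correction": 0,
}
```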
 

Quality Indicators of 3D Wind Variables from CSAT3 Anemometers

ISFS 3D wind measurements are commonly done with Campbell Scientific CSAT3 sonic anemometers. With every wind sample, a CSAT3 also outputs a diagnostic value, consisting of four bit fields, where a bit will be set if a problem is detected. As documented in the CSAT3 manual (https://s.campbellsci.com/documents/us/manuals/csat3.pdf) in table B7, "Decoding the Diagnostic Flags from Word 4", the bits have the following meaning:
 
CSAT3 bit   ISFS bit   ISFS decimal value   Indication
b12         0          1                    Sonic signal amplitude too low
b13         1          2                    Sonic signal amplitude too high
b14         2          4                    Poor signal lock
b15         3          8                    Difference in the speed of sound between the three non-orthogonal axes is greater than 2.360 m s^-1 (~ 4 °C @ 25 °C)
(none)      4          16                   Unexpected value for sample counter: possible data loss
 
 
In high rate ISFS NetCDF files this diagnostic is usually stored in a variable called diagbits. For example, a sonic at 20 meters with wind variables having short names of u.20m, v.20m, w.20m and tc.20m, will also have a variable with a short name diagbits.20m.
 
As shown in the table, CSAT3 bits 12-15 are shifted to the least significant bits (0-3) before being stored in diagbits.  The decimal value of diagbits for a sample will be the sum of the shown decimal values.  
 
The most common cause of a non-zero diagbits is the presence of water droplets on the anemometer transducers. 
 
The u, v, w and tc values of a high rate 3D wind sample are set to the missing value by ISFS processing if any of bits 0-3 (CSAT3 bits b12-b15) are non-zero.
 
Bit 4 in diagbits is added by ISFS software to indicate that the sample counter provided by the CSAT3 was not the expected value (the previous counter plus 1, modulo 64). If diagbits has a value of 16 or greater, it is an indication that samples are being lost during sampling, transmission or storage. The CSAT3 sample counter is not otherwise stored in the NetCDF data.
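A diagbits value can be decoded by testing each bit in turn. The sketch below follows the ISFS bit assignments described above; the function names and the shortened flag descriptions are illustrative only.

```python
# ISFS bit number -> meaning, following the table above.
DIAG_FLAGS = {
    0: "sonic signal amplitude too low",
    1: "sonic signal amplitude too high",
    2: "poor signal lock",
    3: "speed of sound difference between axes too large",
    4: "unexpected sample counter, possible data loss",
}


def decode_diagbits(value):
    """List the conditions flagged in one diagbits value."""
    value = int(value)  # the stored value may be a floating point number
    return [text for bit, text in DIAG_FLAGS.items() if value & (1 << bit)]


def winds_flagged(value):
    """True if any of bits 0-3 is set, i.e. u, v, w and tc were set to missing."""
    return (int(value) & 0x0F) != 0


def samples_lost(value):
    """True if bit 4 is set, i.e. the sample counter was not the expected value."""
    return (int(value) & 0x10) != 0
```

For example, a diagbits value of 5 decodes to "sonic signal amplitude too low" and "poor signal lock" (1 + 4).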
 
Since the high rate values are set to the missing value when bits 0-3 are non-zero, means and covariances of CSAT3 wind data will not include those flagged high rate samples. NetCDF files of averages generally include an ldiag variable for each sonic anemometer. For each high rate sample, a logical diagnostic is 1 if diagbits was non-zero and 0 otherwise. ldiag is the average of that logical diagnostic over the averaging period, and so is the fraction of time in each period that diagbits was non-zero.
 
To screen averaged data, choose a maximum allowed value for ldiag, for example 0.01, and discard the means or covariances for a sonic when ldiag exceeds that value.
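This screening step might look like the following in Python. It is a sketch with hypothetical names, operating on plain lists that are assumed to have been read from the averaged variable and its ldiag variable; discarded averages are marked with None, and the 0.01 default simply follows the example threshold above.

```python
def screen_by_ldiag(values, ldiag, max_ldiag=0.01):
    """Keep an average only where ldiag does not exceed max_ldiag; otherwise mark it None."""
    return [v if d <= max_ldiag else None for v, d in zip(values, ldiag)]


screened = screen_by_ldiag([1.2, 0.8, 1.5], [0.0, 0.05, 0.01])
```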
 

Counts Variables

If an averaged variable has a counts attribute, it will be the name of a variable containing the number of points in each average. For example, a 5 minute average of a sonic running at 20 sample/sec would have a counts value of 6000 (±1) for a period of clean data. The counts variable can also be used to screen suspect averages.
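One possible way to screen with counts is to compare each value against the number of points expected for the averaging period. In the sketch below, the function names and the 90% acceptance threshold are assumptions chosen for illustration, not ISFS recommendations.

```python
def expected_counts(rate, period):
    """Number of samples expected in one averaging period (rate in samples/sec, period in sec)."""
    return round(rate * period)


def screen_by_counts(values, counts, rate, period, min_fraction=0.9):
    """Mark an average as None when it was built from too few points (assumed threshold)."""
    threshold = min_fraction * expected_counts(rate, period)
    return [v if c >= threshold else None for v, c in zip(values, counts)]
```

For a 20 sample/sec sonic and 300 second averages, expected_counts(20, 300) is 6000, so an average built from only 3000 points would be discarded under the 90% threshold.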