Software
ISFS data are commonly provided in NetCDF files. Information on the NetCDF file format and software is available at http://www.unidata.ucar.edu/software/netcdf/. Support for reading NetCDF files is available for software written in C, C++, Fortran, Python, Java, MATLAB, R and IDL.
An isfs R package, specifically written for reading and analyzing ISFS data in R, is also available.
Files
The time series are grouped into files with a fixed time length, typically 24 hours, but sometimes longer or shorter depending on the time resolution of the data. The file names for 24 hour files will typically be of the form prefix_YYYYMMDD.nc, where prefix describes the dataset, YYYY is the year, and MMDD are the 2 digit month and day. The files will contain data from 00:00 to 24:00 UTC for the given day.
For high-resolution data files which may be less than 24 hours in length, the file names will be of the form prefix_YYYYMMDD_HH.nc, where HH is the two-digit hour in UTC of the start time of the data in the file.
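The naming patterns above can be sketched in Python as follows. This is only an illustration: the helper name isfs_file_name and the prefix "isfs" are hypothetical, and the actual prefix depends on the dataset.

```python
from datetime import datetime, timezone

def isfs_file_name(prefix, start, with_hour=False):
    """Build prefix_YYYYMMDD.nc, or prefix_YYYYMMDD_HH.nc when with_hour is True.

    `start` is the start time of the data in the file; it is converted to UTC.
    """
    start = start.astimezone(timezone.utc)
    fmt = "%Y%m%d_%H" if with_hour else "%Y%m%d"
    return f"{prefix}_{start.strftime(fmt)}.nc"

# "isfs" is a made-up prefix used only for this example.
day_name = isfs_file_name("isfs", datetime(2015, 10, 3, tzinfo=timezone.utc))
hour_name = isfs_file_name(
    "isfs", datetime(2015, 10, 3, 18, tzinfo=timezone.utc), with_hour=True
)
print(day_name)   # isfs_20151003.nc
print(hour_name)  # isfs_20151003_18.nc
```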
Variable Names and Attributes
The primary components of a NetCDF file are NetCDF variables. Each measured time series variable is stored as a NetCDF variable, which will have one or more of the following NetCDF attributes:
- short_name The name of the variable, as assigned by the ISFS data acquisition system.
- long_name A more complete description of the measurement or sensor.
- units A string indicating the measurement units.
- _FillValue Value, typically 1.0 x 10^37, indicating that a measurement is not available at the given time.
- counts If a variable is a time-averaged mean or higher moment, the counts attribute will indicate the name of the NetCDF variable containing the number of data points that were averaged.
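As a minimal sketch of how the _FillValue attribute might be used after reading a variable, the Python function below replaces fill values with NaN. The helper name mask_fill is hypothetical. It assumes the fill value is taken from the variable's own _FillValue attribute rather than hard-coded, since a single-precision fill value will generally not compare equal to a literal 1.0e37.

```python
import math

def mask_fill(values, fill_value):
    """Return a copy of `values` with entries equal to `fill_value` set to NaN."""
    return [math.nan if v == fill_value else v for v in values]

# Stand-in for the attribute value read from the file (assumption for this example).
fill = 1.0e37
cleaned = mask_fill([2.5, fill, 3.1], fill)
print(cleaned)
```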
The ISFS variable names, as indicated in the short_name attribute, often contain periods to separate fields in the name, and single quote marks (i.e. prime) to indicate a deviation. See ISFS Variable Names for information on how names are assigned.
A NetCDF variable is accessed by its name. The allowed set of characters in a NetCDF variable name is more limited than the set used by the ISFS data acquisition system. For example, NetCDF variable names cannot contain periods or quote marks. The NetCDF variable name is generated from the ISFS variable name by substituting underscores for characters that are not allowed or recommended in NetCDF variable names. For example, a wind speed at 10 meters AGL would have an ISFS variable name and short_name attribute of Spd.10m, and a NetCDF variable name of Spd_10m. A covariance between w.15m and h2o.15m would have an ISFS variable name and short_name attribute of w'h2o'.15m, and a NetCDF variable name of w_h2o__15m.
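The substitution can be approximated in Python as shown below. This is a sketch, not the ISFS implementation: it assumes that every character other than an ASCII letter, digit, or underscore is replaced, which reproduces the two examples above but may not match the exact character set used by the ISFS software. The function name netcdf_name is hypothetical.

```python
import re

def netcdf_name(isfs_name):
    """Replace each character outside [A-Za-z0-9_] with an underscore (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9_]", "_", isfs_name)

print(netcdf_name("Spd.10m"))     # Spd_10m
print(netcdf_name("w'h2o'.15m"))  # w_h2o__15m
```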
Time, Sample, and Station Dimensions
Time series variables will have a corresponding time dimension. For example, a variable that is stored at 1 sample/sec in a 24 hour NetCDF file will have a time dimension of 24x60x60=86400, for the number of seconds in a day. 24 hour files containing 5 minute statistics will have a time dimension of 86400/300=288.
A file may contain variables with differing time resolutions. In addition to time and other variables at a 1 second resolution, it may also contain variables with a higher time resolution, for example, 1/20th of a second. Those 20 sample/second variables will have a sample dimension of 20 in addition to the time dimension. If there is more than one sample dimension in a file, the additional sample dimensions will have unique names, with the sampling rate as a suffix, for example, sample_10 for 10 sample/second variables.
Variables with the same name may have been sampled at more than one location. For example, a wind speed at 10 meters AGL would have an ISFS variable name of Spd.10m and a NetCDF variable name of Spd_10m. If this variable was sampled at more than one ISFS location, it will have a station dimension whose value is the number of ISFS stations that were deployed for the project.
Time Representation
The base_time variable contains one value, the time of the start of the file, as a number of POSIX (non-leap) seconds since 1970 Jan 1, 00:00 UTC. See http://en.wikipedia.org/wiki/Unix_time for more information about POSIX or Unix time.
The files also contain a time variable with a time dimension whose values represent a number of seconds since the base_time. The time variable has a units attribute which provides a human-readable representation of base_time, as seen in this output of the ncdump program of a file of 5 minute statistics:
Data Sampling
The ISFS data system acquires samples from sensors in an asynchronous manner, with each sensor having its own internal processor clock which is not synchronized with any other sensors or any external clock source. The samples from each sensor are time-tagged at the time of their receipt by the data system, based on an accurate system clock. This system clock is continually monitored and adjusted by NTP software, using a GPS reference clock with a precise PPS (pulse-per-second) signal, and is generally correct to within 50 microseconds of the PPS signal.
During post-processing, time tags for samples from some sensors are further adjusted as appropriate for the given sensor, based on documentation of the internal delays of the sensor.
Before being written to NetCDF files, the data are resampled to an even time grid, either by time-based averaging or by a simple nearest-in-time resampling algorithm. The dataset documentation will indicate whether the data are time-averaged or resampled.
Time-Averaged Data
For averaged data, each time corresponds to the middle of each averaging period. Typically for time-averaged data, there will be only one time resolution in the file and no variables will have a sample dimension. For example, for 5 minute (300 second) averages, the values of the time variable will be 150, 450, 750, etc., which are the middle times of each successive 300 second interval, as a number of seconds since base_time.
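The mid-interval times for a file of averages can be reproduced with a few lines of Python. The function name averaging_times is hypothetical, and the sketch assumes the file length is an exact multiple of the averaging period.

```python
def averaging_times(file_length, period):
    """Mid-interval times, in seconds since base_time, for each averaging period."""
    count = file_length // period
    return [period * k + period / 2 for k in range(count)]

# A 24 hour file of 5 minute (300 second) averages.
times = averaging_times(86400, 300)
print(len(times), times[:3])  # 288 [150.0, 450.0, 750.0]
```

Each average then covers the interval from time - period / 2 to time + period / 2.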
Resampled High Rate Data
Before being written to the NetCDF files, the raw, asynchronous samples are re-sampled to an evenly spaced time sequence, using a simple method of matching the raw sample nearest-in-time to the evenly-spaced times. No interpolation or averaging is done.
The time-tag of a sample with time index i, for a variable without a sample dimension, is simply:
$$t_i = \mathrm{base\_time} + \mathrm{time}_i$$
This time-tag represents the time of the sample as a number of POSIX seconds since Jan 1, 1970 00:00 UTC.
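A time-tag computed this way can be converted to a calendar time with the Python standard library. The sketch below uses a hypothetical helper, time_tag, and made-up example values for base_time and the time offset.

```python
from datetime import datetime, timezone

def time_tag(base_time, time_i):
    """Convert base_time plus an offset in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(base_time + time_i, tz=timezone.utc)

# Example values only: base_time for 2015-10-03 00:00:00 UTC, offset of 150 seconds.
tag = time_tag(1443830400, 150)
print(tag.isoformat())  # 2015-10-03T00:02:30+00:00
```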
For variables with a sample dimension, the samples corresponding to a given time index are evenly spaced around the corresponding $\mathrm{time}_i$. The time-tag for a sample from a variable that was re-sampled at rate $R$ s$^{-1}$, with time index $i$ and sample index $j$, is:
$$t_{i,j} = \mathrm{base\_time} + \mathrm{time}_i - \frac{dT}{2} + \frac{dT}{\mathrm{sample}} \left( \frac{1}{2} + j \right)$$
where:
$dT = \mathrm{sample} / R$ seconds is the interval between $\mathrm{time}_i$ values
$i$ is the time index
$j$ is the sample index, ranging from 0 to (sample - 1)
The sample dimension for a variable is usually chosen to be its rate, R, so that dT = 1 second.
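The formula for variables with a sample dimension translates directly into code. In this Python sketch the function name sample_time_tag is hypothetical, and the time value of 0.5 seconds is an assumed example rather than a value taken from a real file.

```python
def sample_time_tag(base_time, time_i, j, sample, rate):
    """Time-tag of sample j at a given time value, per the formula above.

    sample: size of the sample dimension; rate: re-sampling rate R in samples/second.
    """
    d_t = sample / rate  # interval between successive time values, in seconds
    return base_time + time_i - d_t / 2 + (d_t / sample) * (0.5 + j)

# 20 samples/second with a sample dimension of 20, so dT = 1 second.
tags = [sample_time_tag(0, 0.5, j, sample=20, rate=20) for j in range(20)]
print(tags[0], tags[-1])
```

With these inputs the 20 samples are spaced 0.05 seconds apart and centered on the time value.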
Reading Multi-dimensional Data
In NetCDF files, the values for each variable are stored with right-most dimensions varying most rapidly, i.e. row-major order. For example, a variable with dimensions (time, sample) is stored with the sample index (j in the above discussion) varying more rapidly than the time index, i.
If the values of a variable are read in one contiguous read with a C or C++ program, or any language which stores multi-dimensional data in row-major order, the code can be simplified by declaring the variable as a multi-dimensional array with the time dimension followed by the sample dimension. For example, if the time dimension in a file is 7200, and the sample dimension for a variable is 20, the declaration of the variable in C or C++ should be:
float pres[7200][20];
With Fortran, or in any column-major programming language, the dimensions would be reversed:
real pres(20,7200)
If a variable has a station dimension, it will follow the time and sample dimensions, and so the station index will vary the fastest. For example, for a station dimension of 30:
C/C++
float pres[7200][20][30];
Fortran
real pres(30,20,7200)
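When the values are read into a flat buffer instead, the same ordering can be expressed as index arithmetic. The following Python sketch uses a hypothetical helper, flat_index, with the example dimensions from above (time 7200, sample 20, station 30).

```python
def flat_index(i, j, k, n_sample, n_station):
    """Offset of element (time i, sample j, station k) in a row-major flat buffer."""
    return (i * n_sample + j) * n_station + k

n_time, n_sample, n_station = 7200, 20, 30

# The station index varies fastest, then sample, then time.
print(flat_index(0, 0, 1, n_sample, n_station))  # 1
print(flat_index(0, 1, 0, n_sample, n_station))  # 30
print(flat_index(1, 0, 0, n_sample, n_station))  # 600
```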
Coordinate Systems of Wind Variables
Wind measurements from 2D anemometers are generally in geographic coordinates, where the direction the wind is coming from is reported with respect to 0=north, 90=east, etc. When wind is reported in terms of its U and V components, +U is wind flowing to the east, and +V is wind flowing to the north.
See https://www.eol.ucar.edu/content/wind-direction-quick-reference.
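Under these conventions, speed and direction can be converted to and from U and V as in the Python sketch below. The function names are hypothetical, and the formulas simply follow from the geometry described above (direction is where the wind comes from, in degrees clockwise from north).

```python
import math

def wind_components(speed, direction_deg):
    """Return (U, V) for a wind of the given speed coming FROM direction_deg."""
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)

def wind_direction(u, v):
    """Return the direction the wind is coming from, in degrees within [0, 360)."""
    return math.degrees(math.atan2(-u, -v)) % 360.0

# A 5 m/s wind from the west (270 degrees) flows to the east, so U is about +5.
u, v = wind_components(5.0, 270.0)
print(u, v, wind_direction(u, v))
```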
Wind measurements from 3D sonic anemometers may be in instrument or geographic coordinates, and may or may not be corrected for sonic tilt.
See https://www.eol.ucar.edu/content/sonic-tilt-corrections for information on tilt corrections.
More recent ISFS NetCDF files contain some information on the wind coordinates in the global attributes section. The ncdump program can be used to display the global attributes. For example, a file containing 3D wind data that have been rotated from instrument to geographic coordinates, but have not been tilt-corrected, has these global attributes:
// global attributes:
:dataset = "qc_geo_notiltcor" ;
:dataset_description = "QC, winds in geographic, non-tilt corrected coordinates" ;
:wind3d_horiz_coordinates = "geographic" ;
:wind3d_horiz_rotation = 1 ;
:wind3d_tilt_correction = 0 ;
Quality Indicators of 3D Wind Variables from CSAT3 Anemometers
CSAT3 bit | Indication | ISFS bit | ISFS decimal value |
b12 | Sonic signal amplitude too low | 0 | 1 |
b13 | Sonic signal amplitude too high | 1 | 2 |
b14 | Poor signal lock | 2 | 4 |
b15 | Difference in the speed of sound between the three non-orthogonal axes is greater than 2.360 m s^-1 (~4 °C at 25 °C) | 3 | 8 |
- | Unexpected value for sample counter: possible data loss | 4 | 16 |
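A quality value can be decoded by testing each ISFS bit in turn. The Python sketch below assumes that the decimal values in the table are combined into a single integer when more than one condition applies; the names ISFS_CSAT3_FLAGS and decode_flags are hypothetical, and the descriptions are abbreviated from the table.

```python
# ISFS bit number -> abbreviated indication from the table above.
ISFS_CSAT3_FLAGS = {
    0: "sonic signal amplitude too low",
    1: "sonic signal amplitude too high",
    2: "poor signal lock",
    3: "speed-of-sound difference between axes too large",
    4: "unexpected sample counter value: possible data loss",
}

def decode_flags(value):
    """Return the indications whose ISFS bit is set in `value`."""
    bits = int(value)  # values read from a file may arrive as floats
    return [text for bit, text in ISFS_CSAT3_FLAGS.items() if bits & (1 << bit)]

print(decode_flags(5))  # bits 0 and 2 are set
```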