Brief demonstration of ncompare: comparing the structure, groups, variables, and attributes of two netCDF files
Installation instructions for ncompare can be found in either of these locations:
Command Line Usage
ncompare's command line arguments, provided by the --help description
✍️ Syntax Note: Commands preceded by an exclamation point "!" (which is needed to run shell commands in a Jupyter notebook) can also be run from a terminal.
In a shell/terminal, omit the exclamation point.
! ncompare --help
usage: ncompare [-h] [--only-diffs] [--file-text FILE_TEXT]
[--file-csv FILE_CSV] [--file-xlsx FILE_XLSX] [--no-color]
[--show-attributes] [--show-chunks]
[--column-widths COLUMN_WIDTHS COLUMN_WIDTHS COLUMN_WIDTHS]
[--version]
path_a path_b
Compare the variables contained within two different netCDF datasets
positional arguments:
path_a First (netCDF or HDF) file
path_b Second (netCDF or HDF) file
options:
-h, --help show this help message and exit
--only-diffs Only display variables and attributes that are
different
--file-text FILE_TEXT
A text file to which the output will be written.
--file-csv FILE_CSV A csv (comma separated values) file to which the
output will be written.
--file-xlsx FILE_XLSX
An Excel file to which the output will be written.
--no-color Turn off all colorized output
--show-attributes Include variable attributes in comparison
--show-chunks Include chunk sizes in the table that compares
variables
--column-widths COLUMN_WIDTHS COLUMN_WIDTHS COLUMN_WIDTHS
Width, in number of characters, of the three columns
in the comparison report
--version Show the current version.
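The same flags can also be assembled from a plain Python script rather than a notebook cell. The sketch below uses two hypothetical helpers (build_ncompare_argv and run_ncompare are illustrative names, not part of ncompare). It uses only flags listed in the --help output above and runs the command through subprocess only when an ncompare executable is found on PATH.

```python
from __future__ import annotations

import shutil
import subprocess
from typing import Optional, Sequence


def build_ncompare_argv(
    path_a: str,
    path_b: str,
    *,
    only_diffs: bool = False,
    show_attributes: bool = False,
    show_chunks: bool = False,
    file_csv: Optional[str] = None,
    column_widths: Optional[Sequence[int]] = None,
) -> list[str]:
    """Assemble an ncompare command line from flags listed in --help."""
    argv = ["ncompare"]
    if only_diffs:
        argv.append("--only-diffs")
    if show_attributes:
        argv.append("--show-attributes")
    if show_chunks:
        argv.append("--show-chunks")
    if file_csv is not None:
        argv += ["--file-csv", file_csv]
    if column_widths is not None:
        # --column-widths takes exactly three integers
        if len(column_widths) != 3:
            raise ValueError("column_widths needs exactly three values")
        argv += ["--column-widths", *(str(w) for w in column_widths)]
    # The two positional file paths go last
    argv += [path_a, path_b]
    return argv


def run_ncompare(argv: list[str]) -> Optional[subprocess.CompletedProcess]:
    """Run the command if its executable is on PATH; otherwise return None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

For example, build_ncompare_argv("a.nc", "b.nc", only_diffs=True, column_widths=[33, 30, 30]) yields the argument list for ncompare --only-diffs --column-widths 33 30 30 a.nc b.nc. Passing a list rather than a single string avoids shell quoting issues with unusual file names.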
Example 1: Two netCDF files with the same groups, variables, and attributes
First, the data files are defined. The examples here rely on three files: two from the NOAA National Centers for Environmental Information (NCEI) Global Precipitation Climatology Project (GPCP) Climate Data Record (CDR), Monthly V2.3, and one from the Climate Data Record (CDR) of Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN-CDR), Version 1 Revision 1, a daily quasi-global precipitation product. The files are accessible via this GPCP catalog and this PERSIANN catalog:
- https://www.ncei.noaa.gov/thredds/catalog/cdr/gpcp_final/2023/catalog.html?dataset=cdr_gpcp_final/2023/gpcp_v02r03_monthly_d202301_c20230411.nc
- https://www.ncei.noaa.gov/thredds/catalog/cdr/gpcp_final/2023/catalog.html?dataset=cdr_gpcp_final/2023/gpcp_v02r03_monthly_d202302_c20230505.nc
- https://www.ncei.noaa.gov/thredds/fileServer/cdr/persiann/2023/PERSIANN-CDR_v01r01_20230419_c20231030.nc
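The first two links point to THREDDS catalog pages, whereas the code below downloads from the matching fileServer endpoints. A minimal sketch of that mapping follows. It assumes the catalog URL layout shown above; catalog_to_fileserver is a hypothetical helper, and the rule is inferred from these two links, not taken from THREDDS documentation.

```python
from urllib.parse import parse_qs, urlparse


def catalog_to_fileserver(catalog_url: str) -> str:
    """Map a THREDDS catalog page URL to a direct-download fileServer URL.

    Assumed layout (inferred from the GPCP links above):
    .../thredds/catalog/<dir>/catalog.html?dataset=<id ending in the file name>
    """
    parsed = urlparse(catalog_url)
    prefix = "/thredds/catalog/"
    if not parsed.path.startswith(prefix):
        raise ValueError(f"not a THREDDS catalog URL: {catalog_url}")
    # Directory part, e.g. "cdr/gpcp_final/2023" (drop the trailing "catalog.html")
    directory = parsed.path[len(prefix):].rsplit("/", 1)[0]
    # The file name is the last component of the dataset ID in the query string
    dataset_id = parse_qs(parsed.query)["dataset"][0]
    file_name = dataset_id.rsplit("/", 1)[-1]
    return f"{parsed.scheme}://{parsed.netloc}/thredds/fileServer/{directory}/{file_name}"
```

Applied to the first catalog link, this yields the first fileServer URL used in file_urls below. Other THREDDS servers may organize their catalogs differently, so treat the rule as specific to these links.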
from pathlib import Path
file_urls = [
"https://www.ncei.noaa.gov/thredds/fileServer/cdr/gpcp_final/2023/gpcp_v02r03_monthly_d202301_c20230411.nc",
"https://www.ncei.noaa.gov/thredds/fileServer/cdr/gpcp_final/2023/gpcp_v02r03_monthly_d202302_c20230505.nc",
"https://www.ncei.noaa.gov/thredds/fileServer/cdr/persiann/2023/PERSIANN-CDR_v01r01_20230419_c20231030.nc",
]
file_names = [Path(url).name for url in file_urls]
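Path(url).name works for these links because each one ends in the file name. However, it treats the whole URL string, query string included, as a local path. A slightly more defensive sketch (url_file_name is a hypothetical helper) parses the URL first and then takes the last path component:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url_file_name(url: str) -> str:
    """Return the last path component of a URL, ignoring any query string or fragment."""
    # URL paths always use "/", so PurePosixPath behaves the same on every OS
    return PurePosixPath(urlparse(url).path).name
```

With this helper, file_names = [url_file_name(url) for url in file_urls] gives the same three names here. A link such as .../file.nc?download=1 would still map to file.nc.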
To download these files (e.g., the first time you run this notebook), run the following:
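import requests
# Download each file and write it to the current working directory
for url, filename in zip(file_urls, file_names):
    r = requests.get(url, allow_redirects=True, timeout=60)
    r.raise_for_status()  # stop on HTTP errors instead of saving an error page
    with open(filename, "wb") as f:
        f.write(r.content)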
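If the notebook is rerun, the loop above downloads every file again. The sketch below is a possible alternative that uses only the standard library: it skips files that already exist and writes through a temporary .part file. download_if_missing is a hypothetical helper, and the 60-second timeout is an arbitrary choice.

```python
import shutil
import urllib.request
from pathlib import Path


def download_if_missing(url: str, dest: Path, timeout: float = 60.0) -> bool:
    """Download url to dest unless dest already exists.

    Returns True if a download happened, False if the file was already present.
    """
    if dest.exists():
        return False
    tmp = dest.with_name(dest.name + ".part")
    # Stream into a temporary file so an interrupted download never leaves
    # a truncated file under the final name
    with urllib.request.urlopen(url, timeout=timeout) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)  # rename into place once the transfer has completed
    return True
```

Usage mirrors the loop above: for url, name in zip(file_urls, file_names): download_if_missing(url, Path(name)). urlopen follows redirects and raises urllib.error.HTTPError on error status codes.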
Next, we pass the two file paths to ncompare; any differences are printed in red. In this case, there are no differences, so all of the variables are printed in black.
✍️ Syntax Note: the curly brackets, "{" and "}", that follow are simply a way to substitute Python variables into a shell command.
In a shell/terminal, one can just write out the full arguments, separated by spaces.
For example, the following command would be run at the terminal as ncompare --column-widths 33 26 26 gpcp_v02r03_monthly_d202301_c20230411.nc gpcp_v02r03_monthly_d202302_c20230505.nc
✍️ ncompare Options Note: the --column-widths 33 26 26 arguments are optional; they are used here to shrink the columns from their default widths so the output fits better in the GitHub notebook renderer.
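IPython expands the {...} expressions in a "!" line before passing it to the shell. Outside a notebook, you can build the equivalent terminal command from the same Python variables with shlex.join, which quotes any argument that needs it. The following is a sketch that assumes a POSIX shell:

```python
import shlex
from pathlib import Path

# The same file names the notebook derives: the last path component of each URL
file_names = [
    Path(url).name
    for url in [
        "https://www.ncei.noaa.gov/thredds/fileServer/cdr/gpcp_final/2023/gpcp_v02r03_monthly_d202301_c20230411.nc",
        "https://www.ncei.noaa.gov/thredds/fileServer/cdr/gpcp_final/2023/gpcp_v02r03_monthly_d202302_c20230505.nc",
    ]
]

# Equivalent of the notebook line:
#   ! ncompare --column-widths 33 26 26 {file_names[0]} {file_names[1]}
argv = ["ncompare", "--column-widths", "33", "26", "26", file_names[0], file_names[1]]

# shlex.join quotes arguments containing spaces or shell metacharacters,
# so the printed string can be pasted into a POSIX shell as-is
terminal_command = shlex.join(argv)
print(terminal_command)
```

These file names contain only letters, digits, underscores, dots, and hyphens, so no quoting is added. A name such as "my file.nc" would be wrapped in single quotes.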
! ncompare --column-widths 33 26 26 {file_names[0]} {file_names[1]}
File A: gpcp_v02r03_monthly_d202301_c20230411.nc
File B: gpcp_v02r03_monthly_d202302_c20230505.nc
Root-level Dimensions:
Are all items the same? ---> True. [('latitude', 72), ('longitude', 144), ('nv', 2), ('time', 1)]
Root-level Groups:
Are all items the same? ---> True. (No items exist.)
All variables:
File A File B
All Variables
- -------------------------- --------------------------
GROUP #00 -------------------------/ -------------------------/
num variables in group: 8 8
- -------------------------- --------------------------
-----VARIABLE-----: lat_bounds lat_bounds
dtype: float32 float32
dimensions: ('latitude', 'nv') ('latitude', 'nv')
shape: (72, 2) (72, 2)
-----VARIABLE-----: latitude latitude
dtype: float32 float32
dimensions: ('latitude',) ('latitude',)
shape: (72,) (72,)
-----VARIABLE-----: lon_bounds lon_bounds
dtype: float32 float32
dimensions: ('longitude', 'nv') ('longitude', 'nv')
shape: (144, 2) (144, 2)
-----VARIABLE-----: longitude longitude
dtype: float32 float32
dimensions: ('longitude',) ('longitude',)
shape: (144,) (144,)
-----VARIABLE-----: precip precip
dtype: float32 float32
dimensions: ('time', 'latitude', 'longitude') ('time', 'latitude', 'longitude')
shape: (1, 72, 144) (1, 72, 144)
-----VARIABLE-----: precip_error precip_error
dtype: float32 float32
dimensions: ('time', 'latitude', 'longitude') ('time', 'latitude', 'longitude')
shape: (1, 72, 144) (1, 72, 144)
-----VARIABLE-----: time time
dtype: float32 float32
dimensions: ('time',) ('time',)
shape: (1,) (1,)
-----VARIABLE-----: time_bounds time_bounds
dtype: float32 float32
dimensions: ('time', 'nv') ('time', 'nv')
shape: (1, 2) (1, 2)
- -------------------------- --------------------------
SUMMARY -------------------------- --------------------------
Total # of shared variables: 8 8
Total # of non-shared variables: 0 0
Total # of shared groups: 0 0
Total # of non-shared groups: 0 0
Total # of shared attributes: 24 24
Total # of non-shared attributes: 0 0
Done.
0
Example 2: Two netCDF files with different groups, variables, and attributes
! ncompare --column-widths 33 30 30 {file_names[0]} {file_names[2]}
File A: gpcp_v02r03_monthly_d202301_c20230411.nc
File B: PERSIANN-CDR_v01r01_20230419_c20231030.nc
Root-level Dimensions:
Are all items the same? ---> False. (2 items are shared, out of 6 total.)
Which items are different?
File A File B
#00 ------------------------------ ------------------('lat', 480)
#01 --------------('latitude', 72) ------------------------------
#02 ------------------------------ -----------------('lon', 1440)
#03 ------------('longitude', 144) ------------------------------
#04 ---------------------('nv', 2) ---------------------('nv', 2)
#05 -------------------('time', 1) -------------------('time', 1)
Number of non-shared items: 2 2
Root-level Groups:
Are all items the same? ---> True. (No items exist.)
All variables:
File A File B
All Variables
- ------------------------------ ------------------------------
GROUP #00 -----------------------------/ -----------------------------/
num variables in group: 8 6
- ------------------------------ ------------------------------
-----VARIABLE-----: lat
dtype: float32
dimensions: ('lat',)
shape: (480,)
-----VARIABLE-----: lat_bnds
dtype: float32
dimensions: ('lat', 'nv')
shape: (480, 2)
-----VARIABLE-----: lat_bounds
dtype: float32
dimensions: ('latitude', 'nv')
shape: (72, 2)
-----VARIABLE-----: latitude
dtype: float32
dimensions: ('latitude',)
shape: (72,)
-----VARIABLE-----: lon
dtype: float32
dimensions: ('lon',)
shape: (1440,)
-----VARIABLE-----: lon_bnds
dtype: float32
dimensions: ('lon', 'nv')
shape: (1440, 2)
-----VARIABLE-----: lon_bounds
dtype: float32
dimensions: ('longitude', 'nv')
shape: (144, 2)
-----VARIABLE-----: longitude
dtype: float32
dimensions: ('longitude',)
shape: (144,)
-----VARIABLE-----: precip
dtype: float32
dimensions: ('time', 'latitude', 'longitude')
shape: (1, 72, 144)
-----VARIABLE-----: precip_error
dtype: float32
dimensions: ('time', 'latitude', 'longitude')
shape: (1, 72, 144)
-----VARIABLE-----: precipitation
dtype: float32
dimensions: ('time', 'lon', 'lat')
shape: (1, 1440, 480)
-----VARIABLE-----: time time
dtype: float32 int32
dimensions: ('time',) ('time',)
shape: (1,) (1,)
-----VARIABLE-----: time_bounds
dtype: float32
dimensions: ('time', 'nv')
shape: (1, 2)
- ------------------------------ ------------------------------
SUMMARY ------------------------------ ------------------------------
Total # of shared variables: 1 1
Total # of non-shared variables: 7 5
Total # of shared groups: 0 0
Total # of non-shared groups: 0 0
Total # of shared attributes: 2 2
Total # of non-shared attributes: 22 16
Differences were found in these attributes:
['dimensions', 'dtype', 'shape']
Done.
50
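The dimension summary at the top of this report can be checked by hand. Treating each (name, size) pair as one item, the two files share ('nv', 2) and ('time', 1), which gives 2 shared items out of 6 distinct ones, with 2 items unique to each file. The sketch below reproduces that set arithmetic using the sizes copied from the report. It is an illustration only, not ncompare's implementation.

```python
# Root-level dimensions as reported for each file (name -> size)
dims_a = {"latitude": 72, "longitude": 144, "nv": 2, "time": 1}  # GPCP monthly
dims_b = {"lat": 480, "lon": 1440, "nv": 2, "time": 1}  # PERSIANN-CDR daily

# Each (name, size) pair counts as one item, matching the tuples in the report
items_a = set(dims_a.items())
items_b = set(dims_b.items())

shared = items_a & items_b      # present, with the same size, in both files
all_items = items_a | items_b   # every distinct item across both files
only_a = sorted(items_a - items_b)
only_b = sorted(items_b - items_a)

print(f"Are all items the same? ---> {items_a == items_b}.")
print(f"({len(shared)} items are shared, out of {len(all_items)} total.)")
print("Only in File A:", only_a)
print("Only in File B:", only_b)
```

Because the pairs include the size, a dimension with the same name but a different length would count as non-shared. For example, ('time', 1) versus ('time', 2) would not match.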
More file details can be examined by using the --show-attributes and --show-chunks options
! ncompare --show-attributes --show-chunks --column-widths 33 30 30 {file_names[0]} {file_names[2]}
File A: gpcp_v02r03_monthly_d202301_c20230411.nc
File B: PERSIANN-CDR_v01r01_20230419_c20231030.nc
Root-level Dimensions:
Are all items the same? ---> False. (2 items are shared, out of 6 total.)
Which items are different?
File A File B
#00 ------------------------------ ------------------('lat', 480)
#01 --------------('latitude', 72) ------------------------------
#02 ------------------------------ -----------------('lon', 1440)
#03 ------------('longitude', 144) ------------------------------
#04 ---------------------('nv', 2) ---------------------('nv', 2)
#05 -------------------('time', 1) -------------------('time', 1)
Number of non-shared items: 2 2
Root-level Groups:
Are all items the same? ---> True. (No items exist.)
All variables:
File A File B
All Variables
- ------------------------------ ------------------------------
GROUP #00 -----------------------------/ -----------------------------/
num variables in group: 8 6
- ------------------------------ ------------------------------
-----VARIABLE-----: lat
dtype: float32
dimensions: ('lat',)
shape: (480,)
chunksize: contiguous
bounds: lat_bnds
long_name: latitude
standard_name: latitude
units: degrees_north
valid_max: 60.0
valid_min: -60.0
-----VARIABLE-----: lat_bnds
dtype: float32
dimensions: ('lat', 'nv')
shape: (480, 2)
chunksize: contiguous
-----VARIABLE-----: lat_bounds
dtype: float32
dimensions: ('latitude', 'nv')
shape: (72, 2)
chunksize: contiguous
comment: latitude values at the north and south bounds of each pixel.
-----VARIABLE-----: latitude
dtype: float32
dimensions: ('latitude',)
shape: (72,)
chunksize: contiguous
axis: Y
bounds: lat_bounds
long_name: Latitude
standard_name: latitude
units: degrees_north
valid_range: [-90.0, 90.0, ...]
-----VARIABLE-----: lon
dtype: float32
dimensions: ('lon',)
shape: (1440,)
chunksize: contiguous
bounds: lon_bnds
long_name: longitude
standard_name: longitude
units: degrees_east
valid_max: 360.0
valid_min: 0.0
-----VARIABLE-----: lon_bnds
dtype: float32
dimensions: ('lon', 'nv')
shape: (1440, 2)
chunksize: contiguous
-----VARIABLE-----: lon_bounds
dtype: float32
dimensions: ('longitude', 'nv')
shape: (144, 2)
chunksize: contiguous
comment: longitude values at the west and east bounds of each pixel.
-----VARIABLE-----: longitude
dtype: float32
dimensions: ('longitude',)
shape: (144,)
chunksize: contiguous
axis: X
bounds: lon_bounds
long_name: Longitude
standard_name: longitude
units: degrees_east
valid_range: [0.0, 360.0, ...]
-----VARIABLE-----: precip
dtype: float32
dimensions: ('time', 'latitude', 'longitude')
shape: (1, 72, 144)
chunksize: contiguous
cell_methods: area: mean time: mean
coordinates: time latitude longitude
long_name: NOAA Climate Data Record (CDR) of GPCP Monthly Satellite-Gauge Combined Precipitation
missing_value: -9999.0
standard_name: precipitation amount
units: mm/day
valid_range: [0.0, 100.0, ...]
-----VARIABLE-----: precip_error
dtype: float32
dimensions: ('time', 'latitude', 'longitude')
shape: (1, 72, 144)
chunksize: contiguous
coordinates: time latitude longitude
long_name: NOAA CDR of GPCP Satellite-Gauge Combined Precipitation Error
missing_value: -9999.0
units: mm/day
valid_range: [0.0, 100.0, ...]
-----VARIABLE-----: precipitation
dtype: float32
dimensions: ('time', 'lon', 'lat')
shape: (1, 1440, 480)
chunksize: [1, 1440, 480]
_FillValue: -1.0
cell_method: sum
long_name: NOAA Climate Data Record of PERSIANN-CDR daily precipitation
missing_value: -9999.0
standard_name: precipitation_amount
units: mm
valid_max: 999999.0
valid_min: 0.0
-----VARIABLE-----: time time
dtype: float32 int32
dimensions: ('time',) ('time',)
shape: (1,) (1,)
chunksize: contiguous contiguous
axis: T
bounds: time_bounds
calendar: Gregorian
long_name: time time
standard_name: time time
units: days since 1970-01-01 00:00:00 0:00 days since 1979-01-01 0:0:0
-----VARIABLE-----: time_bounds
dtype: float32
dimensions: ('time', 'nv')
shape: (1, 2)
chunksize: contiguous
comment: time bounds for each time value
- ------------------------------ ------------------------------
SUMMARY ------------------------------ ------------------------------
Total # of shared variables: 1 1
Total # of non-shared variables: 7 5
Total # of shared groups: 0 0
Total # of non-shared groups: 0 0
Total # of shared attributes: 5 5
Total # of non-shared attributes: 60 42
Differences were found in these attributes:
['_FillValue', 'axis', 'bounds', 'calendar', 'cell_method', 'cell_methods', 'chunksize', 'comment', 'coordinates', 'dimensions', 'dtype', 'long_name', 'missing_value', 'shape', 'standard_name', 'units', 'valid_max', 'valid_min', 'valid_range']
Done.
114
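The chunksize row added by --show-chunks reads "contiguous" for the GPCP variables and [1, 1440, 480] for the PERSIANN-CDR precipitation variable. For a chunked variable, the number of chunks along each dimension is ceil(n_i / c_i), where n_i is the dimension size and c_i is the chunk length, and the total is the product of these values over all dimensions. So a chunk of [1, 1440, 480] on a (1, 1440, 480) array means the whole field is stored as a single chunk. Below is a small sketch of that arithmetic. chunk_count is a hypothetical helper, and the finer chunking in the second call is invented purely for comparison.

```python
import math
from typing import Optional, Sequence


def chunk_count(shape: Sequence[int], chunks: Optional[Sequence[int]]) -> int:
    """Number of chunks needed to tile an array of the given shape.

    chunks=None stands for contiguous storage (one block, no chunking).
    """
    if chunks is None:
        return 1
    if len(chunks) != len(shape):
        raise ValueError("shape and chunks must have the same rank")
    # Each dimension needs ceil(size / chunk) chunks; partial edge chunks count
    return math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))


# PERSIANN-CDR "precipitation" as reported: shape (1, 1440, 480), chunksize [1, 1440, 480]
print(chunk_count((1, 1440, 480), (1, 1440, 480)))  # 1
# A hypothetical finer chunking, for comparison only: 1 * 3 * 5 chunks
print(chunk_count((1, 1440, 480), (1, 500, 100)))  # 15
```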
Python Package Usage Example
from ncompare import compare
total_number_of_differences = compare(
file_names[0],
file_names[2],
only_diffs=True,
show_chunks=True,
show_attributes=True,
column_widths=[33, 30, 30],
)
File A: gpcp_v02r03_monthly_d202301_c20230411.nc
File B: PERSIANN-CDR_v01r01_20230419_c20231030.nc
Root-level Dimensions:
Are all items the same? ---> False. (2 items are shared, out of 6 total.)
Which items are different?
File A File B
#00 ------------------------------ ------------------('lat', 480)
#01 --------------('latitude', 72) ------------------------------
#02 ------------------------------ -----------------('lon', 1440)
#03 ------------('longitude', 144) ------------------------------
Root-level Groups:
Are all items the same? ---> True. (No items exist.)
All variables:
File A File B
All Variables
- ------------------------------ ------------------------------
GROUP #00 -----------------------------/ -----------------------------/
num variables in group: 8 6
- ------------------------------ ------------------------------
-----VARIABLE-----: lat
dtype: float32
dimensions: ('lat',)
shape: (480,)
chunksize: contiguous
bounds: lat_bnds
long_name: latitude
standard_name: latitude
units: degrees_north
valid_max: 60.0
valid_min: -60.0
-----VARIABLE-----: lat_bnds
dtype: float32
dimensions: ('lat', 'nv')
shape: (480, 2)
chunksize: contiguous
-----VARIABLE-----: lat_bounds
dtype: float32
dimensions: ('latitude', 'nv')
shape: (72, 2)
chunksize: contiguous
comment: latitude values at the north and south bounds of each pixel.
-----VARIABLE-----: latitude
dtype: float32
dimensions: ('latitude',)
shape: (72,)
chunksize: contiguous
axis: Y
bounds: lat_bounds
long_name: Latitude
standard_name: latitude
units: degrees_north
valid_range: [-90.0, 90.0, ...]
-----VARIABLE-----: lon
dtype: float32
dimensions: ('lon',)
shape: (1440,)
chunksize: contiguous
bounds: lon_bnds
long_name: longitude
standard_name: longitude
units: degrees_east
valid_max: 360.0
valid_min: 0.0
-----VARIABLE-----: lon_bnds
dtype: float32
dimensions: ('lon', 'nv')
shape: (1440, 2)
chunksize: contiguous
-----VARIABLE-----: lon_bounds
dtype: float32
dimensions: ('longitude', 'nv')
shape: (144, 2)
chunksize: contiguous
comment: longitude values at the west and east bounds of each pixel.
-----VARIABLE-----: longitude
dtype: float32
dimensions: ('longitude',)
shape: (144,)
chunksize: contiguous
axis: X
bounds: lon_bounds
long_name: Longitude
standard_name: longitude
units: degrees_east
valid_range: [0.0, 360.0, ...]
-----VARIABLE-----: precip
dtype: float32
dimensions: ('time', 'latitude', 'longitude')
shape: (1, 72, 144)
chunksize: contiguous
cell_methods: area: mean time: mean
coordinates: time latitude longitude
long_name: NOAA Climate Data Record (CDR) of GPCP Monthly Satellite-Gauge Combined Precipitation
missing_value: -9999.0
standard_name: precipitation amount
units: mm/day
valid_range: [0.0, 100.0, ...]
-----VARIABLE-----: precip_error
dtype: float32
dimensions: ('time', 'latitude', 'longitude')
shape: (1, 72, 144)
chunksize: contiguous
coordinates: time latitude longitude
long_name: NOAA CDR of GPCP Satellite-Gauge Combined Precipitation Error
missing_value: -9999.0
units: mm/day
valid_range: [0.0, 100.0, ...]
-----VARIABLE-----: precipitation
dtype: float32
dimensions: ('time', 'lon', 'lat')
shape: (1, 1440, 480)
chunksize: [1, 1440, 480]
_FillValue: -1.0
cell_method: sum
long_name: NOAA Climate Data Record of PERSIANN-CDR daily precipitation
missing_value: -9999.0
standard_name: precipitation_amount
units: mm
valid_max: 999999.0
valid_min: 0.0
-----VARIABLE-----: time time
dtype: float32 int32
axis: T
bounds: time_bounds
calendar: Gregorian
units: days since 1970-01-01 00:00:00 0:00 days since 1979-01-01 0:0:0
-----VARIABLE-----: time_bounds
dtype: float32
dimensions: ('time', 'nv')
shape: (1, 2)
chunksize: contiguous
comment: time bounds for each time value
- ------------------------------ ------------------------------
SUMMARY ------------------------------ ------------------------------
Total # of shared variables: 1 1
Total # of non-shared variables: 7 5
Total # of shared groups: 0 0
Total # of non-shared groups: 0 0
Total # of shared attributes: 5 5
Total # of non-shared attributes: 60 42
Differences were found in these attributes:
['_FillValue', 'axis', 'bounds', 'calendar', 'cell_method', 'cell_methods', 'chunksize', 'comment', 'coordinates', 'dimensions', 'dtype', 'long_name', 'missing_value', 'shape', 'standard_name', 'units', 'valid_max', 'valid_min', 'valid_range']
Done.
The compare() function returns the total number of differences found (across variables, groups, and attributes):
print(total_number_of_differences)
114
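Because compare() returns this count, it can serve as a simple regression check, for example in a test suite that confirms a processing change leaves a file's structure untouched. Below is a sketch with a hypothetical helper, check_no_differences. It relies only on the return value and the only_diffs keyword shown above, and it accepts a stand-in compare function so the logic can be exercised without netCDF files.

```python
from typing import Callable, Optional


def check_no_differences(
    path_a: str,
    path_b: str,
    compare_fn: Optional[Callable[..., int]] = None,
) -> int:
    """Raise AssertionError if the comparison reports any difference.

    compare_fn defaults to ncompare.compare and can be replaced by a stub in tests.
    """
    if compare_fn is None:
        # Imported lazily so the helper can be used with a stub without ncompare installed
        from ncompare import compare as compare_fn
    n_diffs = compare_fn(path_a, path_b, only_diffs=True)
    if n_diffs:
        raise AssertionError(f"{path_a} and {path_b}: {n_diffs} differences reported")
    return n_diffs
```

Called on the two GPCP files from Example 1, such a check would pass, since that comparison reported 0 differences. Called on the GPCP and PERSIANN-CDR pair, it would raise.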
END of Notebook.