Skip to content

Commit

Permalink
Revision for compute_RSS method
Browse files Browse the repository at this point in the history
  • Loading branch information
Debasish-Mahapatra committed Apr 4, 2024
1 parent 1f37fdf commit deec65b
Show file tree
Hide file tree
Showing 5 changed files with 217 additions and 20 deletions.
73 changes: 62 additions & 11 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,22 +1,73 @@
# Changelog

## Version 1.5.0b2
## Version 1.5.1beta5

### New Features
### MAJOR REVISION OF CODE

- Added the `help()` function to provide a convenient way to explore the available methods in the NWP_Stats class and view their descriptions and usage instructions. Users can now easily access the documentation for specific methods or view the entire list of available methods. This enhancement addresses the issue #1 raised in the GitHub repository.
Did a major fix to ```comute_rpss``` to work for both scalar and non-scalar values.

Example usage:
```python
from nwpeval import NWP_Stats
``` python

def compute_rpss(self, threshold, dim=None):
"""
Compute the Ranked Probability Skill Score (RPSS) for a given threshold.
Args:
threshold (float): The threshold value for binary classification.
dim (str, list, or None): The dimension(s) along which to compute the RPSS.
If None, compute the RPSS over the entire data.
Returns:
xarray.DataArray: The computed RPSS values.
"""
# Convert data to binary based on the threshold
obs_binary = (self.obs_data >= threshold).astype(int)
model_binary = (self.model_data >= threshold).astype(int)

# Calculate the RPS for the model data
rps_model = ((model_binary.cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)

# Calculate the RPS for the climatology (base rate)
base_rate = obs_binary.mean(dim=dim)
rps_climo = ((xr.full_like(model_binary, 0).cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
rps_climo = rps_climo + base_rate * (1 - base_rate)

# Calculate the RPSS
rpss = 1 - rps_model / rps_climo

return rpss

```

The updated `compute_rpss` method will work correctly for both scalar and non-scalar `base_rate` values.

In the context of xarray and dimensions/coordinates in a dataset, a scalar value refers to a single value that does not depend on any dimensions. It is a 0-dimensional value. On the other hand, a non-scalar value is an array or a DataArray that depends on one or more dimensions and has corresponding coordinates.

Let's consider an example to illustrate the difference:

Suppose we have a dataset with dimensions "time", "lat", and "lon". The dataset contains a variable "temperature" with corresponding coordinates for each dimension.

# Display help for a specific method
help(NWP_Stats.compute_fss)
- Scalar value: If we calculate the mean temperature over all dimensions using `temperature.mean()`, the resulting value will be a scalar. It will be a single value that does not depend on any dimensions.

# Display help for all available methods
help(NWP_Stats)
- Non-scalar value: If we calculate the mean temperature over a specific dimension, such as `temperature.mean(dim="time")`, the resulting value will be a non-scalar DataArray. It will have dimensions "lat" and "lon" and corresponding coordinates, but it will not depend on the "time" dimension anymore.

In the updated `compute_rpss` method, the line `base_rate = obs_binary.mean(dim=dim)` calculates the mean of `obs_binary` over the specified dimensions `dim`. If `dim` is None, it will calculate the mean over all dimensions, resulting in a scalar value. If `dim` is a specific dimension or a list of dimensions, it will calculate the mean over those dimensions, resulting in a non-scalar DataArray.

The subsequent lines of code in the `compute_rpss` method handle both cases correctly:

```python
rps_climo = ((xr.full_like(model_binary, 0).cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
rps_climo = rps_climo + base_rate * (1 - base_rate)
```

If `base_rate` is a scalar value, it will be broadcasted to match the shape of `rps_climo`, and the calculation will be performed element-wise. If `base_rate` is a non-scalar DataArray, it will be aligned with `rps_climo` based on the common dimensions, and the calculation will be performed element-wise.

Now, whether this will work with data of different coordinates??? The updated `compute_rpss` method should work correctly as long as the dimensions and coordinates of `obs_binary` and `model_binary` are compatible. The method relies on xarray's broadcasting and alignment rules to handle data with different coordinates.

However, it's important to note that if the coordinates of `obs_binary` and `model_binary` are completely different or incompatible, you may encounter issues with dimension alignment or broadcasting. In such cases, you would need to ensure that the coordinates are properly aligned or resampled before applying the `compute_rpss` method.

In summary, the updated `compute_rpss` method should work correctly for both scalar and non-scalar `base_rate` values, and it should handle data with different coordinates as long as the dimensions and coordinates are compatible between `obs_binary` and `model_binary`.

### Bug Fixes

- Fixed minor bugs and improved code stability.
Expand All @@ -25,6 +76,6 @@ help(NWP_Stats)

- The package has been moved from the 3-Alpha stage to the 4-Beta stage in development, indicating that it has undergone further testing and refinement.

Please note that this is a beta release (version 1.5.0b2), and while it includes significant enhancements and bug fixes, it may still have some known limitations or issues. We encourage users to provide feedback and report any bugs they encounter.
Please note that this is a beta release (version 1.5.1beta5), and while it includes significant enhancements and bug fixes, it may still have some known limitations or issues. We encourage users to provide feedback and report any bugs they encounter.

We appreciate your interest in the NWPeval package and thank you for your support!
Binary file added examples/metrics_diurnal.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added examples/metrics_time_avg.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
145 changes: 145 additions & 0 deletions examples/test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import nwpeval as nw

# Load observation and model data
obs_data = xr.open_dataset("india_obs_output_1km_realistic_storm.nc")
model_data = xr.open_dataset("india_model_output_1km_realistic_storm.nc")

# Extract lightning density variables
obs_lightning = obs_data["lightning_density"]
model_lightning = model_data["lightning_density"]

# Create an instance of the NWP_Stats class
metrics_obj = nw.NWP_Stats(obs_lightning, model_lightning)

# Define the thresholds for metric calculations
thresholds = {
'SEDS': 0.0002,
'SEDI': 0.0002,
'RPSS': 0.0002
}

# Calculate time-averaged metrics
metrics_time_avg = {}
for metric, threshold in thresholds.items():
metrics_time_avg[metric] = metrics_obj.compute_metrics([metric], thresholds={metric: threshold}, dim="time")[metric]

# Calculate area-average diurnal cycle metrics
metrics_diurnal = {}
for metric, threshold in thresholds.items():
metrics_diurnal[metric] = metrics_obj.compute_metrics([metric], thresholds={metric: threshold}, dim=["lat", "lon"])[metric].groupby("time.hour").mean()

# Plot time-averaged metrics
fig, axs = plt.subplots(1, 3, figsize=(18, 6))
for i, metric in enumerate(thresholds.keys()):
metrics_time_avg[metric].plot(ax=axs[i], cmap="coolwarm", vmin=-1, vmax=1)
axs[i].set_title(f"Time-Averaged {metric}")
axs[i].set_xlabel("Longitude")
axs[i].set_ylabel("Latitude")
fig.tight_layout()
plt.savefig("metrics_time_avg.png")

# Plot area-average diurnal cycle metrics
fig, axs = plt.subplots(1, 3, figsize=(18, 6))
for i, metric in enumerate(thresholds.keys()):
metrics_diurnal[metric].plot(ax=axs[i])
axs[i].set_title(f"Area-Average Diurnal Cycle {metric}")
axs[i].set_xlabel("Hour")
axs[i].set_ylabel(metric)
axs[i].grid(True)
fig.tight_layout()
plt.savefig("metrics_diurnal.png")

plt.show()




import numpy as np
import xarray as xr
import nwpeval as nw

# Generate random data with longitude and latitude dimensions
lon = np.linspace(0, 360, 10)
lat = np.linspace(-90, 90, 5)
time = np.arange(100)

obs_data = xr.DataArray(np.random.rand(100, 5, 10), dims=('time', 'lat', 'lon'), coords={'time': time, 'lat': lat, 'lon': lon})
model_data = xr.DataArray(np.random.rand(100, 5, 10), dims=('time', 'lat', 'lon'), coords={'time': time, 'lat': lat, 'lon': lon})

# Create an instance of the NWP_Stats class
metrics_obj = nw.NWP_Stats(obs_data, model_data)

# Define the thresholds for the metrics
thresholds = {
'SEDS': 0.6,
'SEDI': 0.7,
'RPSS': 0.8
}

# Compute the metrics
metrics = ['SEDS', 'SEDI', 'RPSS']
metric_values = metrics_obj.compute_metrics(metrics, thresholds=thresholds)

print(metric_values)




import numpy as np
import xarray as xr
import nwpeval as nw

# Generate random data without longitude and latitude dimensions
time = np.arange(100)

obs_data = xr.DataArray(np.random.rand(100), dims=('time'), coords={'time': time})
model_data = xr.DataArray(np.random.rand(100), dims=('time'), coords={'time': time})

# Create an instance of the NWP_Stats class
metrics_obj = nw.NWP_Stats(obs_data, model_data)

# Define the thresholds for the metrics
thresholds = {
'SEDS': 0.6,
'SEDI': 0.7,
'RPSS': 0.8
}

# Compute the metrics
metrics = ['SEDS', 'SEDI', 'RPSS']
metric_values = metrics_obj.compute_metrics(metrics, thresholds=thresholds)

print(metric_values)



import numpy as np
import xarray as xr
import nwpeval as nw

# Generate random data as lists
obs_data = np.random.rand(100).tolist()
model_data = np.random.rand(100).tolist()

# Convert the lists to xarray.DataArray objects
obs_data_array = xr.DataArray(obs_data, dims=['time'])
model_data_array = xr.DataArray(model_data, dims=['time'])

# Create an instance of the NWP_Stats class
metrics_obj = nw.NWP_Stats(obs_data_array, model_data_array)

# Define the thresholds for the metrics
thresholds = {
'SEDS': 0.6,
'SEDI': 0.7,
'RPSS': 0.8
}

# Compute the metrics
metrics = ['SEDS', 'SEDI', 'RPSS']
metric_values = metrics_obj.compute_metrics(metrics, thresholds=thresholds)

print(metric_values)
19 changes: 10 additions & 9 deletions nwpeval/nwpeval.py
Original file line number Diff line number Diff line change
Expand Up @@ -658,29 +658,30 @@ def compute_sedi(self, threshold, dim=None):
def compute_rpss(self, threshold, dim=None):
"""
Compute the Ranked Probability Skill Score (RPSS) for a given threshold.
Args:
threshold (float): The threshold value for binary classification.
dim (str, list, or None): The dimension(s) along which to compute the RPSS.
If None, compute the RPSS over the entire data.
If None, compute the RPSS over the entire data.
Returns:
xarray.DataArray: The computed RPSS values.
"""
# Convert data to binary based on the threshold
obs_binary = (self.obs_data >= threshold).astype(int)
model_binary = (self.model_data >= threshold).astype(int)

# Calculate the RPS for the model data
rps_model = ((model_binary.cumsum('dim_0') - obs_binary.cumsum('dim_0')) ** 2).mean()
rps_model = ((model_binary.cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)

# Calculate the RPS for the climatology (base rate)
base_rate = obs_binary.mean(dim=dim)
rps_climo = ((xr.full_like(model_binary, base_rate).cumsum('dim_0') - obs_binary.cumsum('dim_0')) ** 2).mean()

rps_climo = ((xr.full_like(model_binary, 0).cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
rps_climo = rps_climo + base_rate * (1 - base_rate)

# Calculate the RPSS
rpss = 1 - rps_model / rps_climo

return rpss

def compute_tse(self, dim=None):
Expand Down

0 comments on commit deec65b

Please sign in to comment.