Revision for compute_RSS method

Debasish-Mahapatra · Apr 4, 2024 · deec65b · deec65b
1 parent 1f37fdf
commit deec65b
Show file tree

Hide file tree

Showing 5 changed files with 217 additions and 20 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,22 +1,73 @@
 # Changelog
 
-## Version 1.5.0b2
+## Version 1.5.1beta5 
 
-### New Features
+### MAJOR REVISION OF CODE 
 
-- Added the `help()` function to provide a convenient way to explore the available methods in the NWP_Stats class and view their descriptions and usage instructions. Users can now easily access the documentation for specific methods or view the entire list of available methods. This enhancement addresses the issue #1 raised in the GitHub repository.
+Did a major fix to ```comute_rpss``` to work for both scalar and non-scalar values.
 
-Example usage:
-```python
-from nwpeval import NWP_Stats
+``` python
+
+    def compute_rpss(self, threshold, dim=None):
+        """
+        Compute the Ranked Probability Skill Score (RPSS) for a given threshold.
+    
+        Args:
+            threshold (float): The threshold value for binary classification.
+            dim (str, list, or None): The dimension(s) along which to compute the RPSS.
+                                  If None, compute the RPSS over the entire data.
+    
+        Returns:
+            xarray.DataArray: The computed RPSS values.
+        """
+        # Convert data to binary based on the threshold
+        obs_binary = (self.obs_data >= threshold).astype(int)
+        model_binary = (self.model_data >= threshold).astype(int)
+
+        # Calculate the RPS for the model data
+        rps_model = ((model_binary.cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
+
+        # Calculate the RPS for the climatology (base rate)
+        base_rate = obs_binary.mean(dim=dim)
+        rps_climo = ((xr.full_like(model_binary, 0).cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
+        rps_climo = rps_climo + base_rate * (1 - base_rate)
+
+        # Calculate the RPSS
+        rpss = 1 - rps_model / rps_climo
+
+        return rpss
+
+```
+
+The updated `compute_rpss` method will work correctly for both scalar and non-scalar `base_rate` values.
+
+In the context of xarray and dimensions/coordinates in a dataset, a scalar value refers to a single value that does not depend on any dimensions. It is a 0-dimensional value. On the other hand, a non-scalar value is an array or a DataArray that depends on one or more dimensions and has corresponding coordinates.
+
+Let's consider an example to illustrate the difference:
+
+Suppose we have a dataset with dimensions "time", "lat", and "lon". The dataset contains a variable "temperature" with corresponding coordinates for each dimension.
 
-# Display help for a specific method
-help(NWP_Stats.compute_fss)
+- Scalar value: If we calculate the mean temperature over all dimensions using `temperature.mean()`, the resulting value will be a scalar. It will be a single value that does not depend on any dimensions.
 
-# Display help for all available methods
-help(NWP_Stats)
+- Non-scalar value: If we calculate the mean temperature over a specific dimension, such as `temperature.mean(dim="time")`, the resulting value will be a non-scalar DataArray. It will have dimensions "lat" and "lon" and corresponding coordinates, but it will not depend on the "time" dimension anymore.
+
+In the updated `compute_rpss` method, the line `base_rate = obs_binary.mean(dim=dim)` calculates the mean of `obs_binary` over the specified dimensions `dim`. If `dim` is None, it will calculate the mean over all dimensions, resulting in a scalar value. If `dim` is a specific dimension or a list of dimensions, it will calculate the mean over those dimensions, resulting in a non-scalar DataArray.
+
+The subsequent lines of code in the `compute_rpss` method handle both cases correctly:
+
+```python
+rps_climo = ((xr.full_like(model_binary, 0).cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
+rps_climo = rps_climo + base_rate * (1 - base_rate)
 ```
 
+If `base_rate` is a scalar value, it will be broadcasted to match the shape of `rps_climo`, and the calculation will be performed element-wise. If `base_rate` is a non-scalar DataArray, it will be aligned with `rps_climo` based on the common dimensions, and the calculation will be performed element-wise.
+
+Now, whether this will work with data of different coordinates??? The updated `compute_rpss` method should work correctly as long as the dimensions and coordinates of `obs_binary` and `model_binary` are compatible. The method relies on xarray's broadcasting and alignment rules to handle data with different coordinates.
+
+However, it's important to note that if the coordinates of `obs_binary` and `model_binary` are completely different or incompatible, you may encounter issues with dimension alignment or broadcasting. In such cases, you would need to ensure that the coordinates are properly aligned or resampled before applying the `compute_rpss` method.
+
+In summary, the updated `compute_rpss` method should work correctly for both scalar and non-scalar `base_rate` values, and it should handle data with different coordinates as long as the dimensions and coordinates are compatible between `obs_binary` and `model_binary`.
+
 ### Bug Fixes
 
 - Fixed minor bugs and improved code stability.
@@ -25,6 +76,6 @@ help(NWP_Stats)
 
 - The package has been moved from the 3-Alpha stage to the 4-Beta stage in development, indicating that it has undergone further testing and refinement.
 
-Please note that this is a beta release (version 1.5.0b2), and while it includes significant enhancements and bug fixes, it may still have some known limitations or issues. We encourage users to provide feedback and report any bugs they encounter.
+Please note that this is a beta release (version 1.5.1beta5), and while it includes significant enhancements and bug fixes, it may still have some known limitations or issues. We encourage users to provide feedback and report any bugs they encounter.
 
 We appreciate your interest in the NWPeval package and thank you for your support!
diff --git a/examples/metrics_diurnal.png b/examples/metrics_diurnal.png
diff --git a/examples/metrics_time_avg.png b/examples/metrics_time_avg.png
diff --git a/examples/test.py b/examples/test.py
@@ -0,0 +1,145 @@
+import xarray as xr
+import numpy as np
+import matplotlib.pyplot as plt
+import nwpeval as nw
+
+# Load observation and model data
+obs_data = xr.open_dataset("india_obs_output_1km_realistic_storm.nc")
+model_data = xr.open_dataset("india_model_output_1km_realistic_storm.nc")
+
+# Extract lightning density variables
+obs_lightning = obs_data["lightning_density"] 
+model_lightning = model_data["lightning_density"] 
+
+# Create an instance of the NWP_Stats class
+metrics_obj = nw.NWP_Stats(obs_lightning, model_lightning)
+
+# Define the thresholds for metric calculations
+thresholds = {
+    'SEDS': 0.0002,
+    'SEDI': 0.0002,
+    'RPSS': 0.0002
+}
+
+# Calculate time-averaged metrics
+metrics_time_avg = {}
+for metric, threshold in thresholds.items():
+    metrics_time_avg[metric] = metrics_obj.compute_metrics([metric], thresholds={metric: threshold}, dim="time")[metric]
+
+# Calculate area-average diurnal cycle metrics
+metrics_diurnal = {}
+for metric, threshold in thresholds.items():
+    metrics_diurnal[metric] = metrics_obj.compute_metrics([metric], thresholds={metric: threshold}, dim=["lat", "lon"])[metric].groupby("time.hour").mean()
+
+# Plot time-averaged metrics
+fig, axs = plt.subplots(1, 3, figsize=(18, 6))
+for i, metric in enumerate(thresholds.keys()):
+    metrics_time_avg[metric].plot(ax=axs[i], cmap="coolwarm", vmin=-1, vmax=1)
+    axs[i].set_title(f"Time-Averaged {metric}")
+    axs[i].set_xlabel("Longitude")
+    axs[i].set_ylabel("Latitude")
+fig.tight_layout()
+plt.savefig("metrics_time_avg.png")
+
+# Plot area-average diurnal cycle metrics
+fig, axs = plt.subplots(1, 3, figsize=(18, 6))
+for i, metric in enumerate(thresholds.keys()):
+    metrics_diurnal[metric].plot(ax=axs[i])
+    axs[i].set_title(f"Area-Average Diurnal Cycle {metric}")
+    axs[i].set_xlabel("Hour")
+    axs[i].set_ylabel(metric)
+    axs[i].grid(True)
+fig.tight_layout()
+plt.savefig("metrics_diurnal.png")
+
+plt.show()
+
+
+
+
+import numpy as np
+import xarray as xr
+import nwpeval as nw
+
+# Generate random data with longitude and latitude dimensions
+lon = np.linspace(0, 360, 10)
+lat = np.linspace(-90, 90, 5)
+time = np.arange(100)
+
+obs_data = xr.DataArray(np.random.rand(100, 5, 10), dims=('time', 'lat', 'lon'), coords={'time': time, 'lat': lat, 'lon': lon})
+model_data = xr.DataArray(np.random.rand(100, 5, 10), dims=('time', 'lat', 'lon'), coords={'time': time, 'lat': lat, 'lon': lon})
+
+# Create an instance of the NWP_Stats class
+metrics_obj = nw.NWP_Stats(obs_data, model_data)
+
+# Define the thresholds for the metrics
+thresholds = {
+    'SEDS': 0.6,
+    'SEDI': 0.7,
+    'RPSS': 0.8
+}
+
+# Compute the metrics
+metrics = ['SEDS', 'SEDI', 'RPSS']
+metric_values = metrics_obj.compute_metrics(metrics, thresholds=thresholds)
+
+print(metric_values)
+
+
+
+
+import numpy as np
+import xarray as xr
+import nwpeval as nw
+
+# Generate random data without longitude and latitude dimensions
+time = np.arange(100)
+
+obs_data = xr.DataArray(np.random.rand(100), dims=('time'), coords={'time': time})
+model_data = xr.DataArray(np.random.rand(100), dims=('time'), coords={'time': time})
+
+# Create an instance of the NWP_Stats class
+metrics_obj = nw.NWP_Stats(obs_data, model_data)
+
+# Define the thresholds for the metrics
+thresholds = {
+    'SEDS': 0.6,
+    'SEDI': 0.7,
+    'RPSS': 0.8
+}
+
+# Compute the metrics
+metrics = ['SEDS', 'SEDI', 'RPSS']
+metric_values = metrics_obj.compute_metrics(metrics, thresholds=thresholds)
+
+print(metric_values)
+
+
+
+import numpy as np
+import xarray as xr
+import nwpeval as nw
+
+# Generate random data as lists
+obs_data = np.random.rand(100).tolist()
+model_data = np.random.rand(100).tolist()
+
+# Convert the lists to xarray.DataArray objects
+obs_data_array = xr.DataArray(obs_data, dims=['time'])
+model_data_array = xr.DataArray(model_data, dims=['time'])
+
+# Create an instance of the NWP_Stats class
+metrics_obj = nw.NWP_Stats(obs_data_array, model_data_array)
+
+# Define the thresholds for the metrics
+thresholds = {
+    'SEDS': 0.6,
+    'SEDI': 0.7,
+    'RPSS': 0.8
+}
+
+# Compute the metrics
+metrics = ['SEDS', 'SEDI', 'RPSS']
+metric_values = metrics_obj.compute_metrics(metrics, thresholds=thresholds)
+
+print(metric_values)
diff --git a/nwpeval/nwpeval.py b/nwpeval/nwpeval.py
@@ -658,29 +658,30 @@ def compute_sedi(self, threshold, dim=None):
     def compute_rpss(self, threshold, dim=None):
         """
         Compute the Ranked Probability Skill Score (RPSS) for a given threshold.
-        
+    
         Args:
             threshold (float): The threshold value for binary classification.
             dim (str, list, or None): The dimension(s) along which to compute the RPSS.
-                                      If None, compute the RPSS over the entire data.
-        
+                                  If None, compute the RPSS over the entire data.
+    
         Returns:
             xarray.DataArray: The computed RPSS values.
         """
         # Convert data to binary based on the threshold
         obs_binary = (self.obs_data >= threshold).astype(int)
         model_binary = (self.model_data >= threshold).astype(int)
-        
+
         # Calculate the RPS for the model data
-        rps_model = ((model_binary.cumsum('dim_0') - obs_binary.cumsum('dim_0')) ** 2).mean()
-        
+        rps_model = ((model_binary.cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
+
         # Calculate the RPS for the climatology (base rate)
         base_rate = obs_binary.mean(dim=dim)
-        rps_climo = ((xr.full_like(model_binary, base_rate).cumsum('dim_0') - obs_binary.cumsum('dim_0')) ** 2).mean()
-
+        rps_climo = ((xr.full_like(model_binary, 0).cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
+        rps_climo = rps_climo + base_rate * (1 - base_rate)
+
         # Calculate the RPSS
         rpss = 1 - rps_model / rps_climo
-        
+
         return rpss
 
     def compute_tse(self, dim=None):