Merge branch 'main' into add_mvp_description_for_review
Polichinel committed Oct 29, 2024
2 parents 6011524 + 9724c9a commit 2b58c2f
Showing 129 changed files with 21,260 additions and 1,443 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -146,6 +146,12 @@ venv.bak/
# mkdocs documentation
/site

# Global cache
.global_cache.pkl

# Generated calibration logs
*calibration_log.txt

# mypy
.mypy_cache/
.dmypy.json
177 changes: 176 additions & 1 deletion common_utils/README.md
@@ -12,6 +12,181 @@ Overview of utils package scripts:
- `utils_df_to_vol_conversion.py`: Functions to convert data frames and volumes (used in purple_alien)
- `utils_evaluation_metrics.py`: Class defining evaluation metrics
- `utils_model_outputs.py`: Class for storing and managing model outputs for evaluation and true forecasting
- `utils_wandb.py`: Sets up and logs monthly evaluation metrics in WandB, using a specific step metric for tracking.


To run tests: `pytest -v common_utils`

To do list:
- Align the function generate_metric_dict in utils_evaluation_metrics.py with Simon's eval function

# ModelPath (common_utils/model_path.py)

The `ModelPath` class is designed to manage model paths and directories within the ViEWS Pipeline. It provides a structured way to handle various directories and scripts associated with a specific model.

### Initialization

To start using the `ModelPath` class, you need to initialize it with a specific model name. You can optionally validate whether the directories and scripts exist.

```python
from model_path import ModelPath

# Initialize ModelPath with a model name
purple_alien_paths = ModelPath("purple_alien", validate=True)
```

* `model_name_or_path`: The name or path of the model you are working with. This will be used to locate the corresponding directories and scripts.
* `validate`: If set to True, the class will check if the specified directories and scripts exist and raise errors if they do not. Defaults to True.

### Viewing Directories and Scripts
Once the ModelPath instance is created, you can view all the directories and scripts that are relevant to your model.

* **View Directories**: This method prints a formatted list of all directories associated with the model.
```python
purple_alien_paths.view_directories()
```

* **View Scripts**: This method lists all expected scripts for the model.
```python
purple_alien_paths.view_scripts()
```

### Working with model and script paths
```python
purple_alien_paths.get_directories()
```
This method returns a dictionary of directory names and their absolute paths for the current model. The method scans through all class attributes and collects directories that are part of the model's structure, excluding internal or unrelated attributes.
#### Key Points:
- **Returns**: A dictionary where keys are directory names (as `str`) and values are the absolute paths (as `str`) of the corresponding directories.
```python
{
    'architectures': '/path/to/models/purple_alien/src/architectures',
    'artifacts': '/path/to/models/purple_alien/artifacts',
    'configs': '/path/to/models/purple_alien/configs',
    'dataloaders': '/path/to/models/purple_alien/src/dataloaders',
    ...
}
```
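
The attribute scan described above can be sketched roughly as follows. This is a minimal illustration, not the actual `ModelPath` code: the class `DirectoryCollector`, its `_ignore` set, and the attribute names are hypothetical stand-ins, and the real class may filter attributes differently.

```python
from pathlib import Path
from typing import Dict


class DirectoryCollector:
    """Toy stand-in for ModelPath (hypothetical, heavily simplified)."""

    def __init__(self, model_dir: Path) -> None:
        self.model_name = model_dir.name   # a str, so the scan below skips it
        self._ignore = {"model_dir"}       # hypothetical exclusion list
        self.model_dir = model_dir
        self.artifacts = model_dir / "artifacts"
        self.configs = model_dir / "configs"
        self.architectures = model_dir / "src" / "architectures"

    def get_directories(self) -> Dict[str, str]:
        """Collect every public Path attribute as name -> absolute path string."""
        return {
            name: str(value.resolve())
            for name, value in vars(self).items()
            if isinstance(value, Path)
            and not name.startswith("_")
            and name not in self._ignore
        }


paths = DirectoryCollector(Path("/path/to/models/purple_alien"))
for name, location in paths.get_directories().items():
    print(f"{name}: {location}")
```

Filtering on `isinstance(value, Path)` is one simple way to separate directory attributes from names, flags, and other internal state.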

```python
purple_alien_paths.get_scripts()
```
This method retrieves a dictionary of script file names and their absolute paths. It looks for the specific script files related to the model (such as configuration, training, and evaluation scripts) that are predefined during class initialization.
#### Key Points:
- **Returns**: A dictionary where keys are script names (as `str`) and values are their absolute paths (as `str`). If the script is not found, the value will be `None`.
```python
{
    'config_deployment.py': '/path/to/models/purple_alien/configs/config_deployment.py',
    'train_ensemble.py': '/path/to/models/purple_alien/src/training/train_ensemble.py',
    'get_data.py': '/path/to/models/purple_alien/src/dataloaders/get_data.py',
    ...
}
```
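
A lookup that yields `None` for missing scripts could look something like the sketch below. The function `find_scripts` and the recursive search strategy are assumptions made for illustration; the real class works from a predefined script list and may locate files differently.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def find_scripts(model_dir: Path, expected: List[str]) -> Dict[str, Optional[str]]:
    """Map each expected script name to its absolute path, or None if absent."""
    found: Dict[str, Optional[str]] = {name: None for name in expected}
    for path in sorted(model_dir.rglob("*.py")):
        # Keep the first match only, so duplicates deeper in the tree are ignored.
        if path.name in found and found[path.name] is None:
            found[path.name] = str(path.resolve())
    return found


# Build a throwaway model directory to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "purple_alien"
    (model_dir / "configs").mkdir(parents=True)
    (model_dir / "configs" / "config_deployment.py").touch()
    scripts = find_scripts(model_dir, ["config_deployment.py", "train_ensemble.py"])

print(scripts["train_ensemble.py"])  # None, because the file was never created
```

Callers can then check for `None` values to report which scripts are missing before a run starts.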

### Usage Scenarios:
* `get_directories()` can be used to verify the existence and location of the important directories for a given model, helping to ensure the model structure is in place.
* `get_scripts()` can help confirm that all required scripts for the model (training, forecasting, evaluation, etc.) are available and accessible.

#### Example:
```python
directories = purple_alien_paths.get_directories()
```

### Adding and Removing Paths
The `ModelPath` class provides methods to add the relevant directories to Python's sys.path so that scripts can be easily imported, and to remove them when they are no longer needed.
* **Add Paths to sys.path**:
```python
purple_alien_paths.add_paths_to_sys()
```
* **Remove Paths from sys.path**:
```python
purple_alien_paths.remove_paths_from_sys()
```
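
As a rough picture of what adding and removing entries involves, the sketch below manipulates `sys.path` directly. The two free functions are hypothetical simplifications (the real methods take no arguments and work from the instance's own directories), and the example path is a placeholder.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def add_paths_to_sys(paths: Iterable[Path]) -> List[str]:
    """Insert each path at the front of sys.path once; return the entries added."""
    added = []
    for path in paths:
        entry = str(path)
        if entry not in sys.path:
            sys.path.insert(0, entry)
            added.append(entry)
    return added


def remove_paths_from_sys(entries: Iterable[str]) -> None:
    """Remove every occurrence of the given entries from sys.path."""
    for entry in entries:
        while entry in sys.path:
            sys.path.remove(entry)


original = list(sys.path)
added = add_paths_to_sys([Path("/path/to/models/purple_alien/src/training")])
print(added[0] in sys.path)      # True
remove_paths_from_sys(added)
print(sys.path == original)      # True
```

Tracking exactly which entries were added makes the removal step safe: paths that were already on `sys.path` before the call are left alone.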

### Working with Querysets
The `get_queryset()` method returns the queryset module for the model, which contains functions for data querying. This can be useful for retrieving specific data relevant to your model.
```python
queryset = purple_alien_paths.get_queryset()
```
The `ModelPath` class checks if the queryset file exists, attempts to import it, and logs the process. If validation is enabled and the queryset file is missing, it raises an error.
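
The check, import, and log sequence can be approximated with the standard `importlib` machinery. This is a sketch under assumptions: `load_module_from_file`, the file name `config_queryset.py`, and the `generate` function are invented for the example and are not confirmed names from the pipeline.

```python
import importlib.util
import logging
import tempfile
from pathlib import Path
from types import ModuleType
from typing import Optional

logger = logging.getLogger(__name__)


def load_module_from_file(path: Path, validate: bool = True) -> Optional[ModuleType]:
    """Import a Python file as a module; return None if missing and validate is off."""
    if not path.exists():
        if validate:
            raise FileNotFoundError(f"Queryset file not found: {path}")
        logger.warning("Queryset file %s does not exist; continuing without it.", path)
        return None
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    logger.info("Imported queryset module from %s", path)
    return module


with tempfile.TemporaryDirectory() as tmp:
    queryset_file = Path(tmp) / "config_queryset.py"  # hypothetical file name
    queryset_file.write_text("def generate():\n    return 'queryset placeholder'\n")
    queryset = load_module_from_file(queryset_file)
    missing = load_module_from_file(Path(tmp) / "absent.py", validate=False)

print(queryset.generate())  # queryset placeholder
print(missing)              # None
```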

### Class Methods and Static Methods

The `ModelPath` class includes several class methods and static methods for accessing commonly used paths that are not tied to a specific model.

* `check_if_model_dir_exists(cls, model_name)`: Checks if the model directory exists.
```python
exists = ModelPath.check_if_model_dir_exists("purple_alien")
```

* `get_model_name_from_path(path)`: Returns the model name based on the provided path.
```python
model_name = ModelPath.get_model_name_from_path("/path/to/models/purple_alien/src/")
# Returns "purple_alien"
```

* `get_root(cls)`: Returns the root directory of the project.
```python
root = ModelPath.get_root()
```

* `get_models(cls)`: Returns the models directory.
```python
models = ModelPath.get_models()
```

* `get_common_utils(cls)`: Returns the common utilities directory.
```python
common_utils = ModelPath.get_common_utils()
```

* `get_common_configs(cls)`: Returns the common configurations directory. (In development)
```python
common_configs = ModelPath.get_common_configs()
```

* `get_common_querysets(cls)`: Returns the common querysets directory.
```python
common_querysets = ModelPath.get_common_querysets()
```
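
To illustrate how a model name might be recovered from a path, as `get_model_name_from_path` does, here is one possible approach: take the path component that follows the `models` directory. The helper `model_name_from_path` is hypothetical, and the real method may apply additional checks, for example on the naming convention.

```python
from pathlib import Path
from typing import Optional, Union


def model_name_from_path(
    path: Union[str, Path], container: str = "models"
) -> Optional[str]:
    """Return the path component that follows `container`, or None if absent."""
    parts = Path(path).parts
    if container in parts:
        index = parts.index(container)
        if index + 1 < len(parts):
            return parts[index + 1]
    return None


print(model_name_from_path("/path/to/models/purple_alien/src/"))  # purple_alien
print(model_name_from_path("/path/to/elsewhere"))                  # None
```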

# EnsemblePath (common_utils/ensemble_path.py)

The `EnsemblePath` class extends the ModelPath class to manage ensemble paths and directories within the ViEWS Pipeline. It inherits all the functionalities of `ModelPath` and sets the target to 'ensemble'.

### Initialization

To start using the `EnsemblePath` class, you need to initialize it with a specific ensemble name. You can optionally validate whether the directories and scripts exist.

```python
from common_utils.ensemble_path import EnsemblePath

# Initialize EnsemblePath with an ensemble name
white_mustang_paths = EnsemblePath("white_mustang", validate=True)
```

* `ensemble_name_or_path`: The name or path of the ensemble you are working with. This will be used to locate the corresponding directories and scripts.
* `validate`: If set to True, the class will check if the specified directories and scripts exist and raise errors if they do not. Defaults to True.
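
The effect of setting the target to 'ensemble' can be shown with a toy pair of classes. `BasePath` and `EnsembleLikePath` are hypothetical stand-ins, and the project root is a placeholder; only the idea of deriving the container directory from a class-level `_target` plus an "s" is taken from `ensemble_path.py`.

```python
from pathlib import Path


class BasePath:
    """Toy stand-in for ModelPath (hypothetical, heavily simplified)."""

    _target = "model"
    _root = Path("/path/to/project")  # placeholder project root

    @classmethod
    def container_dir(cls) -> Path:
        # "model" -> <root>/models, "ensemble" -> <root>/ensembles
        return cls._root / (cls._target + "s")

    def __init__(self, name: str) -> None:
        self.name = name
        self.model_dir = self.container_dir() / name


class EnsembleLikePath(BasePath):
    """Only the target changes; everything else is inherited."""

    _target = "ensemble"


print(BasePath("purple_alien").model_dir.as_posix())
# /path/to/project/models/purple_alien
print(EnsembleLikePath("white_mustang").model_dir.as_posix())
# /path/to/project/ensembles/white_mustang
```

Because the directory is computed from `cls._target`, the subclass needs no path logic of its own.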

# GlobalCache (common_utils/global_cache.py) (Experimental)

`GlobalCache` is a thread-safe singleton cache class that stores key-value pairs in a global cache file. It ensures that only one instance of the cache exists and provides methods to set, get, and delete values. Because the cache is saved to a file, it persists across single or parallel model executions and avoids duplicating objects in memory. The cache is destroyed after pipeline execution or when an interrupt signal is received.

### Usage

You can initialize the `GlobalCache` with an optional filepath argument. If no filepath is provided, it defaults to `.global_cache.pkl` in the project root.

```python
from global_cache import GlobalCache

# Set a value to the cache
GlobalCache["key1"] = "value1"

# Get a value from the cache
print(GlobalCache["key1"]) # Output: value1

# Delete a value from the cache
GlobalCache.delete("key1")

print(GlobalCache["key1"]) # Output: None
```
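
For readers wondering how subscripting the class itself (`GlobalCache["key1"]`) can work, the sketch below shows one way to build such a cache: a metaclass forwards item access to a lock-protected singleton that pickles its data to disk. Everything here is an assumption for illustration: `SketchCache`, `_CacheMeta`, and the temporary file location are invented, and the sketch omits the cleanup on pipeline exit or interrupt signals that the real class is described as having.

```python
import pickle
import tempfile
import threading
from pathlib import Path
from typing import Any, Dict


class _CacheMeta(type):
    """Metaclass that lets `SketchCache["key"]` work on the class itself."""

    def __getitem__(cls, key: str) -> Any:
        return cls().get(key)

    def __setitem__(cls, key: str, value: Any) -> None:
        cls().set(key, value)


class SketchCache(metaclass=_CacheMeta):
    """Hypothetical file-backed singleton cache, not the real GlobalCache."""

    _instance = None
    _lock = threading.RLock()
    # Placeholder location: a fresh temporary directory for each process.
    filepath = Path(tempfile.mkdtemp()) / ".sketch_cache.pkl"

    def __new__(cls) -> "SketchCache":
        with cls._lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance._data = instance._load()
                cls._instance = instance
            return cls._instance

    def _load(self) -> Dict[str, Any]:
        if self.filepath.exists():
            with open(self.filepath, "rb") as handle:
                return pickle.load(handle)
        return {}

    def _save(self) -> None:
        with open(self.filepath, "wb") as handle:
            pickle.dump(self._data, handle)

    def set(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value
            self._save()

    def get(self, key: str) -> Any:
        with self._lock:
            return self._data.get(key)

    @classmethod
    def delete(cls, key: str) -> None:
        instance = cls()
        with cls._lock:
            instance._data.pop(key, None)
            instance._save()


SketchCache["key1"] = "value1"
print(SketchCache["key1"])  # value1
SketchCache.delete("key1")
print(SketchCache["key1"])  # None
```

A reentrant lock is used so that `delete` can create the instance and mutate its data without deadlocking. Writing the pickle on every change keeps the file in sync at the cost of extra I/O.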
46 changes: 46 additions & 0 deletions common_utils/ensemble_path.py
@@ -0,0 +1,46 @@
from model_path import ModelPath
import logging
from pathlib import Path
from typing import Union

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)


class EnsemblePath(ModelPath):
    """
    A class to manage ensemble paths and directories within the ViEWS Pipeline.
    Inherits from ModelPath and sets the target to 'ensemble'.
    """

    _target = "ensemble"

    @classmethod
    def _initialize_class_paths(cls):
        """Initialize class-level paths for ensemble."""
        super()._initialize_class_paths()
        cls._models = cls._root / Path(cls._target + "s")
        # Additional ensemble-specific initialization...

    def __init__(
        self, ensemble_name_or_path: Union[str, Path], validate: bool = True
    ) -> None:
        """
        Initializes an EnsemblePath instance.
        Args:
            ensemble_name_or_path (str or Path): The ensemble name or path.
            validate (bool, optional): Whether to validate paths and names. Defaults to True.
        """
        super().__init__(ensemble_name_or_path, validate)
        # Additional ensemble-specific initialization...


# if __name__ == "__main__":
# ensemble_path = EnsemblePath("white_mustang", validate=True)
# ensemble_path.view_directories()
# ensemble_path.view_scripts()
# print(ensemble_path.get_queryset())
# del ensemble_path