Not possible to create graph/train with only state #125

Open
ealerskans opened this issue Feb 12, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@ealerskans

Issue
Training/graph creation using only the state feature (i.e. excluding forcing and static) doesn't work. It works fine if I exclude forcing, but when I try without static it throws an error. I would expect it to work without either or both of forcing and static.

More information
This is the mllam-data-prep config I used

schema_version: v0.6.0
dataset_version: v0.1.0

output:
  variables:
    state: [time, grid_index, state_feature]
  coord_ranges:
    time:
      start: 1990-09-03T00:00
      end: 1990-09-10T00:00
      step: PT3H
  chunking:
    time: 5
  splitting:
    dim: time
    splits:
      train:
        start: 1990-09-03T00:00
        end: 1990-09-06T00:00
        compute_statistics:
          ops: [mean, std, diff_mean, diff_std]
          dims: [grid_index, time]
      val:
        start: 1990-09-06T00:00
        end: 1990-09-08T00:00
      test:
        start: 1990-09-08T00:00
        end: 1990-09-10T00:00

inputs:
  danra_height_levels:
    path: /dcai/projects/cu_0003/data/sources/danra/v0.4.0/height_levels.zarr
    dims: [time, x, y, altitude]
    variables:
      u:
        altitude:
          values: [100,]
          units: m
      v:
        altitude:
          values: [100, ]
          units: m
    dim_mapping:
      time:
        method: rename
        dim: time
      state_feature:
        method: stack_variables_by_var_name
        dims: [altitude]
        name_format: "{var_name}{altitude}m"
      grid_index:
        method: stack
        dims: [x, y]
    target_output_variable: state

extra:
  projection:
    class_name: LambertConformal
    kwargs:
      central_longitude: 25.0
      central_latitude: 56.7
      standard_parallels: [56.7, 56.7]
      globe:
        semimajor_axis: 6367470.0
        semiminor_axis: 6367470.0

and this is the neural-lam config I used

datastore:
  kind: mdp
  config_path: /dcai/projects/cu_0003/user_space/ea/mllam-configs/danra.only_state.yaml
training:
  state_feature_weighting:
    __config_class__: UniformFeatureWeighting

and this is the error I get

The loaded datastore contains the following features:
 state   : u100m v100m
/dcai/projects01/cu_0003/user_space/ea/git-repos/mllam/neural-lam/neural_lam/datastore/mdp.py:182: UserWarning: no forcing data found in datastore
  warnings.warn("no forcing data found in datastore")
Traceback (most recent call last):
  File "/dcai/projects/cu_0003/user_space/ea/.venv/nlam_login02/lib/python3.10/site-packages/xarray/core/dataset.py", line 1512, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'static_feature'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/dcai/projects/cu_0003/user_space/ea/.venv/nlam_login02/lib/python3.10/site-packages/xarray/core/dataset.py", line 1611, in __getitem__
    return self._construct_dataarray(key)
  File "/dcai/projects/cu_0003/user_space/ea/.venv/nlam_login02/lib/python3.10/site-packages/xarray/core/dataset.py", line 1514, in _construct_dataarray
    _, name, variable = _get_virtual_variable(self._variables, name, self.sizes)
  File "/dcai/projects/cu_0003/user_space/ea/.venv/nlam_login02/lib/python3.10/site-packages/xarray/core/dataset.py", line 221, in _get_virtual_variable
    raise KeyError(key)
KeyError: 'static_feature'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/dcai/projects01/cu_0003/user_space/ea/git-repos/mllam/neural-lam/neural_lam/create_graph.py", line 610, in <module>
    cli()
  File "/dcai/projects01/cu_0003/user_space/ea/git-repos/mllam/neural-lam/neural_lam/create_graph.py", line 598, in cli
    _, datastore = load_config_and_datastore(config_path=args.config_path)
  File "/dcai/projects01/cu_0003/user_space/ea/git-repos/mllam/neural-lam/neural_lam/config.py", line 188, in load_config_and_datastore
    datastore = init_datastore(
  File "/dcai/projects01/cu_0003/user_space/ea/git-repos/mllam/neural-lam/neural_lam/datastore/__init__.py", line 24, in init_datastore
    datastore = DatastoreClass(config_path=config_path)
  File "/dcai/projects01/cu_0003/user_space/ea/git-repos/mllam/neural-lam/neural_lam/datastore/mdp.py", line 78, in __init__
    if len(self.get_vars_names(category)) > 0:
  File "/dcai/projects01/cu_0003/user_space/ea/git-repos/mllam/neural-lam/neural_lam/datastore/mdp.py", line 184, in get_vars_names
    return self._ds[f"{category}_feature"].values.tolist()
  File "/dcai/projects/cu_0003/user_space/ea/.venv/nlam_login02/lib/python3.10/site-packages/xarray/core/dataset.py", line 1617, in __getitem__
    raise KeyError(message) from e
KeyError: "No variable named 'static_feature'. Variables on the dataset include ['lat', 'lon', 'split_name', 'split_part', 'splits', ..., 'state_feature_source_dataset', 'state_feature_units', 'time', 'x', 'y']"

The issue seems to be that in neural_lam/datastore/mdp.py there is an exception only for forcing and not for static:

if category not in self._ds and category == "forcing":
    warnings.warn("no forcing data found in datastore")

If we change it to

if category not in self._ds and category in ["forcing", "static"]:
    warnings.warn(f"no {category} data found in datastore")

that would make it possible to run without static features (courtesy of @leifdenby). This at least works for graph creation (I haven't tested training a model yet).
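To illustrate the behaviour the proposed change is after, here is a minimal, self-contained sketch. It is not the actual neural-lam code: a plain dict stands in for the xarray dataset held in `self._ds`, and the `OPTIONAL_CATEGORIES` name and the early `return []` are assumptions made for the example.

```python
import warnings

# Assumption for this sketch: these are the categories a datastore may omit.
OPTIONAL_CATEGORIES = ("forcing", "static")


def get_vars_names(ds, category):
    """Return the feature names for `category`.

    `ds` is a plain dict standing in for the xarray dataset; an optional
    category that is missing yields a warning and an empty list instead
    of a KeyError on the "<category>_feature" lookup.
    """
    if category not in ds and category in OPTIONAL_CATEGORIES:
        warnings.warn(f"no {category} data found in datastore")
        return []
    return list(ds[f"{category}_feature"])


# A state-only dataset, mirroring the features reported in the log above.
ds = {"state": object(), "state_feature": ["u100m", "v100m"]}

for category in ("state", "forcing", "static"):
    print(category, get_vars_names(ds, category))
```

With this guard, a missing state category would still raise a `KeyError`, which seems like the right behaviour since state is the one category that is always required.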

Question
But I guess the first question to ask is: should we be able to run with only state?

@ealerskans ealerskans added the bug Something isn't working label Feb 12, 2025
@joeloskarsson
Collaborator

I've been thinking that we should go over both graph-creation and models to make sure everything that is not strictly needed (forcing and static) is fully optional and treated correctly. I think we definitely should be able to run with only state.

I would be happy if someone wants to look over that, at least on the model side. For now I wouldn't put too much effort into making the graph creation work with only state, as that would be reworked with #83 anyhow.

@joeloskarsson
Collaborator

I think this is more an enhancement than a bug, as we never claim that the model should work with an mdp zarr that does not provide any static features.

@ealerskans ealerskans added enhancement New feature or request and removed bug Something isn't working labels Feb 13, 2025