command fails #4

Open
antithing opened this issue Sep 18, 2023 · 3 comments

@antithing

Hi, and thank you for making this code available.

I am trying to run on the mip360 garden dataset, and when I run:

python launch.py --config configs/neus-colmap.yaml --gpu 0 --train dataset.root_dir=D://NERF//BakedSDF//torch-bakedsdf-main//torch-bakedsdf-main//load//unbounded360//garden//

I see this error:

D:\NERF\BakedSDF\torch-bakedsdf-main\torch-bakedsdf-main>python launch.py --config configs/neus-colmap.yaml --gpu 0 --train
Global seed set to 42
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
fatal: not a git repository (or any of the parent directories): .git
D:\NERF\BakedSDF\torch-bakedsdf-main\torch-bakedsdf-main\utils\callbacks.py:76: UserWarning: Code snapshot is not saved. Please make sure you have git installed and are in a git repository.
  rank_zero_warn("Code snapshot is not saved. Please make sure you have git installed and are in a git repository.")

  | Name  | Type      | Params
------------------------------------
0 | model | NeuSModel | 28.0 M
------------------------------------
28.0 M    Trainable params
0         Non-trainable params
28.0 M    Total params
55.913    Total estimated model params size (MB)
Traceback (most recent call last):
  File "D:\NERF\BakedSDF\torch-bakedsdf-main\torch-bakedsdf-main\launch.py", line 130, in <module>
    main()
  File "D:\NERF\BakedSDF\torch-bakedsdf-main\torch-bakedsdf-main\launch.py", line 119, in main
    trainer.fit(system, datamodule=dm)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1236, in _run
    results = self._run_stage()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1353, in _run_train
    self.fit_loop.run()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 266, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 208, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\batch\training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 203, in advance
    result = self._run_optimization(
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 256, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 369, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1595, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\core\lightning.py", line 1646, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\core\optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 193, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\plugins\precision\native_amp.py", line 85, in optimizer_step
    closure_result = closure()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 148, in __call__
    self._result = self.closure(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 134, in closure
    step_output = self._step_fn()
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 427, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1765, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\strategies\dp.py", line 125, in training_step
    return self.model(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\overrides\data_parallel.py", line 64, in forward
    output = super().forward(*inputs, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\pytorch_lightning\overrides\base.py", line 82, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "D:\NERF\BakedSDF\torch-bakedsdf-main\torch-bakedsdf-main\systems\neus.py", line 95, in training_step
    train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero
Epoch 0: : 0it [02:25, ?it/s]
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

What might be causing this?

Thank you!
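For context, the failing line in systems\neus.py rescales the per-batch ray count by dividing by out['num_samples_full'].sum(), which is 0 here, so the renderer produced no samples at all for that batch. Below is a minimal guard sketch; it is not the repository's fix, the helper name is hypothetical, and a zero sample count usually points at a deeper problem (for example, data never reaching the GPU), but it shows where the division blows up.

```python
# Hedged sketch (hypothetical helper, not the project's actual code):
# guard the dynamic ray-count update against a zero sample count.
def update_train_num_rays(train_num_rays: int,
                          train_num_samples: int,
                          num_samples_full: int) -> int:
    if num_samples_full <= 0:
        # No samples were generated this step (often a symptom of tensors
        # staying on the CPU under the Windows DataParallel strategy);
        # keep the previous ray budget instead of dividing by zero.
        return train_num_rays
    return int(train_num_rays * (train_num_samples / num_samples_full))
```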

@winterjonas

Following up on this: did you manage to solve this error? I'm running into the same issue.

@antithing (Author)

@winterjonas I never solved it. Please post here if you do!

@gnoilednad

A similar problem has been resolved in another repository; it appears to be a compatibility issue between Windows and pytorch_lightning. Can you solve the problem the same way as in xxlong0/Wonder3D#22?

- remove all ".to(self.rank)" and "device=self.dataset.all_images.device"
- add ".to(self.device)" to the data that needs to be sent to the GPU (sketched below)
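For anyone trying that on torch-bakedsdf, here is a rough sketch of the change, assuming an instant-nsr-pl-style preprocess_data hook; the method name and batch keys are assumptions and may not match this repository exactly.

```python
# Hedged sketch of the suggested Windows workaround, not a verbatim patch.
# Before (moves data via an explicit rank or the dataset's device):
#   rays = rays.to(self.rank)
#   rgb = self.dataset.all_images[index].to(self.rank)

# After: rely on the LightningModule's own `self.device`.
def preprocess_data(self, batch, stage):
    index = batch['index']                        # assumed batch key
    rays = batch['rays'].to(self.device)          # move rays to the GPU
    rgb = self.dataset.all_images[index].to(self.device)
    batch.update({'rays': rays, 'rgb': rgb})
```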
