Fix dense backward test #92

Open
wants to merge 5 commits into
base: abokovoi/upstream

avbokovoy

Attempt to fix the dense backward unit test. There are currently 5 issues:

  1. OOM (only on A100; hard to catch due to the randomness of the unit tests)
  2. Memory access error (both MI300X and A100; hard to catch)
  3. Assertion failures in the PoolingMode.MEAN test (both ROCm and NVIDIA)
  4. Assertion failures in the PoolingMode.SUM test (both ROCm and NVIDIA)
  5. Wrong indexing in the VBE test

Issues 1, 2, and 5 are also observed in pytorch#3763.

The original intent of the aligned_grad_output_tensor_for_cuda_backwards() function is unclear to me, so this particular fix might be "sub-optimal". I would therefore appreciate reviews.

@avbokovoy avbokovoy self-assigned this Feb 18, 2025
@avbokovoy avbokovoy changed the title Abokovoi/fix dense backward test Fix dense backward test Feb 18, 2025

@amirakb89 amirakb89 left a comment

LGTM

2 participants