Support batch and classes for NonMaxSuppression #3999
base: main
Conversation
The following tests are passing: https://github.com/iree-org/iree-test-suites/tree/main/onnx_ops/onnx/node/generated/test_nonmaxsuppression_two_batches. The following test fails in IREE for CPU due to the large tensor size used for stack allocation by the sort operation. Tested with smaller tensor sizes, and it works as expected with correct results.
Left a few comments, but I'm not quite clear about the nmsLoop part. @zjgarvey It would be nice to have your review too.
auto finalResIdx = batchLoopBody->getArgument(2);
auto numResultValues = batchLoopBody->getArgument(3);

auto boxValue = rewriter.create<Torch::AtenSelectIntOp>(
I would use AtenSliceTensorOp here, and also for the slice-tensor cases in the rest of the changes.
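For illustration, a minimal sketch of what the slice-based variant could look like; the names boxes, dimValue, idx, and sliceTy are hypothetical stand-ins, not from this patch:

// Hypothetical sketch: take element `idx` along `dimValue` as the size-1
// slice [idx, idx + 1) instead of a select. Unlike AtenSelectIntOp, the
// sliced dim is kept (with size 1) in the result type.
Value cstOne = rewriter.create<Torch::ConstantIntOp>(
    loc, rewriter.getI64IntegerAttr(1));
Value endIdx = rewriter.create<Torch::AtenAddIntOp>(loc, idx, cstOne);
Value boxSlice = rewriter.create<Torch::AtenSliceTensorOp>(
    loc, sliceTy, boxes, dimValue, /*start=*/idx, /*end=*/endIdx,
    /*step=*/cstOne);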
@jinchen62 I had used AtenSelectIntOp since the selected dim needs to be removed in all the usages, and it gets decomposed in the subsequent DecomposeComplexOps pass. Please let me know if it would be better to use add + slice_tensor + squeeze in this change itself.
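To make the alternative concrete, a hedged sketch of the add + slice_tensor + squeeze variant being discussed; input, dim, idx, and the intermediate types are hypothetical:

// Hypothetical sketch of select expressed as slice + squeeze: the slice
// keeps the dim with size 1, and the squeeze then drops it so the result
// shape matches what AtenSelectIntOp would have produced directly.
Value cstOne = rewriter.create<Torch::ConstantIntOp>(
    loc, rewriter.getI64IntegerAttr(1));
Value endIdx = rewriter.create<Torch::AtenAddIntOp>(loc, idx, cstOne);
Value sliced = rewriter.create<Torch::AtenSliceTensorOp>(
    loc, slicedTy, input, dim, /*start=*/idx, /*end=*/endIdx, /*step=*/cstOne);
Value selected =
    rewriter.create<Torch::AtenSqueezeDimOp>(loc, selectedTy, sliced, dim);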
        loc, emptyTensorTy, numOutputBoxes);
Value maxBoxesPerClass =
    rewriter.create<Torch::PrimNumToTensorScalarOp>(
        loc, emptyTensorTy, maxOutputBoxesPerClass);
I would use a tensor type with shape [1] instead of [], since those few arguments come in with shape [1] and you apply the Minimum op to them.
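A sketch of how a [1]-shaped value could be built, assuming an unsqueeze after prim.NumToTensor; the si64 dtype and the variable names are illustrative:

// Hypothetical sketch: materialize the scalar as a rank-0 tensor, then
// unsqueeze dim 0 so the shape becomes [1] and lines up with the
// [1]-shaped operands of the Minimum op without relying on broadcasting.
auto si64Ty = rewriter.getIntegerType(64, /*isSigned=*/true);
auto scalarTy = Torch::ValueTensorType::get(
    rewriter.getContext(), ArrayRef<int64_t>{}, si64Ty);
auto rank1Ty = Torch::ValueTensorType::get(
    rewriter.getContext(), ArrayRef<int64_t>{1}, si64Ty);
Value scalarT = rewriter.create<Torch::PrimNumToTensorScalarOp>(
    loc, scalarTy, maxOutputBoxesPerClass);
Value cstZero = rewriter.create<Torch::ConstantIntOp>(
    loc, rewriter.getI64IntegerAttr(0));
Value rank1T =
    rewriter.create<Torch::AtenUnsqueezeOp>(loc, rank1Ty, scalarT, cstZero);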
Both values passed to the Minimum op are scalars coming from aten.size.int ops, so I had used shape [] for the minimum op.
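For context, a sketch of the shape-[] flow being described; boxes and cstZero are assumed to be in scope, and emptyTensorTy is the rank-0 type from the snippet above:

// Hypothetical sketch: both Minimum operands originate from aten.size.int,
// i.e. plain !torch.int scalars, hence the rank-0 ([]) tensor type.
Value numBoxes =
    rewriter.create<Torch::AtenSizeIntOp>(loc, boxes, /*dim=*/cstZero);
Value numBoxesT = rewriter.create<Torch::PrimNumToTensorScalarOp>(
    loc, emptyTensorTy, numBoxes);
Value maxBoxesT = rewriter.create<Torch::PrimNumToTensorScalarOp>(
    loc, emptyTensorTy, maxOutputBoxesPerClass);
Value minT = rewriter.create<Torch::AtenMinimumOp>(
    loc, emptyTensorTy, numBoxesT, maxBoxesT);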
I'm mostly having difficulty parsing the nmsLoop. Could you give some details as to what the implementation there is trying to do?
rewriter.create<Torch::AtenItemOp>(loc, intTy, minVal);

// Loop through the nms result
auto nmsLoop = rewriter.create<Torch::PrimLoopOp>(
I'm finding it difficult to parse this loop. The result of the (per-batch per-channel) torchvision nms op has shape <num_selected>, and we need it to be <num_selected x 3>, where each triple is like [batch_index, class_index, selected_box_index]. Is the purpose of this loop to insert these elements into the final result? Is it possible to avoid using a loop for this and instead concatenate the nms result with some splat tensors, then insert that into the final result by keeping track of what the cumulative num_selected is?
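Roughly, that suggestion could look like the following sketch; nmsResult, batchIdx, classIdx, dtypeInt, ctx, the cstZero/cstOne constants, and the result types are hypothetical and untested:

// Hypothetical sketch of splat + concat: build [num_selected, 1] columns
// holding the batch and class indices, widen the nms result to
// [num_selected, 1], and cat along dim 1 to get [num_selected, 3] triples.
Value numSel =
    rewriter.create<Torch::AtenSizeIntOp>(loc, nmsResult, /*dim=*/cstZero);
Value colSizes = rewriter.create<Torch::PrimListConstructOp>(
    loc, Torch::ListType::get(Torch::IntType::get(ctx)),
    ValueRange{numSel, cstOne});
Value noneVal = rewriter.create<Torch::ConstantNoneOp>(loc);
Value batchCol = rewriter.create<Torch::AtenFullOp>(
    loc, colTy, colSizes, batchIdx, /*dtype=*/dtypeInt, /*layout=*/noneVal,
    /*device=*/noneVal, /*pin_memory=*/noneVal);
Value classCol = rewriter.create<Torch::AtenFullOp>(
    loc, colTy, colSizes, classIdx, dtypeInt, noneVal, noneVal, noneVal);
Value nmsCol =
    rewriter.create<Torch::AtenUnsqueezeOp>(loc, colTy, nmsResult, cstOne);
Value catList = rewriter.create<Torch::PrimListConstructOp>(
    loc, Torch::ListType::get(colTy), ValueRange{batchCol, classCol, nmsCol});
Value triples = rewriter.create<Torch::AtenCatOp>(
    loc, triplesTy, catList, /*dim=*/cstOne);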
@zjgarvey @jinchen62 Updated the comments for the nmsLoop part. This loop is used to insert the triple [batch_index, class_index, selected_box_index] at the required indices, element by element.

"Is the purpose of this loop to insert these elements into the final result? Is it possible to avoid using a loop for this and instead concatenate the nms result with some splat tensors, then insert that into the final result by keeping track of what the cumulative num_selected is?"

-> Yes, I had already tried the splat + concat approach as part of #3981, but I was running into runtime issues like segfaults / invalid memory accesses because IREE does not handle the dynamic dims. The IR using the concat + splat method is here. I made use of loops so that we can have a working solution initially and then update the logic once the issues in IREE are fixed. Please let me know your thoughts on this!
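For reference, the overall shape of the interim loop-based approach is roughly the skeleton below; it is simplified and hypothetical (the real code carries more iter args), with cstTrue, cstOne, numSelected, finalRes, resIdx, and the types assumed to be in scope:

// Hypothetical skeleton: iterate num_selected times; each iteration is
// meant to write one [batch_index, class_index, selected_box_index]
// triple into the preallocated result and advance the running write index.
auto nmsLoop = rewriter.create<Torch::PrimLoopOp>(
    loc, TypeRange({finalResTy, intTy}), numSelected, cstTrue,
    ValueRange({finalRes, resIdx}));
{
  PatternRewriter::InsertionGuard guard(rewriter);
  Block *body = rewriter.createBlock(
      &nmsLoop.getRegion(), nmsLoop.getRegion().begin(),
      TypeRange({intTy, finalResTy, intTy}), {loc, loc, loc});
  Value iv = body->getArgument(0);       // position in the nms result
  Value result = body->getArgument(1);   // partially filled final result
  Value writeIdx = body->getArgument(2); // next free row in the result
  // ... the real body inserts the triple for `iv` into `result` here ...
  Value nextWriteIdx =
      rewriter.create<Torch::AtenAddIntOp>(loc, writeIdx, cstOne);
  rewriter.create<Torch::PrimLoopConditionOp>(
      loc, cstTrue, ValueRange({result, nextWriteIdx}));
}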