
feat(evaluation): Design doc for 3D detection in ml-evaluation pipeline #4

Open · wants to merge 1 commit into main
Conversation

KSeangTan (Collaborator) commented Feb 28, 2025

Summary

This PR drafts the discussion and the necessary changes to the evaluation pipeline so that it uses autoware_perception_evaluation to evaluate the performance of perception models.

Problem Statement

The current ML pipeline uses t4_devkit when creating pickle files for training/evaluation; however, it still loads ground truths from nuscenes_devkit in the evaluation pipeline. This can introduce unintended behavior because two different devkits are used across the training and evaluation pipelines.
For example, the evaluation pipeline currently transforms both predictions and ground truths to the global coordinate frame before evaluating them. This should be avoided, since predictions and ground truths (pickle files) are already in the same coordinate system when retrieved with t4_devkit.get_sample_data.
Besides, NuScenesMetric computes different metrics than autoware_perception_eval; we should evaluate our ML experiments with the same evaluation to reduce the gap as much as possible.

Goals

  1. Remove all nuscenes_devkit dependencies in T4Metric
  2. Use autoware_perception_eval and t4_devkit for evaluation
  3. Reduce evaluation time and make sure it passes regression testing

Designs

Current Pipeline

[Figure: autoware_ml_current_evaluation.drawio (current evaluation pipeline diagram)]

Proposed Pipeline

[Figure: autoware_propose_evaluation_pipeline.drawio (proposed evaluation pipeline diagram)]

A few considerations for the new pipeline:

  1. Computation and postprocessing should be done in autoware_perception_evaluation; T4Metric should only be a wrapper/interface that runs inference and calls autoware_perception_evaluation. In that case, we should rename it to T4Evaluator.
  2. Confidence thresholds should be calibrated per class on the validation set only, and those values should then be used when running the test set (see the calibration sketch after this list).
    • In other words, users should provide confidence thresholds when independently running evaluation on a test set.
  3. Support configuration of the evaluation pipeline through configs instead of the hard-coded filtering in the current pipeline.
  4. We need to make a few changes in autoware_perception_evaluation before starting to work on T4Metric:
    • Support NuScenes metrics as mentioned here
    • Make CriticalObjectFilterConfig and PerceptionPassFailConfig optional, since they might not be used in ML experiments at the beginning
    • Support loading FrameGroundTruth and sensor data without providing dataset_paths
    • Support serialization/pickling of FrameGroundTruth and DynamicObject
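
As an illustration of point 2, here is a minimal sketch of per-class confidence-threshold calibration on the validation set: it sweeps the observed scores and keeps, per class, the threshold with the best F1. The container and function names (ClassPrediction, calibrate_thresholds) and the F1 criterion are illustrative assumptions, not part of autoware_perception_evaluation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClassPrediction:
    """Hypothetical container: one prediction of a given class on the val set."""
    score: float            # confidence score of the predicted box
    is_true_positive: bool  # whether it matched a ground-truth box


def calibrate_thresholds(
    predictions: Dict[str, List[ClassPrediction]],
    num_gt: Dict[str, int],
) -> Dict[str, float]:
    """Pick, per class, the confidence threshold that maximizes F1 on the val set."""
    thresholds: Dict[str, float] = {}
    for cls, preds in predictions.items():
        best_f1, best_thr = 0.0, 0.5
        # Sweep candidate thresholds taken from the observed scores.
        for thr in sorted({p.score for p in preds}):
            kept = [p for p in preds if p.score >= thr]
            tp = sum(p.is_true_positive for p in kept)
            fp = len(kept) - tp
            fn = num_gt[cls] - tp
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            if f1 > best_f1:
                best_f1, best_thr = f1, thr
        thresholds[cls] = best_thr
    return thresholds
```

The resulting per-class dictionary would then be passed unchanged, via the config, to any evaluation run on the test set.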

Plan of PRs:

  • autoware_perception_evaluation:
    1. Implement NuScenes metrics in autoware_perception_evaluation, including NDS and calibration of confidence thresholds (2 days)
    2. Make filters optional (0.5 day)
    3. Support loading FrameGroundTruth and sensor data without providing dataset_paths (0.5 day)
  • AWML:
    1. Introduce T4Frame and refactor inference to save predictions/GTs at every step, and also save intermediate results (results.pickle) for all scenes (1 day)
    2. Configure autoware_perception_evaluation through experiment configs, and process T4Frame with autoware_perception_evaluation.add_frame_result and autoware_perception_evaluation.get_scene_result (see the sketch below) (2 days)
    3. Visualize metrics and the worst K samples (1.5 days)
    4. Unit tests for simple cases (0.5 day)
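
To make the AWML items above concrete, here is a rough sketch of how saved frames could be replayed through autoware_perception_evaluation. Only the method names add_frame_result and get_scene_result come from this design; the evaluator object, the keyword arguments, and the results.pickle layout are assumptions for illustration.

```python
import pickle
from pathlib import Path


def evaluate_scene(results_path: Path, evaluator) -> None:
    """Replay saved frames (results.pickle) through the perception evaluator.

    `evaluator` stands in for the autoware_perception_evaluation manager; the
    exact constructor and keyword names are assumptions in this sketch.
    """
    # Each entry is assumed to hold the predictions and ground truths for one
    # frame (what the design calls T4Frame / PerceptionFrameResult).
    with results_path.open("rb") as f:
        frames = pickle.load(f)

    for frame in frames:
        # Per-frame accumulation, as listed in AWML item 2 of the plan.
        evaluator.add_frame_result(
            unix_time=frame["timestamp"],
            ground_truth_objects=frame["ground_truths"],
            estimated_objects=frame["predictions"],
        )

    # Aggregate the scene-level metrics (e.g. mAP/NDS) over all frames.
    scene_result = evaluator.get_scene_result()
    print(scene_result)
```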

ETA: 8 - 9 days

To prevent ongoing PRs from significantly impacting running experiments, we will make the changes to autoware_perception_evaluation and autoware_ml in an independent feature branch, and only merge it into main once it has been validated by regression testing and runtime measurements.

scepter914 (Collaborator) commented Mar 5, 2025

@KSeangTan

Thanks for the PR; I really appreciate you preparing such detailed documentation.

Computation and postprocessing should be done in autoware_perception_evaluation; T4Metric should only be a wrapper/interface that runs inference and calls autoware_perception_evaluation. In that case, we should rename it to T4Evaluator.

Since NuScenesMetric already runs inference and calls nuscenes-devkit, I think it is fine to keep the name T4Metric even if we use autoware_perception_evaluation, as long as T4Metric creates result.json and metrics.json in the same script.
(I also think that would be understandable for users of AWML and mmdetection3d. It might be a good idea to allow recalculating metrics from result.json by configuring it in the same way as setting a pre-trained model, using: result_json = None # or {path to json file})
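
A minimal sketch of that config idea, using an mmdetection3d-style evaluator config; the result_json field itself is the hypothetical part:

```python
# Hypothetical experiment-config fragment: when result_json is set, T4Metric
# would skip inference and recompute metrics.json from the saved result.json,
# analogous to loading a pre-trained checkpoint instead of training.
val_evaluator = dict(
    type="T4Metric",
    result_json=None,  # or "work_dirs/exp_001/result.json" to only recalculate metrics
)
```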

Confidence thresholds should be calibrated per class on the validation set only, and those values should then be used when running the test set.
In other words, users should provide confidence thresholds when independently running evaluation on a test set.

Yes, and we need to reconstruct the train/val/test splits.

Support configuration of the evaluation pipeline through configs instead of the hard-coded filtering in the current pipeline

Agreed.
Some thresholds are hard-coded now, so in some cases we should rewrite them in the core libraries, e.g. https://github.com/tier4/AWML/blob/main/autoware_ml/detection3d/evaluation/t4metric/t4metric.py#L136.
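
For instance, the filtering and thresholds could move from code into the experiment config, roughly like the following sketch; all key names and values here (evaluation_filter, max_distance, confidence_thresholds, and the numbers) are hypothetical placeholders:

```python
# Hypothetical config block replacing hard-coded filtering in t4metric.py:
# every value that is currently fixed in code becomes a tunable config entry.
evaluation_config = dict(
    class_names=["car", "truck", "bus", "bicycle", "pedestrian"],
    evaluation_filter=dict(
        max_distance=90.0,  # drop boxes beyond this range [m]
        min_points=1,       # drop GT boxes with fewer lidar points
    ),
    confidence_thresholds=dict(  # calibrated on the validation set
        car=0.35, truck=0.30, bus=0.30, bicycle=0.25, pedestrian=0.25,
    ),
)
```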

We need to make a few changes in autoware_perception_evaluation before starting to work on T4Metric:

Understood.
Let's work on each task one by one.

Pipeline design in the figure

I want to comment on "the new module in T4Metric" shown in the figure.
Can we replace "T4Frame" with "PerceptionFrameResult"?
The PerceptionAnalyzer3D class seems to support loading pickle files, according to its documentation.
If we can use the same class as autoware_perception_evaluation, it will require less maintenance.

As a nit, it would be better for the document to explain what a "T4 pickle" is.
It would be easier to understand if an example file name such as "t4dataset_base_infos_val.pkl" were given.

Plan of PRs

Your plan makes sense.
Overall, the whole pipeline looks great to me, so could you start writing the document described in this PR to /docs/design/architecture_evaluation.md, like the dataset document in this PR?
