feat(evaluation): Design doc for 3D detection in ml-evaluation pipeline #4
## Summary

This PR drafts the discussion and the necessary changes in the evaluation pipeline to use `autoware_perception_evaluation` to evaluate the performance of perception models.
## Problem Statement

The current ml pipeline uses `t4_devkit` when creating pickle files for training/evaluation; however, it still loads ground truths from `nuscenes_devkit` in the evaluation pipeline. This might introduce unintended behavior, because we use two different devkits across the training and evaluation pipelines. For example, the evaluation currently transforms both predictions and ground truths to the global coordinate frame before comparing them. This should be avoided, since we know that predictions and ground truths (the pickle files) are already in the same coordinate system when they are retrieved via `t4_devkit.get_sample_data`; a sketch of the difference follows below.

Besides, `NuScenesMetric` has different metrics compared to `autoware_perception_eval`, so we should evaluate our ml experiments with the same evaluation logic to reduce the gap as much as possible.
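To make the coordinate-frame issue concrete, here is a minimal NumPy sketch contrasting the two behaviors. The helper names (`center_distance`, `to_global`, `match_in_global`, `match_in_shared_frame`) are illustrative, not actual pipeline functions:

```python
import numpy as np

def center_distance(preds: np.ndarray, gts: np.ndarray) -> np.ndarray:
    """Pairwise center distances between (N, 3+) and (M, 3+) boxes."""
    return np.linalg.norm(preds[:, None, :3] - gts[None, :, :3], axis=-1)

def to_global(boxes: np.ndarray, ego_to_global: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform to box centers."""
    out = boxes.copy()
    centers = np.hstack([boxes[:, :3], np.ones((len(boxes), 1))])
    out[:, :3] = (centers @ ego_to_global.T)[:, :3]
    return out

# Current behavior (nuscenes_devkit path): both sides take a round trip
# through the global frame before matching, even though they already
# share a coordinate system.
def match_in_global(preds, gts, ego_to_global):
    return center_distance(to_global(preds, ego_to_global),
                           to_global(gts, ego_to_global))

# Desired behavior: pickles retrieved via t4_devkit.get_sample_data hold
# predictions and ground truths in the same coordinate system already,
# so they can be matched directly.
def match_in_shared_frame(preds, gts):
    return center_distance(preds, gts)
```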
## Goals

- Remove `nuscene_devkit` dependencies in `T4Metric`
- Use `autoware_perception_eval` and `t4_devkit` for evaluation

## Designs
### Current Pipeline

### Proposed Pipeline
A few considerations for the new pipeline:

- Since we will use `autoware_perception_evaluation`, `T4Metric` should be a wrapper/interface that only runs inference and calls `autoware_perception_evaluation`. In that case, we should rename it to `T4Evaluator` (see the sketch after this list).
- A few changes are needed in `autoware_perception_evaluation` before starting to work on `T4Metric`:
  - Make `CriticalObjectFilterConfig` and `PerceptionPassFailConfig` optional, since they might not be used in ml experiments in the beginning
  - Load `FrameGroundTruth` and sensor data without providing `dataset_paths`
  - Construct `FrameGroundTruth` and `DynamicObject` directly from our predictions and ground truths
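As a rough sketch of the proposed `T4Evaluator` wrapper: it runs inference and delegates all metric computation to `autoware_perception_evaluation`. Everything here except `add_frame_result` and `get_scene_result` (the calls this doc proposes to use) is a placeholder, and the argument names passed to those calls are assumptions:

```python
from typing import Any

class T4Evaluator:
    """Thin wrapper: run inference, then hand every frame over to
    autoware_perception_evaluation; no metric logic lives here."""

    def __init__(self, model: Any, evaluation_manager: Any) -> None:
        self.model = model                            # trained detection model
        self.evaluation_manager = evaluation_manager  # autoware_perception_evaluation manager

    def process_frame(self, frame: Any) -> None:
        # Inference only; metric bookkeeping is delegated entirely.
        predictions = self.model.predict(frame.sensor_data)
        self.evaluation_manager.add_frame_result(
            frame.unix_time,     # assumed frame timestamp field
            frame.ground_truth,  # FrameGroundTruth for this frame
            predictions,         # estimated objects (list of DynamicObject)
        )

    def summarize(self) -> Any:
        # Aggregation also happens inside autoware_perception_evaluation.
        return self.evaluation_manager.get_scene_result()
```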
Plan of PRs:

1. Extend `autoware_perception_evaluation`; this includes NDS and calibration of confidence thresholds (2 days)
2. Make `filter` optional (0.5 day)
3. Load `FrameGroundTruth` and sensor data without providing `dataset_paths` (0.5 day)
4. Introduce `T4Frame` and refactor inference to save predictions/GTs at every step, and also save the intermediate results (`results.pickle`) for all scenes (1 day)
5. Configure `autoware_perception_evaluation` through experiment configs, and process `T4Frame` with `autoware_perception_evaluation.add_frame_result` and `autoware_perception_evaluation.get_scene_result` (2 days); a sketch follows below

ETA: 8-9 days
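To illustrate how plan items 2 and 5 could fit together, here is a minimal sketch. `ExperimentConfig`, `evaluate_scene`, and the `T4Frame` attribute names are hypothetical, and the exact signatures of `add_frame_result`/`get_scene_result` are assumptions:

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class ExperimentConfig:
    # Plan item 2: the filter and pass/fail configs become optional so that
    # early ml experiments can run without them.
    critical_object_filter_config: Optional[Any] = None
    perception_pass_fail_config: Optional[Any] = None

def evaluate_scene(evaluation_manager: Any, frames: List[Any]) -> Any:
    """Plan item 5: feed T4Frame objects (restored from results.pickle)
    to autoware_perception_evaluation frame by frame, then aggregate."""
    for frame in frames:
        evaluation_manager.add_frame_result(
            frame.unix_time,     # assumed timestamp field on T4Frame
            frame.ground_truth,  # FrameGroundTruth
            frame.predictions,   # list of DynamicObject
        )
    return evaluation_manager.get_scene_result()
```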
To prevent ongoing PRs from significantly impacting running experiments, we will make the changes to `autoware_perception_evaluation` and `autoware_ml` in an independent feature branch, and only merge it to main once it has been solidly validated in both regression testing and runtime.