[BUG] BWC Rolling upgrade tests fail for SparseEncoder Processor during Batch Ingestion #1142
Comments
Is it a flaky test?
It's a serialization/deserialization issue from OpenSearch core; all query builders fail to serialize/deserialize between nodes. To reproduce:
Here we can use any query (if left empty, it defaults to match_all) and send the request to any node. We get a shard failure with a serialization/deserialization error in the response.
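For illustration, a minimal sketch of that reproduction step, assuming a local mixed-version cluster reachable on localhost:9200 and a hypothetical index name `my-index`:

```bash
# Send an (effectively match_all) search to any node of the mixed-version cluster.
# Host, port, and index name are placeholders for the local test setup.
curl -s -X GET "http://localhost:9200/my-index/_search" \
  -H 'Content-Type: application/json' \
  -d '{ "query": { "match_all": {} } }'
# In the failing state, the response reports shard failures with a
# serialization/deserialization error instead of normal hits.
```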
Hey @zhichao-aws, it can be due to the recent breaking changes in 3.0. Can you try the same with 2.18 and 2.19?
@martin-gaievski You did some deep-dive on the breaking changes coming from 3.0; can you share some insights here, as we do not see this issue in BWC tests running on 2.x?
I checked the Lucene 10 changes and drafted PR #1141 to address them. If this were related to Lucene 10, the code wouldn't even compile, since there are some incompatible API changes. At first glance this looks related to the internal logic of the SparseEncoder, or maybe some resource that we set up in this particular test.
Do you mean the BWC test between 2.18 and 2.19, or the BWC test between main and 2.18?
It's not due to the Lucene 10 changes. It's caused by a serialization/deserialization error from core: opensearch-project/OpenSearch#17125
Based on that comment, the issue should be fixed after we bump to 3.0.0-alpha1.
I've drafted PR #1141 for 3.0-alpha; we can monitor progress over there. I see BWC tests are failing for now, so it doesn't look like simply adopting the latest core takes care of it. Maybe some additional steps are needed on the neural-search side.
Based on the error log, it seems to fail for a different reason (ML model), while the previous failure is related to query serialization/deserialization between nodes.
The bug is fixed after switching to 3.0.0-alpha1.
What is the bug?
The BatchIngestionIT.testBatchIngestion_SparseEncodingProcessor_E2EFlow test is failing with the following error.
How can one reproduce the bug?
Run the test locally, or raise a PR on neural-search to see it fail in the GitHub CI check.
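For reference, a hedged sketch of running the rolling-upgrade BWC suite locally; the Gradle task name below is an assumption based on the typical OpenSearch plugin qa/rolling-upgrade layout, so check the repository's DEVELOPER_GUIDE and qa build scripts for the exact invocation:

```bash
# Hedged sketch -- the task name is assumed, not verified against this repo's
# build scripts; consult qa/rolling-upgrade/build.gradle or DEVELOPER_GUIDE.md.
./gradlew -p qa/rolling-upgrade bwcTestSuite
```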
What is the expected behavior?
Test should pass successfully.
Do you have any additional context?
https://github.com/opensearch-project/neural-search/actions/runs/12942873461/job/36138496383?pr=1140