Extract data models from RCA into separate Repo #307
Comments
Thanks for highlighting this, @khushbr. Is this refactoring primarily meant to address the circular dependency, or does it have other benefits as well? The shared library common-utils is currently overloaded, and we have ongoing discussions about deprecating it. Adding more dependencies on it could impact such decisions.
Hey @getsaurabh02, thank you for your comment. We will not be using common-utils, but will rather have a separate library (GitHub repository) with the PA-RCA data models, named [...]. The aim of this exercise is to remove the code-level dependency PA currently has on the RCA repository. The benefits further extend to making the GitHub build and testing workflows cleaner, and to avoiding the packaging of RCA inside PA that we currently do for the Maven releases.
We discussed this and came to the conclusion that the async path and system stats metric collection will continue to be part of the current setup (PA-RCA), while the sync path metrics, a limited set (Latency, CPU, JVM), will be part of the upcoming Tracing Framework.
Is there any way we can have the shared models moved to core and avoid creating a new package?
For choosing this new package instead of [...]
If we currently do not want PA/RCA to be first-class citizens in OpenSearch core, we should not move these data models into core. I think the only thing to be careful about when creating a new package is the extra overhead of managing a new package for each OpenSearch release. If we can make this package as version-agnostic as possible, that would be ideal. It could potentially have a separate versioning scheme from OpenSearch, and we should make sure it has no dependency on OpenSearch core.
Leaving some additional context and findings regarding the logical organization of what is currently the PA and PA-RCA repos, along with suggestions for improving it during the separation process. Parts of our build, bwc, integ testing and local deployment workflows, scripts and practices are outdated (compared to other OS plugin repos), far too complicated and time-consuming, and should be reworked alongside the repo separation.
The other aspect I want to talk about is how these two should be separated. I'm going to provide some additional context and present the architecture in more detail. Please raise concerns if any part seems incorrect.
Here is the extended picture of the explained architecture: In order for [...] The way the code is currently split (if we ignore the cyclic dependency part) is that the code that runs inside the plugin is in the [...]
@Tjofil Thank you for the detailed write-up. I agree with all of it except the last point, the PA [...]
@khushbr You're completely right. The way the code is split in terms of JVM it runs in ([...]
Linking the older issue on the PA side: opensearch-project/performance-analyzer#50
New repo created: https://github.com/opensearch-project/performance-analyzer-commons
Release done for PA-Commons 1.0.0: https://github.com/opensearch-project/performance-analyzer-commons/releases/tag/1.0.0
Background Context
The Performance Analyzer (Writer) plugin captures system behavior at a fine-grained (5s) level in the form of metrics and stats, which are periodically flushed to a shared memory location. The RCA Agent (Reader) component scans this directory for updates, parses the raw data into SQLite entries and populates the on-disk SQLite metricDB.
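The writer/reader hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the actual PA or RCA code: the class name `SnapshotExchange`, the `.metrics` file suffix and the line format are all assumptions made for the example, and an in-memory list stands in for the SQLite metricDB.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: all names here are illustrative, not taken from PA/RCA.
final class SnapshotExchange {
    private static final String SUFFIX = ".metrics";

    private SnapshotExchange() {}

    // Writer side: flush one collection window to the shared directory.
    // The data is written to a temp file first and then renamed, so the
    // reader never picks up a half-written snapshot.
    static Path flush(Path dir, long windowStartMillis, List<String> metricLines)
            throws IOException {
        Path tmp = dir.resolve(windowStartMillis + ".tmp");
        Path target = dir.resolve(windowStartMillis + SUFFIX);
        Files.write(tmp, metricLines, StandardCharsets.UTF_8);
        return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Reader side: scan the directory and return all rows, oldest window first.
    // A real reader would insert these rows into SQLite instead of a list.
    static List<String> scan(Path dir) throws IOException {
        List<Path> snapshots;
        try (Stream<Path> entries = Files.list(dir)) {
            snapshots = entries
                    .filter(p -> p.getFileName().toString().endsWith(SUFFIX))
                    .sorted(Comparator.comparingLong(SnapshotExchange::windowOf))
                    .collect(Collectors.toList());
        }
        List<String> rows = new ArrayList<>();
        for (Path snapshot : snapshots) {
            rows.addAll(Files.readAllLines(snapshot, StandardCharsets.UTF_8));
        }
        return rows;
    }

    // Extract the numeric window start from a "<millis>.metrics" file name.
    private static long windowOf(Path snapshot) {
        String name = snapshot.getFileName().toString();
        return Long.parseLong(name.substring(0, name.length() - SUFFIX.length()));
    }
}
```

In this sketch the only contract between the two processes is the directory layout and the line format, which is why both sides need to agree on the same data models.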
Problem Statement & Proposed Solution
In the past, a conscious design choice was made to segregate the OpenSearch-independent code into a separate process (RCA). This allowed changes to RCA to be deployed without restarting the critical OpenSearch process. However, the code split wasn't done cleanly. The shared data models of the various metrics are currently part of the RCA codebase, so PA has a code-level dependency on RCA. This proposal addresses the resulting circular dependency between PA and RCA by extracting the common data models from the RCA codebase into a shared library, which can in turn be consumed by the two components.
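To make the proposal concrete, here is a minimal sketch of what one shared data model could look like once it lives in a library that depends on neither PA nor RCA. The class name `MetricSample`, its fields and the `|`-delimited wire format are assumptions made for illustration; they are not the models that were actually extracted.

```java
import java.util.Objects;

// Hypothetical shared model: it would live in the common library and uses
// only the JDK, so it needs no dependency on OpenSearch core, PA, or RCA.
final class MetricSample {
    private final String name;
    private final long timestampMillis;
    private final double value;

    MetricSample(String name, long timestampMillis, double value) {
        this.name = Objects.requireNonNull(name, "name");
        if (name.indexOf('|') >= 0) {
            throw new IllegalArgumentException("name must not contain '|'");
        }
        this.timestampMillis = timestampMillis;
        this.value = value;
    }

    String getName() { return name; }
    long getTimestampMillis() { return timestampMillis; }
    double getValue() { return value; }

    // Writer side (the plugin) would call this before flushing a sample.
    String serialize() {
        return name + "|" + timestampMillis + "|" + value;
    }

    // Reader side (the agent) would call this on each raw line it scans.
    static MetricSample parse(String line) {
        String[] parts = line.split("\\|");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields, got " + parts.length);
        }
        return new MetricSample(
                parts[0], Long.parseLong(parts[1]), Double.parseDouble(parts[2]));
    }
}
```

With a model like this in the shared library, the writer and the reader each depend on the library and no longer on each other, which is the dependency direction the proposal aims for.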
Feedback Requested:
Currently, within the OpenSearch ecosystem, we have no precedent for a communication model between an OpenSearch plugin and an external agent. (Note: We have a shared library, common-utils, for functionality and data models that are reusable between plugins.) We are looking for feedback on how to better organize the code for such use cases.