Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Add support for hierarchical and auto-generated keys for log filtering. #62

Merged
merged 25 commits into from
Feb 28, 2025

Conversation

LinZhihao-723
Copy link
Member

@LinZhihao-723 LinZhihao-723 commented Feb 22, 2025

Description

This PR adds support for decoding timestamps and log levels from hierarchical keys. A hierarchical key contains a full branch from the root to a leaf node in a schema tree. It also allows users to specify whether the key belongs to the auto-generated schema tree or the user-generated schema tree.
To properly define the behavior, we add some constraints compared to the previous solution that only supports root-level surface keys:

  • The timestamp must be an integer-type node.
  • The log level must be a string-type node.

Since the leaf node is limited to a certain type, any root-to-leaf path that contains only string keys will be uniquely identified in a schema tree. Therefore, this PR requires a js-level input to be:

    emscripten::register_type<clp_ffi_js::ir::ReaderOptions>(
            "{logLevelKey: {isAutoGenerated: boolean; parts: string[];} | null;"
            " timestampKey: {isAutoGenerated: boolean; parts: string[];} | null;}"
    );

Notice that we also allow null as a reader option to indicate its absence.

When deserializing an IR stream, the IR unit handler will set the timestamp/log-level node IDs accordingly using the schema tree node insertion handler.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

const timestamp = Date.now();
import ModuleInit from "./cmake-build-release/ClpFfiJs-node.js?${timestamp}"
import fs from "node:fs"

const main = async () => {
    // const file = fs.readFileSync("./test-irv2.clp.zst")
    const file = fs.readFileSync("./test.clp.zst")

    console.time("perf")
    const Module = await ModuleInit()
    console.log(Module.MERGED_KV_PAIRS_AUTO_GENERATED_KEY);
    console.log(Module.MERGED_KV_PAIRS_USER_GENERATED_KEY);
    try {
        // const decoder = new Module.ClpStreamReader(new Uint8Array(file), {timestampKey: "@timestamp"})
        const decoder = new Module.ClpStreamReader(
                new Uint8Array(file),
                {timestampKey: {isAutoGenerated: true, parts: ["timestamp", "unix_millisecs"]},
                // {timestampKey: null,
                    logLevelKey: {parts: ["level", "name"], isAutoGenerated: true}})
        console.log("type:", decoder.getIrStreamType() === Module.IrStreamType.STRUCTURED,
            decoder.getIrStreamType() === Module.IrStreamType.UNSTRUCTURED)
        const numEvents = decoder.deserializeStream()
        console.log(numEvents)
        const results = decoder.decodeRange(0, numEvents, false)
        console.log(results)
    } catch (e) {
        console.error("Exception caught:", e.stack)
    }
    console.timeEnd("perf")
}

void main()

Summary by CodeRabbit

  • New Features

    • Enhanced logging configuration with advanced options for log levels and timestamps.
    • Introduced dynamic schema filtering to enable more flexible log event processing.
    • Added a new method for constructing schema tree branches based on filter options.
  • Refactor

    • Improved error handling and log parsing for more reliable event management.
    • Encapsulated branch-related logic within a new class for better structure and functionality.
  • Chore

    • Updated project dependencies to include the latest improvements.

Copy link

coderabbitai bot commented Feb 22, 2025

Warning

Rate limit exceeded

@LinZhihao-723 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 1 minutes and 40 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 03bd5b9 and 973966d.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (5 hunks)

Walkthrough

This pull request updates several C++ files and a submodule. The changes revise the type registration for ReaderOptions by replacing a simple string union with a structured object containing detailed properties. New methods are introduced for schema branch extraction and enhanced log level parsing with improved error handling using std::optional. Method signatures, internal variables, and class constructs—such as the new SchemaTreeFullBranch—have been updated for clarity and maintainability. Additionally, the submodule commit reference has been updated.

Changes

Files Change Summary
src/clp_ffi_js/ir/StreamReader.cpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
Updated ReaderOptions registration to use structured objects; added get_schema_tree_full_branch_from_filter_option method; applied [[nodiscard]] attribute to dump_json_with_replace; and introduced new constants for key handling.
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
Enhanced log level parsing by switching to std::optional and adding parse_log_level_from_value; introduced a new nested class SchemaTreeFullBranch; modified handling of log events and schema tree node insertion; and revised member variables and method signatures accordingly.
src/submodules/…/clp Updated subproject commit reference from 2aa5c5c713b15f7b7dd911a746e386b306a95242 to 5dc26c2c4a00b3ededc247a23a2c83e2a5473894.

Sequence Diagram(s)

sequenceDiagram
    participant JS as JavaScript Filter Option
    participant SIR as StructuredIrStreamReader
    participant Branch as SchemaTreeFullBranch
    JS->>SIR: Call get_schema_tree_full_branch_from_filter_option(filter_option)
    alt filter_option is null
        SIR-->>JS: Return std::nullopt
    else
        SIR->>SIR: Extract isAutoGenerated and parts array
        SIR-->>JS: Return SchemaTreeFullBranch object
    end
Loading
sequenceDiagram
    participant Event as Log Event
    participant SIU as StructuredIrUnitHandler
    Event->>SIU: Trigger handle_log_event
    SIU->>SIU: Invoke parse_log_level(log_event.string_value)
    SIU->>SIU: Retrieve log level and timestamp from log_event
    SIU-->>Event: Process event with parsed log level and timestamp
Loading

Possibly related PRs

Suggested reviewers

  • junhaoliao
  • davemarco

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@@ -118,7 +118,7 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null");
emscripten::register_type<clp_ffi_js::ir::ReaderOptions>(
"{logLevelKey: string, timestampKey: string} | null"
"{logLevelKey: [bool, Array<string>], timestampKey: [bool, Array<string>]} | null"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any documentation directly related to this JS type. If there does exist such doc, we need to update them accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code has not been reviewed but will look something like this (subtype here)

Copy link
Contributor

@davemarco davemarco Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
logLevelKey:{hasAutoPrefix: boolean; splitKey: string[]},
timestampKey:{hasAutoPrefix: boolean; splitKey: string[]}
} however, may change pending review...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to have an agreement on these namings first.

  • For hasAutoPrefix, I'd prefer to explicitly call it isAutoGenerated
  • For splitKey, I don't have any comments

@kirkrodrigues @junhaoliao what do u think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Yes to isAutoGenerated
  • At first glance, splitKey sounds like some string we use to split another string up. How about parts (akin to pathlib's path parts)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I would still rather that StructuredIrReaderOptions used isAutoGenerated to keep the IR interface logical. I think ParsedKey can still use hasAutoPrefix. So in effect, I'm saying:

  • we should add a new type for StructuredIrReaderOptions to use that will use the name isAutoGenerated.
  • we should translate from hasAutoPrefix to isAutoGenerated when constructing StructuredIrReaderOptions.

Is that possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still rather that StructuredIrReaderOptions used isAutoGenerated to keep the IR interface logical.

I agree. Using hasAutoPrefix together with parts could be confusing since the path (the first key in the parts) should already have the auto-prefix stripped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay isAutoGenerated is good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LinZhihao-723, you might need to update this now, but maybe the array indexes will still work. idk

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Pushed in the latest commit. Notice that I also updated the code to support null input, so that the reader options are not all required.

return timestamp;

auto const timestamp_node_id = m_optional_timestamp_node_id.value();
if (false == id_value_pairs.contains(timestamp_node_id)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

auto const& optional_log_level_value{id_value_pairs.at(m_log_level_node_id.value())};

auto const log_level_node_id = m_optional_log_level_node_id.value();
if (false == id_value_pairs.contains(log_level_node_id)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check was missing before this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good catch

Comment on lines +170 to +171
std::optional<clp::ffi::SchemaTree::Node::id_t> m_optional_log_level_node_id;
std::optional<clp::ffi::SchemaTree::Node::id_t> m_optional_timestamp_node_id;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped the type alias and used std::optional explicitly to follow our C++ coding convention.

Copy link
Member Author

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I added some log statements for "Protocol Error". Technically these errors should be non-recoverable. Do we have sth similar as exceptions in js ffi? @junhaoliao

Copy link
Contributor

@davemarco davemarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some high level comments first to pipeline review. Still looking at actual code

@@ -118,7 +118,7 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null");
emscripten::register_type<clp_ffi_js::ir::ReaderOptions>(
"{logLevelKey: string, timestampKey: string} | null"
"{logLevelKey: [bool, Array<string>], timestampKey: [bool, Array<string>]} | null"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code has not been reviewed but will look something like this (subtype here)

} // namespace

auto StructuredIrUnitHandler::SchemaTreeFullBranch::match(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function had a bug and now it's been fixed.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (4)

42-51: Consider improving parameter naming.

The constructor parameter is_auto_gen is inconsistent with the member variable name m_is_auto_generated. Consider using identical naming for better readability.

Also, the member variable is named m_leaf_to_root_path but the parameter is root_to_leaf_path, which might be confusing. The direction is reversed during initialization, but the naming should better reflect this transformation.

 SchemaTreeFullBranch(
-        bool is_auto_gen,
+        bool is_auto_generated,
         std::vector<std::string> root_to_leaf_path,
         clp::ffi::SchemaTree::Node::Type leaf_type
 )
         : m_is_auto_generated{is_auto_gen},
           m_leaf_to_root_path{std::move(root_to_leaf_path)},
           m_leaf_type{leaf_type} {
     std::ranges::reverse(m_leaf_to_root_path);
 }

67-67: Use explicit return.

For consistent style and improved readability, consider using an explicit return statement rather than a single-line return for the is_auto_generated() method.

-[[nodiscard]] auto is_auto_generated() const -> bool { return m_is_auto_generated; }
+[[nodiscard]] auto is_auto_generated() const -> bool {
+    return m_is_auto_generated;
+}

89-97: Update documentation to match parameter changes.

The documentation for the constructor (lines 89-90) still references the old parameter names log_level_key and timestamp_key, which should be updated to reflect the new parameters log_level_full_branch and timestamp_full_branch.

 /**
  * @param deserialized_log_events The vector in which to store deserialized log events.
- * @param log_level_key Key name of schema-tree node that contains the authoritative log level.
- * @param timestamp_key Key name of schema-tree node that contains the authoritative timestamp.
+ * @param log_level_full_branch Branch path to schema-tree node that contains the authoritative log level.
+ * @param timestamp_full_branch Branch path to schema-tree node that contains the authoritative timestamp.
  */

122-134: Documentation not updated for new parameter.

The documentation block for handle_schema_tree_node_insertion method has a new parameter schema_tree on line 127, but lacks a description of this parameter in the documentation.

 /**
  * Saves the node's ID if it corresponds to events' authoritative log level or timestamp
  * kv-pair.
  * @param is_auto_generated
  * @param schema_tree_node_locator
+ * @param schema_tree The schema tree to match against
  * @return IRErrorCode::IRErrorCode_Success
  */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d5395e and 62d83bd.

📒 Files selected for processing (2)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (6 hunks)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (4)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (4)

4-4: Added include for algorithm header.

The include for <algorithm> is properly added to support the use of std::ranges::reverse in the SchemaTreeFullBranch constructor.


28-84: Well-structured new class implementation.

The SchemaTreeFullBranch class is well-designed with clear documentation, proper move semantics, and deleted copy operations. The implementation correctly encapsulates a branch path from root to leaf in a schema tree.

A few observations:

  • Constructor reverses the path for internal storage, which seems intentional for efficient tree traversal
  • Good use of [[nodiscard]] attribute for methods that return values
  • The declaration of match() method has its implementation elsewhere, following good separation practices

145-163: Documentation updated to reflect new optional member variables.

The documentation for both get_log_level and get_timestamp methods has been properly updated to reference the new optional member variables. This improves clarity on when default values are returned.


168-172: Renamed member variables for improved clarity.

The member variables have been renamed with the m_optional_ prefix to better reflect their purpose and type. This is a good practice that makes the code more self-documenting.

According to the past review comment on line 171-172, using std::optional explicitly follows the C++ coding convention for this project, which is good.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (3)

47-49: Potential confusion between parameter and member variable naming

The parameter name root_to_leaf_path suggests one direction (root→leaf), but it's stored in a member variable named m_leaf_to_root_path and then reversed. This could be confusing to future readers of the code.

Consider either:

  1. Renaming the parameter to match the storage direction (e.g., leaf_to_root_path and skip the reversal), or
  2. Renaming the member variable to match the parameter (e.g., m_root_to_leaf_path and keep the reversal)

52-55: Unnecessary defaulted move operations

The defaulted move constructor and assignment operator are not necessary since the class doesn't have any special resource management beyond what's handled by its member variables' own move operations.

These declarations can be safely removed as the compiler will implicitly generate them with identical behavior:

-        // Default move constructor and assignment operator
-        SchemaTreeFullBranch(SchemaTreeFullBranch&&) = default;
-        auto operator=(SchemaTreeFullBranch&&) -> SchemaTreeFullBranch& = default;

159-160: Inconsistent member variable name in documentation

The comment refers to m_optional_timestamp_id but the actual member variable is named m_optional_timestamp_node_id.

-     * - `m_optional_timestamp_id` is unset.
-     * - `m_optional_timestamp_id` is set but not appearing in the given node-id-value pairs.
+     * - `m_optional_timestamp_node_id` is unset.
+     * - `m_optional_timestamp_node_id` is set but not appearing in the given node-id-value pairs.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62d83bd and 4555da7.

📒 Files selected for processing (2)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (5 hunks)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (3)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (3)

28-83: Well-structured new class for schema tree branch representation

The new SchemaTreeFullBranch class provides a clean way to represent branches from root to leaf node in a schema tree. This improves the handling of hierarchical log level and timestamp keys.


132-132: Good parameter choice for schema_tree

Passing the schema tree as a const reference to a shared pointer is efficient and appropriate for this use case.


167-171: Improved structure for storing branch information

Replacing simple string keys with full branch objects allows for more flexible and structured handling of hierarchical paths in the schema tree.

Copy link
Contributor

@davemarco davemarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through again and added some smaller nits

@@ -118,7 +118,7 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null");
emscripten::register_type<clp_ffi_js::ir::ReaderOptions>(
"{logLevelKey: string, timestampKey: string} | null"
"{logLevelKey: [bool, Array<string>], timestampKey: [bool, Array<string>]} | null"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LinZhihao-723, you might need to update this now, but maybe the array indexes will still work. idk

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (1)

87-90: 🛠️ Refactor suggestion

Update parameter documentation to match new parameter types.

The documentation still refers to the old parameter names (log_level_key and timestamp_key) which were strings. Update it to reflect the new parameter types (SchemaTreeFullBranch).

-     * @param log_level_key Key name of schema-tree node that contains the authoritative log level.
-     * @param timestamp_key Key name of schema-tree node that contains the authoritative timestamp.
+     * @param log_level_full_branch Branch in the schema tree that contains the authoritative log level.
+     * @param timestamp_full_branch Branch in the schema tree that contains the authoritative timestamp.
🧹 Nitpick comments (3)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (3)

37-37: Parameter naming should be consistent.

The parameter name is_auto_gen is inconsistent with the member variable m_is_auto_generated and the method name is_auto_generated(). Consider using is_auto_generated for consistency with the rest of the codebase.

-         * @param is_auto_gen
+         * @param is_auto_generated

66-66: Use explicit boolean comparison per coding guidelines.

According to the coding guidelines, we should prefer false == <expression> or true == <expression> rather than directly returning the expression.

-        [[nodiscard]] auto is_auto_generated() const -> bool { return m_is_auto_generated; }
+        [[nodiscard]] auto is_auto_generated() const -> bool { return true == m_is_auto_generated; }

47-48: Variable naming inconsistency between parameter and member.

The constructor parameter is named root_to_leaf_path but the member variable is named m_leaf_to_root_path which might be confusing when reading the code. Consider using consistent naming or adding a comment explaining why the path is reversed and stored with a different name.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4555da7 and d5de853.

📒 Files selected for processing (1)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (3)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (3)

168-169: Good use of more descriptive variable names.

Renaming these variables to include optional_ in their names provides better clarity about their nature. This makes the code more maintainable by explicitly indicating that these variables might not have a value.


27-83: Well-structured new class implementation.

The SchemaTreeFullBranch class is a well-designed addition that encapsulates the concept of a branch in a schema tree. The class properly implements move semantics while preventing copies, and provides a clear interface for checking if a branch is auto-generated and matching against a schema tree.


126-133: Method parameter addition improves functionality.

Adding the schema_tree parameter to the handle_schema_tree_node_insertion method enables the functionality to match schema tree branches, which is essential for supporting the hierarchical keys feature described in the PR objectives.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)

48-57: Potential documentation mismatch.

The comment references get_schema_tree_full_path_from_js_reader_option, yet the actual function name is get_schema_tree_full_path_from_reader_option. To avoid confusion, update the doc comment to match the function name.

- * @return A schema tree full branch constructed based on the provided reader option.
- * @return std::nullopt if `option` is `null`.
- */
-[[nodiscard]] auto get_schema_tree_full_path_from_js_reader_option(
+ * @return A schema tree full branch constructed based on the provided reader option.
+ * @return std::nullopt if `option` is `null`.
+ */
+[[nodiscard]] auto get_schema_tree_full_path_from_reader_option(
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (1)

27-83: Well-structured branch representation class.

SchemaTreeFullBranch neatly encapsulates a branch from the schema tree, reversing the path to store leaf-to-root. Document the rationale for reversing the vector to prevent confusion. The match method signature looks coherent, though ensure thorough testing for partial matches and unexpected input.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5de853 and 2093972.

📒 Files selected for processing (5)
  • src/clp_ffi_js/ir/StreamReader.cpp (1 hunks)
  • src/clp_ffi_js/ir/StreamReader.hpp (1 hunks)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (5 hunks)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (5 hunks)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/clp_ffi_js/ir/StreamReader.cpp
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/clp_ffi_js/ir/StreamReader.hpp
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (12)
src/clp_ffi_js/ir/StreamReader.hpp (1)

28-28: No concerns about new type declaration.

The addition of ReaderOption appears consistent and aligns with your existing types (DataArrayTsType, LogLevelFilterTsType). If there are no tests for the new type, consider adding a simple unit test to verify its usage and ensure future compatibility.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (7)

6-15: Straightforward new includes.

Including <optional> and <clp/ffi/SchemaTree.hpp> is appropriate for the optional logic and schema-related functionality introduced. No issues observed.


32-33: Use of descriptive constant names.

Defining cReaderOptionIsAutoGeneratedKey and cReaderOptionPartsKey aligns with existing naming conventions. This helps ensure clarity when accessing the JS object properties.


46-47: Appropriate use of [[nodiscard]].

Marking dump_json_with_replace as [[nodiscard]] is a good practice to avoid inadvertently ignoring the returned string. No issues with this change.


63-75: Validate array content.

This function correctly handles null checks and extracts data from option. Consider additional validation for the parts array (e.g. ensuring the array is non-empty for a valid path) to avoid unexpected states in downstream logic.


99-106: Consistent usage of reader options.

Replacing direct string usage with calls to get_schema_tree_full_path_from_reader_option for both the log level and timestamp keys aligns with the PR objective. Good to see that log levels default to string type and timestamps default to integer type.


135-136: Consistent with coding guidelines.

Using if (false == ...) instead of if (!...) adheres to the project’s guidelines. The behaviour for returning null() when no filter map exists is clear.


160-161: Loop condition looks appropriate.

Looping until deserializer.is_stream_completed() is false is consistent. Just confirm that there are no edge cases that might lead to an infinite loop if the condition never evaluates to true.

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (4)

4-4: New include is acceptable.

The <algorithm> header is presumably for std::ranges::reverse. This is fine.


94-99: Optional schema tree branches well-integrated.

Switching to std::optional<SchemaTreeFullBranch> for both log level and timestamp clarifies that these branches are optional. This is a clean approach that preserves the original constructor’s intent.


151-162: Documentation is consistent and clear.

The docstrings for get_log_level and get_timestamp correctly outline fallback behaviour. The usage of LogLevel::None for unexpected values is a commendable solution. Confirm that all out-of-range scenarios are handled.
[approve]


165-169: Optional node IDs.

Storing the node IDs as std::optional matches your approach for the full branch. This ensures you skip logic when the node IDs are unavailable.

@junhaoliao
Copy link
Member

The timestamp part (for cache busting?) may not have any effects as the timestamp value is not populated in the regular string.

const timestamp = Date.now();
import ModuleInit from "./cmake-build-release/ClpFfiJs-node.js?${timestamp}"

if we really want to make it work, we can write

const ModuleInit = (await import(`./cmake-build-debug/ClpFfiJs-node.js?${timestamp}`)).default;

@junhaoliao
Copy link
Member

@LinZhihao-723 could you also share the IR file used in the validation as an attachment in the PR description?

@LinZhihao-723
Copy link
Member Author

@LinZhihao-723 could you also share the IR file used in the validation as an attachment in the PR description?

It's the same file in #60

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm

at a high level: i understand that the current APIs don't provide a convenient way to report errors if the user specified log-level / timestamp key is not found in the schema tree. however, if after going through all nodes we still can't find the path specified by the user, can we log a message to help with debugging? (instead of silently treating all log events's log level as UNKNOWN for example)

@LinZhihao-723
Copy link
Member Author

Resolved by applying what @junhaoliao suggested

davemarco
davemarco previously approved these changes Feb 28, 2025
Copy link
Contributor

@davemarco davemarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small final changes

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)

67-69: Use preferred negation style.

The coding guidelines specify preferring false == <expression> over !<expression>.

-    if (filter_option.isNull() || filter_option.isUndefined()) {
+    if (false == !filter_option.isNull() || false == !filter_option.isUndefined()) {
         return std::nullopt;
     }
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (2)

38-40: Add more detailed parameter documentation.

Consider expanding the documentation for the constructor parameters to clarify their purpose.

        /**
         * @param is_auto_gen
-         * @param root_to_leaf_path
-         * @param leaf_type
+         * @param root_to_leaf_path Array of keys forming the path from root to leaf
+         * @param leaf_type Type of the leaf node in the schema tree
         */

47-49: Document the path reversal operation.

The constructor takes a root-to-leaf path but stores it as a leaf-to-root path. This transformation should be clearly documented.

                : m_is_auto_generated{is_auto_gen},
                  m_leaf_to_root_path{std::move(root_to_leaf_path)},
                  m_leaf_type{leaf_type} {
+            // Reverse the path to store it from leaf to root for easier traversal
             std::ranges::reverse(m_leaf_to_root_path);
         }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f46af70 and c364896.

📒 Files selected for processing (2)
  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (5 hunks)
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
  • src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: lint (ubuntu-latest)
  • GitHub Check: lint (macos-latest)
🔇 Additional comments (10)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (6)

32-33: String view constants should follow naming convention.

Consider using the constexpr std::string_view cFilterOptionAutoGeneratedKey{"isAutoGenerated"}; naming style to be more consistent with other constants in the file.

-constexpr std::string_view cFilterOptionIsAutoGeneratedKey{"isAutoGenerated"};
-constexpr std::string_view cFilterOptionPartsKey{"parts"};
+constexpr std::string_view cFilterOptionAutoGeneratedKey{"isAutoGenerated"};
+constexpr std::string_view cFilterOptionPartsKey{"parts"};

48-53: Improve documentation comments for get_schema_tree_full_branch_from_filter_option.

The documentation should clearly specify the leaf node type parameter and match the actual function signature.

 /**
  * @param filter_option The JavaScript object representing a filter option.
  * @param leaf_node_type The type of the leaf node in the constructed schema tree branch.
- * @return A schema tree full branch constructed based on the provided reader option.
+ * @return A schema tree full branch constructed based on the provided filter option and leaf node type.
  * @return std::nullopt if `option` is `null`.
  */

54-57: Function declaration could be formatted more consistently.

The closing parenthesis should be consistent with the opening parenthesis formatting style.

 [[nodiscard]] auto get_schema_tree_full_branch_from_filter_option(
         emscripten::val const& filter_option,
-        clp::ffi::SchemaTree::Node::Type leaf_node_type
-) -> std::optional<StructuredIrUnitHandler::SchemaTreeFullBranch>;
+        clp::ffi::SchemaTree::Node::Type leaf_node_type) 
+        -> std::optional<StructuredIrUnitHandler::SchemaTreeFullBranch>;

67-69: Check for both null and undefined for safety.

This is good defensive programming to check for both null and undefined values.


70-74: Using key constants correctly in the function implementation.

Make sure to use the constant variables defined above rather than hardcoding string literals.

     return StructuredIrUnitHandler::SchemaTreeFullBranch{
             filter_option[cFilterOptionIsAutoGeneratedKey.data()].as<bool>(),
             emscripten::vecFromJSArray<std::string>(filter_option[cFilterOptionPartsKey.data()]),
             leaf_node_type
     };

99-106: Function calls to get schema tree branches are well-structured.

The code correctly sets the appropriate node types for log level (string) and timestamp (integer) as required by the design.

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (4)

94-96: Constructor parameters renamed correctly.

The parameters are appropriately renamed from string keys to schema branch objects, matching the updated implementation.


130-133: Added schema_tree parameter to handle_schema_tree_node_insertion.

This is a good improvement that provides the function with the necessary context to properly match full branches.


165-169: Renamed member variables for clarity.

The member variables have been renamed with the 'optional' prefix to better indicate their nature.


74-77:

❓ Verification inconclusive

Verify match method declaration.

The match method is declared but its implementation isn't in the header file. Ensure it's properly implemented in the corresponding .cpp file.


🏁 Script executed:

#!/bin/bash
# Check if the match method is implemented in the cpp file
grep -A 15 "auto SchemaTreeFullBranch::match" src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

Length of output: 93


Manual Verification Required: Confirm Implementation of the match Method

The current code in src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp declares the match method, yet our grep search in src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp using "auto SchemaTreeFullBranch::match" returned no results. This might indicate that the method is either not implemented or is defined under a different class name (e.g. StructuredIrUnitHandler::match). Please verify manually that:

  • The implementation of match exists in src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp.
  • The implementation’s class name matches the header declaration (i.e. use StructuredIrUnitHandler if that is the intended class).

@davemarco
Copy link
Contributor

davemarco commented Feb 28, 2025

Also rementioning title:
feat: Add support for hierarchical and auto-generated keys for log filtering.

@LinZhihao-723 LinZhihao-723 changed the title feat: Add support for decoding user-defined timestamps and log levels from hierarchical keys. feat!: Add support for hierarchical and auto-generated keys for log filtering. Feb 28, 2025
@LinZhihao-723
Copy link
Member Author

Changed the title with ! to indicate that this is a breaking change (since it changes the js level API)
Also I will need a re-approval since my last push invalidated ur approval. @davemarco

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>
davemarco
davemarco previously approved these changes Feb 28, 2025
Copy link
Contributor

@davemarco davemarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

building on yesterday's suggestion, I left a high-level comment. the rest lgtm

@@ -67,8 +98,14 @@ auto StructuredIrStreamReader::create(
*zstd_decompressor,
StructuredIrUnitHandler{
deserialized_log_events,
reader_options[cReaderOptionsLogLevelKey.data()].as<std::string>(),
reader_options[cReaderOptionsTimestampKey.data()].as<std::string>()
get_schema_tree_full_branch_from_filter_option(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ease of debugging -

can we print a log in deserialize_stream() after all IR units are decoded: if the m_optional_log_level_full_branch has value but m_optional_log_level_node_id is not populated, warn the user that the node cannot be found. Similarly for the timestamp we should also warn the user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@LinZhihao-723 LinZhihao-723 merged commit 737c9d5 into y-scope:main Feb 28, 2025
2 checks passed
@LinZhihao-723 LinZhihao-723 deleted the key-search branch February 28, 2025 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants