feat!: Add support for hierarchical and auto-generated keys for log filtering. #62

LinZhihao-723 · 2025-02-22T03:55:50Z

Description

This PR adds support for decoding timestamps and log levels from hierarchical keys. A hierarchical key contains a full branch from the root to a leaf node in a schema tree. It also allows users to specify whether the key belongs to the auto-generated schema tree or the user-generated schema tree.
To properly define the behavior, we add some constraints compared to the previous solution that only supports root-level surface keys:

The timestamp must be an integer-type node.
The log level must be a string-type node.

Since the leaf node is limited to a certain type, any root-to-leaf path that contains only string keys will be uniquely identified in a schema tree. Therefore, this PR requires a js-level input to be:

    emscripten::register_type<clp_ffi_js::ir::ReaderOptions>(
            "{logLevelKey: {isAutoGenerated: boolean; parts: string[];} | null;"
            " timestampKey: {isAutoGenerated: boolean; parts: string[];} | null;}"
    );

Notice that we also allow null as a reader option to indicate its absence.

When deserializing an IR stream, the IR unit handler will set the timestamp/log-level node IDs accordingly using the schema tree node insertion handler.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Ensure all workflows passed.
Using the example script below to validate that the deserialized log events have timestamp/log-level set properly. The associated test file can be found in feat: Add support for deserializing and decoding auto-generated keys in Structured IR streams. #60.

const timestamp = Date.now();
import ModuleInit from "./cmake-build-release/ClpFfiJs-node.js?${timestamp}"
import fs from "node:fs"

const main = async () => {
    // const file = fs.readFileSync("./test-irv2.clp.zst")
    const file = fs.readFileSync("./test.clp.zst")

    console.time("perf")
    const Module = await ModuleInit()
    console.log(Module.MERGED_KV_PAIRS_AUTO_GENERATED_KEY);
    console.log(Module.MERGED_KV_PAIRS_USER_GENERATED_KEY);
    try {
        // const decoder = new Module.ClpStreamReader(new Uint8Array(file), {timestampKey: "@timestamp"})
        const decoder = new Module.ClpStreamReader(
                new Uint8Array(file),
                {timestampKey: {isAutoGenerated: true, parts: ["timestamp", "unix_millisecs"]},
                // {timestampKey: null,
                    logLevelKey: {parts: ["level", "name"], isAutoGenerated: true}})
        console.log("type:", decoder.getIrStreamType() === Module.IrStreamType.STRUCTURED,
            decoder.getIrStreamType() === Module.IrStreamType.UNSTRUCTURED)
        const numEvents = decoder.deserializeStream()
        console.log(numEvents)
        const results = decoder.decodeRange(0, numEvents, false)
        console.log(results)
    } catch (e) {
        console.error("Exception caught:", e.stack)
    }
    console.timeEnd("perf")
}

void main()

Summary by CodeRabbit

New Features
- Enhanced logging configuration with advanced options for log levels and timestamps.
- Introduced dynamic schema filtering to enable more flexible log event processing.
- Added a new method for constructing schema tree branches based on filter options.
Refactor
- Improved error handling and log parsing for more reliable event management.
- Encapsulated branch-related logic within a new class for better structure and functionality.
Chore
- Updated project dependencies to include the latest improvements.

coderabbitai · 2025-02-22T03:55:57Z

Warning

Rate limit exceeded

@LinZhihao-723 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 1 minutes and 40 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 03bd5b9 and 973966d.

📒 Files selected for processing (1)

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (5 hunks)

Walkthrough

This pull request updates several C++ files and a submodule. The changes revise the type registration for ReaderOptions by replacing a simple string union with a structured object containing detailed properties. New methods are introduced for schema branch extraction and enhanced log level parsing with improved error handling using std::optional. Method signatures, internal variables, and class constructs—such as the new SchemaTreeFullBranch—have been updated for clarity and maintainability. Additionally, the submodule commit reference has been updated.

Changes

Files	Change Summary
`src/clp_ffi_js/ir/StreamReader.cpp` `src/clp_ffi_js/ir/StructuredIrStreamReader.cpp`	Updated `ReaderOptions` registration to use structured objects; added `get_schema_tree_full_branch_from_filter_option` method; applied `[[nodiscard]]` attribute to `dump_json_with_replace`; and introduced new constants for key handling.
`src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp` `src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp`	Enhanced log level parsing by switching to `std::optional` and adding `parse_log_level_from_value`; introduced a new nested class `SchemaTreeFullBranch`; modified handling of log events and schema tree node insertion; and revised member variables and method signatures accordingly.
`src/submodules/…/clp`	Updated subproject commit reference from `2aa5c5c713b15f7b7dd911a746e386b306a95242` to `5dc26c2c4a00b3ededc247a23a2c83e2a5473894`.

Sequence Diagram(s)

sequenceDiagram
    participant JS as JavaScript Filter Option
    participant SIR as StructuredIrStreamReader
    participant Branch as SchemaTreeFullBranch
    JS->>SIR: Call get_schema_tree_full_branch_from_filter_option(filter_option)
    alt filter_option is null
        SIR-->>JS: Return std::nullopt
    else
        SIR->>SIR: Extract isAutoGenerated and parts array
        SIR-->>JS: Return SchemaTreeFullBranch object
    end

sequenceDiagram
    participant Event as Log Event
    participant SIU as StructuredIrUnitHandler
    Event->>SIU: Trigger handle_log_event
    SIU->>SIU: Invoke parse_log_level(log_event.string_value)
    SIU->>SIU: Retrieve log level and timestamp from log_event
    SIU-->>Event: Process event with parsed log level and timestamp

Possibly related PRs

Add log-level filtering methods and associated refactoring: #12: The changes in the main PR, which modify the type registration for ReaderOptions, are related to the changes in the retrieved PR that enhance the StreamReader class with new methods for filtering log events.
Refactor StreamReader to modularize decoding logic. #22: The changes in the main PR regarding the type registration for ReaderOptions in StreamReader.cpp are related to the modifications in the StreamReader class in the retrieved PR, which also involves restructuring and enhancing the handling of data within the same file.
Update DecodedResultsTsType to use bigint in Emscripten binding (fixes #33). #34: The changes in the main PR regarding the type registration for ReaderOptions in StreamReader.cpp are related to the modifications in the retrieved PR that also involve updates to the StreamReader.cpp, specifically focusing on type declarations and error handling within the same class.

Suggested reviewers

junhaoliao
davemarco

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

LinZhihao-723 · 2025-02-22T03:57:01Z

src/clp_ffi_js/ir/StreamReader.cpp

@@ -118,7 +118,7 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
    emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
    emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null");
    emscripten::register_type<clp_ffi_js::ir::ReaderOptions>(
-            "{logLevelKey: string, timestampKey: string} | null"
+            "{logLevelKey: [bool, Array<string>], timestampKey: [bool, Array<string>]} | null"


I don't see any documentation directly related to this JS type. If there does exist such doc, we need to update them accordingly.

The code has not been reviewed but will look something like this (subtype here)

{
logLevelKey:{hasAutoPrefix: boolean; splitKey: string[]},
timestampKey:{hasAutoPrefix: boolean; splitKey: string[]}
} however, may change pending review...

We might want to have an agreement on these namings first.

For hasAutoPrefix, I'd prefer to explicitly call it isAutoGenerated

For splitKey, I don't have any comments

@kirkrodrigues @junhaoliao what do u think?

Yes to isAutoGenerated

At first glance, splitKey sounds like some string we use to split another string up. How about parts (akin to pathlib's path parts)?

I see. I would still rather that StructuredIrReaderOptions used isAutoGenerated to keep the IR interface logical. I think ParsedKey can still use hasAutoPrefix. So in effect, I'm saying:

we should add a new type for StructuredIrReaderOptions to use that will use the name isAutoGenerated.

we should translate from hasAutoPrefix to isAutoGenerated when constructing StructuredIrReaderOptions.

Is that possible?

I would still rather that StructuredIrReaderOptions used isAutoGenerated to keep the IR interface logical.

I agree. Using hasAutoPrefix together with parts could be confusing since the path (the first key in the parts) should already have the auto-prefix stripped.

okay isAutoGenerated is good

@LinZhihao-723, you might need to update this now, but maybe the array indexes will still work. idk

Done. Pushed in the latest commit. Notice that I also updated the code to support null input, so that the reader options are not all required.

LinZhihao-723 · 2025-02-22T03:58:08Z

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

-        return timestamp;
+
+    auto const timestamp_node_id = m_optional_timestamp_node_id.value();
+    if (false == id_value_pairs.contains(timestamp_node_id)) {


LinZhihao-723 · 2025-02-22T03:58:23Z

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

-    auto const& optional_log_level_value{id_value_pairs.at(m_log_level_node_id.value())};
+
+    auto const log_level_node_id = m_optional_log_level_node_id.value();
+    if (false == id_value_pairs.contains(log_level_node_id)) {


This check was missing before this PR.

This is good catch

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

LinZhihao-723 · 2025-02-22T04:02:12Z

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp

+    std::optional<clp::ffi::SchemaTree::Node::id_t> m_optional_log_level_node_id;
+    std::optional<clp::ffi::SchemaTree::Node::id_t> m_optional_timestamp_node_id;


Dropped the type alias and used std::optional explicitly to follow our C++ coding convention.

LinZhihao-723

Also I added some log statements for "Protocol Error". Technically these errors should be non-recoverable. Do we have sth similar as exceptions in js ffi? @junhaoliao

davemarco

Some high level comments first to pipeline review. Still looking at actual code

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

davemarco · 2025-02-24T17:38:16Z

src/clp_ffi_js/ir/StreamReader.cpp

@@ -118,7 +118,7 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
    emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
    emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null");
    emscripten::register_type<clp_ffi_js::ir::ReaderOptions>(
-            "{logLevelKey: string, timestampKey: string} | null"
+            "{logLevelKey: [bool, Array<string>], timestampKey: [bool, Array<string>]} | null"


The code has not been reviewed but will look something like this (subtype here)

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

LinZhihao-723 · 2025-02-25T04:48:48Z

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

 }  // namespace

+auto StructuredIrUnitHandler::SchemaTreeFullBranch::match(


This function had a bug and now it's been fixed.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (4)
42-51: Consider improving parameter naming.

The constructor parameter is_auto_gen is inconsistent with the member variable name m_is_auto_generated. Consider using identical naming for better readability.

Also, the member variable is named m_leaf_to_root_path but the parameter is root_to_leaf_path, which might be confusing. The direction is reversed during initialization, but the naming should better reflect this transformation.
 SchemaTreeFullBranch(
-        bool is_auto_gen,
+        bool is_auto_generated,
         std::vector<std::string> root_to_leaf_path,
         clp::ffi::SchemaTree::Node::Type leaf_type
 )
         : m_is_auto_generated{is_auto_gen},
           m_leaf_to_root_path{std::move(root_to_leaf_path)},
           m_leaf_type{leaf_type} {
     std::ranges::reverse(m_leaf_to_root_path);
 }
67-67: Use explicit return.

For consistent style and improved readability, consider using an explicit return statement rather than a single-line return for the is_auto_generated() method.
-[[nodiscard]] auto is_auto_generated() const -> bool { return m_is_auto_generated; }
+[[nodiscard]] auto is_auto_generated() const -> bool {
+    return m_is_auto_generated;
+}
89-97: Update documentation to match parameter changes.

The documentation for the constructor (lines 89-90) still references the old parameter names log_level_key and timestamp_key, which should be updated to reflect the new parameters log_level_full_branch and timestamp_full_branch.
 /**
  * @param deserialized_log_events The vector in which to store deserialized log events.
- * @param log_level_key Key name of schema-tree node that contains the authoritative log level.
- * @param timestamp_key Key name of schema-tree node that contains the authoritative timestamp.
+ * @param log_level_full_branch Branch path to schema-tree node that contains the authoritative log level.
+ * @param timestamp_full_branch Branch path to schema-tree node that contains the authoritative timestamp.
  */
122-134: Documentation not updated for new parameter.

The documentation block for handle_schema_tree_node_insertion method has a new parameter schema_tree on line 127, but lacks a description of this parameter in the documentation.
 /**
  * Saves the node's ID if it corresponds to events' authoritative log level or timestamp
  * kv-pair.
  * @param is_auto_generated
  * @param schema_tree_node_locator
+ * @param schema_tree The schema tree to match against
  * @return IRErrorCode::IRErrorCode_Success
  */

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d5395e and 62d83bd.

📒 Files selected for processing (2)

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (6 hunks)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: lint (ubuntu-latest)
GitHub Check: lint (macos-latest)

🔇 Additional comments (4)

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (4)

4-4: Added include for algorithm header.

The include for <algorithm> is properly added to support the use of std::ranges::reverse in the SchemaTreeFullBranch constructor.

28-84: Well-structured new class implementation.

The SchemaTreeFullBranch class is well-designed with clear documentation, proper move semantics, and deleted copy operations. The implementation correctly encapsulates a branch path from root to leaf in a schema tree.

A few observations:

Constructor reverses the path for internal storage, which seems intentional for efficient tree traversal

Good use of [[nodiscard]] attribute for methods that return values

The declaration of match() method has its implementation elsewhere, following good separation practices

145-163: Documentation updated to reflect new optional member variables.

The documentation for both get_log_level and get_timestamp methods has been properly updated to reference the new optional member variables. This improves clarity on when default values are returned.

168-172: Renamed member variables for improved clarity.

The member variables have been renamed with the m_optional_ prefix to better reflect their purpose and type. This is a good practice that makes the code more self-documenting.

According to the past review comment on line 171-172, using std::optional explicitly follows the C++ coding convention for this project, which is good.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (3)
47-49: Potential confusion between parameter and member variable naming

The parameter name root_to_leaf_path suggests one direction (root→leaf), but it's stored in a member variable named m_leaf_to_root_path and then reversed. This could be confusing to future readers of the code.

Consider either:

Renaming the parameter to match the storage direction (e.g., leaf_to_root_path and skip the reversal), or

Renaming the member variable to match the parameter (e.g., m_root_to_leaf_path and keep the reversal)

52-55: Unnecessary defaulted move operations

The defaulted move constructor and assignment operator are not necessary since the class doesn't have any special resource management beyond what's handled by its member variables' own move operations.

These declarations can be safely removed as the compiler will implicitly generate them with identical behavior:
-        // Default move constructor and assignment operator
-        SchemaTreeFullBranch(SchemaTreeFullBranch&&) = default;
-        auto operator=(SchemaTreeFullBranch&&) -> SchemaTreeFullBranch& = default;
159-160: Inconsistent member variable name in documentation

The comment refers to m_optional_timestamp_id but the actual member variable is named m_optional_timestamp_node_id.
-     * - `m_optional_timestamp_id` is unset.
-     * - `m_optional_timestamp_id` is set but not appearing in the given node-id-value pairs.
+     * - `m_optional_timestamp_node_id` is unset.
+     * - `m_optional_timestamp_node_id` is set but not appearing in the given node-id-value pairs.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62d83bd and 4555da7.

📒 Files selected for processing (2)

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (5 hunks)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: lint (ubuntu-latest)
GitHub Check: lint (macos-latest)

🔇 Additional comments (3)

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (3)

28-83: Well-structured new class for schema tree branch representation

The new SchemaTreeFullBranch class provides a clean way to represent branches from root to leaf node in a schema tree. This improves the handling of hierarchical log level and timestamp keys.

132-132: Good parameter choice for schema_tree

Passing the schema tree as a const reference to a shared pointer is efficient and appropriate for this use case.

167-171: Improved structure for storing branch information

Replacing simple string keys with full branch objects allows for more flexible and structured handling of hierarchical paths in the schema tree.

davemarco

Went through again and added some smaller nits

davemarco · 2025-02-25T18:48:18Z

src/clp_ffi_js/ir/StreamReader.cpp

@@ -118,7 +118,7 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
    emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
    emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null");
    emscripten::register_type<clp_ffi_js::ir::ReaderOptions>(
-            "{logLevelKey: string, timestampKey: string} | null"
+            "{logLevelKey: [bool, Array<string>], timestampKey: [bool, Array<string>]} | null"


@LinZhihao-723, you might need to update this now, but maybe the array indexes will still work. idk

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>

coderabbitai

Actionable comments posted: 0

🔭 Outside diff range comments (1)

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (1)
87-90: 🛠️ Refactor suggestion

Update parameter documentation to match new parameter types.

The documentation still refers to the old parameter names (log_level_key and timestamp_key) which were strings. Update it to reflect the new parameter types (SchemaTreeFullBranch).
-     * @param log_level_key Key name of schema-tree node that contains the authoritative log level.
-     * @param timestamp_key Key name of schema-tree node that contains the authoritative timestamp.
+     * @param log_level_full_branch Branch in the schema tree that contains the authoritative log level.
+     * @param timestamp_full_branch Branch in the schema tree that contains the authoritative timestamp.

🧹 Nitpick comments (3)

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (3)
37-37: Parameter naming should be consistent.

The parameter name is_auto_gen is inconsistent with the member variable m_is_auto_generated and the method name is_auto_generated(). Consider using is_auto_generated for consistency with the rest of the codebase.
-         * @param is_auto_gen
+         * @param is_auto_generated
66-66: Use explicit boolean comparison per coding guidelines.

According to the coding guidelines, we should prefer false == <expression> or true == <expression> rather than directly returning the expression.
-        [[nodiscard]] auto is_auto_generated() const -> bool { return m_is_auto_generated; }
+        [[nodiscard]] auto is_auto_generated() const -> bool { return true == m_is_auto_generated; }
47-48: Variable naming inconsistency between parameter and member.

The constructor parameter is named root_to_leaf_path but the member variable is named m_leaf_to_root_path which might be confusing when reading the code. Consider using consistent naming or adding a comment explaining why the path is reversed and stored with a different name.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4555da7 and d5de853.

📒 Files selected for processing (1)

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (5 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: lint (ubuntu-latest)
GitHub Check: lint (macos-latest)

🔇 Additional comments (3)

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (3)

168-169: Good use of more descriptive variable names.

Renaming these variables to include optional_ in their names provides better clarity about their nature. This makes the code more maintainable by explicitly indicating that these variables might not have a value.

27-83: Well-structured new class implementation.

The SchemaTreeFullBranch class is a well-designed addition that encapsulates the concept of a branch in a schema tree. The class properly implements move semantics while preventing copies, and provides a clear interface for checking if a branch is auto-generated and matching against a schema tree.

126-133: Method parameter addition improves functionality.

Adding the schema_tree parameter to the handle_schema_tree_node_insertion method enables the functionality to match schema tree branches, which is essential for supporting the hierarchical keys feature described in the PR objectives.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)
48-57: Potential documentation mismatch.

The comment references get_schema_tree_full_path_from_js_reader_option, yet the actual function name is get_schema_tree_full_path_from_reader_option. To avoid confusion, update the doc comment to match the function name.
- * @return A schema tree full branch constructed based on the provided reader option.
- * @return std::nullopt if `option` is `null`.
- */
-[[nodiscard]] auto get_schema_tree_full_path_from_js_reader_option(
+ * @return A schema tree full branch constructed based on the provided reader option.
+ * @return std::nullopt if `option` is `null`.
+ */
+[[nodiscard]] auto get_schema_tree_full_path_from_reader_option(
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (1)

27-83: Well-structured branch representation class.

SchemaTreeFullBranch neatly encapsulates a branch from the schema tree, reversing the path to store leaf-to-root. Document the rationale for reversing the vector to prevent confusion. The match method signature looks coherent, though ensure thorough testing for partial matches and unexpected input.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5de853 and 2093972.

📒 Files selected for processing (5)

src/clp_ffi_js/ir/StreamReader.cpp (1 hunks)
src/clp_ffi_js/ir/StreamReader.hpp (1 hunks)
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (5 hunks)
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp (5 hunks)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/clp_ffi_js/ir/StreamReader.cpp
src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StreamReader.hpp
src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: lint (ubuntu-latest)
GitHub Check: lint (macos-latest)

🔇 Additional comments (12)

src/clp_ffi_js/ir/StreamReader.hpp (1)

28-28: No concerns about new type declaration.

The addition of ReaderOption appears consistent and aligns with your existing types (DataArrayTsType, LogLevelFilterTsType). If there are no tests for the new type, consider adding a simple unit test to verify its usage and ensure future compatibility.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (7)

6-15: Straightforward new includes.

Including <optional> and <clp/ffi/SchemaTree.hpp> is appropriate for the optional logic and schema-related functionality introduced. No issues observed.

32-33: Use of descriptive constant names.

Defining cReaderOptionIsAutoGeneratedKey and cReaderOptionPartsKey aligns with existing naming conventions. This helps ensure clarity when accessing the JS object properties.

46-47: Appropriate use of [[nodiscard]].

Marking dump_json_with_replace as [[nodiscard]] is a good practice to avoid inadvertently ignoring the returned string. No issues with this change.

63-75: Validate array content.

This function correctly handles null checks and extracts data from option. Consider additional validation for the parts array (e.g. ensuring the array is non-empty for a valid path) to avoid unexpected states in downstream logic.

99-106: Consistent usage of reader options.

Replacing direct string usage with calls to get_schema_tree_full_path_from_reader_option for both the log level and timestamp keys aligns with the PR objective. Good to see that log levels default to string type and timestamps default to integer type.

135-136: Consistent with coding guidelines.

Using if (false == ...) instead of if (!...) adheres to the project’s guidelines. The behaviour for returning null() when no filter map exists is clear.

160-161: Loop condition looks appropriate.

Looping until deserializer.is_stream_completed() is false is consistent. Just confirm that there are no edge cases that might lead to an infinite loop if the condition never evaluates to true.

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (4)

4-4: New include is acceptable.

The <algorithm> header is presumably for std::ranges::reverse. This is fine.

94-99: Optional schema tree branches well-integrated.

Switching to std::optional<SchemaTreeFullBranch> for both log level and timestamp clarifies that these branches are optional. This is a clean approach that preserves the original constructor’s intent.

151-162: Documentation is consistent and clear.

The docstrings for get_log_level and get_timestamp correctly outline fallback behaviour. The usage of LogLevel::None for unexpected values is a commendable solution. Confirm that all out-of-range scenarios are handled.
[approve]

165-169: Optional node IDs.

Storing the node IDs as std::optional matches your approach for the full branch. This ensures you skip logic when the node IDs are unavailable.

junhaoliao · 2025-02-26T20:39:04Z

The timestamp part (for cache busting?) may not have any effects as the timestamp value is not populated in the regular string.

const timestamp = Date.now();
import ModuleInit from "./cmake-build-release/ClpFfiJs-node.js?${timestamp}"

if we really want to make it work, we can write

const ModuleInit = (await import(`./cmake-build-debug/ClpFfiJs-node.js?${timestamp}`)).default;

junhaoliao · 2025-02-26T20:43:38Z

@LinZhihao-723 could you also share the IR file used in the validation as an attachment in the PR description?

LinZhihao-723 · 2025-02-26T21:04:53Z

@LinZhihao-723 could you also share the IR file used in the validation as an attachment in the PR description?

It's the same file in #60

junhaoliao

overall lgtm

at a high level: i understand that the current APIs don't provide a convenient way to report errors if the user specified log-level / timestamp key is not found in the schema tree. however, if after going through all nodes we still can't find the path specified by the user, can we log a message to help with debugging? (instead of silently treating all log events's log level as UNKNOWN for example)

src/clp_ffi_js/ir/StreamReader.cpp

LinZhihao-723 · 2025-02-27T22:00:19Z

Resolved by applying what @junhaoliao suggested

davemarco

Small final changes

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (1)
67-69: Use preferred negation style.

The coding guidelines specify preferring false == <expression> over !<expression>.
-    if (filter_option.isNull() || filter_option.isUndefined()) {
+    if (false == !filter_option.isNull() || false == !filter_option.isUndefined()) {
         return std::nullopt;
     }
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (2)
38-40: Add more detailed parameter documentation.

Consider expanding the documentation for the constructor parameters to clarify their purpose.
        /**
         * @param is_auto_gen
-         * @param root_to_leaf_path
-         * @param leaf_type
+         * @param root_to_leaf_path Array of keys forming the path from root to leaf
+         * @param leaf_type Type of the leaf node in the schema tree
         */
47-49: Document the path reversal operation.

The constructor takes a root-to-leaf path but stores it as a leaf-to-root path. This transformation should be clearly documented.
                : m_is_auto_generated{is_auto_gen},
                  m_leaf_to_root_path{std::move(root_to_leaf_path)},
                  m_leaf_type{leaf_type} {
+            // Reverse the path to store it from leaf to root for easier traversal
             std::ranges::reverse(m_leaf_to_root_path);
         }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f46af70 and c364896.

📒 Files selected for processing (2)

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (5 hunks)
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (4 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: lint (ubuntu-latest)
GitHub Check: lint (macos-latest)

🔇 Additional comments (10)

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp (6)

32-33: String view constants should follow naming convention.

Consider using the constexpr std::string_view cFilterOptionAutoGeneratedKey{"isAutoGenerated"}; naming style to be more consistent with other constants in the file.
-constexpr std::string_view cFilterOptionIsAutoGeneratedKey{"isAutoGenerated"};
-constexpr std::string_view cFilterOptionPartsKey{"parts"};
+constexpr std::string_view cFilterOptionAutoGeneratedKey{"isAutoGenerated"};
+constexpr std::string_view cFilterOptionPartsKey{"parts"};
48-53: Improve documentation comments for get_schema_tree_full_branch_from_filter_option.

The documentation should clearly specify the leaf node type parameter and match the actual function signature.
 /**
  * @param filter_option The JavaScript object representing a filter option.
  * @param leaf_node_type The type of the leaf node in the constructed schema tree branch.
- * @return A schema tree full branch constructed based on the provided reader option.
+ * @return A schema tree full branch constructed based on the provided filter option and leaf node type.
  * @return std::nullopt if `option` is `null`.
  */
54-57: Function declaration could be formatted more consistently.

The closing parenthesis should be consistent with the opening parenthesis formatting style.
 [[nodiscard]] auto get_schema_tree_full_branch_from_filter_option(
         emscripten::val const& filter_option,
-        clp::ffi::SchemaTree::Node::Type leaf_node_type
-) -> std::optional<StructuredIrUnitHandler::SchemaTreeFullBranch>;
+        clp::ffi::SchemaTree::Node::Type leaf_node_type) 
+        -> std::optional<StructuredIrUnitHandler::SchemaTreeFullBranch>;
67-69: Check for both null and undefined for safety.

This is good defensive programming to check for both null and undefined values.

70-74: Using key constants correctly in the function implementation.

Make sure to use the constant variables defined above rather than hardcoding string literals.
     return StructuredIrUnitHandler::SchemaTreeFullBranch{
             filter_option[cFilterOptionIsAutoGeneratedKey.data()].as<bool>(),
             emscripten::vecFromJSArray<std::string>(filter_option[cFilterOptionPartsKey.data()]),
             leaf_node_type
     };
99-106: Function calls to get schema tree branches are well-structured.

The code correctly sets the appropriate node types for log level (string) and timestamp (integer) as required by the design.
src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp (4)

94-96: Constructor parameters renamed correctly.

The parameters are appropriately renamed from string keys to schema branch objects, matching the updated implementation.

130-133: Added schema_tree parameter to handle_schema_tree_node_insertion.

This is a good improvement that provides the function with the necessary context to properly match full branches.

165-169: Renamed member variables for clarity.

The member variables have been renamed with the 'optional' prefix to better indicate their nature.

74-77:
❓ Verification inconclusive

Verify match method declaration.

The match method is declared but its implementation isn't in the header file. Ensure it's properly implemented in the corresponding .cpp file.

🏁 Script executed:
#!/bin/bash
# Check if the match method is implemented in the cpp file
grep -A 15 "auto SchemaTreeFullBranch::match" src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
Length of output: 93
Manual Verification Required: Confirm Implementation of the match Method

The current code in src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp declares the match method, yet our grep search in src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp using "auto SchemaTreeFullBranch::match" returned no results. This might indicate that the method is either not implemented or is defined under a different class name (e.g. StructuredIrUnitHandler::match). Please verify manually that:

The implementation of match exists in src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp.

The implementation’s class name matches the header declaration (i.e. use StructuredIrUnitHandler if that is the intended class).

davemarco · 2025-02-28T15:05:58Z

Also rementioning title:
feat: Add support for hierarchical and auto-generated keys for log filtering.

LinZhihao-723 · 2025-02-28T15:20:17Z

Changed the title with ! to indicate that this is a breaking change (since it changes the js level API)
Also I will need a re-approval since my last push invalidated ur approval. @davemarco

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>

davemarco

lgtm

junhaoliao

building on yesterday's suggestion, I left a high-level comment. the rest lgtm

junhaoliao · 2025-02-28T15:52:09Z

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp

@@ -67,8 +98,14 @@ auto StructuredIrStreamReader::create(
            *zstd_decompressor,
            StructuredIrUnitHandler{
                    deserialized_log_events,
-                    reader_options[cReaderOptionsLogLevelKey.data()].as<std::string>(),
-                    reader_options[cReaderOptionsTimestampKey.data()].as<std::string>()
+                    get_schema_tree_full_branch_from_filter_option(


For ease of debugging -

can we print a log in deserialize_stream() after all IR units are decoded: if the m_optional_log_level_full_branch has value but m_optional_log_level_node_id is not populated, warn the user that the node cannot be found. Similarly for the timestamp we should also warn the user.

into key-search

LinZhihao-723 added 11 commits February 18, 2025 16:29

Add auto-gen key support

bcd3b69

Update...

872b69d

Apply code review comments

dd923c9

Update...

a6b70c5

Don't use .data()

bbf7340

Implement hierarchical log-level and timestamp key search.

165a890

Merge oss main

4618e0c

Latest clp

2a02ff8

Switching to my desktop

483079a

Refactoring

4dc0ab8

Done

9d5395e

LinZhihao-723 commented Feb 22, 2025

View reviewed changes

LinZhihao-723 requested a review from davemarco February 22, 2025 04:03

LinZhihao-723 commented Feb 22, 2025

View reviewed changes

davemarco requested changes Feb 24, 2025

View reviewed changes

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp Show resolved Hide resolved

src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp Outdated Show resolved Hide resolved

Apply code review comments

62d83bd

LinZhihao-723 commented Feb 25, 2025

View reviewed changes

coderabbitai bot reviewed Feb 25, 2025

View reviewed changes

Fix unused header

4555da7

LinZhihao-723 requested a review from davemarco February 25, 2025 05:04

coderabbitai bot reviewed Feb 25, 2025

View reviewed changes

davemarco requested changes Feb 25, 2025

View reviewed changes

davemarco reviewed Feb 25, 2025

View reviewed changes

src/clp_ffi_js/ir/StructuredIrUnitHandler.hpp Outdated Show resolved Hide resolved

Apply suggestions from code review

d5de853

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>

coderabbitai bot reviewed Feb 25, 2025

View reviewed changes

LinZhihao-723 added 2 commits February 25, 2025 21:56

Finalize js layer type.

2093972

Apply code review comments.

65837c9

coderabbitai bot reviewed Feb 26, 2025

View reviewed changes

junhaoliao requested changes Feb 27, 2025

View reviewed changes

src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved

src/clp_ffi_js/ir/StreamReader.cpp Outdated Show resolved Hide resolved

LinZhihao-723 added 3 commits February 27, 2025 16:41

Apply code reivew comments

b8c9cf6

Remove header declaration of ReaderOption

f12cf04

Fixing...

f46af70

LinZhihao-723 requested a review from davemarco February 27, 2025 22:00

davemarco previously approved these changes Feb 28, 2025

View reviewed changes

Apply suggestions from code review

c364896

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>

LinZhihao-723 dismissed davemarco’s stale review via c364896 February 28, 2025 14:57

LinZhihao-723 requested a review from a team as a code owner February 28, 2025 14:57

coderabbitai bot reviewed Feb 28, 2025

View reviewed changes

Update src/clp_ffi_js/ir/StructuredIrStreamReader.cpp

03bd5b9

LinZhihao-723 changed the title ~~feat: Add support for decoding user-defined timestamps and log levels from hierarchical keys.~~ feat!: Add support for hierarchical and auto-generated keys for log filtering. Feb 28, 2025

Linting

c0120ec

davemarco reviewed Feb 28, 2025

View reviewed changes

src/clp_ffi_js/ir/StructuredIrStreamReader.cpp Show resolved Hide resolved

Update src/clp_ffi_js/ir/StructuredIrStreamReader.cpp

973966d

Co-authored-by: davemarco <83603688+davemarco@users.noreply.github.com>

davemarco previously approved these changes Feb 28, 2025

View reviewed changes

junhaoliao requested changes Feb 28, 2025

View reviewed changes

LinZhihao-723 added 2 commits February 28, 2025 11:34

Add log statements.

15211e2

Merge branch 'key-search' of https://github.com/LinZhihao-723/clp-ffi-js

eff990b

into key-search

LinZhihao-723 dismissed davemarco’s stale review via eff990b February 28, 2025 16:34

LinZhihao-723 requested a review from junhaoliao February 28, 2025 16:35

junhaoliao approved these changes Feb 28, 2025

View reviewed changes

LinZhihao-723 merged commit 737c9d5 into y-scope:main Feb 28, 2025
2 checks passed

LinZhihao-723 deleted the key-search branch February 28, 2025 16:49

		std::optional<clp::ffi::SchemaTree::Node::id_t> m_optional_log_level_node_id;
		std::optional<clp::ffi::SchemaTree::Node::id_t> m_optional_timestamp_node_id;

		} // namespace

		auto StructuredIrUnitHandler::SchemaTreeFullBranch::match(

feat!: Add support for hierarchical and auto-generated keys for log filtering. #62

feat!: Add support for hierarchical and auto-generated keys for log filtering. #62

Conversation

LinZhihao-723 commented Feb 22, 2025 • edited by coderabbitai bot Loading

Description

Checklist

Validation performed

Summary by CodeRabbit

coderabbitai bot commented Feb 22, 2025 • edited Loading

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested reviewers

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davemarco Feb 24, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LinZhihao-723 left a comment

Choose a reason for hiding this comment

davemarco left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

davemarco left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

junhaoliao commented Feb 26, 2025

junhaoliao commented Feb 26, 2025

LinZhihao-723 commented Feb 26, 2025

junhaoliao left a comment

Choose a reason for hiding this comment

LinZhihao-723 commented Feb 27, 2025

davemarco left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

davemarco commented Feb 28, 2025 • edited Loading

LinZhihao-723 commented Feb 28, 2025

davemarco left a comment

Choose a reason for hiding this comment

junhaoliao left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LinZhihao-723 commented Feb 22, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 22, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

davemarco Feb 24, 2025 •

edited

Loading

davemarco commented Feb 28, 2025 •

edited

Loading