Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Add support for hierarchical and auto-generated keys for log filtering. #62

Merged
merged 25 commits into from
Feb 28, 2025
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
bcd3b69
Add auto-gen key support
LinZhihao-723 Feb 18, 2025
872b69d
Update...
LinZhihao-723 Feb 18, 2025
dd923c9
Apply code review comments
LinZhihao-723 Feb 19, 2025
a6b70c5
Update...
LinZhihao-723 Feb 20, 2025
bbf7340
Don't use .data()
LinZhihao-723 Feb 20, 2025
165a890
Implement hierarchical log-level and timestamp key search.
LinZhihao-723 Feb 20, 2025
4618e0c
Merge oss main
LinZhihao-723 Feb 21, 2025
2a02ff8
Latest clp
LinZhihao-723 Feb 21, 2025
483079a
Switching to my desktop
LinZhihao-723 Feb 22, 2025
4dc0ab8
Refactoring
LinZhihao-723 Feb 22, 2025
9d5395e
Done
LinZhihao-723 Feb 22, 2025
62d83bd
Apply code review comments
LinZhihao-723 Feb 25, 2025
4555da7
Fix unused header
LinZhihao-723 Feb 25, 2025
d5de853
Apply suggestions from code review
LinZhihao-723 Feb 25, 2025
2093972
Finalize js layer type.
LinZhihao-723 Feb 26, 2025
65837c9
Apply code review comments.
LinZhihao-723 Feb 26, 2025
b8c9cf6
Apply code reivew comments
LinZhihao-723 Feb 27, 2025
f12cf04
Remove header declaration of ReaderOption
LinZhihao-723 Feb 27, 2025
f46af70
Fixing...
LinZhihao-723 Feb 27, 2025
c364896
Apply suggestions from code review
LinZhihao-723 Feb 28, 2025
03bd5b9
Update src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
LinZhihao-723 Feb 28, 2025
c0120ec
Linting
LinZhihao-723 Feb 28, 2025
973966d
Update src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
LinZhihao-723 Feb 28, 2025
15211e2
Add log statements.
LinZhihao-723 Feb 28, 2025
eff990b
Merge branch 'key-search' of https://github.com/LinZhihao-723/clp-ffi…
LinZhihao-723 Feb 28, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/clp_ffi_js/ir/StreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,7 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
emscripten::register_type<clp_ffi_js::ir::DataArrayTsType>("Uint8Array");
emscripten::register_type<clp_ffi_js::ir::LogLevelFilterTsType>("number[] | null");
emscripten::register_type<clp_ffi_js::ir::ReaderOptions>(
"{logLevelKey: string, timestampKey: string} | null"
"{logLevelKey: [bool, Array<string>], timestampKey: [bool, Array<string>]} | null"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any documentation directly related to this JS type. If there does exist such doc, we need to update them accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code has not been reviewed but will look something like this (subtype here)

Copy link
Contributor

@davemarco davemarco Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
logLevelKey:{hasAutoPrefix: boolean; splitKey: string[]},
timestampKey:{hasAutoPrefix: boolean; splitKey: string[]}
} however, may change pending review...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to have an agreement on these namings first.

  • For hasAutoPrefix, I'd prefer to explicitly call it isAutoGenerated
  • For splitKey, I don't have any comments

@kirkrodrigues @junhaoliao what do u think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Yes to isAutoGenerated
  • At first glance, splitKey sounds like some string we use to split another string up. How about parts (akin to pathlib's path parts)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I would still rather that StructuredIrReaderOptions used isAutoGenerated to keep the IR interface logical. I think ParsedKey can still use hasAutoPrefix. So in effect, I'm saying:

  • we should add a new type for StructuredIrReaderOptions to use that will use the name isAutoGenerated.
  • we should translate from hasAutoPrefix to isAutoGenerated when constructing StructuredIrReaderOptions.

Is that possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still rather that StructuredIrReaderOptions used isAutoGenerated to keep the IR interface logical.

I agree. Using hasAutoPrefix together with parts could be confusing since the path (the first key in the parts) should already have the auto-prefix stripped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay isAutoGenerated is good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LinZhihao-723, you might need to update this now, but maybe the array indexes will still work. idk

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Pushed in the latest commit. Notice that I also updated the code to support null input, so that the reader options are not all required.

);

// JS types used as outputs
Expand Down
13 changes: 11 additions & 2 deletions src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
#include <clp/Array.hpp>
#include <clp/ErrorCode.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/SchemaTree.hpp>
#include <clp/ir/types.hpp>
#include <clp/TraceableException.hpp>
#include <emscripten/bind.h>
Expand Down Expand Up @@ -63,12 +64,20 @@ auto StructuredIrStreamReader::create(
ReaderOptions const& reader_options
) -> StructuredIrStreamReader {
auto deserialized_log_events{std::make_shared<StructuredLogEvents>()};

auto const& log_level_config = reader_options[cReaderOptionsLogLevelKey.data()];
auto const& timestamp_config = reader_options[cReaderOptionsTimestampKey.data()];

auto result{StructuredIrDeserializer::create(
*zstd_decompressor,
StructuredIrUnitHandler{
deserialized_log_events,
reader_options[cReaderOptionsLogLevelKey.data()].as<std::string>(),
reader_options[cReaderOptionsTimestampKey.data()].as<std::string>()
{log_level_config[0].as<bool>(),
emscripten::vecFromJSArray<std::string>(log_level_config[1]),
clp::ffi::SchemaTree::Node::Type::Str},
{timestamp_config[0].as<bool>(),
emscripten::vecFromJSArray<std::string>(timestamp_config[1]),
clp::ffi::SchemaTree::Node::Type::Int}
}
)};
if (result.has_error()) {
Expand Down
211 changes: 149 additions & 62 deletions src/clp_ffi_js/ir/StructuredIrUnitHandler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
#include <cstddef>
#include <iterator>
#include <memory>
#include <optional>
#include <string>
#include <string_view>
#include <type_utils.hpp>
Expand All @@ -14,6 +15,7 @@
#include <clp/ffi/KeyValuePairLogEvent.hpp>
#include <clp/ffi/SchemaTree.hpp>
#include <clp/ffi/Value.hpp>
#include <clp/ir/EncodedTextAst.hpp>
#include <clp/ir/types.hpp>
#include <clp/time_types.hpp>
#include <emscripten/val.h>
Expand All @@ -28,11 +30,23 @@ namespace {
* Parses a string to determine the corresponding `LogLevel` enum value.
* @param str
* @return `LogLevel` enum corresponding to `str` if `str` matches a string in `cLogLevelNames`.
* @return `LogLevel::NONE` otherwise.
* @return std::nullopt otherwise.
*/
auto parse_log_level(std::string_view str) -> LogLevel;
[[nodiscard]] auto parse_log_level(std::string_view str) -> std::optional<LogLevel>;

auto parse_log_level(std::string_view str) -> LogLevel {
/**
* Parses the log level from the given value.
* @param value
* @return The parsed log level forwarded from `parse_log_level`.
* @return std::nullopt on failures:
* - The given value's type cannot be decoded as a string.
* - Forwards `clp::ir::EncodedTextAst::decode_and_unparse`'s return values.
* - Forwards `parse_log_level`'s return values.
*/
[[nodiscard]] auto parse_log_level_from_value(clp::ffi::Value const& value
) -> std::optional<LogLevel>;

auto parse_log_level(std::string_view str) -> std::optional<LogLevel> {
// Convert the string to uppercase.
std::string log_level_name_upper_case{str};
std::ranges::transform(
Expand All @@ -48,18 +62,88 @@ auto parse_log_level(std::string_view str) -> LogLevel {
log_level_name_upper_case
);
if (it == cLogLevelNames.end()) {
return LogLevel::NONE;
return std::nullopt;
}

return static_cast<LogLevel>(std::distance(cLogLevelNames.begin(), it));
}

auto parse_log_level_from_value(clp::ffi::Value const& value) -> std::optional<LogLevel> {
if (value.is<std::string>()) {
return parse_log_level(value.get_immutable_view<std::string>());
}

if (value.is<clp::ir::FourByteEncodedTextAst>()) {
auto const optional_log_level
= value.get_immutable_view<clp::ir::FourByteEncodedTextAst>().decode_and_unparse();
if (false == optional_log_level.has_value()) {
return std::nullopt;
}
return parse_log_level(optional_log_level.value());
}

if (value.is<clp::ir::EightByteEncodedTextAst>()) {
auto const optional_log_level
= value.get_immutable_view<clp::ir::EightByteEncodedTextAst>().decode_and_unparse();
if (false == optional_log_level.has_value()) {
return std::nullopt;
}
return parse_log_level(optional_log_level.value());
}

SPDLOG_ERROR("Protocol Error: The log level value must be a valid string-convertible type.");
return std::nullopt;
}
} // namespace

auto StructuredIrUnitHandler::SchemaTreeFullBranch::match(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function had a bug and now it's been fixed.

clp::ffi::SchemaTree const& schema_tree,
clp::ffi::SchemaTree::NodeLocator const& leaf_locator
) const -> bool {
if (leaf_locator.get_type() != m_leaf_type) {
return false;
}

auto optional_node_id{schema_tree.try_get_node_id(leaf_locator)};
size_t matched_depth{0};
for (auto const& key : m_leaf_to_root_path) {
if (false == optional_node_id.has_value()) {
break;
}
auto const& node{schema_tree.get_node(optional_node_id.value())};
if (node.get_key_name() != key) {
return false;
}
++matched_depth;
optional_node_id = node.get_parent_id();
}

if (matched_depth != m_leaf_to_root_path.size()) {
// The given leaf-to-root path is shorter than the expected one.
return false;
}

if (optional_node_id.has_value()) {
// The root is not reached yet.
// The given leaf-to-root path is longer than the expected one.
return false;
}

return true;
}

auto StructuredIrUnitHandler::handle_log_event(StructuredLogEvent&& log_event
) -> clp::ffi::ir_stream::IRErrorCode {
auto const& id_value_pairs{log_event.get_user_gen_node_id_value_pairs()};
auto const timestamp = get_timestamp(id_value_pairs);
auto const log_level = get_log_level(id_value_pairs);
auto const timestamp = get_timestamp(
m_timestamp_full_branch.is_auto_generated()
? log_event.get_auto_gen_node_id_value_pairs()
: log_event.get_user_gen_node_id_value_pairs()
);
auto const log_level = get_log_level(
m_log_level_full_branch.is_auto_generated()
? log_event.get_auto_gen_node_id_value_pairs()
: log_event.get_user_gen_node_id_value_pairs()
);

m_deserialized_log_events->emplace_back(std::move(log_event), log_level, timestamp);

Expand All @@ -76,20 +160,29 @@ auto StructuredIrUnitHandler::handle_utc_offset_change(

auto StructuredIrUnitHandler::handle_schema_tree_node_insertion(
bool is_auto_generated,
clp::ffi::SchemaTree::NodeLocator schema_tree_node_locator
clp::ffi::SchemaTree::NodeLocator schema_tree_node_locator,
std::shared_ptr<clp::ffi::SchemaTree const> const& schema_tree
) -> clp::ffi::ir_stream::IRErrorCode {
if (is_auto_generated) {
// TODO: Currently, all auto-generated keys are ignored.
return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success;
auto const optional_inserted_node_id{schema_tree->try_get_node_id(schema_tree_node_locator)};
if (false == optional_inserted_node_id.has_value()) {
return clp::ffi::ir_stream::IRErrorCode_Corrupted_IR;
}
auto const inserted_node_id{optional_inserted_node_id.value()};

++m_current_node_id;
if (false == m_optional_log_level_node_id.has_value()
&& is_auto_generated == m_log_level_full_branch.is_auto_generated())
{
if (m_log_level_full_branch.match(*schema_tree, schema_tree_node_locator)) {
m_optional_log_level_node_id.emplace(inserted_node_id);
}
}

auto const& key_name{schema_tree_node_locator.get_key_name()};
if (key_name == m_log_level_key) {
m_log_level_node_id.emplace(m_current_node_id);
} else if (key_name == m_timestamp_key) {
m_timestamp_node_id.emplace(m_current_node_id);
if (false == m_optional_timestamp_node_id.has_value()
&& is_auto_generated == m_timestamp_full_branch.is_auto_generated())
{
if (m_timestamp_full_branch.match(*schema_tree, schema_tree_node_locator)) {
m_optional_timestamp_node_id.emplace(inserted_node_id);
}
}

return clp::ffi::ir_stream::IRErrorCode::IRErrorCode_Success;
Expand All @@ -102,65 +195,59 @@ auto StructuredIrUnitHandler::handle_end_of_stream() -> clp::ffi::ir_stream::IRE
auto StructuredIrUnitHandler::get_log_level(
StructuredLogEvent::NodeIdValuePairs const& id_value_pairs
) const -> LogLevel {
LogLevel log_level{LogLevel::NONE};
constexpr LogLevel cDefaultLogLevel{LogLevel::NONE};

if (false == m_log_level_node_id.has_value()) {
return log_level;
if (false == m_optional_log_level_node_id.has_value()) {
return cDefaultLogLevel;
}
auto const& optional_log_level_value{id_value_pairs.at(m_log_level_node_id.value())};

auto const log_level_node_id = m_optional_log_level_node_id.value();
if (false == id_value_pairs.contains(log_level_node_id)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check was missing before this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good catch

return cDefaultLogLevel;
}

auto const& optional_log_level_value = id_value_pairs.at(log_level_node_id);
if (false == optional_log_level_value.has_value()) {
return log_level;
}
auto const log_level_value = optional_log_level_value.value();

if (log_level_value.is<std::string>()) {
auto const& log_level_str = log_level_value.get_immutable_view<std::string>();
log_level = parse_log_level(log_level_str);
} else if (log_level_value.is<clp::ffi::value_int_t>()) {
auto const& log_level_int = log_level_value.get_immutable_view<clp::ffi::value_int_t>();
if (log_level_int >= clp::enum_to_underlying_type(cValidLogLevelsBeginIdx)
&& log_level_int < clp::enum_to_underlying_type(LogLevel::LENGTH))
{
log_level = static_cast<LogLevel>(log_level_int);
}
} else {
auto log_event_idx = m_deserialized_log_events->size();
SPDLOG_ERROR(
"Authoritative log level's value is not an int or string for log event index {}",
log_event_idx
);
SPDLOG_ERROR("Protocol error: The log level cannot be an empty value.");
return cDefaultLogLevel;
}

auto const optional_log_level = parse_log_level_from_value(optional_log_level_value.value());
if (false == optional_log_level.has_value()) {
auto const log_event_idx = m_deserialized_log_events->size();
SPDLOG_INFO("Failed to parse log level for log event index {}", log_event_idx);
return cDefaultLogLevel;
}

return log_level;
return optional_log_level.value();
}

auto StructuredIrUnitHandler::get_timestamp(
StructuredLogEvent::NodeIdValuePairs const& id_value_pairs
) const -> clp::ir::epoch_time_ms_t {
clp::ir::epoch_time_ms_t timestamp{0};

if (false == m_timestamp_node_id.has_value()) {
return timestamp;
constexpr clp::ir::epoch_time_ms_t cDefaultTimestamp{0};
if (false == m_optional_timestamp_node_id.has_value()) {
return cDefaultTimestamp;
}
auto const& optional_timestamp_value{id_value_pairs.at(m_timestamp_node_id.value())};
if (false == optional_timestamp_value.has_value()) {
return timestamp;

auto const timestamp_node_id = m_optional_timestamp_node_id.value();
if (false == id_value_pairs.contains(timestamp_node_id)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

return cDefaultTimestamp;
}
auto const timestamp_value = optional_timestamp_value.value();

if (timestamp_value.is<clp::ffi::value_int_t>()) {
timestamp = static_cast<clp::ir::epoch_time_ms_t>(
timestamp_value.get_immutable_view<clp::ffi::value_int_t>()
);
} else {
// TODO: Add support for parsing string-type timestamp values.
auto log_event_idx = m_deserialized_log_events->size();
SPDLOG_ERROR(
"Authoritative timestamp's value is not an int for log event index {}",
log_event_idx
);
auto const& optional_ts = id_value_pairs.at(timestamp_node_id);
if (false == optional_ts.has_value()) {
SPDLOG_ERROR("Protocol error: The timestamp cannot be an empty value.");
return cDefaultTimestamp;
}

return timestamp;
auto const& timestamp{optional_ts.value()};
if (false == timestamp.is<clp::ffi::value_int_t>()) {
SPDLOG_ERROR("Protocol error: The timestamp value must be a valid integer.");
return cDefaultTimestamp;
}
return static_cast<clp::ir::epoch_time_ms_t>(
timestamp.get_immutable_view<clp::ffi::value_int_t>()
);
}
} // namespace clp_ffi_js::ir
Loading
Loading