Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add InvalidCharError::char #124

Closed
wants to merge 1 commit into from

Conversation

tankyleo
Copy link
Contributor

@tankyleo tankyleo commented Nov 9, 2024

Returns a char if the invalid byte is ascii, otherwise returns the raw u8.

Deprecates InvalidCharError::invalid_char.

Partially addresses #100

Returns a `char` if the invalid byte is ascii, otherwise returns the raw
`u8`.

Deprecates `InvalidCharError::invalid_char`.
@tankyleo
Copy link
Contributor Author

tankyleo commented Nov 9, 2024

This does not handle multi-byte characters yet.

  • All HexToBytesIter needs is a T: Iterator<Item = [u8; 2]> + ExactSizeIterator, as shown in HexToBytesIter::from_pairs. I am reluctant to add more constraints on the T to peek for multi-byte characters in an error scenario.
  • Even if we did handle multi-byte characters, not all of them would result in a nice visual picture. Is it worth the trouble then to handle multi-byte chars? I am not convinced.

@tankyleo
Copy link
Contributor Author

tankyleo commented Nov 9, 2024

Is HexToBytesIter::from_pairs meant to be part of the public API ? Right now it is a pub fn inside a private module.

@apoelstra
Copy link
Member

This does not handle multi-byte characters yet.

In this case the returned char is incorrect and we can't move forward with this approach.

@tankyleo
Copy link
Contributor Author

In this case the returned char is incorrect and we can't move forward with this approach.

At the moment we only return a char if it is a single-byte character right ?

I agree that multi-byte chars would need more work.

@apoelstra
Copy link
Member

At the moment we only return a char if it is a single-byte character right ?

No, we currently do not return a char under any circumstances, because the error type does not contain enough information to return a char.

@tankyleo
Copy link
Contributor Author

Right I was talking about this current commit here all good.

@apoelstra
Copy link
Member

apoelstra commented Nov 10, 2024

Oh, I see! I misunderstood your initial approach.

This is clever but I am still going to concept-NACK it. If users want to do this (treat ASCII specially) it is easy for them to do it themselves -- but this change forces them to do so, which adds extra code complexity.

It's also an ugly API. While the motivation makes sense, in terms of our existing code layout which prevents us from unconditionally getting a char, I don't think this is something we want to expose in a 1.0 version of the crate.

@tankyleo
Copy link
Contributor Author

Sounds great I agree.

Was there another approach you had in mind for #100 ? If I understand you correctly we would not address this issue.

@apoelstra
Copy link
Member

apoelstra commented Nov 11, 2024

What I would imagine is that if we read an invalid u8, we attempt to continue reading u8s until we've parsed a complete UTF-8 codepoint. (We would not attempt to deal with combining characters or anything like that because Rust itself makes it hard.)

And if we fail that...hmm. Maybe we would want to fall back to your API anyway? Or I think there's a standard UTF "unknown character" character we could just sub in.

@tcharding
Copy link
Member

@apoelstra we chatted about this, bump so you can update your position if you want to.

@apoelstra
Copy link
Member

Yeah -- I think we don't want to target this issue at all now for 1.0. If there is some noninvasive non-performance-affecting way to do it, that's great, but I don't believe there is.

Meanwhile, if a user has a hex string with a bad character in it, and wants to learn more specifically what that character is (and where it is, etc), it's not hard to do this without help from the crate. (s.chars_indices().find(|(_, x)| !x.is_ascii()) will do it for example).

@tankyleo
Copy link
Contributor Author

Sounds good, I close this PR.

Whenever appropriate, I let you close #100 as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants