Add spellchecker #6305

Closed
wants to merge 23 commits into from

@arvaer arvaer commented Jan 24, 2024

#6263

This PR targets issue #6263.
The main idea is to add spellchecking to the Tokio documentation
through the use of the cargo-spellcheck crate. I've added a .spellcheck.toml
and a .spellcheck.dic, which hold the configuration settings and a personal dictionary
for the Tokio project.

Motivation

This PR seeks to add two new files: .spellcheck.toml and .spellcheck.dic.
Together they provide the 'personalized dictionary' functionality
for Tokio with regard to the cargo-spellcheck crate, which would normally
flag words that are not in the English lexicon as incorrect. However,
there are many cases where words are nonstandard but appropriate in the context of
a project.

Solution

The personal dictionary was synthesized, to the best of my abilities, by
running the cargo-spellcheck crate on each file within the Tokio project.
There were many situations where types were flagged for not being properly escaped,
such as the JoinHandle type. While creating the personal dictionary, I made sure not to include
any unescaped types or flags.
Likewise, for any terms that were not in the standard English lexicon, I did my best to categorize
them and preserve them in the dictionary. For example, the word async would normally be flagged.
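The filtering described above can be sketched roughly as follows. This is only an illustration, not the process actually used for this PR: the function name, the sample word list, and the CamelCase heuristic for spotting type names are all hypothetical, and the output layout assumes the usual Hunspell convention of an entry count on the first line followed by one word per line.

```python
import re

# Hypothetical heuristic: identifiers such as JoinHandle (two or more
# capitalized segments) are treated as type names that should be escaped
# in the docs instead of being added to the dictionary.
CAMEL_CASE = re.compile(r"^[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+$")


def build_personal_dictionary(flagged_words):
    """Return the text of a Hunspell-style .dic file for the flagged words."""
    kept = sorted({w for w in flagged_words if not CAMEL_CASE.match(w)})
    # Assumed layout: entry count first, then one word per line.
    return "\n".join([str(len(kept)), *kept]) + "\n"


# Made-up sample of words a spellchecker might flag.
sample = ["async", "JoinHandle", "syscall", "async"]
print(build_personal_dictionary(sample), end="")
```

With this sample, async and syscall are kept (deduplicated and sorted), while JoinHandle is left out so that it can be fixed in the documentation by wrapping it in backticks.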

Moreover, there were certain patterns that I chose to ignore, which are located in the .spellcheck.toml file:

transform_regex = ["^'([^\\s])'$", "^[0-9]+x$", "[0-9]+[n,m,f,p,μ,MB]?s", "(?s)```.+?```"]
  • ^'([^\\s])'$: Matches a single character enclosed in single quotes, ensuring the character is not a whitespace.

  • ^[0-9]+x$: Matches strings composed of one or more digits followed by an 'x', such as multiplication factors like '10x'. A bare '0x' prefix also matches, but a full hexadecimal literal such as '0x1F' does not, because the pattern is anchored at the 'x'.

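  • [0-9]+[n,m,f,p,μ,MB]?s: Matches a number followed by an optional single unit-prefix character and a trailing 's', covering durations such as 'ns', 'ms', 'fs', 'ps' and 'μs'. Because a character class matches exactly one character, the commas are literal members of the class and 'MB' is read as 'M' or 'B', so a two-letter unit such as 'MBs' is not matched as intended.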

  • (?s)```.+?```: Matches a block of text enclosed within triple backticks, often used for code blocks in Markdown or documentation. The (?s) flag allows the dot (.) to match newline characters, enabling multi-line code block matching.
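The four patterns can be exercised with a small script. This is a sketch under assumptions: Python's re module stands in for whatever regex engine cargo-spellcheck uses, the pattern names and the helper function are hypothetical, and the script only reports whether a pattern matches. It does not reproduce how cargo-spellcheck applies transform_regex to tokens.

```python
import re

# Build the triple-backtick delimiter programmatically so this example
# does not contain a literal code fence of its own.
FENCE = "`" * 3

# Patterns from transform_regex as Python raw strings. The TOML form
# "\\s" corresponds to the regex escape \s.
PATTERNS = {
    "quoted_char": r"^'([^\s])'$",
    "count_x": r"^[0-9]+x$",
    "unit_s": r"[0-9]+[n,m,f,p,μ,MB]?s",
    "code_block": "(?s)" + FENCE + ".+?" + FENCE,
}


def matching_patterns(text):
    """Return the names of the patterns that match somewhere in text."""
    return [name for name, pat in PATTERNS.items() if re.search(pat, text)]


code_sample = FENCE + "\nlet x = 1;\n" + FENCE

for sample in ["'a'", "10x", "250ms", "5MBs", code_sample]:
    print(repr(sample), "->", matching_patterns(sample))
```

In this sketch '5MBs' matches none of the patterns, which illustrates that the character class handles only single-character prefixes. If two-letter units are needed, an alternation such as (?:[nmfpμ]|MB)? would be one way to express that.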

I want to point out that the final rule may not be necessary. I added it because certain files had code in the documentation that was getting picked up by cargo-spellcheck. I'm unsure of the root cause. However, I found that, in those situations, changing the comment type from /// to //! stopped those code blocks from being flagged in the output. One file where this happened is tokio/src/net/windows/named_pipe.rs (around line 2128).

@maminrayej maminrayej added the A-ci Area: The continuous integration setup label Jan 24, 2024
Noticed I had some changes to non-relevant files
that I accidentally committed. This commit
removes those changes, which
just wrapped a type in backticks in two
different files.
Fixes: tokio-rs#6263
This commit removes an extra file that I once again
included through carelessness. It is a
personal library of syscalls that are referenced
in the documentation. I included this in
the base personal library.
Fixes: tokio-rs#6263
@maminrayej maminrayej changed the title from "Spellchecker 6263" to "Add spellchecker" Jan 25, 2024
@arvaer arvaer marked this pull request as ready for review January 25, 2024 00:00
@maminrayej
Member

maminrayej commented Jan 25, 2024

Thanks for the PR and the detailed explanation.

I couldn't pass the spellcheck using the files provided in this PR. Does cargo spellcheck check exit successfully for you?

@arvaer
Author

arvaer commented Jan 25, 2024

@maminrayej the command I use to fire off the spell checker is

$ cargo-spellcheck (run in the root directory)

There are still lots of mistakes, but they're mostly types that have not been escaped in the documentation.

@arvaer arvaer closed this Jan 27, 2024
@arvaer arvaer deleted the spellchecker_6263 branch January 27, 2024 08:23
@Darksonn
Contributor

I'm sorry that we didn't catch it earlier that we had two people working on this. I thought that you were the same person.

I want to thank you for spending your time on contributing to Tokio, even if it didn't work out this time around.
