updating parameter table
Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
AntonEliatra committed Oct 16, 2024
1 parent df93687 commit a331379
Showing 1 changed file, `_analyzers/tokenizers/ngram.md`, with 8 additions and 13 deletions.
```diff
@@ -69,19 +69,14 @@ The response contains the generated tokens:
 
 ## Configuration
 
-The `ngram` tokenizer can be configured with the following parameters:
-
-- `min_gram`: minimum length of n-grams. Default is `1`. (Integer, _Optional_)
-- `max_gram`: maximum length of n-grams. Default is `2`. (Integer, _Optional_)
-- `token_chars`: character classes to be included in tokenization. The following are the possible options:
-  - `letter`
-  - `digit`
-  - `whitespace`
-  - `punctuation`
-  - `symbol`
-  - `custom` (Parameter `custom_token_chars` needs to also be configured in this case)
-  Default is empty list (`[]`) which retains all the characters (List of strings, _Optional_)
-- `custom_token_chars`: custom characters that will be included as part of the tokens. (String, _Optional_)
+The `ngram` tokenizer can be configured with the following parameters.
+
+Parameter | Required/Optional | Data type | Description
+:--- | :--- | :--- | :---
+`min_gram` | Optional | Integer | The minimum length of n-grams. Default is `1`.
+`max_gram` | Optional | Integer | The maximum length of n-grams. Default is `2`.
+`token_chars` | Optional | List of strings | The character classes to include in tokenization. Valid values are:<br>- `letter`<br>- `digit`<br>- `whitespace`<br>- `punctuation`<br>- `symbol`<br>- `custom` (requires `custom_token_chars` to also be configured)<br>Default is an empty list (`[]`), which retains all characters.
+`custom_token_chars` | Optional | String | Custom characters to include as part of the tokens.
 
 ### Maximum difference between `min_gram` and `max_gram`
```
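To make the documented parameters concrete, here is a minimal Python sketch of what an n-gram tokenizer produces for given `min_gram` and `max_gram` values. This is not OpenSearch's implementation; the function name and the `keep` predicate (standing in for a `token_chars` setting of `["letter", "digit"]`) are illustrative assumptions.

```python
def ngram_tokenize(text, min_gram=1, max_gram=2, keep=str.isalnum):
    """Sketch of n-gram tokenization; not OpenSearch code."""
    # Split the input into runs of kept characters; every other
    # character acts as a token boundary (mimicking token_chars).
    runs, current = [], ""
    for ch in text:
        if keep(ch):
            current += ch
        elif current:
            runs.append(current)
            current = ""
    if current:
        runs.append(current)

    # Emit every substring of each run whose length falls
    # in the inclusive range [min_gram, max_gram].
    tokens = []
    for run in runs:
        for start in range(len(run)):
            for length in range(min_gram, max_gram + 1):
                if start + length <= len(run):
                    tokens.append(run[start:start + length])
    return tokens

print(ngram_tokenize("OS 2", min_gram=1, max_gram=2))
# → ['O', 'OS', 'S', '2']
```

With the actual default `token_chars` of `[]`, all characters (including the space) would be retained, so real output for the defaults can differ from this letter-and-digit sketch.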
