Commit e62bf1a

HUSTERGS and gesong.samuel authored
Wildcard field use only 3-gram to index (opensearch-project#17349)
* support 3gram wildcard

Signed-off-by: gesong.samuel <gesong.samuel@bytedance.com>

* add changelog-3

Signed-off-by: gesong.samuel <gesong.samuel@bytedance.com>

* add rolling upgrade test for wildcard field

Signed-off-by: gesong.samuel <gesong.samuel@bytedance.com>

* remove test case added in opensearch-project#16827

Signed-off-by: gesong.samuel <gesong.samuel@bytedance.com>

---------

Signed-off-by: gesong.samuel <gesong.samuel@bytedance.com>
Co-authored-by: gesong.samuel <gesong.samuel@bytedance.com>
1 parent 91a93da commit e62bf1a

7 files changed: +715 -130 lines changed

CHANGELOG-3.0.md

+1
@@ -37,6 +37,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 - Stop minimizing automata used for case-insensitive matches ([#17268](https://github.com/opensearch-project/OpenSearch/pull/17268))
 - Refactor the `:server` module `org.opensearch.client` to `org.opensearch.transport.client` to eliminate top level split packages for JPMS support ([#17272](https://github.com/opensearch-project/OpenSearch/pull/17272))
 - Use Lucene `BM25Similarity` as default since the `LegacyBM25Similarity` is marked as deprecated ([#17306](https://github.com/opensearch-project/OpenSearch/pull/17306))
+- Wildcard field index only 3gram of the input data [#17349](https://github.com/opensearch-project/OpenSearch/pull/17349)
 
 ### Deprecated
 
@@ -0,0 +1,200 @@
+# refactored from rest-api-spec/src/main/resources/rest-api-spec/test/search/270_wildcard_fieldtype_queries.yml
+---
+"search on mixed state":
+  # "term query matches exact value"
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            term:
+              my_field: "AbCd"
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "5" }
+
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            term:
+              my_field.doc_values: "AbCd"
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "5" }
+
+  # term query matches lowercase-normalized value
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            term:
+              my_field.lower: "abcd"
+  - match: { hits.total.value: 2 }
+  - match: { hits.hits.0._id: "5" }
+  - match: { hits.hits.1._id: "7" }
+
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            term:
+              my_field.lower: "ABCD"
+  - match: { hits.total.value: 2 }
+  - match: { hits.hits.0._id: "5" }
+  - match: { hits.hits.1._id: "7" }
+
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            term:
+              my_field: "abcd"
+  - match: { hits.total.value: 0 }
+
+  # wildcard query matches
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            wildcard:
+              my_field:
+                value: "*Node*Exception*"
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "1" }
+
+  # wildcard query matches lowercase-normalized field
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            wildcard:
+              my_field.lower:
+                value: "*node*exception*"
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "1" }
+
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            wildcard:
+              my_field.lower:
+                value: "*NODE*EXCEPTION*"
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "1" }
+
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            wildcard:
+              my_field:
+                value: "*node*exception*"
+  - match: { hits.total.value: 0 }
+
+  # prefix query matches
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            prefix:
+              my_field:
+                value: "[2024-06-08T"
+  - match: { hits.total.value: 3 }
+
+  # regexp query matches
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            regexp:
+              my_field:
+                value: ".*06-08.*cluster-manager node.*"
+  - match: { hits.total.value: 2 }
+
+  # regexp query matches lowercase-normalized field
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            regexp:
+              my_field.lower:
+                value: ".*06-08.*Cluster-Manager Node.*"
+  - match: { hits.total.value: 2 }
+
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            regexp:
+              my_field:
+                value: ".*06-08.*Cluster-Manager Node.*"
+  - match: { hits.total.value: 0 }
+
+  # wildcard match-all works
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            wildcard:
+              my_field:
+                value: "*"
+  - match: { hits.total.value: 6 }
+
+  # regexp match-all works
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            regexp:
+              my_field:
+                value: ".*"
+  - match: { hits.total.value: 6 }
+
+  # terms query on wildcard field matches
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            terms: { my_field: [ "AbCd" ] }
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "5" }
+
+  # case insensitive query on wildcard field
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            wildcard:
+              my_field:
+                value: "AbCd"
+  - match: { hits.total.value: 1 }
+  - match: { hits.hits.0._id: "5" }
+
+  - do:
+      search:
+        index: test
+        body:
+          query:
+            wildcard:
+              my_field:
+                value: "AbCd"
+                case_insensitive: true
+  - match: { hits.total.value: 2 }
+  - match: { hits.hits.0._id: "5" }
+  - match: { hits.hits.1._id: "7" }
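
For readers unfamiliar with the idea behind the commit title, the following is a minimal Python sketch of how a 3-gram index can answer wildcard queries like those in the test above. It illustrates the general technique only and is not the OpenSearch `wildcard` field implementation. The names `trigrams` and `TrigramIndex` and the three sample documents are hypothetical (they are not the fixtures used by the rolling upgrade test). The sketch assumes the usual two-phase approach: the 3-grams of the literal parts of the pattern select candidate documents, and each candidate is then verified against the full pattern.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text (empty if len < 3)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Hypothetical toy index: maps each 3-gram to the ids of documents containing it."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def add(self, doc_id, value):
        self.docs[doc_id] = value
        for gram in trigrams(value):
            self.postings[gram].add(doc_id)

    def wildcard(self, pattern):
        # Phase 1: every 3-gram of every literal chunk must occur in a match.
        required = set()
        for chunk in re.split(r"[*?]", pattern):
            required |= trigrams(chunk)
        candidates = set(self.docs)
        for gram in required:
            candidates &= self.postings.get(gram, set())
        # Phase 2: verify candidates against the whole pattern (case-sensitive).
        regex = re.compile(
            "".join(".*" if c == "*" else "." if c == "?" else re.escape(c)
                    for c in pattern),
            re.DOTALL,
        )
        return sorted(d for d in candidates if regex.fullmatch(self.docs[d]))


# Hypothetical sample data, chosen only to echo the shape of the test queries.
index = TrigramIndex()
index.add("1", "NodeNotConnectedException")
index.add("5", "AbCd")
index.add("7", "ABCD")

print(index.wildcard("*Node*Exception*"))
print(index.wildcard("*node*exception*"))
print(index.wildcard("AbCd"))
print(index.wildcard("*"))
```

Literal chunks shorter than three characters contribute no 3-grams in this sketch, so a pattern such as `*` falls back to verifying every document. The verification phase is what keeps the results exact even when the 3-gram filter admits false candidates.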
