Commit a88f9ed

msfroh authored and wangdongyu.danny committed
Add support for wildcard field type (opensearch-project#13461)
This adds support for the "wildcard" field type that supports efficient execution of wildcard, prefix, and regexp queries by matching first against trigrams (or bigrams or individual characters), then post-filtering by evaluating the original field value against the pattern.

Signed-off-by: Michael Froh <froh@amazon.com>
1 parent e83a146 commit a88f9ed
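
The commit message describes a two-phase strategy: a cheap n-gram check narrows the candidates, then each candidate's original value is evaluated against the full pattern. The Java sketch below models that idea in memory. It is a simplified, hypothetical illustration, not the mapper code added by this commit: it assumes every substring of length 1 to 3 of a value is "indexed", treats `*` and `?` as the only wildcard characters (no escaping), and uses `java.util.regex` for the post-filter. The names `TrigramWildcardSketch`, `indexGrams`, `requiredGrams`, `toRegex`, and `search` are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical, simplified model of two-phase wildcard matching:
// phase 1 filters by n-grams, phase 2 verifies against the full pattern.
class TrigramWildcardSketch {

    // "Index" a value as the set of all its substrings of length 1, 2 and 3.
    static Set<String> indexGrams(String value) {
        Set<String> grams = new HashSet<>();
        for (int n = 1; n <= 3; n++) {
            for (int i = 0; i + n <= value.length(); i++) {
                grams.add(value.substring(i, i + n));
            }
        }
        return grams;
    }

    // Split the pattern on '*' and '?'; each literal run contributes its trigrams,
    // or the run itself when it is shorter than three characters.
    static Set<String> requiredGrams(String pattern) {
        Set<String> required = new HashSet<>();
        for (String literal : pattern.split("[*?]")) {
            if (literal.length() >= 3) {
                for (int i = 0; i + 3 <= literal.length(); i++) {
                    required.add(literal.substring(i, i + 3));
                }
            } else if (!literal.isEmpty()) {
                required.add(literal); // bigram or single character
            }
        }
        return required;
    }

    // Translate the wildcard pattern into an equivalent full-match regex.
    static Pattern toRegex(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    static List<String> search(List<String> values, String pattern) {
        Set<String> required = requiredGrams(pattern);
        Pattern regex = toRegex(pattern);
        List<String> hits = new ArrayList<>();
        for (String value : values) {
            // Phase 1: skip values missing any required gram (cheap, may over-match).
            // A real index would answer this with term lookups, not per-value sets.
            if (!indexGrams(value).containsAll(required)) {
                continue;
            }
            // Phase 2: evaluate the original value against the full pattern.
            if (regex.matcher(value).matches()) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> values = List.of(
            "org.opensearch.transport.NodeDisconnectedException: disconnected",
            "Exception before Node",
            "AbCd");
        // Prints [org.opensearch.transport.NodeDisconnectedException: disconnected]
        System.out.println(search(values, "*Node*Exception*"));
    }
}
```

Here "Exception before Node" contains every required trigram, so it survives phase 1, but the regex rejects it because the two literals appear in the wrong order. Removing exactly this kind of false positive is the job of the post-filter.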

7 files changed, +1601 -1 lines changed

CHANGELOG.md

+1

```diff
@@ -14,6 +14,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 - [Remote Store] Upload translog checkpoint as object metadata to translog.tlog([#13637](https://github.com/opensearch-project/OpenSearch/pull/13637))
 - Add getMetadataFields to MapperService ([#13819](https://github.com/opensearch-project/OpenSearch/pull/13819))
 - [Remote State] Add async remote state deletion task running on an interval, configurable by a setting ([#13131](https://github.com/opensearch-project/OpenSearch/pull/13131))
+- Add "wildcard" field type that supports efficient wildcard, prefix, and regexp queries ([#13461](https://github.com/opensearch-project/OpenSearch/pull/13461))
 - Allow setting query parameters on requests ([#13776](https://github.com/opensearch-project/OpenSearch/issues/13776))
 - Add capability to disable source recovery_source for an index ([#13590](https://github.com/opensearch-project/OpenSearch/pull/13590))
 - Add remote routing table for remote state publication with experimental feature flag ([#13304](https://github.com/opensearch-project/OpenSearch/pull/13304))
```

@@ -0,0 +1,229 @@

```yaml
setup:
  - skip:
      version: " - 2.99.99"
      reason: "Added in 2.15, but need to skip pre-3.0 before backport"

  - do:
      indices.create:
        index: test
        body:
          mappings:
            properties:
              my_field:
                type: wildcard
                fields:
                  lower:
                    type: wildcard
                    normalizer: lowercase
                  doc_values:
                    type: wildcard
                    doc_values: true

  - do:
      index:
        index: test
        id: 1
        body:
          my_field: "org.opensearch.transport.NodeDisconnectedException: [node_s0][127.0.0.1:39953][disconnected] disconnected"
  - do:
      index:
        index: test
        id: 2
        body:
          my_field: "[2024-06-08T06:31:37,443][INFO ][o.o.c.c.Coordinator ] [node_s2] cluster-manager node [{node_s0}{Nj7FjR7hRP2lh_zur8KN_g}{OTGOoWmmSsWP_RQ3tIKJ9g}{127.0.0.1}{127.0.0.1:39953}{imr}{shard_indexing_pressure_enabled=true}] failed, restarting discovery"

  - do:
      index:
        index: test
        id: 3
        body:
          my_field: "[2024-06-08T06:31:37,451][INFO ][o.o.c.s.ClusterApplierService] [node_s2] cluster-manager node changed {previous [{node_s0}{Nj7FjR7hRP2lh_zur8KN_g}{OTGOoWmmSsWP_RQ3tIKJ9g}{127.0.0.1}{127.0.0.1:39953}{imr}{shard_indexing_pressure_enabled=true}], current []}, term: 1, version: 24, reason: becoming candidate: onLeaderFailure"
  - do:
      index:
        index: test
        id: 4
        body:
          my_field: "[2024-06-08T06:31:37,452][WARN ][o.o.c.NodeConnectionsService] [node_s1] failed to connect to {node_s0}{Nj7FjR7hRP2lh_zur8KN_g}{OTGOoWmmSsWP_RQ3tIKJ9g}{127.0.0.1}{127.0.0.1:39953}{imr}{shard_indexing_pressure_enabled=true} (tried [1] times)"
  - do:
      index:
        index: test
        id: 5
        body:
          my_field: "AbCd"
  - do:
      index:
        index: test
        id: 6
        body:
          other_field: "test"
  - do:
      indices.refresh: {}

---
"term query matches exact value":
  - do:
      search:
        index: test
        body:
          query:
            term:
              my_field: "AbCd"
  - match: { hits.total.value: 1 }
  - match: { hits.hits.0._id: "5" }

  - do:
      search:
        index: test
        body:
          query:
            term:
              my_field.doc_values: "AbCd"
  - match: { hits.total.value: 1 }
  - match: { hits.hits.0._id: "5" }

---
"term query matches lowercase-normalized value":
  - do:
      search:
        index: test
        body:
          query:
            term:
              my_field.lower: "abcd"
  - match: { hits.total.value: 1 }
  - match: { hits.hits.0._id: "5" }

  - do:
      search:
        index: test
        body:
          query:
            term:
              my_field.lower: "ABCD"
  - match: { hits.total.value: 1 }
  - match: { hits.hits.0._id: "5" }

  - do:
      search:
        index: test
        body:
          query:
            term:
              my_field: "abcd"
  - match: { hits.total.value: 0 }

---
"wildcard query matches":
  - do:
      search:
        index: test
        body:
          query:
            wildcard:
              my_field:
                value: "*Node*Exception*"
  - match: { hits.total.value: 1 }
  - match: { hits.hits.0._id: "1" }

---
"wildcard query matches lowercase-normalized field":
  - do:
      search:
        index: test
        body:
          query:
            wildcard:
              my_field.lower:
                value: "*node*exception*"
  - match: { hits.total.value: 1 }
  - match: { hits.hits.0._id: "1" }

  - do:
      search:
        index: test
        body:
          query:
            wildcard:
              my_field.lower:
                value: "*NODE*EXCEPTION*"
  - match: { hits.total.value: 1 }
  - match: { hits.hits.0._id: "1" }

  - do:
      search:
        index: test
        body:
          query:
            wildcard:
              my_field:
                value: "*node*exception*"
  - match: { hits.total.value: 0 }

---
"prefix query matches":
  - do:
      search:
        index: test
        body:
          query:
            prefix:
              my_field:
                value: "[2024-06-08T"
  - match: { hits.total.value: 3 }

---
"regexp query matches":
  - do:
      search:
        index: test
        body:
          query:
            regexp:
              my_field:
                value: ".*06-08.*cluster-manager node.*"
  - match: { hits.total.value: 2 }

---
"regexp query matches lowercase-normalized field":
  - do:
      search:
        index: test
        body:
          query:
            regexp:
              my_field.lower:
                value: ".*06-08.*Cluster-Manager Node.*"
  - match: { hits.total.value: 2 }

  - do:
      search:
        index: test
        body:
          query:
            regexp:
              my_field:
                value: ".*06-08.*Cluster-Manager Node.*"
  - match: { hits.total.value: 0 }

---
"wildcard match-all works":
  - do:
      search:
        index: test
        body:
          query:
            wildcard:
              my_field:
                value: "*"
  - match: { hits.total.value: 5 }
---
"regexp match-all works":
  - do:
      search:
        index: test
        body:
          query:
            regexp:
              my_field:
                value: ".*"
  - match: { hits.total.value: 5 }
```
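
The tests above pin down the user-visible semantics: `my_field` matches case-sensitively; `my_field.lower` normalizes both the indexed values and the query input (so `"ABCD"` and `"*NODE*EXCEPTION*"` still match); and match-all queries count only documents that have a value, which is why document 6 (only `other_field`) leaves 5 hits. The Java sketch below models those expectations so the counts can be checked by hand. It is an assumption-laden stand-in, not OpenSearch code: the `lowercase` normalizer is approximated by `toLowerCase(Locale.ROOT)`, the stored values are abbreviated copies of documents 1 to 6, the n-gram phase is omitted, and `java.util.regex` substitutes for Lucene's regular-expression syntax. `WildcardFieldTestModel` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical model of the expected hit counts; not OpenSearch code.
class WildcardFieldTestModel {

    // Abbreviated copies of documents 1-6; document 6 has no my_field (null).
    static final List<String> DOCS = Arrays.asList(
        "org.opensearch.transport.NodeDisconnectedException: [node_s0] disconnected",
        "[2024-06-08T06:31:37,443][INFO ][o.o.c.c.Coordinator ] [node_s2] cluster-manager node [{node_s0}] failed, restarting discovery",
        "[2024-06-08T06:31:37,451][INFO ][o.o.c.s.ClusterApplierService] [node_s2] cluster-manager node changed",
        "[2024-06-08T06:31:37,452][WARN ][o.o.c.NodeConnectionsService] [node_s1] failed to connect to {node_s0}",
        "AbCd",
        null);

    // Assumption: the "lowercase" normalizer behaves like toLowerCase(Locale.ROOT).
    static String normalize(String s, boolean lower) {
        return lower ? s.toLowerCase(Locale.ROOT) : s;
    }

    // Counts documents that have a value whose (possibly normalized) form passes the test.
    static int count(boolean lower, Predicate<String> matches) {
        int hits = 0;
        for (String doc : DOCS) {
            if (doc != null && matches.test(normalize(doc, lower))) {
                hits++;
            }
        }
        return hits;
    }

    static int term(boolean lower, String value) {
        String q = normalize(value, lower);
        return count(lower, v -> v.equals(q));
    }

    static int prefix(boolean lower, String value) {
        String q = normalize(value, lower);
        return count(lower, v -> v.startsWith(q));
    }

    // java.util.regex stands in for Lucene's RegExp syntax in this sketch.
    static int regexp(boolean lower, String regex) {
        Pattern p = Pattern.compile(normalize(regex, lower), Pattern.DOTALL);
        return count(lower, v -> p.matcher(v).matches());
    }

    static int wildcard(boolean lower, String pattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : normalize(pattern, lower).toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern p = Pattern.compile(sb.toString(), Pattern.DOTALL);
        return count(lower, v -> p.matcher(v).matches());
    }

    public static void main(String[] args) {
        System.out.println(term(false, "AbCd"));                             // 1
        System.out.println(term(true, "ABCD"));                              // 1
        System.out.println(term(false, "abcd"));                             // 0
        System.out.println(wildcard(true, "*NODE*EXCEPTION*"));              // 1
        System.out.println(prefix(false, "[2024-06-08T"));                   // 3
        System.out.println(regexp(true, ".*06-08.*Cluster-Manager Node.*")); // 2
        System.out.println(wildcard(false, "*"));                            // 5
    }
}
```

Lowercasing a whole regular expression, as `regexp` does here, is a shortcut: it can turn escapes such as `\W` into `\w`. The sketch only needs to reproduce the test patterns, which contain no such escapes.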

server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java

+1 -1

```diff
@@ -703,7 +703,7 @@ protected void parseCreateField(ParseContext context) throws IOException {
         }
     }
 
-    private static String normalizeValue(NamedAnalyzer normalizer, String field, String value) throws IOException {
+    static String normalizeValue(NamedAnalyzer normalizer, String field, String value) throws IOException {
         try (TokenStream ts = normalizer.tokenStream(field, value)) {
             final CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
             ts.reset();
```