Commit 7bd2b33

Add semantic field mapper to simplify neural search set up.
Signed-off-by: Bo Zhang <bzhangam@amazon.com>
1 parent 57124dd commit 7bd2b33

20 files changed

+1549
-13
lines changed

CHANGELOG.md

+1
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
55

66
## [Unreleased 3.x](https://github.com/opensearch-project/neural-search/compare/main...HEAD)
77
### Features
8+
- Support semantic field type to simplify neural search set up ([#1225](https://github.com/opensearch-project/neural-search/pull/1225)).
89
- Lower bound for min-max normalization technique in hybrid query ([#1195](https://github.com/opensearch-project/neural-search/pull/1195))
910
### Enhancements
1011
### Bug Fixes

DEVELOPER_GUIDE.md

+12-12
Original file line numberDiff line numberDiff line change
@@ -351,9 +351,9 @@ through the same build issue.
351351

352352
### Class and package names
353353

354-
Class names should use `CamelCase`.
354+
Class names should use `CamelCase`.
355355

356-
Try to put new classes into existing packages if package name abstracts the purpose of the class.
356+
Try to put new classes into existing packages if package name abstracts the purpose of the class.
357357

358358
Example of good class file name and package utilization:
359359

@@ -371,7 +371,7 @@ methods rather than a long single one and does everything.
371371
### Documentation
372372

373373
Document your code. That includes the purpose of new classes, every public method, and code sections that have critical or non-trivial
374-
logic (check this example https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/query/NeuralQueryBuilder.java#L238).
374+
logic (check this example https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/query/NeuralQueryBuilder.java#L238).
375375

376376
When you submit a feature PR, please submit a new
377377
[documentation issue](https://github.com/opensearch-project/documentation-website/issues/new/choose). This is a path for the documentation to be published as part of https://opensearch.org/docs/latest/ documentation site.
@@ -384,17 +384,17 @@ For the most part, we're using common conventions for Java projects. Here are a
384384

385385
1. Use descriptive names for classes, methods, fields, and variables.
386386
2. Avoid abbreviations unless they are widely accepted
387-
3. Use `final` on all method arguments unless it's absolutely necessary
387+
3. Use `final` on all method arguments unless it's absolutely necessary
388388
4. Wildcard imports are not allowed.
389389
5. Static imports are preferred over qualified imports when using static methods
390390
6. Prefer creating non-static public methods whenever possible. Avoid static methods in general, as they can often serve as shortcuts.
391391
Static methods are acceptable if they are private and do not access class state.
392-
7. Use functional programming style inside methods unless it's a performance critical section.
392+
7. Use functional programming style inside methods unless it's a performance critical section.
393393
8. For lambda expression parameters, use meaningful names instead of shortened, cryptic ones.
394394
9. Use Optional for return values if the value may not be present. This should be preferred to returning null.
395395
10. Do not create checked exceptions, and do not throw checked exceptions from public methods whenever possible. In general, if you call a method with a checked exception, you should wrap that exception into an unchecked exception.
396396
11. Throwing checked exceptions from private methods is acceptable.
397-
12. Use String.format when a string includes parameters, and prefer this over direct string concatenation. Always specify a Locale with String.format;
397+
12. Use String.format when a string includes parameters, and prefer this over direct string concatenation. Always specify a Locale with String.format;
398398
as a rule of thumb, use Locale.ROOT.
399399
13. Prefer Lombok annotations to the manually written boilerplate code
400400
14. When throwing an exception, avoid including user-provided content in the exception message. For secure coding practices,
@@ -440,17 +440,17 @@ Fix any new warnings before submitting your PR to ensure proper code documentati
440440

441441
### Tests
442442

443-
Write unit and integration tests for your new functionality.
443+
Write unit and integration tests for your new functionality.
444444

445445
Unit tests are preferred as they are cheap and fast, try to use them to cover all possible
446-
combinations of parameters. Utilize mocks to mimic dependencies.
446+
combinations of parameters. Utilize mocks to mimic dependencies.
447447

448-
Integration tests should be used sparingly, focusing primarily on the main (happy path) scenario or cases where extensive
449-
mocking is impractical. Include one or two unhappy paths to confirm that correct response codes are returned to the user.
450-
Whenever possible, favor scenarios that do not require model deployment. If model deployment is necessary, use an existing
448+
Integration tests should be used sparingly, focusing primarily on the main (happy path) scenario or cases where extensive
449+
mocking is impractical. Include one or two unhappy paths to confirm that correct response codes are returned to the user.
450+
Whenever possible, favor scenarios that do not require model deployment. If model deployment is necessary, use an existing
451451
model, as tests involving new model deployments are the most resource-intensive.
452452

453-
If your changes could affect backward compatibility, please include relevant backward compatibility tests along with your
453+
If your changes could affect backward compatibility, please include relevant backward compatibility tests along with your
454454
PR. For guidance on adding these tests, refer to the [Backwards Compatibility Testing](#backwards-compatibility-testing) section in this guide.
455455

456456
### Outdated or irrelevant code
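
Several of the coding conventions in the DEVELOPER_GUIDE.md hunks above (rules 3, 9, 10, 12, and 14) can be illustrated together. The class below is a hypothetical sketch written for this purpose, not code from the repository: `ModelRegistrySketch`, its methods, and the message texts are all assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical class, used only to illustrate the conventions.
class ModelRegistrySketch {
    private final Map<String, Integer> dimensionsByModelId;

    // Rule 3: method arguments are final.
    ModelRegistrySketch(final Map<String, Integer> dimensionsByModelId) {
        this.dimensionsByModelId = Map.copyOf(dimensionsByModelId);
    }

    // Rule 9: return Optional instead of null when the value may be absent.
    Optional<Integer> findDimension(final String modelId) {
        return Optional.ofNullable(dimensionsByModelId.get(modelId));
    }

    // Rule 12: String.format with an explicit Locale.ROOT instead of concatenation.
    String describe(final String modelId) {
        return findDimension(modelId)
            .map(dimension -> String.format(Locale.ROOT, "model [%s] has dimension %d", modelId, dimension))
            .orElseGet(() -> String.format(Locale.ROOT, "model [%s] is not registered", modelId));
    }

    // Rule 10: wrap the checked IOException into an unchecked exception.
    // Rule 14: the message does not echo the caller-provided path.
    String readConfig(final Path path) {
        try {
            return Files.readString(path);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to read the model config file", e);
        }
    }
}
```

Callers of `readConfig` do not need a `throws` clause, and the original `IOException` is still available as the cause.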

build.gradle

+1
Original file line numberDiff line numberDiff line change
@@ -250,6 +250,7 @@ def knnJarDirectory = "$buildDir/dependencies/opensearch-knn"
250250

251251
dependencies {
252252
api "org.opensearch:opensearch:${opensearch_version}"
253+
implementation group: 'org.opensearch.plugin', name:'mapper-extras-client', version: "${opensearch_version}"
253254
zipArchive group: 'org.opensearch.plugin', name:'opensearch-job-scheduler', version: "${opensearch_build}"
254255
zipArchive group: 'org.opensearch.plugin', name:'opensearch-knn', version: "${opensearch_build}"
255256
zipArchive group: 'org.opensearch.plugin', name:'opensearch-ml-plugin', version: "${opensearch_build}"
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
/*
2+
* Copyright OpenSearch Contributors
3+
* SPDX-License-Identifier: Apache-2.0
4+
*/
5+
package org.opensearch.neuralsearch.constants;
6+
7+
/**
8+
* Constants related to the index mapping.
9+
*/
10+
public class MappingConstants {
11+
/**
12+
* Name for the field type. In index mapping we use this key to define the field type.
13+
*/
14+
public static final String TYPE = "type";
15+
}
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
/*
2+
* Copyright OpenSearch Contributors
3+
* SPDX-License-Identifier: Apache-2.0
4+
*/
5+
package org.opensearch.neuralsearch.constants;
6+
7+
/**
8+
* Constants for semantic field
9+
*/
10+
public class SemanticFieldConstants {
11+
/**
12+
* Name of the model id parameter. We use this key to define the id of the ML model that we will use for the
13+
* semantic field.
14+
*/
15+
public static final String MODEL_ID = "model_id";
16+
17+
/**
18+
* Name of the search model id parameter. We use this key to define the id of the ML model that we will use to
19+
* run inference on the query text during the search. If this parameter is not defined, we will use the model_id instead.
20+
*/
21+
public static final String SEARCH_MODEL_ID = "search_model_id";
22+
23+
/**
24+
* Name of the raw field type parameter. We use this key to define the field type for the raw data. It will control
25+
* how to store and query the raw data.
26+
*/
27+
public static final String RAW_FIELD_TYPE = "raw_field_type";
28+
29+
/**
30+
* Name of the semantic info field name parameter. We use this key to define a custom field name for the semantic info.
31+
*/
32+
public static final String SEMANTIC_INFO_FIELD_NAME = "semantic_info_field_name";
33+
}
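
The Javadoc on `SEARCH_MODEL_ID` describes a fallback to `model_id` when no search model is defined. The lines shown here do not include the code that applies that rule, so the following is only a minimal sketch of it. `SearchModelResolverSketch` and `resolveSearchModelId` are hypothetical names, and treating only `null` as "not defined" is an assumption.

```java
import java.util.Optional;

// Hypothetical helper; the semantic field constants only document this fallback.
class SearchModelResolverSketch {
    /**
     * Returns the model to use for query-time inference:
     * search_model_id when it is set, otherwise model_id.
     */
    static String resolveSearchModelId(final String modelId, final String searchModelId) {
        return Optional.ofNullable(searchModelId).orElse(modelId);
    }
}
```

With this rule, a mapping that sets only `model_id` uses the same model for ingestion and search, while a mapping that sets both can use a separate model at query time.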
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,267 @@
1+
/*
2+
* Copyright OpenSearch Contributors
3+
* SPDX-License-Identifier: Apache-2.0
4+
*/
5+
package org.opensearch.neuralsearch.mapper;
6+
7+
import lombok.Getter;
8+
import lombok.Setter;
9+
import org.opensearch.core.xcontent.XContentBuilder;
10+
import org.opensearch.index.mapper.BinaryFieldMapper;
11+
import org.opensearch.index.mapper.KeywordFieldMapper;
12+
import org.opensearch.index.mapper.MappedFieldType;
13+
import org.opensearch.index.mapper.Mapper;
14+
import org.opensearch.index.mapper.MapperParsingException;
15+
import org.opensearch.index.mapper.MatchOnlyTextFieldMapper;
16+
import org.opensearch.index.mapper.ParametrizedFieldMapper;
17+
import org.opensearch.index.mapper.ParseContext;
18+
import org.opensearch.index.mapper.TextFieldMapper;
19+
import org.opensearch.index.mapper.TokenCountFieldMapper;
20+
import org.opensearch.index.mapper.WildcardFieldMapper;
21+
import org.opensearch.neuralsearch.constants.MappingConstants;
22+
import org.opensearch.neuralsearch.mapper.semanticfieldtypes.SemanticFieldTypeFactory;
23+
import org.opensearch.neuralsearch.mapper.semanticfieldtypes.SemanticParameters;
24+
25+
import java.io.IOException;
26+
import java.util.HashMap;
27+
import java.util.List;
28+
import java.util.Map;
29+
import java.util.Set;
30+
31+
import static org.opensearch.neuralsearch.constants.SemanticFieldConstants.MODEL_ID;
32+
import static org.opensearch.neuralsearch.constants.SemanticFieldConstants.RAW_FIELD_TYPE;
33+
import static org.opensearch.neuralsearch.constants.SemanticFieldConstants.SEARCH_MODEL_ID;
34+
import static org.opensearch.neuralsearch.constants.SemanticFieldConstants.SEMANTIC_INFO_FIELD_NAME;
35+
36+
/**
37+
* FieldMapper for the semantic field. It will hold a delegate field mapper to delegate the data parsing and query work
38+
* based on the raw_field_type.
39+
*/
40+
public class SemanticFieldMapper extends ParametrizedFieldMapper {
41+
public static final String CONTENT_TYPE = "semantic";
42+
private final SemanticParameters semanticParameters;
43+
44+
@Setter
45+
@Getter
46+
private ParametrizedFieldMapper delegateFieldMapper;
47+
48+
protected SemanticFieldMapper(
49+
String simpleName,
50+
MappedFieldType mappedFieldType,
51+
MultiFields multiFields,
52+
CopyTo copyTo,
53+
ParametrizedFieldMapper delegateFieldMapper,
54+
SemanticParameters semanticParameters
55+
) {
56+
super(simpleName, mappedFieldType, multiFields, copyTo);
57+
this.delegateFieldMapper = delegateFieldMapper;
58+
this.semanticParameters = semanticParameters;
59+
}
60+
61+
@Override
62+
public Builder getMergeBuilder() {
63+
Builder semanticFieldMapperBuilder = (Builder) new Builder(simpleName(), SemanticFieldTypeFactory.getInstance()).init(this);
64+
ParametrizedFieldMapper.Builder delegateBuilder = delegateFieldMapper.getMergeBuilder();
65+
semanticFieldMapperBuilder.setDelegateBuilder(delegateBuilder);
66+
return semanticFieldMapperBuilder;
67+
}
68+
69+
@Override
70+
public final ParametrizedFieldMapper merge(Mapper mergeWith) {
71+
if (mergeWith instanceof SemanticFieldMapper) {
72+
try {
73+
delegateFieldMapper = delegateFieldMapper.merge(((SemanticFieldMapper) mergeWith).delegateFieldMapper);
74+
} catch (IllegalArgumentException e) {
75+
String err = "Failed to update the mapper ["
76+
+ this.name()
77+
+ "] because failed to update the delegate "
78+
+ "mapper for the raw_field_type "
79+
+ this.semanticParameters.getRawFieldType()
80+
+ ". "
81+
+ e.getMessage();
82+
throw new IllegalArgumentException(err, e);
83+
}
84+
}
85+
return super.merge(mergeWith);
86+
}
87+
88+
@Override
89+
protected void parseCreateField(ParseContext context) throws IOException {
90+
delegateFieldMapper.parse(context);
91+
}
92+
93+
@Override
94+
protected String contentType() {
95+
return CONTENT_TYPE;
96+
}
97+
98+
public static class Builder extends ParametrizedFieldMapper.Builder {
99+
@Getter
100+
protected final Parameter<String> modelId = Parameter.stringParam(
101+
MODEL_ID,
102+
true,
103+
m -> ((SemanticFieldMapper) m).semanticParameters.getModelId(),
104+
null
105+
);
106+
@Getter
107+
protected final Parameter<String> searchModelId = Parameter.stringParam(
108+
SEARCH_MODEL_ID,
109+
true,
110+
m -> ((SemanticFieldMapper) m).semanticParameters.getSearchModelId(),
111+
null
112+
);
113+
@Getter
114+
protected final Parameter<String> rawFieldType = Parameter.stringParam(
115+
RAW_FIELD_TYPE,
116+
false,
117+
m -> ((SemanticFieldMapper) m).semanticParameters.getRawFieldType(),
118+
TextFieldMapper.CONTENT_TYPE
119+
);
120+
@Getter
121+
protected final Parameter<String> semanticInfoFieldName = Parameter.stringParam(
122+
SEMANTIC_INFO_FIELD_NAME,
123+
false,
124+
m -> ((SemanticFieldMapper) m).semanticParameters.getSemanticInfoFieldName(),
125+
null
126+
);
127+
128+
@Setter
129+
protected ParametrizedFieldMapper.Builder delegateBuilder;
130+
private final SemanticFieldTypeFactory semanticFieldTypeFactory;
131+
132+
protected Builder(String name, SemanticFieldTypeFactory semanticFieldTypeFactory) {
133+
super(name);
134+
this.semanticFieldTypeFactory = semanticFieldTypeFactory;
135+
}
136+
137+
@Override
138+
protected List<Parameter<?>> getParameters() {
139+
return List.of(modelId, searchModelId, rawFieldType, semanticInfoFieldName);
140+
}
141+
142+
@Override
143+
public SemanticFieldMapper build(BuilderContext context) {
144+
final ParametrizedFieldMapper delegateMapper = delegateBuilder.build(context);
145+
146+
final SemanticParameters semanticParameters = this.getSemanticParameters();
147+
final MappedFieldType semanticFieldType = semanticFieldTypeFactory.createSemanticFieldType(
148+
delegateMapper,
149+
rawFieldType.getValue(),
150+
semanticParameters
151+
);
152+
153+
return new SemanticFieldMapper(
154+
name,
155+
semanticFieldType,
156+
multiFieldsBuilder.build(this, context),
157+
copyTo.build(),
158+
delegateMapper,
159+
semanticParameters
160+
);
161+
}
162+
163+
public SemanticParameters getSemanticParameters() {
164+
return new SemanticParameters(
165+
modelId.getValue(),
166+
searchModelId.getValue(),
167+
rawFieldType.getValue(),
168+
semanticInfoFieldName.getValue()
169+
);
170+
}
171+
}
172+
173+
public static class TypeParser implements Mapper.TypeParser {
174+
175+
private static final Set<String> SUPPORTED_RAW_FIELD_TYPE = Set.of(
176+
TextFieldMapper.CONTENT_TYPE,
177+
KeywordFieldMapper.CONTENT_TYPE,
178+
MatchOnlyTextFieldMapper.CONTENT_TYPE,
179+
WildcardFieldMapper.CONTENT_TYPE,
180+
TokenCountFieldMapper.CONTENT_TYPE,
181+
BinaryFieldMapper.CONTENT_TYPE
182+
);
183+
184+
@Override
185+
public Builder parse(String name, Map<String, Object> node, ParserContext parserContext) throws MapperParsingException {
186+
final String rawFieldType = (String) node.getOrDefault(RAW_FIELD_TYPE, TextFieldMapper.CONTENT_TYPE);
187+
188+
validateRawFieldType(rawFieldType);
189+
190+
final ParametrizedFieldMapper.TypeParser typeParser = (ParametrizedFieldMapper.TypeParser) parserContext.typeParser(
191+
rawFieldType
192+
);
193+
final Builder semanticFieldMapperBuilder = new Builder(name, SemanticFieldTypeFactory.getInstance());
194+
195+
// The semantic field mapper builder parses the semantic parameters.
196+
Map<String, Object> semanticConfig = extractSemanticConfig(node, semanticFieldMapperBuilder.getParameters(), rawFieldType);
197+
semanticFieldMapperBuilder.parse(name, parserContext, semanticConfig);
198+
199+
// The delegate field mapper builder parses the remaining parameters.
200+
ParametrizedFieldMapper.Builder delegateBuilder = typeParser.parse(name, node, parserContext);
201+
semanticFieldMapperBuilder.setDelegateBuilder(delegateBuilder);
202+
203+
return semanticFieldMapperBuilder;
204+
}
205+
206+
private void validateRawFieldType(final String rawFieldType) {
207+
if (rawFieldType == null || !SUPPORTED_RAW_FIELD_TYPE.contains(rawFieldType)) {
208+
throw new IllegalArgumentException(
209+
RAW_FIELD_TYPE
210+
+ ": ["
211+
+ rawFieldType
212+
+ "] is not supported. It "
213+
+ "should be one of ["
214+
+ String.join(", ", SUPPORTED_RAW_FIELD_TYPE)
215+
+ "]"
216+
);
217+
}
218+
}
219+
220+
/**
221+
* In this function we will extract all the parameters defined in the semantic field mapper builder and parse them
222+
* later. The remaining parameters will be processed by the type parser of the raw field type. Here we cannot
223+
* pass the parameters defined by semantic field to the delegate type parser of the raw field type because it
224+
* cannot recognize them.
225+
* @param node field config
226+
* @param parameters parameters for semantic field
227+
* @param rawFieldType field type of the raw data
228+
* @return semantic field config
229+
*/
230+
private Map<String, Object> extractSemanticConfig(Map<String, Object> node, List<Parameter<?>> parameters, String rawFieldType) {
231+
final Map<String, Object> semanticConfig = new HashMap<>();
232+
for (Parameter<?> parameter : parameters) {
233+
Object config = node.get(parameter.name);
234+
if (config != null) {
235+
semanticConfig.put(parameter.name, config);
236+
node.remove(parameter.name);
237+
}
238+
}
239+
semanticConfig.put(MappingConstants.TYPE, SemanticFieldMapper.CONTENT_TYPE);
240+
node.put(MappingConstants.TYPE, rawFieldType);
241+
return semanticConfig;
242+
}
243+
}
244+
245+
@Override
246+
protected void doXContentBody(XContentBuilder builder, boolean includeDefaults, Params params) throws IOException {
247+
builder.field(MappingConstants.TYPE, contentType());
248+
249+
// semantic parameters
250+
final List<Parameter<?>> parameters = getMergeBuilder().getParameters();
251+
for (Parameter<?> parameter : parameters) {
252+
// By default, we will not return the default value. But raw_field_type is useful info to let users know how
253+
// we will handle the raw data. So we explicitly return it even if it is using the default value.
254+
if (RAW_FIELD_TYPE.equals(parameter.name)) {
255+
parameter.toXContent(builder, true);
256+
} else {
257+
parameter.toXContent(builder, includeDefaults);
258+
}
259+
}
260+
261+
// non-semantic parameters
262+
// semantic field mapper itself does not handle multi fields or copy to. The delegate field mapper will handle it.
263+
delegateFieldMapper.multiFields().toXContent(builder, params);
264+
delegateFieldMapper.copyTo().toXContent(builder, params);
265+
delegateFieldMapper.getMergeBuilder().toXContent(builder, includeDefaults);
266+
}
267+
}
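
The parameter split performed by `TypeParser.parse` and `extractSemanticConfig` can be tried without OpenSearch on the classpath. The following is a minimal sketch, not the plugin's code: plain maps stand in for the mapper types, `SemanticConfigSplitSketch` and `split` are hypothetical names, and the literal type names are assumed to match the `CONTENT_TYPE` constants of the mappers listed in `SUPPORTED_RAW_FIELD_TYPE`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for SemanticFieldMapper.TypeParser.
class SemanticConfigSplitSketch {
    static final String TYPE = "type";
    static final String RAW_FIELD_TYPE = "raw_field_type";
    static final String SEMANTIC = "semantic";
    static final String DEFAULT_RAW_FIELD_TYPE = "text";

    // Parameter names owned by the semantic field (see SemanticFieldConstants).
    static final List<String> SEMANTIC_PARAMETERS = List.of(
        "model_id", "search_model_id", RAW_FIELD_TYPE, "semantic_info_field_name"
    );

    // Assumed literal values of the supported CONTENT_TYPE constants.
    static final Set<String> SUPPORTED_RAW_FIELD_TYPES = Set.of(
        "text", "keyword", "match_only_text", "wildcard", "token_count", "binary"
    );

    /**
     * Moves the semantic parameters out of the mutable map {@code node} and returns them.
     * Afterwards {@code node} only holds parameters for the delegate type parser,
     * and its "type" entry is rewritten to the raw field type.
     */
    static Map<String, Object> split(final Map<String, Object> node) {
        final Object rawFieldType = node.getOrDefault(RAW_FIELD_TYPE, DEFAULT_RAW_FIELD_TYPE);
        // Null check first: Set.of(...).contains(null) throws a NullPointerException.
        if (rawFieldType == null || !SUPPORTED_RAW_FIELD_TYPES.contains(rawFieldType)) {
            throw new IllegalArgumentException(
                String.format(
                    Locale.ROOT,
                    "%s is not supported. It should be one of %s",
                    RAW_FIELD_TYPE,
                    new TreeSet<>(SUPPORTED_RAW_FIELD_TYPES)
                )
            );
        }
        final Map<String, Object> semanticConfig = new HashMap<>();
        for (final String parameter : SEMANTIC_PARAMETERS) {
            final Object config = node.get(parameter);
            if (config != null) {
                semanticConfig.put(parameter, config);
                node.remove(parameter);
            }
        }
        semanticConfig.put(TYPE, SEMANTIC);
        node.put(TYPE, rawFieldType);
        return semanticConfig;
    }
}
```

For a field config with `type: semantic`, `model_id: my-model`, `raw_field_type: keyword`, and `ignore_above: 256`, the returned map holds `type`, `model_id`, and `raw_field_type`, while `node` is left with `type: keyword` and `ignore_above: 256`. This mirrors why the mapper can reuse an existing type parser: the delegate never sees parameters it cannot recognize.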
