-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve geocoder matches for numeric adjectives #5997
Improve geocoder matches for numeric adjectives #5997
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## dev-2.x #5997 +/- ##
===========================================
Coverage 69.73% 69.73%
- Complexity 17297 17317 +20
===========================================
Files 1954 1960 +6
Lines 74160 74263 +103
Branches 7595 7603 +8
===========================================
+ Hits 51717 51791 +74
- Misses 19806 19832 +26
- Partials 2637 2640 +3 ☔ View full report in Codecov by Sentry. |
acf9ada
to
24975b4
Compare
|
||
@ParameterizedTest | ||
@ValueSource( | ||
strings = { "Meridian Ave N & N 148th", "Meridian Ave N & N 148", "Meridian Ave N N 148" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, is this supposed to work with "Meridian & N 148"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to check this, but the answer is yes!
I wrote a few more test cases to make sure the it's really the case.
8254797
to
e471aaa
Compare
dc686f9
to
45d6b68
Compare
45d6b68
to
edea05e
Compare
Today I had an idea how I can solve this problem even better: I am now stripping the number suffixes like "th" from the numbers so that "148th" becomes just "148". This is really effective at matching numbers (even one or two digit ones) without having to become too fuzzy for regular text search. This means that "arts" no longer matches "Arthur Place" so a good solution all around. |
70f99d4
to
f0a2c6c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works well in the Atlanta Midtown area.
src/ext/java/org/opentripplanner/ext/geocoder/EnglishNGramAnalyzer.java
Outdated
Show resolved
Hide resolved
a8573c7
to
bf02dae
Compare
Improve geocoder matches for numeric adjectives
Summary
This PR improves the sandbox geocoder by shortening the minimal n-gram to 3 from 4.
This is so that the stop name of "Meridian Ave N & N 148th St" is matched when the users search for "Meridian Ave N & N 148". Previously "148th" would only be tokenised to "148t" but not "148".
This also changes one of the regression test cases: When you search for "arts" you now also get "arthur place" which before you didn't. @miles-grant-ibigroup said that this is fine.Unit tests
Added.