Improve geocoder matches for numeric adjectives #5997

Merged
leonardehrenfried merged 11 commits into opentripplanner:dev-2.x from th-geocoder on Aug 9, 2024

Conversation

leonardehrenfried
Member

@leonardehrenfried leonardehrenfried commented Aug 6, 2024

Summary

This PR improves the sandbox geocoder by reducing the minimum n-gram length from 4 to 3.

This is so that the stop name "Meridian Ave N & N 148th St" is matched when users search for "Meridian Ave N & N 148". Previously, with a minimum n-gram length of 4, "148th" was tokenised to "148t" but never to "148".
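
To illustrate why the minimum length matters, here is a minimal sketch of prefix (edge) n-gram generation in plain Java. It is not the geocoder's actual analyzer, which this PR description does not show. The class name `EdgeNGramSketch`, the method `edgeNGrams`, the maximum length of 10, and the assumption that only prefixes are generated are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the code used by the sandbox geocoder.
class EdgeNGramSketch {

  // Returns every prefix of `token` whose length lies between minGram and
  // maxGram (inclusive), capped at the length of the token itself.
  static List<String> edgeNGrams(String token, int minGram, int maxGram) {
    List<String> grams = new ArrayList<>();
    int upper = Math.min(maxGram, token.length());
    for (int len = minGram; len <= upper; len++) {
      grams.add(token.substring(0, len));
    }
    return grams;
  }

  public static void main(String[] args) {
    // Minimum length 4: the shortest indexed prefix is "148t".
    System.out.println(edgeNGrams("148th", 4, 10)); // [148t, 148th]
    // Minimum length 3: "148" is now indexed as well.
    System.out.println(edgeNGrams("148th", 3, 10)); // [148, 148t, 148th]
  }
}
```

Under these assumptions, a query for "148" can only match an indexed term exactly once the minimum length drops to 3.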

This also changes one of the regression test cases: a search for "arts" now also returns "arthur place", which it previously did not. @miles-grant-ibigroup said that this is fine.

Unit tests

Added.


codecov bot commented Aug 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.73%. Comparing base (14e7b3a) to head (bf02dae).
Report is 32 commits behind head on dev-2.x.

Additional details and impacted files
@@             Coverage Diff             @@
##             dev-2.x    #5997    +/-   ##
===========================================
  Coverage      69.73%   69.73%            
- Complexity     17297    17317    +20     
===========================================
  Files           1954     1960     +6     
  Lines          74160    74263   +103     
  Branches        7595     7603     +8     
===========================================
+ Hits           51717    51791    +74     
- Misses         19806    19832    +26     
- Partials        2637     2640     +3     



@ParameterizedTest
@ValueSource(
  strings = { "Meridian Ave N & N 148th", "Meridian Ave N & N 148", "Meridian Ave N N 148" }
)
Contributor

Just curious, is this supposed to work with "Meridian & N 148"?

Member Author

I had to check this, but the answer is yes!

I wrote a few more test cases to make sure that it's really the case.

@leonardehrenfried
Member Author

Today I had an idea for solving this problem even better: I am now stripping number suffixes like "th" from numbers so that "148th" becomes just "148".

This is really effective at matching numbers (even one or two digit ones) without having to become too fuzzy for regular text search.

This means that "arts" no longer matches "Arthur Place", so it is a good solution all around.
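
The suffix stripping described above could be sketched roughly as follows. This is an assumption-laden illustration, not the code merged in this PR: the class `NumberSuffixSketch`, the method `stripNumberSuffixes`, and the regular expression (digits followed by the English suffixes "st", "nd", "rd" or "th") are all hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; the merged implementation may differ.
class NumberSuffixSketch {

  // Assumed pattern: a run of digits followed by an English ordinal suffix,
  // matched as a whole word so that ordinary words are left untouched.
  private static final Pattern ORDINAL = Pattern.compile(
    "\\b(\\d+)(?:st|nd|rd|th)\\b",
    Pattern.CASE_INSENSITIVE
  );

  // Replaces each matched ordinal with just its digits, e.g. "148th" -> "148".
  static String stripNumberSuffixes(String text) {
    return ORDINAL.matcher(text).replaceAll("$1");
  }

  public static void main(String[] args) {
    System.out.println(stripNumberSuffixes("Meridian Ave N & N 148th St"));
    // prints "Meridian Ave N & N 148 St"
  }
}
```

Because only the digit-plus-suffix tokens are rewritten, short numbers such as "1st" reduce to "1" while names like "Arthur Place" pass through unchanged, which is consistent with the behaviour reported in the comment.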

@leonardehrenfried changed the title from "Improve fuzziness of geocoder" to "Improve geocoder matches for number suffixes" on Aug 7, 2024
Contributor

@binh-dam-ibigroup binh-dam-ibigroup left a comment

Works well in the Atlanta Midtown area.

@leonardehrenfried changed the title from "Improve geocoder matches for number suffixes" to "Improve geocoder matches for numeric adjectives" on Aug 9, 2024
@leonardehrenfried leonardehrenfried merged commit ebdb572 into opentripplanner:dev-2.x Aug 9, 2024
5 checks passed
@leonardehrenfried leonardehrenfried deleted the th-geocoder branch August 9, 2024 13:03
habrahamsson-skanetrafiken pushed a commit to Skanetrafiken/OpenTripPlanner that referenced this pull request Aug 21, 2024
Improve geocoder matches for numeric adjectives
@t2gran t2gran added this to the 2.6 milestone Sep 18, 2024
3 participants