Add rolling mean explainer and update README #109

Merged
merged 6 commits into from
Feb 26, 2025

dfsnow
Member

@dfsnow dfsnow commented Feb 26, 2025

This PR tidies up the README and finalizes it for the 2025 model. It primarily adds a short section explaining the construction and purpose of the rolling mean feature.

Closes #105.

@dfsnow dfsnow marked this pull request as ready for review February 26, 2025 17:52
Contributor

@jeancochrane jeancochrane left a comment

Thanks!

We leverage the qualities above to produce a leave-one-out, time-weighted, rolling average sale price for each building/sale. In layman's terms, we take the average of the sales in the same building from the prior 5 years, _excluding the current sale_. Here's what the rolling windows look like for sales in the training data, where the last row represents the actual assessment scenario (where the "sale" occurs on the lien date):

![](./docs/figures/rolling_mean.png)
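
As a rough illustration of the leave-one-out window described above, here is a minimal Python sketch. It is not the project's implementation: the function name `loo_rolling_mean`, the tuple layout, and the choice to treat the window as inclusive of same-day sales are all assumptions made for the example, and time weighting is deliberately left out (as it is in the diagram).

```python
from datetime import date, timedelta


def loo_rolling_mean(sales, target_index, window_years=5):
    """Unweighted leave-one-out mean of same-building sales in the prior window.

    `sales` is a list of (building_id, sale_date, price) tuples. This layout
    is hypothetical and chosen only for the example.
    """
    building, sale_date, _ = sales[target_index]
    # Approximate the 5-year lookback with 365-day years (an assumption).
    window_start = sale_date - timedelta(days=365 * window_years)
    prices = [
        price
        for i, (b, d, price) in enumerate(sales)
        # Skip the sale being predicted, other buildings, and out-of-window sales.
        if i != target_index and b == building and window_start <= d <= sale_date
    ]
    # No other sales in the window means there is nothing to average.
    return sum(prices) / len(prices) if prices else None


sales = [
    ("A", date(2010, 6, 1), 50_000),   # too old to fall in any later window
    ("A", date(2020, 1, 1), 100_000),
    ("A", date(2021, 1, 1), 200_000),
    ("B", date(2022, 1, 1), 999_000),  # different building
    ("A", date(2023, 1, 1), 300_000),
]

print(loo_rolling_mean(sales, 4))  # averages the 2020 and 2021 sales in building A
```

For the assessment scenario in the last row of the diagram, the same idea applies with the lien date standing in for the sale date.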
Contributor

[Thought, non-blocking] One potential issue with this diagram is that it doesn't illustrate the time weighting. Not necessarily a problem as long as readers understand that the diagram is ignoring time weighting, but there's potential for misunderstanding.

Member Author

I think it's fine to exclude the time weights here, lest it become even more confusing.

Comment on lines +70 to +75
Some additional technical notes on this feature:

- The time weights used to weight sales are _global_, rather than building-specific. They follow a simple logistic curve centered 3 years before the most recent sale, i.e., sales close to the lien date are weighted most heavily.
- The feature is calculated using sales from the _entire range_ of the training data. This means that the test set version of the feature has seen training data sales from the same building, but not the sale being predicted. We contend that this is not data leakage, as it mirrors the real-world, production scenario where the model sees all sales in the building from the five years prior to the lien date.
- We contend that this feature is _not_ sales chasing (in the IAAO sense) because it excludes the _current_ sale from the average. This means that the model is not using the current sale to predict itself, but rather using the _prior_ sales to predict the current sale.
- The average excludes outlier sales and sales of non-livable units.
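
The notes above can be sketched in Python roughly as follows. This is a hedged illustration rather than the model's actual code: the function names, the `steepness` parameter, and the `(price, years_before_latest, is_outlier)` tuple layout are assumptions, and the 5-year window is omitted to keep the example short. The only details taken from the notes are the logistic shape, the 3-year midpoint, the global (not per-building) time reference, and the exclusion of the current sale and outliers.

```python
import math


def logistic_time_weight(years_before_latest, midpoint=3.0, steepness=1.0):
    """Logistic weight that decays as a sale gets older.

    `years_before_latest` is measured against the most recent sale in the
    whole dataset (a global reference, not a per-building one). The
    `steepness` value is an assumption; the notes only fix the midpoint.
    """
    return 1.0 / (1.0 + math.exp(steepness * (years_before_latest - midpoint)))


def weighted_loo_mean(sales, target_index):
    """Time-weighted mean that leaves out the target sale and any outliers.

    `sales` is a list of (price, years_before_latest, is_outlier) tuples for
    a single building. This layout is hypothetical.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for i, (price, years_before_latest, is_outlier) in enumerate(sales):
        if i == target_index or is_outlier:
            continue  # never let a sale predict itself; drop flagged outliers
        w = logistic_time_weight(years_before_latest)
        weighted_sum += w * price
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else None


building_sales = [
    (100.0, 3.0, False),
    (200.0, 3.0, False),
    (900.0, 0.0, False),   # the sale being predicted
    (5000.0, 1.0, True),   # flagged outlier, ignored
]
print(weighted_loo_mean(building_sales, target_index=2))
```

Because the two remaining sales sit exactly at the midpoint, they receive equal weight here; sales closer to the reference date would pull the average toward their prices.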
Contributor

[Suggestion, non-blocking] Should we mention here that PIN10s don't always correspond exactly to buildings, although we use the term "building" for convenience?

Member Author

That might be too deep in the weeds even for this section.

Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
@dfsnow dfsnow merged commit 08da37d into 2025-assessment-year Feb 26, 2025
4 checks passed
@dfsnow dfsnow deleted the dfsnow/tidy-2025 branch February 26, 2025 20:50
Development

Successfully merging this pull request may close these issues.

Add explanation of building means feature to README
2 participants