Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use conda-env-lock #211

Closed
wants to merge 2 commits into from
Closed

Use conda-env-lock #211

wants to merge 2 commits into from

Conversation

jonashaag
Copy link
Contributor

@jonashaag jonashaag commented Feb 14, 2024

I want to make conda-env-lock open source and need an open-source example repo. How about datajudge?

Copy link

codecov bot commented Feb 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (41c884a) 93.76% compared to head (c07051a) 92.76%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #211      +/-   ##
==========================================
- Coverage   93.76%   92.76%   -1.01%     
==========================================
  Files          18       18              
  Lines        1894     1894              
==========================================
- Hits         1776     1757      -19     
- Misses        118      137      +19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jonashaag jonashaag marked this pull request as ready for review February 14, 2024 16:15
@ivergara ivergara added the ready label Feb 14, 2024
@ivergara
Copy link
Collaborator

@jonashaag Thank you for the work!

@jonashaag
Copy link
Contributor Author

Hmm, two problems:

  • ibm_db is not compatible with 3.12. We could go back to 3.10
  • There are bunch of extra dependencies for the DB-specific CI steps. Those aren't part of the environment.yml that's used by conda-env-lock. Conda-env-lock also doesn't currently have any feature to add additional dependencies (nor is that planned). While adding the extra dependencies in the workflow works for most environments, it does not work for Snowflake due to a conflict with the pre-solved and locked dependencies. Maybe conda-env-lock is not a good tool for this CI pipeline.

@ivergara
Copy link
Collaborator

Hmm, two problems:

* ibm_db is not compatible with 3.12. We could go back to 3.10

You upgraded the CI to 3.12 :P

* There are bunch of extra dependencies for the DB-specific CI steps. Those aren't part of the `environment.yml` that's used by conda-env-lock. Conda-env-lock also doesn't currently have any feature to add additional dependencies (nor is that planned). While adding the extra dependencies in the workflow works for most environments, it does not work for Snowflake due to a conflict with the pre-solved and locked dependencies. Maybe conda-env-lock is not a good tool for this CI pipeline.

Yes, it's a somewhat complex CI. From my point of view, I Guess we could create a different environment for each DB specific CI job. From my understanding of conda-env-lock we should be able to do multiple lockfiles.

@jonashaag
Copy link
Contributor Author

jonashaag commented Feb 15, 2024

should be able to do multiple lockfiles

Yes but you'd need multiple environment.yml files :(

Actually something I'm happy to add to conda-env-lock is to use multiple environment.yml files as input, so you could have a file that contains only ibm_db as a dependency

@ivergara
Copy link
Collaborator

should be able to do multiple lockfiles

Yes but you'd need multiple environment.yml files :(

That compared to the situation where we magically add dependencies at runtime for testing in CI sounds like a fair tradeoff for reproducibility and sane debugging.

Actually something I'm happy to add to conda-env-lock is to use multiple environment.yml files as input, so you could have a file that contains only ibm_db as a dependency

🚀

@jonashaag
Copy link
Contributor Author

I started implementing this but not sure if we actually want that in conda-env-lock: https://github.com/Quantco/conda-env-lock/issues/109

Maybe we should just auto-generate the derivative environment.ymls in this repo via a pre-commit hook.

@jonashaag
Copy link
Contributor Author

I'm giving up on this for now. Datajudge's CI uses a lot of environments:

  • 6x unit test
  • 12x Postgres
  • 3x DB2
  • 6x MSSQL
  • 1x Snowflake, Impala, BigQuery

In conda-env-lock since there is no "templating" of lock file definitions, that would require 30 [[tool.conda-env-lock.variants]] sections. Which would be a huge mess. And another 30 sections if we wanted to add macOS lock files for local development.

IMO not a good fit for conda-env-lock at the moment.

@jonashaag jonashaag marked this pull request as draft February 15, 2024 13:16
@kklein
Copy link
Collaborator

kklein commented Feb 15, 2024

@jonashaag Thanks for looking into this!

I think we're very open to suggestions if you have some ideas on how to streamline our CI environment setup in the future.

@jonashaag
Copy link
Contributor Author

I don't have any good ideas, you're not doing anything wrong. The only thing that would simplify things is to test fewer combinations. For example I wonder if it doesn't suffice to test "oldest SA and Python" and "newest SA and Python" as opposed to all combinations. OTOH machine labor is cheap and human labor is expensive so maybe it makes sense to keep all those tests

@kklein
Copy link
Collaborator

kklein commented Mar 4, 2024

@jonashaag I will take the liberty to close this for now - feel free to reopen at any point in time. :)

@kklein kklein closed this Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants