Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: priority on environment variable vs compiling time value #1225

Closed
TommyCpp opened this issue Aug 23, 2023 · 26 comments · Fixed by #1323
Closed

Discussion: priority on environment variable vs compiling time value #1225

TommyCpp opened this issue Aug 23, 2023 · 26 comments · Fixed by #1323
Assignees
Labels
A-common Area:common issues that not related to specific pillar

Comments

@TommyCpp
Copy link
Contributor

TommyCpp commented Aug 23, 2023

Most of our creates generally allow two ways to configure values:

  • Use environment variable fetched during runtime
  • User provided value during compiling time(with_xxx)

However, this is unclear when both of the configurations present which one should take priority. Regardless we should align on this and use it as a tenet across all our creates.

cc @open-telemetry/rust-approvers

@TommyCpp TommyCpp added A-common Area:common issues that not related to specific pillar release:required-for-stable Must be resolved before GA release, or nice to have before GA. labels Aug 23, 2023
@TommyCpp
Copy link
Contributor Author

TommyCpp commented Aug 23, 2023

We discussed it in weekly meetings. Some notes

  • Compiling time value
    • @TommyCpp @cijothomas agree with this
    • @shaun-cox is not objecting if we advise users to not configure anything in code and leave it to ENV.
    • NET, Go, Python, follow this.
  • Environment variables

Edit (lalitb) - C++ also follow "code over env"

@djc
Copy link
Contributor

djc commented Aug 24, 2023

I find it very confusing to make the compiled-in value take priority over the run-time value (from an environment variable). Code is usually written (and compiled) to function across many environments, whereas the runtime environment is specific to a particular invocation. Clearly the runtime environment, with its better specificity, should be trusted more than the code.

Maybe we should just not provide any way to configure these values in code? Or clarify with function names etc that the values provided in code are just fallbacks if the environment doesn't provide the required information?

See also this recent discussion in Cargo on a related problem:

rust-lang/cargo#12515

In this case, there is no value provided in code, but there's a clear hierarchy of specificity.

@TommyCpp
Copy link
Contributor Author

I think most of people favors the environment variables. I think we can favor env vars here and make sure it's consistent.

Please leave comments if you disgree @open-telemetry/rust-approvers

@hdost
Copy link
Contributor

hdost commented Oct 14, 2023

I think most of people favors the environment variables. I think we can favor env vars here and make sure it's consistent.

Please leave comments if you disgree @open-telemetry/rust-approvers

So while this is not a deviation (more of a clarification), it might also be useful to document. #1297

@q3k
Copy link

q3k commented May 29, 2024

FWIW I've just gotten confused by OTEL_EXPORTER_OTLP_ENDPOINT taking precedence over .with_endpoint. This was in a scenario where code was written without the expectation of this variable being set, but run in a different environment where (junk) env vars took over and broke export with counter-intuitive transport errors.

I would've instead expected having to specify some kind of (imaginary) .with_env_vars() to enable environment variables being consulted. Otherwise it's not clear from glancing at the code that any environment variables are even considered (at least to people not familiar with the OpenTelemetry specs).

I'm not necessarily advocating for changing this behaviour, just providing a data point from a library user.

@cijothomas
Copy link
Member

Reopening to further discuss given above.

@cijothomas cijothomas reopened this Jun 3, 2024
@cijothomas cijothomas removed the release:required-for-stable Must be resolved before GA release, or nice to have before GA. label Feb 3, 2025
@cijothomas
Copy link
Member

This is not consistent enforced everywhere. Adding to Log SDK stable milestone, as this is important to get closure.

@lalitb
Copy link
Member

lalitb commented Feb 3, 2025

Good catch.

I believe the priority order as discussed above should be

  • Signal specific env-variables
  • Generic env-variables
  • with_xxx compiled values.
  • default values

Quick look at BatchConfigBuilder , the priority order seems to be:

  • with_xxx compiled values
  • env-variables
  • default values

This needs to be fixed. Will raise an issue for this first.

@cijothomas
Copy link
Member

Thanks. I'd like to revisit the priority order also.

Re-stating my opinion:

if a user does with_xxx in code, then it should be final, and ENV variable should not override that. if user prefer ENV variable based config, then they should not do with_xxx in code and leave it untouched, so they can use ENV variable to configure things.

This was never clarified in spec, so we are left with own policy, but I also want to recheck with Spec to see if there is some incoming guidance.

@lalitb
Copy link
Member

lalitb commented Feb 3, 2025

if a user does with_xxx in code, then it should be final, and ENV variable should not override that. if user prefer ENV variable based config, then they should not do with_xxx in code and leave it untouched, so they can use ENV variable to configure things.

Typically, environment variables are prioritized over compiled values because they allow runtime configuration without requiring recompilation. I have seen this convention more widely. Do we have a strong reason for otherwise? I can think of - malicious manipulation of environment variables, or someone leaving environment variables set by mistake - but these risks should be mitigated by proper access control and policies.

@cijothomas
Copy link
Member

Do we have a strong reason for otherwise?

Yes. (not sure if my opinion is a "strong" reason 🤣)
If the user explicitly configured something in code (with_endpoint("my-endpoint")), then an ENV variable overriding it can be surprising. By explicitly calling with_endpoint, user is making a conscious decision that this is the final value, and it cannot be overridden, so they can be assured that the App behaves the same way and is unaffected by environment. (env variables can influence multiple applications, and it may/may-not be the intent. also the case of leaving env vars by mistake.)

Of course a lot of users prefer to have their app configured based on ENV. For those users, they can decide to not to explicitly call with_endpoint in code.

If we go with this, all docs for "with_**" methods for which an ENV config exists, it must be explicitly documented to users that invoking this method will seal the config, and no ENV variable can change that, and advice them to not call these methods, if they prefer ENV based configuration only.

(I wish spec would have made a decision here, as without clearly specifying the priority/order across all languages, things can get very messy in polyglot environments. Will check with spec again.)

@MikeGoldsmith
Copy link
Member

I disagree with code choices being the final decision maker for configuration values.

As an operator of software, being able to alter behaviour from the outside (eg using environment variables) at runtime is super important. Otherwise development-time choices require software to be modified (eg rebuilt) which is slow & inefficient.

@cijothomas
Copy link
Member

I disagree with code choices being the final decision maker for configuration values.

As an operator of software, being able to alter behaviour from the outside (eg using environment variables) at runtime is super important. Otherwise development-time choices require software to be modified (eg rebuilt) which is slow & inefficient.

Users can decide not to programmatically configure settings, if they want to ensure ENV can be used to determine behavior without redeployment.

Either way, spec has not clarified this behavior so far, leaving this up to individual implementations and is consistent across languages.
open-telemetry/opentelemetry-specification#4413

@scottgerring
Copy link
Contributor

scottgerring commented Feb 18, 2025

I would've instead expected having to specify some kind of (imaginary) .with_env_vars() to enable environment variables being consulted

This is I would say the common pattern and would make it very explicit, but I suppose it's a breaking change. I think this would be the optimal way to do it greenfields. Are we against this ?

If the user explicitly configured something in code (with_endpoint("my-endpoint")), then an ENV variable overriding it can be surprising

I agree with this. The issue is that the call here is so explicit - "use this". Hierarchy of preference works cleanly when its all implicit - "1/ check env_1, 2/ check_file, 3/ check env_var_2" . The sort of thing you'd see with e.g. the AWS SDK credential resolution chain.

https://12factor.net/config encourages the use of environment variables.

The AWS SDK credential resolution process is a cool example too, I think - you either 1/ let it do magic automatic resolution, which has a hierarchy, or 2/ very explicitly use an API to specify.

@cijothomas
Copy link
Member

From today's SIG call:

Nahum:
Env is preferred to take over from own experience.
But we need to throw some error level internal-logs when both are specified, so users are aware of what's going on, instead of being taken by surprise.

Anton:
Hardcoded code should be higher priority to avoid being overrides by someone else.

Leaving open for 2 more days and make a final decision and implement it.

@cijothomas
Copy link
Member

open-telemetry/opentelemetry-specification#4413 (comment)
It looks like majority of OTel implementations follow the "explicit-code-takes-precedence-over-env" approach. (It is aligned with my proposal).

As discussed in community call today, will leave it open for comments for another 2 days.

@TommyCpp
Copy link
Contributor Author

TommyCpp commented Feb 19, 2025

explicit-code-takes-precedence-over-env

Languages also supports users to config it to be auto resolved by env var. We can revert the behaivor to be by default favor compiling value but I think we should add options like with_env_vars to allow code owners to state they allow env var take over. Also makes the migration easier

@cijothomas
Copy link
Member

Languages also supports users to config it to be auto resolved by env var.

Rust is also the same right? if user does not explicitly configure in code, then env-var is respected. (Sorry I may not have fully understood this statement!)

but I think we should add options like with_env_vars to allow code owners to state they allow env var take over

I am not so sure if maintaining additional complexity is worth it. For users who want env_var, they can choose to not programmatically configure it.
Overall, OTel is moving to a totally new configuration model which would become the norm in the future. ENV variable support remains to maintain back-compatibility.

@cijothomas
Copy link
Member

open-telemetry/opentelemetry-specification#4413 (comment) Majority is treating Code over env variable. Rust looks like the odd one out now!

I propose to make the change to treat code as final, and update docs/design etc. to match it.

@cijothomas
Copy link
Member

From community call today:
@TommyCpp
It is better to offer a with_env_vars so users who rely on existing behavior has an easy migration.
Maybe okay to keep it just for 1-2 release and not a permanent setting.

Link to OTel Spec for new declarative config: open-telemetry/opentelemetry-specification#4374

@cijothomas
Copy link
Member

First PR done #2781
More to come.

@cijothomas
Copy link
Member

First PR done #2781 More to come.

This is the only Logging SDK specific change required. Removing this issue from Log SDK stable milestone. and moving to Metric SDK milestone.

@rubberduck203
Copy link

Can someone clarify for me what the current priority is and what it will be after these PRs land?
I'm currently pretty convinced there was a behavior change between v0.22 and v0.27 that broke me and I'm trying to figure out the best solution in light of this discussion and any upcoming changes.

@cijothomas
Copy link
Member

Can someone clarify for me what the current priority is and what it will be after these PRs land? I'm currently pretty convinced there was a behavior change between v0.22 and v0.27 that broke me and I'm trying to figure out the best solution in light of this discussion and any upcoming changes.

it is not consistent today. Some prefer env some does code 😢
The goal is to :

  1. Make it consistent everywhere in the repo
  2. Make code take higher priority than env
  3. Make the priority order clear in the doc

@rubberduck203
Copy link

Thank you @cijothomas. That at least tells me how to prepare for the upcoming changes. Much appreciated.

@cijothomas
Copy link
Member

Closing this discussion as we reached an agreement on what to do. (i.e treat code as final, and higher ranked than env variables).
Changes are done for SDK.
Will open specific issues for each component, starting with OTLP Exporter here : #2818

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-common Area:common issues that not related to specific pillar
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants