Use different tokens instead of forcing WD and all HMS to use the same delegatetoken in the kerberos environment #313

Merged
9 commits merged on May 29, 2024

flaming-archer
Contributor

📝 Description

1. Use WD's (Waggle Dance's) own delegation token.
2. Support a separate delegation token for each HMS (Hive Metastore).
3. Refactor the delegation token section, greatly simplifying the code logic.
4. Tested extensively in our environment; it works.
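
To illustrate item 2, here is a minimal sketch of keeping one delegation token per metastore rather than one shared token. The class name `PerMetastoreTokenCache`, the `tokenFetcher` callback, and the metastore names are all hypothetical; this is not the code in this PR, only an outline of the idea.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper, not part of Waggle Dance: caches one delegation
// token per metastore instead of sharing a single token everywhere.
final class PerMetastoreTokenCache {
    private final Map<String, String> tokensByMetastore = new ConcurrentHashMap<>();
    // Assumed callback that asks the named metastore for a fresh token.
    private final Function<String, String> tokenFetcher;

    PerMetastoreTokenCache(Function<String, String> tokenFetcher) {
        this.tokenFetcher = tokenFetcher;
    }

    // Returns the cached token for this metastore, fetching it on first use.
    String tokenFor(String metastoreName) {
        return tokensByMetastore.computeIfAbsent(metastoreName, tokenFetcher);
    }

    // Drops a token (for example after it expires) so the next call refetches.
    void invalidate(String metastoreName) {
        tokensByMetastore.remove(metastoreName);
    }
}
```

Keying the cache by metastore means a token issued by one HMS is never presented to another, which is the failure mode a single shared token would risk.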

🔗 Related Issues

#309

@flaming-archer
Contributor Author

@patduin Could you review this, or have someone else review it as well? We have been working on this change for several months and have tested it many times. I think it could really help many people.

@flaming-archer
Contributor Author

The failure seems to have nothing to do with my PR:
Error: Failed to execute goal org.sonatype.plugins:nexus-staging-maven-plugin:1.6.8:deploy (injected-nexus-deploy) on project waggle-dance-rpm: Failed to deploy artifacts: Could not transfer artifact com.hotels:waggle-dance-api:jar:javadoc:4.0.0-20240412.083012-2 from/to sonatype-nexus-snapshots (https://oss.sonatype.org/content/repositories/snapshots): authentication failed for https://oss.sonatype.org/content/repositories/snapshots/com/hotels/waggle-dance-api/4.0.0-SNAPSHOT/waggle-dance-api-4.0.0-20240412.083012-2-javadoc.jar, status: 401 Unauthorized -> [Help 1]

@patduin
Contributor

patduin commented Apr 12, 2024 via email

Contributor

@patduin patduin left a comment

I cannot accept this PR as it is now. This needs some refactoring to separate kerberos code from the non kerberos paths and remove all the static class dependencies introduced.

@flaming-archer
Contributor Author

Give me some time; I will make some changes and resubmit.

@flaming-archer
Contributor Author

@patduin @jmnunezizu I have completed the modifications; please take a look.

  1. The static parts of the code have been replaced.
  2. SaslThriftMetastoreClientManager now uses the strategy pattern.
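
As a rough illustration of the strategy approach in item 2, the sketch below keeps the Kerberos and non-Kerberos paths in separate classes and picks one at construction time. Every name here (`ClientOpenStrategy`, `PlainOpenStrategy`, `SaslOpenStrategy`, `MetastoreClientManager`) is hypothetical and the "steps" are placeholder strings; the real classes in the PR may be organised differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not Waggle Dance's actual API.
interface ClientOpenStrategy {
    // Returns a description of the steps taken to open a connection.
    List<String> open(String metastoreUri);
}

// Non-Kerberos path: only opens a plain transport.
final class PlainOpenStrategy implements ClientOpenStrategy {
    @Override
    public List<String> open(String metastoreUri) {
        List<String> steps = new ArrayList<>();
        steps.add("open plain transport to " + metastoreUri);
        return steps;
    }
}

// Kerberos path: obtains a delegation token first, then opens SASL.
final class SaslOpenStrategy implements ClientOpenStrategy {
    @Override
    public List<String> open(String metastoreUri) {
        List<String> steps = new ArrayList<>();
        steps.add("obtain delegation token for " + metastoreUri);
        steps.add("open SASL transport to " + metastoreUri);
        return steps;
    }
}

// The manager holds no Kerberos logic itself; it delegates to the strategy.
final class MetastoreClientManager {
    private final ClientOpenStrategy strategy;

    MetastoreClientManager(ClientOpenStrategy strategy) {
        this.strategy = strategy;
    }

    // The strategy is chosen once, from configuration, not via static state.
    static MetastoreClientManager forConfig(boolean saslEnabled) {
        return new MetastoreClientManager(
            saslEnabled ? new SaslOpenStrategy() : new PlainOpenStrategy());
    }

    List<String> open(String metastoreUri) {
        return strategy.open(metastoreUri);
    }
}
```

Because the strategy is injected through the constructor, a test can pass a stub strategy without touching any static class, which is the dependency the review asked to remove.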

@flaming-archer
Contributor Author

I also changed the code so that it no longer requires a manual kinit and renews tickets automatically.

@flaming-archer
Contributor Author

I think the failing test case getTableMeta and the "Could not transfer artifact" error are unrelated to my changes.

@flaming-archer
Contributor Author

@patduin @jmnunezizu Could you help review it? This feature is really useful for users in the kerberos environment.

@jmnunezizu

Hi @flaming-archer – we'll get to it as soon as we can. Sorry for the delay and thanks for the additional changes and contribution.

@flaming-archer
Contributor Author

It's been a week now....

@flaming-archer
Contributor Author

@patduin @jmnunezizu This feature is really great. Could you take a look?

Contributor

@patduin patduin left a comment

Just some cleanups, but it looks OK. Thanks for your patience while we found time to review.

@patduin
Contributor

patduin commented May 27, 2024

I've merged the other hive3 PR; please have a look at the conflicts. Thank you.

@flaming-archer
Contributor Author

Sorry, I only just saw this. I have seen that PR; it was submitted by my colleague, and I know her changes. I will update my changes on top of it. Give me a moment.

yangyuxia and others added 4 commits May 28, 2024 22:33
…aGroup#315)

* Add personalized configuration parameters for each metastore.

* Add personalized configuration parameters for each metastore

* Recover

* Update junit test

* Update Junit Test

* Update Junit Test

* Update Junit Test

* Format the code and update the readme

* Revert

* Update FederatedMetaStoreTest.java

* Update PrimaryMetaStoreTest.java

* Update AbstractMetaStore.java

using new HashMap so the generated Yaml doesn't generate an anchor (reference &id001)

* Update YamlFederatedMetaStoreStorageTest.java

fixing test

---------

Co-authored-by: yangyx <360508847@qq.com>
Co-authored-by: Patrick Duin <patduin@gmail.com>
# Conflicts:
#	README.md
#	waggle-dance-api/src/main/java/com/hotels/bdp/waggledance/api/model/AbstractMetaStore.java
#	waggle-dance-api/src/test/java/com/hotels/bdp/waggledance/api/model/FederatedMetaStoreTest.java
#	waggle-dance-api/src/test/java/com/hotels/bdp/waggledance/api/model/PrimaryMetaStoreTest.java
#	waggle-dance-core/src/test/java/com/hotels/bdp/waggledance/mapping/service/impl/YamlFederatedMetaStoreStorageTest.java
@flaming-archer
Contributor Author

@patduin @jmnunezizu Modified, pls take a look.


@jmnunezizu jmnunezizu left a comment

One small comment. Then I'll approve.

@flaming-archer
Contributor Author

@patduin pls take a look at it.

Contributor

@patduin patduin left a comment

Thank you

@patduin patduin merged commit e774cb1 into ExpediaGroup:hive-3.x May 29, 2024
0 of 2 checks passed
@l-jelly

l-jelly commented Dec 17, 2024

Hello, after the above change, can Kerberos tickets be renewed automatically?

@flaming-archer
Contributor Author

Yes. You are welcome to verify it.

@l-jelly

l-jelly commented Dec 25, 2024

[Image: acaaefc817558ef01726e0707536368e_compress]

As shown in the figure above, the ZooKeeper-related configuration has been added to hive-site. The process is started as the hadoop user, and renewal fails after one day with the following error:

[Image: 87a0493c0ccc1552b5ba87adbae70ce7_compress]

Please help me if you have time.

@l-jelly

l-jelly commented Dec 25, 2024

Please help me if you have time.

@flaming-archer
Contributor Author

In our setup, WD is deployed independently and does not rely on hive-site.xml. The ZooKeeper configuration is written in WD's configuration file instead.

Please refer to https://github.com/ExpediaGroup/waggle-dance/blob/hive-3.x/HowToKerberize.md.

One more thing worth noting: automatic renewal of the TGT is handled by Hadoop UGI (UserGroupInformation).

@l-jelly

l-jelly commented Dec 31, 2024

I made the changes you suggested. The Hadoop cluster renews successfully, but the waggle-dance renewal still fails.

@flaming-archer
Contributor Author

The 24-hour renewal change comes from b3758e3.

Perhaps you can look for the renewal error log from Hadoop UGI; the error you posted above is not it. That error says there are no tickets left, whereas you need to look for logs about renewal failures. Alternatively, you can debug the Hadoop UGI renewal code yourself. The logic behind it is very simple: Hadoop UGI starts a thread and calls "kinit -kt" approximately every 24 hours. It shouldn't be difficult to identify the problem by reading the relevant code.
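
For readers following the renewal discussion, the periodic relogin described above can be sketched roughly as follows. This is not Hadoop's code: `PeriodicRelogin` and the `relogin` callback are hypothetical stand-ins, and in a real deployment the callback would presumably wrap a keytab relogin such as Hadoop's `UserGroupInformation.checkTGTAndReloginFromKeytab()`. The sketch only shows the scheduling shape and where a renewal failure would be logged.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a background ticket renewal thread.
final class PeriodicRelogin {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(task -> {
            Thread thread = new Thread(task, "relogin");
            thread.setDaemon(true); // do not keep the JVM alive for renewals
            return thread;
        });
    private final AtomicInteger attempts = new AtomicInteger();
    private final AtomicInteger failures = new AtomicInteger();

    // Runs the relogin callback once per period (about 24 hours in the thread above).
    void start(Runnable relogin, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            attempts.incrementAndGet();
            try {
                relogin.run();
            } catch (RuntimeException e) {
                // Must be caught: an uncaught exception would cancel all
                // later runs of a scheduleAtFixedRate task.
                failures.incrementAndGet();
                System.err.println("relogin failed: " + e.getMessage());
            }
        }, period, period, unit);
    }

    int attempts() { return attempts.get(); }

    int failures() { return failures.get(); }

    // Polls until at least n attempts have started or the timeout passes.
    boolean awaitAttempts(int n, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (attempts.get() < n) {
            if (System.nanoTime() > deadline) {
                return false;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    void stop() { scheduler.shutdownNow(); }
}
```

The log line written in the `catch` branch is the kind of "renewal failure" message worth searching for, as opposed to the later "no tickets" error that appears once the old ticket has already expired.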

@l-jelly

l-jelly commented Feb 10, 2025

OK, thanks for the guidance.
