[Feature Request] RMU Unlearning #66

Open
1 of 5 tasks
ruidazeng opened this issue Mar 3, 2025 · 2 comments
Labels
unlearning method Request to include new unlearning method

Comments

@ruidazeng
Contributor

Tasks

  • Benchmark
  • Unlearning method
  • Evaluation
  • Dataset
  • None of the above

Feature request

RMU (Representation Misdirection for Unlearning) is a state-of-the-art unlearning method, introduced in the WMDP paper, that works by controlling model representations. According to the paper, RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use of LLMs.

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Paper: https://arxiv.org/abs/2403.03218
Site: https://www.wmdp.ai/
Representation Engineering: https://www.ai-transparency.org/
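
To help scope the request, here is a rough sketch of the RMU objective as I understand it from the paper: activations at a chosen layer are pushed toward a scaled random vector on forget data, while activations on retain data are kept close to those of a frozen copy of the model. This is not the reference implementation. The function names, the plain-list representation of activations, and the default values of `c` and `alpha` are all placeholders made up for illustration; a real integration would operate on hidden states of the actual model.

```python
import math
import random


def make_control_vector(hidden_dim, seed=0):
    """Random unit vector with entries drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    raw = [rng.random() for _ in range(hidden_dim)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def mean_squared_error(a, b):
    """Mean of squared differences over two equally shaped lists of vectors."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def rmu_loss(forget_acts, retain_acts, frozen_retain_acts,
             control_vec, c=1.0, alpha=1.0):
    """Sketch of an RMU-style loss (placeholder hyperparameters).

    forget_acts:        updated-model activations on forget data
    retain_acts:        updated-model activations on retain data
    frozen_retain_acts: frozen-model activations on the same retain data
    Each is a list of per-token activation vectors (lists of floats).
    """
    # Forget term: steer forget activations toward c * control_vec.
    target = [[c * u for u in control_vec] for _ in forget_acts]
    forget_loss = mean_squared_error(forget_acts, target)
    # Retain term: keep retain activations close to the frozen model's.
    retain_loss = mean_squared_error(retain_acts, frozen_retain_acts)
    return forget_loss + alpha * retain_loss
```

In this sketch `c` scales the random target and `alpha` weights the retain term; both would need to be exposed as configurable hyperparameters, along with the layer at which activations are read.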

Motivation

The WMDP paper reports that RMU performs well on the WMDP benchmark.

Image

@ruidazeng changed the title from "Support for RMU Unlearning" to "[Feature Request] RMU Unlearning" on Mar 3, 2025
@Dornavineeth added the "unlearning method" label (Request to include new unlearning method) on Mar 3, 2025
ruidazeng commented Mar 3, 2025

GitHub repo for representation engineering: https://github.com/andyzoujm/representation-engineering

Paper for Representation Engineering: https://arxiv.org/abs/2310.01405

Center for AI Safety Blog about Representation Engineering:
https://www.safe.ai/blog/representation-engineering-a-new-way-of-understanding-models

Center for AI Safety Video about RMU: https://www.youtube.com/watch?v=2U5NNiGC9yk
