Feature request
RMU is a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs.
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Paper: https://arxiv.org/abs/2403.03218
Site: https://www.wmdp.ai/
Representation Engineering: https://www.ai-transparency.org/
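
For context, the core idea of RMU (Representation Misdirection for Unlearning) is to steer the model's hidden states on forget-set text toward a fixed random direction while keeping hidden states on retain-set text close to those of the original, frozen model. Below is a minimal, schematic sketch of that objective, not the authors' implementation: the model name ("gpt2"), the layer index, the coefficients `c` and `alpha`, and the toy forget/retain texts are all placeholders chosen for illustration.

```python
# Schematic sketch of the RMU objective, assuming a small stand-in model ("gpt2"),
# a placeholder layer index, and placeholder hyperparameters (c, alpha, lr, steps).
# The paper applies this to larger chat models and updates only a few MLP weight
# matrices around the chosen layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
layer_idx = 7          # transformer block whose activations are steered (placeholder)
c, alpha = 6.5, 100.0  # steering scale and retain-loss weight (placeholders)
lr, steps = 5e-5, 10

tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
updated = AutoModelForCausalLM.from_pretrained(model_name)
frozen = AutoModelForCausalLM.from_pretrained(model_name).eval()
for p in frozen.parameters():
    p.requires_grad_(False)

# Fixed random unit vector u: forget-set activations are pushed toward c * u.
u = torch.rand(updated.config.hidden_size)
u = u / u.norm()

def block_output(model, texts, layer):
    """Hidden states after transformer block `layer` (hidden_states[0] is the embedding)."""
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    out = model(**batch, output_hidden_states=True)
    return out.hidden_states[layer + 1]

forget_texts = ["<text from the forget corpus>"]          # placeholder
retain_texts = ["<benign text from the retain corpus>"]   # placeholder

# Only the MLP of the steered block is trained here; the paper updates a few such layers.
opt = torch.optim.AdamW(updated.transformer.h[layer_idx].mlp.parameters(), lr=lr)

for _ in range(steps):
    h_f = block_output(updated, forget_texts, layer_idx)
    h_r = block_output(updated, retain_texts, layer_idx)
    with torch.no_grad():
        h_r_frozen = block_output(frozen, retain_texts, layer_idx)

    forget_loss = ((h_f - c * u) ** 2).mean()       # scramble forget representations
    retain_loss = ((h_r - h_r_frozen) ** 2).mean()  # preserve retain representations
    loss = forget_loss + alpha * retain_loss

    opt.zero_grad()
    loss.backward()
    opt.step()
```

The two terms capture the trade-off the description above refers to: the forget loss degrades representations on hazardous-domain text, while the retain loss anchors the model to its original behavior elsewhere.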
Motivation
RMU reportedly performs very well on the WMDP benchmark, which makes WMDP worth supporting as an evaluation task.
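
To make this concrete, here is a hedged sketch of how WMDP's multiple-choice questions could be scored zero-shot by comparing the log-likelihood of each answer choice. The dataset id "cais/wmdp", the "wmdp-bio" config, the "test" split, and the "question"/"choices"/"answer" fields (with "answer" as the index of the correct choice) are assumptions to verify against the official release; "gpt2" is only a placeholder model.

```python
# Hedged sketch of zero-shot multiple-choice scoring on WMDP.
# Dataset id, config, split, and field names are assumptions, not confirmed API.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

ds = load_dataset("cais/wmdp", "wmdp-bio", split="test")

def choice_logprob(prompt, choice):
    """Log-probability the model assigns to `choice` as a continuation of `prompt`."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)  # prediction for each next token
    total = 0.0
    for pos in range(prompt_len - 1, full_ids.shape[1] - 1):
        total += logprobs[pos, full_ids[0, pos + 1]].item()
    return total

correct, n = 0, 20  # score a small sample for illustration
for ex in ds.select(range(n)):
    prompt = ex["question"].strip() + "\nAnswer: "
    scores = [choice_logprob(prompt, choice) for choice in ex["choices"]]
    correct += int(max(range(len(scores)), key=scores.__getitem__) == ex["answer"])
print(f"accuracy on {n} sampled questions: {correct / n:.2%}")
```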