Pull requests: huggingface/nanotron


Recommend the use of Spack on supercomputers
#282 opened Feb 19, 2025 by thomas-bouvier
Add MLA
#278 opened Feb 5, 2025 by zzhhjjj
Add nanotron performance
#274 opened Jan 23, 2025 by xrsrke
fp8
#266 opened Dec 18, 2024 by xrsrke
Fix wrong initialization of lr scheduler
#256 opened Nov 29, 2024 by kylematoba
[NEW] Llama3.2 weight converters 🦙
#255 opened Nov 28, 2024 by TJ-Solergibert
6 tasks
Fix initial_lr when resuming training
#243 opened Nov 17, 2024 by Lauler
Load random states from checkpoint
#238 opened Nov 2, 2024 by gritukan
lighteval support after checkpoint, UX refactor
#222 opened Aug 24, 2024 by eliebak
Refactor pre tokenization tool
#219 opened Aug 21, 2024 by eliebak
Created interconnect benchmark before the training
#200 opened Jun 22, 2024 by RamenBuddha
Move MoE Implementation into src/, add Load Balancing Losses
#192 opened Jun 6, 2024 by haeggee
1 task done
[Feature] Monitor model states during training
#183 opened May 25, 2024 by xrsrke
Fix overflow in nanosets with big datasets
#182 opened May 23, 2024 by jquesnelle
Ring attention
#181 opened May 23, 2024 by zzhhjjj