
Npu adapt megatron #153

Open
addsubmuldiv wants to merge 10 commits into modelscope:main from addsubmuldiv:npu_adapt_megatron

Conversation


@addsubmuldiv (Collaborator) commented Apr 13, 2026

PR Type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

Summary

This PR completes Twinkle's NPU Megatron adaptation and targets the Twinkle + Megatron-LM 0.15.3 + MindSpeed 0.15.3 + mcore-bridge stack. The goal is to make the dense / LoRA 8-card training path stable on NPU.

Main changes:

  • Move MindSpeed bootstrap before mcore_bridge is imported to avoid late patching and early binding of TE / Megatron symbols.
  • Build MindSpeed runtime args from the current ModelConfig and the runtime parallel topology, then call repatch() when the runtime signature changes.
  • Fix distributed initialization and metric gathering on NPU:
  • add a default process group (PG) fallback for single-rank local smoke tests
    • reuse Megatron's Gloo DP group for Python object gathering on NPU
  • Fix causal mask handling for NPU FlashAttention:
    • stop feeding Twinkle's 4D dense causal mask directly into the MindSpeed TE flash path
    • let MindSpeed generate the compressed causal mask on the causal NPU path
  • Complete multi-LoRA compatibility for the NPU Megatron path:
    • multi-tenant LoRA training
    • multi-tenant save/export flow
    • optimizer capability selection cleanup

What Changed

1. MindSpeed runtime bootstrap

  • Added an NPU-only runtime bootstrap to ensure MindSpeed patching happens before mcore_bridge import.
  • Unified MindSpeed runtime arg generation into one path so Twinkle and MindSpeed do not read inconsistent runtime state.
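The "repatch when the runtime signature changes" idea can be sketched as follows. This is a minimal illustration, not the actual `_mindspeed_runtime.py` code: the names `RuntimeSignature` and `maybe_repatch`, and the chosen signature fields, are hypothetical stand-ins for whatever state MindSpeed patching really depends on.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RuntimeSignature:
    """Hypothetical subset of runtime state that MindSpeed patching depends on."""
    tp: int          # tensor-parallel size
    pp: int          # pipeline-parallel size
    cp: int          # context-parallel size
    use_lora: bool   # whether LoRA adapters are active

_last_signature = None

def maybe_repatch(sig: RuntimeSignature) -> bool:
    """Re-run MindSpeed patching only when the runtime signature changed.

    Returns True if a (re)patch was performed. The real implementation
    would call MindSpeed's repatch() in place of the comment below.
    """
    global _last_signature
    if sig == _last_signature:
        return False  # same topology/config: keep the existing patches
    _last_signature = sig
    # real code would invoke MindSpeed repatch() here
    return True
```

Comparing a frozen dataclass makes the "signature changed" check a plain equality test, so repeated calls with an unchanged topology are cheap no-ops.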

2. Process group / metric gather

  • Fixed default PG initialization for single-rank Megatron smoke tests.
  • Changed NPU gather_object() to prefer Megatron's Gloo DP group to avoid hangs in metrics / Python object gathering.
  • Kept the DP+CP group selection for CP-enabled runs.
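The group-selection rule described above can be condensed into a small decision helper. This is an illustrative sketch only: the string labels stand in for the actual `torch.distributed` process-group handles, and the function name is hypothetical.

```python
def select_gather_group(is_npu: bool, cp_size: int, world_size: int) -> str:
    """Pick the process group used for gather_object / all_gather_object.

    Labels are placeholders for real process-group handles:
      "default" - the default PG (single-rank smoke fallback)
      "dp_cp"   - Megatron's combined DP+CP group (CP-enabled runs)
      "gloo_dp" - Megatron's Gloo DP group (NPU, avoids HCCL hangs on
                  Python object collectives)
      "dp"      - the regular DP group (non-NPU path)
    """
    if world_size == 1:
        return "default"
    if cp_size > 1:
        return "dp_cp"      # CP-enabled runs keep the DP+CP group
    if is_npu:
        return "gloo_dp"    # NPU prefers Gloo for Python object gathering
    return "dp"
```

The key property is that the CP check comes before the NPU check, matching the PR's note that DP+CP selection is preserved for CP-enabled runs.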

3. NPU FlashAttention

  • Fixed causal attention mask handling on NPU.
  • For causal NPU paths, no longer pass Twinkle's 4D dense mask directly, avoiding the MindSpeed TE FlashAttention shape mismatch.
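The mask-handling fix boils down to one decision: on the causal NPU flash path, drop the dense 4D mask and let MindSpeed build its compressed causal mask itself. A minimal sketch, with a hypothetical function name and simplified flags:

```python
from typing import Any, Optional

def mask_for_flash_attention(mask_4d: Any, *, is_npu: bool,
                             is_causal: bool) -> Optional[Any]:
    """Decide which attention mask to hand to the flash-attention path.

    On the causal NPU path, feeding the dense 4D mask into MindSpeed's TE
    FlashAttention causes a shape mismatch, so we return None and let
    MindSpeed synthesize the compressed causal mask instead.
    """
    if is_npu and is_causal:
        return None
    return mask_4d
```

On all other paths (non-causal masks, or non-NPU devices) the original mask passes through unchanged.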

4. LoRA / Multi-LoRA

  • Fixed runtime checks for LoRA finalize so a bare model with ddp_config is not incorrectly treated as a model that can run native finalize.
  • Cleaned up optimizer capability selection for multi-LoRA so it uses the local bf16 optimizer path that fits the model structure.
  • Fixed the multi-LoRA save callback signature so the current tenant adapter is correctly passed through during save.
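The tightened finalize check can be sketched like this. All attribute names below are hypothetical stand-ins for the real Twinkle/Megatron attributes; the point is only that the presence of `ddp_config` alone is no longer sufficient, and an actual finalize hook is also required.

```python
def can_native_finalize(model) -> bool:
    """Return True only if the model both carries a ddp_config and
    actually exposes a callable finalize hook.

    Previously, a bare model that merely had a ddp_config attribute was
    mistakenly treated as finalize-capable.
    """
    has_ddp_config = getattr(model, "ddp_config", None) is not None
    has_finalize = callable(getattr(model, "finalize_model_grads", None))
    return has_ddp_config and has_finalize

class BareModel:
    """Carries a ddp_config but no finalize hook (the buggy case)."""
    ddp_config = object()

class WrappedModel(BareModel):
    """A properly wrapped model that also provides the finalize hook."""
    def finalize_model_grads(self):
        pass
```

With the old check, `BareModel` would have passed; the fix makes the capability test require both conditions.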

5. Documentation

  • Updated the NPU support docs with Megatron backend installation and usage guidance.
  • Added installation notes for Megatron / MindSpeed / mcore-bridge and the matching cookbook smoke entrypoints.

Notes

This PR targets the following version stack:

  • Megatron-LM 0.15.3
  • MindSpeed 0.15.3
  • mcore-bridge
  • Twinkle NPU environment

@gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request introduces comprehensive NPU support for the Megatron backend, featuring documentation updates for environment requirements and a new MindSpeed runtime bootstrap for NPU-specific patching and argument synthesis. Key technical changes include refined process group initialization for single-rank environments, optimized attention mask handling for NPU FlashAttention, and the use of Gloo groups for object gathering on NPU to prevent hangs. Review feedback flagged a potential initialization error from invalid arguments to init_process_group, a hard dependency on megatron-core in utility functions, and hardcoded paths in the documentation, and suggested expanding the mask-dropping logic to all causal NPU configurations.

Outdated comment threads:

  • src/twinkle/model/megatron/megatron.py (2 threads)
  • src/twinkle/utils/framework.py
  • docs/source_en/Usage Guide/NPU-Support.md
addsubmuldiv and others added 4 commits April 13, 2026 16:50
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@addsubmuldiv addsubmuldiv marked this pull request as ready for review April 13, 2026 11:42
Copilot AI review requested due to automatic review settings April 13, 2026 11:42

Copilot AI left a comment


Pull request overview

This PR completes Twinkle’s NPU Megatron integration targeting the Megatron-LM 0.15.3 + MindSpeed 0.15.3 + mcore-bridge stack, focusing on stabilizing 8-card dense/LoRA training on NPU by fixing MindSpeed bootstrap timing, distributed/metric collectives, and NPU FlashAttention mask handling.

Changes:

  • Add an NPU MindSpeed bootstrap layer to ensure adaptor patching happens before mcore_bridge imports Megatron/TE, and synthesize/refresh MindSpeed runtime args from ModelConfig.
  • Adjust Megatron initialization for NPU (default PG fallback, Gloo process groups, metrics/object-gather behavior) and fix causal mask handling for NPU FlashAttention.
  • Update NPU documentation and add Megatron NPU smoke cookbooks/scripts.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Per-file summary:

  • src/twinkle/utils/framework.py: Prefer Megatron's Gloo DP group for all_gather_object on NPU to avoid HCCL hangs during metric/object collection.
  • src/twinkle/model/megatron/strategy/megatron.py: NPU-specific Megatron init tweaks (Gloo PG creation, device binding cleanup), MoE sequence-parallel auto-enable, and MindSpeed runtime arg configuration.
  • src/twinkle/model/megatron/multi_lora_megatron.py: Reorder MindSpeed patching ahead of mcore_bridge import for the NPU multi-LoRA Megatron path.
  • src/twinkle/model/megatron/megatron.py: Add a default-PG fallback for single-rank smoke tests, ensure early MindSpeed patching, and drop dense 4D causal masks on the NPU causal TE flash path.
  • src/twinkle/model/megatron/_mindspeed_runtime.py: New module implementing early MindSpeed adaptor patching, runtime args synthesis, and conditional repatching.
  • docs/source_en/Usage Guide/NPU-Support.md: Update NPU dependency guidance, add Megatron backend install steps, and point to the Megatron NPU smoke cookbooks.
  • cookbook/megatron/ascend/tp_npu.py (+ .sh): Add 8-card TP/PP/DP NPU Megatron smoke script.
  • cookbook/megatron/ascend/tp_moe_npu.py (+ .sh): Add 8-card MoE NPU smoke script.
  • cookbook/megatron/ascend/tp_moe_cp_npu.py (+ .sh): Add 8-card MoE+CP NPU smoke script (megatron_cp_algo path).

Comment threads:

  • src/twinkle/model/megatron/megatron.py (2 threads, plus 2 outdated)
  • src/twinkle/model/megatron/strategy/megatron.py (2 threads)
  • src/twinkle/utils/framework.py (1 thread, plus 1 outdated)
3 participants