support qwen3.6 grpo & in-place add lora by hjh0119 · Pull Request #163 · modelscope/twinkle

hjh0119 · 2026-04-16T14:16:00Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces a GRPO training script for the GSM8K dataset and implements support for Mixture-of-Experts (MoE) models within the Ray-based training framework. Key changes include specialized LoRA weight synchronization for MoE layers in the vLLM sampler, which now buckets expert weights separately and accumulates them on the CPU to prevent memory issues. Feedback was provided regarding a potential runtime error in the Megatron model logic where a variable could be accessed before being defined.
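The bucketing idea in the summary above can be illustrated with a minimal sketch. This is not the code from the PR: the function names, the `.experts.` naming convention, and the byte-size cap are all assumptions made for illustration. It only shows how expert and non-expert LoRA weights could be kept in separate, size-capped buckets before being synced.

```python
def is_expert_weight(name: str) -> bool:
    # Assumption: MoE expert parameters carry ".experts." in their name.
    return ".experts." in name


def bucket_weights(named_sizes, max_bucket_bytes):
    """Group (name, nbytes) pairs into size-capped buckets.

    Expert and non-expert ("dense") weights never share a bucket.
    Returns a list of (kind, [names]) tuples.
    """
    buckets = []
    # Per-kind open bucket: (names collected so far, bytes used so far).
    current = {"expert": ([], 0), "dense": ([], 0)}
    for name, nbytes in named_sizes:
        kind = "expert" if is_expert_weight(name) else "dense"
        names, used = current[kind]
        # Close the open bucket if adding this weight would exceed the cap.
        if names and used + nbytes > max_bucket_bytes:
            buckets.append((kind, names))
            names, used = [], 0
        names.append(name)
        current[kind] = (names, used + nbytes)
    # Flush whatever is left in each open bucket.
    for kind, (names, _) in current.items():
        if names:
            buckets.append((kind, names))
    return buckets


# Hypothetical parameter names and sizes, for illustration only.
example_weights = [
    ("layers.0.attn.q_proj.lora_A", 4),
    ("layers.0.mlp.experts.0.up_proj.lora_A", 6),
    ("layers.0.mlp.experts.1.up_proj.lora_A", 6),
    ("layers.0.attn.q_proj.lora_B", 4),
]
example_buckets = bucket_weights(example_weights, max_bucket_bytes=10)
```

With a 10-byte cap, the two expert weights land in separate expert-only buckets, while the two attention weights share one dense bucket.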

hjh0119 added 2 commits April 15, 2026 22:32

add base_layer suffix for expert weights

cccc369

qwen3.6 grpo

457f941

gemini-code-assist bot reviewed Apr 16, 2026

Comment thread src/twinkle/model/megatron/megatron.py

tastelikefeet approved these changes Apr 16, 2026

hjh0119 added 3 commits April 16, 2026 22:23

Merge branch 'main' into expert-lora

8f232d2

adjust gpu_memory_utilization to avoid oom

6968438

reuse ipc buffer to a avoid oom

44969e8

hjh0119 wants to merge 5 commits into modelscope:main from hjh0119:expert-lora
