qwen3_5_moe: add OpenAI serving entrypoint by mergennachin · Pull Request #20313 · pytorch/executorch

mergennachin · 2026-06-16T21:42:25Z

No description provided.

pytorch-bot · 2026-06-16T21:42:30Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20313

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures, 4 Unclassified Failures

As of commit 13b2ff0 with merge base 551e90e ():

NEW FAILURE - The following job has failed:

pull / unittest-editable / windows / windows-job (gh)
examples/models/test/test_export.py::ExportTest::test_efficient_sam_export_to_executorch

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Build Aarch64 Linux Wheels / pytorch/executorch / build-wheel-py3_10-cpu-aarch64 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
/__w/executorch/executorch/pytorch/executorch/backends/apple/coreml/runtime/inmemoryfs/inmemory_filesystem.cpp:722:48: error: ‘inmemoryfs::InMemoryFileSystem::InMemoryNode::Kind’ has not been declared
Build Aarch64 Linux Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu-aarch64 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_aarch64
cuda-perf / benchmark-cuda (SocialLocalMobile/Qwen3.5-35B-A3B-HQQ-INT4, quantized-int4-tile-packed, SocialLoc... / linux-job (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: server did not become healthy: <urlopen error [Errno 111] Connection refused>
Test CUDA Builds / test-model-cuda-e2e (SocialLocalMobile, Qwen3.5-35B-A3B-HQQ-INT4, quantized-int4-tile-packed) / linux-job (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: server did not become healthy: <urlopen error [Errno 111] Connection refused>

FLAKY - The following job failed but was likely due to flakiness present on trunk:

MLX / test-mlx-voxtral / test-mlx-voxtral (gh) (detected as infra flaky with no log or failing log classifier)

BROKEN TRUNK - The following jobs failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-16T21:43:33Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

Adds an OpenAI-compatible serving entrypoint for the Qwen3.5 MoE example model by introducing a model-specific Python launcher (control plane) and a dedicated C++ worker binary (data plane) that speaks the generic examples/llm_server JSONL protocol.

Changes:

Introduce executorch.examples.models.qwen3_5_moe.serve plus hermetic tests asserting control-plane/model-code separation and correct worker spawn args.
Add qwen3_5_moe_worker executable target and wire it into Qwen3.5 MoE CMake presets.
Extend CI to export additional tokenizer files and run a CUDA OpenAI-serving smoke test; document serving usage in the model README.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
examples/models/qwen3_5_moe/test_serve.py	Adds hermetic tests for the serving launcher and separation guarantees.
examples/models/qwen3_5_moe/serve.py	New OpenAI-compatible control-plane entrypoint that spawns the worker and builds the FastAPI app.
examples/models/qwen3_5_moe/README.md	Documents how to run the server and integrate it with pi.
examples/models/qwen3_5_moe/qwen35_moe_worker.cpp	New C++ worker binary for model execution via llm_server JSONL protocol.
examples/models/qwen3_5_moe/CMakePresets.json	Adds the worker target to CUDA/Metal build presets.
examples/models/qwen3_5_moe/CMakeLists.txt	Defines the `qwen3_5_moe_worker` executable and stripping/link options.
.ci/scripts/test_model_e2e.sh	Adds CUDA serving smoke test exercising `/health`, `/v1/models`, and `/v1/chat/completions`.
.ci/scripts/export_model_artifact.sh	Exports `tokenizer_config.json` alongside `tokenizer.json` for serving templating.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+def _default_worker_bin() -> str:
+    repo_root = Path(__file__).resolve().parents[3]
+    return str(
+        repo_root
+        / "cmake-out"
+        / "examples"
+        / "models"
+        / "qwen3_5_moe"
+        / "qwen3_5_moe_worker"
+    )


+_HERE = pathlib.Path(serve.__file__).resolve().parent
+_REPO_ROOT = _HERE.parents[2]
+


+    offenders = [
+        p
+        for p in server_dir.rglob("*.py")
+        if "qwen3_5_moe" in p.read_text() or "_qwen35_moe" in p.read_text()
+    ]


qwen3_5_moe: add OpenAI serving entrypoint

13b2ff0

Copilot AI review requested due to automatic review settings June 16, 2026 21:42

mergennachin requested review from kirklandsign and larryliu0820 as code owners June 16, 2026 21:42

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 16, 2026

mergennachin temporarily deployed to cadence June 16, 2026 21:42 — with GitHub Actions Inactive

mergennachin marked this pull request as draft June 16, 2026 21:42

Copilot started reviewing on behalf of mergennachin June 16, 2026 21:42 View session

Copilot AI reviewed Jun 16, 2026

View reviewed changes

mergennachin temporarily deployed to upload-benchmark-results June 16, 2026 23:01 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

qwen3_5_moe: add OpenAI serving entrypoint#20313

qwen3_5_moe: add OpenAI serving entrypoint#20313
mergennachin wants to merge 1 commit into
mainfrom
llm-qwen35-moe-serving

mergennachin commented Jun 16, 2026

Uh oh!

pytorch-bot Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 16, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		_HERE = pathlib.Path(serve.__file__).resolve().parent
		_REPO_ROOT = _HERE.parents[2]

Conversation

mergennachin commented Jun 16, 2026

Uh oh!

pytorch-bot Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20313

❌ 1 New Failure, 3 Unrelated Failures, 4 Unclassified Failures

Uh oh!

github-actions Bot commented Jun 16, 2026

This PR needs a release notes: label

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pytorch-bot Bot commented Jun 16, 2026 •

edited

Loading

This PR needs a `release notes:` label