Proxy add watchdog #1931
@claude review
Claude:
## Summary
Moves proxy management from the trainer into the rollout subsystem via a new …
ProduceBatchResult impact: not affected
## Issues
Critical
Warning
Nit
## Verdict
REQUEST_CHANGES: the test/implementation mismatch needs a fix before merge.
@claude review
Claude:
## Summary
Re-reviewed after new commits. The previous critical issue (test asserting logger.info that didn't exist) has been fixed. Thread safety concern addressed by moving lifecycle notifications inside …
Good architectural improvement: replaces hard-coded class-name proxy checks with a declarative …
ProduceBatchResult impact: not affected
## Issues
Nit
## Verdict
APPROVE: previous concerns resolved, architecture is clean, tests cover the key behaviors.
logger.warning(
    f"Rollout worker group ranks={group.ranks} failed health check. Marking as inactive."
)
self._notify_worker_group_inactive(group)
Claude Code says an occasional race condition is possible here. Moving the call inside is the safest option.
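A minimal sketch of the kind of race the comment above is guarding against. It assumes "inside" means a lock-guarded critical section, which the review text does not actually state. `Watchdog`, `WorkerGroup`, `add_listener`, and `check` are hypothetical names for illustration only, not the PR's real classes. The state change, the log line, and the notification all happen while the lock is held, so concurrent health checks cannot each observe the group as active and notify twice.

```python
import logging
import threading
from dataclasses import dataclass
from typing import Callable, List

logger = logging.getLogger(__name__)


@dataclass
class WorkerGroup:
    # Hypothetical stand-in for a rollout worker group.
    ranks: List[int]
    active: bool = True


class Watchdog:
    # Hypothetical watchdog; the lock scope is an assumption.
    def __init__(self, is_healthy: Callable[[WorkerGroup], bool]) -> None:
        self._is_healthy = is_healthy
        self._lock = threading.Lock()
        self._listeners: List[Callable[[WorkerGroup], None]] = []

    def add_listener(self, callback: Callable[[WorkerGroup], None]) -> None:
        with self._lock:
            self._listeners.append(callback)

    def check(self, group: WorkerGroup) -> None:
        with self._lock:
            # Already inactive, or still healthy: nothing to do.
            if not group.active or self._is_healthy(group):
                return
            group.active = False
            logger.warning(
                f"Rollout worker group ranks={group.ranks} failed health check. "
                "Marking as inactive."
            )
            # Notify while still holding the lock, so the transition and
            # its notification are observed as a single atomic step.
            for callback in self._listeners:
                callback(group)
```

The trade-off is that listeners run under the lock, so in this sketch they must be short and must not call back into `check` or `add_listener`, which would deadlock with a non-reentrant `threading.Lock`.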