
feat(realtime): Semantic VAD EOU token #10444

Open
richiejp wants to merge 1 commit into mudler:master from richiejp:feat/realtime-semantic-vad-eou

@richiejp

Collaborator

Description

Use the end-of-utterance (EOU) token emitted by Parakeet transcription to implement semantic VAD.

Notes for Reviewers

  • feat(grpc): add AudioTranscriptionLive bidirectional RPC and TranscriptResult.eou
  • feat(grpc): wire AudioTranscriptionLive through client, server, base and embed
  • feat(parakeet-cpp): live transcription RPC with per-call engine locking
  • feat(core): live transcription session wrapper and pipeline turn_detection config
  • feat(realtime): EOU-driven semantic_vad turn detection with retranscribe gate
  • fix(traces): make audio clips playable — blob URLs and a live body cap
  • feat(realtime): commit immediately on EOU, drop the extra 0.3s silence window
  • fix(realtime): stop cutting the start of utterances on VAD buffer clears
  • feat(trace): record a model_load trace for every successful backend load
  • feat(realtime): per-turn live transcription traces and commit timing telemetry
  • fix(grpc): release bidi stream conns on terminal Recv; PCM/lookup cleanups
  • feat(parakeet-cpp): bump to parakeet.cpp ABI v5 — incremental mel and EOU/EOB split
  • feat(realtime): live input captions — stream transcription deltas while the user speaks
  • fix(realtime): stop dropping the EOU token to the eagerness timeout
  • feat(realtime): synthesize clauses off the LLM token callback
  • fix(ui): show the realtime pipeline components as a vertical list

Signed commits

  • Yes, I signed my commits.

@richiejp force-pushed the feat/realtime-semantic-vad-eou branch from a9e9372 to 447708f on June 23, 2026 09:12
Add a `semantic_vad` turn-detection mode to the realtime API that feeds
the transcription model live and decides "the user finished speaking"
from the `<EOU>` end-of-utterance token rather than from silence alone.
When EOU fires, the turn commits immediately (~0.3s); otherwise it falls
back to an eagerness-scaled silence threshold (low/med/high = 8/4/2s).

Plumbing, bottom to top:

- proto: `AudioTranscriptionLive` bidirectional RPC (config-first oneof,
  mono float PCM @16k, ready-ack / Unimplemented degrade signal) plus
  `TranscriptResult.eou` for the unary retranscribe gate.
- pkg/grpc: client/server/base/embed scaffolding for the bidi stream,
  modeled on AudioTransformStream; release stream conns on terminal Recv.
- parakeet-cpp: live transcription RPC with per-C-call engine locking
  (one live stream per turn, finalize+free at commit); bump parakeet.cpp
  to ABI v5 — incremental StreamingMel (no more quadratic per-feed mel
  recompute that delayed EOU on long turns) and the `<EOU>`/`<EOB>` split;
  strip the literal `<EOU>`/`<EOB>` from offline text and set Eou.
- core/backend: LiveTranscriptionSession wrapper + pipeline
  `turn_detection:` config block (type/eagerness/retranscribe).
- realtime: semantic_vad integration — live input captions streamed as
  transcription deltas while the user speaks, EOU-immediate commit with
  eagerness fallback, optional retranscribe gate (batch re-decode must
  also end in `<EOU>` to confirm), clause synthesis off the LLM token
  callback, and per-turn live-transcription / model_load telemetry.
- UI: show the realtime pipeline components as a vertical list.
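The offline marker handling in the parakeet-cpp bullet and the retranscribe gate in the realtime bullet could look roughly like this Go sketch. `stripMarkers` and `confirmCommit` are hypothetical helpers, and treating a trailing literal `<EOU>` as the source of `TranscriptResult.eou` is an assumption based on the description, not the backend's actual code.

```go
package main

import "strings"

// Literal markers that appear in Parakeet's decoded text.
const (
	eouMarker = "<EOU>"
	eobMarker = "<EOB>"
)

// stripMarkers (hypothetical) removes the literal <EOU>/<EOB> markers from an
// offline transcript and reports whether the text ended with <EOU>.
func stripMarkers(raw string) (text string, eou bool) {
	trimmed := strings.TrimSpace(raw)
	eou = strings.HasSuffix(trimmed, eouMarker)
	text = strings.ReplaceAll(trimmed, eouMarker, "")
	text = strings.ReplaceAll(text, eobMarker, "")
	return strings.TrimSpace(text), eou
}

// confirmCommit (hypothetical) applies the optional retranscribe gate to an
// EOU-triggered commit: with the gate enabled, the live EOU only counts if the
// batch re-decode of the same audio also ended in <EOU>.
func confirmCommit(liveEOU, retranscribe, batchEOU bool) bool {
	if !liveEOU {
		return false
	}
	if !retranscribe {
		return true
	}
	return batchEOU
}
```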
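The sketch below shows why incremental mel extraction avoids the quadratic cost, covering only the framing step. Each `Feed` emits just the analysis windows that the new samples complete, so work per feed scales with the new audio rather than with the whole turn. `Framer` is hypothetical and is not parakeet.cpp's `StreamingMel`; the 400/160-sample window and hop (25 ms / 10 ms at 16 kHz) are assumed typical ASR values.

```go
package main

// Framer (hypothetical) slices a growing PCM stream into overlapping analysis
// windows and emits only the frames that newly fed samples complete.
type Framer struct {
	win, hop int       // window and hop in samples; requires 0 < hop <= win
	buf      []float32 // samples still needed by frames not yet emitted
}

// NewFramer returns a Framer for the given window and hop sizes.
func NewFramer(win, hop int) *Framer {
	if hop <= 0 || hop > win {
		panic("framer: need 0 < hop <= win")
	}
	return &Framer{win: win, hop: hop}
}

// Feed appends samples and returns each newly complete frame. A real front end
// would window, FFT and mel-filter only these frames, never the whole turn.
func (f *Framer) Feed(samples []float32) [][]float32 {
	f.buf = append(f.buf, samples...)
	var frames [][]float32
	for len(f.buf) >= f.win {
		frame := make([]float32, f.win)
		copy(frame, f.buf[:f.win])
		frames = append(frames, frame)
		f.buf = f.buf[f.hop:] // keep the win-hop overlap for the next frame
	}
	return frames
}
```

Because the framer keeps only the overlap it still needs, feeding the same audio in any chunk sizes yields the same frames as framing it all at once.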

Docs and tests included; opt-in via the pipeline YAML or per-session
`session.update`. Non-streaming STT backends degrade to silence-only.

Assisted-by: Claude Code:claude-opus-4-8 [Read] [Edit] [Write] [Bash]
Assisted-by: Claude Code:claude-fable-5 [Read] [Edit] [Bash]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
@richiejp force-pushed the feat/realtime-semantic-vad-eou branch from 447708f to 772bdb3 on June 23, 2026 14:11