Skip to content

Support end-to-end ECE negotiation in RDMA handshake V3#3352

Open
chenBright wants to merge 1 commit into
apache:masterfrom
chenBright:ece
Open

Support end-to-end ECE negotiation in RDMA handshake V3#3352
chenBright wants to merge 1 commit into
apache:masterfrom
chenBright:ece

Conversation

@chenBright

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: resolve

Problem Summary:

In #3255, BringUpQp() only did a local ibv_query_ece + ibv_set_ece
roundtrip and never exchanged ECE capabilities with the peer.

What is changed and the side effects?

Changed:

This PR implements an end-to-end ECE negotiation in the v3 handshake.

  • Client (RdmaHandshakeClientV3::SendLocalHello): query local ECE,
    store it in _outgoing_ece, advertise it in the v3 hello.
  • Server (BringUpQp): apply the client's ECE during INIT->RTR (ibv_set_ece).
    After QP reaches RTS, ibv_query_ece to obtain the reduced/negotiated ECE
    (the subset both peers support) and store it in _outgoing_ece.
  • Server (SendLocalHelloFillLocalRdmaHello): advertise the negotiated ECE
    in the reply hello.
  • Client (BringUpQp): apply the server's reduced ECE during INIT->RTR.

Side effects:

  • Performance effects:

  • Breaking backward compatibility:


Check List:

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds end-to-end ECE (Enhanced Connection Establishment) negotiation to the RDMA handshake v3 (“RDM3”), so client/server exchange ECE capabilities and apply the negotiated (reduced) result during QP bring-up, while gracefully degrading when ECE isn’t supported.

Changes:

  • Extend the v3 protobuf handshake (RdmaHello) to optionally carry ECE capabilities/negotiated results.
  • Implement role-specific ECE handling: client queries+advertises local ECE; server applies client ECE during INIT→RTR, then queries and advertises negotiated ECE after RTS; client applies server’s negotiated ECE.
  • Make ibv_query_ece/ibv_set_ece optional dynamic symbols and add unit tests + broaden CI test execution.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/brpc_rdma_unittest.cpp Adds v3 handshake tests covering ECE presence/absence and server behavior with ECE-enabled/disabled.
src/brpc/rdma/rdma_helper.cpp Loads ECE verbs symbols optionally so older libibverbs can still run RDMA without failing init.
src/brpc/rdma/rdma_handshake.proto Adds RdmaEce and an optional ece field in RdmaHello for v3 negotiation.
src/brpc/rdma/rdma_handshake.h Extends parsed hello to carry optional ibv_ece for v3.
src/brpc/rdma/rdma_handshake.cpp Adds FLAGS_rdma_ece, advertises/parses ECE in v3 hello, and queries local ECE on the v3 client.
src/brpc/rdma/rdma_endpoint.h Stores _outgoing_ece to advertise role-specific ECE in the next hello.
src/brpc/rdma/rdma_endpoint.cpp Updates QP bring-up to accept parsed hello (incl. optional ECE) and queries negotiated ECE on the server after RTS.
.github/workflows/ci-linux.yml Runs the full //test/... suite under the RDMA Bazel config (removes previous gtest filter).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/brpc_rdma_unittest.cpp
Comment thread src/brpc/rdma/rdma_endpoint.cpp Outdated
Comment thread src/brpc/rdma/rdma_endpoint.cpp Outdated
Comment thread src/brpc/rdma/rdma_endpoint.cpp
Previously BringUpQp only did a local ibv_query_ece + ibv_set_ece
roundtrip and never exchanged ECE capabilities with the peer.
This patch wires up the standard requestor/responder ECE negotiation
flow on top of the existing v3 handshake without adding any extra
round trip:

1. Client queries local ECE, advertises it in its v3 hello.

2. Server applies the client's ECE in INIT->RTR (set_ece), then
   after RTS queries the reduced/negotiated ECE and sends it back
   in the reply hello.

3. Client applies the server's reduced ECE in INIT->RTR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants