Add split_key_prefix_len to index config to shard S3 object keys by fulmicoton-dd · Pull Request #6529 · quickwit-oss/quickwit

fulmicoton-dd · 2026-06-19T08:02:49Z

Summary

Adds split_key_prefix_len: u8 to IndexConfig (and IndexTemplate) to configure S3 key sharding per index
Adds compute_split_key_prefix(split_id, prefix_len) to quickwit-common — extracts N characters from the ULID random portion (positions 10–25) as a prefix; logs a rate-limited warning and falls back to the flat scheme if the split ID is too short
Adds split_storage_path(split_id, prefix) to quickwit-common — builds the storage path from a precomputed prefix string; empty prefix = legacy flat scheme
Validates that split_key_prefix_len <= 16 (ULID random portion length) at config load time
SplitMetadata will gain a prefix: String field in a follow-up PR to wire the full pipeline (uploader, leaf search, merge, GC)

Backward compatibility: the field defaults to 0 (serde default), so all existing indexes and splits are unaffected. New splits on indexes with split_key_prefix_len: 2 will land at ND/01ARZ3.../01ARZ3....split paths, distributing across 1024 S3 partitions instead of one.

Test plan

cargo nextest run -p quickwit-common -p quickwit-config --all-features — 221 tests pass
cargo clippy --workspace --all-features --tests — no warnings
cargo +nightly fmt --all -- --check — no issues
Set split_key_prefix_len: 2 in an index config, verify it round-trips through serde correctly
Set split_key_prefix_len: 17, verify it is rejected with a clear error message

🤖 Generated with Claude Code

Recent splits share a ULID timestamp prefix, causing S3 key hotspots under high read load. Setting split_key_prefix_len (e.g. 2) on an index extracts N characters from the ULID random portion (positions 10–25) as a subdirectory prefix, distributing new splits across 32^N S3 partitions. Move split_key_prefix_len into IndexingSettings Replace split_key_prefix_len config field with QW_SPLIT_KEY_PREFIX_LEN env var Remove now-meaningless config changes and stray whitespace

guilload · 2026-06-22T13:58:56Z

@@ -0,0 +1,115 @@
+// Copyright 2021-Present Datadog, Inc.


Having this file located under actors is surprising because this is not an actor.

ok moved it to models.

guilload · 2026-06-22T14:01:00Z

+/// The value is read from the environment once and cached for the lifetime of the process.
+fn split_key_prefix_len() -> u8 {
+    static SPLIT_KEY_PREFIX_LEN: LazyLock<u8> = LazyLock::new(|| {
+        let prefix_len: u8 = quickwit_common::get_from_env("QW_SPLIT_KEY_PREFIX_LEN", 0u8, false);


I wonder if we should harcode the len to a reasonable default and forget about it reducing the amount of boilerplate in the config. We used to used 4 chars at Airbnb: we would hash the original path and use the first four char of the hash.

split ulid are base32. 5 bits per chars. a prefix of 2 already gives us 1024 buckets too (and then the regular ulid).

the reason why I didn't do more was, on the off change that we need to find all splits within a range it can be done in 1024 requests. If we don't care then ok for 4. Let me know what you think.

guilload · 2026-06-22T14:11:03Z

+    if prefix.is_empty() {
+        return format!("{split_id}.split");
+    }
+    format!("{prefix}/{split_id}.split")


I think the / as a separator is not great because the AWS UI and S3 CLI is going to show them as a directories.

ok changing that for - then?

actually I cannot change this without breaking yahoo.

we can delete the splits if needed

fulmicoton force-pushed the paul-masurel/split-key-prefix-sharding branch 5 times, most recently from a0bce5d to 45cf1a3 Compare June 19, 2026 08:21

fulmicoton-dd force-pushed the paul-masurel/split-key-prefix-sharding branch 4 times, most recently from 1331d4a to f3151db Compare June 22, 2026 08:29

fulmicoton-dd force-pushed the paul-masurel/split-key-prefix-sharding branch from f3151db to 63e9c35 Compare June 22, 2026 09:10

guilload reviewed Jun 22, 2026

View reviewed changes

CR comment

57a2195

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add split_key_prefix_len to index config to shard S3 object keys#6529

Add split_key_prefix_len to index config to shard S3 object keys#6529
fulmicoton-dd wants to merge 2 commits into
mainfrom
paul-masurel/split-key-prefix-sharding

fulmicoton-dd commented Jun 19, 2026 •

edited by fulmicoton

Loading

Uh oh!

guilload Jun 22, 2026

Uh oh!

fulmicoton Jun 22, 2026

Uh oh!

guilload Jun 22, 2026

Uh oh!

fulmicoton Jun 22, 2026

Uh oh!

guilload Jun 22, 2026

Uh oh!

fulmicoton Jun 22, 2026

Uh oh!

fulmicoton Jun 22, 2026

Uh oh!

fmassot Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

fulmicoton-dd commented Jun 19, 2026 • edited by fulmicoton Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

fulmicoton-dd commented Jun 19, 2026 •

edited by fulmicoton

Loading