
feat(pipeline): surface per-distribution RDF-validity verdicts #476

Open
ddeboer wants to merge 1 commit into main from feat/pipeline-distribution-validity



ddeboer (Member) commented Jun 15, 2026

Closes #469. Part of the distribution-health feature: netwerk-digitaal-erfgoed/dataset-register#2103.

What

Wires @lde/distribution-health into @lde/pipeline so every distribution the pipeline touches gets an RDF-validity verdict, surfaced as a plain TypeScript value through a new reporter callback. The pipeline emits no RDF and coins no vocabulary — consumers map the verdict to their own RDF (the “RDF emission & vocabulary boundary” decision in #2103).

New surface:

ProgressReporter.distributionValidated?(distribution: Distribution, verdict: ValidityVerdict): void
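For illustration, a consumer-side sketch of the new hook. Only the method name distributionValidated and its (distribution, verdict) parameters come from this PR; the fields on Distribution and ValidityVerdict, the helper notifyValidated and the class CollectingReporter are hypothetical.

```typescript
// Hypothetical stand-ins for the @lde types: only the callback name and
// parameter order are taken from the PR, the field shapes are assumed.
interface Distribution {
  accessUrl: string;
}

interface ValidityVerdict {
  status: 'valid' | 'invalid' | 'empty';
  depth: 'deep' | 'shallow';
  reason?: 'parse-error';
  fingerprint?: string;
}

// Only the new optional hook is modelled; the real interface has more members.
interface ProgressReporter {
  distributionValidated?(distribution: Distribution, verdict: ValidityVerdict): void;
}

// Because the hook is optional, the caller invokes it with optional chaining,
// so reporters that do not implement it are silently skipped.
function notifyValidated(
  reporter: ProgressReporter,
  distribution: Distribution,
  verdict: ValidityVerdict,
): void {
  reporter.distributionValidated?.(distribution, verdict);
}

// A consumer that collects verdicts in memory, keyed by access URL, so it can
// later map them to its own RDF.
class CollectingReporter implements ProgressReporter {
  readonly verdicts = new Map<string, ValidityVerdict[]>();

  distributionValidated(distribution: Distribution, verdict: ValidityVerdict): void {
    const list = this.verdicts.get(distribution.accessUrl) ?? [];
    list.push(verdict);
    this.verdicts.set(distribution.accessUrl, list);
  }
}

// Usage: one deep "valid" verdict for a single distribution.
const reporter = new CollectingReporter();
notifyValidated(
  reporter,
  { accessUrl: 'https://example.org/dump.nt' },
  { status: 'valid', depth: 'deep' },
);
```

Keeping the hook optional means existing reporters keep compiling unchanged; only consumers that care about validity need to implement it.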

Behaviour

  • Deep verdict from the import outcome:
    • import failure → invalid (parse-error), surfaced even when the dataset is then skipped (the previously silent drop that motivated #2103);
    • import success → valid, or empty when the import yielded no triples.
    • Only emitted when a data dump was actually imported, so native SPARQL endpoints (which are not imported) carry no deep verdict.
  • Shallow verdict from the probe's existing body validation, per probed distribution that yields a validity signal.
  • Each verdict carries the distribution's observed source fingerprint; the fingerprint is also added to the reachability result (DistributionAnalysisResult.fingerprint) so it is the shared key across the reachability and validity rails.
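The verdict rules above can be sketched as two small pure functions plus a fingerprint join. This is a sketch under assumptions, not the code in this PR: ImportOutcome with its kind/tripleCount fields, the verdict's status/depth/reason fields, DistributionAnalysisResult.reachable, and the helpers deepVerdict, shallowVerdict and verdictsFor are all hypothetical; only the fingerprint field on DistributionAnalysisResult is named in the PR.

```typescript
// Hypothetical shapes; the real types in @lde/distribution-health and
// @lde/pipeline may differ.
interface ValidityVerdict {
  status: 'valid' | 'invalid' | 'empty';
  depth: 'deep' | 'shallow';
  reason?: 'parse-error';
  fingerprint?: string;
}

// Hypothetical outcome of importing a downloaded data dump.
type ImportOutcome =
  | { kind: 'failed'; error: string }
  | { kind: 'succeeded'; tripleCount: number };

// Only `fingerprint` is named in the PR; `reachable` is assumed.
interface DistributionAnalysisResult {
  reachable: boolean;
  fingerprint?: string;
}

// Deep verdict: none when no dump was imported (e.g. a native SPARQL
// endpoint), invalid/parse-error on failure, valid or empty on success.
function deepVerdict(
  outcome: ImportOutcome | undefined,
  fingerprint?: string,
): ValidityVerdict | undefined {
  if (outcome === undefined) return undefined;
  if (outcome.kind === 'failed') {
    return { status: 'invalid', depth: 'deep', reason: 'parse-error', fingerprint };
  }
  const status = outcome.tripleCount > 0 ? 'valid' : 'empty';
  return { status, depth: 'deep', fingerprint };
}

// Shallow verdict: only when the probe's body validation produced a signal;
// undefined means "no signal", so no verdict is emitted.
function shallowVerdict(
  bodyValid: boolean | undefined,
  fingerprint?: string,
): ValidityVerdict | undefined {
  if (bodyValid === undefined) return undefined;
  return { status: bodyValid ? 'valid' : 'invalid', depth: 'shallow', fingerprint };
}

// Both rails carry the same fingerprint, so a consumer can join a
// reachability result to the verdicts observed for the same source.
function verdictsFor(
  result: DistributionAnalysisResult,
  verdicts: readonly ValidityVerdict[],
): ValidityVerdict[] {
  if (result.fingerprint === undefined) return [];
  return verdicts.filter((v) => v.fingerprint === result.fingerprint);
}
```

Keeping the derivation pure makes each case (invalid, valid, empty, no verdict) easy to assert directly, in the spirit of the behaviour-level tests listed under Tests.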

Tests

311 pipeline tests pass (5 new, behaviour-asserted through the reporter): deep invalid / valid / empty, shallow invalid, and the reachability fingerprint. Lint + typecheck clean.

Notes

  • The success path reconstructs an ImportSuccessful to feed the shared mapper. A possible later cleanup is to compute the verdict where the real import outcomes live (importResolver) and thread the fingerprint in; left as-is for now since it's localised.
  • Vocabulary/RDF stays out of LDE entirely (returns TS); the def.nde.nl mapping lands in the consumers (epic tasks 4 & 5).
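Since LDE returns plain TypeScript and leaves the RDF to consumers, a consumer's mapping might look roughly like the sketch below. The namespace https://example.org/vocab# and its terms are placeholders, not the def.nde.nl vocabulary; the verdict shape and the verdictToTriples helper are hypothetical. The term model is a minimal self-contained stand-in loosely modelled on RDF/JS terms; a real consumer would more likely use an RDF/JS data factory.

```typescript
// Hypothetical verdict shape; the real ValidityVerdict may differ.
interface ValidityVerdict {
  status: 'valid' | 'invalid' | 'empty';
  depth: 'deep' | 'shallow';
  reason?: string;
  fingerprint?: string;
}

// Minimal term model: IRIs as NamedNode, everything else as Literal.
type Term =
  | { termType: 'NamedNode'; value: string }
  | { termType: 'Literal'; value: string };
type Triple = [subject: Term, predicate: Term, object: Term];

const iri = (value: string): Term => ({ termType: 'NamedNode', value });
const literal = (value: string): Term => ({ termType: 'Literal', value });

// Placeholder namespace: NOT the def.nde.nl vocabulary.
const VOCAB = 'https://example.org/vocab#';

function verdictToTriples(distributionIri: string, verdict: ValidityVerdict): Triple[] {
  // One node per distribution and verdict depth (deep or shallow).
  // Simplistic: assumes the distribution IRI has no fragment of its own.
  const node = iri(`${distributionIri}#validity-${verdict.depth}`);
  const triples: Triple[] = [
    [iri(distributionIri), iri(`${VOCAB}validity`), node],
    [node, iri(`${VOCAB}status`), literal(verdict.status)],
    [node, iri(`${VOCAB}depth`), literal(verdict.depth)],
  ];
  if (verdict.reason !== undefined) {
    triples.push([node, iri(`${VOCAB}reason`), literal(verdict.reason)]);
  }
  // The fingerprint is what lets the consumer link this to reachability data.
  if (verdict.fingerprint !== undefined) {
    triples.push([node, iri(`${VOCAB}fingerprint`), literal(verdict.fingerprint)]);
  }
  return triples;
}
```

Keeping this mapping on the consumer side means vocabulary changes never require a pipeline release.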

ddeboer closed this Jun 15, 2026
ddeboer reopened this Jun 15, 2026
ddeboer force-pushed the feat/pipeline-distribution-validity branch 2 times, most recently from 7dd5b6e to faf55f5 on June 15, 2026 18:20
Wire @lde/distribution-health into the pipeline so each distribution's
validity is reported as a plain TypeScript verdict via a new
ProgressReporter.distributionValidated(distribution, verdict) callback. The
pipeline emits no RDF and coins no vocabulary; consumers map the verdict to
their own RDF (see netwerk-digitaal-erfgoed/dataset-register#2103).

- Deep verdict from the import outcome: invalid (parse-error) on import
  failure, surfaced even when the dataset is then skipped so that an invalid
  distribution is recorded rather than silently dropped; valid, or empty when
  the import yielded no triples, on success.
- Shallow verdict from the probe's body validation, per probed distribution.
- Each verdict carries the distribution's observed source fingerprint; the
  fingerprint is also added to the reachability result
  (DistributionAnalysisResult.fingerprint) so it is the shared key across the
  reachability and validity rails.

Part of netwerk-digitaal-erfgoed/dataset-register#2103. Closes #469.
ddeboer force-pushed the feat/pipeline-distribution-validity branch from faf55f5 to af2273e on June 15, 2026 18:25

Development

Successfully merging this pull request may close these issues.

@lde/pipeline: emit per-distribution validity + record reachability fingerprint

1 participant