Funder Mandate Alignment: Compiling Data-Sharing Policy into Enforceable Pipeline Controls


Funder mandates have moved from advisory guidance into executable constraints that must be enforced deterministically at the moment a dataset is deposited. For research data managers and academic IT teams, reconciling those policies by hand introduces latency, metadata drift, and audit exposure: a grant’s data-sharing conditions surface only at reporting time, licenses are missing or incompatible, and the retention window nobody encoded quietly lapses. This guide covers the control layer that compiles each funder’s policy language into machine-readable validation rules, routes artifacts to compliant storage tiers, and enforces retention schedules with no human in the loop. It sits inside the broader Open Science Infrastructure Planning pipeline, immediately after governance defines the rules and immediately before repository deposit, and it is written for the Python automation engineers who own that checkpoint in production.

The core shift is treating a mandate as compiled configuration rather than prose to be read. A parser ingests the funder’s published requirements, emits a typed policy object, and that object becomes the source of truth for four downstream gates: the metadata fields a deposit must carry, the licenses it may bear, the access scope and embargo it must respect, and the number of years it must be preserved. Every deposit is checked against the compiled policy for its funder_id before it advances; a failure is quarantined with full context, never waved through. The sub-pipeline below shows how a mandate decomposes into those enforced controls.

A compiled mandate fans into four clauses, two enforcement gates decide pass or route, and every decision lands on the audit log that establishes FAIR compliance.

Concepts, Standards, and the Compliance Contract

A funder mandate is only enforceable once it is normalized against named, versioned standards — otherwise “share the data openly” cannot be turned into a pass/fail test. Four external standards, each cited here by its full name, anchor the compiled policy, and each is covered in depth elsewhere in this section.

The FAIR Guiding Principles (Findable, Accessible, Interoperable, Reusable) define what “compliant” ultimately means. The way each principle becomes an enforced checkpoint rather than an after-the-fact audit is the subject of the FAIR Principle Breakdown, which maps the Accessible and Reusable principles directly onto the access-scope and license gates described here.
The SPDX License List, published by the Software Package Data Exchange project, supplies the canonical, case-sensitive identifiers (CC-BY-4.0, CC0-1.0, Apache-2.0) that make license resolution deterministic. The registry-driven selection logic lives in Open License Configuration.
The Crossref Funder Registry and the Research Organization Registry (ROR) provide persistent identifiers for the funding body itself, so funder_id resolves to an authoritative record instead of a free-text agency name that drifts between grants.
The DataCite Metadata Schema defines the fundingReferences and rightsList structures the deposit must ultimately carry; the field-level crosswalks that produce those structures are covered in Metadata Schema Mapping.

Three properties separate a research-grade mandate compiler from a hand-maintained checklist. It must be deterministic — the same mandate document always compiles to the same rules, so a reviewer can reproduce any historical decision. It must be schema-preserving — a deposit that fails the compiled policy is never quietly downgraded to a relaxed path. And it must be auditable — every gate emits a provenance record so an external reviewer can reconstruct exactly why a dataset was accepted, quarantined, or embargoed. Payloads are serialized as JSON-LD (JSON for Linking Data) so each policy decision carries a self-describing, machine-actionable body rather than an opaque blob.

Step-by-Step Implementation

The following four steps build a production mandate-alignment service: a resilient parser that emits a typed policy model, a metadata-requirement gate, a license resolver, and a retention/routing emitter. The code targets Python 3.10+ with full type hints and, where models are defined, the Pydantic V2 API.

Step 1 — Ingest and parse the mandate into a typed policy object

The foundational task is converting heterogeneous funder requirements into a single structured schema. Mandates expose required metadata fields, embargo windows, acceptable licensing frameworks, and retention periods, published as JSON-LD, YAML, or SPDX-compatible policy documents at a funder endpoint. The ingestion service polls that endpoint and validates the payload at load time, so a malformed mandate fails here — where a human can review it — rather than deep inside the dispatch loop. The compliance rationale is that a typed FunderMandate object becomes the auditable source of truth every downstream gate reads from; if the parse is wrong, nothing that follows can be trusted.

Network handling is explicit and categorized: transient failures trigger exponential backoff with jitter, while a schema mismatch raises immediately and routes to a quarantine queue for manual policy review.

python

import time
import random
import requests
from pydantic import BaseModel, ValidationError, field_validator
from typing import Optional


class FunderMandate(BaseModel):
    funder_id: str
    required_metadata_fields: list[str]
    embargo_max_months: Optional[int] = None
    license_whitelist: list[str]
    retention_years: int

    @field_validator("retention_years")
    @classmethod
    def validate_retention(cls, v: int) -> int:
        if v <= 0:
            raise ValueError("Retention period must be a positive integer")
        return v


def fetch_mandate_with_retry(
    endpoint: str, max_retries: int = 3, base_delay: float = 1.0
) -> FunderMandate:
    """Fetch a mandate schema with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            response = requests.get(endpoint, timeout=10)
            response.raise_for_status()
            return FunderMandate(**response.json())
        except requests.RequestException as exc:
            if attempt == max_retries - 1:
                raise ConnectionError(
                    f"Mandate API unreachable after {max_retries} attempts: {exc}"
                ) from exc
            delay = (base_delay * (2 ** attempt)) + random.uniform(0, 0.5)
            time.sleep(delay)
        except ValidationError as exc:
            # Schema mismatch is permanent: quarantine for manual policy review.
            raise ValueError(f"Schema mismatch in mandate payload: {exc}") from exc
    # Reached only when max_retries < 1, so the loop body never ran.
    raise ValueError("max_retries must be at least 1")

Every dataset that passes this stage carries a validated funder_id tag. For complex federal requirements — such as those normalized in Aligning NIH data sharing policies with FAIR principles — the parser additionally flattens nested policy objects into the flat constraints the gates expect.
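
The flattening step mentioned above might look like the sketch below. The nested key names (`funder`, `metadata`, `access`, `licensing`, `preservation`) are hypothetical and stand in for whatever structure a given funder actually publishes; only the output keys mirror the FunderMandate fields defined in this step.

```python
from typing import Any


def flatten_policy(nested: dict[str, Any]) -> dict[str, Any]:
    """Map a nested mandate document onto the flat FunderMandate field names.

    The input layout is an assumption for illustration; adapt the lookups to
    the funder's real schema.
    """
    access = nested.get("access", {})
    return {
        "funder_id": nested["funder"]["id"],
        "required_metadata_fields": list(nested["metadata"]["required"]),
        # A missing embargo block stays None instead of being coerced to 0.
        "embargo_max_months": access.get("embargo", {}).get("max_months"),
        "license_whitelist": list(nested["licensing"]["allowed"]),
        "retention_years": nested["preservation"]["years"],
    }
```

A missing mandatory key raises `KeyError` here, which is the intended behaviour under this sketch: an incomplete policy document should fail at parse time, not produce a partially filled mandate.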

Step 2 — Compile the metadata-requirement gate

With a typed mandate in hand, the next step turns required_metadata_fields into a hard pass/fail gate over each deposit. This is the same Pydantic schema validation discipline the ingestion stage applies to datasets, turned toward funder policy: the gate confirms every mandated field is present and non-empty before the deposit is allowed to advance. The compliance rationale is that a missing fundingReferences entry or absent rightsList is a Findable/Reusable violation that is far cheaper to catch at ingestion than at a funder audit two years later.

python

from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    passed: bool
    missing_fields: list[str]
    funder_id: str


def check_metadata_requirements(
    deposit: dict[str, object], mandate: FunderMandate
) -> GateResult:
    """Assert every mandated field is present and non-empty on the deposit."""
    missing = [
        field
        for field in mandate.required_metadata_fields
        if not deposit.get(field)  # covers missing keys, None, "", and []
    ]
    return GateResult(
        passed=not missing,
        missing_fields=missing,
        funder_id=mandate.funder_id,
    )

A GateResult with passed=False never blocks the whole pipeline; it is attached to the deposit and routed to quarantine (see Error Handling) so the originating research team can supply the missing metadata without losing the submission.

Step 3 — Resolve a compliant, SPDX-canonical license

Funder requirements frequently dictate specific open licenses, and the pipeline must choose one that satisfies both the mandate and institutional policy — deterministically, using SPDX License List identifiers so CC-BY-4.0 is never confused with a free-text “CC BY”. The resolver intersects the mandate’s license_whitelist with the institution’s registered defaults and selects the preferred match by an explicit permissiveness ordering. The compliance rationale is that a silently mis-assigned license is a Reusable violation that can invalidate downstream reuse; the intersection makes the choice defensible and reproducible.

python

# Institutional preference order, most-permissive first. In production this is
# loaded from the license registry, not hardcoded, so policy changes are a diff.
PERMISSIVENESS_ORDER: tuple[str, ...] = (
    "CC0-1.0",
    "CC-BY-4.0",
    "Apache-2.0",
    "CC-BY-SA-4.0",
)


def resolve_compliant_license(
    mandate_whitelist: list[str],
    institutional_defaults: list[str],
    preference: tuple[str, ...] = PERMISSIVENESS_ORDER,
) -> str:
    """Select the preferred SPDX license present in both mandate and institution."""
    intersection = set(mandate_whitelist) & set(institutional_defaults)
    if not institutional_defaults:
        # Institution is silent on this mandate: fall back to mandate-only options.
        intersection = set(mandate_whitelist)
    if not intersection:
        raise RuntimeError(
            "No compliant license found across mandate and institutional policies"
        )
    # Deterministic: walk the explicit preference order, not sorted() alphabetics.
    for spdx_id in preference:
        if spdx_id in intersection:
            return spdx_id
    # No ranked match — surface the ambiguity rather than guessing.
    raise RuntimeError(
        f"Compliant licenses {sorted(intersection)} are unranked; extend the "
        "preference order before auto-assigning."
    )

Detailed patterns for maintaining the license registry without hardcoding policy exceptions are documented in Open License Configuration.

Step 4 — Emit retention and storage-routing policy

The resolved license and the mandate’s embargo_max_months and retention_years fields dictate where the artifact lands and how long it lives. Hot storage handles active collaboration and embargo periods; cold archival tiers enforce long-term preservation. Rather than relying on manual sweeps, the service generates a native object-storage lifecycle rule at ingestion time, so retention is enforced at the infrastructure layer. The compliance rationale is that infrastructure-level enforcement is the only kind that survives staff turnover and re-orgs — the rule outlives the person who created it.

python

from datetime import datetime, timedelta, timezone


def build_lifecycle_policy(
    mandate: FunderMandate, license_id: str, ingest: datetime | None = None
) -> dict[str, object]:
    """Emit a storage lifecycle + routing directive from the compiled mandate."""
    now = ingest or datetime.now(timezone.utc)
    embargo_months = mandate.embargo_max_months or 0
    public_release = now + timedelta(days=embargo_months * 30)
    # Approximate calendar years; a real system clamps Feb 29 like the retention job.
    expiry = now + timedelta(days=mandate.retention_years * 365)

    return {
        "funder_id": mandate.funder_id,
        "license_spdx": license_id,
        "initial_tier": "hot" if embargo_months else "warm",
        "public_release_after": public_release.isoformat(),
        "archival_tier": "cold",
        "retention_expiry": expiry.isoformat(),
        "worm_lock": True,  # write-once until retention_expiry
    }
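
The 30-day and 365-day multipliers in the builder are approximations, and over a ten-year retention window the expiry can land a couple of days early. One way to compute calendar-accurate dates using only the standard library is sketched below; the helper names `add_months` and `add_years` are illustrative and are not part of the service described here.

```python
import calendar
from datetime import datetime, timezone


def add_months(start: datetime, months: int) -> datetime:
    """Add calendar months, clamping the day to the target month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return start.replace(year=year, month=month, day=day)


def add_years(start: datetime, years: int) -> datetime:
    """Add calendar years; Feb 29 clamps to Feb 28 in non-leap years."""
    return add_months(start, years * 12)
```

With these helpers, `public_release` could be derived as `add_months(now, embargo_months)` and `expiry` as `add_years(now, mandate.retention_years)`, so both dates fall on the expected calendar day.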

A scheduled reconciliation job then compares manifest checksums against storage inventories, verifies embargo dates were respected, and triggers automated public-release workflows — any deviation raises a high-priority compliance ticket and suspends further ingestion for the affected project namespace. The retention lifecycle it enforces is shown below.

The retention lifecycle enforced at the infrastructure layer — a WORM archival state re-verified every quarter, with any checksum mismatch quarantined rather than trusted.
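
The checksum comparison at the heart of the reconciliation job can be sketched as a pure function over two mappings. The names `ReconciliationReport` and `reconcile`, and the assumption that both the manifest and the storage inventory are available as object-key to checksum dictionaries, are illustrative; ticketing and namespace suspension are left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconciliationReport:
    missing: list[str]      # listed in the manifest, absent from storage
    mismatched: list[str]   # present in both, but the checksums differ
    unexpected: list[str]   # found in storage, absent from the manifest

    @property
    def clean(self) -> bool:
        return not (self.missing or self.mismatched or self.unexpected)


def reconcile(
    manifest: dict[str, str], inventory: dict[str, str]
) -> ReconciliationReport:
    """Compare object-key -> checksum maps from the manifest and the store."""
    missing = sorted(key for key in manifest if key not in inventory)
    unexpected = sorted(key for key in inventory if key not in manifest)
    mismatched = sorted(
        key
        for key in manifest
        if key in inventory and manifest[key] != inventory[key]
    )
    return ReconciliationReport(missing, mismatched, unexpected)
```

Any report where `clean` is false would be the trigger for the compliance ticket and ingestion suspension described above.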

Reference: Mandate Clause to Technical Control

Every clause in a funder mandate maps to exactly one enforcement point in the pipeline. The table below is the authoritative crosswalk from policy language to the control that implements it and the FAIR principle it satisfies.

| Mandate clause | Compiled field | Enforcement point | FAIR principle |
| --- | --- | --- | --- |
| “Deposit in an established repository” | `funder_id` → routing table | Storage tier routing | Findable |
| “Include standardized metadata” | `required_metadata_fields` | Schema validation gate (Step 2) | Findable, Interoperable |
| “Apply an open license” | `license_whitelist` | SPDX license resolver (Step 3) | Reusable |
| “Respect an access embargo” | `embargo_max_months` | Lifecycle `public_release_after` | Accessible |
| “Preserve for N years” | `retention_years` | WORM lifecycle rule (Step 4) | Accessible |
| “Provide auditable provenance” | all gate results | Append-only audit log | Reusable |

The compiled FunderMandate fields themselves carry exact validation constraints; misreading one silently changes the enforced policy.

| Field | Type | Required | Validation rule | Example |
| --- | --- | --- | --- | --- |
| `funder_id` | string | yes | Resolves to a Crossref Funder Registry or ROR identifier | `https://ror.org/01cwqze88` |
| `required_metadata_fields` | list[str] | yes | Non-empty; each maps to a DataCite property | `["titles", "creators", "fundingReferences"]` |
| `embargo_max_months` | int \| null | no | `null` means no embargo; `0` means immediate release | `12` |
| `license_whitelist` | list[str] | yes | Every entry is a valid SPDX License List identifier | `["CC-BY-4.0", "CC0-1.0"]` |
| `retention_years` | int | yes | Must be a positive integer (`> 0`) | `10` |
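
The FunderMandate model in Step 1 only enforces the `retention_years` rule, so the remaining rows of the table need checks of their own. The sketch below shows one way to express them as a plain function. The two regular expressions are assumed identifier shapes (a ROR URL and a Crossref Funder Registry DOI under the 10.13039 prefix), `KNOWN_SPDX_IDS` is a four-entry stand-in for the full SPDX License List, and the non-negative embargo check is an added assumption; verify each against the relevant registry before relying on it.

```python
import re

# Assumed identifier shapes, for illustration only.
ROR_RE = re.compile(r"^https://ror\.org/0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}$")
CROSSREF_FUNDER_RE = re.compile(r"^https://doi\.org/10\.13039/\d+$")

# Tiny stand-in for the SPDX License List; load the full list in practice.
KNOWN_SPDX_IDS = frozenset({"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "Apache-2.0"})


def check_mandate_fields(payload: dict[str, object]) -> list[str]:
    """Return human-readable violations of the field rules in the table."""
    errors: list[str] = []
    funder_id = payload.get("funder_id")
    if not isinstance(funder_id, str) or not (
        ROR_RE.match(funder_id) or CROSSREF_FUNDER_RE.match(funder_id)
    ):
        errors.append("funder_id is not a ROR or Crossref Funder identifier")
    fields = payload.get("required_metadata_fields")
    if not isinstance(fields, list) or not fields:
        errors.append("required_metadata_fields must be a non-empty list")
    licenses = payload.get("license_whitelist")
    if not isinstance(licenses, list) or not licenses:
        errors.append("license_whitelist must be a non-empty list")
    else:
        unknown = [
            lic for lic in licenses
            if not isinstance(lic, str) or lic not in KNOWN_SPDX_IDS
        ]
        if unknown:
            errors.append(f"license_whitelist has non-SPDX entries: {unknown}")
    embargo = payload.get("embargo_max_months")
    if embargo is not None and (
        isinstance(embargo, bool) or not isinstance(embargo, int) or embargo < 0
    ):
        errors.append("embargo_max_months must be null or a non-negative integer")
    years = payload.get("retention_years")
    if isinstance(years, bool) or not isinstance(years, int) or years <= 0:
        errors.append("retention_years must be a positive integer")
    return errors
```

Returning every violation at once, instead of raising on the first, gives the policy reviewer a complete picture of a malformed mandate in a single pass.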

Error Handling and Edge Cases

Failures are categorized so each takes the right recovery path. A transient network failure while fetching a mandate is retried in place with jittered exponential backoff (Step 1); only after the attempt cap is exhausted does it surface as a ConnectionError. A schema mismatch in the mandate payload is permanent — retrying will not fix malformed policy — so it raises immediately and the mandate document is routed to a policy-review quarantine rather than silently ignored, which would leave deposits ungated.

Deposit-level failures never vanish. A deposit that fails the metadata gate, has no resolvable compliant license, or carries an unranked license is written to a dead-letter queue with full context attached: the funder_id, the failing gate, the missing fields or ambiguous licenses, and the last validated payload. This is the same dead-letter and quarantine discipline described in API Routing & Fallbacks; once the originating team supplies the missing metadata or the license registry is extended, a reconciliation worker replays the record through the same gates. Nothing is downgraded to a relaxed path to force a pass: doing so would defeat the compliance contract and pollute the archive with records that the repository accepts but that fail the mandate.
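
A dead-letter record and its replay path could be modelled along these lines. This is a sketch under stated assumptions: `DeadLetter` and `DeadLetterQueue` are hypothetical names, the queue is an in-memory stand-in for a durable one, and the gate is reduced to a callable that returns True when the payload now passes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DeadLetter:
    funder_id: str
    failing_gate: str
    detail: list[str]            # missing fields or ambiguous licenses
    payload: dict[str, object]   # last validated payload, kept for replay


class DeadLetterQueue:
    """In-memory stand-in for a durable queue (hypothetical interface)."""

    def __init__(self) -> None:
        self._items: deque[DeadLetter] = deque()

    def put(self, letter: DeadLetter) -> None:
        self._items.append(letter)

    def replay(self, gate: Callable[[dict[str, object]], bool]) -> list[DeadLetter]:
        """Re-run every record through `gate`; return the ones that now pass."""
        recovered: list[DeadLetter] = []
        still_failing: deque[DeadLetter] = deque()
        while self._items:
            letter = self._items.popleft()
            if gate(letter.payload):
                recovered.append(letter)
            else:
                still_failing.append(letter)
        # Records that fail again stay queued with their context intact.
        self._items = still_failing
        return recovered

    def __len__(self) -> int:
        return len(self._items)
```

Because replay uses the same gate callable as first-pass validation, a record can only leave the queue by satisfying the real policy, never by a relaxed one.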

Verification and Testing

Assert the gates without touching a live funder endpoint by exercising them against a compiled mandate directly. The tests below confirm that a deposit missing a mandated field is caught, that the license resolver prefers CC0-1.0 over CC-BY-4.0 when both are compliant, and that a non-positive retention_years is rejected when the mandate is constructed.

python

import pytest


def _mandate() -> FunderMandate:
    return FunderMandate(
        funder_id="https://ror.org/01cwqze88",
        required_metadata_fields=["titles", "creators", "fundingReferences"],
        embargo_max_months=12,
        license_whitelist=["CC-BY-4.0", "CC0-1.0"],
        retention_years=10,
    )


def test_metadata_gate_flags_missing_field() -> None:
    deposit = {"titles": ["Genome assembly"], "creators": ["Dr. Vale"]}
    result = check_metadata_requirements(deposit, _mandate())
    assert result.passed is False
    assert result.missing_fields == ["fundingReferences"]


def test_license_resolver_prefers_most_permissive() -> None:
    chosen = resolve_compliant_license(
        mandate_whitelist=["CC-BY-4.0", "CC0-1.0"],
        institutional_defaults=["CC0-1.0", "CC-BY-4.0", "Apache-2.0"],
    )
    assert chosen == "CC0-1.0"


def test_retention_must_be_positive() -> None:
    with pytest.raises(ValueError):
        FunderMandate(
            funder_id="x",
            required_metadata_fields=["titles"],
            license_whitelist=["CC-BY-4.0"],
            retention_years=0,
        )

Run them with pytest -q test_mandate_alignment.py. A passing run prints 3 passed. In the running service, a deposit that fails the metadata gate is logged as WARNING gate=metadata funder_id=https://ror.org/01cwqze88 missing=['fundingReferences'], a key=value line carrying the same fields as the JSON telemetry the audit log and compliance dashboards consume.

Gotchas and Known Pitfalls

SPDX identifiers are case-sensitive. Lowercasing or “normalizing” a license string turns CC-BY-4.0 into a value that is not on the SPDX License List, so the resolver silently rejects a perfectly valid license. Fix: only strip() surrounding whitespace; never alter the case of an SPDX identifier.
sorted() is not a permissiveness order. Alphabetically sorting the compliant set returns Apache-2.0 before CC-BY-4.0 and would auto-assign the wrong license. Fix: rank against an explicit PERMISSIVENESS_ORDER (Step 3) and raise on an unranked match rather than guessing.
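embargo_max_months = 0 is not the same as None. Per the field reference, None means the mandate sets no embargo and 0 means it requires immediate release; both yield the same release date, which is why the lifecycle builder can map them to a zero-day offset. The distinction still matters in the audit record and when mandates are merged, and a parser that coerces an unreadable embargo value to None or 0 would publish an embargoed dataset immediately. Fix: keep embargo_max_months optional, fail validation on an unreadable embargo value, and branch on is None versus == 0 deliberately wherever the two cases diverge.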
Free-text funder names drift between grants. “NIH”, “National Institutes of Health”, and “N.I.H.” are three different strings but one funder. Fix: resolve funder_id to a Crossref Funder Registry or ROR identifier at parse time so every gate keys off a stable URI.
A malformed mandate silently disables enforcement. If a schema mismatch is swallowed instead of raised, deposits flow through with no gate at all — the most dangerous failure mode, because it looks like success. Fix: raise on ValidationError in the parser and quarantine the mandate document; never fall back to an empty policy.

Frequently Asked Questions

How do I handle a funder that publishes its mandate only as a PDF, not a machine-readable endpoint?

Treat PDF extraction as a separate, human-reviewed step that produces the same typed FunderMandate object the endpoint parser emits. Extract the required fields, embargo window, license list, and retention period into a version-controlled YAML document, have a data manager sign off on it, and feed that document to the same compiler. The rest of the pipeline never learns whether the policy came from an API or a curator — it only consumes the validated FunderMandate.

What happens when two funders on a co-funded grant impose conflicting mandates?

Resolve to the strictest constraint on each axis independently: the intersection of both license_whitelist sets, the longer retention_years, and the shorter permitted embargo. If the license intersection is empty, the deposit is quarantined for a curator to negotiate an exception rather than auto-assigned a license neither funder accepts. Encoding “strictest wins” as explicit set operations keeps the merged policy deterministic and auditable.

Can retention be enforced without a specific number of years from the funder?

Yes. When a mandate expresses “a meaningful period” rather than a fixed duration, the applicable window is set by the chosen repository’s preservation commitment and the institution’s records-retention requirements, and that value is what retention_years should carry. The pipeline still enforces a concrete WORM lifecycle rule; the number simply comes from institutional policy instead of the funder. This distinction is worked through in detail for Aligning NIH data sharing policies with FAIR principles.

How does mandate alignment stay current when a funder revises its policy?

Schedule a nightly sync against each funder endpoint that re-runs the parser and diffs the compiled FunderMandate against the stored version. A changed policy produces a reviewable diff and a new policy_version tag; in-flight deposits continue under the version active at submission, while new deposits pick up the revision. Versioning the compiled policy is what lets an audit reconstruct which rules applied to any historical deposit.
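
The diff-and-version step can be sketched with the standard library alone. Deriving `policy_version` from a content hash is an assumption made for this example (the source only says a new tag is produced), and both function names are illustrative. The inputs are plain dictionaries, such as the output of Pydantic's `model_dump()` on a compiled FunderMandate.

```python
import hashlib
import json


def policy_version(compiled: dict[str, object]) -> str:
    """Derive a stable version tag from the canonical JSON of a policy."""
    canonical = json.dumps(compiled, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def diff_policies(
    stored: dict[str, object], fetched: dict[str, object]
) -> dict[str, tuple[object, object]]:
    """Return {field: (stored_value, fetched_value)} for every changed field."""
    keys = stored.keys() | fetched.keys()
    return {
        key: (stored.get(key), fetched.get(key))
        for key in sorted(keys)
        if stored.get(key) != fetched.get(key)
    }
```

List order still affects both the hash and the diff, so sorting `license_whitelist` and `required_metadata_fields` before comparison avoids spurious revisions when a funder merely reorders its document.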

Related Guides

Data Governance Frameworks: the policy-as-code layer that defines the institutional rules a funder mandate is merged against.
Institutional Repository Strategy: the ingestion, tiering, and audit-trail pipeline that consumes the lifecycle directives emitted here.
Open License Configuration: the SPDX license registry and resolution rules the license gate depends on.
Aligning NIH data sharing policies with FAIR principles: a worked federal example of compiling one funder’s policy into DataCite metadata and retention controls.
Open Science Infrastructure Planning: the parent overview showing where mandate alignment sits in the governance-to-repository pipeline.
