Systematic non-disclosure of training data sources enabling plausible deniability for large-scale unauthorized content appropriation

str 8 3/14/2026 · 1 article

structural · regulatory · AI · US, UK

Analysis

AI companies are obscuring the origin and scope of their training datasets by routing scraping through third-party intermediaries and declining to disclose what those datasets contain. This creates legal and ethical cover for mass copyright infringement while making enforcement and compensation effectively impossible for rights holders.

Key actors

Midjourney · tech companies

Source article

AI is dressing up greed as progress on creative rights

Financial Times — AI, Data, Robotics and Digital Power · 3/14/2026 · extracted in run pdf-import-2026-03-14-1779224753682-53 · 5/19/2026, 10:12:12 PM

"some companies are accused of obscuring the trail by paying third-party scrapers to do the work. They do not disclose the datasets"

The quote identifies the specific mechanism: companies allegedly pay intermediaries to obscure sourcing and do not disclose their datasets, creating structural opacity that prevents rights holders from knowing what was taken or demanding compensation.

Reasoning from this article

The article suggests that non-disclosure is not accidental but a deliberate strategy. By paying third parties to scrape and refusing to disclose training data, companies create a structural barrier to enforcement: creators cannot identify infringement, courts cannot assess damages, and companies can claim ignorance. This is distinct from the legal question of whether training constitutes infringement; it is about making infringement undetectable and uncompensable.