Opinion

Generative AI is undercutting publishers

In November 2022, OpenAI released ChatGPT, a natural language processing tool that interprets human queries or requests and answers them in real time. Since then, big tech platforms have raced to release their own AI products and integrations.[1] These products are built on Large Language Models (LLMs). To generate natural language responses to user prompts, LLMs must be trained on vast amounts of data, including copyrighted data published online. The publishing businesses whose work supplies the “training content” that makes generative AI more accurate lose traffic as a result, and will be unable to compete for attractive and targeted advertising revenue.

The data used to train LLMs is collected or “scraped” from the web. It can include copyrighted material, and it is often collected without express permission. For example, when responding to a question about accommodation in Brussels, Microsoft’s AI product used content from various sources, including TripAdvisor, and prominently displayed a map of accommodation in Brussels, similar to the one TripAdvisor displays on its own website.

Litigation over this issue has already begun. A California class action has been filed against GitHub, Microsoft and OpenAI[2] over their Copilot product, which “outputs text derived from the [complainants’] licensed materials without adhering to the applicable license terms and applicable laws”.[3] Copilot essentially redistributes code that is sourced from open-source repositories, without attribution, and monetises it by keeping it inside a GitHub-controlled paywall. Though the complaint was only filed late last year, the implications of this case will be far-reaching for AI.

According to a recent press release, Getty Images has also initiated legal proceedings against Stability AI in the United Kingdom over Stability AI’s alleged copying of millions of its images,[4] and artists have brought a similar suit in California.[5] Getty Images was also one of the successful claimants against Google in the original EU Search and Shopping case.

While litigation on these issues remains nascent, it is clear that some solution must be reached; otherwise, publishers who can no longer effectively monetise their content will be disincentivised from producing it.


[1] Google has since released a rival to ChatGPT called ‘Bard’; Microsoft is integrating Bing AI into the Bing Mobile app and Skype; Meta has introduced LLaMA.

[2] GitHub Copilot litigation · Joseph Saveri Law Firm & Matthew Butterick.

[3] 2022-11-02 Copilot Complaint (near final) (githubcopilotlitigation.com) at 3.

[4] Getty Images Statement – Getty Images.

[5] Anderson v. Stability AI Ltd, U.S. District Court for the Northern District of California, No. 3:23-cv-00201.