Opinion

IAB AI Legislation Threatens AI Free-For-All

The new draft US “AI Accountability for Publishers Act,” now circulating through IAB channels, looks on paper like a victory for publishers tired of generative AI companies scraping their content without permission. But read the fine print and it’s clear the draft largely protects Google, Meta, and other Big Tech generative AI companies rather than publishers and the news industry.

The key carve‑out hides in the definition of consent. Under Section 2, a publisher grants “express, prior consent” if a bot follows standard web access protocols like robots.txt. In other words, unless you tell crawlers not to scrape your content via the widely used robots.txt file, you have already agreed to be scraped. That turns silence into consent and effectively legalizes search engines’ existing data collection for their training and summaries.
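For context, “telling crawlers” anything under this standard means maintaining explicit per‑crawler rules. A minimal opt‑out file along these lines might look as follows (GPTBot, Google‑Extended, and CCBot are the publicly documented tokens for OpenAI, Google’s AI training, and Common Crawl respectively; a real file would track many more):

```
# Anything NOT explicitly disallowed below is, under the draft's
# definition, "consented" to.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```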

Another loophole narrows “covered content” to human‑generated, copyrighted material, excluding the factual, public, or structured data (such as sports fixtures, pricing information, or timetables) that fuels many AI systems. Combined with an easy “affirmative defense” for anyone citing consent, enforcement becomes almost meaningless.

The IAB’s proposed bill also preempts stronger state IP laws, exempts “non‑commercial” uses tied to corporate partners, and shields everyday services when offered to consumers for free within AI platforms. Together, these clauses build a legal moat around the biggest data collectors while tightening rules on smaller AI shops and publishers.

The title says accountability, but the subtext says status quo. Publishers hunting for genuine content licensing leverage should treat this draft not as protection — but as a preemptive strike to lock current scraping norms into US federal law.

The key loopholes in the document that publishers should be aware of include:

1. The “Consent via Robots.txt” Loophole (Sec. 2(7)(ii))

This clause rewrites the standard for express, prior consent to include compliance with “any licensing or access terms … made available by such digital property governing automated access, such as robots.txt.” That phrase is deceptive:

  • Effect: It turns passive non‑blocking (e.g., lack of a “Disallow”) in a robots.txt file into a positive grant of consent.
  • For Big Tech: Google, Meta, and similar companies can continue to scrape and index any publisher not explicitly blocking them — as they already do — and claim that the publisher gave “express, prior consent.”
  • Contrast: True express consent would require a contractual license; this clause quietly converts “implied consent from silence” into a statutory safe harbor.

In other words, this definition launders existing status quo indexing behavior into legally “consensual” activity, shielding the web crawlers controlled by the dominant platforms. Any amendment to the robots.txt specification, or to similar protocols, by technical standards bodies like the IETF or W3C would then carry profound implications under US law.
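The mechanics of “silence as consent” are easy to demonstrate with the robots.txt parser in Python’s standard library: the protocol allows any crawler that a file does not explicitly name. A minimal sketch (the URLs and the first user‑agent token are illustrative):

```python
from urllib.robotparser import RobotFileParser

# A typical robots.txt that blocks one misbehaving crawler but says
# nothing about AI training bots.
quiet_robots = [
    "User-agent: BadBot",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(quiet_robots)

# Unlisted crawlers are allowed by default; under the draft's definition,
# this silence would count as "express, prior consent".
print(rp.can_fetch("GPTBot", "https://example.com/article"))   # True

# Only an explicit per-agent rule withdraws that "consent".
blocking_robots = [
    "User-agent: GPTBot",
    "Disallow: /",
]
rp2 = RobotFileParser()
rp2.parse(blocking_robots)
print(rp2.can_fetch("GPTBot", "https://example.com/article"))  # False
```

The default‑allow behavior is baked into the Robots Exclusion Protocol itself, which is exactly why a consent standard built on top of it favors whoever is already crawling.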

The “Affirmative Defense of Consent” in Sec. 3(d) suggests that “authorization” can be broadly inferred from the mere presence of a robots.txt file. This defense is trivially easy to invoke and shifts the burden onto plaintiffs to show they prohibited access to their copyrighted content.

A company like Google can simply submit server logs showing that a property didn’t opt out of crawling.

2. The “Bot” Definition Carve‑Out (Sec. 2(4) and Sec. 3(a))

The act defines a “bot” broadly but limits liability to bots that “exploit the covered content … without express, prior consent.”

Problem: Google’s and Meta’s crawler networks already send headers identifying themselves (e.g., “Googlebot”), and publishers typically allow them. So:

  • The civil cause of action applies only if the bot is unauthorized for any use.
  • Since authorization is now equated with not blocking access (per Sec. 2(7)(ii)), all indexed scraping under the normal web-crawling protocol becomes presumptively authorized.

Effectively, this framework grandfathers in the entire search indexing ecosystem.
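That self‑identification happens via the HTTP User‑Agent header, and it is also the only hook publishers have for distinguishing AI crawlers from ordinary traffic. A sketch of the kind of server‑side check a publisher might run (the token list is illustrative and far from exhaustive, and since User‑Agent strings can be spoofed, real deployments also verify crawlers’ published IP ranges):

```python
# Classify an incoming request by its User-Agent header against known
# AI-crawler tokens. Illustrative only: the token list is incomplete and
# header matching alone cannot prove a request's true origin.
AI_CRAWLER_TOKENS = ("GPTBot", "Google-Extended", "CCBot", "ClaudeBot")

def is_ai_crawler(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)

print(is_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"))  # True
print(is_ai_crawler("Mozilla/5.0 (Windows NT 10.0; rv:126.0) Gecko/20100101 Firefox/126.0"))  # False
```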

3. Broad Safe Harbor for “Individual Users” (Sec. 4(d))

This clause exempts “individual users of an AI product or service … consistent with the terms of use.”

  • Function: Shields from liability the companies that ship operating systems or browsers with AI scraping embedded in them.
  • Implication: If Apple, Meta, and Google continue to integrate generative AI features into consumer tools (e.g., search summaries, smart replies), those organizations are explicitly shielded from downstream copyright claims, reducing the pressure publishers can bring against the platforms.

Moreover, the broad safe harbor proposed for Gen AI services provided for free (Sec. 2(1)) is also problematic. Because the prohibition on training applies only to Gen AI that is “sold, rented, or licensed,” it excludes the Gen AI services commonly offered to the public at no charge (e.g., Google’s AI Overviews and Apple’s Siri).

As seen above, the IAB’s draft legislation is written to benefit the Big Tech Gen AI companies that are among its largest donors.