Can AI just use your online photos for training? An analysis of the LAION ruling

The rapid rise of generative AI is forcing legal practice to redefine the boundaries of copyright. In an important ruling of 10 December 2024, the Hanseatic Higher Regional Court in Hamburg held that downloading copyright-protected images to create training datasets is permitted under the exceptions for text and data mining (TDM). For rights holders, the message is clear: a written prohibition in general terms and conditions is not sufficient; a “machine-readable” opt-out is required.

The facts and legal context

This case revolves around a conflict between photographer Robert Kneschke and LAION e.V., a non-profit organization that develops open-source datasets for AI training.

LAION created the dataset “LAION-5B,” which contains links and descriptions of 5.85 billion images. To build this dataset, LAION downloaded and analyzed billions of images from the internet to verify that the descriptions matched the images. A photo of the claimant was also downloaded from a stock photo site.

The photographer claimed that this was an infringement of his copyright. He specifically referred to the website's terms of use, which stated in “natural language” that automated programs (bots, scrapers) were not allowed to download or index the content.

The photographer took the matter to court. At first instance, however, the Hamburg Regional Court ruled in favor of LAION and dismissed the claim. The photographer did not accept this and appealed to the Hanseatic Higher Regional Court.

The key question in this appeal remained whether these acts fell under the exceptions for text and data mining (TDM) as introduced in German copyright law (UrhG) following the implementation of the European DSM directive (2019/790).

The decision

The Hamburg Court of Appeal upheld the first-instance judgment and dismissed the photographer's appeal. The decision rests on two pillars:

  • Scientific research: The Court ruled that LAION qualifies as a research organization and that the creation of the dataset itself is a form of scientific research. This allowed LAION to invoke the specific TDM exception for research (§ 60d UrhG), which does not give rights holders an opt-out. The fact that commercial parties may use the dataset at a later date does not affect this.
  • The general TDM exception and the invalid opt-out: Even without the scientific purpose, the act would be permitted under the general TDM exception (§ 44b UrhG). Although the photographer had made a reservation (via the website), the Court ruled that this reservation was not valid because it was not “machine-readable.” A prohibition in the general terms and conditions, written in human language, is not sufficient to stop automated scrapers.

Legal analysis and interpretation

Although this is a German ruling, its impact is directly relevant to Belgian legal practice. The German legislation in question is, after all, an implementation of the European DSM Directive. In Belgium, the relevant provisions can be found in Article XI.190, 20° (general TDM) and Article XI.191/1, §1, 7° (scientific TDM) of the Code of Economic Law (CEL).

The definition of “machine-readable”

The most important aspect of this ruling is the interpretation of a valid legal reservation (opt-out). As in Germany, Article XI.190, 20° CEL states that, in the case of online content, this reservation is only considered appropriate if machine-readable means are used. The Hamburg Court of Appeal now takes a strict view: it is not sufficient for a machine to be able to read the text (e.g. via OCR or text recognition); the machine must be able to interpret the prohibition and execute it automatically. The Court ruled that in 2021 (the time of the facts), the technology was not yet advanced enough for AI crawlers to flawlessly understand complex legal texts in general terms and conditions. This means that rights holders must take technical measures, such as using a robots.txt file or specific protocols.
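
To make the distinction concrete, here is a minimal sketch of how a crawler can interpret and act on a robots.txt rule automatically, which it cannot do with a clause in general terms and conditions. It uses Python's standard urllib.robotparser module. The user agent name ExampleAIBot and the example.com URL are hypothetical placeholders, and the sketch illustrates the technical mechanism only; it does not establish which signals a court will accept.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    photo = "https://example.com/photos/1.jpg"
    print(may_fetch(ROBOTS_TXT, "ExampleAIBot", photo))
    print(may_fetch(ROBOTS_TXT, "OtherBot", photo))
```

The point of the sketch is that the crawler needs no understanding of language: the rule is found at a known location, matched against the crawler's own name, and turned into a yes-or-no answer.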

The scope of “scientific research”

The Court applies a broad definition of research. The mere compilation and validation of a dataset is considered a fundamental step in AI research. Furthermore, a non-profit organization does not lose its status simply because it collaborates with commercial players or because commercial companies benefit from the open-source results, as long as those companies do not have a decisive influence on the organization. This opens the door to structures in which non-profit organizations build datasets that are then used by industry.

The three-step test

The Court also assessed the exception against the well-known three-step test (see Art. XI.192/3 CEL). It held that the normal exploitation of the photo is not compromised, since the dataset only contains links and metadata and does not permanently store the image itself for public display. The fact that AI models may later generate images that compete with the photographer's work was considered too abstract and too remote to prohibit the creation of the dataset itself.

What this specifically means

This ruling significantly shifts the responsibility towards the right holder.

  • For photographers, publishers, and content creators: A warning in your footer or terms and conditions (“No scraping for AI”) has no legal value as an opt-out under the general TDM exception. You must implement technical barriers that are recognized by bots as a stop sign. Consider correctly configuring your robots.txt file or implementing new standards such as the TDM Reservation Protocol. Without these technical measures, your content is essentially “open” for AI training by commercial parties.
  • For AI developers and tech companies: This ruling provides greater legal certainty when scraping the public internet. As long as you respect machine-readable protocols (such as robots.txt), the way seems clear to collect data under the general TDM exception. For organizations that qualify as research organizations, the scope is even greater, as they do not even have to take opt-outs into account.
  • For legal advisors: Advise your clients to digitize their IP strategy. A legal clause without technical implementation has become a “paper tiger” in the context of AI training.
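
As an illustration of what a crawler-side check for the TDM Reservation Protocol mentioned above might look like, here is a minimal sketch. It assumes, based on the W3C community group's published protocol, that a reservation is signaled by an HTTP response header named tdm-reservation with the value 1, or by an HTML meta tag with the same name and content. The function and class names are hypothetical, and the sketch ignores other parts of the protocol (such as policy URLs and site-wide files).

```python
from html.parser import HTMLParser


class _MetaScanner(HTMLParser):
    """Looks for <meta name="tdm-reservation" content="1"> in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.reserved = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if (attributes.get("name", "").lower() == "tdm-reservation"
                and attributes.get("content", "").strip() == "1"):
            self.reserved = True


def tdm_rights_reserved(headers: dict, html: str = "") -> bool:
    """Return True if the headers or the HTML carry a TDM reservation signal."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "tdm-reservation" and value.strip() == "1":
            return True
    scanner = _MetaScanner()
    scanner.feed(html)
    return scanner.reserved
```

Note that a page containing only the visible sentence “No scraping for AI” would return False here: the check looks for a structured signal, not for the meaning of the prose, which is the gap the Court's reasoning turns on.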

FAQ (Frequently Asked Questions)

Does this ruling also apply in Belgium?
Although this is a German ruling, it is based on harmonized European legislation (the DSM Directive). Belgian courts will likely look closely at this reasoning when interpreting Article XI.190, 20° CEL, especially in the absence of their own precedents.

Is a note saying “No AI Training” in the metadata of my photo sufficient?
That is uncertain. The Court requires that the opt-out can be “automatically found, understood, and correctly assigned.” If crawlers did not read that metadata by default in 2021, it was insufficient. In the future, metadata-based standards (such as C2PA) are expected to be considered machine-readable.

Is this the final decision?
No, the Court has allowed the (cassation) appeal to the German Federal Court of Justice because of the fundamental importance of the case. The final word on this matter has therefore not yet been spoken.

Conclusion

The LAION ruling confirms that copyright does not constitute an absolute barrier to AI training. The court has opted for a pragmatic, technological approach: anyone who does not want their work to be used must speak “the language of the machine.” Passivity is no longer an option for rights holders.


Joris Deene

Attorney-partner at Everest Attorneys
