“While the court’s decision clarifies that non-commercial AI research may qualify for certain exceptions, the broader applicability of these exceptions, particularly for commercial entities, remains unresolved.”
In a landmark judgment with far-reaching ramifications, a German court recently held that the copying of images by Large-scale Artificial Intelligence Open Network (LAION) – a nonprofit organization that provides datasets, tools and models to liberate machine learning research – did not infringe copyright law.
The Kneschke v. LAION case, heard by the Hamburg Regional Court, centered on LAION’s automatic downloading of images, including a copyrighted work by photographer Robert Kneschke, for AI training purposes.
In 2021, LAION, based in Hamburg, automatically downloaded images from the internet, including Kneschke’s photo from Bigstock, to create a dataset (LAION-5B) containing image-text pairs for training AI. Kneschke claimed LAION infringed his copyright by copying his image without permission to create a dataset that linked images with descriptive text. LAION had downloaded the photo from the Bigstock website so that its software could check whether the image matched its text description.
LAION denied the copyright infringement, arguing that its actions fell under at least one of three copyright exceptions provided by German and EU law. The case concerned whether copying an image to create an AI training dataset constitutes copyright infringement, not AI model training or content generation.
In September 2024, Civil Chamber 10 of the Hamburg Regional Court (Case No. 310 O 227/23) dismissed Kneschke’s copyright infringement claim against LAION. The judgment is provisionally enforceable, with the plaintiff bearing the legal costs.
The ruling touches on a number of unresolved legal issues, such as whether AI data scraping qualifies as text and data mining and how rights holders can block such activities.
Potential Exceptions
Kneschke sought an order for LAION to cease reproducing his image for AI dataset creation, while LAION invoked three potential copyright exceptions under German and EU law, including the two text and data mining exceptions discussed below.
Based on the arguments in the first hearing of the case in July 2024, it was widely expected that the case would be decided based on the general text and data mining (TDM) exception to copyright infringement in Section 44b UrhG. However, the court instead dismissed Kneschke’s claim on the basis of the “text and data mining for scientific research purpose” exception in Section 60d UrhG.
Scientific Research Exception (Section 60d UrhG)
Section 60d UrhG (which implements Article 3 of the DSM Copyright Directive) permits research organizations to make copies of copyrighted works for TDM for scientific research. Research organizations are defined as universities or research institutions conducting non-commercial research, reinvesting profits in research, or acting in the public interest.
Kneschke contended that LAION did not qualify as a research organization, citing its ties to commercial entities. He argued that LAION’s relationship with private enterprises disqualified it under the proviso of Section 60d, which forbids cooperation with private companies that exert influence or gain preferential access to research results.
However, the court ruled that Kneschke failed to provide sufficient evidence that LAION did not meet the criteria for a research organization. The court emphasized LAION’s transparent and non-commercial approach, as its datasets were made freely available online to all researchers.
“[The creation of the data set] … is a fundamental step with the aim of using the data set for the purpose of later gaining knowledge…
“It can be affirmed that such an objective also existed in the present case. For this purpose, it is sufficient that the data set was – undisputedly – published free of charge and thus made available to researchers, especially in the field of artificial neural networks.”
Accordingly, the use by LAION was authorized by Section 60d UrhG and the action was dismissed.
General Text and Data Mining Exception (Sec. 44b UrhG)
Section 44b UrhG (implementing Article 4 of the DSM Copyright Directive) allows the reproduction of lawfully accessible works for text and data mining, provided the rights holder has not reserved the use. A reservation of use must be machine-readable to be effective.
One issue in the case was whether the presence of the following wording in Bigstock’s terms of use served as an effective reservation of rights in a machine-readable format:
“YOU MAY NOT […] Use automated programs, applets, bots or the like to access the Bigstock.com website or any content thereon for any purpose, including, by way of example only, downloading Content, indexing, scraping or caching any content on the website.”
As the court had decided that the exception under Section 60d UrhG applied, its comments in relation to Section 44b UrhG did not form part of the rationale for its judgment, but they are of interest insofar as they indicate how the court may approach this issue in future cases. LAION argued that the website terms were not sufficient and that a robots.txt file should have been used. The court commented:
“However, there are some indications that the exception of Section 44b (2) UrhG does not apply in the present case – without this requiring a final decision – as there was an effectively declared reservation of use within the meaning of paragraph 3 of the provision; in particular, the reservation of use indisputably declared on the [Bigstock website] is likely to meet the requirements for machine readability within the meaning of Section 44b (3) sentence 2 UrhG.”
The court suggested that machine-readability must be judged based on the state of technology at the time of reproduction. It implied that as AI tools become more advanced, rights holders might not need to rely on computer code to block TDM, as AI could potentially interpret natural language instructions.
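To illustrate the kind of code-based reservation LAION argued for, the following is a minimal sketch of how a crawler might consult a robots.txt file before downloading an image. It is not drawn from the judgment or from LAION’s software: the robots.txt content, the user-agent name ExampleTDMBot and the URL are all hypothetical, and whether such a file satisfies Section 44b (3) UrhG is a legal question the court left open. The sketch uses only Python’s standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary stock-photo site
# (an assumption for illustration, not Bigstock's actual file).
ROBOTS_TXT = """\
User-agent: ExampleTDMBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical image URL on the imaginary site.
IMAGE_URL = "https://stock.example/images/photo-123.jpg"

# The hypothetical TDM crawler is excluded from the whole site,
# while crawlers without a dedicated rule are only kept out of /private/.
tdm_allowed = may_fetch(ROBOTS_TXT, "ExampleTDMBot", IMAGE_URL)
other_allowed = may_fetch(ROBOTS_TXT, "SomeOtherBot", IMAGE_URL)
print(tdm_allowed, other_allowed)
```

A reservation of this kind can be evaluated by a few lines of conventional code, whereas the natural language wording in Bigstock’s terms of use would require software capable of interpreting prose, which is the distinction underlying the court’s remarks on the state of technology.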
Commentary
This judgment is significant for AI developers who are non-commercial entities undertaking TDM for non-commercial purposes. The decision confirms that the TDM exception for scientific research purposes under Section 60d UrhG is available for non-commercial research organizations undertaking TDM for the purpose of AI training. However, organizations that do not make their research freely available, or that have a for-profit affiliate that benefits from the research, likely will not qualify for the exception.
Of wider interest is the court’s commentary on the general TDM exception under Section 44b UrhG, and in particular the judges’ indication that a natural language reservation of rights within a website’s terms of use may be sufficient to constitute a “machine readable” opt-out, because AI systems (in particular large language models (LLMs)) are now sufficiently advanced and accessible that they can be used to read and interpret such text. But the court did not address the fact that in 2021, when LAION downloaded the image in question, widely accessible LLM-based tools such as ChatGPT had not yet been released, nor did it consider what systems were actually used, or were available for use, by LAION at that time. This therefore remains an area in which the law is unclear, and clarification of the requirements for a “machine readable” opt-out is still awaited.
The court rejected an argument that AI content scraping should not qualify as TDM at all under copyright law and that the TDM exceptions therefore should not apply. It considered recent academic research, commissioned by the Authors’ Rights Initiative, which argues that AI scraping falls outside TDM exceptions, both in terms of the law’s intent and the technical details of what AI tools actually scrape. The judges questioned the academics’ opinion, noting that the EU AI Act expressly contemplates the relevance of TDM to AI training (the AI Act mandates that general-purpose AI providers must respect copyright law, including honoring rights holders’ ability to prevent TDM under Article 4 of the DSM Copyright Directive). The judges also found that applying the TDM exception would not violate the “three-step test” in EU copyright law, which restricts exceptions to cases that do not conflict with normal exploitation or harm the rights holder’s legitimate interests.
The decision may be subject to appeal, with the Hanseatic Higher Regional Court potentially revisiting key questions about LAION’s status as a research organization and whether AI scraping falls within the TDM exception.
Takeaways
This case addresses important questions about the intersection of AI, copyright, and text and data mining. While the court’s decision clarifies that non-commercial AI research may qualify for certain exceptions, the broader applicability of these exceptions, particularly for commercial entities, remains unresolved. Additionally, the role of AI in interpreting copyright reservations will likely remain a topic of debate, as both technology and the legal landscape evolve.