Copyright AI nerds have been eagerly awaiting a decision in the German case of Kneschke v LAION (previous blog post about the case here), and yesterday we got a ruling (text of the decision in German here, courtesy of Mirko Brüß). In short, LAION was successful in its defence against claims for copyright infringement.
The case was brought by German photographer Robert Kneschke, who found that some of his photographs had been included in the LAION dataset. He requested that the images be removed, but LAION argued that it held no images, only links to where the images could be found online. Kneschke argued that the process of collecting the dataset had included making copies of the images to extract information, and that this amounted to copyright infringement. He sued in the regional court of Hamburg, arguing that copies had been made of the photographs in question, and that these did not fall under the exceptions present in German copyright law in sections 44a (temporary copies), 44b (text and data mining), and 60d (text and data mining for scientific purposes).
LAION did not contest that a copy had been made. The main legal argument presented by the defendants was that they were in compliance with the exception for text and data mining present in German law, which is a transposition of Article 3 of the 2019 Digital Single Market Directive, and as such they are allowed to make a reproduction of a work for the purpose of extracting information. The defendants argued that they were indeed covered by the exceptions contained in 44a, 44b, and 60d. The court decided that LAION was a research organisation and as such was covered by section 60d of the act (Art 3 of the DSM Directive), so it did not need to rule on the remaining defences. It did, however, consider that the copy was not a temporary copy for the purposes of section 44a, so now we have the first test of that theory. Regarding text and data mining for scientific purposes, the court argues:
“The term scientific research, since it already allows the methodical and systematic “pursuit” of new knowledge to be sufficient, should not be understood so narrowly that it would only cover the work steps directly related to the acquisition of knowledge; rather, it is sufficient that the work step in question is aimed at (later) gaining knowledge, as is the case, for example, with numerous data collections that must first be carried out in order to then draw empirical conclusions. In particular, the concept of scientific research does not presuppose any subsequent research success.
Accordingly, contrary to the plaintiff’s opinion, the creation of a data set of the type in dispute, which can be the basis for training AI systems, can certainly be regarded as scientific research in the above sense. Although the creation of the data set as such may not yet be associated with a gain in knowledge, it is a fundamental work step with the aim of using the data set for the purpose of later gaining knowledge. It can be affirmed that such an objective also existed in the present case. For this, it is sufficient that the data set was – undisputedly – published free of charge and thus made available to researchers (also) in the field of artificial neural networks. […]
The fact that the data set in dispute may also be used by commercially active companies for training or further developing their AI systems is, however, irrelevant for the classification of the defendant’s activities. The mere fact that individual members of the defendant also pursue paid activities for such companies in addition to their activities for the association is not sufficient to attribute the activities of these companies to the defendant as their own.”
The court argued that while LAION had been used by commercial organisations, the dataset itself had been released to the public free of charge, and no evidence was presented that any commercial body had control over its operations. Therefore, the dataset is non-commercial and for scientific research. So LAION’s actions are covered by section 60d of the German Copyright Act (Art 3 DSM), and consequently there is no copyright infringement. Case dismissed.
While LAION is not liable, the court discussed in an obiter dictum the potential application of section 44b, that is, text and data mining for commercial uses (Art 4 of the DSM). This is an exception that exists unless the parties have made a reservation of rights in the manner of an opt-out. The text of the directive and of the law calls for this reservation to be “machine-readable”, and while recital 18 of the DSM opens the door for this reservation to be made by text, the prevalent opinion so far has been that it needs to be capable of being read by some automated process (recitals are not law, they help in the interpretation of the law). This was relevant here because there was a reservation of rights contained in the terms and conditions of the website where the photographs were being shared, but this was in plain text and not in the form of an automated system.
The court argues that had it come to this exception, it did not believe that LAION would have been covered by it, as the terms and conditions would have acted as a proper reservation of rights. In what I think is a rather controversial statement, the court says:
“However, the Chamber tends to consider as “machine-understandable” also a reservation written purely in “natural language” […]. However, the question of whether and under what specific conditions a reservation declared in “natural language” can also be regarded as “machine-intelligible” will always have to be answered depending on the technical developments existing at the relevant time of use of the work. […]
However, “state-of-the-art technologies” also unambiguously include AI applications that are able to grasp the content of text written in natural language […]. In this respect, everything suggests that the legislator of the AI Regulation had precisely such AI applications in mind when he referred to “state-of-the-art technologies”.”
I find this quite controversial. The court seems to be arguing that because machines are increasingly capable of reading and understanding plain text, then the requirement for reservations to be “machine-readable” can be met simply by a website’s terms of service, or any other such reservation. I don’t particularly find that compelling, but we’ll see.
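To make the distinction concrete, here is a minimal sketch of what an automated process has to do in each case. It rests on assumptions that are not in the decision: it uses a robots.txt file as one possible machine-readable convention (nothing here says that this is the legally required form), the crawler name `ExampleTDMBot` and the sample terms sentence are invented, and `naive_terms_check` is a crude keyword match rather than the kind of AI application the court has in mind.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "ExampleTDMBot" is an invented crawler name.
ROBOTS_TXT = """\
User-agent: ExampleTDMBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical sentence of the kind found in a website's terms and conditions.
TERMS_TEXT = (
    "You may not use automated programs to scrape or download "
    "content from this site."
)


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Structured opt-out: parse the rules and get a yes/no answer."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def naive_terms_check(terms: str) -> bool:
    """Plain-text opt-out: guess from keywords whether rights are reserved.

    This is only a stand-in for whatever interpretation step a crawler
    would need; it will miss or misread many real-world formulations.
    """
    keywords = ("automated", "scrape", "crawler", "data mining")
    lowered = terms.lower()
    return any(keyword in lowered for keyword in keywords)


if __name__ == "__main__":
    url = "https://example.com/photo.jpg"
    print(may_fetch(ROBOTS_TXT, "ExampleTDMBot", url))
    print(may_fetch(ROBOTS_TXT, "OtherBot", url))
    print(naive_terms_check(TERMS_TEXT))
```

The structured file gives the crawler a deterministic answer, while the plain-text reservation only works if the crawler can interpret language, which is exactly the step the court seems prepared to assume.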
Finally, there’s another very interesting discussion here with regard to whether the DSM Directive even covers AI training. The court came up with an interesting classification of the three main stages of training an AI model:
- “the creation of a data set (which is the sole subject of the dispute here) that can also be used for AI training,
- on the other hand, the subsequent training of the artificial neural network with this data set and
- thirdly, the subsequent use of the trained AI for the purpose of creating new image content.”
I feel that this is the right approach, and matches the widely used division between inputs and outputs; the first two are in the input phase, while the third one describes the output phase. The court is not dealing with the last two, however, as it argues that LAION is just a dataset. So the case rests entirely on the application of the DSM to the making of a dataset, and here the court argues that making a copy to extract data from it is covered by the TDM exception.
I know what you’re thinking, astute reader: “wait a second Andres, why would the court bother making a distinction in the first place?” The issue is that since generative AI started becoming popular, and people like Yours Truly invoked the DSM Directive, some people have been arguing that the TDM exception does not cover training a model. To me there is quite a lot of hair-splitting here, but the argument is that the legislators didn’t intend to cover generative AI when they passed the DSM, so text and data mining does not cover the training of a model, just the making of a copy to extract information from it. The argument is that making a copy to extract information to create a dataset is fine, as the court agreed here, but the making of a copy in order to extract information to make a model is not. I somehow think that this completely misses the way in which a model is trained; a dataset can have copies of a work, or in the case of LAION, links to the copies of the work. A trained model doesn’t contain copies of the works with which it was trained, and regurgitation of works in the training data in an output is another legal issue entirely.
The court didn’t have to make that decision on this point, as it found that LAION is covered by the exceptions because it is mining information about images in the making of a dataset. But the court did discuss the applicability of the DSM to the training of a model, and to me it’s clear that it agrees that the law as written also covers that. The court says:
“Finally, the argument that the European legislator in 2019 “simply did not have the AI problem in mind” when drafting the underlying directive provision (Art. 4 DSM-RL) […] is insufficient for a teleological reduction. This is especially true because the technical development in the field of so-called artificial intelligence since 2019 concerns not so much the extent of (the disputed) data mining for the creation of training datasets, but rather the performance of the artificial neural networks trained with the data […]. It should also be noted that the database of the Common Crawl Foundation accessed by the defendant has been created since 2008 (see https://commoncrawl.org/overview).
Moreover, the current European legislator has clearly expressed in the AI Act (Regulation (EU) 2024/1689 of 13.06.2024, OJ L of 12.07.2024 p. 1) that the creation of datasets intended for training artificial neural networks also falls under the limitation provision of Art. 4 DSM-RL. According to Art. 53 para. 1 lit. c of the AI Act, providers of AI models with general use are obliged to establish a strategy, particularly to identify and comply with a reservation of rights declared in accordance with Art. 4 para. 3 DSM-RL.” (emphasis mine)
The court here is saying that while model training may not have been on the radar of the legislators in 2019, it certainly is now, and they use the AI Act as evidence of this. They point out that the AI Act clearly and unequivocally covers model training as this is the subject of some of the transparency provisions contained in the text, and that the AI Act is referring back to the reservation of rights contained in Art 4 of the DSM. I believe that this finally should settle the question once and for all, and I am surprised that there is still any doubt left as to whether the exception covers training, when both the AI Act and this decision provide plenty of evidence that it does. The court also states that the DSM provisions are compatible with the three-step test, another argument that has been used against them.
Wider implications of the case
Evidently, this is a significant case from a historical perspective; this is the first legal test of the TDM exceptions contained in the DSM Directive, and I suspect that it will not be the last. However, the legal impact itself may very well be quite limited, and courts in other countries, and even in Germany, may come to different conclusions. It’s possible that Mr Kneschke will appeal, but given the narrow nature of the decision, that may be difficult. In the short term, it’s clear that LAION may rest easy; the fact that it has not been the subject of litigation in the ongoing US cases is quite indicative that it appears to be operating on the right side of the exceptions.
However, it must be stressed again that this is a relatively narrow decision. It only covers the use of text and data mining for the making of a dataset, and it does not specifically cover the actual training of a model, although I would argue that the obiter dicta are quite useful in laying that particular controversy to rest, particularly because of the existence of the AI Act.
I wonder if we will see any EU cases against the model trainers, which should be the logical next stage. But for now, AI dataset makers have scored one small victory in the ongoing AI Wars.