Will Wortley, Associate, and Phil Sherrell, Partner & Head of London, Bird & Bird, discuss questions regarding the role that copyright has to play in multiple stages of the lifecycle of AI products, tools and models.

Copyright has been at the centre of a number of the key debates during the rapid development of AI technology. Two aspects in particular have garnered significant attention in the last 18 months, namely: whether the use of copyright protected works in training AI models amounts to copyright infringement (thus requiring the permission of the copyright owner) and whether output generated by using generative AI tools is capable of being protected by copyright.

While the answers to these questions will have a significant impact on how AI technology develops in the future, they are not the only aspects of AI related innovations for which copyright considerations are relevant. In this article, we explore the overall landscape of copyright and AI technology, and the potential impact of copyright considerations on future innovation in this space.

Training data

Data as copyright?

In order to train an AI model, it is necessary to provide the model in question with a large amount of training data. Under English law, information is not considered a form of property per se. However, copyright and/or database rights form part of a group of rights that might be used to protect such data or govern the terms of its use.

Copyright can only protect creations (or works) that are original, in that they are the “author’s own intellectual creation”, which requires the work in question to reflect the author’s personality, such that in creating the work the author has been able to express their free and creative choices (Infopaq International A/S).

Data itself may qualify for copyright protection where it meets such criteria. However, whether the data can be said to be the creation of the author of the work will depend to some extent on any technical constraints on the possible forms of expression. In short, the fewer technical restraints that are in place, the greater the possibility that that data would be considered a copyright protectable original work.

Infringement in training and development

As outlined above, one of the most prominent issues confronting the courts and regulators is whether the training of AI models using creative works amounts to copyright infringement. There are a number of questions to be resolved in relation to this. One such question is whether in training the model, any reproductions of copyright works are merely temporary or ephemeral in nature, such use being permitted in many jurisdictions (including the United Kingdom and EU) where it falls within the scope of transient copying exceptions. A recent German case held that reproductions in this context were not temporary and so did not fall within the scope of the exception, although the question is technical in nature and thus if a model is trained differently that may impact on whether it is found to be infringing.

Copyright is also relevant to the question of what data has actually been used in training the AI model in question and where/how it was sourced. Some models, for example, are trained on information derived by the use of web-scraping techniques, which comb the Internet for publicly available material. Several jurisdictions have specific exceptions for text and data mining (TDM), permitting web scraping in certain circumstances. In the United Kingdom, s29A of the Copyright, Designs and Patents Act 1988 (CDPA) provides for an exception to copyright protection for text and data mining for non-commercial research where the person carrying it out has existing lawful access (although this does not apply to database rights). There have been discussions in recent years about creating a broader TDM exception following Brexit; however, the previous Conservative government shelved those plans in March last year.

In the EU, the DSM Directive (which came into force in 2019) introduced two new TDM exceptions that may be relevant to the training of AI models. Article 3 includes a relatively broad exception for the purposes of scientific research, which requires that the person seeking to rely on the exception has lawful access to the material. This exception is only available to “research organisations and cultural heritage institutions” but can be a useful tool for (in particular) academic institutions seeking to develop their own AI models or otherwise innovate in this area.

Article 4 of the DSM Directive contains a more general TDM exception, which is not limited to scientific research. Again, it requires the person seeking to rely on it to have lawful access to the underlying work. However, it does not apply to works that have been expressly reserved by their rightsholders in an “appropriate manner” (including machine readable means for content made available online), thus allowing rightsholders the option to “opt out” of having their works/data collected in this way. This exception is also referred to in the German decision, with reports suggesting that the court was minded to view the type of web scraping being engaged in as falling within the definition of TDM for the purposes of the DSM Directive, but that it had not yet decided on the question of whether the plain text opt out was effective and whether the opt out needed to be in a machine readable (eg, robots.txt) format to be effective (a requirement under Section 44 of the German Copyright Code). A French music collecting society has sought to exercise its opt-out by means of a public statement.
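By way of illustration only, a machine readable reservation of the robots.txt kind could be checked by a web crawler along the following lines. This is a minimal sketch using Python's standard library; the crawler name "ExampleAIBot", the sample file contents and the helper name may_crawl are assumptions made for the example and are not drawn from any real deployment. Whether a robots.txt entry (or any other signal) amounts to a valid reservation of rights under Article 4 is a legal question that the code does not answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a rightsholder: it blocks one
# (invented) AI crawler and leaves the site open to all other agents.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file contents as a list of lines, so no network
    # request is made in this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/some-article"
    print("ExampleAIBot allowed:", may_crawl(ROBOTS_TXT, "ExampleAIBot", page))
    print("OtherBot allowed:", may_crawl(ROBOTS_TXT, "OtherBot", page))
```

A crawler that respects such a file would skip the page for the named agent, which is the practical sense in which an opt-out can be expressed in machine readable form.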

The United States has a broader fair use exception to copyright protection, which has historically been permissive of other uses of copyright protected works. For example, in the Authors Guild v Google case, the Court of Appeals for the Second Circuit upheld a first instance decision that Google’s mass digitisation of copyright works was fair use. That case involved machine learning techniques and it is being argued by the developers of AI models in more recent cases in the US that such cases point to a “long history of precedent holding that it is perfectly lawful to use copyrighted content as part of a technological process that (as here) results in the creation of new, different, and innovative products”.

The other way that copyright can be relevant to training data is in relation to the database itself. It is possible for a database to be protected as an original literary work where it can be considered the intellectual creation of its author by virtue of the selection and/or arrangement of its contents (SAS Institute Inc v World Programming Ltd, 2013).

Database rights

A training data set could also be protected by sui generis database rights, which will subsist where a substantial investment has been made by the database maker in the obtaining, verifying or presenting of the contents of that database (Regulation 13(1) of the Copyright and Rights in Databases Regulations 1997). In other words, investment in the creation of the underlying data will not be sufficient for such database rights to arise. Database rights prevent third parties from extracting or reutilising all of the database, or a substantial part thereof, without the consent of the database owner.

There are, however, complications around the subsistence of database rights in the United Kingdom and EU following Brexit. For databases created on or after 1 January 2021, it is no longer possible for EEA persons/businesses to get automatic protection of their databases in the United Kingdom, and vice versa. As such, if the relevant investment is made solely in the United Kingdom, a database right would arise there but not in the EEA (and vice versa). Where investments have been made jointly in the United Kingdom and EEA jurisdictions, the picture is less certain.

As such, in order to utilise the dataset in, for example, the training of further AI models, a prospective AI business might need or want to consider licensing that particular dataset.

AI Systems

In the United Kingdom there have been a number of significant decisions in relation to the patentability of AI-related inventions. Last year, the Supreme Court found that an AI system could not be named as the inventor of a patent under the requirements of the Patents Act 1977, holding that the inventor must be a natural person (Thaler v Comptroller-General of Patents, Designs and Trade Marks [2023] UKSC 49). More recently, the Court of Appeal overturned the High Court’s decision in the Emotional Perception AI case, finding that artificial neural networks did not fall outside the ban on patent protection for computer programs as such by virtue of their unique features.

Copyright protection for software code

Although, as we have seen, the potential requirement for copyright licences creates a potential barrier to the creation of AI systems, copyright may also play a role in protecting such innovations. In respect of traditional software with a human author, copyright can protect the source or object code as a literary work, provided it meets the criteria discussed above. The scope of such protection is limited, as with all copyright works, to the expression of the ideas and principles that underlie the development of the software, rather than protecting the ideas and principles themselves (which can be copied freely by others). Elements of an AI system which consist of such traditional software code (eg, the source code) are likely to be protected by copyright (and may also be protected by way of database rights and as trade secrets).

Copyright protection for other components of AI systems

Potential copyright protection for AI systems themselves differs from normal considerations in relation to software code, in part, because of the automated nature of the training process. However, this is not a matter which has been addressed by UK courts to date.

Model architecture

The AI model architecture is the structured design outlining how the model’s components interact and generate outputs. Each model will vary but, essentially, the architecture consists of several layers of interconnected nodes (or “neurons”), including: (1) an “input” layer (which receives data); (2) “hidden” layers (which process the input data through transformations); and (3) the “output” layer, which produces the end result (eg, a classification label, value, or creative work).
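The layered structure described above can be sketched in a few lines of code. The example below is purely illustrative: the layer sizes, the numerical values and the function names (dense, forward) are hypothetical and are not taken from any real model. It is intended only to show the distinction between the architecture (the arrangement of layers, written by a human) and the numbers that populate it.

```python
def relu(value: float) -> float:
    """A common transformation applied in hidden layers: negatives become zero."""
    return max(0.0, value)


def identity(value: float) -> float:
    """Leaves the value unchanged; used here for the output layer."""
    return value


def dense(inputs, weights, biases, activation):
    """One layer: each node sums its weighted inputs, adds a bias, then transforms."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


# Hypothetical parameters: 2 input values -> 3 hidden nodes -> 1 output node.
HIDDEN_WEIGHTS = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
HIDDEN_BIASES = [0.0, 0.1, -0.1]
OUTPUT_WEIGHTS = [[0.7, -0.5, 0.2]]
OUTPUT_BIASES = [0.05]


def forward(inputs):
    """Pass data from the input layer through the hidden layer to the output layer."""
    hidden = dense(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES, relu)
    return dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, identity)


if __name__ == "__main__":
    print(forward([1.0, 2.0]))
```

In this sketch, the functions and the choice of two layers represent the architecture, while the lists of numbers are the parameters that would normally be set by training rather than typed in by hand.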

It is possible that the model architecture could be protected as its own copyright work, provided it could be considered an expression of a human author reflecting their personality and free creative choices. However, as with all copyright works, copyright protection does not extend to ideas and principles (eg, fundamental mathematical logic) underlying the model and as such that protection will be somewhat limited.

Model weights

In particular, the model weights (the parameters within a machine learning environment that dictate the strength of the connection between neurons in an artificial neural network) present issues from a copyright protection perspective because they are, effectively, numerical values and therefore not the sort of work that would traditionally be protected. The functional nature of the weights makes it unclear whether they could be said to be an expression of the author’s personality, although that will ultimately be a question of fact as to the extent to which any human intelligence can be said to have influenced how they are configured.
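To illustrate why the human contribution to the weights themselves can be difficult to identify, the following minimal sketch shows a single weight being set by an automated training loop. The training examples, the learning rate, the number of steps and the function name train_weight are all assumptions made for the purposes of the example; real models involve vastly more parameters and more sophisticated procedures.

```python
def train_weight(examples, learning_rate=0.05, steps=200):
    """Fit a single weight w so that w * x approximates y, by gradient descent."""
    weight = 0.0  # arbitrary starting value, not chosen for any creative reason
    for _ in range(steps):
        # Average gradient of the squared error (w * x - y) ** 2 over the examples.
        gradient = sum(2 * (weight * x - y) * x for x, y in examples) / len(examples)
        # The procedure, not a person, decides how the value changes at each step.
        weight -= learning_rate * gradient
    return weight


if __name__ == "__main__":
    # Hypothetical training data in which y is always twice x.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    print(train_weight(data))
```

The human choices here are the training data, the procedure and its settings; the resulting number is produced mechanically from them, which is the feature that makes the originality analysis uncertain.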

It is quite possible that a database of model weights could fall within the scope of a database for the purposes of UK or EU law, however as explained above whether protection could be granted will depend on what type of investment is being made. Some commentators have argued that as model weights are numbers, they are pre-existing data and that the investment being made by the person or organisation who makes the database is in the collection and organisation of those pre-existing data, such that the model weights should qualify for database protection. However, this is an open question that has not been addressed by the UK courts. The significance, noted by some commentators, is that database rights might be an additional tool for the makers of AI models which are licensed through open-source licences.

GenAI Outputs

Perhaps the most interesting debate around copyright and AI has been centred on whether the outputs of generative AI models can be protected by copyright. This is another area that is subject to dispute in a number of jurisdictions. In the United Kingdom, there is some uncertainty about whether authorship can be attributed to works created using generative AI technology, because the creative output is generated by a tool (rather than a human). However, unlike the position in most jurisdictions, the CDPA does contain provisions to determine authorship in respect of computer-generated works where there is no human author (s9(3) CDPA 1988). It will likely be argued, therefore, that by acknowledging in principle that a computer-generated work may be assigned an author, the United Kingdom has opened the door to computer-generated works being treated as copyright works. The Act does not, however, address the question as to whether a computer-generated work would meet the originality standards required to qualify for copyright protection, and this may pose an insuperable hurdle, in particular given that the UK has moved from a skill, labour and judgement standard to determine originality towards the EU-derived author’s own intellectual creation standard. It may be hard to argue that an AI model in creating a work has exercised free and creative choices.

Pending these complex questions being resolved, in practice, the terms and conditions for many models provide that the user is the owner of generative output created when using the models (to the extent permitted by law). Several popular general-purpose models also include a licence back to enable them to improve the service of the model.

Landscape for innovation

As can be seen, there are a number of open questions regarding the role that copyright has to play in multiple stages of the lifecycle of AI products, tools and models. At the training and development stage, the outcome of court decisions that will be handed down in the coming years will help determine the extent to which rightsholders are able to license (and therefore monetise) their content for training purposes and the freedom that AI model developers have to continue innovating in the field. At the system stage, the possibility that copyright could be used to protect AI systems themselves provides a further consideration for developers of those systems, that should be considered in conjunction with considerations as to any patent protection that may be available. In terms of outputs of generative AI systems, much of the press has focused on the potential effect of generative AI outputs on existing creative industries. However, the ability to utilise these tools to create new works, regardless of whether they are protected by copyright or not, is already leading to innovative applications across many sectors.


