Artificial Intelligence (AI) – ethical and legal considerations
PART I – Copyright and Database Right
To fully understand the legal and ethical considerations, it is important to understand how the technology works in principle. While traditional computer software operates on a chain of simple yes-or-no decisions which will always generate the same results, an AI-based program does not produce predictable results. The developer of such a program must first train the software on a dataset which is either pre-defined (and therefore limited) or ‘scraped’ from the internet (and therefore potentially unlimited).
How does AI work?
The machine learning software (as programmed by the developer) will extract certain information from the dataset. To do this, it may make a copy of the material contained in the dataset, e.g. images, text, film, anything it has found on the internet or which was contained in the pre-selected dataset.
However, the copy is not used for very long by the software (it could be regarded as transient copying – more about this later).
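The copy-extract-discard sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of a temporary file and the choice of word counts as the ‘information’ extracted are all hypothetical, and real training pipelines are far more complex. Whether such a copy counts as ‘temporary’ in the legal sense is a separate question which the code does not answer.

```python
import os
import tempfile
from collections import Counter


def extract_features(path):
    """Read the temporary copy and keep only word frequencies (hypothetical 'information')."""
    with open(path, encoding="utf-8") as handle:
        words = handle.read().lower().split()
    return Counter(words)


def train_on_work(work_text):
    """Copy a work to a temporary file, extract features, then delete the copy."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(work_text)        # the copy of the work is made here
        features = extract_features(path)  # information is extracted from the copy
    finally:
        os.remove(path)                    # the copy is discarded straight away
    return features, path


features, copy_path = train_on_work("the robin sings and the robin flies")
copy_still_exists = os.path.exists(copy_path)
```

After the call, only the word counts remain in memory; the copy of the text itself no longer exists on disk.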
The supervised machine learning model
In the first model, so-called supervised machine learning, the developer provides the software with ‘known responses’ alongside the pre-selected dataset. For example, if the dataset contains songs of various bird species, each recording would be labelled with the song and the bird species. The desired outcome is that the machine learns to recognize, first, that a sound is a bird song and, second, that the song belongs to a particular bird species. If a user then wishes to create a song of a particular bird, the software would create the song not by copying a track it has found in the dataset but by creating a new song from the information it has ‘learned’. It may even be able to create a new bird song combining two species of birds.
[Figure: the supervised machine learning model, explained using the example of a known dataset of apples]
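As a rough illustration of supervised learning, the sketch below trains a very simple classifier on labelled apples. Everything in it is an assumption made for the example: the features (weight and redness), the two labels and the nearest-centroid technique are hypothetical and are not taken from any particular AI system.

```python
from math import dist
from statistics import mean

# Hypothetical labelled training data: (weight in grams, redness 0..1) -> known response
TRAINING_DATA = [
    ((150.0, 0.90), "red apple"),
    ((160.0, 0.80), "red apple"),
    ((155.0, 0.85), "red apple"),
    ((170.0, 0.10), "green apple"),
    ((180.0, 0.20), "green apple"),
    ((175.0, 0.15), "green apple"),
]


def train(examples):
    """Learn one average feature vector (centroid) per known label."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the new example."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(TRAINING_DATA)
prediction = predict(model, (158.0, 0.75))
```

Note that the trained model holds only one averaged value per label, not the individual training examples. This mirrors the point made above that the software works from what it has ‘learned’ rather than from a stored copy of each item in the dataset.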
The unsupervised machine learning system
This model uses data usually scraped (or mined) from the internet (raw data). It is not given set outputs but ‘learns’ by looking for common parameters or structures in the information it extracts from the dataset and groups these together in new data. The outcome is not predictable and will depend on the quality of the raw data and the processing ability of the software. This is the most sophisticated approach to date and is used in training systems such as ChatGPT, developed by OpenAI.
[Figure: a model representing an unsupervised machine learning system, using the example of a dataset of various fruit]
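To illustrate how software can group data without being given any ‘known responses’, the sketch below clusters unlabelled fruit measurements. The data, the features and the choice of the k-means technique are assumptions made purely for this example; systems such as ChatGPT rely on far larger and different models.

```python
from math import dist
from statistics import mean

# Hypothetical unlabelled data: (weight in grams, diameter in cm) of assorted fruit
FRUIT = [
    (150.0, 7.5), (8.0, 2.0), (1500.0, 18.0),
    (160.0, 8.0), (9.0, 2.2), (1600.0, 19.0),
    (155.0, 7.8), (7.0, 1.9), (1550.0, 18.5),
]


def assign(points, centroids):
    """Give each point the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
        for point in points
    ]


def k_means(points, k, max_rounds=20):
    """Group points into k clusters without being told any labels."""
    centroids = list(points[:k])  # simplistic, deterministic starting guess
    labels = assign(points, centroids)
    for _ in range(max_rounds):
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(mean(col) for col in zip(*members))
        new_labels = assign(points, centroids)
        if new_labels == labels:  # stop once the grouping no longer changes
            break
        labels = new_labels
    return labels


labels = k_means(FRUIT, k=3)
groups = sorted(
    sorted(i for i, label in enumerate(labels) if label == g) for g in set(labels)
)
```

The program is never told which items are cherries, apples or melons; it only finds that the measurements fall into three groups. With real scraped data the starting guess and the quality of the data would strongly affect the result, which is one reason the outcome is described above as unpredictable.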
Most of the material used in both described systems to train the software is copyright protected.
For the purpose of training the software, however, only a transient copy is required, just enough for the model to extract the information contained in the material. Since copying under the UK Copyright, Designs and Patents Act 1988 means taking a substantial part of an original copyright protected work, one can never be sure whether the ‘information’ taken by the software is regarded as an infringing copy (in some instances, a few words taken from a newspaper article can indeed be deemed a copy of the entire article). For the creator of the AI system, this means that, in order to minimise the risk of copyright infringement claims, it would be prudent to clear all rights in the dataset the AI system is using.
But how likely is it that copying in this instance can even be shown by the rights holder?
Take the example of ChatGPT. The software is presumably trained on millions of literary works, from newspaper articles to novels, poems and advertising copy. The author or publisher of a newspaper article will hardly be able to tell whether text generated by ChatGPT has copied that particular article, unless the article is widely cited by others covering the same topic. Only in these circumstances does the risk increase that enough of a particular article is reproduced, when a user enters key words relating to its topic, for the rights holder to be able to show copyright infringement. In other words, the more prominent the article is in relation to a certain idea, the more likely it is that an AI system uses its ‘information’ to create its output, and the more likely it is that a rights holder can show that the system has ‘copied’ this particular copyright work.
Having said that, content creators now have the means to search popular AI training databases to find out if their content has been included in such databases: https://haveibeentrained.com/
However, the fact that an image has been included in a database does not tell a rights holder whose AI system has in fact used the content for training purposes. And even if a rights holder can establish that a particular dataset containing its copyright work was used for training purposes, there still remains the problem of establishing infringement of copyright.
In the UK, AI tech companies may rely not only on the inability of a rights holder to show copying where the company does not disclose the dataset used to train its AI system. They may also be aided by legislation which was drafted long before the technology existed, leaving rights holders with very limited means to defend their rights.
The most cited defences are the exceptions to copyright infringement in the Copyright, Designs and Patents Act 1988 (CDPA 1988):
- Section 28A – making of a temporary copy:
Copyright in a literary work, other than a computer program or a database, or in a dramatic, musical or artistic work, the typographical arrangement of a published edition, a sound recording or a film, is not infringed by the making of a temporary copy which is transient or incidental, which is an integral and essential part of a technological process and the sole purpose of which is to enable—
(a) a transmission of the work in a network between third parties by an intermediary; or
(b) a lawful use of the work;
and which has no independent economic significance.
There are two potential pitfalls to consider. First, copyright material must be deleted as soon as the software has been trained by extracting the relevant information, as otherwise the reproduction of the work is not ‘temporary’. Second, it is unlikely that ‘lawful use of the work’ would include extracting information from an original in a way that leads to the system providing a user with an output which would be deemed a copy of the original. However, an AI tech company cannot entirely prevent the latter, since its machine learning software is trained to identify patterns and structures within the material it is copying (even if only temporarily). It cannot therefore be excluded that, as in the example used above, such an output reproduces a part of an original work which is itself ‘an author’s own intellectual creation’ and is therefore an infringing copy, as established by Infopaq I.
- Section 30(1), (1ZA) and (1A) – Fair dealing exceptions:
(1) Fair dealing with a work for the purpose of criticism or review, of that or another work or of a performance of a work, does not infringe any copyright in the work provided that it is accompanied by a sufficient acknowledgement (unless this would be impossible for reasons of practicality or otherwise) and provided that the work has been made available to the public.
(1ZA) Copyright in a work is not infringed by the use of a quotation from the work (whether for criticism or review or otherwise) provided that—
(a) the work has been made available to the public,
(b) the use of the quotation is fair dealing with the work,
(c) the extent of the quotation is no more than is required by the specific purpose for which it is used, and
(d) the quotation is accompanied by a sufficient acknowledgement (unless this would be impossible for reasons of practicality or otherwise).
(1A) For the purposes of subsections (1) and (1ZA) a work has been made available to the public if it has been made available by any means, including—
(a) the issue of copies to the public;
(b) making the work available by means of an electronic retrieval system;
(c) the rental or lending of copies of the work to the public;
(d) the performance, exhibition, playing or showing of the work in public;
(e) the communication to the public of the work,
but in determining generally for the purposes of those subsections whether a work has been made available to the public no account shall be taken of any unauthorised act.
This exception has several potential pitfalls:
Firstly, if the material has been scraped from websites whose terms and conditions of use expressly prohibit the mining of data, the use of the material would arise from an ‘unauthorised act’, namely breach of contract. If bots which impersonate real persons are used to mine data, this could also be regarded as dishonestly making a false representation in order to make a gain for oneself or another, in breach of section 2 of the Fraud Act 2006 and potentially a criminal offence. Lastly, such mining of material on the internet may also fall under the Computer Misuse Act 1990, which states at section 1 that a person is guilty of an offence if (a) he causes a computer to perform any function with intent to secure access to any program or data held in any computer, (b) the access he intends to secure is unauthorised, and (c) he knows at the time when he causes the computer to perform the function that that is the case.
Secondly, the dealing with the work may not be ‘fair’ if its use (for example for the purpose of ‘quotation’) competes commercially with the copyright holder’s interest in exploiting the original, i.e. if it could be deemed a substitute for the original. It may also not be fair if the work used has not yet been published or otherwise exposed to the public (this may be the case where a work was only shared with a certain circle of people but is otherwise ‘confidential’). Lastly, use may not be fair if an unnecessarily large amount has been reproduced.
As already explained above, an AI tech company will not be able to control the output of the software application and prevent it from using copyright material in a way which makes it unlikely that a ‘fair dealing’ exception applies. Furthermore, it is arguable that the use of large amounts of copyright works is ‘use which is commercially competing with the exploitation of the work by the rights holder’, especially if the result of that use is new works (which may or may not infringe the copyright in the original material) that are in many cases seen as substitutes for the works used to train the system and are already competing with the commercial interests of the rights holders of the original material.
Thirdly, use may also not be ‘fair’ if a fundamental principle of copyright law is completely disregarded, namely the principle that the law has to balance the interest of the creator in being remunerated fairly with the interest of society in free access to knowledge for the sake of innovation.
- Section 171(3) – public interest exception:
‘Nothing in this Part affects any rule of law preventing or restricting the enforcement of copyright, on grounds of public interest or otherwise.’
It is doubtful whether this provision adds anything to the defences against copyright infringement claims that is not already covered by the ‘fair dealing’ exceptions.
- Section 29A – text and data mining exception:
Copies for text and data analysis for non-commercial research
(1) The making of a copy of a work by a person who has lawful access to the work does not infringe copyright in the work provided that—
(a) the copy is made in order that a person who has lawful access to the work may carry out a computational analysis of anything recorded in the work for the sole purpose of research for a non-commercial purpose, and
(b) the copy is accompanied by a sufficient acknowledgement (unless this would be impossible for reasons of practicality or otherwise).
This exception does not apply to AI tech companies who use copyright protected content to train software applications which are subsequently commercialized.
For example, the dataset used to train the AI system Stable Diffusion (funded by Stability AI, now the defendant in the copyright claims brought by Getty Images in the US and the UK, see below) was initially built by LAION, a German AI non-profit, for the purpose of non-commercial analysis and research. Amongst its many images (apparently 2bn), it contains photographs by Getty Images, which are copyright protected. It is very unlikely that the use of the dataset to train Stable Diffusion for commercial purposes would fall under this particular exception. Therefore, unless AI tech companies clear rights in the copyright material contained in the LAION database, they would be liable for copyright infringement. Christoph Schuhmann, the founder of LAION, emphasizes that companies which want to use the dataset commercially do so at their own risk and responsibility.
Getty Images v Stability AI [2023] EWHC 3090 (Ch)
Getty Images itself will face many more hurdles in overcoming the defences to its copyright infringement claim against Stability AI. It cleared the first hurdle only last week, when Mrs Justice Joanna Smith rejected two applications for summary judgment by Stability AI.
However, it is interesting to note the grounds raised in Stability AI’s applications for summary judgment:
The first ground was that Getty Images’ claim had no prospect of success since infringement had not taken place in the UK (‘the Location Issue’). The second was that making the pre-trained Stable Diffusion software available in the UK does not constitute an infringement pursuant to sections 22, 23 and 27 of the CDPA 1988, which provide that copyright in a work is infringed by a person who, without the permission of the copyright holder:
- (s 22): imports an ‘article’ which is, and which he knows or has reason to believe is an infringing copy of the work;
- (s 23):
- possesses in the course of a business,
- sells or lets for hire, or offers or exposes for sale or hire,
- in the course of a business exhibits in public or distributes, or
- distributes otherwise than in the course of a business to such an extent as to affect prejudicially the owner of the copyright,
an ‘article’ which is, and which he knows or has reason to believe is, an infringing copy of the work.
- (s 27) – defines the meaning of ‘infringing copy’ in relation to ‘articles’, ie ‘an article is an infringing copy if its making constituted an infringement of the copyright in the work in question’ (primary infringement – note by author) followed by definitions of ‘infringing copies’ arising out of acts of secondary infringements, amongst others, ‘An article is also an infringing copy if (a) it has been or is proposed to be imported into the United Kingdom, and (b) its making in the UK would have constituted an infringement of the copyright in the work in question, […].’
Mrs Justice Joanna Smith noted at paragraph 79 of the judgment: “Ultimately, however, as Mr Saunders [Counsel for the Defendant – note by author] accepted in his oral submissions, it really stands or falls on one point of law, namely the true interpretation of the word “article” in sections 22, 23 and 27 of the CDPA. In particular, whether sections 22 and 23 CDPA are limited to dealings in “articles” which are tangible things or whether they may also encompass dealings in intangible things (such as making available software on a website).”
Mrs Justice Joanna Smith rejected both applications by the Defendant: the first (‘the Location Issue’) based on conflicting or incomplete evidence before her, the second (‘Secondary Infringement’) because she thought there were good reasons to postpone to trial a decision on the true statutory interpretation of the word ‘article’.
Infringement of Database Right
Alongside its other claims (namely, infringement of trade mark and passing off), Getty Images has also included a claim for the infringement of its rights in the extensive image database it has collated over the years. Rights in a database arise in the UK in two ways:
A database is copyright protected as a literary work under section 3(1)(d) of the CDPA 1988 provided that it is original. In accordance with section 3A(2), a database is original if, and only if, by reason of the selection or arrangement of the contents of the database the database constitutes the author’s own intellectual creation.
Database right also arises under reg 13(1) of the Copyright and Rights in Databases Regulations 1997, a statutory instrument which implemented EU legislation in UK law and which (with amendments) remains in force. It gives the proprietor of the database a property right (database right) in a database if there has been a substantial investment in obtaining, verifying or presenting the contents of the database. For this purpose, it is immaterial whether or not the database or any of its contents is a copyright work within the meaning of Part I of the CDPA 1988.
It is not known on exactly which statutory basis Getty Images has claimed infringement of its database rights by Stability AI. It is also not clear how, according to the claim, the database was used for the training of the Stable Diffusion software in a way which would constitute infringement of those rights. It would be interesting to read the statement of case. However, Getty Images could perhaps argue that its database contains categories and topics of images grouped together in a way which is unique to the database (this would indicate that the database may be protected by copyright). This particular quality of the database is likely to be utilised by the machine learning AI system to create new images based on the same or very similar categories and topics which Getty Images provides to its users when they try to find images representing certain key words.
Another option is to argue that the database is protected by the property right (database right) under the Regulations. In this case, the copyright exceptions do not apply. Any copy of the database, however transient, is arguably an infringement of the database right if (reg 16(1)), without the consent of the owner of the right, a person extracts or re-utilises all or a substantial part of the contents of the database. Furthermore (reg 16(2)), for the purposes of this Part, the repeated and systematic extraction or re-utilisation of insubstantial parts of the contents of a database may amount to the extraction or re-utilisation of a substantial part of those contents.
Under Reg 12 (1) – “extraction”, in relation to any contents of a database, means the permanent or temporary transfer of those contents to another medium by any means or in any form;
“re-utilisation”, in relation to any contents of a database, means making those contents available to the public by any means;
Since it is only necessary to ‘extract’ the contents of such a database, without necessarily making its contents available to the public, to infringe the database right under the Regulations, Getty Images may not have great difficulty persuading the court that using its database to train the AI system constitutes ‘an extraction’ for the purpose of the Regulations and therefore an infringement of this kind of proprietary database right.
Having now discussed the complex topic of infringement of copyright, and touched on infringement of database right by AI tech companies and how they may minimize the risk of liability, I will discuss in a second article (Part II) whether copyright does indeed subsist in works produced by such an AI system and, if so, who may own that copyright. In Part III, I will show some fun and subversive methods developed by content creators to ‘play’ AI systems and reveal some of the problematic features we must be aware of.
Section 28A of the CDPA 1988 implements Article 5(1) of Directive 2001/29/EC of 22 May 2001 on the harmonization of certain aspects of copyright and related rights in the information society. ‘Lawful use of the work’ is not further defined in the Directive apart from comments in recital (33) of the Directive, which state that ‘The acts of reproduction concerned should have no separate economic value on their own’ and ‘A use should be considered lawful where it is authorized by the rightholder or not restricted by law’.
Infopaq International A/S v Danske Dagblades Forening (Case C-5/08, ECLI:EU:C:2009:465) (Infopaq I) discusses whether an extract of a newspaper article consisting of only 11 words could infringe the original literary work. The court emphasized that copyright is infringed only if the subject matter reproduced is original in the sense that it is the ‘author’s own intellectual creation’. Furthermore, it was held that the output of the system, namely extracts of newspaper articles which had been scanned, analysed and then made available to clients of Infopaq in the form of printed extracts of only 11 words, is not deemed ‘a transient act’ for the purpose of Article 5 of the Directive.