[This Post has been authored by our former blogger Varsha Jhavar. Varsha is a lawyer based in Delhi and is a graduate of Hidayatullah National Law University, Raipur. Her previous posts on the blog can be viewed here, here, here, here.]
After making the case in Part I for regulating AI from an IP perspective, Part II focuses on the different aspects that can be regulated to develop responsible and ethical AI.
Licensing of training datasets
The licensing of datasets – covering the relevant rights under Section 14 of the Copyright Act, 1957 (the Act) – along with attribution, seems like a possible solution to the concerns raised in the cases discussed above. The problem is: how does one license all the copyrighted images/information available on the web?
Some have argued in favour of fair use, at least in the US context. It has been contended that the use of databases for training should generally be allowed, whether or not their contents are copyrighted. Several reasons have been offered in support: broad access to training sets will make AI better, fairer and safer; the use of data by AI is transformative; since training sets are likely to contain millions of works with different owners, it is not feasible to find and negotiate licenses for these photos, texts, videos, code, etc.; and broader access to quality data can help address issues of bias. Essentially, the argument is that copyrighted works should be allowed to be copied for non-expressive uses – such as AI learning to recognise stop signs, powering self-driving cars, or learning how words are sequenced in conversation – but that the fair use question should become tougher when the learning is directed at copying expression. However, the fact that we have not dealt with issues of this complexity in the past – i.e. licensing large volumes of data from different people who have not organised themselves into an organisation or society – does not mean that copyright infringement should go unchecked.
Copying data without permission for non-expressive uses – such as teaching self-driving cars what cycles look like from images – might be found not to amount to infringement. It could be argued that such use is protected under Section 52(1)(b) of the Act (i.e. transient storage of a work in the process of transmission to the public) or as fair use. However, not taking licenses for other kinds of uses, such as the training of ChatGPT, Stable Diffusion, GitHub Copilot, etc., could be considered commercial exploitation and probably would not qualify as fair dealing. In the past, the Calcutta HC and Delhi HC have held that websites/platforms streaming copyrighted songs for money are not covered by Section 52(1)(a)(i) of the Act (i.e. private or personal use). Would such training qualify as fair use? As platforms like ChatGPT and Midjourney offer commercial subscriptions, the first factor – the purpose and character of the use – is likely to be found against AI developers, although they could invoke the transformative use argument. The second factor – the nature of the copyrighted work (i.e. whether factual or creative) – is also likely to go against them. The third factor – the amount and substantiality of the portion used in relation to the copyrighted work as a whole – is a fact-specific enquiry and can go either way depending on the circumstances. AI might fall afoul of the fourth factor as well – the effect of the use upon the potential market for the copyrighted work – as the output might act as a market substitute, and non-compensation of copyright owners for training might destroy their licensing markets.
Practically, when it comes to licensing data for the training of generative AI like ChatGPT, there is no equivalent for information on the web of the music industry's collective machinery (copyright societies/music labels that license large repertoires under a single agreement, without requiring platforms to deal with thousands of individuals). It might not be so difficult to secure licenses for images from stock photography websites such as Getty Images, which has in the past licensed content from its platform to an AI art generator. Instead of taking multiple licenses, perhaps there could be a one-stop window from which the database for any specific territory can be licensed. There could be separate databases for images, videos, information/data, etc., and AI developers could license the relevant database as per their requirements. For example, the developer of a platform like Midjourney could take a license for the images database. This would encourage the development of AI in a manner that is respectful of copyright law and of the human creators whose work forms part of training datasets; it would also be fair to those who do take licenses for training AI. For code, there is The Stack, a 6.4 TB dataset of source code under permissive licenses (i.e. licenses with the fewest restrictions on copying, modification and redistribution), from which developers can also request the removal of their code. Similarly, for information/data on the web, a database could be created comprising only permissively licensed material, excluding works covered by licenses like CC BY-ND, with an option for authors to opt out of the inclusion of their work. The database could be licensed to all on reasonable terms, irrespective of whether the licensee is a competitor.
Licensing data/copyrighted works and compensating their owners protects copyright owners' licensing markets, as well as human creators' incentive to create new works. There also needs to be transparency about the manner in which a training dataset has been obtained/licensed by the AI developer.
What about the moral rights of the individuals whose works form part of the training database and are used by AI in a different context? For example, what would happen if AI took a religious painting and used it in a context or manner not envisioned or intended by the author? Attribution, or the right to paternity, is important in order to recognise the contributions of human creators. It is also important to prevent false attribution to humans when a work has been created by AI, so that humans cannot pass off AI's work as their own. It might be possible for AI developers to ensure that the right to paternity is respected, but the right of integrity would be difficult for them to guarantee, because unlike human beings, AI cannot judge whether use in a certain manner/context could be prejudicial to the reputation of the concerned author. Courts' determinations in respect of the right to integrity will depend on the facts of the case and on the interpretation of "other act in relation to the said work". However, it is important that the author of a work be recognised, not only to fulfil the objective of copyright protection, i.e. encouraging the creation of more works, but also to ensure that the works included in a training database are appropriately licensed (not all open-source licenses have the same terms and conditions, and they can sometimes be incompatible with each other).
Liability for copyright infringement
The Human Artistry Campaign, which has received support from groups representing artists, performers, writers, etc., has adopted certain principles. One of them is that the use of copyrighted works, and of the voices and likenesses of performers, should be subject to licensing and comply with the applicable laws. It further states that "[c]reating special shortcuts or legal loopholes for AI would harm creative livelihoods, damage creators' brands, and limit incentives to create and invest in new works". Although no loopholes/shortcuts are specifically mentioned, the statement refers to allowing a fair use exemption (or introducing similar exceptions under the law) for the utilisation of copyrighted works for AI training. This should be considered when framing laws in respect of AI, as creators/artists are among the people most affected by it. Against this background, it is comforting to note that Firefly, Adobe's AI art generator, has been trained on licensed works and can produce output that is safe for commercial use, with Adobe assuring compensation for the creators of such works. Shutterstock has also launched its AI image generation platform and has promised to pay artists for their contributions.
In the Getty Images suit, relief was also sought for trademark infringement, on the ground that the images generated by Stable Diffusion contained a modified version of Getty's watermark (displayed on all images on its website), thus implying an association with Getty. To address this issue, AI developers could be required not to use trademarks in a way that would cause any likelihood of confusion or association. The above, therefore, are a few aspects that should be considered in the regulation of AI. Interestingly, the Ministry of Electronics and IT, in a written reply in the Lok Sabha, has stated that "the government is not considering bringing a law or regulating the growth of artificial intelligence in the country". Innovation is vital, but it should proceed in a way that is beneficial to humanity.