[This Post has been authored by our former blogger Varsha Jhavar. Varsha is a lawyer based in Delhi and is a graduate of Hidayatullah National Law University, Raipur. Her previous posts on the blog can be viewed here, here, here, here.]
Considering the widespread use of AI today, there appears to be a need to regulate its development and ensure it is actually benefitting humanity. Countries have started taking steps in that direction, such as the USA’s Algorithmic Accountability Act of 2022, the European Commission’s proposed Artificial Intelligence Act, Canada’s proposed Artificial Intelligence and Data Act and the Beijing AI Principles. Italy’s Data Protection Authority went a step further and ordered ChatGPT to stop processing people’s data with immediate effect, on the basis that no information is provided to the users whose data is collected by OpenAI. It was noted that there does not appear to be any legal basis for such data collection and its processing for training of the chatbot’s algorithm. However, the ban was lifted after data privacy improvements by ChatGPT’s developers.
In Part I of this post, I have contended that there is a need to regulate the development and use of AI, specifically from an IP-centric perspective. In Part II, I have attempted to explore certain aspects that could be regulated in order to ensure that AI is responsibly and ethically developed. To clarify, I have not argued for the introduction of laws for the grant of authorship/inventorship and/or ownership to AI: not only would this require time and resources, but the need for it may not be as urgent as the need to regulate the use of materials/content protected under copyright and trademark law for the training of AI.
AI training data sets & Copyright Infringement
Generative AI systems, i.e. the type of AI that can generate text, code, art, music, etc., are trained on data that has been scraped from websites and could be the subject matter of copyright protection. In such circumstances, would the AI be liable for copyright infringement? If the user provides an input that is unlawful or would result in copyright infringement, should the AI be held liable? Well, it could potentially be considered to be an intermediary, as it is generally used akin to a search engine. However, it does not host links to third-party websites. Unlike search engines, which reflect the information/content available online, ChatGPT or AI art generators do not provide links to the websites where the relevant information is available; rather, they provide an answer without attribution.
The suits that have been filed in the US provide some insight into the potential issues that could arise on a large scale when the data that AI has been trained on is subject to copyright and trademark protection or when AI ends up replicating such data upon being provided with an input. In this post, the aim is to explore ways to deal with such issues. The copyright implications of ChatGPT are also relevant for this post and have been previously covered on the blog.
Getty Images Suit
However, the transitory nature claim appears to be incorrect, as the images/dataset that the AI has been trained on are not stored or reproduced within Stable Diffusion, but instead the model analyses the similarities/patterns between the images and stores such information. This information is used by the AI to generate new images. This is a factor that could affect Getty Images’ chances of success in court, but the fact that Getty Images has in the past licensed content from its platform to an AI art generator might help tilt the scales in its favour. Thus, the potential copyright and trademark implications of widespread copying by generative AI can be seen from this suit.
GitHub Copilot Suit
Potential aspects for regulation
The long-term monopoly provided by copyright law is what supposedly motivates authors to create works of literature, art, and music. How does this incentive mechanism work with AI-generated creations? Recently, an AI-generated song featuring vocals in the style of Drake duetting with The Weeknd was uploaded on platforms such as Apple Music and Spotify. In the wake of AI-generated songs becoming available on streaming services, Universal Music Group has asked streaming platforms to block AI from scraping lyrics and musical compositions from its copyrighted songs. In fact, Google has refrained from launching its text-to-music AI, called MusicLM, as about 1 percent of the music generated by it was found to be a direct reproduction of copyrighted works.
A Disney illustrator, Hollie Mengert, found that 32 of her pieces were downloaded by a student and used to train Stable Diffusion to recreate her art style. In scenarios like this, the courts might come to the conclusion that the student’s use amounts to copyright infringement, but another factor that would need to be taken into account is that the use was non-commercial in nature. What if someone feeds all of Dan Brown’s novels into an AI and asks it to produce a novel in his style? Or if someone asks ChatGPT to show important extracts from a certain chapter of a copyrighted book? In India, the use of a training dataset without permission (and for paid versions of generative AI) is likely to be held to be copyright infringement (discussed here previously). Thus, there seems to be a need to regulate certain aspects of AI, and Part II of this post shall explore some of these aspects.